> Why hasn't this been done by anyone yet? What would be the point? Java program...

bullen · on Dec 23, 2022

Because 1) you don't want GC pauses 2) you don't want GC code to bloat the VM 3) You don't want memory leaks 4) You don't want people to not know what they are allocating 5) Static allocation is enough to make anything.

6) int arrays do not have cache misses (but you might need to pad them to avoid cache invalidation) 7) parallel atomic multicore works on int arrays out of the box!

sorokod · on Dec 23, 2022

I assume that by "you" you mean "I"

From my perspective I am happy with the tradeoff of hanving gc pauses and not needing to manage memory manually.

kaba0 · on Dec 23, 2022

1) Java has the very best GCs out of anything to the point that I very much question anyone claiming to suffer from GC pauses on most workloads. Do you really have an application which suffers from that or is that just the usual “GC bad” mantra? Your average C program may spend just as much time trying to malloc a given block in its fragmented heap than that.

2) Why exactly? 3) that’s why you have a GC. But if I really want to understand what you mean (I assume you meant non-memory resources not being explicitly closed?), try-with-resources and cleaners solve the problem quite well.

4) Does it really matter if the last 2 decades were spent on optimizing allocation to the point where it is literally a pointer bump and like 3 basic, thread-local instructions? Deallocation can be amortized to be practically zero cost (moving GC), at RAM’s expense. Where it really really matters though, you could always do ByteBuffers, the new MemorySegment’s or just straight sun.misc.Unsafe pointer arithmetics. That string allocation will be escape analyzed and be stack allocated either way.

I don’t even know what do you mean by static allocation. You mean like in embedded, having fix sized arrays and exploding when the user enters a 32+1 letter text? I really don’t miss that. 6) value classes are coming and solving the issue. Though it begs the question, what’s the size of your average list? Also, how come it doesn’t matter for all the linked lists used extensively in C? Also, see point 4. 7) I don’t get what you mean here, do you mean not doing atomic instructions, because java don’t have out-of-thin-air values? There are rare programs that can get away with that, but I think Java has quite a great toolkit for synchronization primitives to build everything (most concurrency books use it for a reason).

bullen · on Dec 23, 2022

Yes, I mean having limits on everything! Every input has a max value. Players, messages count, message length, weapons they can hold etc. You can add more by recompiling, you should reuse them (for size but also to keep data contiguous) even if that is challenging under multicore utilization and you can add MUUUUUCH more than the CPU can handle to compute anyway so this is not an issue.

The first and last bottleneck of computers is and will always be RAM. Both speed, memory size and energy. A 256GB RAM stick uses 80W!!!! Latency is increasing since DDR3 (2007) and we have had caches to accommodate for slow RAM since 386 (1985) (3 always seems to be the last version, HL3 confirmed? >.<):

You need to cache align everything perfectly: 1) all data has to be in an array (or vector which is a managed array but i digress) 2) you need your types to be atomic so multiple cores can write to them at the same time without segmentation fault (int/float). 3) You need your groups/objects/structs to perfectly fill (padded) 64 bytes. Because then multiple cores cannot invalidate each others cache unless they are writing to the same struct.

So SoA vs. AoS never was an argument! AoS where structures are exactly 64 bytes is the only thing all programmers must do for eternity! This is the law of both X86 and ARM.

So an array of float Mat4x4 is perfect and I suspect that is where the 64 bytes came from. But here is another struct just as an example:

  struct Node {
    int mesh, skin;
    Vec3 spot, pace;
    Quat look, spin;
  };

kaba0 · on Dec 23, 2022

But things don’t fit into 64 bits all the time, and then you get tearing. This is observable and now you have to pay for “proper” synchronization. Also, apple’s m1’s reason for speed is pretty much bigger cache, so I don’t think it’s a good choice to go down this road.

Most applications have plenty of objects all around that are rarely used and are perfectly fine with being managed by the GC as is, and a tiny performance critical core where you might have to care a tiny bit about what gets allocated. This segment can be optimized other ways as well, without hurting the maintainability, speed of progress etc of the rest of the codebase.

bullen · on Dec 23, 2022

64 bytes, 512 bits.

Cachelines are 64 bytes on all modern hardware.

They will probably never change this value ever.

Everything fits into 64 bytes if you make the effort.

And if it doesn't you have to use two Arrays of 64 byte Structures and pad the last.

This is non negotiable and I'm completely baffled nobody has mentioned this yet.

I call this law: Ao64bS (did I invent my first law?) :D

kaba0 · on Dec 23, 2022

It may be a language barrier thingy, but then we are talking about different things. Also, that is architecture dependent.

bullen · on Dec 23, 2022

Nope on all modern CPUs (X86 and ARM) this is 64 bytes and has been since a looong time...

astrange · on Dec 23, 2022

Cache lines are 128 bytes on M1.

But since it’s AMP and not SMP, sharing work across cores doesn’t necessarily work how you expect it to.

bullen · on Dec 23, 2022

Can you ask the OS to give you a certain core type?

128 bytes is perfect 2 x 64! So even if the risk of cache invalidation goes up even if two cores are not writing to the exact same structure the alignment still works!

Good job Apple!

fisf · on Dec 23, 2022

There absolutely are modern systems that have e.g. 128 byte cache lines (M1).

pirocks · on Dec 24, 2022

> A 256GB RAM stick uses 80W!!!!

This seems very high to me, since most high capacity server sticks have no heatsink. Have you got a source?

lmm · on Dec 25, 2022

None of that answers my question. Why would you want your VM to be Java compatible if it can't run Java programs? What are you even going to run on this VM, given that the only mainstream application languages without automatic memory management at runtime are C++ and Rust, to the extent that the latter qualifies as mainstream?

bullen · on Dec 25, 2022

Javac is the point. To be able to code for my own engine without seg. faults. And to build something long term that improves slowly over time.

trimbo · on Dec 23, 2022

Why wouldn't you use C++?

bullen · on Dec 23, 2022

1) You cannot deploy C++ across different architectures/OSes without recompile.

2) VMs avoid crashes upon failures that cause a segmentation fault in native opcodes, with a VM you can keep the process from crashing AND get exact information where and how the problem occurred avoiding debug compiles with symbols and reproducing the error on your local computer.

Right now I use this with my C++ code to get somewhere near the feedback I get from Java (but it requires you to compile with debug): http://move.rupy.se/file/stack.txt

The question you really should ask is why are people using C++? Performance is only required in some parts of engines, to have a VM without GC on top should be default by now (50 years after C and 25 years after Java).

wtetzner · on Dec 24, 2022

Any reason not to do something like Rust + Wasm? Seems like it’d be a better fit.

I don’t have anything against Java, but it seems like you lose a lot of the benefits of using when you take away allocation.

bullen · on Dec 26, 2022

C + WASM maybe but then I'm pretty sure the compiler will not be as good as javac to tell me about things that are wrong.

Java + WASM is probably what I'll end up with, don't like the AoT step from a distance we'll see.

I'm looking at all Risc/stack op/byte-code, and it's disheartening in how many ways humans can copy the same thing differently.

C#, RISC-V, Java, WASM, ARM, 6502 ASM, uxn, lox the list goes on and on...

singularity2001 · on Dec 23, 2022

> Java programs assume GC

only users of long running programs assume GC, the program itself doesnt care.

marginalia_nu · on Dec 23, 2022

The programmer most likely assumes short-lived objects are "free" (as they effectively are in Java), and can be allocated in loops and so on without filling up memory and without incurring any real penalty to performance (via TLAB or possibly elision).