In other words, Java desperately needs value types so that this kind of simple abstraction needn't cause any overhead.
Certainly it would be much easier to just go back to the (x, y, z) convention than to change the Java language.
The article also highlights the challenges of using a GC'd language for high-performance games: the GC works against you most of the time, and you're better off statically allocating as much as possible.
Having seen his work, and his tendency to reinvent the wheel (his original iteration of the Minecraft website used flat files for storage, which is really weird when you're writing Java and H2 is right there), I don't think he's a terribly good Java programmer. Gets stuff done, but doesn't work Smart, to use a Spolskyism. He happened to be right in this case. And that's cool, but throwing away best practices so contemptuously because they aren't applicable in corner cases is at best foolish.
Example: .NET's String.Split takes an array of chars to split on, so every time Split is called a new heap allocation normally happens for no good reason. Functions should be able to carry some simple "pure" annotation and let callers stackalloc. In fact, the CLR already supports this - it's just super cumbersome to get at. The JVM can supposedly do this in the JIT, but this post seems to indicate it's not effective.
Secondly, it seems there is often a unit of work where you could allocate from an arena and then throw it all away together. I know this is more involved, as something could accidentally leak out of the arena in code. But perhaps the arena could be a hint to the GC: "if nothing points into this entire arena, dump it in one chunk". Maybe GCs are too fast to benefit from this.
Really, it's a bit of what Rust will accomplish with the borrow checker. I know for a fact (I measured it) that adding a bit of such management to .NET would be a major boon in certain high-allocation scenarios.
Edit: The evangelism around these languages doesn't help. There's a big push to leave it to the JIT, that the runtime knows best. But in truth, they still seem to have fairly suboptimal codegen. Even inlining is poorly handled. For some idiotic reason, they still JIT, and have to make a time/speed tradeoff. Even if it's a program that you're going to execute repeatedly, the installer has to go out of its way to precompile. And even then, the precompiler doesn't do a lot more, and MS warns people it might be worse, because "the runtime knows best". I guess no credible competition leads to not putting tons of resources into these things.
But yes, the CLR has a far superior design and can avoid some of Java's pitfalls.
Only in the premature optimization sense, and "familiarity with new programmers" sense. There's nothing wrong with using structs. In fact, XNA and many other game programming libraries do use structs to represent position vectors, which is the context of this article.
>And arrays are reference types too, so things like string.split have no workaround.
If you're okay with going unsafe, arrays can be stackalloc'd.
All the evidence points to the problem being sloppy programming and nothing to do with the runtime.
For other APIs, I'd create a state object for each request or piece of work that contains assorted buffers and other temp objects, then pass it around as needed. Ugly, but at high processing rates, every allocation counts.
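A rough sketch of what I mean (all the names here are made up, not from any real API):

final class WorkState {
    // Hypothetical per-request scratch state: allocated once per worker,
    // reused for every piece of work instead of allocating fresh temporaries.
    final StringBuilder text = new StringBuilder(1024);  // reusable text buffer
    final byte[] ioBuffer = new byte[8192];              // reusable I/O buffer

    void reset() {
        text.setLength(0);  // keeps the backing array, just drops the contents
    }
}

final class Worker {
    private final WorkState state = new WorkState();     // one per worker/thread

    void handle(byte[] payload) {
        state.reset();
        // ... do the work using state.text / state.ioBuffer,
        // passing `state` down to helpers instead of allocating inside them ...
    }
}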
The parent refers to the first, which you can avoid by creating your array of delimiter char(s) once and reusing it.
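The rough Java-side analogue (a sketch; Java's String.split takes a regex, and for anything but trivial delimiters it can end up compiling a Pattern on every call) is to hoist the delimiter out:

import java.util.regex.Pattern;

class SplitExample {
    // Built once, reused for every call; avoids recompiling the pattern each time.
    private static final Pattern DELIMITER = Pattern.compile("\\s*,\\s*");

    static String[] fields(String line) {
        return DELIMITER.split(line);
    }
}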
Using an arena/buffer/cache in Java and .NET is not unheard of, and is a good way of avoiding allocations and thus postponing collections. An arena will, however, increase your live memory, which means that when a collection finally does happen, it will take longer. The arena will probably be promoted to the old gen quickly though, so the collector shouldn't scan it too often.
"if nothing points into this entire arena dump it in one chunk" -> is more or less how a GC works, so you get that for free. The way the GC does it, is that it scans everything that is live and considers everything not scanned as garbage, which is then collected. The beautiful part of this, is that even if you have allocated 1Gb of memory, you only pay for what is live. If you have 200mb of live memory, collection will be constant regardless of how much garbage you have, whether it is 1Gb or 2Gb of garbage. Increasing the size of memory of course makes collections happens less often.
Your point regarding JIT is irrelevant in this particular case, as this case is exclusively related to the GC. We would be talking about the same potential speed improvements here with or without a JIT. That being said, you are also wrong.
A JIT can produce faster code than any pre-compiler, due to there being more information available, and because a JIT can easily "undo" compiled code, allowing it to experiment with code that can fail in certain rare cases, but most of the time is vastly faster. Mono, the open source .NET implementation, can be pre-compiled using LLVM (originally a C/C++ compiler), but using the JIT produces faster code.
The reason Java and .NET are "slow" for idiomatic (not highly optimized, which most code isn't) code is the GC, which causes the application to freeze at certain intervals. This problem grows with the amount of live memory you have. A GC is probably faster overall than manual memory management, though; the problem with GCs today is that they de-allocate (garbage collect) all at once, instead of de-allocating memory as soon as it becomes garbage, which would even things out a bit.
That being said, there are of course downsides to a JIT. It takes time before the JIT is "warm", which is to say it takes time before most of the code in the application is compiled. If you have applications that run for a short amount of time (I think Java's JIT compiles functions that are run 32 times or more), like shell scripts, they get no performance benefit from using a JIT. Another thing is memory use. Since a JIT compiles code as it goes, and nothing is pre-compiled, the JIT has to store compiled code in memory. So a JIT will naturally use more memory than a pre-compiled program, and a GC normally adds to this.
The line about the JIT emitting better code is often repeated. I used to repeat it myself. But diving into the emitted code, I'm not overly impressed. So, sure, in theory, JIT code is going to be all amazing, taking advantage of my cache sizes, common flows, etc. In real life, this doesn't happen. As I mentioned, simply inlining critical code by hand results in improvements, meaning the JIT is doing a poor job in the first place. Structs even used to prevent optimizations in the CLR codegen!
Of course you don't have control over whether to place an object in the nursery or the old gen, that is an implementation detail, but that has nothing to do with the nursery being shared across all threads. Erlang is, as far as I know, the only VM that doesn't share GC space between threads (actually, processes), but it has to rely on copying messages between process heaps, which has its own issues. Even here, you can't decide where to place an object.
"So your GC run ends up not being great anyways."
When I worked on a C/C++ project a couple of years ago, a colleague and I spent six weeks tracking down a memory leak. I've considered GC runs to be the most amazing thing in software ever since.
"Despite the great speed of the GCs, it still adds unnecessary work and as you note, can cause considerable slowdowns."
The work a GC does is strictly necessary, and the collection time isn't really bad (depends on the implementation, of course); the only places you might have a problem are realtime apps like games. In this case, however, the GC-related problem seems to be quite easily remedied by rewriting some code, and thousands are using a several-year-old, and crappy, GC in Unity and Mono 2.x. For most applications these days, the GC really isn't the issue, the code is.
"But if there was a simple way to stack alloc objects when safe"
Value types, Coming Soon to Java, already available in .NET. Of course, using these in the wrong places can prevent or work against later optimizations. Go gives you greater control over this, but that is Go.
"But diving into the emitted code, I'm not overly impressed."
There are certain things to consider here, like tiered compilation. Another thing to point out is that the exact code you were looking at may have been difficult for the VM to optimise, or that it didn't run often enough for the VM to find it worth optimising as much as other functions.
We had a developer tool we ported over from C++ to Java, more or less a straight copy of the code with changes to match the different APIs, and the Java version was consistently faster (given enough runtime). Sure, you could probably make the C++ version run faster given enough skill and time, but then it wouldn't be idiomatic code anymore, and that is where the Java version excelled.
"As I mentioned, simply inlining critical code by hand results in improvements, meaning the JIT is doing a poor job in the first place."
There are several reasons why this might be the case. HotSpot has (I believe) an 'inline-budget' which means it only inlines functions less than a certain bytecode size. Similarly, there are other limitations on what code the JVM will optimise. Just because manual inlining worked in your particular example, does not mean you will get the same benefit every time.
"Structs even used to prevent optimizations in the CLR codegen!"
Who cares about 'used to'? It's normal that quirks are worked out over time, and that things improve. By the same logic, it's normal that Java produces faster code than .NET, and it's normal that GCC/LLVM produces faster native code today than ten years ago.
Any mature generational GC should handle this with literally zero overhead. Assuming it is not doing a nursery GC every millisecond, all those objects should die in the nursery almost immediately. It has been well understood for at least twenty years how to do that in O(live objects), and the JVM has, for all its many faults, a very good garbage collector. So I am quite skeptical. This is also at odds with empirical evidence, which is that going from 1.7 to 1.8 with the same world improves framerate.
Now the JVM GC has about a million tuning parameters, and most hardcore Minecrafters have a witches' brew of tuning that they run with. It's far from impossible that those tuning parameters are totally inappropriate for 1.8. But the GC should handle this fine.
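For illustration only (actual brews vary wildly, I'm not claiming these are the right values, and the jar name is a placeholder), such a launch line tends to look something like:

java -Xms2G -Xmx2G -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -jar minecraft_client.jar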
also, it sounds like the devs should be doing some testing on typical user machines, instead of higher powered dev boxes
People forget how sensitive framerate is to seemingly minor amounts of time.
(Even if you have a target of 30 FPS (please don't!), losing ~10 ms per frame leaves only ~23 ms of your 33 ms budget, so that's still an effective target of 43 FPS.)
First, he seems to believe that "size of allocated memory" == "longer collection time", but this is not true, especially when, as he says, most of the allocated memory is short-lived. A GC only scans live memory and considers whatever hasn't been scanned as garbage. If most of your memory isn't live, as seems to be the case here, collection time should be relatively consistent regardless of allocated memory or the memory available to the JVM. Increasing the memory available should actually increase performance, because the JVM can run collections less often.
It seems to me that what the devs should do (instead of waiting for a proper struct implementation on the JVM, like .NET has, which would avoid these problems) is to make BlockPos mutable and store unused objects in a cache/buffer. This might be (barely) slower than just allocating the memory, but used correctly it will trigger fewer collections since you allocate way less.
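Very roughly something like this (class and method names invented, not Mojang's actual API, and it assumes some isSolid-style lookup exists):

final class MutableBlockPos {
    // Mutable position reused as scratch space instead of allocating a
    // fresh immutable BlockPos for every neighbour lookup.
    int x, y, z;

    MutableBlockPos set(int x, int y, int z) {
        this.x = x; this.y = y; this.z = z;
        return this;
    }
}

class ChunkRenderer {
    // One scratch instance per thread: zero allocations in the hot loop.
    private static final ThreadLocal<MutableBlockPos> SCRATCH =
            ThreadLocal.withInitial(MutableBlockPos::new);

    boolean solidAbove(int x, int y, int z) {
        MutableBlockPos pos = SCRATCH.get();
        return isSolid(pos.set(x, y + 1, z));   // reuse, no new BlockPos(...)
    }

    boolean isSolid(MutableBlockPos pos) {
        return false; // stand-in for the real world lookup
    }
}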
And it's worth noting that escape analysis, which the JVM currently has and .NET does not, is the best of both worlds -- the developer doesn't need to decide or make such macro optimizations, but the runtime can choose, based upon the lifetime of the object, whether it should be a stack or heap allocation.
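A minimal example of the kind of code escape analysis targets (whether it actually kicks in depends on the JVM version and the surrounding code):

final class Vec3 {
    final double x, y, z;
    Vec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

class Distance {
    static double distance(double ax, double ay, double az,
                           double bx, double by, double bz) {
        // The Vec3 never escapes this method: a JIT with escape analysis can
        // scalar-replace it (keep x/y/z in registers) and skip the heap allocation.
        Vec3 d = new Vec3(ax - bx, ay - by, az - bz);
        return Math.sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    }
}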
I once thought the whole value-type thing was a superior choice by the .NET team, but now it seems that "everything is an object, just make the VM smarter" was the better choice.
Yes, of course they explore options and alternatives, and no solution is a panacea, but this debate is well over a decade and a half old: this is not new ground. And while we're in a world where Microsoft is saying "give it up with the whole value/reference type thing" (the value/reference distinction is an implementation detail), Java went ahead and optimized the existing platform to a pretty good degree, in many (but not all) cases leaving it to a pretty good GC to pick up the pieces.
Just to be clear, I'm no Java booster. I've spent a good chunk of my career heavily involved with .NET. But for all of the talk about the superiority of the .NET platform and its illustrious value types, it's Java that quite soundly takes the performance crown when they go head to head.
"Evolving the JVM", by Brian Goetz and John Rose
Also, take a look at Project Valhalla: http://openjdk.java.net/projects/valhalla/
No. First, they are semantically different, but second -- it's an implementation detail that provides better performance when used correctly. So, exactly what the parent was describing.
This is not to say that the CLR couldn't also optimize better, but personally I'm just voting for Project N.
As an extremely high performance managed application that competes directly with native code, it seems to be quite a good proxy for similar applications, including Minecraft.
Actually, I wish Minecraft had been open-source like Notch was originally talking about. The community could fix most of these issues, in lieu of introducing new features.
We haven't seen enough of the code to really say whether any of this is even remotely sane, but we can say that creating tons of immutable objects that are temporary and short-lived is going to have a performance impact.
So at best, this sounds like an enhancement to API utility involving a trade-off against performance, and at worst incompetent programming. I'm betting on the former, only the performance trade-off may have been more severe than anticipated. It's also worth keeping in mind that this is probably a "death by a thousand cuts" scenario where this specific case isn't the root cause of the entire performance problem, but one of the contributors.
It's just a sad side effect that the JIT is too stupid to optimize this. There's no good reason dist(pointA, pointB) should be any slower than dist(ax, ay, az, bx, by, bz).
With mutable objects, like libgdx does for its vector classes and similar, you can do object pooling to reuse them and avoid the GC thrash.
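With libgdx that tends to look roughly like this (a sketch using com.badlogic.gdx.utils.Pool; check the docs for the exact API):

import com.badlogic.gdx.math.Vector3;
import com.badlogic.gdx.utils.Pool;

class TempVectors {
    // Reusable pool of mutable vectors; obtain() recycles freed instances.
    private static final Pool<Vector3> POOL = new Pool<Vector3>() {
        @Override
        protected Vector3 newObject() {
            return new Vector3();
        }
    };

    static float distance(Vector3 a, Vector3 b) {
        Vector3 tmp = POOL.obtain();          // no allocation once the pool is warm
        try {
            return tmp.set(a).sub(b).len();   // mutate the pooled instance
        } finally {
            POOL.free(tmp);                   // hand it back for reuse
        }
    }
}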
In Java, you can pass XYZ and parameters or you can create an object (in this case, BlockPos). Every object is heap-allocated, and the heavy use of transitory objects (like, say, every frame) creates a lot of garbage that'll need to be cleaned up.
MyVoxel here = new MyVoxel(x, y, z, normal /* ... */);
MyVoxel there = new MyVoxel();          // "null" voxel, reused as scratch space
for (/* some x, y, z */) {
    here.xyz(x, y, z);                  // mutate in place instead of allocating
    for (/* some offsetX, offsetY, offsetZ */) {
        there.xyz(x + offsetX, y + offsetY, z + offsetZ);
        here.distance(there);           // or whatever
    }
}
The idea of having lots of "new" in (inner) loops... just sounds really weird, even outside of sub-ms game land. I'm guessing even assigning (as opposed to just using) primitive values to an object can have some nasty effects wrt. needless copying -- but it should be a lot more predictable than a GC hit, possible to optimize, and might not trash the CPU cache as badly as one might fear.
I suppose it's quite easy to end up in a mess when trying to do a straightforward rewrite from a static/non-java-oo style to a more java-oo-style -- without some careful thought as to what's actually going on...
[edit: looking at teamonkey's reply further down... I think I've seen some code in this style, creating lots of short-lived objects. Is it considered idiomatic java, or an anti-pattern? (or both? ;)]
In non-game contexts, I honestly only start caring about GC behavior when things go bad. If I'm writing Scala (as I generally do on the JVM), I'm creating a lot of objects as intermediate steps here and there. But for the most part, my computer is fast enough and the JVM's escape analysis smart enough that it isn't a huge deal.
This is also handy: http://stackoverflow.com/a/18411275
That sure was news to me, and I see it never addressed outside of games. Which is understandable in a way, but I really think there should be more awareness, so that letting the GC handle it is an actual choice and not just the only way we know how.
> With a default memory limit of 1GB (1000 MB) and working memory of about 200 MB Java has to make a full garbage collection every 4 seconds otherwise it would run out of memory.
Only if all of the 200 MB make it to old gen.
> Why not use incremental garbage collection?
Nobody should be using -XX:+CMSIncrementalMode; it exists for platforms with only one hardware thread.
> the real memory usage is almost double the memory visible in Java
Huh? Yes, Java uses more memory than the heap, and a copying collector means half of the heap is unused, but I have trouble understanding this.
I was in a JavaOne presentation in 2013 when the presenter mentioned that Minecraft runs System.gc() in a thread every 500 ms, and I decided right then that I'll never touch this.
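For reference, that kind of timer-driven full GC is roughly the following anti-pattern (my sketch, not Mojang's actual code); forcing full collections on a schedule largely defeats the generational collector:

// Anti-pattern sketch: forcing a full collection on a timer. Don't do this.
Thread gcTimer = new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        System.gc();              // suggests a full collection to the JVM
        try {
            Thread.sleep(500);    // every 500 ms
        } catch (InterruptedException e) {
            return;
        }
    }
});
gcTimer.setDaemon(true);
gcTimer.start();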
Yet Java becomes more and more popular everywhere, sadly. It gets used for anything and is far from the best at anything.
edit: they only put pressure on the GC if you go edit an older object and create a backward pointer from an older generation to a younger one.
Possibly it's better for mid/high-end systems, but harder on low-end?
for (int i = 0; i < large; i++) {
    Object n = new Object();      // heap allocation on every iteration
    if (n.equals(array[i])) {
        // do something with n
    }
    // n is not needed anymore and immediately becomes garbage
}
I imagine the code went from

v1 = Cell.testPos(x - 1, y, z);
v2 = Cell.testPos(x + 1, y, z);
v3 = Cell.testPos(x, y - 1, z);

to

v1 = Cell.testPos(new BlockPos(block.x - 1, block.y, block.z));
v2 = Cell.testPos(new BlockPos(block.x + 1, block.y, block.z));
v3 = Cell.testPos(new BlockPos(block.x, block.y - 1, block.z));
>So if you need to check another position around the current one you have to allocate a new BlockPos or invent some object cache which will probably be slower. This alone is a huge memory waste.
 or expose .equals(x,y,z) (assuming BlockPos is just an object wrapper around [x,y,z]). Granted, it kinda defeats the purpose of a completely-encapsulated object, but ya gotta do what ya gotta do when it comes to performance.
Given that, there are two ways to make look-up or other utility calls: either feed in raw integer values derived from the original position, or allocate 6 additional objects. In both cases you need to keep the original reference position around while generating the additional values, because it is needed to generate all of them, so it can't be mutated to produce the first one. Passing the integers at a low level saves on the number of objects being allocated.
This has the potential to end up generating a lot of objects in response to certain kinds of events in the game world.
Key word "around". If a coordinate object is immutable, then doing operations on a neighboring one requires another allocation for the new coordinate object.
Unless you're suggesting having a coordinate version of every function that operates on a BlockPos, in which case there's no point at all in BlockPos existing.
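To make the trade-off concrete, here is a sketch of the two API shapes (all names invented):

final class BlockPos {
    final int x, y, z;
    BlockPos(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    BlockPos offset(int dx, int dy, int dz) {
        return new BlockPos(x + dx, y + dy, z + dz);   // immutable: every neighbour allocates
    }
}

interface World {
    boolean isSolid(BlockPos pos);         // convenient, but neighbour checks allocate
    boolean isSolid(int x, int y, int z);  // allocation-free, but now every helper needs two versions
}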
Is this type of system already implemented in Java (or similar languages)? If not, what are the drawbacks of this approach?
There's no concept of reference ownership at all (because GC makes it unnecessary). Meaning, the syntax doesn't enforce (and in some ways encourages) you to pass and store your references anywhere you want, giving them any lifetime you want. The issue is that if you want to insert your own deallocate call, you have to guarantee that every place the reference is passed to never attempts to store it. This includes already-compiled code, so the optimization becomes basically impossible to use as soon as you call any standard library functions. When it's not compiled code, the compiler still has to attempt static analysis of the source to figure out whether it's going to store the reference somewhere or not. If the static analysis is inconclusive, you have to give up.
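A made-up example of the problem: once a reference crosses into code you don't control (or can't fully analyse), you can no longer prove it's safe to free.

import java.util.ArrayList;
import java.util.List;

class EscapeExample {
    static final List<int[]> RETAINED = new ArrayList<>();

    static void maybeRetain(int[] data) {
        if (data.length > 0) {
            RETAINED.add(data);   // the callee quietly keeps the reference alive
        }
    }

    static void caller() {
        int[] buffer = new int[64];
        maybeRetain(buffer);
        // A compiler can't insert "free(buffer)" here without proving that
        // maybeRetain (and everything it calls) never stored the reference.
    }
}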
See this for why it's slow compared to other schemes:
Compile time memory management refers to things like escape and ownership analysis. Escape analysis finds locals that never escape the scope they're allocated in, directly or indirectly, and allocates them on the stack rather than the heap. It's used in openJDK and probably other major JVMs, and required by the Go standard. Ownership analysis verifies that there only ever exists one live reference to an object in memory, so that a deallocation can be statically inserted whenever it leaves scope, so it doesn't become garbage. I have only seen it used in languages where there are explicit ownership annotations, such as Rust and C++.
To be sure, these are not the only forms of compile time memory management, but they're probably the most versatile.
I wonder if it's common to run Minecraft in a profiler or something like that regularly over at Mojang. I used to do that a lot with this one app I used to work on and would routinely be surprised at what was actually going on under the hood.
That's a sucky attitude towards early adopters. Especially since most Minecraft players are young and don't get to control which machines they use.
Especially when you combine it with Intel bugs in graphics drivers that cause Minecraft to crash on opening - this can happen on reasonably powerful laptops.
Makes sense to me.
"Silicon is cheap while carbon is expensive."
i.e. Machine time is cheaper than developer time (to a point).
It makes absolutely no sense when you are selling a program to users who will run it on a wide variety of hardware, from high-end to decade-old. That's a great example of a selfish externality: asking millions of players all around the world to pay for expensive new machines (even top-of-the-line ones are going to have problems managing 200 megabytes of garbage per second without dropping frames) just to make your job easier.
It's only selfish if they had the resources from the beginning which they didn't. Players also constantly want a slew of new features. Maybe the developers just couldn't juggle both performance improvements and new features simultaneously. Maybe not enough people cared about the lag vs new features? It's not like Minecraft is a competitive FPS where lag matters a lot more.
> asking millions of players all around the world pay for expensive new machines
First, Minecraft still doesn't have heavy hardware requirements.
Second, the cost of computing has been trending downward for decades.
> even top of the line ones are going to have problems managing 200 megabytes of garbage per second without dropping frames) just to make your job easier.
Programming Java isn't easy, especially when you have an existing code base. It's even messier when you give precedence to performance.
If this is so easy, why not just make a better clone to fix the problem? Minecraft isn't the only sandbox game anymore. If people are that unhappy there are plenty of alternatives today.
As the post mentions, early versions of Minecraft allocated significantly less garbage. You know, the versions programmed by one guy, before they had millions and millions of dollars and a whole team working on it.
Yes, Minecraft has changed quite a bit since 1.3, when Notch abdicated the development throne, but really not that much considering it's been over 2 years and that Minecraft is a generation's Mario. Most of the "slew of new features" came from mods (still no official modding API, either!)
>Maybe not enough people cared about the lag vs new features? It's not like Minecraft is a competitive FPS where lag matters a lot more.
You don't understand. The "lag" you described is a constant delay between input and response. An example would be vsync, which in most PC implementations trades an extra frame or two of latency for smooth, tearless video. The problem I was describing is jitter, which is variable delay. One frame may take 10ms, the next may take 100ms+, due to the garbage collector. This is extremely noticeable and annoying when playing Minecraft, because the game locks up temporarily every few seconds and makes even simple actions like walking in a straight line extremely choppy and disorienting. Aside from being unpleasant and breaking "immersion," it's even gotten me killed when a GC cycle kicks in just before a jump and the game misses the jump input. It absolutely affects the experience.
>First the Minecraft still doesn't have heavy hardware requirements.
It kinda does, if you want smooth performance. You can play it on a toaster, sure, if you ratchet the draw distance down and are ok with sub-30 FPS and lots of GC-induced jitter.
My point was simply that if you make your game perform worse than it could, you're pushing the cost onto the players, and while this is a normal part of the march of computing, at some point it stops being acceptable and you're hurting the experience for the average player. IMHO, considering how ridiculously profitable Minecraft is and how little the fundamental gameplay has changed over the years, it stopped being acceptable a long time ago.
>Second the cost of computing has been trending downward for decades.
That's nice, but little kids from all backgrounds love Minecraft. You can't just ask an elementary school kid in the projects playing Minecraft at his community center on a donated computer past its prime to upgrade.
>Programming Java isn't easy especially when you have an existing code base. It's even messier when you give precedence to performance.
Again, the code performed reasonably well before. The new devs "cleaned it up" without measuring the performance impact of their style changes, and now everyone is paying for it.
>If this is so easy, why not just make a better clone to fix the problem?
Ah, the old "you can't complain because I don't see you doing any better" schtick. Take it easy, I don't think it's so controversial to be perplexed that one of the most successful video game franchises of all time is managed so poorly; for example, you'd think hiring the Optifine guy would be a no brainer.
> It absolutely affects the experience.
1. See above.
2. MS will probably fix it eventually once they fully transition
> Again, the code performed reasonably well before. The new devs "cleaned it up" without measuring the performance impact of their style changes, and now everyone is paying for it.
See my first line.
> Ah, the old "you can't complain because I don't see you doing any better" schtick.
It's a lot easier to criticize than it is to build something.
1.8 performs a lot better than the old version.