Optifine dev on performance problems in Minecraft 1.8 (minecraftforum.net)
98 points by 10098 on Oct 20, 2014 | 92 comments



> - All internal methods which used parameters (x, y, z) are now converted to one parameter (BlockPos) which is immutable. So if you need to check another position around the current one you have to allocate a new BlockPos or invent some object cache which will probably be slower. This alone is a huge memory waste.

In other words, Java desperately needs value types so that this kind of simple abstraction needn't cause any overhead.
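For illustration, the change being complained about looks roughly like this (world.isSolid and the exact signatures are made up here, not taken from the decompiled source):

  // 1.7-style call: primitives only, nothing allocated per lookup
  boolean solidAbove = world.isSolid(x, y + 1, z);

  // 1.8-style call: every neighbour check allocates a fresh immutable BlockPos on the heap
  boolean solidBelow = world.isSolid(new BlockPos(x, y - 1, z));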


My takeaway was more like "Developers need to understand the strengths and weaknesses of their chosen platform, and be cautious not to degrade performance during refactoring."

Certainly it would be much easier to just go back to the (x, y, z) convention than to change the Java language.


I agree with this. It sounds like the newer devs are not as experienced or knowledgeable at optimizing Java as Notch was.

The article also highlights the challenges of using a GC'd language for high-performance games. The GC works against you most of the time, and you're better off statically allocating as much as possible.


I dunno, in this particular case, naivete and experience actually look kind of the same. =) It's in that middle ground, where you know idiomatic Java, that you might make this mistake. The naive programmer who isn't comfortable with OO and the experienced programmer who knows when to decompose OO might write very similar code here.


Yeah, it's kinda hilarious. Notch has been bashed before because he didn't write the code to the "Java Community Standards". It gets rewritten more towards the standards? Boom, performance sucks. So much for those "Standards"!


Well, no. Those standards exist for a reason. For the overwhelming majority of JVM projects, they are a wise set of choices.

Having seen his work, and his tendency to reinvent the wheel (his original iteration of the Minecraft website used flat files for storage, which is really weird when you're writing Java and H2 is right there), I don't think he's a terribly good Java programmer. Gets stuff done, but doesn't work Smart, to use a Spolskyism. He happened to be right in this case. And that's cool, but throwing away best practices so contemptuously because they aren't applicable in corner cases is at best foolish.


This kind of thing is where I really hate the big GC systems, the JVM and the CLR. Many times, a ton of efficiency could be gained with some simple tricks. For instance, "OO style" code really loves newing up objects, even if they're simple containers. In .NET, this means a heap allocation and garbage, even if the little object is immediately picked apart and never used again.

For example, the String.Split function takes an array of chars to split on. So every time Split is called, a new heap allocation normally happens for no good reason. Functions should be able to provide some simple "pure" annotation and let callers stackalloc. In fact, the CLR already supports this - it's just super cumbersome to get at. The JVM supposedly can do this in the JIT, but this post seems to indicate it's not effective.

Secondly, it seems there is often a unit of work where you could allocate from an arena, and then throw it all away together. I know this is more involved, as something could leak out by accident in code. But perhaps the arena could be a hint to the GC: "if nothing points into this entire arena, dump it in one chunk". Maybe GCs are too fast to benefit from this.

Really, it's a bit of what Rust will accomplish with the borrow checker. I know for a fact (measured) that adding a bit of such management to .NET would be a major boon in certain high-allocation scenarios.

Edit: The evangelism around these languages doesn't help. There's a big push to leave it to the JIT, that the runtime knows best. But in truth, they still seem to have fairly suboptimal codegen. Even inlining is poorly handled. For some idiotic reason, they still JIT, and have to make a time/speed tradeoff. Even if it's a program that you're going to execute repeatedly, the installer has to go out of its way to precompile. And even then, the precompiler doesn't do a lot more, and MS warns people it might be worse, because the runtime knows best. I guess no credible competition leads to not putting tons of resources into these things.


That's actually not true for .NET. Tiny local variables can be structs rather than objects. For example, if structs had been used for the coordinates in this change instead of full objects, they would likely never touch the heap.


Except it is true because structs are often frowned on, and plenty of things end up as reference types. And arrays are reference types too, so things like string.split have no workaround. Also note that until v4 or so, the CLR had limitations optimizing methods using structs.

But yes, the CLR has a far superior design and can avoid some of Java's pitfalls.


>structs are often frowned on

Only in the premature optimization sense, and "familiarity with new programmers" sense. There's nothing wrong with using structs. In fact, XNA and many other game programming libraries do use structs to represent position vectors, which is the context of this article.

>And arrays are reference types too, so things like string.split have no workaround.

If you're okay with going unsafe, arrays can be stackalloc'd.


Anyone writing Minecraft in C# would use a struct for the position vector. There's no good reason to frown upon it in this case.


It's not C#, but here's Minecraft in C++ and Lua: https://github.com/minetest/minetest


This is a cultural problem. We use structs all over the place in Roslyn. You need to train your developers on how to appropriately use the features available to them on the platform.


Structs are frowned upon? Since when?


Hopefully Java will eventually get "value types" too: http://cr.openjdk.java.net/~jrose/values/values-0.html


Hopefully they make a way for arrays to be stack-allocated and passed by value too and take a step ahead of C# in this department. :)


In C#, you can kind of do this with stackalloc and fixed arrays, but it is still not as ideal as it could be.


Tangentially related, but there's a Minecraft clone being written in Rust. Still very early stages, but pretty cool.

https://github.com/PistonDevelopers/hematite


Your String.Split example seems like more of an indictment of the language/library design than the GC. For instance, the language could compile uses of an array literal to a static value that's reused across invocations.


> This kind of thing is where I really hate the big GC systems, the JVM and the CLR.

All the evidence points to the problem being sloppy programming and nothing to do with the runtime.


Whoa. I'm assuming I'm not doing enough String.Splits to worry about performance, but what's my alternative? Looping through the string myself?


That's just one example. There are plenty of APIs that force garbage to be created for no good reason. For String.Split, what I've done when it was critical is to statically allocate an array for each type of split.

For other APIs, I'd create a state object for each request or piece of work that contains assorted buffers and other temp objects, then pass it around as needed. Ugly, but at high processing rates, every allocation counts.


There can be two allocations when you call String.Split: one at the calling site where you create the array of delimiters to pass in, and one inside that creates the String[] to return.

The parent refers to the first, which you can avoid by creating your char[] of delimiters once and reusing it.


Some people have built string processing libraries which operate on structs that hold a string object reference, an index to the first character, and a length. This lets you slice etc. the strings without forcing any heap allocations.


Your comment is absolutely adorable, for anyone writing C code. :-)


For .NET you should really read about structs, which allocate on the stack instead of the heap, and thus avoid the GC most of the time. Oracle is working on a struct implementation for Java 9 or Java 10.

Using an arena/buffer/cache in Java and .NET is not unheard of, and is a good way of avoiding allocations, and thus postponing collections. An arena will, however, increase your live memory, which means that when a collection finally does happen, it will take longer. The arena will probably be promoted to the oldgen quickly though, so the collector shouldn't scan it too often.

"if nothing points into this entire arena dump it in one chunk" -> is more or less how a GC works, so you get that for free. The way the GC does it, is that it scans everything that is live and considers everything not scanned as garbage, which is then collected. The beautiful part of this, is that even if you have allocated 1Gb of memory, you only pay for what is live. If you have 200mb of live memory, collection will be constant regardless of how much garbage you have, whether it is 1Gb or 2Gb of garbage. Increasing the size of memory of course makes collections happens less often.

Your point regarding JIT is irrelevant in this particular case, as this case is exclusively related to the GC. We would be talking about the same potential speed improvements here with or without a JIT. That being said, you are also wrong.

A JIT can produce faster code than any pre-compiler, due to there being more information available, and because a JIT can easily "undo" compiled code, allowing it to experiment with code that can fail in certain rare cases, but most of the time is vastly faster. Mono, the open source .NET implementation, can be pre-compiled using LLVM (originally a C/C++ compiler), but using the JIT produces faster code.

The reason Java and .NET are "slow" for idiomatic code (not highly optimized, which most code isn't) is the GC, which causes the application to freeze at certain intervals. This problem grows with the amount of live memory you have. A GC is probably faster overall compared to manual memory management, though; the problem with GCs today is that they de-allocate (garbage collect) all at once, instead of de-allocating memory as it becomes garbage, which would even things out a bit.

That being said, there are of course downsides to a JIT. It takes time before the JIT is "warm", which is to say it takes time before most of the code in the application is compiled. If you have applications that run for a short amount of time (I think Java's JIT compiles functions that are run 32 times or more), like shell scripts, they get no performance benefit from using a JIT. Another thing is memory use. Since a JIT compiles code as it goes, and nothing is pre-compiled, the JIT has to store compiled code in memory. So a JIT will naturally use more memory than a pre-compiled program, and a GC normally adds to this.


Except since everything is sharing the nursery (or everything on that core, or what have you), you've got no control when you need to allocate slightly longer-term objects. So your GC run ends up not being great anyway. Despite the great speed of the GCs, it still adds unnecessary work and, as you note, can cause considerable slowdowns. But if there were a simple way to stack-allocate objects when safe, that could mean a lot; it's certainly faster and practically hassle-free, barely worth calling manual management. In fact, there is a way to do this, by using unsafe code and setting up your own object headers; except the GC won't scan your object, so you gotta be extra careful.

The line about the JIT emitting better code is often repeated. I used to repeat it myself. But diving into the emitted code, I'm not overly impressed. So, sure, in theory, JIT code's going to be all amazing, taking advantage of my cache sizes, common flows, etc. In real life, this doesn't happen. As I mentioned, simply inlining critical code by hand results in improvements, meaning the JIT is doing a poor job in the first place. Structs even used to prevent optimizations in the CLR codegen!


"Except since everything is sharing the nursery (or everything on that core or what have you), you've got no control when you need to allocate slightly longer term objects."

Of course you don't have control over whether to place an object in the nursery or the oldgen, that is an implementation detail, but that has nothing to do with the nursery being shared across all threads. Erlang is, as far as I know, the only VM that doesn't share GC space between threads (actually, processes), but it has to rely on copying data between processes, which has its own issues. Even there, you can't decide where to place an object.

"So your GC run ends up not being great anyways."

When I worked on a C/C++ project a couple of years ago, a colleague and I spent six weeks tracking down a memory leak. I've considered GC runs to be the most amazing thing in software ever since.

"Despite the great speed of the GCs, it still adds unnecessary work and as you note, can cause considerable slowdowns."

The work a GC does is strictly necessary, and the collection time isn't really bad (it depends on the implementation, of course); the only place you might have a problem is in realtime apps like games. In this case, however, the GC-related problem seems to be quite easily remedied by rewriting some code, and thousands are using a several-year-old, and crappy, GC in Unity and Mono 2.x. For most applications these days, the GC really isn't the issue; the code is.

"But if there was a simple way to stack alloc objects when safe"

Value types, Coming Soon to Java, already available in .NET. Of course, using these in the wrong places can prevent or work against later optimizations. Go gives you greater control over this, but that is Go.

"But diving into the emitted code, I'm not overly impressed."

There are certain things to consider here, like tiered compilation. Another possibility is that the exact code you were looking at was difficult for the VM to optimise, or that it didn't run often enough for the VM to find it worth optimising as aggressively as other functions.

We had a developer tool we ported over from C++ to Java, more or less a straight copy of the code, with changes to match the different APIs; the Java version was consistently faster (given enough runtime). Sure, you could probably make the C++ version run faster given enough skill and time, but then it wouldn't be idiomatic code anymore, and that's where the Java version excelled.

"As I mentioned, simply inlining critical code by hand results in improvements, meaning the JIT is doing a poor job in the first place."

There are several reasons why this might be the case. HotSpot has (I believe) an 'inline-budget' which means it only inlines functions less than a certain bytecode size. Similarly, there are other limitations on what code the JVM will optimise. Just because manual inlining worked in your particular example, does not mean you will get the same benefit every time.

"Structs even used to prevent optimizations in the CLR codegen!"

Who cares about 'used to'? It's normal that quirks are worked out over time, and that things improve. By the same logic, it's normal that Java produces faster code than .NET, and it's normal that GCC/LLVM produces faster native code today than ten years ago.


From OP: "There are huge amounts of objects which are allocated and discarded milliseconds later."

Any mature generational GC should handle this with literally zero overhead. Assuming it is not doing a nursery GC every millisecond, all those objects should die in the nursery almost immediately. It has been well understood for at least twenty years how to do that in O(live objects), and the JVM, for all its many faults, has a very good garbage collector. So I am quite skeptical. This is also at odds with empirical evidence, which is that going from 1.7 to 1.8 with the same world improves framerate.

Now, the JVM GC has about a million tuning parameters, and most hardcore Minecrafters have a witches' brew of tuning that they run with. It's far from impossible that those tuning parameters are totally inappropriate for 1.8. But the GC should handle this fine.
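(For context, a typical "tuned" launch line from guides of that era looked something like the following; purely illustrative, not a recommendation, and the exact flags vary from guide to guide:)

  java -Xms2G -Xmx2G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:-UseAdaptiveSizePolicy -jar Minecraft.jar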


So, the nursery is a certain size. If you are filling it up continually, you're going to be spending a lot of time managing it. Not only that, but if your rate of allocation is high enough, you might inadvertently promote a number of objects that have not gone out of scope yet to the tenured generation, which in a less demanding allocation scenario could have been collected from the nursery.


Nursery GC should be very fast, particularly if most objects die; this issue sounds like it needs more investigation.

Also, it sounds like the devs should be doing some testing on typical user machines, instead of higher-powered dev boxes.


Nursery gc on my machine (Macbook Air 2013) is 10ms.


So, what you're saying is that in order to keep 60FPS consistently you need to cram the entire rest of the frame into 6.7ms? (That's an equivalent of running at 150FPS "normally".)

People forget how sensitive framerate is to seemingly minor amounts of time.

(Even if you have a target of 30FPS (please don't!), that's still an effective target of 43FPS.)


Nursery gc is proportional to the size of the live object set (here alleged to be 0) plus a constant overhead. It's not a simple thing to benchmark.


Live memory 25Mb in my case, sorry, kinda important number to mention :P


You still have to trace the object graph - it's not "zero overhead" unless the escape analysis can determine that the object can be stack-allocated (and hence collected without tracing the graph).


If there are no live objects in the nursery, a Cheney-style collector copies nothing and is done. The modification to mark-and-sweep to make it O(live objects) similarly is a bit more sophisticated but routine among high performance GCs.


This post seems to be based on certain erroneous assumptions.

First, he seems to believe that "size of allocated memory" == "longer collection time"; this is not true, especially when, as he says, most of the allocated memory is short-lived. A GC only scans live memory and considers whatever hasn't been scanned as garbage. If most of your memory isn't live, as seems to be the case here, collection should be relatively consistent, regardless of allocated memory or the memory available to the JVM. Increasing the memory available should actually increase performance, because the JVM can run collections less often.

It seems to me that what the devs should do (instead of waiting for a proper struct implementation, like .NET has, on the JVM, which would avoid these problems) is to make BlockPos mutable and store unused objects in a cache/buffer. This might be (barely) slower than just allocating the memory, but used correctly it will trigger fewer collections as you allocate way less.
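A minimal sketch of that approach (the names here are mine, not Mojang's):

  import java.util.ArrayDeque;

  final class MutableBlockPos {
      int x, y, z;
      MutableBlockPos set(int x, int y, int z) { this.x = x; this.y = y; this.z = z; return this; }
  }

  final class BlockPosCache {
      private final ArrayDeque<MutableBlockPos> free = new ArrayDeque<MutableBlockPos>();

      MutableBlockPos acquire(int x, int y, int z) {
          MutableBlockPos p = free.poll();
          return (p != null ? p : new MutableBlockPos()).set(x, y, z);
      }

      // hand the object back instead of letting it become garbage
      void release(MutableBlockPos p) { free.push(p); }
  }

Used from a single thread (or with one cache per thread), this keeps the hot paths allocation-free at the cost of slightly uglier call sites.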


instead of waiting for a proper struct implementation, like .NET has, on the JVM, which would avoid these problems

http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-t...

And it's worth noting that escape analysis, which the JVM currently has and .NET does not, is the best of both worlds -- the developer doesn't need to decide or make such macro optimizations, but the runtime can choose, based upon the lifetime of the object, whether it should be a stack or heap allocation.

I once thought the whole value type thing was a superior choice by the .NET team, but now it seems that "everything is an object, just make the VM smarter" was the better choice.


The problem is that escape analysis can't always do a good enough job; it doesn't catch everything. Lead JVM developers have stated this themselves, which is why Oracle is exploring how to implement value types on the JVM. Naturally, value types don't magically solve things either (I'm well aware of the shortcomings as described in that article), but for this particular use case, they seem like the perfect solution.


Lead JVM developers have stated this themselves

Where? Who?

Yes, of course they explore options and alternatives, and no solution is a panacea, but this debate is well over a decade and a half old: This is not new ground. And while we're in a world where Microsoft is saying "give it up with the whole value / reference type thing" (the value/reference type is an implementation detail), Java went ahead and optimized the existing platform to a pretty good degree, in many (but not all) cases, leaving it to a pretty good GC to pick up the pieces.

Just to be clear, I'm no Java booster. I've spent a good chunk of my career heavily involved with .NET. But for all of the talk about the superiority of the .NET platform and its illustrious value types, it's Java that quite soundly takes the performance crown when they go head to head.


This year's JVM Language Summit:

http://www.oracle.com/technetwork/java/javase/community/jlss...

"Evolving the JVM" by Brian Goetz and John Rose

Also, take a look at Project Valhalla: http://openjdk.java.net/projects/valhalla/


And while we're in a world where Microsoft is saying "give it up with the whole value / reference type thing" (the value/reference type is an implementation detail)

No. First, they are semantically different, but second -- it's an implementation detail that provides better performance when used correctly. So, exactly what the parent was describing.

This is not to say that the CLR couldn't also optimize better, but personally I'm just voting for Project N.


It's an implementation detail that may provide better performance, but often provides no measurable difference at all (yes there is a profound semantic difference, and such is the foundation for the mechanism in the first place, but that hasn't been the context of this discussion). The parent was contrasting it against a JVM that does both better garbage collection, and actual escape analysis, so the comparison borders on absurd, and it is a statement made with zero context.


I can tell you that in Roslyn it is a very important component of our optimization strategy. System.Collections.Immutable.ImmutableArray is an obvious example.

As an extremely high performance managed application that competes directly with native code, it seems to be quite a good proxy for similar applications, including Minecraft.


Escape analysis can't handle the case where a short-lived object is returned from the scope it's created in. It will be heap-allocated and become garbage soon after. Value types can be returned and still not become garbage. A further optimization of reference types might be ownership analysis, like Rust's borrow checker, but I'm pretty sure that would require an analysis of the entire program, and so would not be possible without AOT hints.
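A small Java illustration of the difference (BlockPos here stands in for any immutable x/y/z holder with getters):

  // Never escapes: when this method is JIT-compiled, escape analysis can usually
  // scalar-replace the BlockPos, so no heap allocation happens.
  static int manhattanLength(int x, int y, int z) {
      BlockPos p = new BlockPos(x, y, z);
      return Math.abs(p.getX()) + Math.abs(p.getY()) + Math.abs(p.getZ());
  }

  // Escapes via the return value: it must live on the heap and becomes
  // garbage shortly after the caller is done with it.
  static BlockPos above(BlockPos p) {
      return new BlockPos(p.getX(), p.getY() + 1, p.getZ());
  }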


Don't worry, Microsoft will fix it when it's rewritten in C#. :) I kid, I kid <I hope>.

Actually, I wish Minecraft had been open-source like Notch was originally talking about. The community could fix most of these issues, in lieu of introducing new features.


According to the post, the problem is passing objects as parameters instead of individual values as separate arguments. Each time they do this, that allocates an object. C# has value types, unlike Java. So the BlockPos structure wouldn't need an allocation, just some stack space.


Why the heck did they decide to pass around objects instead of parameters? This sounds weird, sort of like an internal framework that has gone too far.


Java takes OO to heart, so when you see that you're passing around (x,y,z) a lot, you're probably also doing a lot of similar things with (x,y,z) inside your method definitions. The OO answer to this is to wrap (x,y,z) in a class; in this case BlockPos.

We haven't seen enough of the code to really say whether any of this is even remotely sane, but we can say that creating tons of immutable objects that are temporary and short-lived is going to have a performance impact.

So at best, this sounds like an enhancement to API utility involving a trade-off against performance, and at worst incompetent programming. I'm betting on the former; the performance trade-off may just have been more severe than anticipated. It's also worth keeping in mind that this is probably a "death by a thousand cuts" scenario where this specific case isn't the root cause of the entire performance problem, but one of the contributors.


Sounds a lot cleaner to me to pass around Point or BlockInfo than separate xyz. Imagine a function that determines which of two blocks wins when sitting on top of a third one. 3 args instead of 12 or something. I can quickly see it being cleaner.

It's just sad that the JIT is too stupid to optimize this. There's no good reason dist(pointA, pointB) should be any slower than dist(ax,ay,az,bx,by,bz).


It sounds like something coming from a really weird misunderstanding of Java and its perf characteristics. For most Java projects, immutable, semantically-useful objects make a lot of sense (indeed, Scala does this all over the place). For a game, not so much; you're going to end up object-thrashing all day and unlike on the web you have very hard latency requirements that make this extremely undesirable.


I've not really looked much at how Java handles memory, but I thought the basic pattern for all Objects was pass-by-reference: the method gets a pointer to the object. Is immutable access guaranteed by turning such a pointer into a pointer to a (deep?) copy of the object allocated on the heap?


They're pass-by-ref, but if you have to create an object every time you want to refer to a block, anywhere, you're gonna have a bad time.

With mutable objects, like libgdx does for its vector classes and similar, you can do object pooling to reuse them and avoid the GC thrash.
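For example, with libgdx's Pool the pattern looks roughly like this (a sketch using the stock Pool/obtain/free API):

  import com.badlogic.gdx.math.Vector3;
  import com.badlogic.gdx.utils.Pool;

  public class VectorPoolExample {
      private final Pool<Vector3> vectors = new Pool<Vector3>() {
          @Override protected Vector3 newObject() { return new Vector3(); }
      };

      void useTemporaryVector(float x, float y, float z) {
          Vector3 v = vectors.obtain().set(x, y, z); // reused instance, no new garbage
          // ... do the work with v ...
          vectors.free(v);                           // return it to the pool instead of letting it die
      }
  }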


I thought the general idea with a voxel/block world was that the "world state" could/should be a 3D array/"matrix" of blocks (possibly with empty blocks collapsed to null references)? So if you're using objects, that's what you'd put in the array. I can see how an array of 8-bit ints (basically an enum of block type) would be a lot more compact, and/or that some kind of packed representation would be (on the face of it) vastly more compact - but if that's what they're doing, wouldn't it make much more sense to stuff the abstraction into static methods (functions) rather than marshal (full) objects left and right?


Yeah, now how do you define a location in that voxel space?

In Java, you can pass x, y, z as parameters, or you can create an object (in this case, BlockPos). Every object is heap-allocated, and the heavy use of transitory objects (like, say, every frame) creates a lot of garbage that'll need to be cleaned up.


Right. I was just thinking that the most straightforward, naive approach in Java was to have an array of Objects (so you'd pass around those references), and if you were going to pack everything into a more "primitive" structure (like an array of int8s), it wouldn't make much sense to marshal those into objects, but rather to go for a more traditional data-structure approach and stuff the complexity into some static methods... or, failing that, at least use some local, long-lived "proxy" objects with raw attributes:

    MyVoxel here = new MyVoxel(x, y, z, normal /* ... */);
    MyVoxel there = new MyVoxel(); // "null" voxel

    for (/* some x, y, z */) {
      here.xyz(x, y, z);
      for (/* some offset x, y, z */) {
        there.xyz(x + 1 /* , ... */);
        here.distance(there); // or whatever
      }
    }
Which I suppose is essentially caching objects.

The idea of having lots of "new" in (inner) loops... just sounds really weird. Even outside of sub-ms-game-land. I'm guessing even assigning (as opposed to just using) primitive values to an object probably can have some nasty effects wrt. needless copying -- but it should probably be a lot more predictable than a GC hit -- possible to optimize, and might not trash CPU cache as bad as one might fear.

I suppose it's quite easy to end up in a mess when trying to do a straightforward rewrite from a static/non-java-oo style to a more java-oo-style -- without some careful thought as to what's actually going on...

[edit: looking at teamonkey's reply further down... I think I've seen some code in this style, creating lots of short-lived objects. Is it considered idiomatic java, or an anti-pattern? (or both? ;)]


When I write Java games, I use libgdx and its (trivial) object pooling, so yeah, I find this really really dumb. I can't give you a satisfactory answer, because I don't know. :-)

In non-game contexts, I honestly only start caring about GC behavior when things go bad. If I'm writing Scala (as I generally do on the JVM), I'm creating a lot of objects as intermediate steps here and there. But for the most part, my computer is fast enough and the JVM's escape analysis smart enough that it isn't a huge deal.


Slightly off-topic but not really: to achieve a stable 60 fps in JavaScript, you pretty much have to avoid creating temporary objects as much as you can. For example, I found this talk both very scary and interesting, and would even say anyone who codes JavaScript (or maybe even any language that has a GC) should watch it or a similar one:

http://www.youtube.com/watch?v=Op52liUjvSk ("The Joys of Static Memory Javascript", by Colt McAnlis)

This is also handy: http://stackoverflow.com/a/18411275

That sure was news to me, and I see it never addressed outside of games. Which is understandable in a way, but I really think there should be awareness, so that letting the GC do it can be an actual choice, and not just the only way we know how.


So much bad advice. In general, run Flight Recorder or Censum.

> With a default memory limit of 1GB (1000 MB) and working memory of about 200 MB Java has to make a full garbage collection every 4 seconds otherwise it would run out of memory.

Only if all of the 200 MB make it to old gen.

> Why not use incremental garbage collection?

Nobody should be using -XX:+CMSIncrementalMode it exists for platforms with only one hardware thread.

> the real memory usage is almost double the memory visible in Java

Huh? Yes, Java uses more memory than the heap, and a copying collector means half the memory of the heap is unused, but I have trouble understanding this.

I was in a JavaOne presentation in 2013 when the presenter mentioned that Minecraft runs System.gc() in a thread every 500 ms, and I decided I'll never touch this.


This is why I'm not so quick to dismiss manual memory management. The way it's done in C++ has always made more sense to me in terms of preventing leaks while introducing minimal overhead. It's true that Java can outperform standard allocators due to issues like fragmentation, but for a lot of use cases, especially in games, it's better to write your own (simpler and more efficient) allocators anyway (i.e. a stack allocator or object pool).


Frankly, I never even understood why you would want to write performance-critical code and leave memory management to someone else. It seems bound to be slower.

Yet Java becomes more and more popular everywhere, sadly. It's used for anything and is far from the best for anything.


Java becomes more and more popular because the cases where you would benefit greatly from manual memory management are getting fewer. It's also easier to avoid mistakes in Java than in C/C++, in my personal opinion. Rust could change this, though.


Java usually outperforms standard allocators because allocation in Java is more or less just bumping a pointer. The JVM GC pre-allocates memory, so this is trivial. De-fragmentation is something you pay for with GC runs.


I'm extremely skeptical of this explanation, because new objects that don't survive put no pressure on the GC, thanks to the scavenger on the first generation.

edit: they only put pressure on it if you go edit an older object and put a backwards pointer from an older generation towards a younger one.


It's an interesting post. In my experience 1.8 loads extremely quickly (chunk loading), and my framerate is much higher than 1.7. I haven't noticed memory usage being any different but I haven't watched closely.

Possibly it's better for mid/high-end systems, but harder on low-end?


So they are writing code similar to this:

  Object[] array = /* ... */;
  for (int i = 0; i < array.length; i++) {
      Object n = new Object();   // heap allocation on every iteration
      if (n.equals(array[i])) {  // do something with n
          // do stuff
      }
      // n is not needed anymore; it becomes garbage immediately
  }


I haven't seen the code, obviously, but Minecraft has a lot of functionality similar to cellular automata. That is, checking the contents of adjacent cells.

I imagine the code went from

  v1 = Cell.testPos(x-1,y,z);
  v2 = Cell.testPos(x+1,y,z);
  v3 = Cell.testPos(x,y-1,z);
  ...
To

  v1 = Cell.testPos(new BlockPos(block.x-1,block.y,block.z));
  v2 = Cell.testPos(new BlockPos(block.x+1,block.y,block.z));
  v3 = Cell.testPos(new BlockPos(block.x,block.y-1,block.z));
  ...


Not quite. The post describes the usage of a "BlockPos" vector that holds the x, y, z (and presumably rotation/yaw/pitch) for a world position. I think previously they were using primitive integers and floats, and they have migrated to using an immutable object instead. Because of the number of coordinate lookups each engine tick, this generates a vast number of objects.


It also makes this claim, which seems suspect[1]:

>So if you need to check another position around the current one you have to allocate a new BlockPos or invent some object cache which will probably be slower. This alone is a huge memory waste.

[1] or expose .equals(x,y,z) (assuming BlockPos is just an object wrapper around [x,y,z]). Granted, it kinda defeats the purpose of a completely-encapsulated object, but ya gotta do what ya gotta do when it comes to performance.


The problem is that if a block updates, it triggers something like 6 additional block updates - those +/- 1 in each of the cardinal directions. These block updates in turn may generate yet more block updates.

There are two ways to make lookup or other utility calls given this: either feed in raw integer values derived from the original position, or allocate 6 additional objects. In both cases, you need to keep the original reference position around, because it is needed to generate all of the neighbouring values, so it can't be mutated to produce the first one. Passing integers at a low level saves on the number of objects being allocated.

This has the potential to end up generating a lot of objects in response to certain kinds of events in the game world.


> So if you need to check another position around the current one

Key word "around". If a coordinate object is immutable, then doing operations on a neighboring one requires another allocation for the new coordinate object.

Unless you're suggesting having a coordinate version of every function that operates on a BlockPos, in which case there's no point at all in BlockPos existing.


That's basically what I'm suggesting. Use it only on hot paths you've profiled, maintain purity elsewhere. Java pretty much requires you to make concessions if you're trying to get maximum performance. Similarly: it has a garbage collector, but there are plenty of places where an object pool will out-perform it. Deliberately breaking / re-implementing part of the language for performance reasons.


I have minimal experience with compilers, but couldn't a lot of these objects be dealt with using some sort of 'compile-time memory management'? Essentially, have the compiler notice a point in the code after which a given object provably has no references to it, and insert an instruction in the bytecode to immediately deallocate that object. If the compiler can prove that an object will be deallocated this way, it can also mark it so that the GC knows to ignore it.

Is this type of system already implemented in Java (or similar languages)? If not, what are the drawbacks of this approach?


There's a combo of things in Java that combine to make this hard to do unfortunately.

There's no concept of reference ownership at all (because GC makes it unnecessary). Meaning, the language doesn't restrict you (and in some ways encourages you) to pass and store your references anywhere you want, giving them any lifetime you want. The issue with this is that if you want to insert your own deallocate call, you have to guarantee that every place the reference is passed to never attempts to store it. This includes already-compiled code, so it becomes basically impossible to use this optimization when you use any standard library functions. When it's not compiled code, the compiler still has to attempt static analysis on the source to figure out whether it's going to store this reference somewhere or not. If the static analysis fails to be conclusive, then you have to give up.


This is called reference counting, and it's nigh-impossible to do quickly in a nondeterministic system.

https://en.wikipedia.org/wiki/Reference_counting

See this for why it's slow compared to other schemes: http://www.cecs.uci.edu/~papers/ipdps06/pdfs/1568974892-IPDP...


Reference counting is a runtime operation, an alternative to mark-and-sweep as a GC algorithm that allows for deterministic deallocation at the cost of memory overhead. While the net time overhead will probably always be greater than mark-and-sweep's (although there are optimizations which can make them quite similar), mark-and-sweep has the disadvantage of causing infrequent, long pauses rather than a predictable, uniform slowness. Reference counting is used in Python (which also has mark-and-sweep to detect reference cycles), in C++ in the form of shared_ptr<T>, in Rust as Rc<T>, and in Objective-C/Swift (and certainly many more).

Compile time memory management refers to things like escape and ownership analysis. Escape analysis finds locals that never escape the scope they're allocated in, directly or indirectly, and allocates them on the stack rather than the heap. It's used in openJDK and probably other major JVMs, and required by the Go standard. Ownership analysis verifies that there only ever exists one live reference to an object in memory, so that a deallocation can be statically inserted whenever it leaves scope, so it doesn't become garbage. I have only seen it used in languages where there are explicit ownership annotations, such as Rust and C++.

To be sure, these are not the only forms of compile time memory management, but they're probably the most versatile.


personZ mentions [1] escape analysis [2] which is kind of similar.

[1] https://news.ycombinator.com/item?id=8485522

[2] http://stackoverflow.com/questions/8222561/escape-analysis-i...


The fact that BlockPos is immutable is unfortunate; otherwise one could just pop an instance in a ThreadLocal and mutate it whenever needed. Better yet, just make it an interface.
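A sketch of what that would look like if a mutable variant existed (MutableBlockPos and its set() are hypothetical here):

  static final ThreadLocal<MutableBlockPos> SCRATCH = new ThreadLocal<MutableBlockPos>() {
      @Override protected MutableBlockPos initialValue() { return new MutableBlockPos(); }
  };

  // later, on any thread, no allocation per lookup:
  MutableBlockPos p = SCRATCH.get().set(x, y + 1, z);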

I wonder if it's common to run Minecraft in a profiler or something like that regularly over at Mojang. I used to do that a lot with this one app I used to work on and would routinely be surprised at what was actually going on under the hood.


A summary of a Minecraft developer's responses on Reddit:

http://www.minecraftforum.net/forums/mapping-and-modding/min...


> Well then people are going to be even more pd when we finally cut support for people running hardware that doesn't support anything better than GL 1.x. At some point you have to make the hard decision to stop supporting hardware that is anemic by the standards of 3-4 years ago, let alone today, and quite frankly I don't feel that Minecraft should have to bear a burden of technical debt and a lack of forward progress in the code base just because a handful of people are unable or unwilling to upgrade their machines.

That's a sucky attitude towards early adopters. Especially since most Minecraft players are young and don't get to control which machines they use.

Especially when you combine it with Intel bugs in graphics drivers that cause Minecraft to crash on opening - this can happen on reasonably powerful laptops.


> This is the best part - over 90% of the memory allocation is not needed at all. Most of the memory is probably allocated to make the life of the developers easier.

Makes sense to me.

"Silicon is cheap while carbon is expensive."

i.e. Machine time is cheaper than developer time (to a point).


It could make sense if you are developing a server-side application where you own all the machines that it will run on and can take a cost-benefit analysis of dev time vs. server cost.

It makes absolutely no sense when you are selling a program to users who will run it on a wide variety of hardware, from high-end to decade-old. That's a great example of a selfish externality: asking millions of players all around the world to pay for expensive new machines (even top of the line ones are going to have problems managing 200 megabytes of garbage per second without dropping frames) just to make your job easier.


> That's a great example of a selfish externality

It's only selfish if they had the resources from the beginning which they didn't. Players also constantly want a slew of new features. Maybe the developers just couldn't juggle both performance improvements and new features simultaneously. Maybe not enough people cared about the lag vs new features? It's not like Minecraft is a competitive FPS where lag matters a lot more.

> asking millions of players all around the world pay for expensive new machines

First, Minecraft still doesn't have heavy hardware requirements.

Second, the cost of computing has been trending downward for decades.

http://www.freeby50.com/2009/04/cost-of-computers-over-time....

> even top of the line ones are going to have problems managing 200 megabytes of garbage per second without dropping frames) just to make your job easier.

Programming Java isn't easy especially when you have an existing code base. It's even messier when you give precedence to performance.

If this is so easy, why not just make a better clone to fix the problem? Minecraft isn't the only sandbox game anymore. If people are that unhappy there are plenty of alternatives today.


>It's only selfish if they had the resources from the beginning which they didn't.

As the post mentions, early versions of Minecraft allocated significantly less garbage. You know, the versions programmed by one guy, before they had millions and millions of dollars and a whole team working on it.

Yes, Minecraft has changed quite a bit since 1.3, when Notch abdicated the development throne, but really not that much considering it's been over 2 years and that Minecraft is a generation's Mario. Most of the "slew of new features" came from mods (still no official modding API, either!)

>Maybe not enough people cared about the lag vs new features? It's not like Minecraft is a competitive FPS where lag matters a lot more.

You don't understand. The "lag" you described is a constant delay between input and response. An example would be vsync, which in most PC implementations trades an extra frame or two of latency for smooth, tearless video. The problem I was describing is jitter, which is variable delay. One frame may take 10ms, the next may take 100ms+, due to the garbage collector. This is extremely noticeable and annoying when playing Minecraft, because the game locks up temporarily every few seconds and makes even simple actions like walking in a straight line extremely choppy and disorienting. Aside from being unpleasant and breaking "immersion," it's even gotten me killed when a GC cycle kicks in just before a jump and the game misses the jump input. It absolutely affects the experience.

>First the Minecraft still doesn't have heavy hardware requirements.

It kinda does, if you want smooth performance. You can play it on a toaster, sure, if you ratchet the draw distance down and are ok with sub-30 FPS and lots of GC-induced jitter.

My point was simply that if you make your game perform worse than it could, you're pushing the cost onto the players, and while this is a normal part of the march of computing, at some point it stops being acceptable and you're hurting the experience for the average player. IMHO, considering how ridiculously profitable Minecraft is and how little the fundamental gameplay has changed over the years, it stopped being acceptable a long time ago.

>Second the cost of computing has been trending downward for decades.

That's nice, but little kids from all backgrounds love Minecraft. You can't just ask an elementary school kid in the projects playing Minecraft at his community center on a donated computer past its prime to upgrade.

>Programming Java isn't easy especially when you have an existing code base. It's even messier when you give precedence to performance.

Again, the code performed reasonably well before. The new devs "cleaned it up" without measuring the performance impact of their style changes, and now everyone is paying for it.

>If this is so easy, why not just make a better clone to fix the problem?

Ah, the old "you can't complain because I don't see you doing any better" schtick. Take it easy, I don't think it's so controversial to be perplexed that one of the most successful video game franchises of all time is managed so poorly; for example, you'd think hiring the Optifine guy would be a no brainer.


How do you know how difficult it is to fix the technical debt of the code base? Have you seen the source?

> It absolutely affects the experience.

1. See above.

2. MS will probably fix it eventually once they fully transition

> Again, the code performed reasonably well before. The new devs "cleaned it up" without measuring the performance impact of their style changes, and now everyone is paying for it.

See my first line.

> Ah, the old "you can't complain because I don't see you doing any better" schtick.

It's a lot easier to criticize than it is to build something.


That applies to a certain amount of throughput, for parallelizable applications. Latency in a game isn't "machine time" per se, it's a core requirement and not something you can get away with via "more silicon".


Actually it is something you can get away with if users upgrade their machines.


This seems like a situation that Scala's value classes were designed for. You could have something like a BlockPos that's represented by a long. Accessing x, y, or z would use bit arithmetic to extract the values from the long.
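The same trick works in plain Java with a long plus static helpers; a rough sketch (the bit widths are an arbitrary choice here: 26 bits for x and z, 12 for y):

  final class PackedPos {
      static long pack(int x, int y, int z) {
          return (((long) x & 0x3FFFFFFL) << 38)
               | (((long) y & 0xFFFL)     << 26)
               |  ((long) z & 0x3FFFFFFL);
      }
      static int x(long p) { return (int) (p >> 38); }        // top 26 bits, sign carried by the shift
      static int y(long p) { return (int) (p << 26 >> 52); }  // middle 12 bits
      static int z(long p) { return (int) (p << 38 >> 38); }  // low 26 bits
  }

No object identity, no GC pressure; the downside is that every access becomes a shift-and-mask and the code gets less readable.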


In other words, every kid who knows how to code thinks they can make Minecraft faster and better than the actual Minecraft developers. Pretty old story.

1.8 performs a lot better than the old version.


The person who wrote this post is the developer of OptiFine, a Minecraft mod that significantly improves and stabilizes performance by optimizing the game at many levels. In order to make it, they absolutely needed to understand the game engine at a level comparable to the Mojang developers.


For me, the performance has been getting worse with every new release, and I'm not even on weak hardware, I have an ASUS gaming laptop that I bought in 2012.



