> During the period when 1,000 players were online, we reached a maximum of ~7.9GB/s heap allocation and our GC was hovering around 2-3GB/s when averaged over a minute.
As someone who dealt with soft-realtime telephony stuff, this makes me want to scream in horror. It seems like the platform really hurts the performance here. In an average second with that many players, most (all?) of them will not do any action apart from changing their position. (Just moving around would ideally use only preallocated per-player space; extra in-game events, NPC AI, etc. could still use some allocations.) It's great Folia can get more out of the bad situation, but... oofff.
(not a criticism of Minecraft by the way, it's better to be popular and inefficient than not exist - I just didn't realise quite how much overhead there is due to the runtime)
> In an average second with that many players, most (all?) of them will not do any action apart from changing their position
The world is not static though. Each player loads in a lot of living entities (monsters/animals/...) and block entities (furnaces, redstone, ...) that all need to update. There's some overlap of course, though in a game like Minecraft where there's a near infinite world to explore that overlap is smaller than you would think.
> Each player loads in a lot of living entities (monsters/animals/...) and block entities (furnaces, redstone, ...) that all need to update.
Sure, but that list of entities in the area is close to static (apart from crazy redstone magic). One would expect them to be pooled and not have many allocations for each tick.
But each has to do a lot of stuff. People on such big servers build farms with hundreds of mobs dropping thousands of items which then go through redstone sorting systems.
I wouldn't say "each has to do a lot". There's a few trivial behaviours usually. Especially if you know what behaviours are common, you can optimise entity lists for that.
But that's not even what we're talking about here. This was a short-lived test server with a fresh environment, not huge farms. The super high GC stats we see here are the baseline behaviour. More complex scenarios will push this even further.
~Normally, yeah, but Folia is based on Paper which, AFAIK, is known for its unreliability when it comes to the technical aspects of Minecraft (ie redstone and farms.) Even your bog-standard simple item sorter is apparently a bit hit-or-miss on Paper.~
Edit: Possibly outdated info, things sound better re: technical Minecraft support on Paper now according to 'ocelotpotpie.
The hardcore technical folks seem to prefer things that don't alter the vanilla mechanics quite as much as Paper, it's true. But FWIW there are lots of farm and redstone designs that work reliably on Paper.
Remember Minecraft is written in Java, not C++. The Bedrock version is written in C++ but has only experimental, very rough native server software that is almost never used.
> Factorio doesn't need that amount of RAM when it routinely handles dozens of thousands of items every tick.
It might not inefficiently allocate all that RAM every tick, but it still has to scan a lot of RAM, and write to a bunch of locations to update the game state.
The world is 100k blocks to a side, with players split up in groups of 20 over the whole thing. Since allocation rose steadily from 35 min and peaked from 47 to 63 minutes, I'm guessing they were spread out pretty widely.
There were 113 regions in one of the pictures; call it 9 players per region, with ~927 live unique chunks (8x8 chunks for 9 players ~= 566, plus a 19x19 chunk area around the world origin that's always loaded). From what I can tell a Minecraft chunk is a bit over 10 kB. So ~10 MB per region.
If each thread is allocating 10 MB for each region on each tick (which the screenshot shows is 8 ticks per second), *that would work out to 8.6 gigabytes per second*. IMO, that's too close to be a coincidence; I'm thinking there are a lot of shared chunks per region, but the threads are copying significant amounts of data by value, or they just have an absurd overhead.
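The arithmetic behind this estimate can be sketched directly. Every input below is a guess taken from the comment itself (region count, chunks per region, "a bit over 10 kB" per chunk, 8 ticks per second), not a measured value:

```java
// Back-of-envelope reproduction of the estimate above.
// All inputs are the comment's own guesses, not measured values.
public class AllocEstimate {
    public static void main(String[] args) {
        int regions = 113;            // regions seen in the screenshot
        int chunksPerRegion = 927;    // ~566 player-loaded + 361 spawn chunks
        double chunkBytes = 10_300;   // "a bit over 10 kB" per chunk
        int ticksPerSecond = 8;       // tick rate shown in the screenshot

        double bytesPerSecond =
            (double) regions * chunksPerRegion * chunkBytes * ticksPerSecond;
        System.out.printf("~%.1f GB/s%n", bytesPerSecond / 1e9); // prints ~8.6 GB/s
    }
}
```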
Block entities (those with complex data, like inventories) can be read/interacted with by redstone (a regular-ish set of blocks in terms of data), which can then trigger a piston or dispenser (which changes the world), and said piston or dispenser can then interact with the player's movement, pushing them, hitting them with an arrow, or dispensing water that changes their movement.
Same again for non-player entities, but also factor in their AI having to adjust pathing as the world changes.
> can be a highly asynchronous and parallelised process
It really can't, though. Parallelization is the enemy of consistent game mechanics, especially in a complex sandbox game like minecraft. There are tons of things that interact with each other in minecraft's world - if you just tick everything at arbitrary times, those interactions cease to be deterministic and might break entirely unless you write a ton of spaghetti code to deal with every edge case. Redstone in particular is the best example of a mechanic that heavily relies on the synchronous nature of the game logic loop.
The Minecraft Java version is notorious for thrashing memory. It allocates and discards objects at an incredible rate, even when the player isn't doing anything. The server has a GUI where you can watch the memory allocation with its distinctive sawtooth pattern.
I'm excited for Project Valhalla to affect Minecraft performance; a lot of the temporary allocations are things like BlockPos objects (a Vec3i, basically) and VoxelShapes (a tree of axis aligned bounding boxes; so, like, an array of a struct that's six f64s) which seem ripe for becoming value objects that are inlined by HotSpot instead of living on the heap
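For illustration, here's roughly what such a position type looks like in today's Java. `Pos` is a hypothetical stand-in for Minecraft's `BlockPos`/`Vec3i`; under Project Valhalla a type like this could be declared a value class and flattened or scalarized by HotSpot instead of heap-allocated:

```java
// Illustrative only: an immutable position type in today's Java.
// Each call to offset() heap-allocates a new object; under Valhalla
// a value class with this shape could live in registers instead.
record Pos(int x, int y, int z) {
    Pos offset(int dx, int dy, int dz) {
        return new Pos(x + dx, y + dy, z + dz); // a fresh allocation today
    }
}
```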
Why not pool these objects? I'm primarily an Android developer and it's a well-known easy optimization on Android to avoid short-lived objects as much as possible, but especially during drawing and other actions that run every frame. You just don't use the word "new" in onDraw and other related methods. Android Studio would even warn you if you do.
But then the new APIs in JDK itself are designed such that you have to allocate loads of short-lived small objects. I was told that HotSpot does deal with them reasonably well to avoid them degrading the performance, but apparently it isn't very good at it?
Allocation is typically really cheap; maintaining pools for objects would likely have more overhead. And while collecting garbage takes resources too, it's heavily concurrent, especially in GCs like Shenandoah and ZGC. So instead of more overhead due to pooling on the thread that uses the objects, you have more overhead on a different thread during garbage collection.
So while it makes sense to avoid unnecessary allocations by using different APIs (e.g. not creating Streams in hot paths), pooling brings far more new problems with it. It might make sense for large objects, but generally requires in-depth analysis to make sure it actually helps.
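To make the trade-off concrete, here is a minimal pool sketch (hypothetical, not Minecraft's code). Note the bookkeeping a pool adds: acquire/release calls, a container to maintain, and state resets before reuse, versus a plain `new`, which on HotSpot is roughly a TLAB pointer bump:

```java
import java.util.ArrayDeque;

// Minimal object-pool sketch for a mutable vector type.
// A server would additionally need synchronization or per-thread pools.
final class Vec3Pool {
    static final class Vec3 { double x, y, z; }

    private final ArrayDeque<Vec3> free = new ArrayDeque<>();

    Vec3 acquire() {
        Vec3 v = free.poll();          // reuse if available...
        return (v != null) ? v : new Vec3(); // ...else fall back to allocation
    }

    void release(Vec3 v) {
        v.x = v.y = v.z = 0;           // must reset state before reuse
        free.push(v);
    }
}
```

Callers must also remember to release every object; a pool leak or a use-after-release bug is often worse than the garbage it was meant to avoid.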
Also, when plugins come into the equation, you need to make sure that those can't modify objects they aren't meant to modify, which involves copying of objects. Additionally, some objects have different representations in the API (what's used by plugins) vs the implementation (what's used by vanilla minecraft), so converting between those representations is another source of allocations.
Pooling would have been more overhead than allocation + deallocation? Do you have any relevant readings?
No idea how to do it in Java, but a few pointers ought to be enough. You can also omit clearing the memory area between allocations for things that aren't security-sensitive, if that is done in Java, which I would assume.
Allocation + deallocation might have more overhead together. I'll try to rephrase: Requesting an object from a pool might have more overhead than allocating a fresh object, and deallocating the object doesn't happen on an application thread but on a GC thread (depending on the GC, obviously).
Aleksey Shipilev has good resources on how GCs in HotSpot typically allocate objects (https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/).
There definitely are scenarios where pooling might make sense, but the low-hanging fruit in that area in Minecraft has basically already been picked.
I got that, and thanks for the link. But I'm not convinced a pool would have more overhead. It is basically the same thing (not really, but then again, pretty much), just you can encode more information about the usage than a general GC can possibly do.
For sure there are other tradeoffs, and whether it is worth it. But when we actually see many GB/s that is not cheap even if you are able to offload to other cores.
The article also concludes:
>It is funny to consider that having TLABs is the way to experience more frequent GC pauses, just because the allocation is so damn cheap!
Sounds like a nightmare, the problem just snowballs, because now you'll be tempted to get into GC tuning.
Seems like there should be a performance benefit from not constantly blowing out your L2 cache. If you can keep your object pool hot there should be quite a bit of performance to be gained. The downside is that this would require active memory management (explicitly freeing the objects when you're done with them) and if you have that why bother programming in a GC language in the first place?
For short lived objects, heap allocation is probably about as fast as allocating on the stack. By pooling you can end up moving objects out of the fast eden space into older generations.
I worked on optimising a java library a few years back and one of the bigger speed ups was removing all the object pooling code.
But that means that the pools still use the GC? If so then I fully agree. But the point of a pool in my mind would be to reuse the memory rather than allocate+deallocate it.
> not a criticism of Minecraft by the way, it's better to be popular and inefficient than not exist
I'm glad you said that, it's a very wise statement. It's the origin story of so many working-but-inefficient projects. We just don't see the graveyard of perfect-but-never-finished projects on the other side of the scale.
An age old question... Is the code wrong for not being optimal, or is the compiler/JIT wrong for not optimizing it? I've seen optimizing compilers like clang, gcc, rust optimize constructs similar to this to the same machine code. Ideally you want your high-level stuff that makes code easier to understand to be completely transparent and not have a performance impact when the code actually runs.
Neither is wrong, and there's very likely a lot of low hanging JVM performance fruit here still to claim.
Right now they're in a bit of a transitional period. Since a few weeks ago the most obvious thing to try is using the latest Oracle GraalVM with ZGC. ZGC is a pauseless GC like Shenandoah and the Graal compiler is a lot better at escape analysis and removing allocations than the stock C2 compiler especially now Oracle made the enterprise compiler free to use.
The main problem is going to be that ZGC Generational is not quite launched yet. Without generational GC it may not be able to keep up with those very high allocation rates, even with a better compiler reducing them down again. Generational ZGC should be in Java 21 but then you'd have to wait for GraalVM to catch up. So, probably it's worth trying that combination in the next six months or so.
Whilst I don't know about ZGC vs Shenandoah, the performance improvements from using Graal EE over C2 can be large even for Java (the wins are much bigger still for other higher level/more dynamic languages). But the Minecraft community isn't really known for adopting the latest JVM tech. They're pretty conservative.
For my simple case, it would. Generally, the code around the allocation is complicated so maybe it's not getting optimized for some reason. Or maybe the object is passed through some complicated path... or maybe it's added to a list and then cleared at the end of the frame...
The code could've been written better to allocate less. But the runtime encourages cheap allocations you don't think about. But the runtime could provide/encourage tooling that makes it hard to make that mistake. But...
It all overlaps. Sure, JVM is cool and handles it, but also what JVM is influenced how people instinctively used it.
Most people don't write Java games this way. If you look at any libgdx code examples, people cache objects and use pooling - you don't know how modern a VM the user will have, especially on Android.
Minecraft Java used to allocate over 200MB/s per player when moving, and just 50MB/s when static. A lot of it stems from Minecraft 1.8, where code was updated to take a BlockPos(float x, float y, float z) class instead of just (float x, float y, float z). Pretty much every single object in the game has to deal with its position. Millions of objects allocated per frame, only to be discarded later. Along with a looooot of other questionable choices. It's not even a runtime problem: you'd have the same issues free()'ing that memory on your own. Just stop allocating so damn much.
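A hypothetical sketch of the style change being described (names made up, not the actual Minecraft code); the first overload passes primitives and allocates nothing, the second forces every call site to construct position objects unless escape analysis happens to eliminate them:

```java
public class PosChurn {
    record Pos(double x, double y, double z) {}

    // Pre-1.8 style: primitives in, primitive out, zero allocation.
    static double distSq(double x1, double y1, double z1,
                         double x2, double y2, double z2) {
        double dx = x1 - x2, dy = y1 - y2, dz = z1 - z2;
        return dx * dx + dy * dy + dz * dz;
    }

    // Post-1.8 style: callers now wrap coordinates in objects first.
    static double distSq(Pos a, Pos b) {
        return distSq(a.x(), a.y(), a.z(), b.x(), b.y(), b.z());
    }
}
```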
But then again, Notch never claimed to be a great programmer.
Notchian code ran way better than modern MC - even though it used completely obsolete OpenGL 1.1 fixed-function calls.
They've upgraded to OpenGL 3 and shaders now; but the game's memory and CPU usage is through the roof with inconsistent performance and huge lagspikes.
Most of this is caused by utterly spurious and unneeded allocations; stream code on hot paths and complete overengineering and abstractions totally unfit for purpose. By that I mean heavy usage of inheritance and indirection in rendering code, and horrible memory wasting with vertex data. (Most things are full floats even if they could be packed and in some cases, even duplication of the same data due to bad data design)
Do you have any source for that? It seems questionable at best. Also, depending on where it is used, Java deals very well with indirect calls, it doesn’t have the same cost as, say, a C++ virtual call.
Source for which section? I am basing this on mostly my experience and codedigging - I don't think there are many written sources about observing this other than anecdotes on forums or reddit or whatever.
With that being said, I'm happy to investigate in detail if you can tell which parts you are interested in.
You are right, the JVM has many optimisations built in to de-virtualise calls, avoid allocations, and such, but if you abuse it, the JIT will give up. I don't think any JIT will de-virtualise or perform escape analysis on 5-6 layers of indirection with branching.
Notch was well aware of the issues around allocating a lot of small objects in Java. He used to write 4k games with a byte code assembler. Implying technical knowledge of the platform was not the problem.
the post-1.8 codebase awfulness has nothing to do with the source programmer. hell, by 1.13, the game had practically been rewritten at least once over.
Do you think a minecraft server (and multiplayer game servers in general) can benefit for running on a soft-realtime OS like Linux with the PREEMPT_RT patch?
Games often use their tick rate (iterations of the event loop per second) as a quality metric, and an rt OS should favor preemption and low scheduling latency over throughput... so maybe that should give a more fluid experience?
But the bottleneck for minecraft seems to be in memory allocations, so an OS that can schedule threads rapidly may not change much after all.
First, Minecraft would have to actually support multithreaded work properly. Here the issue is just plain throughput / computation. RT usually makes things slower, but more predictable. MC needs to go faster first.
That's a sample size of one, but I did try it once and it ended up being basically the same. HotSpot took a measurable amount of time to reach GraalVM performance (on the order of a minute, if memory serves), but once the hot path got JITed, they were basically on par. Since a minecraft server is (usually) a long-lived process, it does not matter in the end.
Java is absolutely used for perf critical workloads, hell, I am fairly sure there is no runtime in existence that would handle that level of memory pressure better - of course the problem is sort of self-made; not every allocation should have to be done in the first place in this case.
But Java is heavily used in high-frequency trading, is the backbone of many cloud infrastructures and it is such a large platform that there are plenty of small niches where it is used and performance is important.
I'd rephrase that as "Java usually isn't the first choice for perf critical workloads". It's still a very performant language if used correctly (outperforming C in some benchmarks where the JIT can shine), and used for lots of performance critical stuff.
Always fun to continue to see innovation within the Minecraft Java space.
At Hypixel we ran (and I believe they still do run) a custom fork of Spigot from 2014-ish, with features from each subsequent Minecraft update being added to our fork. This let us diverge greatly and customize the Minecraft protocol to our own needs, saving hugely on internal bandwidth and letting us optimize the crap out of the L7 "BungeeCord" reverse-proxy we used.
Oh god that's a throwback. I remember hanging out in the Bukkit IRC before all the legal debacles. I worked on MCSG and Wynncraft. It honestly was a blast slamming together patches on the spigot(craftbukkit(nms)) onion; I remember writing a patch to run entities off the main thread, among other things. I hope you're doing well and I hope Hytale is doing alright! I feel like everyone I interacted with back then ended up doing well in tech in some form.
That can't be true, minecraft already had multiplayer in 2009. He didn't make it intentionally painful and never said anything like that. I remember there being huge issues with multiplayer but that was mostly because of bad architecture, not intentional.
Correct, I don't think he hesitated even a little bit on adding multiplayer support. The very first announcement of Minecraft's existence mentioned two planned game modes (fortress and team survival) that would require multiplayer and that all gamemodes would support it https://forums.tigsource.com/index.php?topic=6273.0
yep, just to add to that: I remember there was a big push to rearchitect the entire engine to use a client-server architecture even when running singleplayer. This also enabled the "Open to LAN" feature, in which you can start a server from your singleplayer world with a single click.
I wonder if we’re going to look back on how YouTubers talk in the same way we look back at stereotypical forced accents on corporate news channels. It feels so unnatural
I am interested in multithreading and parallelism. So I have a journal entry to explore which is about deliberately desynchronizing and resynchronizing game loops for performance.
* For the number of clients, can use epoll or io_uring (via liburing).
* Can multiplex multiple sockets per thread; my epoll-server does this.
* Can split recv and send across threads so you can send while receiving and receive while sending.
* For the game loops, can divide the territory covered among different game loops.
My question becomes: how do you synchronize game loops across threads? You could latch on each thread for partial causality between game loops.
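For the socket-multiplexing bullet: in Java the analogue of epoll is NIO's `Selector`, which is backed by epoll on Linux. A minimal sketch, assuming one network thread that would hand decoded packets to the owning game loop (port and buffer size are illustrative):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// One thread multiplexing many client sockets via Selector (epoll on Linux).
public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(25565)); // Minecraft's default port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocateDirect(4096); // reused each read
        while (selector.select() >= 0) {
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    buf.clear();
                    if (((SocketChannel) key.channel()).read(buf) < 0) key.cancel();
                    // ...otherwise hand the bytes to the owning game loop...
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```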
> deliberately desynchronizing and resynchronizing game loops for performance.
> latch on each thread for partial causality between game loops.
This sounds like a recipe for misery. You have to start thinking about "light cones" if causality propagates at finite speed. Do all observers in the game universe witness events in the same order? What if they don't?
You get a little bit of this with rollback netcode, on the level of a few frames, but in that case there's always an authoritative causality and the client's guess is subordinate to it.
It also sounds like fertile ground for exploits, both of the duplicating-object type and the glitch-through-geometry type.
> Is the bottleneck for networked games the network of broadcasting updates? Or the CPU usage of updating object states?
Bit of both depending on precisely what's happening.
Ultimately the problem is that, for N "agents" (players or NPCs or active blocks or monsters, etc) in a space, if you're not careful you end up with O(N ^ 2) checks of the form "has X collided/interacted with Y". Octree systems can spread this out, but you're still vulnerable to "what if all the players go to the same spot?"
Instancing lets you back off the worst-case situation by limiting the maximum number of players in a particular spot.
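One common broad-phase along the lines of the octree point above is a uniform grid: bucket entities by cell, then pair-check only within neighboring cells. A rough sketch (hypothetical, not from Folia or Paper); note it degenerates back toward O(N^2) in exactly the "all players go to the same spot" worst case:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadPhase {
    record Entity(int id, double x, double z) {}

    static final double CELL = 16.0; // one chunk-sized cell, say

    static long key(long cx, long cz) { return (cx << 32) | (cz & 0xffffffffL); }

    // Returns candidate interaction pairs (by id), deduplicated.
    static List<int[]> candidatePairs(List<Entity> entities) {
        Map<Long, List<Entity>> grid = new HashMap<>();
        for (Entity e : entities) {
            long cx = (long) Math.floor(e.x() / CELL);
            long cz = (long) Math.floor(e.z() / CELL);
            grid.computeIfAbsent(key(cx, cz), k -> new ArrayList<>()).add(e);
        }
        List<int[]> pairs = new ArrayList<>();
        for (Entity e : entities) {
            long cx = (long) Math.floor(e.x() / CELL);
            long cz = (long) Math.floor(e.z() / CELL);
            for (long dx = -1; dx <= 1; dx++)          // only the 3x3 cell
                for (long dz = -1; dz <= 1; dz++)      // neighborhood is scanned
                    for (Entity o : grid.getOrDefault(key(cx + dx, cz + dz), List.of()))
                        if (e.id() < o.id()) pairs.add(new int[]{e.id(), o.id()});
        }
        return pairs;
    }
}
```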
>The server was prepared with a 100k x 100k block pre-generated world. Our custom plugin distributed new players to the least-occupied region.
Minecraft chunk generation is notoriously tough, and even a few players generating new chunks will bring the TPS down to an unplayable state very fast.
(Disclaimer: I'm on the Paper team, and Paper is the org under which Folia sits.)
This is true to an extent, yes. Chunk generation can be real tough. That said, chunk generation has been re-written in Paper, so it's substantially faster than vanilla Minecraft.
When we did a large scale real player test on a much earlier build of Folia we had ~327 players at peak and we did not generate chunks because we specifically wanted to see how it ran. We didn't max out at ~327, we just didn't have more joins than that.
The project improved a bunch already so that number without chunk generation is very doable to beat if you didn't pre-gen any chunks.
Generally it's free performance to pre-gen your world though. Highly recommended for regular players.
> Minecraft chunk generation is notoriously tough, and even a few players generating new chunks will bring the TPS down to an unplayable state very fast.
And that's even though a big part of chunk generation already happens off the main thread.
In reality, big Minecraft servers like Hypixel that handle 100k+ players also shard. You jump between many different servers seamlessly using something called Bungeecord.
Planetside 2 has hundreds of players on one map and can theoretically scale to thousands, though the networking gets very noticeably worse as you crowd more players into one place.
Sorry to self-promote, but, Angeldust (https://angeldu.st) is a single, persistent, real-time voxel world going for about eight years now. Can handle 300+K players on a single server due to massive parallelism. Unfortunately we don't get those player numbers right now, but hopefully in the future!
Planetside 2 has them on one huge map, but as soon as more players crowd into one area the system has problems too: lag, jumps, and clocks running at the wrong speed (different speeds across the map). They shard within a server. But PS2's networking is really good and optimized, better than most current games, and Planetside 1 + 2 (and parts of the engine) are 20 years old. Most new games are overwhelmed by 64 players. PS2 can handle 200 in small areas without any problem.
Planetside 2 does not have good network code, it's very inaccurate and prone to cheating, I mean when you run at 20hz everything is possible. It's really not a good example of modern "netcode".
Shards in UO are called that way because of the lore (gem shattered into shards), and each shard is just a regional server completely isolated from other shards. So they are not very different from Minecraft or Planetside servers. Sharding in other games split the user base based on active load. This is not the case with UO.
I think you misunderstand the comment about 150 players. It's not the max. UO is simply not that popular nowadays. Back in the day there were thousands of players online per shard.
Well, at the smallest level (if I recall correctly) it's a single solar system; they have a few beefy servers on stand-by, one is always in use for Jita, the main and most active trading hub, and others can be spun up and a whole solar system transferred over if it gets busy. And as someone else pointed out, you can organize a large fight in advance so they can transfer it over.
But there's the bottleneck, because they can only do the calculations of ship movement & actions on a single node. I'm sure it's been optimized to no end as well. IIRC it's written in Python, but that's not going to be the main performance bottleneck.
I have only the smallest of clues about distributed systems, the only way they could scale it up is to somehow make it so they can run a single solar system or cluster of ships on multiple servers, but for that you get the overhead of inter-server communication or you need an asynchronous, eventually-consistent game instead of something realtime.
Yes, so in that sense it's sharded. In another sense it's all one server, as you can warp between different systems and meet all the players there in the game. Everything is run on one giant server.
Not sure how it is now, but back in 2014-2016 you could inform them of big battles ahead of time for a specific system, at which point they would, in their own words, reinforce the node (moving that particular system to its own allocation of resources). This often was the difference between being able to duke it out in an epic space battle or wait for the system to load while everyone lagged to death.
(Disclaimer: I'm on the Paper team, and Paper is the org under which Folia sits)
AgentK20 explained the technical bits along with the other reply to you, but basically the "actually all in one world" is the big part. For sharding you're handing players off to different instances. With Folia someone can just walk from one person to another without any issue or lag.
Folia dynamically groups people into regions depending on their distance. So it was designed to have people spread out to be able to support many regions across many threads. You can put 1000 people in one spot, it just gets very unhappy and you're now not really taking advantage of the whole point of Folia.
It's not a solution for every server or person but it's definitely cool because it's another tool in the Minecraft tool box. And the API is very similar to Paper, whereas the sharded options start to get kinda tricky quickly.
As I understand, Folia can't help you if all your players are bunched together, but unlike having separate servers connected by Velocity or Bungee, it is a seamless experience -- everyone is in the same world and can visit anyone else without having to use commands, go through portals, or even wait for a loading screen. The "bubbles" that make up the world merge and split seamlessly, with special attention paid to avoid breaking vanilla game mechanics.
[Speculation, I don't have the data and wasn't involved in the test] The test placed each 'team' 10240 blocks apart, which is close enough for a ~30 minute walk to get to another team (~4 minutes if you take the nether), so I assume a lot of bubble merging/splitting happened. It's also far enough away that, if the regionizing logic allows for it, you're split off onto another thread relatively early into your journey to visit another 'team' spawnpoint. This is a relatively realistic scenario for an SMP, where towns are usually 2,000 to 20,000 blocks apart.
Hah. On 2b2t, an "anarchy" server where any player can blow up any build and type whatever garbage into chat they want, players often opt to travel out tens of millions of blocks in an effort to make their builds extremely difficult to find. Can take many hours of nether travel to get to where you want to go.
And inevitably, people still find them, usually through advanced techniques -- hacks that tell you whether a chunk is newly generated or part of an existing 'chunk trail' can be used to hunt down players that have taken great effort to hide their location.
Anyways, a distance of 10240 is just how this particular test was set up. The regionizing is still useful in a much cozier world. As I understand, the only way for it to be guaranteed that everyone is inside the same bubble (and therefore the ticking is all on a single thread) is for everyone to be in the same 768 block radius, or for there to be a line of players spaced this distance apart. It's rather atypical for a server to organically develop this way, since people really like to explore for hours before settling in the perfect spot. But some heavily planned/curated servers are like this.
Keep in mind that entities-in-viewing-range traffic scales quadratically with players. 10 players in view distance is (at 20 ticks per second) 180 packets per second *per player*, or ~2k pps on the server. 100 players, ~200k pps. 1000 players, ~20m pps. All on a single Java thread, which is why Minecraft typically performs better on high clock-speed, low(er) core count CPUs. Folia splits the load for the world into multiple threads, but only on a per-region basis, with each 512x512 region being (potentially) on a different thread.
EDIT: Correction, as doctor_phil points out that's a quadratic scale not an exponential scale. Derp.
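The quadratic math above is easy to check: each of N players receives position updates for the other N-1 players in view, 20 times per second:

```java
// Quick check of the quadratic packet math from the comment above.
public class PacketMath {
    static long packetsPerSecond(long players, long ticksPerSecond) {
        // every player receives one update per other visible player per tick
        return players * (players - 1) * ticksPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(packetsPerSecond(10, 20));   // 1800      (~2k)
        System.out.println(packetsPerSecond(100, 20));  // 198000    (~200k)
        System.out.println(packetsPerSecond(1000, 20)); // 19980000  (~20m)
    }
}
```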
> Folia splits the load for the world into multiple threads, but only on a per-region basis, with each 512x512 being (potentially) on a different thread.
A small correction - Folia regions are not Minecraft regions. They don't necessarily align with the 512x512 region grid, and can take different shapes and sizes depending on what the regionizing algorithm is doing. The term "bubble" is used in some of the documentation and may be a more accurate description.
Ohhh, very interesting. I'll admit I only took a cursory glance at the code when it was first announced on the Paper discord since I'm not as much involved with MC anymore, but that's good to know. I'll need to dive in again one of these days.
That is a completely different problem, one mostly bottlenecked by bandwidth rather than CPU time. Every player's data has to be sent to every other player. This requires a CDN-like architecture to increase bandwidth.
Sure, you need fast tick updates for dynamic models right in front of you, but for something ~100 virtual meters away you can at least halve the rate; at 500 m maybe even 1/10 will suffice.
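That distance-scaled update idea might look like the sketch below; the thresholds are invented for illustration, not taken from any real server:

```java
// Sketch of distance-based update-rate reduction for networked entities.
public class UpdateRate {
    /** Ticks between position updates sent for an entity at this distance. */
    static int updateInterval(double distanceMeters) {
        if (distanceMeters < 100) return 1;  // every tick, full fidelity
        if (distanceMeters < 500) return 2;  // halve the rate
        return 10;                           // 1/10th rate for distant entities
    }
}
```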
Has anyone made a comparison of Minecraft server performance on ZGC vs Shenandoah vs G1GC on a recent JDK version? I'm curious whether the amazing GC pause numbers hyped by the latest versions of ZGC hold up (max pause times of 500 microseconds and averages of 50 microseconds).
Also what's going on with the generational Shenandoah GC spikes in that graph? There's some pretty crazy spikes that completely dominate the pause numbers and make it pretty hard to read the graph. It looks like GC pauses kept increasing up until the server crashed?
Just wondering: does Paper consume fewer resources than the vanilla Minecraft server for a few players (1 to 10)? In other words, is it less costly to operate a Paper server in the cloud?
Absolutely. Depending on your hardware using Paper can be the difference of the server not even starting and a playable game. Paper is an absolutely incredible effort.
Mineplex won a Guinness World Record on January 28, 2015 for having 34,434 concurrent players, the most on a Minecraft server at the time.
Mineplex had the most concurrent players but they weren't on a single individual bare metal server or single server jar instance.
A lot of large "servers" like Mineplex, Hypixel, etc run a proxy which sits in front of a bunch of other servers. The concept of a "server" can have many meanings in Minecraft so it gets fuzzy quickly.
I'm not sure if this 1,000 person test is a record, but I'm not aware of any other test running so many concurrent players on a single dedicated box with a single instance of Minecraft running. Folia takes advantage of more CPU threads, so everyone is sitting in the same server instance.
Ha, nice, thanks. I've never looked into this deeper than accepting the fact that it's doable. Now, if only I had the spare time to look into how they sharded the servers...
We tend to call those "networks" when they have a bunch of different gametypes and servers interconnected. At a smaller scale of a few hundred players it's pretty easily handled by off-the-shelf software like Velocity for the proxy and a few server instances running Paper.
You generally want well-performing hardware (not VPS or "cloud" instances), but you can run a bunch of different worlds and give people cross-server chat, the ability to warp between worlds, etc. Most of the larger networks and nearly all of the smaller networks use a variation of this kind of model.
* https://papermc.io/software/velocity - Newer, more performant, maintained by the team that makes Paper, one of the leading performance MC server implementations.
For context of scale btw, when Hypixel had 200k+ players online at peak we had something like 2,000+ 1U E3-1271v3's, each with 32GB RAM, all colo'd in a single DC in Chicago. Egress was 70-80Gbps or so at the 95th percentile, with most months (at high peak) egressing 10PB+ of real-time, uncacheable data.
I used to play Minecraft as a child years ago, I also got into mod and plugin development as a kid.
I thoroughly enjoyed creating different game modes, similar to Hypixel. Whenever I would play on Hypixel I’d think about how the games had been implemented.
Seeing this thread brings back a lot of nostalgia. Reading through these comments makes me realise that a lot more work went into these servers than I had ever imagined. Naive me thinking it was just a bunch of spigot servers with bungeecord thrown on top.
I’d love to revisit and get back into it all. Alas, I’m stuck working on software nowadays instead.
That is 100 players per server = 12.5 players per hardware thread. The linked article got 31 players per available hardware thread, but was only able to saturate part of the CPU, landing at about 70 players per used hardware thread. The TPS also dropped by a factor of 3. If you were running at 50% utilisation and maintained 20 TPS, then I don't see a significant increase in performance. I imagine I'm missing the point ;)
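For reference, the per-thread figures above can be reproduced as a back-of-envelope calculation. Two assumptions are mine, not from the thread: the E3-1271v3 exposes 8 hardware threads (4 cores with hyperthreading), and the Folia test box exposes 32 hardware threads of which roughly 14 were actually kept busy.

```java
// Back-of-envelope check of the players-per-thread figures quoted above.
public class PlayersPerThread {
    static double perThread(int players, int threads) {
        return (double) players / threads;
    }

    public static void main(String[] args) {
        System.out.printf("Hypixel:           %.1f players/thread%n", perThread(100, 8));
        System.out.printf("Folia (available): %.1f players/thread%n", perThread(1000, 32));
        System.out.printf("Folia (used):      %.1f players/thread%n", perThread(1000, 14));
    }
}
```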
On a side note, I've been trying to figure out how to write server-side Java mods for the vanilla Minecraft server, but there is basically no documentation at all.
Is it safe to say that this isn't possible? Which flavor of Minecraft server is the best documented to write mods for, Spigot?
- Fabric (and Quilt), which uses a mixins system, a really flexible modding API that can do client and server modification
- Paper (and Purpur, Pufferfish, Folia), which is similar to the old-school Bukkit/Spigot ecosystem and supports an intuitive API for server-side plugins. If you're aiming for >30 concurrent players you really need the sorts of performance patching that Paper comes with out of the box, or some custom-developed equivalent to it.
- Forge, which is kinda a general purpose modding API, good for heavy client+server modification like FTB packs
I'd love to green-field everything in Fabric, but the ecosystem is not quite there yet for more serious server setups.
> If you're aiming for >30 concurrent players you really need the sorts of performance patching that Paper comes with out of the box, or some custom-developed equivalent to it.
Fabric has quite a few performance mods that can achieve the same thing. I launched my new 1.20 map last week, and TPS was hanging on at 40 players without most of them even enabled.
Yep, there are some great performance mods for Fabric. The ecosystem is not quite as thorough as Paper's, but it is improving over time, and there's no technical reason Fabric modding can't be used to bring about scaled-up Minecraft without hiccups.
I would say out of those, Paper, mainly due to lack of crazy toolchain complexity or unusual concepts like mixins -- while still being quite capable at more advanced things, including an 'escape hatch' to access raw Minecraft objects in the rare cases where that's needed. It's a very intuitive API. I'd say it's a great introduction to Java.
Pneumaticcraft, Psi, and ProjectRed are three other very logic-heavy Minecraft mods
Minecraft Pi Edition was an ARM-only release of Minecraft Pocket Edition that is scriptable over TCP sockets, with APIs for various programming languages. Unfortunately, it was effectively dead on arrival: it never got any updates after the initial release, doesn't have ARM64 support, and is a very limited version of the game https://www.minecraft.net/en-us/edition/pi
Definitely Paper. Working with Fabric often requires you to get your hands dirty at the compiled bytecode level, which is very powerful for mod authors but really raises the floor for "actually building something that's fun to use" in a way that will probably just be very frustrating for new programmers. Forge is focused on client-side modding, which is just inherently trickier and requires more awkward APIs. Paper really gets out of your way and lets you program basically whatever you want to do server-side without worrying at all about needing to install mods client-side. I wouldn't worry at all about capabilities—Minecraft is an extremely flexible program, especially with the chat commands system, and you can still create cool custom UIs and behaviors without touching client-side code at all.
There's a bit of a distinction in the Minecraft world between "mods" and "plugins". Even though to the outside perspective they're the same thing, the terms are used for two different communities/ecosystems, basically.
"Plugins" are for bukkit-based compatible servers, like Spigot and Paper (and Folia, with some modifications).
"Mods" are more for forge, sponge, etc servers.
So you can kinda pick which direction you want to go. But the Paper project has a ton of API documentation and a few Github repos to even get you started writing your own plugins. The Discord server has a dev help channel with people happy to answer any questions.
The Sponge team has a similar setup as well if that's the route you'd like to go.
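To give a feel for the plugin side: a minimal Bukkit/Paper-style plugin is just one class plus an event handler. This is a sketch, not a complete project: it needs the paper-api dependency and a plugin.yml to actually load on a server, and `HelloPlugin` is a made-up name.

```java
import org.bukkit.event.EventHandler;
import org.bukkit.event.Listener;
import org.bukkit.event.player.PlayerJoinEvent;
import org.bukkit.plugin.java.JavaPlugin;

public final class HelloPlugin extends JavaPlugin implements Listener {
    @Override
    public void onEnable() {
        // Register this class as an event listener when the plugin loads.
        getServer().getPluginManager().registerEvents(this, this);
    }

    @EventHandler
    public void onJoin(PlayerJoinEvent event) {
        // Runs server-side on every player join; no client-side mod needed.
        event.getPlayer().sendMessage("Welcome to the server!");
    }
}
```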
Microsoft has strict rules about the stuff you can sell for a server, and servers need to be EULA compliant. If they aren't, they can get blacklisted or shut down.
People do buy plugins but the vast majority of the plugins you'd want to use are free open source plugins. A lot of the paid/closed source stuff is kinda crappy or misleading anyway.
You can run a great server using free plugins and software. It might not be a money maker, but if your goal is to have people play and have fun and not buy stuff it's super easy and doable.
People do still donate to projects to support the authors who make the wheels spin.
Worth mentioning that this requires mods which break many in-game devices. The official in-game behavior could not be scaled.
That's an interesting and legitimate strategy to the scaling problem, but good to know the downsides with the upsides when trying to take lessons from this.
(Disclaimer: I'm on the Paper team, and Paper is the org under which Folia sits.)
Paper (and Folia) have differences from "vanilla" (official) Minecraft, which is part of how they improve performance to begin with.
So it is true that there are differences from vanilla. Most "in-game devices" - if you mean builds, redstone machines, mob farms, etc - have versions that people have found will work on Paper and Folia. They might require some tweaking but complex redstone and mob farms are definitely very doable.
The goal is to have a close-to-vanilla experience while fixing bugs, patching exploits, and improving performance to run more people on a given piece of server hardware.
The biggest issue with Folia right now is that it's very new so there aren't a lot of plugins that support it just yet. That's changing every day! And of course it still has some crashing issues because it's still in development. :)
Do you know any redstone incompatibilities that Folia has that Paper doesn't? From what I've heard, the goal is to have very good redstone compatibility, but I haven't yet heard much about actual testing with complicated machines.
Cool stuff. The possibilities of Minecraft if some optimisation work was done would be amazing. It's demotivating just how rotten the internals of Minecraft are. Bedrock edition has really nice performance but nobody wants to play it.
Bedrock is also much harder to mod, being written in C++, and mods are a big part of what people like about Minecraft. They have an add-on system but that's just fancy JSON config files so you can only do things like adding a new block, not complete reworks of the internal code to improve performance or anything like that which is totally possible on Java Edition
Bedrock rendering performance is really good. Tick performance? Not so much; the game is riddled with horrible issues such as world fallthrough (caused by floating-point rounding errors) which don't exist on Java, and I think the default tick distance is something like 4 or 6 chunks. Bedrock looks good, but the simulation itself is just broken. Not to mention the uncanny feeling, as previous posters said - they've managed to create a similar game but with a sterilised, odd feeling. It's like watching a bad Disney adaptation of a book.
I think the devil is in the details there - getting the terrain generation right, handling of things like redstone, and replicating the various glitches+bugs that people tend to use. Otherwise you've made a Minecraft-alike, not Minecraft.
If you want to read more, some technical details that were previously only covered by scattered blog posts and github gists have been consolidated into the project's documentation - https://docs.papermc.io/folia/reference/overview
Somewhat orthogonal to what Folia is doing, and already explored by the community.
Folia's main trick is to dynamically split the "main game tick loop" (where logic and actions are processed) into several threads based on location.
That way, processing for, say, a mob farm in the south of the map does not impede processing for a hopper-based storage system in the north-west, etc.
This doesn't solve the everyone's at one place problem (or "Jita problem", in EvE online parlance) but does solve some issues with populated servers where player density isn't extreme.
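Folia's real scheduler is far more involved, but the core bucketing idea can be sketched in a few lines. This is a toy model, not Folia's actual code: the region size, key packing, and entity names are all made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy sketch of location-based tick sharding: entities are bucketed into
// square regions by block coordinates, and each region's tick work can
// then run as an independent task on its own thread.
public class RegionSharding {
    static final int REGION_SHIFT = 9; // 512-block regions; size is made up

    // Pack the region coordinates into a single long map key.
    static long regionKey(int blockX, int blockZ) {
        long rx = blockX >> REGION_SHIFT;
        long rz = blockZ >> REGION_SHIFT;
        return (rx << 32) | (rz & 0xFFFFFFFFL);
    }

    public static void main(String[] args) throws InterruptedException {
        // A mob farm in the south and a hopper system in the north-west
        // land in different regions, so their work never contends.
        Map<Long, List<String>> regions = new HashMap<>();
        int[][] coords = { {100, 4000}, {-3000, -3000} };
        String[] names = { "south mob farm", "north-west hopper system" };
        for (int i = 0; i < coords.length; i++) {
            regions.computeIfAbsent(regionKey(coords[i][0], coords[i][1]),
                    k -> new ArrayList<>()).add(names[i]);
        }
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        regions.forEach((key, work) ->
                pool.submit(() -> System.out.println("ticking region " + key + ": " + work)));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The hard part Folia actually solves is everything this sketch ignores: entities crossing region boundaries, merging and splitting regions on the fly, and keeping cross-region interactions (pistons at a border, teleports, etc.) correct.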
From reading the comments here, it would appear the problem is sending out position updates to all the clients. If each client is on a TCP connection, this will end up being a problem as each movement causes everyone else within range to receive a message. If everyone is moving, that's a heck of a lot of messages everyone is getting. Not only that, everyone needs a copy of the message in their TCP pipe, one at a time, sent from the server. So it quickly explodes. I guess you can think of it as O(N) clients each causing O(N) messages to go out, so it explodes quadratically.
What I'm about to suggest likely isn't practical, but you could move to a broadcast-type system where you have some sort of UDP broadcast that simply tells everyone about all the movements, plus an alternative negative-ack channel where clients can ask for missed broadcasts. That way only O(N) messages are sent (the magic happens in the network devices). The issue is that the internet isn't great for packet loss or multicast routing, so if you get O(N) resend requests you're not much better off. If you had all your clients in one datacenter it might work :)
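To put rough numbers on that quadratic growth (assuming the worst case where every player is in range of every other, and one position update per player per tick at 20 TPS):

```java
// Illustrative fan-out math: N moving players means the server emits
// roughly N * (N - 1) position messages per tick in the worst case.
public class FanOut {
    static long messagesPerSecond(int players, int ticksPerSecond) {
        return (long) players * (players - 1) * ticksPerSecond;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("%4d players -> %,d msgs/s%n", n, messagesPerSecond(n, 20));
        }
    }
}
```

In practice Minecraft servers cut this down with view distance, interest management, and lower update rates for far-away entities, but the worst-case shape is still quadratic.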
(Disclaimer: I'm on the Paper team, and Paper is the org under which Folia sits)
I didn't watch the stream because I was offline when it happened, but I just want to note that the test was run by someone not on the Paper team who got tubbo to stream it, which is how they got so many players. It can be tricky to get enough players for such a large test, so it made sense for cubxity to pair up with someone to get more players.
Twitch can get pretty funky, especially twitch chat, so hopefully there's nothing "bad" in that video, though.