By "per server", do you mean per game room, or is one game room equal to one Linux box? If so, I guess that handling the game logic was the bottleneck, not the number of concurrent connections?
Also, congrats on the success and making some really cool games.
Per game room (each room is a process). I end up just using boxes that have 1 CPU core and run only that game room on them, except for some dedicated servers that have 40+ cores, on which we run 40+ processes.
On Agar.io, doing all the collision checking and encoding the packets is the biggest bottleneck. Similarly for Diep.io. The number of players of course increases those two costs almost linearly. For example, Diep.io doesn't process shapes that aren't being transmitted to anyone.
I was inspired by your games to try something similar for the latest Ludum Dare: http://www.bemmu.com/compo/ludum/37/index.html
At first I tried checking every creature for collisions against everything else, but unsurprisingly that was too slow (O(N^2)). To reduce the checks, I put each creature in a grid cell based on its position, then check for collisions only against creatures in the same or adjacent cells.
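A minimal sketch of that grid approach in Python, not the actual code from the game. It assumes creatures are circles given as (x, y, radius) tuples, and that the cell size is at least the largest creature diameter, so that any two overlapping creatures always land in the same or adjacent cells. The names `CELL_SIZE`, `cell_of`, `overlaps` and `find_collisions` are made up for illustration.

```python
from collections import defaultdict

# Assumed cell size; must be >= the largest creature diameter for the
# "same or adjacent cells" check to be sufficient.
CELL_SIZE = 10.0


def cell_of(x, y, cell_size=CELL_SIZE):
    """Map a position to integer grid cell coordinates."""
    return (int(x // cell_size), int(y // cell_size))


def overlaps(a, b):
    """True if two circles (x, y, radius) touch or overlap."""
    ax, ay, ar = a
    bx, by, br = b
    return (ax - bx) ** 2 + (ay - by) ** 2 <= (ar + br) ** 2


def find_collisions(creatures, cell_size=CELL_SIZE):
    """Return the set of index pairs (i, j), i < j, of colliding creatures."""
    # Bucket every creature index by the cell its center falls in.
    grid = defaultdict(list)
    for i, (x, y, _r) in enumerate(creatures):
        grid[cell_of(x, y, cell_size)].append(i)

    hits = set()
    for (cx, cy), members in grid.items():
        # Look at this cell and its 8 neighbours only.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty cells while iterating.
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        # i < j makes sure each pair is tested only once.
                        if i < j and overlaps(creatures[i], creatures[j]):
                            hits.add((i, j))
    return hits


creatures = [(0, 0, 1), (1.5, 0, 1), (50, 50, 1), (9.5, 0, 1), (10.5, 0, 1)]
collisions = find_collisions(creatures)
```

When creatures are spread out, each one is compared only against the handful in nearby cells instead of all N-1 others. If everything crowds into one cell, it degrades back to the quadratic case.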
I think overlapping grids would be even more efficient, or perhaps doing these checks on the GPU.