Question for those who have used monster servers before: Can PostgreSQL/MySQL us...

chucky_z · on May 19, 2016

Postgres scales great up to 256gb, at least with 9.4. After that it'll use it, but there's no real benefit. I don't know about MySQL. SQL Server scales linearly with memory even up to and past the 1TB point. I did encounter some NUMA node spanning speed issues, but numactl tuning fixed that.

I setup a handful of pgsql and Windows servers around this size. SQL Server at the time scaled better with memory. Pgsql never really got faster after a certain point, but with a lot of cores it handled tons of connections gracefully.

anarazel · on May 19, 2016

I've very successfully used shared buffers of 2TB, without a lot of problems. You better enable huge pages, but that's a common optimization.

alfalfasprout · on May 18, 2016

I don't work on 2TB+ memory servers, but one of my servers is close to 1TB of RAM.

PostgreSQL scales nicely here. Main thing you're getting is a huge disk cache. Makes repeated queries nice and fast. Still I/O bound to some extent though.

Redis will scale nicely as well. But it won't be I/O bound.

Honestly, if you really need 1TB+ it's usually going to be for numerically intensive code. This kind of code is generally written to be highly vectorizable so the hardware prefetcher will usually mask memory access latency and you get massive speedups by having your entire dataset in memory. Algorithms that can memoize heavily also benefit greatly.

adwf · on May 18, 2016

I've used Postgres out to the terabyte+ range with no probs, so it all works fine. Of course, whenever you approach huge data sizes like this, it tends to change how you access the data a little. eg. Do more threads equal more user connections, or more parallel computation? Generally though, databases aren't really hindered by CPU, instead by the amount of memory in the machine and this new instance is huge.

No idea about MySQL, people tend to scale that out rather than up.

jfindley · on May 19, 2016

For MySQL, it depends a bit what you're hoping to get out of scaling.

Scaling for performance reasons: Past a certain point, many workloads become difficult to scale due to limitations in the database process scheduler and various internals such as auto increment implementation and locking strategy. As you scale up, it's common to spend increasing percentages of your time sitting on a spinlock, with the result that diminishing returns start to kick in pretty hard.

Scaling for dataset size reasons: Still a bit complex, but generally more successful. For example, to avoid various nasty effects from having to handle IO operations on very large files, you need to start splitting your tables out into multiple files, and the sharding key for that can be hard to get right. But MySQL

In short, it's not impossible, but you need to be very careful with your schema and query design. In practice, this rarely happens because it's usually cheaper (in terms of engineering effort) to scale out rather than up.