PostgreSQL/FreeBSD performance and scalability on a 40-core machine [pdf] (kiev.ua)
141 points by adamnemecek on Aug 2, 2014 | 16 comments


This is an updated version of the report originally discussed in https://news.ycombinator.com/item?id=7956796

Some memory debugging options were inadvertently left enabled in the original investigation. This report covers the testing redone with the diagnostics disabled.


This paper is a tour de force. Well worth the read.

Two things I wonder about:

1. Is the extra overhead of the fast path worth incurring on every call for most use cases, or should it only be enabled on a system running Postgres?

2. What were the false starts in his analysis? How did he bail from them? What are the insurmountable bottlenecks?


I haven't looked at the patches, but it seems unlikely that the checks for the fast path will add significant overhead to calls that aren't on the fast path. The fast path work should apply to other programs that use shared memory in the same fashion: large shared memory allocations, without a backing store, accessed by many processes across many processors.
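
To make that pattern concrete, here is a minimal sketch of my own (not from the patches or the paper) of such a workload: one large anonymous shared mapping with no file backing store, touched by many forked processes, roughly the way PostgreSQL shares its buffer pool across backends:

    /* sketch: large anonymous shared mapping, many processes */
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <unistd.h>

    #define REGION  (1UL << 30)   /* 1 GiB; real runs use far more */
    #define NPROCS  8
    #define PAGE    4096

    int main(void) {
        /* MAP_ANON | MAP_SHARED: no file backing store, and the
         * mapping is visible to forked children. */
        char *r = mmap(NULL, REGION, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_SHARED, -1, 0);
        if (r == MAP_FAILED) { perror("mmap"); return 1; }

        for (int i = 0; i < NPROCS; i++) {
            if (fork() == 0) {
                /* each child touches pages across the whole region,
                 * stressing the VM fault path on many cores */
                for (size_t o = (size_t)i * PAGE; o < REGION; o += NPROCS * PAGE)
                    r[o]++;
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }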

The lock reduction work in the buffer cache and page queues should apply more generally.


Section 6 is an interesting incidental discovery: a performance regression in the scheduler due to the gcc->llvm switch (a missed inlining opportunity), which, when fixed, should provide a speedup for quite a bit of high-thread-count code even if it doesn't otherwise hit any of the performance problems Postgres had.
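
For anyone curious what that kind of regression looks like at the source level, here's an illustrative toy of mine (names made up; not the actual scheduler code): gcc happened to inline the hot helper on its own, and when another compiler declines to, one fix is to take away its discretion with always_inline:

    #include <stdio.h>

    /* plain `static`: inlining is left to the compiler's judgment,
     * which is where gcc and clang diverged in the report's case */
    static int is_overloaded(int load) { return load > 128; }

    /* forcing it removes the call overhead in a very hot loop */
    static inline int __attribute__((always_inline))
    is_overloaded_forced(int load) { return load > 128; }

    int main(void) {
        long hits = 0;
        for (int i = 0; i < 100000000; i++)
            hits += is_overloaded(i & 255) + is_overloaded_forced(i & 255);
        printf("%ld\n", hits);
        return 0;
    }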


A comparison of Linux & DragonFly BSD to FreeBSD [1]

I'm a huge user and fan of FreeBSD.

Given the remarkably poor performance of FreeBSD shown there, I hope the OP's link and the link below bring focused attention to FreeBSD to resolve these bottlenecks.

[1] http://lists.dragonflybsd.org/pipermail/users/attachments/20...


This investigation starts from the tests in the DragonFly report. You can see that the shape of the red line in the graph on page 13 ("stock") is equivalent to the one in the DragonFly report.


The graphs from this report and the DragonFly report are a bit different in their shapes (and in the numbers used to produce them). For example, look at the dropoff in TPS from 10 threads to 15 threads for the "stock" kernel in both reports -- the dropoff in this report is much more significant. My guess is that it's because DFly's report used a 20-core machine running 40 threads, versus this report's 40-core machine running 40 threads; complicating the comparison, their machine uses Ivy Bridge processors vs. the Westmere processors in the FreeBSD machine.


I got a 24-core machine from eBay for a very reasonable price. I learned a lot and improved my product significantly; money well spent.


Interesting, would you care to elaborate on the machine and what you learned?


An HP ProLiant 585 G2 from 2007. It has 4x6-core AMD Opteron 8xxx CPUs with DDR2 memory, etc. I guess it is no longer cost effective to run those 24/7, but they are still very powerful.

Old CPUs without power-saving features are great for testing multi-threaded applications. A modern CPU might decide to switch off some parts during a test, rendering the results inconclusive.

This beast's power requirement looks scary, but with some care it runs in an ordinary house. I use a separate AC circuit usually reserved for air conditioning. It is very loud (like a hoover). I usually run it for about 30 minutes a day.

I work on concurrent Java stuff: fork-join, CAS, low-latency databases, etc. I mostly use this machine to verify things I read somewhere. I have found that about 60% of what I read about concurrency is wrong. For example, messaging (actors) is not really that great compared to well-designed threads with locks & CAS.

Right now I want to replace a memory allocator that uses a global lock (it scales linearly up to 6 cores) with a CAS-based version, which should scale up to 16 cores.
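
For the curious, the core of such a CAS-based free list typically looks something like this C11 sketch (my names, not the parent's actual code; the same shape works in Java with AtomicReference). It deliberately ignores the ABA problem, which a real allocator must handle, e.g. with tagged pointers:

    #include <stdatomic.h>
    #include <stddef.h>

    struct block {
        struct block *next;
        /* ... payload ... */
    };

    static _Atomic(struct block *) free_list;

    void free_block(struct block *b) {
        struct block *head = atomic_load(&free_list);
        do {
            b->next = head;   /* a failed CAS reloads head for us */
        } while (!atomic_compare_exchange_weak(&free_list, &head, b));
    }

    struct block *alloc_block(void) {
        struct block *head = atomic_load(&free_list);
        /* reading head->next here is where ABA bites a real allocator */
        while (head &&
               !atomic_compare_exchange_weak(&free_list, &head, head->next))
            ;
        return head;          /* NULL: list empty, fall back to mmap/sbrk */
    }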


Interesting way of finding something to do with those! I've also been intrigued by the old ProLiants on eBay, because the prices seem quite cheap initially: $250-400 for massively multi-CPU machines with tons of RAM. But I've been scared off each time by a mixture of the weight and the power usage. One listing even specified that delivery would be made to your loading dock (sadly, I do not have a loading dock), although that was for a big Itanium machine.


Do you have a blog about this? You should.


   > 3 Samsung SSD 840 in single-volume RAID-1 configuration.
A 3-way mirrored RAID? That seems like an odd setup, especially for a benchmark. I guess (?) reads could be faster with an additional drive, but the system had 1 TB of memory so would pgbench create a working set beyond that anyway?


I've been using 3 mirrored drives for the last few years because I want extra protection.

Think about it: one drive suddenly dies and you need to resilver/resync onto a new drive. What if there are some bad blocks on the drive that's left?

With ZFS and 3 mirrored drives, these bad blocks would most likely be healed during the resilver operation.
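
For reference, the setup and maintenance being described boil down to something like this (pool and device names hypothetical):

    # 3-way mirror: two redundant copies of every block
    zpool create tank mirror da0 da1 da2
    # a scrub reads everything and repairs bad blocks from a good copy
    zpool scrub tank
    zpool status -v tank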


Aren't bad blocks detected during scrubbing? I assumed that scrubbing every two weeks (as I do) would reveal read problems before any drive fails, at least in the common case. Maybe there's something important I'm missing?


No, you're not missing anything: scrubbing catches it all. It seems your parent is concerned about errors that appear between the last scrub and a drive failure.

This is why I have a backup system that I zfs send to.
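
That backup workflow is roughly the following (pool, dataset, and host names hypothetical):

    zfs snapshot tank/data@nightly
    zfs send tank/data@nightly | ssh backuphost zfs recv backuppool/data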



