

PostgreSQL/FreeBSD performance and scalability on a 40-core machine [pdf] - adamnemecek
https://www.kib.kiev.ua/kib/pgsql_perf_v2.0.pdf

======
emaste
This is an updated version of the report originally discussed in
[https://news.ycombinator.com/item?id=7956796](https://news.ycombinator.com/item?id=7956796)

Some memory debugging options were inadvertently left enabled in the original
investigation. This report covers the testing redone with the diagnostics
disabled.

------
yanowitz
This paper is a tour de force. Well worth the read.

Two things I wonder about:

1. Is the extra overhead of the fast path worth incurring on every call for
most use cases, or should it be enabled only on systems running Postgres?

2. What were the false starts in his analysis? How did he back out of them?
What are the insurmountable bottlenecks?

~~~
toast0
I haven't looked at the patches, but it seems unlikely that the checks for the
fast path will add significant overhead to calls that aren't on the fast path.
The fast path work should apply to other programs that use shared memory in
the same fashion: large shared memory allocations, without a backing store,
accessed by many processes across many processors.

The lock reduction work in the buffer cache and page queues should apply more
generally.

------
_delirium
Section 6 is an interesting incidental discovery of a performance regression
in the scheduler due to the gcc->llvm switch (a missed inlining opportunity).
When fixed, it should provide a speedup on quite a bit of high-thread-count
code, even code that doesn't otherwise hit any of the performance problems
postgres had.

------
alberth
A comparison of Linux & DragonflyBSD to FreeBSD [1]

I'm a huge user and fan of FreeBSD.

Given the remarkably poor performance of FreeBSD, I hope the OP link and the
link below bring focused attention to resolving these bottlenecks.

[1]
[http://lists.dragonflybsd.org/pipermail/users/attachments/20...](http://lists.dragonflybsd.org/pipermail/users/attachments/20140310/4250b961/attachment-0002.pdf)

~~~
emaste
This investigation starts from the testing in the DragonFly report. You can
see that the shape of the red line in the graph on page 13 ("stock") matches
the DragonFly report.

~~~
profquail
The graphs from this report and the DragonFly report differ a bit in their
shapes (and in the numbers used to produce them). For example, look at the
dropoff in TPS from 10 threads to 15 threads for the "stock" kernel in both
reports -- the dropoff in this report is much more significant. My guess is
that it's because DFly's report used a 20-core machine running 40 threads,
versus this report, which used a 40-core machine running 40 threads; however,
their machine uses Ivy Bridge processors vs. the Westmere processors in the
FreeBSD machine.

------
qwerta
I got a 24-core machine from eBay for a very reasonable price. I learned a lot
and improved my product significantly; money well spent.

~~~
marktangotango
Interesting, would you care to elaborate on the machine and what you learned?

~~~
qwerta
An HP ProLiant 585 G2 from 2007. It has 4x6-core AMD Opteron 8xxx CPUs with
DDR2 memory, etc. I guess it is no longer cost-effective to run those 24/7,
but they are still very powerful.

Old CPUs without power-saving features are great for testing multi-threaded
applications. A modern CPU might decide to switch off some parts during a
test, rendering the results inconclusive.

This beast's power requirement looks scary, but with some care it runs in an
ordinary house. I use a separate AC circuit usually reserved for air
conditioning. It is very loud (like a vacuum cleaner). I usually run it for
about 30 minutes a day.

I work on concurrent Java stuff: fork-join, CAS, low-latency databases, etc.
I mostly use this machine to verify things I read somewhere. I have found
that about 60% of what I read about concurrency is wrong. For example,
messaging (actors) is not really that great compared to well-designed threads
with locks and CAS.
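A minimal sketch of the kind of lock-vs-CAS comparison described above (illustrative only, not the commenter's actual benchmark; the `CasVsLock` class name and thread/iteration counts are invented):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasVsLock {
    static final AtomicLong casCounter = new AtomicLong();
    static long lockCounter = 0;
    static final Object lock = new Object();

    // Lock-free increment: retry the CAS until no other thread interferes.
    static void casIncrement() {
        long prev;
        do {
            prev = casCounter.get();
        } while (!casCounter.compareAndSet(prev, prev + 1));
    }

    // Lock-based increment: serialize all threads through one monitor.
    static void lockIncrement() {
        synchronized (lock) {
            lockCounter++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 4, iters = 100_000;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < iters; j++) {
                    casIncrement();
                    lockIncrement();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        // Both approaches are correct; timing them under contention is
        // the kind of measurement a many-core box makes meaningful.
        System.out.println(casCounter.get() + " " + lockCounter);
    }
}
```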

Right now I want to replace a memory allocator that uses a global lock (it
scales linearly up to 6 cores) with a CAS-based version, which should scale
up to 16 cores.
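As an illustration of that last idea, here is a hedged sketch of a lock-free free-list allocator built on CAS (a Treiber stack). The `Block` and `CasFreeList` names are invented for this example; `AtomicStampedReference` is used because recycling blocks would otherwise reintroduce the classic ABA problem:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class CasFreeList {
    static final class Block {
        final byte[] data = new byte[64]; // fixed-size payload (arbitrary)
        Block next;                       // free-list link
    }

    // The stamp is bumped on every push/pop to guard against ABA
    // when blocks are recycled back onto the list.
    private final AtomicStampedReference<Block> head =
            new AtomicStampedReference<>(null, 0);

    Block allocate() {
        int[] stamp = new int[1];
        while (true) {
            Block b = head.get(stamp);
            if (b == null) return new Block(); // pool empty: allocate fresh
            if (head.compareAndSet(b, b.next, stamp[0], stamp[0] + 1)) {
                b.next = null;
                return b;                      // popped via CAS, no global lock
            }
        }
    }

    void free(Block b) {
        int[] stamp = new int[1];
        while (true) {
            Block h = head.get(stamp);
            b.next = h;
            if (head.compareAndSet(h, b, stamp[0], stamp[0] + 1)) return;
        }
    }

    public static void main(String[] args) {
        CasFreeList pool = new CasFreeList();
        Block b = pool.allocate();
        pool.free(b);
        // The freed block is recycled by the next allocation.
        System.out.println(pool.allocate() == b);
    }
}
```

Because contending threads retry a single CAS instead of queueing on one lock, throughput degrades more gracefully as core counts rise, which is the scaling behavior the comment is after.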

~~~
_delirium
Interesting way of finding something to do with those! I've also been
intrigued by the old Proliants on eBay, because the prices seem quite cheap
initially: $250-400 for massively multicpu things with tons of RAM. But I've
been scared off each time by a mixture of the weight and power usage. One
listing even specified that delivery would be made to your loading dock (sadly
I do not have a loading dock), although that was for a big Itanium machine.

------
Erwin

> 3 Samsung SSD 840 in single-volume RAID-1 configuration.

A 3-way mirrored RAID? That seems like an odd setup, especially for a
benchmark. I guess (?) reads could be faster with an additional drive, but the
system had 1 TB of memory so would pgbench create a working set beyond that
anyway?

~~~
olavgg
I've been using 3 mirrored drives for the last few years because I want extra
protection.

Think about it: one drive suddenly dies and you need to resilver/resync a new
drive. What if there are bad blocks on the remaining drive?

With ZFS and 3 mirrored drives, these bad blocks would most likely be healed
during the resilver operation.

~~~
taneliv
Aren't bad blocks detected during scrubbing? I assumed scrubbing every two
weeks (as I do) should reveal read problems before any drives fail, at least
in the common case. Maybe there's something important I'm missing?

~~~
insaneirish
No, you're not missing anything. Scrubbing catches it all. Seems your parent
is concerned about errors between last scrub and a failure.

This is why I have a backup system that I zfs send to.

