
Heap WARM Tuples – Design Draft - ahachete
https://www.postgresql.org/message-id/CABOikdMop5Rb_RnS2xFdAXMZGSqcJ-P-BY2ruMd%2BbuUkJ4iDPw%40mail.gmail.com
======
rch
Looks like a solid approach without much downside, except for some added
complexity. Without the high-profile use case, I can see how time and attention
might be better spent elsewhere, but this seems like a good idea at present.

~~~
gamache
I think it makes a bunch of sense. As described in the earlier Postgres
discussion around Uber's problem, it's not all that weird a use case -- just a
difficult one.

Summary: a table with around 50,000 rows that gets joined to other tables a
lot, has one index per column, and receives ~500 writes a second, causing
"write amplification" because all the indexes need updating constantly.
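
A minimal sketch of that shape of workload (table and column names here are
hypothetical, just for illustration):

    -- One index per column: every update that changes an indexed column
    -- has to insert a new entry into every one of these indexes.
    CREATE TABLE trips (
        id         bigserial PRIMARY KEY,
        driver_id  bigint,
        rider_id   bigint,
        status     text,
        updated_at timestamptz
    );
    CREATE INDEX ON trips (driver_id);
    CREATE INDEX ON trips (rider_id);
    CREATE INDEX ON trips (status);
    CREATE INDEX ON trips (updated_at);

    -- updated_at is indexed, so this update can't take the HOT path:
    -- a new heap tuple is created and all five indexes get a new entry.
    UPDATE trips SET status = 'completed', updated_at = now() WHERE id = 42;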

It's not hard to imagine lots of other workloads involving patterns like that.
Postgres has most simple use cases solved, so IMHO improving complex patterns
like this is good for Postgres' future.

------
andy_ppp
God, I wish I received such a thorough brief before having to implement
features at work :-/

~~~
was_boring
At my job, if a code change is expected to take more than a day, you have to
write a spec for the changes and have someone else review it.

This helps cut down on rework from misunderstandings, and it provides a
historical record of why something changed.

~~~
ajsharma
What kind of development do you do and how old/big is the codebase?

------
tener
Man, testing a database system sure must be very hard.

------
thewonderer
I wonder how much time such a proposal would take to implement in Postgres.

~~~
anarazel
A quick and dirty implementation of this isn't hard; it's weighing the long-
term consequences and weird edge cases that makes it hard.

~~~
thewonderer
So you're saying that you don't know either.

~~~
pgaddict
Well, it depends on what you mean by "implementing".

I believe we can get the code working in a few days, and there are probably
people already working on a PoC of this (or one of the alternative proposals).
So if this is what you mean by implementing, the answer is "days".

But as anarazel is pointing out, this is going to touch a fairly critical part
of the database - it interacts with storage, MVCC and likely various other
things (e.g. various index optimizations like Index Only Scans).
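
For context: as I read the design draft, WARM extends the existing HOT
(Heap-Only Tuples) mechanism, and you can already watch how often updates take
the HOT path via the standard statistics views. A small example, reusing the
hypothetical trips table sketched upthread:

    -- n_tup_hot_upd counts updates that avoided creating new index entries.
    -- WARM aims to widen the set of updates that can skip index writes.
    SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'trips';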

Moreover there are other proposals (I'm aware of 3 or 4, and I might have
missed some), so the question inevitably will be - which of the proposals is
the best compromise?

So I expect lengthy discussions on pgsql-hackers, a lot of testing,
benchmarking, etc.

But all this does not really matter that much - there's no chance this could
get into current releases, so the earliest release it can get into is 10,
which means code freeze likely sometime in April/May 2017.

------
ungzd
Implemented especially for cab companies to use Postgres as a key-value store.

~~~
lucian1900
Not really.

Another similar use case is game sessions. Membership within a session changes
constantly, and so do the sessions themselves. Both are also deleted often.

If you run pg_dump on the same instance, it will keep long transactions open,
which will cause VACUUM to struggle.
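
A minimal way to see that effect on a running instance (this is the standard
pg_stat_activity view, nothing specific to this proposal):

    -- Long-running transactions hold back the xmin horizon, so VACUUM
    -- cannot reclaim dead rows newer than the oldest open transaction.
    SELECT pid, now() - xact_start AS xact_age, state, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_age DESC;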

~~~
ahachete
That's true, but why would you want to run pg_dump on the instance? Use
pg_receivexlog and friends for physical backups and pg_recvlogical for
logical backups, and you avoid the problem with pg_dump (you still have to
deal with any other long-running transactions on your own, but it may help).
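
Both of those tools can stream from replication slots, and creating the slots
is plain SQL. A minimal sketch (the slot names here are made up; test_decoding
is just the stock example output plugin):

    -- A physical slot that pg_receivexlog can stream WAL into (via --slot),
    -- and a logical slot that pg_recvlogical can consume changes from.
    SELECT pg_create_physical_replication_slot('phys_backup_slot');
    SELECT pg_create_logical_replication_slot('logical_backup_slot', 'test_decoding');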

~~~
koolba
Regardless of other forms of backups, it's useful to have a logical,
pg_dump-based one as well. When shit hits the proverbial fan, it's nice to
have a totally separate backup path that, even if it's not as up to date as a
streaming physical backup, can be easily restored.

The biggest issues I've heard are the long-running transaction (mitigated a
bit by running on a slave), the cache-busting effect (reading all your data
thrashes your OS disk cache), and how painfully long it can take (good luck
running pg_dump on a multi-TB database). As a backup to a more "live" backup,
it's nice to have, though. Plus it's dead simple to restore a DB via
pg_restore in a lower environment for testing.

~~~
ahachete
Size does matter. And there are databases which you practically can't back up
logically (as you mentioned). So IMHO this only applies to small databases,
in which case, sure, a pg_dump is always handy. But not in the rest of the
cases.

