
Odyssey: Scalable PostgreSQL Connection Pooler - bsg75
https://github.com/yandex/odyssey
======
petercooper
A quick meta BTW, but if you want to keep up with Postgres stuff, we have a
newsletter (and covered this Odyssey release earlier today):
[https://postgresweekly.com/issues/335](https://postgresweekly.com/issues/335)
(there is RSS, etc.)

~~~
roebk
thanks petey

------
derefr
If we are able to control/modify the protocol at both ends, not just in the
middle, would this still be the optimal solution?

I've been thinking for a while that connection-oriented DBMS protocols seem
like a good candidate for being carried as flows over a single-socket
multiplexed protocol, like HTTP/2-over-TCP is today. (I say "over TCP" because
connections between app servers and databases are long-lived and stable, so
there isn't really the same advantage in replacing TCP with something like
QUIC there.)

Also, such a protocol, by removing the direct "connection = session"
association, would remove the ability of DBMSes to key their session state by
socket ID, and force them to refactor to more explicit session IDs. This would
mean that there'd be an opportunity to introduce another intermediate
abstraction layer during the refactoring, keeping "session" separate from
"flow", such that the DBMS could have multiple concurrent MVCC
transactions/worldstates as individual _flows_, all referencing the same
ancestor MVCC worldstate (with the same config vars set, temporary objects
created + populated, etc.) as an explicit stateful _session_ that the flow
would be "opened against." Which would be pretty awesome, and would also
benefit non-socket-multiplexed flows (e.g. pgbouncer's per-transaction routing
mode would no longer have non-SQL-conformant semantics.)
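
A toy sketch of that flow/session split (plain Python; every name here is hypothetical, since no such multiplexed protocol exists for Postgres today). The point is only that many concurrent flows can be opened against one explicit, stateful session:

```python
# Hypothetical server-side bookkeeping for a multiplexed DB protocol:
# one socket carries many flows, and each flow is opened against an
# explicit session ID rather than being identified with the socket.
from dataclasses import dataclass, field

@dataclass
class Session:
    session_id: int
    config_vars: dict = field(default_factory=dict)   # survives across flows
    temp_objects: list = field(default_factory=list)  # ditto

@dataclass
class Flow:
    flow_id: int
    session: Session  # the ancestor session state this flow references

class Multiplexer:
    def __init__(self):
        self.sessions, self.flows = {}, {}

    def open_session(self, session_id):
        self.sessions[session_id] = Session(session_id)
        return self.sessions[session_id]

    def open_flow(self, flow_id, session_id):
        # Many concurrent flows (e.g. MVCC transactions) share one session.
        flow = Flow(flow_id, self.sessions[session_id])
        self.flows[flow_id] = flow
        return flow

mux = Multiplexer()
s = mux.open_session(1)
s.config_vars["search_path"] = "app"
f1 = mux.open_flow(10, 1)
f2 = mux.open_flow(11, 1)
assert f1.session is f2.session  # both flows see the same session state
```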

~~~
atombender
HTTP/2 could be great, but how would you implement this with Postgres, which
uses one process per connection? You'd have to do all the multiplexing in a
single server process, but you'd still have to have one process per underlying
session, I think, and so you get the same problem with per-connection overhead
that a pool tries to solve.

~~~
derefr
One of the advantages of a multiplexed protocol is that it would reduce the
need for each client to keep around a pool of N open+idle connections, just in
case the client needs to abruptly make a low-latency query on a new session in
parallel to its existing ones.

Part of the reason abrupt scheduling of low-latency parallel queries requires
clients to keep a pool of open+idle connections is that the PG backend takes a
moment to fork(2) out a process to serve your connection, and nothing about
multiplexing eliminates that cost; but another part of the reason is that it
takes the TCP socket a moment to establish, and takes the TCP window even
longer than a moment to scale. (And, intriguingly, open+idle separate-TCP-
socket flows scale their TCP windows _down_ while idle, so current connection-
pooling strategies aren't even getting 100% of the possible benefit.)

Decreasing the costs of flow establishment would decrease the number of
open+idle flows clients keep around, which means the backend host would be
free to do other things with its freed-up resources.

In combination with the fact that sessions would be "grouped" to a connection,
such that long-term metrics could be captured for the behavior of a client (=
one connection) as a whole, you could design an alternative "elastic
scheduler" for session backends, without needing to change the forking model.
E.g. keep a number of _prefork_ idle backend processes around, associated with
each connection, where the size of the prefork pool varies dynamically per
connection with the observed velocity of the client's opening of new sessions
on that connection. (So, for a client that does everything under one session,
you'd have zero observed session-open events per unit time, and thus have zero
prefork processes waiting around for that client.)
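
The elastic-scheduler idea could be sketched roughly like this (plain Python; purely illustrative, nothing Postgres actually implements): size each connection's prefork pool from the observed rate of session-open events on that connection:

```python
import time
from collections import deque

class PreforkSizer:
    """Size a per-connection prefork pool from observed session-open events."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.opens = deque()  # timestamps of session-open events

    def record_session_open(self, now=None):
        self.opens.append(time.monotonic() if now is None else now)

    def target_pool_size(self, now=None):
        now = time.monotonic() if now is None else now
        # Forget events that fell out of the observation window.
        while self.opens and now - self.opens[0] > self.window:
            self.opens.popleft()
        # Policy: one idle preforked backend per recent session open.
        return len(self.opens)

sizer = PreforkSizer(window_seconds=60.0)
# A client that does everything in one long-lived session generates no
# session-open events, so its prefork pool stays at zero.
assert sizer.target_pool_size(now=100.0) == 0
sizer.record_session_open(now=100.0)
sizer.record_session_open(now=110.0)
assert sizer.target_pool_size(now=120.0) == 2
assert sizer.target_pool_size(now=200.0) == 0  # events aged out
```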

------
tuldia
Scalable in which sense?

Assuming that this is something like pgbouncer or pgpool that sits between
the client and the database, and that you have a limit on the number of
connections to the database as well as on the number of client connections
you can keep, what value does this add compared with those other (more
mature, battle-tested, included in major distros) projects?

~~~
x4m
Pgbouncer is single-threaded. If you have an 80-core box, it is a huge waste
to terminate your SSL in a single-threaded proxy pooler.

~~~
takeda
Starting from version 1.12.0 you can actually run multiple pgbouncer
processes listening on the same port, which eliminates some of the issues.
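
For reference, the option in question is `so_reuseport`, added in pgbouncer 1.12. A sketch of the relevant `pgbouncer.ini` fragment (addresses and values illustrative); each pgbouncer process is then started with its own config so pidfiles and unix sockets don't collide:

```ini
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; Let several pgbouncer processes bind the same port; the kernel then
; spreads incoming client connections across them.
so_reuseport = 1
```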

~~~
x4m
Yes, we use a cascade of PgBouncers. But it's hard to maintain, actually.
Also, PgBouncer was not actively developed for years. I consider the new
PgBouncer features (port reuse, SCRAM-SHA-256) an accomplishment of Odyssey
to some (small) extent :)

~~~
takeda
Not sure what you mean by it not being developed for years.

They had releases at least once a year (in 2018 they had only one release).

~~~
x4m
This year they had 3, and the year is not over yet :)

------
tyingq
Curious why this isn't a priority for the Postgres team. Not my area of
expertise, and even I know about pgbouncer. Now another tool, so it seems like
there's demand for it.

~~~
craigkerstiens
There is definitely demand for it. There are really three problems that
exist with connections in Postgres.

1. The initial time to establish a connection is higher than would be ideal.

2. Each connection, even an idle one, has an overhead of roughly 10 MB. For
idle connections this is pretty much wasted space, and most application
frameworks grab a pool of connections, accumulating quite a few idle
connections.

3. The max number of connections you can push the system to is somewhere in
the hundreds to low thousands. Yes, it is possible to push PG further, but it
is not trivial.

But, the sad reality is it isn't a trivial quick fix. There are a lot of
foundational issues with connection that have to be fixed. There are some
folks and efforts in the community working towards this, and in my opinion it
is the single biggest improvement that could happen in Postgres over the next
2-3 years, so hopefully it will improve directly in core over time.
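
To put point 2 in perspective, a back-of-the-envelope sketch (the ~10 MB per-connection figure and the traffic shape are assumptions, per the comment above):

```python
def idle_overhead_mb(app_instances, pool_size, busy_fraction=0.1, mb_per_conn=10):
    """Estimate server-side memory held by idle pooled connections."""
    total = app_instances * pool_size
    idle = total * (1 - busy_fraction)
    return idle * mb_per_conn

# 20 app servers, each holding a pool of 25 connections, ~10% actually
# busy at any moment: 500 connections, 450 idle, ~4.5 GB of overhead.
print(idle_overhead_mb(20, 25))  # 4500.0
```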

~~~
majewsky
Without broken formatting:

> 1. The initial time to establish a connection is higher than would be ideal

> 2. Each connection, even an idle one, has an overhead of roughly 10 MB. For
> idle connections this is pretty much wasted space and most application
> frameworks grab a pool of connections accumulating quite a few idle
> connections.

> 3. The max number of connections you can push the system to is somewhere in
> hundreds to low thousand. Yes, it is possible to push PG further but it is
> not trivial.

------
hinkley
> Odyssey has sophisticated asynchronous multi-threaded architecture which is
> driven by custom made coroutine engine: machinarium

Let me find my space helmet. The voice in my head is crying 'architectural
astronaut'.

We are forgetting again to ask the important question of "what else did you
try, and why didn't it work so you had to write your own?" I think I get the
'why' for pooling. I know a guy at work who struggles a bit with query rate
(query speed is great... until it hits the pool limits). I don't know the
'why' for the technology it's built on top of. Lacking that, I see someone
wanting to write machinarium and then building something on top of it.

Which doesn't sound like a bad thing at all, until you find a feature the
'thing on top of it' gets wrong in a way that's consistent with what the
primary authors are most invested in. You can't have that feature because
machinarium doesn't want you to have it. Close ticket.

Form always triumphs over function when the architect is invested in their
solution and not the user. Making a library that embodies that solution, at
least in my experience, vastly amplifies that investment. That also
_telegraphs_ the investment, which if you know to look for can tell you 'buyer
beware'. Buyer beware.

Machinarium sounds a lot like libuv:
[https://github.com/libuv/libuv/wiki/Projects-that-use-libuv](https://github.com/libuv/libuv/wiki/Projects-that-use-libuv)

Have I got that wrong?

~~~
SergeAx
Disclaimer: I worked at Yandex in 2006-2007.

Yandex is a very BIG company in terms of users, requests, data stored, etc.
It's surely bigger than Twitter; I believe it is bigger than Netflix. It is
also an algorithms company (as in, "not a content company"). So, when Yandex
does something, it is mostly because all the other options failed at their
load.

Yandex also has very extensive expertise in C/C++. There is ClickHouse, there
is CatBoost. I would trust them in their domain.

~~~
PeterZaitsev
I respect Yandex a lot. They are fantastic at doing 80% of what Google, Uber,
Amazon (combined) do with like 10% of resources - people and otherwise. So
their software tends to be super efficient.

~~~
pepemon
They are fantastic at losing their best talent to their Western counterparts.
Nothing fantastic at all in other respects for a big company.

~~~
SergeAx
It's not Yandex's fault; it is the political situation in Russia. IT skills
are the best ticket out for honest people today, so you can't blame them.

------
duelingjello
For scaling (LB) reads + HA, HAproxy + built-in replication to hot standbys is
much simpler. You can use keepalived to promote a new master automatically if
it dies. HAproxy can also give you two different ports on the vIP, one for
read-only and one for read-write that automatically find which server is the
master and which servers are replicas, using periodic banner testing. It's
free, doesn't depend on someone coding a Postgres statement proxy and getting
every corner-case just right, and lower latency. I suggest using LVS too so
the vIP can move around if a box dies.
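
A rough sketch of the HAproxy side of such a setup (addresses, ports, and the check user invented; note that `option pgsql-check` only verifies that a Postgres backend answers the startup handshake; telling the primary from a replica reliably usually needs an external check script probing `pg_is_in_recovery()`):

```
# Two ports on the vIP: 5432 read-write (primary), 5433 read-only (replicas).
listen pg_write
    bind 10.0.0.10:5432
    option pgsql-check user haproxy_check
    server pg1 10.0.0.11:5432 check
    server pg2 10.0.0.12:5432 check backup

listen pg_read
    bind 10.0.0.10:5433
    balance roundrobin
    option pgsql-check user haproxy_check
    server pg1 10.0.0.11:5432 check
    server pg2 10.0.0.12:5432 check
```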

For scaling (LB) reads & writes + HA, consider postgres-xl instead. It's FOSS w/
optional commercial support.

If you have crazy OLAP or metrics data, consider Greenplum. Freemium FOSS.

------
ralusek
Is this an alternative to PGBouncer, and if so, how do they compare?

~~~
craigkerstiens
This would be an alternative to pgbouncer; this GitHub issue has a rundown of
some of the differences:
[https://github.com/yandex/odyssey/issues/3](https://github.com/yandex/odyssey/issues/3)

------
haggy
Are there benchmarks comparing this to Hikari? What is the benefit to using
this over Hikari?

~~~
clhodapp
This is an external pool, rather than an internal one. That is to say that it
runs as a separate process that re-implements the Postgres network protocol to
accept incoming connections. The use-case is that it allows you to share one
pool across multiple application processes. Depending on your architecture, it
can actually make sense to use _both_ an internal pool to reduce connection
startup latency and an external pool to reduce your total number of
connections.
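
A toy sketch of how the two layers compose (plain Python; the pooler address and the `connect` stand-in are made up): the internal pool amortizes connect latency inside one app process, while every process dials the external pooler, which is what caps the connection count the database actually sees:

```python
import queue

def connect(addr):
    # Stand-in for a real driver connect(). In this architecture each app
    # process dials the external pooler (e.g. odyssey/pgbouncer on 6432),
    # not Postgres directly, so the pooler sees every connection.
    return {"addr": addr}

class InternalPool:
    """Per-process pool: avoids paying connect latency on every query."""

    def __init__(self, addr, size):
        self.q = queue.Queue()
        for _ in range(size):
            self.q.put(connect(addr))

    def acquire(self):
        return self.q.get()

    def release(self, conn):
        self.q.put(conn)

POOLER_ADDR = ("db-pooler.internal", 6432)  # hypothetical external pooler
pool = InternalPool(POOLER_ADDR, size=5)
conn = pool.acquire()   # instant: no TCP/TLS/auth handshake on the hot path
pool.release(conn)
```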

------
Tomdarkness
Looks cool. Maybe I'm missing something though, but it's not really clear how
you'd set this up? Is there a quick start or some examples of configs for
different situations?

~~~
thedevelopnik
Yeah there’s detailed info on the contents of the config file, but not where
to put said file, or how to tell the binary where to find and reference it.
This looks like an exciting project that needs more externally-facing docs. A
lot of knowledge is assumed at the moment.

------
Demiurge
How does this compare to pgBouncer?

~~~
arcticfox
This is definitely my top question coming in here. I know what pgbouncer can
do for me, and it's pretty good, is there a good reason to look at this
instead?

~~~
x4m
It's multithreaded PgBouncer.

~~~
takeda
pgbouncer is event-based for a reason. They recently added an option to run
multiple processes listening on the same socket, which allows it to utilize
all available cores.

~~~
x4m
This is cool. And we expect them to add more and more features if Odyssey
gains traction. It's about competition. Finally, it should create enough
pressure to make a builtin pooler happen.

------
PeterZaitsev
This looks very cool!

------
ppande
Just curious to see if anyone knows how it compares to using Apache Drill as
an external scalable process to a Postgres JDBC store?

------
castorp
Can it do load balancing and failover?

~~~
adontz
No failover is supported so far.

