
The SQL layer in CockroachDB - ivank
https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/sql.md
======
anarazel
A lot of that looks quite familiar from working on postgres. Partially that's
because it's "just the right way" to do some things, but I also wonder if
there's some more looking at pg than just reusing the wire protocol?

PS: Should any cockroach contributors have opinions on limitations & evolution
of the protocol: I'd be interested.

~~~
mjibson
I work at CockroachDB on the implementation of the Postgres protocol, and I'm
a maintainer of lib/pq, a Go client implementation of the protocol.

Overall I think it is a very good protocol. I can't come up with any gripes I
have with it, nor do I remember any "it would be nice if" moments when adding
support for some of the various parts of it. There's certainly some difficulty
in some of the more complicated parts, like managing all of the type hinting
back and forth for prepared statements. But in general I found the
documentation to be complete and well done.

The one thing that we did have trouble with was figuring out the exact formats
for all types when using the binary encoding format. We wrote some code to
sniff real Postgres servers so we could figure out what was going on. It would
have been easier, obviously, if it was documented, but this wasn't a huge deal
since it is likely a rarely needed thing.

~~~
anarazel
> I work at CockroachDB on the implementation of the Postgres protocol, and
> I'm a maintainer of lib/pq, a Go client implementation of the protocol.

Thanks for replying!

> Overall I think it is a very good protocol. I can't come up with any gripes
> I have with it, nor do I remember any "it would be nice if" moments when
> adding support for some of the various parts of it. There's certainly some
> difficulty in some of the more complicated parts, like managing all of the
> type hinting back and forth for prepared statements. But in general I found
> the documentation to be complete and well done.

Cool. I've more complaints than you in that case :) Although drivers,
especially libpq (no pipelining, no mixed binary/text results), are more
frequently an issue.

One thing I was wondering whether you might also be interested is something
like 'commit identifiers' (WAL LSNs in pg's case) for slightly relaxed
consistency models. That allows things like committing on one node, and when
later doing queries on replicas specify that data needs to have replicated at
least up to that identifier.

> The one thing that we did have trouble with was figuring out the exact
> formats for all types when using the binary encoding format. We wrote some
> code to sniff real Postgres servers so we could figure out what was going
> on. It would have been easier, obviously, if it was documented, but this
> wasn't a huge deal since it is likely a rarely needed thing.

Yea, I think that's a fair complaint. I wasn't around yet when the current
binary format was devised (2003ish IIRC?), I don't know how we ended up
deciding that code was the documentation for that.

~~~
mjibson
> Although drivers, especially libpq (no pipelining, no mixed binary/text
> results), are more frequently an issue.

The Go lib/pq driver is some pretty old Go code and has numerous similar
problems. We try to keep it moving along but it's got some stuff that makes it
hard to work on.

> One thing I was wondering whether you might also be interested is something
> like 'commit identifiers' (WAL LSNs in pg's case) for slightly relaxed
> consistency models. That allows things like committing on one node, and when
> later doing queries on replicas specify that data needs to have replicated
> at least up to that identifier.

CockroachDB is always serializable. It is linerizable on individual keys.
We've discussed having a causality token before that could be passed to other
transactions or connections that I think would allow for more linerizable
guarantees. If I'm understanding your question correctly then yes, I think
these would be very useful to us. (I've asked someone more familiar with our
consistency model to address this more because I'm not confident in the
accuracy of my reply.)

~~~
bdarnell
(Cockroach Labs CTO) Yes, the commit identifiers sound like something we could
use, as long as they're not tied too closely to pg's WAL LSNs. For us, the
token would be a 96-bit hybrid logical timestamp, and we'd want semantics
roughly similar to HTTP cookies (the server sends it back with the response,
and then that client will include it on any future requests. With our current
implementation the client would only need to send the token on new
connections, but including room for it on a per-request basis seems like a
good idea).

Also on the subject of the protocol, we've run into a need for a (server-
initiated) ping message
([https://github.com/cockroachdb/cockroach/pull/10188](https://github.com/cockroachdb/cockroach/pull/10188)).

~~~
anarazel
> (Cockroach Labs CTO) Yes, the commit identifiers sound like something we
> could use, as long as they're not tied too closely to pg's WAL LSNs.

There was some discussion around this before, and the idea was basically to
just include some text-payload after COMMIT. PG would produce something like
"COMMIT 343/3f0037a0", but the spec for would just be text. Some drivers might
do something with the knowledge that it looks like an LSN, but that's hard to
avoid.

> Also on the subject of the protocol, we've run into a need for a (server-
> initiated) ping message
> ([https://github.com/cockroachdb/cockroach/pull/10188](https://github.com/cockroachdb/cockroach/pull/10188)).

Maybe I'm missing something here (I admittedly only skimmed ticket and things
it references), but shouldn't server triggered tcp keepalives be sufficient
for this case? In postgres that's configured using
tcp_keepalives_idle/tcp_keepalives_interval/tcp_keepalives_count. Several
drivers (including at least libpq and pgjdbc) can do the same to protect
against vanishing servers.

Doing keepalives on the pg wire protocol level has some disadvantages, but
also some issues. Namely you've to be ready to send ping/pongs forth/back in
almost any protocol state. Not necessarily easy if there's partial writes and
such.

~~~
bdarnell
Ah, right. We did turn on TCP keepalives so we probably don't need a protocol-
level ping (although we often deploy behind haproxy, so tcp keepalives need to
be configured at each hop)

~~~
anarazel
Similarly, tcp level keepalive not necessarily sufficient, if there's
intermediate "pgwire" protocol level connection poolers like pgbouncer. Nor do
they protect against applications that are effectively hung somewhere.

So I do think "pgwrite" level pings make some sense, it's just not exactly
straightforward to add them :/

------
AdamJacobMuller
This is awesome.

I was looking at using/extracting their pgwire implementation to provide an
SQL-like interface to the in-memory state of a particular application we use,
this document will make it way way easier.

~~~
threeseed
You can also use Calcite if you're on the JVM:

[https://calcite.apache.org](https://calcite.apache.org)

~~~
AdamJacobMuller
That's much nicer than hacking stuff out, unfortunately/fortunately (not sure
anymore) I am not a java programmer, this particular app is in go.

------
Steeeve
I always find that when building architecture documentation after the fact,
you want to re-factor or at least re-engineer a lot of the code base to
account for areas where lines are crossed or decisions were made that might be
performant and functional but lead to poor maintainability or opacity to all
but the most involved project participants.

It would be great to see this document evolve into a guideline rather than a
"this is how most of the code works today" document.

------
oregontechninja
The name isn't a problem, it's smart and memorable.

~~~
atomical
I strongly disagree. It conjures up thoughts that are unpleasant. That isn't a
good marketing strategy.

~~~
Karunamon
Most of the bad things I can think up of about cockroaches are good in the
context of databases. Replication, being hard to kill, being around for ages,
collective decision-making..

As someone downthread said, it's not a consumer product and doesn't need to be
marketed like one.

------
josephg
I've got a little transactional KV store in the wings and have been thinking
about writing a Go frontend for it. I'd love full SQL support, but there's no
way I'm going to implement my own query optimizer. How hard would it be to
repurpose this layer to back onto a generic sorted KV store?

It this deeply tied in to the rest of cockroachdb's architecture, or does it
talk through a simple generic KV store interface?

~~~
irfansharif
while there is an explicit delineation between the internal `kv` package[1]
and the overarching `sql` one[2], your specific concern regarding a query
optimizer is not strictly an isolated component at this moment in time. query
optimizations, last I checked, was not an explicit pre-processing stage but
rather done so when constructing the AST representation of the query with each
node being an `iterator` as described in the Volcano Evaluation System
paper[3]. there was some work underway to move towards an intermediate
representation[4] to go through the standard query optimizations, no context
into how far along this effort is.

[1]:
[https://github.com/cockroachdb/cockroach/tree/master/pkg/kv](https://github.com/cockroachdb/cockroach/tree/master/pkg/kv)

[2]:
[https://github.com/cockroachdb/cockroach/tree/master/pkg/sql](https://github.com/cockroachdb/cockroach/tree/master/pkg/sql)

[3]:
[https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b...](https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf)

[4]:
[https://github.com/cockroachdb/cockroach/pull/10055](https://github.com/cockroachdb/cockroach/pull/10055)

~~~
knz42
I wrote the rfc for this "effort": there will be an IR in CockroachDB, just
not right now. It's a lot of work and we're not going to do this in one go.

------
jjirsa
Distributed DBs are hard; bringing new contributors up to speed is really
hard. They'll be able to hire some, but finding random OSS contributors is
HARD. Very hard.

------
wolf550e
Their number 1 problem is the name.

~~~
systems
they should have named it after a fruit ... like mangodb or something

~~~
Rapzid
Mango IS tasty. Much more tasty than cockroach IMHO. Still though, I doubt the
name is much of an issue.

~~~
user5994461
I just had a horrible vision of mango and cockroach together :(

You guys are ruining my lunch time.

