
Local and distributed query processing in CockroachDB - benesch
https://www.cockroachlabs.com/blog/local-and-distributed-processing-in-cockroachdb/#
======
sixdimensional
Nice write up and discussion. I like the approach being taken by
CockroachLabs, it feels a lot like how open RethinkDB was about their
development - very pragmatic. What is being done here (distributed SQL engine)
is a complex problem, and I for one welcome more open implementations and
people working on the problem.

------
marknadal
I had the pleasure of chatting with Spencer the other day, great guy. We're
still polar opposites on the database tradeoffs we believe in, but it brings me joy
that at least both ends of the spectrum are covered in the OSS community. As
I've expressed before, I don't think Master-Slave globally strongly consistent
databases are the direction the future is headed, but at least if I'm wrong ;)
we'll have Spencer & Co(ckroach) to save the day for Open Source (hearing his
vision and emphasis on OSS was very refreshing and affirming too, especially
after the last year of database announcements/failures/crippleware).

So they have definitely won my heart over, although I'll still make critiques
where appropriate. This particular article was very well done, thoughtful, and
insightful. So thank you! Being Postgres wire compatible is a daunting task
though, one that to me seems unnecessary (we're implementing SQL on top of our
decentralized graph database, but not at the wire level). But it once again
showcases our polar opposite views. Obviously, their extra effort will result
in remarkably better SQL compatibility, performance, and experience. So they
are the hands-down winner, but I'm curious to see the extent of full SQL use
(versus approximations) in the industry over the next decade.

Congrats guys, great article.

~~~
lwansbrough
pgwire compatibility is a huge benefit to us. Not all of us want to use this
year's hottest language, so it's nice when we can use an existing library and
immediately get to work. I would strongly recommend anyone who is making a
database from scratch to either: write libraries for every popular language so
people can use it (don't do this) or interface with one of the many existing
protocols that have been thoroughly tested and have strong support across many
platforms (do this).
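To make "wire compatible" concrete: an existing Postgres driver opens a session by sending a StartupMessage as defined by the PostgreSQL frontend/backend protocol (version 3.0). A server that understands this framing, and the messages that follow it, can be used from those drivers unchanged. Below is a minimal sketch of encoding just that one message in Python; authentication, SSL negotiation, and reading the server's replies are all omitted, and the parameter values are only examples.

```python
import struct

# Protocol version 3.0 is encoded as (3 << 16) | 0.
PROTOCOL_VERSION_3_0 = 196608


def startup_message(params: dict[str, str]) -> bytes:
    """Encode a PostgreSQL v3 StartupMessage (it has no leading type byte)."""
    body = struct.pack("!I", PROTOCOL_VERSION_3_0)
    for name, value in params.items():
        # Each parameter name and value is a null-terminated string.
        body += name.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    body += b"\x00"  # a final zero byte terminates the parameter list
    # The big-endian length prefix counts its own 4 bytes plus the body.
    return struct.pack("!I", len(body) + 4) + body


if __name__ == "__main__":
    msg = startup_message({"user": "root", "database": "defaultdb"})
    print(len(msg), msg.hex())
```

The point of reusing the protocol is that every mature driver already produces exactly these bytes, so the database author never has to write them for each language.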

Maybe I'm out of my depth here, but I'm not sure your comparison of
CockroachDB SQL being "wire level" whereas your SQL is "on top of a
decentralized graph" makes much sense. CockroachDB is built on a key value
store. More on that here: [https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...](https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage)
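As a rough illustration of "SQL on top of a key-value store": each row can be spread across several keys that share a table/primary-key prefix, so reading a row becomes a prefix scan. This is a sketch in the spirit of the linked post, not CockroachDB's actual encoding; the readable `/<table>/primary/<pk>/<column>` key layout is an assumption made for clarity (a real store uses numeric IDs and an order-preserving binary key format), and the `inventory` table is hypothetical.

```python
from typing import Any


def encode_row(table: str, pk_col: str, row: dict[str, Any]) -> dict[str, Any]:
    """Turn one SQL row into key/value pairs, one pair per non-key column.

    Hypothetical readable key layout: /<table>/primary/<pk>/<column>.
    """
    pk = row[pk_col]
    kvs = {}
    for col, value in row.items():
        if col == pk_col:
            continue  # the primary key is already embedded in every key
        kvs[f"/{table}/primary/{pk}/{col}"] = value
    return kvs


def scan_row(store: dict[str, Any], table: str, pk: Any) -> dict[str, Any]:
    """Reassemble a row from every key sharing its /<table>/primary/<pk>/ prefix.

    sorted() stands in for the ordered key space of a real KV store, where
    this would be a range scan rather than a filter over all keys.
    """
    prefix = f"/{table}/primary/{pk}/"
    return {k[len(prefix):]: v for k, v in sorted(store.items()) if k.startswith(prefix)}


store: dict[str, Any] = {}
store.update(encode_row("inventory", "id", {"id": 1, "name": "Apple", "price": 0.5}))
store.update(encode_row("inventory", "id", {"id": 10, "name": "Pear", "price": 0.75}))
print(scan_row(store, "inventory", 1))
```

The trailing `/` in the prefix is what keeps row `1` from also matching row `10`.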

I suspect your technology, too, would be built on something similar? The
difference being in how you implement the "front end."

~~~
marknadal
That is a really good point. There are a lot of really incredible drivers for
different languages out there, and reusing them is a major selling point.

Right, the difference is that no SQL would actually be sent over the wire.
The SQL parsing happens on the client (so it is front end only), then it is
converted to our wire graph spec, and then sent out. So it is more SQL
emulation/approximation. Even though CockroachDB is key/value underneath, they
are actually running SQL on top. Which is why their system would always be
better than ours.
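The client-side approach described above (parse SQL locally, then send a graph request instead of SQL text) might look roughly like this toy translator. Everything here is hypothetical: the thread does not describe the actual "wire graph spec", so the request shape (`start`, `match`, `fields`) is invented for illustration, and only the narrow form `SELECT a, b FROM t WHERE k = 'v'` is supported, which is exactly why this counts as emulation rather than full SQL.

```python
import re

# Hypothetical subset: "SELECT a, b FROM t WHERE k = 'v'" (optional trailing ';').
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<key>\w+)\s*=\s*'(?P<value>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_graph_request(sql: str) -> dict:
    """Translate a tiny SQL subset into a hypothetical graph lookup request."""
    m = _SELECT.match(sql)
    if m is None:
        raise ValueError("unsupported SQL: only simple keyed SELECTs are emulated")
    fields = [c.strip() for c in m.group("cols").split(",")]
    return {
        "start": m.group("table"),                   # where to begin in the graph
        "match": {m.group("key"): m.group("value")},  # which node to pick
        "fields": fields,                            # which properties to return
    }


print(sql_to_graph_request("SELECT name, email FROM users WHERE id = 'alice'"))
```

Anything outside that pattern (joins, aggregates, `SELECT *`) raises `ValueError`, which mirrors the "approximation" trade-off discussed in this subthread.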

You sound really smart! If you are interested in these things, you should jump
in on your favorite DB projects, or start your own!

------
state_machine
In case anyone is interested in even more of the technical specifics, the
original design RFC might be interesting too:
[https://github.com/cockroachdb/cockroach/blob/master/docs/RF...](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/distributed_sql.md)

------
redwood
Is anyone using this software yet?

~~~
irfansharif
[cockroachdb engineer] Baidu and Heroic Labs are the users we publicly announced in
our 1.0 release[1]; stay tuned for more.

[1]:
[https://www.cockroachlabs.com/blog/cockroachdb-1-0-release/](https://www.cockroachlabs.com/blog/cockroachdb-1-0-release/)

------
EGreg
I wonder, why aren't graph databases used more often? Why is neo4j relatively
alone?

It seems obvious to me that graph databases are much more parallelizable AND
more scalable, since you are essentially able to break up parts of the graph
into their own computing nodes quite easily.

The lookups are usually O(1) instead of O(log N) and instead of indexes and
table scans to do joins you literally just traverse a graph at runtime. Plus
you have more flexibility because instead of relational algebra you can
literally run any code at any point to walk a graph.
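What "run any code at any point to walk a graph" means in practice can be sketched as a breadth-first walk that calls arbitrary caller code at each node to decide whether to keep going. This assumes a single-process, in-memory adjacency-list graph with string node names; it deliberately says nothing about the transactional and distribution questions raised in the replies.

```python
from collections import deque
from typing import Callable

# Assumed representation: node name -> list of neighbour names.
Graph = dict[str, list[str]]


def walk(graph: Graph, start: str, keep_going: Callable[[str, int], bool]) -> list[str]:
    """Breadth-first walk that runs caller-supplied code at every node.

    keep_going(node, depth) decides whether that node's neighbours are expanded.
    Returns nodes in the order they were visited.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if not keep_going(node, depth):
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order


friends = {"ann": ["bob", "cat"], "bob": ["dan"], "cat": ["dan", "eve"], "dan": ["fay"]}
# Friends-of-friends: only expand nodes closer than two hops from "ann".
print(walk(friends, "ann", lambda node, depth: depth < 2))
```

The predicate here is trivial, but it could be any Python code, which is the flexibility being claimed relative to a fixed relational-algebra plan.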

Why aren't they supplanting relational databases despite being faster and more
parallelizable and more powerful?

~~~
felixgallo
Using the word 'literally' doesn't magically imbue speed into a system.
Traversing a graph -- how does that work in a transaction? Is it going to be
quicker than striding a packed in-memory hash?

~~~
EGreg
Simple. You store the exact pointer to related data, so you go and get it in
O(1). In a join, you have to do an O(log N) search through an index. And all
indexes usually have to be loaded into memory, to boot.
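The contrast being drawn here can be shown side by side in a single process: a relational-style lookup probes a sorted index with binary search (O(log N) per probe), while a graph-style lookup follows references stored on the node itself (O(1) per hop). This is a sketch under that single-address-space assumption; the `Node` type and data are hypothetical. A hash index would also give expected O(1) probes, which is the point behind the "packed in-memory hash" question above.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Graph-style "pointers": direct references to related Node objects.
    friends: list["Node"] = field(default_factory=list)


def friends_via_index(index: list[tuple[str, str]], person: str) -> list[str]:
    """Relational-style: binary-search a sorted (person, friend) index, O(log N)."""
    # "" sorts before any other string, so this finds the first entry for person.
    i = bisect.bisect_left(index, (person, ""))
    out = []
    while i < len(index) and index[i][0] == person:
        out.append(index[i][1])
        i += 1
    return out


def friends_via_pointers(node: Node) -> list[str]:
    """Graph-style: follow the stored references, no index search needed."""
    return [f.name for f in node.friends]


index = sorted([("bob", "dan"), ("ann", "cat"), ("ann", "bob")])
bob, cat = Node("bob"), Node("cat")
ann = Node("ann", [bob, cat])
print(friends_via_index(index, "ann"), friends_via_pointers(ann))
```

Note that the "pointer" here is a Python object reference inside one process; what it would mean across machines is the open question in the next reply.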

~~~
elvinyung
> Simple. You store the exact pointer

How would that work in a scale-out, distributed cluster? What is a pointer?
How do I figure out on which machine an object is really located? What happens if
that machine is down? What if I want to move the object/rebalance the cluster?
How do I keep multiple copies of an object (for e.g. fault tolerance)? How do
I figure out which copy is the right one?

How do I organize the pointers? Would I use a hash table? A tree? A graph? How
would that data structure be distributed? Would every machine store a copy of
the lookup data structure, or just some specific machines? What if _those_
machines fail? How do I maintain copies? How do I keep the lookup data
structure up to date?
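One common answer to the first few questions, and roughly the approach taken by range-partitioned stores such as CockroachDB, is to address objects by key rather than by machine, and keep a directory that maps contiguous key ranges to the nodes holding copies. A minimal in-memory sketch follows; `RangeDescriptor`, `RangeDirectory`, and the node names are hypothetical. It covers locating an object, keeping several copies, and moving data by splitting ranges, but not failure detection, agreeing on which copy is current, or replicating the directory itself, which are the harder questions above. It also brings back an O(log R) lookup over R ranges, which bears on the O(1)-pointer argument.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeDescriptor:
    start_key: str              # inclusive lower bound of this key range
    replicas: tuple[str, ...]   # hypothetical addresses of nodes holding copies


class RangeDirectory:
    """Maps any key to the range (and replica nodes) responsible for it."""

    def __init__(self, ranges: list[RangeDescriptor]):
        self._ranges = sorted(ranges, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self._ranges]
        if not self._starts or self._starts[0] != "":
            raise ValueError("the first range must start at the empty key")

    def lookup(self, key: str) -> RangeDescriptor:
        # The owner is the last range whose start_key <= key: O(log R).
        i = bisect.bisect_right(self._starts, key) - 1
        return self._ranges[i]

    def split(self, at_key: str, replicas: tuple[str, ...]) -> None:
        # Rebalancing edits only the directory; objects keep their keys.
        i = bisect.bisect_left(self._starts, at_key)
        if i < len(self._starts) and self._starts[i] == at_key:
            raise ValueError("a range already starts at this key")
        self._starts.insert(i, at_key)
        self._ranges.insert(i, RangeDescriptor(at_key, replicas))


directory = RangeDirectory([
    RangeDescriptor("", ("n1", "n2", "n3")),
    RangeDescriptor("m", ("n2", "n3", "n4")),
])
print(directory.lookup("apple").replicas)
```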

------
mhuffman
I swear this DB could solve all the technical problems in the world and it
will still have an image problem ... unless that is the point.

~~~
dang
"Please avoid introducing classic flamewar topics unless you have something
genuinely new to say about them."

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
atomical
Do you have a list of classic flamewar topics?

~~~
dang
No; such a list would encourage flamewars about the topics not on it and
metaflamewars about the list itself.

~~~
atomical
So basically the list is in your head? That's not very open.

~~~
dang
HN is moderated! That means humans making interpretations and judgment calls.
There's no way to make that 'open' in the sense I imagine you mean, but we try
our best to be 'open' in the sense of being clear about what we're doing and
answering questions about particular cases.

What we don't do is formalize everything, because a) that's impossible and b)
what a nightmare it would be to try.

