Hacker News new | past | comments | ask | show | jobs | submit login
How we built scalable spatial indexing in CockroachDB (cockroachlabs.com)
153 points by orangechairs on Dec 14, 2020 | hide | past | favorite | 29 comments



Blog post author here -- happy to answer any questions about the article.


Hi there! I have a question:

In the article you demonstrate how quad-trees are used for indexing spatial data, but does Cockroach DB do this in 3D as well (i.e. octree)?

I was fairly interested in PG Pointcloud [1] back when I was in grad school, but storing arbitrary 3D information seemed to present unique challenges in this space (especially with regards to indexing). Would the approach used here extend to 3D? I don't know if I see a clear reason it wouldn't, but I was wondering if there's anything in Cockroach that might make that extension difficult. I recognize you're using the S2 library here as well, which doesn't seem to have all the 3D primitives, but I'd be interested to hear your thoughts!

[1] https://github.com/pgpointcloud/pointcloud


CockroachDB does not yet support 3D shapes or indexing. We've discussed extending it to 3D, and you are correct that S2 does not support 3D. S2 was a very convenient starting point for us, but we use a small subset of it, for computing the covering. It is viable to replace that part and consider a decomposition of 3D space.

We would probably need more than 64 bits for the cell-IDs, which is fine given that the IDs are being stored in the inverted index. And since we only represent the actual cells that are populated, it would likely not effect the index size. The main performance challenge is that as the trees get deeper, the number of ancestors one has to search increases -- unlike subtrees, which can be searched with a single range lookup regardless of depth, the ancestors are not in a single range. This can slow down queries even if most of the ancestor space is empty. We have some ideas to prune this search.

I am not familiar with PG Pointcloud. Thanks for the link.


Not an expert, but curious if space-filling curves (like Peano curves) [0] can be used to convert 3D to 2D representations and then the current cdb API/architecture can be used as-is?

[0] https://en.m.wikipedia.org/wiki/Space-filling_curve


Yes, we would probably use a space-filling curve to convert from 3D, just like the current scheme uses a Hilbert curve from 2D.

There are some practical performance challenges when the (sparse) tree gets deeper, as I mentioned in my previous response.


What can it do that Citus can't while being less compatible with postgres?


I've used both approaches, GiST in PG and S2 in Mongo, and I feel that S2 has given me more tools to do queries like "which polygons contain this single point?" efficiently in my use case.


[flagged]


"Eschew flamebait. Don't introduce flamewar topics unless you have something genuinely new to say. Avoid unrelated controversies and generic tangents."

https://news.ycombinator.com/newsguidelines.html

The provocation is understandable but it has been done to death on HN and the principle of curiosity implies mandatory retirement.

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...


I'd like to understand the policy to be sure I know what to avoid in the future. Is it that the choice of name in this case was already discussed? Or is discussing the choice of a product name considered inappropriate? Or is it that the discussion of the name in this case is too tengential to the announcement of spatial indexing?

Thank You


> Is it that the choice of name in this case was already discussed?

About 1000 times on every single submission of CRDB, going back years and years.

In any given post about CRDB, at least half the comments are about the name. I personally don't think your parent comment is particularly inflammatory, and it seems to be more curious in nature, but it's a sensitive subject because sometimes people just want to talk about the database as a database, not as a brand.


It's the first, as orthecreedence explained. Though threads about product names are usually pretty shallow and don't make the greatest HN discussion, we wouldn't moderate them generally. This is a special case because there have literally been hundreds if not thousands of comments about it over god knows how many years, and once things have been repeated that often they become their own circle of hell. Thus, whenever something about this database gets posted, the main concern is whether the thread will get overwhelmed yet again by bickering about the name.

I realize it's an arcane point if you never saw any of those previous flamewars, but the


Ok thanks, that makes sense to me.


That's good because I had a mangled sentence in there ("choice of product" when I meant to say "product name"). Fixed now :)

Edit: oh dear, there's also a dangling at the

(which I'll leave for amusement's sake)


What? Of course there is a reason: there's an old adage about cockroaches being the only animals that would survive a nuclear war. Obviously not true, but is does evoke the idea of resilience/reliability to anyone that recognizes the idea.


Hum, interesting, that's an angle I hadn't thought of. So it's a comparison to the reliability and durability of cockroaches. To me, it immediately evoked a disdain to the idea of having cockroaches in my house. So it had a negative association in my head. I wonder how much product names can affect marketing draw, I wonder if product research goes that far.


fyi, I am not sure why but you will get downvoted to oblivion for asking this very reasonable question.

Obviously it is a bad idea.

You as IT head: "Hi Managerial members! Can you approve my budget increase for CockroachDB?"

Them: "Um ... WTF?"

For whatever reason though, if you point out this obvious ramification of naming something meant for B2B a really stupid name, you will get attacked and downvoted.

IDK if it is just because people brought it up before, or CDB employees brigade it or what, but that is what happens.


> For whatever reason though, if you point out this obvious ramification of naming something meant for B2B a really stupid name, you will get attacked and downvoted.

Sooo many posts about projects will have comments / arguments about the project name. It's stupid, there's another project with the same / similar name, it should have been named this...

It's boring, requires no effort, and doesn't touch the actual subject matter of the post.

> IDK if it is just because people brought it up before, or CDB employees brigade it

I don't think a conspiracy is necessary to explain this phenomenon.


I think you will agree this is not a normal name.

I can see a bit of an issue with innocently naming your OS app the same as something else that is already established but you were unaware of. Ok. Maybe that person even is happy to get that information.

That is not the same as this.

Also, some people (not everyone) might not read every single story on HN or do in-depth research before making a comment that is perfectly reasonable .. like the OP.


Smart!


[flagged]


I'm in no special position to interpret Hacker News rules/guidelines, but I can say that I find pretty much all posts like this from Cockroach Labs to be super interesting and worth my time, if for no other reason than because CockroachDB is one of the most interesting things happening in the world of databases at the moment. I'd say much the same of its distant, MySQL-flavoured cousin, TiDB, but I don't see nearly as much written about that. I guess this mostly just reflects a lower emphasis on marketing by PingCAP... but if the equivalent content did exist I'd be up-voting it just the same.

So while the reason Cockroach Labs employees are paid to write these articles might ultimately be marketing, they're far from being what I'd consider marketing fluff pieces, and so it feels to me that they're at least as much "for curiosity" as they are "for promotion".

Again, I'm in no special position to interpret the rules, but I find it hard to object to the content itself, and I figure most articles like this are going to get posted here eventually anyway. Would proper Hacker News etiquette be to wait for someone not affiliated with Cockroach Labs to post it?


Ehh, if they were posting stupid marketing content, yeah I agree. But they are posting mostly technical articles that are very informative and interesting if you follow the DB (which I do, loosely, and almost used it in production but another DB ended up winning out).

I like seeing stuff like this and I don't view it as promotion (or if it is, it's 10% promotion and 90% technical content).


Out of curiosity, which other DB ended up winning out?


ScyllaDB. I was replacing a high-write Postgres instance (10% creations, 90% updates), and was testing out a drop-in replacement that could "just scale." CRDB did wonderfully, but ultimately I decided to rearchitect the problem to fit more of a CRDT model (albeit a crappy version) and just throw an endless stream of events at the DB which get reconstructed when the data is pulled out.

The model ended up working fantastically well, and I can't say enough about how much I like ScyllaDB so far. I will say I wish there was an easier backup method, but I understand that's a big ask on a distributed DB and automating our backups wasn't that difficult.


A good CMO has someone hounding the engineers to do write ups like this. A great one makes them post from their own accounts.


Most blog posts are marketing efforts. I think the test is whether the article provides value. I have a pretty okay background in this field and found it interesting.


Alot more posts than people think on here are actually ads and are manipulated to the top. Fast.ai used to do this all the time. I used to try and fight it/expose it, now I just move on to the next post.


/r/gatekeeping


[flagged]


You might be interested in BikeshedDB [0].

[0] https://github.com/tbg/bikesheddb


> wget https://binaries.cockroachdb.com/cockroach-latest.src.tgz

Sorry, but I can't reach `cockroach` anywhere. Can you please wrap base64 around it?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: