Flatbush: A very fast static spatial index for 2D points and rectangles in JS

vanderZwan · on Oct 10, 2018

Vladimir Agafonkin, the maintainer of this repo, has a lot of cool JS libraries like this: efficient algorithms for a specific task with very fast implementations[0]. For example, similar to flatbush, he also has a repo named kdbush, which is points only, but (I suppose) more optimized as a result[1].

He also is involved with libraries for Mapbox and Leaflet, so if you're doing something with the web and with maps, there's a big chance you've used some of his code directly or indirectly.

[0] https://github.com/mourner/projects

[1] https://github.com/mourner/kdbush

romwell · on Oct 11, 2018

Also the frontman of a pretty good rock band[1]

Who said one can't be a Renaissance man these days?

[1] http://rain.in.ua/en.html

dragonshed · on Oct 10, 2018

I don't work with mapping so it's all new to me, but this list of projects[0] is like a gold mine. I've known about protobufs for a while, but had no idea about pbf.

mourner · on Oct 10, 2018

Hey, author of the library here — feel free to ask me any questions! Very happy to see this trending on HN.

mattdesl · on Oct 10, 2018

I often use your kdbush module for this, is there a case where flatbush is better for static points?

mourner · on Oct 10, 2018

I think kdbush should always be faster for points — it uses more than 2x less memory, because when you encode points as bounding boxes, you basically duplicate them. Indexing and search is faster mostly because you operate on less data.
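To make the duplication concrete, here is a rough sketch (an illustration only, not code from either library). It compares the raw coordinate storage for a few points against the same points encoded as degenerate boxes, and ignores whatever tree and ID data a real index adds on top:

```javascript
// Illustration only: raw coordinate storage, ignoring the tree/ID data a real index adds.
const points = [[10, 20], [30, 40], [50, 60]];

// Points stored as (x, y): 2 numbers each.
const asPoints = new Float64Array(points.length * 2);
// The same points stored as boxes (minX, minY, maxX, maxY): 4 numbers each.
const asBoxes = new Float64Array(points.length * 4);

points.forEach(([x, y], i) => {
  asPoints.set([x, y], i * 2);
  asBoxes.set([x, y, x, y], i * 4); // min and max coincide, so every value is stored twice
});

console.log(asPoints.byteLength, asBoxes.byteLength); // 48 96
```

The coordinates alone already double in size, before counting any per-node overhead.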

vanderZwan · on Oct 10, 2018

BTW, the readmes of kdbush and flatbush don't mention each other. Might be nice if they did, and clarified how they compare.

(because sometimes people are still using the older library and haven't heard of the newer one, for example)

mourner · on Oct 10, 2018

Good point, thank you! I'll interlink those.

bufferoverflow · on Oct 10, 2018

So is it multithreaded? If so, how many cores does it scale to till it hits other bottlenecks? Does it load the cores to 100%?

mourner · on Oct 10, 2018

No, it's single-threaded (it's pure JavaScript after all), but you can index data on a worker thread and then transfer the index instantly to the main thread for search queries. I don't think the algorithm needs parallelization: it isn't worth the complexity, and it's already fast enough.
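A minimal sketch of why that transfer is instant, assuming the index keeps all of its data in one ArrayBuffer. The buffer below is a stand-in, not a real flatbush index, and `structuredClone` with a transfer list (Node 17+ or a modern browser) is used here only because it has the same move semantics as `postMessage` while running in a single thread:

```javascript
// Stand-in for an index's backing store; a real index would fill this with tree data.
const indexData = new Float64Array([0, 0, 10, 10, 5, 5, 20, 20]).buffer;
const sizeBefore = indexData.byteLength; // 64

// Transferring moves ownership of the memory instead of copying it.
// Inside a worker the equivalent call would be: postMessage(indexData, [indexData]);
const received = structuredClone(indexData, { transfer: [indexData] });

console.log(indexData.byteLength); // 0: the sender's buffer is now detached
console.log(received.byteLength); // 64
console.log(new Float64Array(received)[2]); // 10
```

The receiving side can wrap the buffer in typed-array views again without copying the contents.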

dandare · on Oct 11, 2018

This will immediately make my project faster, thank you kind stranger!

abainbridge · on Oct 10, 2018

> Enables fast spatial queries on a very large number of objects

Can someone explain that? I assume the problem is something like, "user clicks on 2D map, find which objects are near the click". A naive solution might store the objects in an array sorted by X coordinate. The problem there is that the search for objects near (x,y) would have to discard many objects that were far away on the Y axis. Somehow this thing solves that problem?
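To illustrate the naive approach I mean, here is a rough sketch (all names are made up for the example): points sorted by X, a binary search to find the start of the X range, and then a scan that has to throw away everything outside the Y range.

```javascript
// Hypothetical naive index: a 100x100 grid of points, sorted by x only.
const pts = [];
for (let x = 0; x < 100; x++) {
  for (let y = 0; y < 100; y++) pts.push({ x, y });
}
pts.sort((a, b) => a.x - b.x);

// Binary search for the first position whose x is >= value.
function lowerBound(arr, value) {
  let lo = 0, hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid].x < value) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Scan the matching x range and filter by y, counting how many points were looked at.
function naiveSearch(minX, minY, maxX, maxY) {
  const found = [];
  let scanned = 0;
  for (let i = lowerBound(pts, minX); i < pts.length && pts[i].x <= maxX; i++) {
    scanned++;
    if (pts[i].y >= minY && pts[i].y <= maxY) found.push(pts[i]);
  }
  return { found, scanned };
}

const { found, scanned } = naiveSearch(40, 40, 42, 42);
console.log(found.length, scanned); // 9 300
```

So 300 points are inspected to return 9, and the waste grows with the height of the data.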

laszlokorte · on Oct 10, 2018

The readme says it uses an R-tree.[0] As the Wikipedia article explains, all the objects are grouped so that nearby objects are put in the same group, and for each group a bounding box is stored. When querying the structure for a given position you can discard all groups that do not intersect your given position/range.
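A toy version of that idea, with only one level of grouping and hand-picked groups (a real R-tree builds several levels automatically, and this is not how flatbush itself is written):

```javascript
// Boxes are [minX, minY, maxX, maxY].
const intersects = (a, b) =>
  a[0] <= b[2] && a[2] >= b[0] && a[1] <= b[3] && a[3] >= b[1];

// Smallest box that covers all the given boxes.
function boundsOf(boxes) {
  return boxes.reduce((acc, b) => [
    Math.min(acc[0], b[0]), Math.min(acc[1], b[1]),
    Math.max(acc[2], b[2]), Math.max(acc[3], b[3]),
  ]);
}

// Two hand-made groups of nearby items, each stored with its bounding box.
const groups = [
  [[0, 0, 1, 1], [2, 2, 3, 3], [1, 3, 2, 4]],
  [[50, 50, 51, 51], [52, 53, 54, 55]],
].map((items) => ({ bbox: boundsOf(items), items }));

function search(query) {
  const results = [];
  let groupsVisited = 0;
  for (const group of groups) {
    if (!intersects(group.bbox, query)) continue; // whole group discarded with one test
    groupsVisited++;
    for (const item of group.items) {
      if (intersects(item, query)) results.push(item);
    }
  }
  return { results, groupsVisited };
}

const outcome = search([1.5, 1.5, 2.5, 3.5]);
console.log(outcome.results.length, outcome.groupsVisited); // 2 1
```

The second group is never opened, which is where the savings come from once groups hold many items.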

[0] https://en.wikipedia.org/wiki/R-tree

mourner · on Oct 10, 2018

Basically, yes. I wrote a long article about this [1] which goes into depth on how algorithms like this work — hope it helps!

[1] https://blog.mapbox.com/a-dive-into-spatial-search-algorithm...

laythea · on Oct 10, 2018

Yes indeed. E.g. a camera location coordinate can be thrown at these kinds of things (this is 2D, e.g. quadtrees, but similar things exist for 3D too, see octrees), and it returns all the 3D objects in that space (out of a huge number of objects in the entire "universe"). And then the program processes that reduced list of objects.

I think of these as like a prefilter.

bytematic · on Oct 10, 2018

That hilbert() function is nuts

pradn · on Oct 10, 2018

Link: https://github.com/mourner/flatbush/blob/eb61250246df5ebb4c9...

vanderZwan · on Oct 11, 2018

Hmmm... delicious bitmasking. That's when you know the author is serious about squeezing performance out of a JS VM: forcing 32 bit integer optimizations.
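For anyone who hasn't seen this style before, here is a sketch of the same kind of trick on a simpler curve. Note this is a Morton (Z-order) code, not the Hilbert function from the repo; I'm using it only because it is short enough to follow. Bitwise operators in JS work on 32-bit integers, so every step below stays in integer land:

```javascript
// Spread the low 16 bits of v so that there is a zero bit between each original bit.
function spreadBits(v) {
  v &= 0xFFFF;
  v = (v | (v << 8)) & 0x00FF00FF;
  v = (v | (v << 4)) & 0x0F0F0F0F;
  v = (v | (v << 2)) & 0x33333333;
  v = (v | (v << 1)) & 0x55555555;
  return v;
}

// Interleave 16-bit x and y into one 32-bit Morton code.
// The final >>> 0 reinterprets the signed 32-bit result as unsigned.
function morton(x, y) {
  return ((spreadBits(y) << 1) | spreadBits(x)) >>> 0;
}

console.log(morton(1, 0), morton(0, 1), morton(3, 3)); // 1 2 15
console.log(morton(65535, 65535)); // 4294967295
```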

mourner · on Oct 11, 2018

Note that the hilbert function wasn't written by me, just merely ported from C++ — here's a post about it by the original author: http://threadlocalmutex.com/?p=126

vanderZwan · on Oct 11, 2018

> here's a post about it by the original author: http://threadlocalmutex.com/?p=126

Nice!

But still: you did port it from C++, and consciously or not, that optimization still applies.

rcfox · on Oct 10, 2018

Oh, nice. We use RBush quite extensively in Upverter, and there are a few places where we build static trees to query. I'll have to try using this and see if it improves performance.

mourner · on Oct 10, 2018

Awesome — let me know how it goes!

jcmoscon · on Oct 10, 2018

WOW- Upverter is pretty amazing! I bet you have lots of fun developing it!

dandare · on Oct 10, 2018

I am wondering, would it be possible to do this even faster with WebAssembly, now that it is supported by all major browsers?

ryanworl · on Oct 10, 2018

At a quick glance, I see mostly primitive operations on pre-allocated chunks of memory. WebAssembly might give you a slight performance improvement, but I doubt it would be drastic. V8 is pretty good at optimizing this kind of code. I’m sure the other browser vendors can do this pretty well too.
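Roughly the kind of code I mean, as a sketch with an assumed layout (4 numbers per box in one flat typed array; the names are mine, not the library's):

```javascript
// Assumed layout for illustration: 4 numbers per box (minX, minY, maxX, maxY).
const numBoxes = 1000;
const boxes = new Float64Array(numBoxes * 4); // allocated once, up front

// Fill with half-unit boxes on a 100-wide grid.
for (let i = 0; i < numBoxes; i++) {
  const x = i % 100;
  const y = Math.floor(i / 100);
  boxes[4 * i] = x;
  boxes[4 * i + 1] = y;
  boxes[4 * i + 2] = x + 0.5;
  boxes[4 * i + 3] = y + 0.5;
}

// Count boxes intersecting a query using only array reads and number comparisons:
// no objects are allocated inside the loop.
function countIntersecting(minX, minY, maxX, maxY) {
  let count = 0;
  for (let pos = 0; pos < boxes.length; pos += 4) {
    if (boxes[pos] <= maxX && boxes[pos + 2] >= minX &&
        boxes[pos + 1] <= maxY && boxes[pos + 3] >= minY) {
      count++;
    }
  }
  return count;
}

console.log(countIntersecting(10, 2, 12, 3)); // 6
```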

mourner · on Oct 10, 2018

You're right! In my experience, today JavaScript is so fast on careful code like this that even ports to C++ or Rust aren't much faster, and any gains would likely be offset by JS <-> WebAssembly communication overhead (although this is being rapidly improved).

Here's a good thread with benchmarks that compare Earcut (also one of my libraries) to C++ and AssemblyScript WebAssembly ports: https://github.com/mapbox/mapbox-gl-js/issues/4835 (in short, mostly slower than JS)

Another anecdotal example is that C++ and Rust ports of https://github.com/mapbox/delaunator (my Delaunay triangulation library) are only 10-15% faster.

bmh · on Oct 11, 2018

One other datapoint - I ported flatbush to C++, and I got a 2.6x speedup on the "Insert 1 million boxes" benchmark. C++ port is at https://github.com/IMQS/flatbush

fabian2k · on Oct 10, 2018

Do you have to consider a lot of JavaScript- or even engine-specific quirks to write fast code like this, or is it mostly about paying attention to more general principles?

My impression is that JS engines can handle straightforward imperative code on typed arrays very well. But is there more stuff that I should pay attention to when writing performance-sensitive code that acts on large datasets in JS?

mourner · on Oct 10, 2018

The most critical thing is avoiding unnecessary work, which usually means estimating algorithmic complexity and trying to reduce it where possible (e.g. by sorting + binary search, caching, using basic data structures like a priority queue, etc.). I've been meaning to write a long blog post about my approach to algorithms, but meanwhile you can check out my recent slide deck on the matter:

https://speakerdeck.com/mourner/fast-by-default-everyday-alg...
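As one small example of a "basic data structure" in that sense, here is a minimal binary-heap priority queue sketch (written for this comment, not taken from any of the libraries). Push and pop are O(log n), so pulling the few smallest items out of a large set doesn't require sorting all of it:

```javascript
// Minimal min-priority queue backed by a binary heap stored in a plain array.
class MinQueue {
  constructor() {
    this.items = [];
  }

  get size() {
    return this.items.length;
  }

  // Append the value, then sift it up until its parent is not larger.
  push(value) {
    const a = this.items;
    let i = a.push(value) - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (a[parent] <= a[i]) break;
      [a[parent], a[i]] = [a[i], a[parent]];
      i = parent;
    }
  }

  // Remove and return the smallest value, or undefined if the queue is empty.
  pop() {
    const a = this.items;
    if (a.length === 0) return undefined;
    const top = a[0];
    const last = a.pop();
    if (a.length > 0) {
      // Move the last item to the root and sift it down.
      a[0] = last;
      let i = 0;
      for (;;) {
        const left = 2 * i + 1;
        const right = left + 1;
        let smallest = i;
        if (left < a.length && a[left] < a[smallest]) smallest = left;
        if (right < a.length && a[right] < a[smallest]) smallest = right;
        if (smallest === i) break;
        [a[smallest], a[i]] = [a[i], a[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}

const queue = new MinQueue();
for (const distance of [5, 1, 4, 2, 3]) queue.push(distance);
const firstTwo = [queue.pop(), queue.pop()];
console.log(firstTwo); // [ 1, 2 ]
```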

hnaccy · on Oct 11, 2018

Is this supposed to have audio?

mourner · on Oct 11, 2018

No, just slides — I don't have a video of the talk. Perhaps there will be one someday if I'm invited to some international JS conference :)

ryanworl · on Oct 10, 2018

Hopefully when SIMD lands, WebAssembly purely for performance improvement will be a viable strategy!

brett40324 · on Oct 11, 2018

Hi, mourner! Thanks for turning me on to rollup.js as I was reading through your very nice work!