Hacker News new | comments | show | ask | jobs | submit login
Flatbush: A very fast static spatial index for 2D points and rectangles in JS (github.com)
104 points by lobo_tuerto 4 days ago | hide | past | web | favorite | 33 comments





Vladimir Agafonkin, the maintainer of this repo, has a lot of cool JS libraries like this, about efficient algorithms for a specific task with very fast implementations[0]. For example, similar to flatbush he also has a repo naimed kdbush, which is points only, but (I suppose) more optimized as a result[1].

He also is involved with libraries for Mapbox and Leaflet, so if you're doing something with the web and with maps, there's a big chance you've used some of his code directly or indirectly.

[0] https://github.com/mourner/projects

[1] https://github.com/mourner/kdbush


Also the frontman of a pretty good rock band[1]

Who said one can't be a Renaissance man these days?

[1]http://rain.in.ua/en.html


I don't work with mapping so it's all new to me, but this list of projects[0] is like a gold mine. I've known about protobufs for a while, but had no idea about pbf.

Hey, author of the library here — feel free to ask me any questions! Very happy to see this trending on HN.

I often use your kdbush module for this, is there a case where flatbush is better for static points?

I think kdbush should always be faster for points — it uses more than 2x less memory, because when you encode points as bounding boxes, you basically duplicate them. Indexing and search is faster mostly because you operate on less data.

BTW, the readme of kdbush and flatbush don't mention each other. Might be nice if they do and clarify how they compare.

(because sometimes, people have an older, not updated library, for example)


Good point, thank you! I'll interlink those.

So is it multithreaded? If so, how many cores does it scale to till it hits other bottlenecks? Does it load the cores to 100%?

No, it's single-threaded (it's pure JavaScript after all), but you can index data on a worker thread and then transfer the index instantly to the main thread for search queries. I don't think the algorithm needs parallelization — isn't worth the complexity, it's already fast enough.

This will immediately make my project faster, thank you kind stranger!

> Enables fast spatial queries on a very large number of objects

Can someone explain that? I assume the problem is something like, "user clicks on 2D map, find which objects are near the click". A naive solution might store the objects in an array sorted by X coordinate. The problem there is that the search for objects near (x,y) would have discard many objects that were far away on the Y axis. Somehow this thing solves that problem?


The readme says it uses an R-tree.[0] As the wikipedia article explains all the objects are grouped so that near by objects are put in the same group and for each group a bounding box is stored. When querying the the structure for a given position you can discard all groups that to not intersect your given position/range.

[0] https://en.wikipedia.org/wiki/R-tree


Basically, yes. I wrote a long article about this [1] which goes into depth on how algorithms like this work — hope it helps!

[1] https://blog.mapbox.com/a-dive-into-spatial-search-algorithm...


Yes indeed. Eg a camera location coordinate can be thrown at these kinds of things (this is 2d, eg quadtrees, but similar things exist for 3d also, see octtrees), and it return all the 3d objects in that space (out of a huge number of objects in the entire "universe"). And then the program processes that reduced list of objects.

I think of these as like a prefilter.


That hilbert() function is nuts


Hmmm... delicious bitmasking. That's when you know the author is serious about squeezing performance out of a JS VM: forcing 32 bit integer optimizations.

Note that the hilbert function wasn't written by me, just merely ported from C++ — here's a post about it by the original author: http://threadlocalmutex.com/?p=126

> here's a post about it by the original author: http://threadlocalmutex.com/?p=126

Nice!

But still: you did port it from C++, and consciously or not, that optimization still applies.


Oh, nice. We use RBush quite extensively in Upverter, and there are a few places where we build static trees to query. I'll have to try using this and see if it improves performance.

Awesome — let me know how it goes!

WOW- Upverter is pretty amazing! I bet you have lots of fun developing it!

I am wondering, would it be possible to do this even faster with WebAssembly? So that it is supported by all major browsers?

At a quick glance, I see mostly primitive operations on pre-allocated chunks of memory. WebAssembly might give you a slight performance improvement, but I doubt it would be drastic. V8 is pretty good at optimizing this kind of code. I’m sure the other browser vendors can do this pretty well too.

You're right! In my experience, today JavaScript is so fast on careful code like this that even ports to C++ or Rust aren't much faster, and any gains would likely be offset by JS <-> WebAssembly communication overhead (although this is being rapidly improved).

Here's a good thread with benchmarks that compare Earcut (also one of my libraries) to C++ and AssemblyScript WebAssembly ports: https://github.com/mapbox/mapbox-gl-js/issues/4835 (in short, mostly slower than JS)

Another anecdotal example is that C++ and Rust ports of https://github.com/mapbox/delaunator (my Delaunay triangulation library) are only 10-15% faster.


One other datapoint - I ported flatbush to C++, and I got a 2.6x speedup on the "Insert 1 million boxes" benchmark. C++ port is at https://github.com/IMQS/flatbush

Do you have to consider a lot of Jascript- or even engine-specific quirks to write fast code like this, or is it mostly about paying attention to more general principles?

My impression is that JS engines can handle straightforward imperative code on typed arrays very well. But is there more stuff that I should pay attention to when writing performance-sensitive code that acts on large datasets in JS?


The most critical thing is avoiding unnecessary work — that usually means estimating algorithmic complexity and trying to reduce it where possible (e.g. by sorting + binary search, caching, using basic data structures like a priority queue, etc.). I'm meaning to make a long blog post about my approach to algorithms, but meanwhile you can check out my recent slide deck on the matter:

https://speakerdeck.com/mourner/fast-by-default-everyday-alg...


Is this supposed to have audio?

No, just slides — I don't have a video of the talk. Perhaps there will be one someday if I'm invited to some international JS conference :)

Hopefully when SIMD lands, WebAssembly purely for performance improvement will be a viable strategy!

Hi, mourner! Thanks for turning me on to rollup.js as I was reading through your very nice work!



Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: