
That's cool, but not what I would call high performance. If you do these often you would want an index, and then it should only take single-digit ms.



The reason I call it high performance is that it avoids the hours/days of processing (for the planet file)[1] that would be required to pull the data out of PBF and index it. And you'd also need RAM at least the size of the planet file to even get that level of speed.

You could certainly amortize this cost for repeated queries, but for one-off queries I haven't seen anything faster.

[1]: https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
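
For reference, the kind of one-off query I mean looks roughly like this (a minimal sketch assuming DuckDB's spatial extension; the file name and the 'amenity' tag are placeholders, not from the original post):

    INSTALL spatial;
    LOAD spatial;

    -- Stream the PBF directly; no import or index step.
    -- (Older DuckDB versions return a list from map access, i.e. tags['amenity'][1].)
    SELECT count(DISTINCT tags['amenity']) AS distinct_amenities
    FROM st_readosm('planet-latest.osm.pbf')
    WHERE kind = 'node'
      AND tags['amenity'] IS NOT NULL;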


You wouldn't need hundreds of gigs of RAM to answer this query in 5ms. You'd need a few MB or so, once the initial indexing is done, of course.
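
Concretely, something like this (a sketch against a hypothetical table from an osm2pgsql-style import; names are illustrative):

    -- One-time cost: build the index.
    CREATE INDEX idx_nodes_amenity ON nodes (amenity);

    -- Afterwards this can be answered from a few MB of index pages
    -- rather than a scan of the whole table.
    SELECT count(DISTINCT amenity) FROM nodes WHERE amenity IS NOT NULL;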


How long do you think it would take to index it?


Depends on the machine :) hours maybe?


So for a one-off query DuckDB is tons faster, and easier.


Yeah, but it's not high performance, which was my original point.

I spend a lot of time optimizing stuff developers thought was "high performance," only to find they're scanning a 100GB+ dataset on every page load.


If the alternative takes hours, then ~30 seconds seems high performance to me.


DuckDB is indeed great for one-off stuff.

But calling 30s high performance for counting 60k unique values, as in this case, is misleading at best. I try not to mislead people. :)


I can help answer this a bit.

Processing the data with code and SQLite takes about 4 hours for the United States and Canada. If I can figure out how to parallelize some of the work, it could be faster.

The benefit of using SQLite is its low memory footprint: I can take this 40GB file (the uncompressed version) and serve it from a 256MB container instance, which shows how frugal SQLite is with resources. It does take up disk space, but I addressed that with the zstd compression (from the post).
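
To give a feel for it, SQLite's memory use can be capped well below the database size (a sketch; the numbers and the table/column names are illustrative, not from my setup):

    -- Cap the page cache at ~64 MB (a negative value means size in KiB).
    PRAGMA cache_size = -65536;
    -- Don't memory-map the 40GB file; read pages through the capped cache instead.
    PRAGMA mmap_size = 0;

    -- Queries then stay within the container's 256MB budget,
    -- paging index/table pages in from disk as needed.
    SELECT count(*) FROM nodes WHERE amenity = 'cafe';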

Was this more work, and should I have used Postgres and osm2pgsql? Yes. Was it fun? Well, yeah, because I got another feature out of it, one I have not written about yet and which I highly value.


I'm not near a computer to try this out, but I'd be surprised if you couldn't get a huge speed-up by selecting the whole file into a real table first and querying against that. DuckDB should be able to vectorize operations better then.
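
Something along these lines (a sketch; the file, table, and tag names are placeholders):

    -- Pay the PBF scan/decode cost once...
    CREATE TABLE osm AS
    SELECT kind, id, tags
    FROM st_readosm('planet-latest.osm.pbf');

    -- ...then repeated queries run against DuckDB's own columnar storage.
    SELECT count(DISTINCT tags['amenity'])
    FROM osm
    WHERE tags['amenity'] IS NOT NULL;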



