Hacker News
OpenStreetMap PBF Performance Tricks (paulkernfeld.com)
42 points by occupy_paul_st 76 days ago | 5 comments

Yeah, the OSM PBF format is really efficient in size, but it can be quite a pain to work with those files. Every way references nodes, which are usually stored at the very beginning of the file (first nodes, then ways, then relations). So in order to assemble ways efficiently, you'll need a node cache (which at the moment is more than 40G). Seeking is not viable because nothing indicates which block a given node is in. On-disk caching is an option, but it will slow you down substantially.
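
To make the node cache idea concrete, here is a minimal sketch in Python. It assumes the elements have already been decoded into simple tuples (the `elements` list and the `assemble_ways` helper are made up for illustration, not part of any OSM library) and that they arrive in the usual file order, nodes before ways.

```python
# Toy stand-ins for decoded OSM elements, in file order: nodes first, then ways.
# A real reader would yield these while decoding the PBF blocks.
elements = [
    ("node", 1, (-71.0589, 42.3601)),
    ("node", 2, (-71.0600, 42.3610)),
    ("node", 3, (-71.0612, 42.3622)),
    ("way", 100, [1, 2, 3]),
    ("way", 101, [3, 1]),
]


def assemble_ways(elements):
    """Resolve each way's node references into coordinates via an in-memory cache."""
    node_cache = {}  # node id -> (lon, lat); this is the part that gets huge
    ways = {}
    for kind, osm_id, payload in elements:
        if kind == "node":
            node_cache[osm_id] = payload
        elif kind == "way":
            # Works only because every referenced node was seen earlier in the file.
            ways[osm_id] = [node_cache[ref] for ref in payload]
    return ways


geometries = assemble_ways(elements)
```

A plain dict like this is fine for a small extract, but every node has to stay resident until the last way has been read, which is why the cache grows so large on big files.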

So if you are only doing meta-analysis of the raw OSM data, without assembling geometries, hand-rolling an OSMPBF reader is viable. Otherwise I would suggest using something pre-processed (extracts in real geodata formats), using an established parser like osmium, or importing the data into e.g. postgres and doing some querying there.

Yeah, fair. I'm only working with Boston-area data so I can pretty easily load a Massachusetts OSM file into memory. However, memory would definitely be an issue for processing the entire world.

Are the nodes in the first fileblocks in "planet dumps", followed by fileblocks mostly with ways? With that dump layout, fileblocks are mostly useless.

Blocks are useful, because you can distribute the workload onto several workers. Most encoders nowadays first write blocks with just nodes, then blocks with just ways, and then blocks with just relations.
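
A rough sketch of what distributing blocks onto workers can look like, in Python. The `blocks` list is a toy stand-in for already-decoded blocks, and `count_block` and `count_all` are hypothetical helpers; a real worker would decompress and decode one blob each. Threads are used here only to keep the example short, and for CPU-heavy decoding a process pool would likely be the better fit.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for decoded blocks: each block holds elements of a single type,
# matching the "nodes, then ways, then relations" layout described above.
blocks = [
    ["node", "node", "node"],
    ["node", "node"],
    ["way", "way"],
    ["relation"],
]


def count_block(block):
    """Per-block work; each block can be processed independently of the others."""
    return Counter(block)


def count_all(blocks, workers=4):
    """Hand the blocks out to a pool of workers and merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_block, blocks):
            total += partial
    return total


totals = count_all(blocks)
```

This works well for per-element statistics. Anything that needs node coordinates while processing a way block still needs a shared node cache.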

My favorite OpenStreetMap data science tool is "osmium"

* https://osmcode.org/osmium-tool/manual.html

And you can convert osm.pbf to osm.opl and then reprocess it with Unix tools (grep, awk, cut, tr, ...):

* https://osmcode.org/opl-file-format/#usage-examples
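
If you would rather script it than chain Unix tools, an OPL line is also easy to pick apart in Python. This is a minimal sketch based on my reading of the format: one object per line, space-separated fields, and the first character of each field saying what it is (`T` tags, `x`/`y` longitude/latitude, `N` way node refs). `parse_opl_line` is a made-up helper, it does not undo OPL's percent-escaping, and it assumes way node refs carry no inline locations.

```python
def parse_opl_line(line):
    """Parse one OPL line into a dict (sketch: no unescaping, relation members ignored)."""
    fields = line.rstrip("\n").split(" ")
    head = fields[0]  # e.g. "n123", "w45" or "r6"
    obj = {"kind": head[0], "id": int(head[1:]), "tags": {}}
    for field in fields[1:]:
        if not field:
            continue
        key, value = field[0], field[1:]
        if key == "T" and value:
            # Tags look like "key=value,key2=value2".
            obj["tags"] = dict(pair.split("=", 1) for pair in value.split(","))
        elif key == "x" and value:
            obj["lon"] = float(value)
        elif key == "y" and value:
            obj["lat"] = float(value)
        elif key == "N" and value:
            # Way node refs look like "n1,n2,n3" (assumed: no inline locations).
            obj["node_refs"] = [int(ref[1:]) for ref in value.split(",")]
    return obj


node = parse_opl_line(
    "n1 v1 dV c1 t2020-01-01T00:00:00Z i1 utest Tamenity=cafe,name=Foo x-71.06 y42.36"
)
way = parse_opl_line("w10 v1 dV c1 t2020-01-01T00:00:00Z i1 utest Thighway=residential Nn1,n2,n3")
```

Because each line is self-contained, this streams over a file line by line without holding anything in memory.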
