
OpenStreetMap PBF Performance Tricks - occupy_paul_st
https://paulkernfeld.com/2019/08/05/osm-pbf-performance-tricks.html
======
thomersch_
Yeah, the OSM PBF format is really efficient in size, but it can be quite a
pain to work with those files. Every way consists of nodes, which are usually
at the very beginning of the file (first nodes, then ways, then relations). So
in order to work efficiently, you'll need a node cache (which at the moment is
more than 40G). Seeking is not viable because there is no indication in which
block the node will be. On-disk caching is an option, but it will slow you
down substantially.
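
To make the node cache idea concrete, here is a minimal sketch of one way to hold coordinates compactly in memory. It assumes nodes arrive sorted by ID (typical for planet files, but not guaranteed by the format) and stores coordinates as integers in units of 1e-7 degrees. `NodeCache` and `way_geometry` are hypothetical names for illustration, not part of any library.

```python
from array import array
from bisect import bisect_left


class NodeCache:
    """Sorted-array node cache; avoids the per-entry overhead of a dict."""

    def __init__(self):
        self.ids = array("q")   # 64-bit node IDs
        self.lons = array("i")  # 1e-7 degree units; assumes a 32-bit C int
        self.lats = array("i")

    def add(self, node_id, lon, lat):
        # Assumption: input is sorted by ID, so appending keeps arrays sorted.
        if self.ids and node_id <= self.ids[-1]:
            raise ValueError("node IDs must be strictly increasing")
        self.ids.append(node_id)
        self.lons.append(round(lon * 1e7))
        self.lats.append(round(lat * 1e7))

    def get(self, node_id):
        # Binary search over the sorted ID array.
        i = bisect_left(self.ids, node_id)
        if i == len(self.ids) or self.ids[i] != node_id:
            return None
        return (self.lons[i] / 1e7, self.lats[i] / 1e7)


def way_geometry(cache, node_refs):
    """Resolve a way's node references to (lon, lat) pairs, or None if any is missing."""
    coords = [cache.get(ref) for ref in node_refs]
    if any(c is None for c in coords):
        return None
    return coords
```

Even at 16 bytes per node this only shrinks the constant factor. A planet-sized cache still needs tens of gigabytes, which is why the on-disk option comes up at all.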

So if you are only doing meta-analysis of the raw OSM data, without assembling
geometries, hand-rolling an OSMPBF reader is viable. Otherwise I would suggest
either using something pre-processed (extracts in real geodata formats), an
established parser like osmium, or importing the data into e.g. Postgres
and doing some querying there.

~~~
ungzd
Are the nodes in the first fileblocks in "planet dumps", followed by fileblocks
containing mostly ways? With that layout, fileblocks are mostly useless.

~~~
thomersch_
Blocks are useful, because you can distribute the workload onto several
workers. Most encoders nowadays write first blocks with just nodes, then
blocks with just ways and then just relations.
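
As a rough sketch of how that distribution could start: the file is a sequence of (4-byte big-endian length, BlobHeader, Blob) records, so a single cheap pass can collect the offset and size of every blob without decompressing anything, and those offsets can then be handed to workers. The hand-rolled protobuf decoding below only reads the `type` (field 1) and `datasize` (field 3) fields of BlobHeader. `index_blobs` and the helper names are made up for this example.

```python
import io
import struct


def read_varint(buf, pos):
    """Decode one protobuf varint from buf starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(buf):
    """Extract `type` (field 1) and `datasize` (field 3) from a BlobHeader."""
    pos, blob_type, datasize = 0, None, None
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:  # varint
            value, pos = read_varint(buf, pos)
            if field == 3:
                datasize = value
        elif wire == 2:  # length-delimited (string or bytes)
            length, pos = read_varint(buf, pos)
            if field == 1:
                blob_type = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire}")
    if datasize is None:
        raise ValueError("BlobHeader without datasize")
    return blob_type, datasize


def index_blobs(f):
    """Yield (blob_type, offset, size) for each blob, skipping the payloads."""
    while True:
        prefix = f.read(4)
        if len(prefix) < 4:
            return  # end of file
        (header_len,) = struct.unpack(">I", prefix)
        blob_type, datasize = parse_blob_header(f.read(header_len))
        offset = f.tell()
        yield blob_type, offset, datasize
        f.seek(datasize, io.SEEK_CUR)  # jump over the compressed payload
```

Each worker can then open the file itself, seek to its offsets and decode only those blobs. This does not help with the node lookup problem, since the index says nothing about which IDs a blob contains.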

------
pella
My favorite OpenStreetMap data science tool is "osmium"

* [https://osmcode.org/osmium-tool/manual.html](https://osmcode.org/osmium-tool/manual.html)

You can also convert osm.pbf to osm.opl and then process it with Unix tools
(grep, awk, cut, tr, ...):

* [https://osmcode.org/opl-file-format/#usage-examples](https://osmcode.org/opl-file-format/#usage-examples)
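
If the Unix tools run out of steam, the same one-object-per-line layout is easy to pick apart in a script. A minimal sketch, assuming the usual OPL field prefixes (`T` for tags, `N` for way nodes, `x`/`y` for coordinates) and leaving the `%...%` escapes in tag values undecoded; `parse_opl_line` is a hypothetical helper, and the sample line in the usage note is made up.

```python
import re


def parse_opl_line(line):
    """Split one OPL line into a dict keyed by field prefix (sketch only)."""
    fields = line.rstrip("\n").split(" ")
    # First field is the object type letter (n/w/r) followed by the ID.
    obj = {"type": fields[0][0], "id": int(fields[0][1:])}
    for field in fields[1:]:
        if not field:
            continue
        key, value = field[0], field[1:]
        if key == "T":
            # Tags: comma-separated key=value pairs (escapes left as-is).
            obj["tags"] = dict(kv.split("=", 1) for kv in value.split(",")) if value else {}
        elif key == "N":
            # Way nodes: n1,n2,... ; any inline locations are ignored here.
            obj["nodes"] = [int(ref) for ref in re.findall(r"n(-?\d+)", value)]
        elif key in ("x", "y"):
            obj[key] = float(value) if value else None
        else:
            obj[key] = value  # version, changeset, timestamp, user, ...
    return obj
```

For example, `parse_opl_line("w10 v1 dV c1 t2019-08-05T00:00:00Z i1 utest Thighway=residential Nn1,n2,n5")` gives a dict whose `nodes` entry is `[1, 2, 5]` and whose `tags` entry is `{"highway": "residential"}`.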

