Hacker News
Show HN: Host a planet-scale geocoder for $10/mo (ellenhp.me)
127 points by ellenhp 9 months ago | hide | past | favorite | 34 comments
For the uninitiated, a geocoder is maps-tech jargon for a search engine for addresses and points of interest.

Geocoders are expensive to run. Like, really expensive. Like, $100+/month per instance expensive unless you go for a budget provider. I've been poking at this problem for about a month now and I think I've come up with something kind of cool. I'm calling it Airmail. Airmail's unique feature is that it can query against a remote index, e.g. on object storage or on a static site somewhere. This, along with its low memory requirements, means it's about 10x cheaper to run an Airmail instance than anything else in this space that I'm aware of. It does great on 512MB of RAM and doesn't require any storage other than the root disk and remote index. So storage costs stay fixed as you scale horizontally. Pretty neat.
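
The remote-index trick boils down to issuing byte-range reads against a static blob. Here's a rough sketch in Python of the pattern (illustrative only — Airmail is written in Rust and its actual API differs): the search path is written against a `(start, length)` read contract, which can be backed by a local file for testing or by an HTTP `Range` request against object storage.

```python
import io
import urllib.request

def http_range_read(url: str, start: int, length: int) -> bytes:
    """Fetch bytes [start, start+length) of a remote object.

    S3/GCS and most static hosts honor the Range header, so a query
    only transfers the index pages it actually touches.
    """
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{start + length - 1}"}
    )
    with urllib.request.urlopen(req) as resp:  # expect 206 Partial Content
        return resp.read()

def local_range_read(f, start: int, length: int) -> bytes:
    """Same (start, length) contract against a local file-like object."""
    f.seek(start)
    return f.read(length)

# Search code written against the contract doesn't care where the bytes
# live, so swapping local disk for object storage is a one-line change.
blob = io.BytesIO(b"0123456789abcdef")
print(local_range_read(blob, 10, 4))  # b'abcd'
```

In the real system, a library like tantivy maps its index-file reads onto this kind of contract; the function names above are made up for illustration.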

Demo here: https://airmail.rs/#demo-section

Writeup: https://blog.ellenhp.me/host-a-planet-scale-geocoder-for-10-...

Repository: https://github.com/ellenhp/airmail




I don't know if it fits your specific use case, but for pure search, take a look at Sonic (https://github.com/valeriansaliou/sonic). It's blazing fast and requires very few resources.


The geocoder is built on top of tantivy, which is also fast and light on resources (https://github.com/quickwit-oss/tantivy).

I'm curious about the comparison between those two.


Does it have support for S3?

I admit I am a bit lost with all the new search options: tantivy, quickwit, sonic, meilisearch, zinc, toshi, lnx... A lot have popped up, particularly out of the Rust community, and I have a hard time keeping up.


> "Do store the Sonic database on SSD-backed file systems only."

According to the README, it only works on SSDs.

All those projects serve different purposes, and several are not actively maintained.

- Meilisearch: It provides a search-as-you-type experience and comes with many features; I don't know it very well, but I think it primarily targets e-commerce/application search.

- Quickwit: it's a distributed search engine for append-only data and works well on S3, a good fit for observability/security/financial/... data.

- Sonic: it looks like it targets search-as-you-type use cases and does not provide many features (which can be a very good feature in itself as it remains very light).

- Tantivy: It's a library; you need to build your own server on top of it if you want an HTTP API (Toshi and lnx did). It's used by a lot of search projects like TabbyML, Milvus, bloop, ParadeDB, Airmail...


Oh, I forgot to add that Stract is using tantivy too. I really hope this project takes off.

https://stract.com/

https://github.com/StractOrg/stract

https://news.ycombinator.com/item?id=39254172


There's also Typesense, a memory-based one.


This is super cool to read about!

Geocoding is one of those _hard_ problems and this really seems like a great step forward.


Congrats on the achievement.

One tricky part of working at "planet scale" is parsing and matching results for multiple countries. I tried some addresses in Brazil without success. Queries like "Starbucks Sao Paulo" return some results, but addresses like "Avenida Paulista 100" (or its variations) don't.

Last time I looked (~2018), pelias-parser used some ML training and the results weren't very good for Brazil. I'm guessing that in 2024, an open-source fine-tuned LLM would do a good job?


Amazing, thanks for the writeup and sharing


Can it reverse geocode as well? lat/long to address


Is there any way to zoom on the demo map in mobile?

I’ve enjoyed my brief dalliances with digital cartography. I’m grateful for a stack like this that I can explore.


I’m a Range request fan-boy, so thanks for sharing. Byte-indexing static objects is basically a brilliant point of light in the dark universe of code. I have fixed dozens of systems that read entire zip files over the network and into memory just to get the list of files inside (totally unnecessary if the host supports Range headers).
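
To make the zip example concrete, here's a minimal sketch of why a tail fetch is enough: the End of Central Directory (EOCD) record sits at the very end of a zip archive, so the bytes a `Range: bytes=-N` request returns are all you need to list the entries. (Simplified for illustration: it assumes no zip64 and no large trailing archive comment.)

```python
import io
import struct
import zipfile

def list_zip_entries_from_tail(tail: bytes) -> list[str]:
    """List a zip archive's file names given only its trailing bytes.

    This is what a client can do with `Range: bytes=-N` instead of
    downloading the whole archive.
    """
    # The EOCD record (signature PK\x05\x06) sits at the end of the file.
    eocd = tail.rfind(b"PK\x05\x06")
    if eocd == -1:
        raise ValueError("EOCD not in fetched tail; fetch more bytes")
    # u16 entry count at offset 10, u32 central directory size at offset 12.
    (count,) = struct.unpack_from("<H", tail, eocd + 10)
    (cd_size,) = struct.unpack_from("<I", tail, eocd + 12)
    pos = eocd - cd_size  # central directory immediately precedes the EOCD
    names = []
    for _ in range(count):
        # Per central directory header: name/extra/comment lengths are u16s
        # at offsets 28/30/32; the file name itself starts at offset 46.
        name_len, extra_len, comment_len = struct.unpack_from("<3H", tail, pos + 28)
        names.append(tail[pos + 46 : pos + 46 + name_len].decode())
        pos += 46 + name_len + extra_len + comment_len
    return names

# Build a small archive in memory to stand in for a remote object.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
    zf.writestr("b/c.txt", "world")

tail = buf.getvalue()[-1024:]  # what `Range: bytes=-1024` would return
print(list_zip_entries_from_tail(tail))  # ['a.txt', 'b/c.txt']
```

The same two-request pattern (fetch the tail, then fetch exactly the entries you want) is what makes byte-indexed static objects so cheap to serve.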


Which geocoder is $100/mo? Nominatim? Which cloud?

Either way this looks really cool and I’ll give it a try!


I think I priced Nominatim on EC2 and it came in around $350/mo because you need 64GB of RAM. Pelias on EC2 also ended up being close to $100/mo. Keep in mind that I'm including EBS prices for around 500GB of storage for both of those. If you go with a bare metal server on Hetzner (which most regular folks would do, and what we do for maps.earth) you can definitely come in under that.

Even if I'm off by a little bit or if EC2 prices are a bad benchmark, the fact remains that you need 100x less RAM to run Airmail than Nominatim, which felt novel enough to post about.


You can set it up off AWS. A 64GB RAM server would cost you $16/mo.

https://serverhunter.com is amazing


> EC2 prices are a bad benchmark

In general, reserved instance prices are a better number for "cheapest way to run a thing on AWS."

I find https://instances.vantage.sh/ is a good way to look for the pricing info.

It says an x2gd.xlarge in us-east-1 has 64GB of RAM and costs $72 USD/mo (on a 3-year reservation paid up-front).


Thanks! I’ve been using OSM's free Nominatim for testing but need to move off it to support a higher query volume. I had no idea it was going to be so expensive, so this comes at the perfect time!


There are ways to make it work on a budget, but with Nominatim you do need 64GB of physical memory at a minimum for the planet. Some of the other replies in this thread have offered resources on how to find that for cheap. My go-to for this is Hetzner, but I'm super curious whether there's anything west of the Atlantic that compares, because the latency going to Finland is brutal, even with mosh instead of ssh.

Good luck! Don't use Airmail in prod (yet?), if that thought crossed your mind. It's just a demo for now and that's probably how it should stay for a while longer :)


Woah this is ridiculously good. You've done a good job here working off Pelias. I did find managing the Elasticsearch cluster for a production instance of Pelias hopelessly annoying.

Remarkable.


Very cool. I went and had a look at the cost of using the Google maps api and this project seems like it would be pretty competitive with it.


Please say this with me, “I might prototype and test with Google Maps but I shall not use Google Maps API for production.” The cost might just not be what you anticipate.


This is one of those sentences that are triggering/traumatizing for those who HAVE gone through the experience. And not even for a popular project!


geocod.io is a hosted way of geocoding run by good friends of mine. It looks like they charge $0.50 per thousand lookups.


I'm hosting Photon geocoder from Komoot. I find it decently accurate and not too expensive to run


I love Photon, it's generally my favorite of the bunch to work with. My only objection to Photon is that it's kind of tough to build an index for it, but I think if memory serves GraphHopper (?) publishes one so that's kind of moot unless you have special requirements. Airmail is not in a state where it's usable for anything other than a demo, so if I were building something new today I would use Photon for it, unless I needed OpenAddresses in which case I'd use Pelias.


I will say, as someone who worked professionally in maps for several years, you are one of the rare people who have tried self-hosting multiple geocoders at scale. This is such a valuable boots-on-the-ground take that maybe only 10 people in the world can actually offer. Maybe 10 is too few, but it's definitely fewer than 100.


> Geocoders are expensive to run. Like, really expensive. Like, $100+/month per instance expensive unless you go for a budget provider.

When you write like this it sounds very unprofessional. Also you are basically saying "this is really expensive, unless it isn't".

Why is there any difficulty in this at all? Why would this even need to be something someone subscribes to? It is basically a nearest neighbor search.


Writing a geocoder is hard because they combine natural language processing (i.e. address parsing, an exceptionally hard problem) with the need to search through a very large dataset, typically 300-500GB but sometimes more. Doing that for each query generally requires using ElasticSearch, which doesn't respond well to being run on a VPS with 512MB of RAM. If you're not interested in this subject, there are lots of other posts on this site you can go read about.

Edit: In addition to the RAM requirements, you need a persistent volume containing the entire index attached to each serving instance, which is a real bummer when you want to scale horizontally.
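
To illustrate the parsing half of why this is more than a nearest-neighbor search: a toy parser tuned to one country's address ordering silently fails on another's (compare the Brazilian examples elsewhere in this thread). The pattern and function name below are made up for illustration.

```python
import re

# Naive parser that assumes US-style "number street" ordering.
US_PATTERN = re.compile(r"^(?P<number>\d+)\s+(?P<street>.+)$")

def parse_us(addr: str):
    m = US_PATTERN.match(addr)
    return (m["number"], m["street"]) if m else None

print(parse_us("100 Main Street"))       # ('100', 'Main Street')
print(parse_us("Avenida Paulista 100"))  # None: Brazilian ordering breaks it
```

Real parsers (libpostal, pelias-parser) handle this with trained statistical models rather than per-country patterns, which is part of what makes the problem hard.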

> When you write like this it sounds very unprofessional. Also you are basically saying "this is really expensive, unless it isn't".

Likewise actually, it's kind of unprofessional to respond in this way to somebody's personal writing style for a writeup on her personal blog about a personal project. Go read some other Show HN thread, there's lots of genuinely cool stuff here, no need to pick on someone for writing in a way you disapprove of.


> If you're not interested in this subject, there are lots of other posts on this site you can go read about

This is just as unprofessional and condescending.


I was annoyed, and that was intended to be a little condescending yeah. I'm not going to let someone who goes by the name "CyberDildonics" tell me to be more professional online without being a little bit snarky in response. I hope that isn't against the rules.

edit: Looks like it is in fact against the rules, but I think I'm tentatively going to leave the comment up because I did provide a valuable answer to the parent commenter's question. Mods can do what they will of course. :)


I'm mostly just sad you stopped the chain of people calling their parent post "unprofessional". We could have gotten it 4 replies deep!


Ignore the haters here. What you've built is cool and you're rightly proud of it.


I like you. Your style seems great to me. The nit-picker talking about your "lack of professionalism" is not my preferred sort of human to encounter, and I would frankly prefer to hear from their sort much less often. I'd take 10 more ppl like you on this site any day


[flagged]


> Isn't this an ad for something you're selling?

No. This is a demo of an open-source project that I've released for free.

> To me it seems like something that has been around for a very long time and is now being resold under a new label of "geocoding"

That's just what this class of software is called.

https://developers.google.com/maps/documentation/geocoding/o...

https://docs.mapbox.com/api/search/geocoding/

https://geocode.earth/

https://github.com/komoot/photon

https://nominatim.org/

https://pelias.io/

https://www.here.com/platform/geocoding

https://developer.tomtom.com/geocoding-api/documentation/geo...

https://geocoding.geo.census.gov/geocoder/

https://github.com/Qwant/mimirsbrunn

All of these call themselves geocoders.

> 25 years ago tiny contained $100 GPS units had the entire US on a 2 GB flash card.

Yeah! Super cool feat of engineering, accomplished mostly by requiring you to do structured search, if I remember correctly. I never drove with one as they were a bit before my time. I remember them being very fiddly, but that was an observation I made from the back seat of a car as a child, so grain of salt. And yes, Google Maps offers this for free as a part of their maps client. It's not free if you're a third-party software developer, though, as you'll see if you click through the first link I posted above.



