Geolocating requests with Google Load Balancer for free (doit-intl.com)
199 points by vadimska 10 months ago | 69 comments

I've been working on and off on a system to build this data for free - https://www.open-geo-ip.com/

It uses your browser location to build the data, bootstrapped with Mechanical Turk workers around the world. It has one feedback loop: if you use the JS library, it will give you the user's location when they deny browser location (if we have the data...), and if they do share their location, we update the database (but throw away the last part of their IP address).
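For anyone wondering what "throw away the last part of their IP address" could look like in practice, here is a rough sketch. It is not the project's actual code: the function name `truncate_ip` and the choice of keeping a /24 for IPv4 and a /48 for IPv6 are my own assumptions.

```python
import ipaddress


def truncate_ip(ip: str) -> str:
    """Return the address with its trailing bits zeroed.

    Assumption: keep the first 24 bits of an IPv4 address and the
    first 48 bits of an IPv6 address. The real project may cut
    somewhere else.
    """
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(truncate_ip("203.0.113.77"))         # 203.0.113.0
print(truncate_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```

Storing only the truncated address means a database row maps a location to a block of addresses rather than to one subscriber.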

There are two or three other loops where we could build more data (which you can get here https://www.open-geo-ip.com/data/download ).

I'm probably going to do a small kickstarter to pay for some cleanup and expansion work or just kill it if that fails to get funded. Any ideas appreciated on things to do with it.

Cool project! I've been thinking about how one could build a crowdsourced geoIP database. Bootstrapping with Mechanical Turk is an interesting approach, especially if you can request workers in different cities and countries. One idea I had was a browser extension that volunteers could install to automatically submit their current location and IP address on browser startup.

It would be helpful if the home page demonstrated the passive geolocation results. The page asked for my browser location but didn't show the result.

btw the map zooming is very slow in my mobile browser for some reason.

> it will give you the users location if they deny browser location

Seriously? Did you help build BonziBuddy, too?


Well right now a) we most likely have no idea where an ip address is and b) even if we do, it's nearest-city accuracy.... which is exactly what everyone else does.

Denying browser location doesn't stop the app or website from using your ip address against some third party service to figure out where you are.

>Denying browser location doesn't stop the app or website from using your ip address against some third party service to figure out where you are.

I didn't state that it does. However, just because you can do something doesn't mean you should. Did it ever occur to you that if someone explicitly tells their browser that they don't want to be located that you should respect that?

I hope you are never critical of Facebook's or Google's privacy standards, because there's a full-length mirror with your name on it.

I feel it’s a bit of a stretch to compare public information (what city does best guess say an IP is located in) and private information (where is my device right now).

The lookup against what IP is in what city is published in numerous public databases that anyone can look up against in small doses at no cost, or at scale commercially.

This service shines a light on that, doing away with illusions of privacy and showing you where you potentially have none.

Comparing to Google and Facebook feels disingenuous. You can’t look up any person in their DB and see their records. It’s not public precisely because they’re hoarding what (in aggregate) is legitimately private data.

It’s still an illusion of privacy mind, as all the recent fuss finally coming about demonstrates.

The Google/Facebook comparison is made because those companies also deliberately ignore the user's privacy wishes.

Also, this isn't about locating someone to their city. According to the blog post, it returns latitude and longitude pairs.

Yes, these may not be precise in some circumstances, but as detailed in many recent HN posts, Google has many many ways to narrow that down to a very close approximation of where you actually are. And it's only going to get better at it over time.

Whatever location information is being presented is being displayed from a lookup from a publicly accessible database. It’s not a private DB that only one org has access to. This is the public library. The only cost of admission is putting in the effort to look. Google’s knowledge of you is private. It’s not comparable.

Should there be efforts to collect more information to make that public information more specific? Maybe. Can you actually delete already public information from the internet? Not really.

“Would you like to share what colour the sky is?”

You can choose not to. The webpage can still tell you. That information is public regardless whether you provide more specific information or not

"Can" vs. "should."

The bottom line is that the user specifically requests not to be located, and this is a way to do exactly the opposite of what the user wants.

Being technically capable of doing something is not an excuse for violating trust.

Here's a web page that may help you wrap your brain around the concept: https://en.wikipedia.org/wiki/Ethics

Every web request is like a letter with a to: and a from: address. Your anger is equivalent to being mad that someone looked at the from: address on an envelope when you told them not to.

Also, your argument is that because the user doesn't want it, it is unethical? I don't want to sit in traffic but that doesn't make traffic unethical.

Similarly, you accuse them of 'violating trust'. It's public knowledge that any IP address can be looked up. Just because you weren't aware of it doesn't mean your trust is violated. In the same way, just because you didn't know something was against the law doesn't make it not illegal.

I am for privacy, don't get me wrong, but your comments represent one of the biggest challenges with privacy right now: the assumptions of privacy and trust. It's hard to have rational and productive arguments about privacy when people get emotional about the inner workings of the system. If you don't agree with the system, work to change it, but don't blame others for what is, at the end of the day, just a feature of how it all works. Instead, try to understand the feature and think about how we can implement future systems with similar functionality but more privacy.

>Your anger is equivalent to being mad that someone looked at the from: address on an envelope when you told them not to.

No, it's the equivalent of asking a woman in a bar if you can call her and when she says "no," you look up her number in the phone book and call her anyway.

Except she’s not forced to wear her full name on display at the bar (unlike IP addressing) and is able to opt out of being in the phone book (unlike GeoIP DBs).

I get what you’re saying and I see where you’re coming from, but to try and use this phone number analogy, it’s like telling someone what city/state they’re in based on their area code when they’ve opted to provide you no location information beyond their phone number.

The phone number itself contains location information. It’s not necessarily accurate information, as I could easily (and do) use a 212 number wherever I am in the USA, not just in New York.

Finally, we’ve had rulings about phone numbers and IP addresses. Phone numbers “belong” to the end user, not the operator, and move with the user if they want to. IP addresses “belong” to the carrier, and are non portable. In a number of cases, carriers actively provide city-level accuracy for where they’re using their IP space as it actively improves performance for end users.

> Denying browser location doesn't stop the app or website from using your ip address against some third party service to figure out where you are.

Just like how being uninvited from a party doesn't stop you from showing up anyway (and trashing the bathroom for good measure).

Whelp, all those markers almost crashed my browser.

Yeah it worked great when we had 100 data points. Another thing to fix :-)

One thing I will say for Maxmind is that their API was exceptionally good at picking up anonymous proxies like Tor.

I dealt with a ton of fraud a couple of jobs ago and being able to eliminate credit card transactions for people on Tor was huge in cutting down on charge backs from fake card use. The minFraud API was really beneficial as well.

Don't sell the offering short.

I don't doubt that there are a lot of baddies using Tor (and I wouldn't be surprised if that was the majority of Tor users in some cases). It's a shame though that cutting off Tor access is the solution. A very reasonable one from a business standpoint, but unfortunate.

Doesn't have to be a majority of Tor users, if the tiny minority of actors are flooding tons of stolen cc transactions. A normal person may make an online purchase a day or so, while someone trying to get as much value as possible out of a bunch of stolen cc may just spam a lot of transactions.

For the server seeing the incoming cc purchase requests, it's still majority fraud...

Tor is a special case here: you can use any Tor DNSBL and it will pick up 100% of the nodes, effectively.

All Tor IPs specifically and publicly advertise the fact that they are an open proxy.
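To make the DNSBL idea concrete, here is a small sketch of how such a lookup is usually done: reverse the IPv4 octets, append the blocklist zone, and see whether the name resolves. The zone `dnsbl.example.org` is a placeholder I made up, not a real Tor list, so substitute whichever list you actually trust.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the blocklist returns an A record for this address.

    By DNSBL convention a listed address resolves (typically to
    127.0.0.x) and an unlisted one fails to resolve.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical zone name, for illustration only.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

Note that `is_listed` makes a live DNS query, and that a resolver failure is indistinguishable from "not listed" in this sketch.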

Well, I am sure MaxMind provides a lot of value for many use-cases. However, I just need something very basic (country, city, lat and long) and getting this "out of the box" from Google's LB is a blessing for me.

The author does specify that for his own needs, which was basically adding 3 headers to requests, the offering was not worth it. He also does explicitly say that MaxMind is very good. So I don't think he was unfair at all in the post.

For those not running on GCE, I run a free service at https://blip.runway7.net/ that piggybacks on the Google App Engine headers. Suitable for calling from the client browser / device, it allows you to ask for location specific resources straight from the client without having to do anything on the server at all.

Code at https://github.com/runway7/blip

This is especially useful in single page apps, or static sites, if that's not obvious.

Just realized that Cloudflare also provides IP Geolocation, even on the free plan.[1] However, it only provides the ISO country code of the visitor.

[1] https://support.cloudflare.com/hc/en-us/articles/200168236-W...

That's great. Do you know if this also works on all cases when I use Cloudflare only as DNS? Edit: After some thinking this cannot work when Cloudflare is used as DNS only.

Using this:


The load balancer expands variables to empty strings when it cannot determine their values, for example for geographic location variables when the IP address’s location is unknown, or for TLS parameters when TLS is not in use.

Geographic values (regions, subdivisions, and cities) are estimates based on the client’s IP address. From time to time, we update the data that provides these values in order to improve accuracy and to reflect geographic and political changes.

If using the CLI, don't put the single quotes or the whole thing will become the header without a value.

I have used the CLI and the quotes worked fine for me. Here is the complete command I have used:

gcloud beta compute backend-services update app \
  --custom-request-header 'X-Client-Geo-Location:{client_region},{client_region_subdivision},{client_city}' \
  --custom-request-header 'X-Client-Geo-Region:{client_region_subdivision}' \
  --custom-request-header 'X-Client-Geo-LatLong:{client_city_lat_long}' \
  --custom-request-header 'X-Client-TLS-Version:{tls_version}' \
  --custom-request-header 'X-Client-TLS-Chiper:{tls_cipher_suite}' \
  --custom-request-header 'X-Client-Hostname:{tls_sni_hostname}' \
  --custom-request-header 'X-Client-RTT:{client_rtt_msec}'
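On the backend side, the headers from that command still have to be parsed, and per the docs quoted above any variable may arrive as an empty string. A possible sketch follows. The function names are mine, the sample values are made up, and I am assuming the lat/long variable expands to a "latitude,longitude" pair.

```python
from typing import Dict, Optional, Tuple


def parse_geo_location(value: str) -> Dict[str, Optional[str]]:
    """Parse 'region,subdivision,city'; unknown fields become None."""
    # maxsplit=2 keeps any further commas inside the city field.
    parts = [p.strip() or None for p in value.split(",", 2)]
    parts += [None] * (3 - len(parts))
    region, subdivision, city = parts
    return {"region": region, "subdivision": subdivision, "city": city}


def parse_lat_long(value: str) -> Optional[Tuple[float, float]]:
    """Parse 'lat,long' into floats; return None if empty or malformed."""
    try:
        lat, lon = value.split(",")
        return float(lat), float(lon)
    except ValueError:
        return None


# Illustrative header values, not real traffic.
print(parse_geo_location("US,USCA,Mountain View"))
print(parse_geo_location(",,"))
print(parse_lat_long(""))
```

Treating empty fields as `None` keeps "location unknown" distinct from a real value further down the pipeline.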

AppEngine has had this feature for years and it’s extremely convenient!

Exactly! I have been using GAE for 9 years now and it's so easy to have these headers automagically attached to your requests. Finally, Google Compute Engine has the same convenient way of geolocating requests.

I have always said that the GCP HTTP(S) load balancer is the hidden gem in cloud. I'm glad it's becoming more and more popular (also feature rich).

One feature that must be mentioned is that it allows you to cache content (could be anything: HTML, JPEG and more) for as little as 1 second! CloudFlare requires an enterprise plan and even then you cannot set a TTL lower than 30 seconds.

Does anybody know how good the data is? I mean, every one of the geolocation lookup providers has faults, and taking the majority answer from different sources will give the best results. But maybe Google has much more accurate data?
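The majority approach mentioned here can be sketched in a few lines. This is only an illustration: the function name is mine, matching city names by lowercased string is a simplification, and returning nothing on a tie is one arbitrary choice among several.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_city(answers: Iterable[Optional[str]]) -> Optional[str]:
    """Return the city most providers agree on, or None if unclear.

    Empty or missing answers are ignored. A tie for first place is
    treated as "no answer" rather than picking one at random.
    """
    votes = Counter(a.strip().lower() for a in answers if a and a.strip())
    if not votes:
        return None
    (top, top_count), *rest = votes.most_common(2)
    if rest and rest[0][1] == top_count:
        return None
    return top


# Made-up answers from three hypothetical providers.
print(majority_city(["Berlin", "berlin", "Munich"]))
```

A real implementation would probably compare coordinates within some distance instead of exact names, since providers spell and group cities differently.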

Note that if you're using this load balancer behind a CDN then the results can be incorrect, although probably generally within the same region.

Also MaxMind does have completely free databases down to the city level, just with lower accuracy compared to their paid products: https://dev.maxmind.com/geoip/geoip2/geolite2/

This probably will depend on CDN. Google's CDN is part of the Load Balancer itself and therefore it won't affect geolocation. With "3rd party" CDNs it's different, of course.

Finally, the free version of MaxMind is great. Thanks for mentioning it.

If you are hosting outside of the Google eco-system and want to be able to do geo-IP for a store locator etc., or just to set a cookie on a page with the lat/lon, how does one build a minimal app in the Google cloud to do just this bit of the task?

If this is easily doable then that would be an easy migration route.

It would be pretty easy to build. Probably something you can do end-to-end in one or two days.

see this other comment for full source code: https://news.ycombinator.com/item?id=16915184
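As a rough idea of how small such an app can be (this is not the code from the linked comment), here is a sketch of a WSGI app that echoes App Engine's geo request headers back as JSON. The `X-AppEngine-*` header names are the ones App Engine adds to incoming requests. The JSON field names, the port, and the `serve` helper are my own choices.

```python
import json
from wsgiref.simple_server import make_server

# WSGI exposes the header "X-AppEngine-Country" as HTTP_X_APPENGINE_COUNTRY, etc.
GEO_HEADERS = {
    "country": "HTTP_X_APPENGINE_COUNTRY",
    "region": "HTTP_X_APPENGINE_REGION",
    "city": "HTTP_X_APPENGINE_CITY",
    "lat_long": "HTTP_X_APPENGINE_CITYLATLONG",
}


def app(environ, start_response):
    """Return the caller's geo headers as a JSON object."""
    geo = {name: environ.get(key, "") for name, key in GEO_HEADERS.items()}
    body = json.dumps(geo).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port: int = 8080) -> None:
    """Run a local development server; the port is an arbitrary choice."""
    make_server("", port, app).serve_forever()
```

Run locally, every field comes back empty because nothing adds the headers. A browser calling this from another origin would also need CORS headers on the response.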

Excellent! Another great capability added! This will save us some overhead!

Awesome, this saves the effort of configuring nginx to work with GeoIP databases.

yep. also, spares some compute cycles for your app

Wow, that's so cool! I wish my workload was running on Google Cloud

We've done some testing and found many inaccuracies in the Google LB Geo-IP results. Does anyone know who is in charge and which channel we can use to give feedback to Google?

Or, if anyone would like to try our service, please drop me a mail: data at ipip.net

Has anyone tried to compare the geolocation accuracy of this solution vs MaxMind? I would assume country-level will be very comparable (it's easy to do), but city-level accuracy is not trivial.

Cool!! Location-based traffic insights have significant value in many use-cases. Show the results on a heat-map to see where you should be targeting customers.

Finally! I have been waiting for something like this for some time.

Nice! Any clue if there is a way this can be enabled automatically on balancers created using kubernetes (on GKE)? That would be really awesome.

My thoughts exactly. I am now working on this problem myself and I will share my results soon.

Looking forward!

No need for external services from now on.

Been waiting for this for a while!

Great article! It's about time Google came up with a geo solution.

Anything like this for AWS?

AWS added geolocation to cloudfront in 2014.


AWS cloudfront feature is incomplete - it returns only country code.

Google's and MaxMind's features include city/lat/lng.

Not everything goes thru CloudFront. I hope ELB/ALB will have this feature soon too!

I do geo routing in Route53. Because of how often China blocks cloudfront.

Finally we can have it not only on App Engine :-)

Sharing is caring :) Thank you

Finally!! Thanks for sharing

Wow, such nice developers! offering more user private data to google huh

nobody's offering Google anything, rather, Google is offering developers meta data for HTTP requests.

Finally! I have been waiting for something like this for some time.

Don't Cloudflare, AWS and Azure already have it?


At last

