Changes to accessing and using Geolite2 databases (maxmind.com)
172 points by anandchowdhary on Dec 30, 2019 | 61 comments



From CC BY SA 4.0: "No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits."

GeoLite2's new EULA "incorporate[s] into this Agreement by reference" specifically the CC BY SA 4.0 ("Creative Commons Corporation Attribution-ShareAlike 4.0 International License") with a statement that this EULA (as well as their data processing addendum DPA, privacy policy PP, and website terms of service WT) take precedence in case of conflict.

"This Agreement controls in the event of any conflict with the above-referenced documents. Thereafter, for any conflicts among the above 4 documents, the priority and precedence of interpretation is DPA, PP, WT and Creative Commons License."

The EULA then has a number of what are literally called "additional restrictions" (omg), which restrict (to summarize) 1) how you distribute it (with some kind of weird statement saying "where not inconsistent with the other terms of this Agreement, as in the Creative Commons License", which seems to invert the prioritization?! I don't know what to do with this...), 2) how you secure your distribution (which I guess precludes any ability to distribute the database along with an application? this is now only for backend use?), 3) that you will destroy old copies of the database (which is pretty egregious per CC BY SA, but I can appreciate this is likely the goal of this new EULA for CCPA compliance), and 4) that you won't send personal data to MaxMind (was anyone doing that before? ;P).

So, I don't understand the goal then with respect to "incorporat[ing] into this Agreement by reference" the CC BY SA if it is no longer a license agreement even remotely compatible with CC BY SA. Like, I was expecting this EULA to be an unrelated license, not some attempt to provide an awkwardly incompatible set of provisions. I now need to send this EULA off to my lawyer, who is probably going to come back with something like "we recommend you don't use this for any purpose unless we can get legal clarification".


Wait a minute, now that I've registered for Maxmind there is a Do Not Sell My Personal Information Requests page with the text:

The following IP addresses are associated with valid 'Do Not Sell My Personal Information' requests as required by applicable privacy regulations, and have been removed from (or will be removed from the next releases of) the GeoIP2 and GeoLite2 databases. None of the IP addresses listed or contained within a listed network may be used for advertising or marketing purposes.

If I had sent Maxmind a request for data removal, the last thing I would expect is that my IP address would be shared with any internet user who bothered to create a Maxmind account. Even if this page were removed, it might not be difficult to obtain the opted-out addresses by doing a diff between GeoIP2 free releases. Perhaps by searching for narrow slivers of removed addresses that were previously located in California?

Law of unintended consequences for CCPA? Bad implementation for CCPA compliance? What interesting things could be done with a publicly available list of opted-out IP addresses?


There's no other way for it to work. It's like putting yourself on a "do not call" list -- it only works if the list is visible so callers can obey it.

Yes a list exists now of privacy-minded individuals' IP addresses, but I can't imagine what you would do with that, and trying to identify them from it would in itself be illegal probably?


There are a few huge differences here from the DNC list.

* Implementation: Access to the DNC list is provided by various carriers for paying customers, with lookups available on a per-number basis. A binary { true, false } response is sent. EDIT: Thank you PaybackTony for the correction. The entire DNC list is also available as a CSV for $$$$.

* Expectations of privacy: Does Maxmind send out a response to those fulfilled complaints stating that their IP address will be made available to anyone on the Internet who registers for an account?

* Hiding in the crowd: How many numbers are in the DNC registry? How many will request a Maxmind opt-out?

Regarding "trying to identify them from it would in itself be illegal probably?": maybe. But with all these IP addresses sitting available to anyone on the Internet who bothers to make an account, it doesn't really matter.


This actually isn't true. I build telecommunications software, and although individual carriers may offer a binary true or false response to a DNC request, the actual DNC list is a CSV of all the numbers in a given region / area code that are on the list. You must pay for access to the list, per area code, or you can pay a larger fee for all area codes. From experience, this is why many telemarketers don't obey the DNC list. They want to, but they also don't want to pay the more than $15k it costs to access the DNC list for all area codes. You're also not allowed to resell access to the list in any way, so any service offering a scrub against the national DNC is violating those terms. Most services that say they do that still require you to give them your DNC access info (SAN).

The pay-to-access part, I think, is what separates it from the IP address list here.
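The "download the whole list, then scrub" workflow described above boils down to a set-membership filter. A minimal sketch, assuming a hypothetical CSV layout of one number per row in the first column with no header (the real registry file format isn't described in this thread, and the numbers are made up):

```python
import csv
import io

def load_dnc(csv_file):
    """Read a DNC export (assumed layout: number in the first column) into a set."""
    return {row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()}

def scrub(call_list, dnc_numbers):
    """Keep only the numbers that do not appear in the DNC set."""
    return [number for number in call_list if number not in dnc_numbers]

# Hypothetical sample data standing in for a purchased area-code file.
dnc_csv = io.StringIO("2025550101\n2025550199\n")
dnc = load_dnc(dnc_csv)
print(scrub(["2025550101", "2025550123"], dnc))  # ['2025550123']
```

Holding the full list locally makes each check a constant-time set lookup, which is the practical difference from a per-number query service.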


> * Expectations of privacy: Does Maxmind send out a response to those fulfilled complaints stating that their IP address will be made available to anyone on the Internet who registers for an account?

It can’t be otherwise: anyone interested in this data would just have to diff databases to check for missing IP addresses.
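To show how little effort such a diff takes, here is a sketch that reports IPv4 address space covered by an older release but missing from a newer one. It assumes each release has already been reduced to a list of CIDR strings (for example, the network column of a CSV export; the exact file layout is an assumption), and the sample networks are documentation ranges, not real data:

```python
import ipaddress

def removed_networks(old_release, new_release):
    """Return IPv4 networks present in old_release but not covered by new_release."""
    old = list(ipaddress.collapse_addresses(ipaddress.IPv4Network(n) for n in old_release))
    new = list(ipaddress.collapse_addresses(ipaddress.IPv4Network(n) for n in new_release))
    removed = []
    for net in old:
        remaining = [net]
        for cover in new:
            next_remaining = []
            for piece in remaining:
                if piece.subnet_of(cover):
                    continue  # fully covered by the new release
                if cover.subnet_of(piece):
                    # Carve out the covered part; address_exclude yields what is left.
                    next_remaining.extend(piece.address_exclude(cover))
                else:
                    next_remaining.append(piece)  # CIDR blocks either nest or are disjoint
            remaining = next_remaining
        removed.extend(remaining)
    return list(ipaddress.collapse_addresses(removed))

old = ["198.51.100.0/24"]
new = ["198.51.100.0/25", "198.51.100.128/26", "198.51.100.192/27", "198.51.100.224/28",
       "198.51.100.240/29", "198.51.100.248/30", "198.51.100.252/31", "198.51.100.255/32"]
print(removed_networks(old, new))  # [IPv4Network('198.51.100.254/32')]
```

The nested loop is quadratic, which is fine for a sketch; a real comparison of full releases would want a sorted sweep instead.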


Sure, but do you expect all consumers filing CCPA complaints to understand this? Even the more astute consumers filing complaints probably don't understand that Maxmind provides both flat files and API as a service for IP lookups. They should set expectations here.

Maxmind could be completely transparent with consumers who don't understand the consequences of requesting removal under the current implementation:

Thank you for your CCPA request which we have fulfilled. Your IP address has been removed from our databases.

Furthermore, your IP address has been placed on https://www.maxmind.com/en/accounts/do-not-sell-requests for the next month. During this time period, your IP address will be displayed on this page. Of course, no other personal information you provided us will be included. Please note that no specific privileges are required to access this page, other than an email address to create an account. We have no realistic countermeasures against this, or against potentially hostile actors who may regularly archive this page content.

We value your privacy!

Maxmind Ltd


> Yes a list exists now of privacy-minded individuals' IP addresses, but I can't imagine what you would do with that.

"We are having a sale on Firewalls and RFID blocker wallets!"


Yes, in this case "security" by obfuscation is the only way to go.

I have a tip for people who don't like being called by salespeople. Buy two SIM cards. The first should be as anonymous as possible (a prepaid card or equivalent); later, buy a real subscription one. If you are lucky, the subscription number will overwrite the prepaid number in all official indexes, and you can use the first in your phone and the second (which gets all the spam) in a mobile router for internet only.

Works for me and it's a life saver!


I'm glad this was posted here because I never would have heard about it otherwise, until one of my sites' monthly downloads would have broken.

But I still can't quite figure out if anything's changing technically except a new file location that requires an account to access, and new license terms -- or is there something else?

If someone requests a "do not sell" for their IP address that is in the middle of a range... is the range being split in 2, and skipping that address? Or are the ranges staying the same, and is there a separate "blacklist" of some kind that says, if the IP is any of these individual addresses, it's against the law to geolocate? And is that blacklist in the same file, or something to do with "we will... communicate all valid “Do Not Sell” requests to you as we receive them"?

I wish this post were clearer, and that it had been announced at least a couple of months in advance rather than less than 2 weeks before taking effect today. (Still, I can't complain too much: it's free, so I'm just glad the public databases exist at all.)


We apologize for not providing more notice. The CCPA was finalized in October, and it contains a number of ambiguities, particularly as it relates to our products. To arrive at our current interpretation, it has taken many weeks of working with privacy lawyers, following discussions in the privacy community, and observing actions other companies have taken. We're sorry for the timing, appreciate the frustration it causes, and wish we could have provided more notice.

MaxMind will provide immediate notification of Do Not Sell requests via the https://www.maxmind.com/en/accounts/current/do-not-sell-requ... page on our website (login required).

Future builds of the database will exclude those IP addresses by introducing the "split ranges" you describe.

Hope that helps.

Mark Fowler MaxMind
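As a rough illustration of the "split ranges" idea (MaxMind's actual build process isn't described in the thread), Python's standard `ipaddress` module can carve a single address out of a block with `address_exclude`. The block and address below are documentation examples:

```python
import ipaddress

def split_around(network, excluded_ip):
    """Split a CIDR block into the smallest set of blocks that omit one address."""
    net = ipaddress.IPv4Network(network)
    hole = ipaddress.IPv4Network(f"{excluded_ip}/32")
    # address_exclude raises ValueError if the hole is not inside net.
    return sorted(net.address_exclude(hole))

pieces = split_around("203.0.113.0/24", "203.0.113.77")
for piece in pieces:
    print(piece)
```

Removing one address from a /24 this way yields eight smaller blocks (one per prefix bit between /24 and /32), which is also why a removed address shows up as a conspicuous gap when two releases are compared.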


Given that for home users, IP address assignment is temporary, is there a plan to add them back after some time has elapsed?


Yes. We will add them back if we are confident they've been reallocated.

Jason Ketola MaxMind


Will it be possible for malicious users to inflate the size of the database dramatically by submitting thousands/millions of Do Not Sell IPs?

Also, what’s the procedure a user needs to go through before adding an IP to the Do Not Sell list? How do you determine the user’s authority to make a change? I didn’t see any info on the notice itself. Hopefully you require the user to show extended “ownership” of the IP over the course of a month or more, lest dynamic IPs and temporary AWS/GCP/etc. IPs get submitted for removal.


The procedure is indicated in our privacy policy: https://www.maxmind.com/en/privacy-policy. We do have a vetting process in place.

Jason Ketola MaxMind


Mark, what's the best way to programmatically download the files? I created an account, but I need something I can do easily with cURL.

edit: This is it: https://download.maxmind.com/app/geoip_download?edition_id=G...

Use a license key generated in the "my license key" section, and put the URL in quotes if you're using it in a shell.


Thanks for the clarity and explanation -- totally appreciate that this was a challenge to figure out legally.

And thanks also for keeping the database itself up!


"... is the range being split in 2, and skipping that address?"

That would be funny, as it will create, by omission, a de facto database of "IP addresses of privacy nuts". Which might have commercial/advertising value.


OK, privacy nut here :)

I go to great lengths to avoid being associated with the IP address assigned by my ISP.

But I'd never request that it be removed from the dataset. It would obviously be stupid to do it as Mirimir (for example). But less obviously, it would also be stupid to do it as my meatspace identity. Because that would flag it as a privacy nut. Which I'm very careful to avoid doing.

So I wonder why people think it's smart to have their IP addresses delisted.


I'm not sure I understand how the logic of this works.

For those who don't know, GeoLite2 databases are for the most part prefix trees over the IP address space. They can contain individual IP addresses, but more often they just map an IP address range to some metadata. This is particularly true of GeoLite2, which is a very coarse database.

To me, this is equivalent to a database saying which state a given ZIP code is in. How can that imply any kind of personal data?
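A toy model of the structure described above: a binary trie keyed on address bits, where any node may carry metadata and a lookup returns the value of the longest matching prefix. This illustrates the idea only; it is not the actual MaxMind DB file format, and the prefixes and labels are made up:

```python
import ipaddress

class PrefixTrie:
    """Toy binary trie mapping IPv4 prefixes to metadata (illustration only)."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _bit(value, i):
        # i-th bit of a 32-bit address, counting from the most significant bit.
        return (value >> (31 - i)) & 1

    def insert(self, cidr, metadata):
        net = ipaddress.IPv4Network(cidr)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            node = node.setdefault(self._bit(addr, i), {})
        node["value"] = metadata

    def lookup(self, ip):
        """Return metadata for the longest prefix containing ip, or None."""
        addr = int(ipaddress.IPv4Address(ip))
        node = self.root
        best = node.get("value")
        for i in range(32):
            node = node.get(self._bit(addr, i))
            if node is None:
                break
            best = node.get("value", best)
        return best

trie = PrefixTrie()
trie.insert("198.51.100.0/24", {"country": "US"})
trie.insert("198.51.100.128/25", {"country": "US", "city": "Example City"})
print(trie.lookup("198.51.100.5"))    # {'country': 'US'}
print(trie.lookup("198.51.100.200"))  # {'country': 'US', 'city': 'Example City'}
print(trie.lookup("192.0.2.1"))       # None
```

Every address in a range shares one entry, which is the sense in which such a database is coarse; the privacy question only arises as the stored prefixes get small.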


Maybe they just like the idea of more registered MaxMind users and found a plausible scapegoat? They say it even affects the "GeoLite2 Country" offering. That's pretty coarse :)


My thoughts exactly. Similarly when GDPR came in in Europe a lot of marketers used it as an excuse to email everyone they'd ever transacted with and refresh their list of warm prospects.


To my mind the null hypothesis would be "legal gonna legal".

If they'd been looking for an excuse to do this, I'd've expected the GDPR to be it.


"but more often they just map an IP address range to some metadata"

How did they acquire that metadata? This isn't a rhetorical question -- I am online with a provider that covers the entire country and services like MaxMind know my specific province/city (actually my little town just outside the city). I am unaware of any mechanism that the providers publish that declares specific locations -- or why they would even want to do that (the privacy implications are large, and it seems likely to be regulated) -- so how do these companies know?

Some providers, especially in Europe, cover multiple countries.

Traditionally it was by users running tools that would use either GPS coordinates or user input, and these geolocation providers would reverse engineer up a provider's tree. It is not a fundamental property of IPv4 and was not an intended use; it just came about because there is commercial value in locating users. And now there's a massive perpetual data trawl monitoring massive amounts of data to keep it up to date from that nascent beginning.

And really it has always been a remarkable privacy intrusion that crept up and we simply accepted.

edit: I edited this a bit to more clearly convey that while providers clearly know the geolocation of every single IP, it's unclear how that data makes its way out.


MaxMind also sells a fraud prevention product; having access to a lot of transactions and their IP plus shipping address information probably helps to build a database like this.


Providers definitely have geographical maps of IP ranges; these are useful for customer support and network diagnostics and maintenance.

Whether they sell this info is unknown to me. I suppose they legitimately could, given that it's coarse.


A lot of providers include some geographic info in their reverse dns. My IP is currently mapped to X-X-X-X.tukw.qwest.net (ip redacted) which means I'm near Tukwila, WA. You can use that to put me in the Seattle metropolitan area which is usually close enough. Getting better accuracy than that means using other sources -- either data from the ISP, or from services that have IP and Geo information together and provide it somehow.


Possibly, but I'd wager that it doesn't play any part in this. Not only is it completely non-standard, using arbitrary divisions that often vary at completely different geographical levels, providers are woefully slow and inaccurate at actually updating it -- the few that still have such vestiges often publish completely inaccurate mappings (e.g. Bell Canada and Rogers both have this with their intermediary nodes, with descriptions that have no correlation with their actual locations).

My provider -- a nationwide provider -- simply uses a no-additional-information IP quartet with a suffix as the DNS. Zero geographical information. Many others follow the same trend. And the nodes in between vary and change frequently as they update their infrastructure and pathways.

I do remember when this whole exercise began: it started as a bunch of users running a desktop tool that would provide self-geolocation and publish it back. But... why? Why did we ever need this on the desktop, and is it worth the privacy intrusion?


No idea where they get their data because for my IP address they are totally wrong about the city while the whois data is correct.


Interestingly enough, some of the information MaxMind provides can be found in RIR databases, such as RIPE's: ftp://ftp.ripe.net/ripe/dbase/split/, or ARIN's: https://ftp.arin.net/pub/rr/

This is the same information that can usually be accessed using the "whois ip_address" command.

I wonder how this is going to be handled as it's necessary that this information remains public for network operators.

Would ping / mtr / traceroute need to be banned on a per-ip basis too as these tools can be used to triangulate?

The whole thing feels like "please remove my address from maps, it's private data", well yes, ok...
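The RIPE split dumps linked above use RPSL, a plain "attribute: value" text format in which objects are separated by blank lines. A minimal sketch that pulls the address range and `country:` attribute out of `inetnum` objects and converts each range to CIDR blocks. The sample objects are invented, and real dumps contain many more attributes and object types than this handles:

```python
import ipaddress
import re

def parse_inetnums(dump_text):
    """Extract (networks, country) pairs from RPSL-style inetnum objects."""
    results = []
    for block in re.split(r"\n\s*\n", dump_text.strip()):
        attrs = {}
        for line in block.splitlines():
            # Skip comments and continuation lines; keep the first value of each attribute.
            if not line or line[0] in "%# \t+" or ":" not in line:
                continue
            key, _, value = line.partition(":")
            attrs.setdefault(key.strip().lower(), value.strip())
        if "inetnum" not in attrs:
            continue
        first, _, last = attrs["inetnum"].partition("-")
        networks = list(ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first.strip()), ipaddress.IPv4Address(last.strip())))
        results.append((networks, attrs.get("country")))
    return results

sample = """\
inetnum:  192.0.2.0 - 192.0.2.191
netname:  EXAMPLE-NET
country:  NL

inetnum:  198.51.100.0 - 198.51.100.255
country:  DE
"""
for networks, country in parse_inetnums(sample):
    print(country, [str(n) for n in networks])
# NL ['192.0.2.0/25', '192.0.2.128/26']
# DE ['198.51.100.0/24']
```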


I'm pretty sure maxmind does more than simply keeping a whois database; they likely aggregate this data from multiple sources, but of course we don't know since it's a trade secret.


They do more than aggregate public sources. I looked into it in 2013, and found data in maxmind that was available from no public sources.

https://rant.gulbrandsen.priv.no/digital-envoy-maxmind jfyi


Well this sucks.

I depend on these to keep thousands of users' IPs private. IPGeoBlock on many of my WordPress installs keeps a lot of bad bots (and humans) out, and there are already options baked into that plugin to query several online DBs with the user's IP to find the country code... On most of my sites that allow other users to log in, I keep their data private by telling IPGeoBlock to download the DB onto our server, check that, and NOT query the other online services.

Sure would be nice if you could still provide GeoLite2 Country and GeoLite2 ASN.

Just remove all the USA ones :) Then I could have it query the downloaded DB: if it finds a result, block them, and if there's no result, let them try to log in.

So they are going to offer the DB if you register and agree to whatever so-called terms... we should be able to get someone in a jurisdiction that doesn't consider US agreements enforceable to get the DB and put it online, right?

Just having to sign up with MaxMind reduces privacy for me. I guess that's an unintended consequence, but sheesh!


Since the smallest globally routable IPv4 prefix is a /24 (a 256-address block), surely you can just use the block to resolve the country, as there would be few situations where a block is split across an international boundary (excluding point-to-point links).
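A minimal sketch of resolving by enclosing /24 rather than by individual address. The block-to-country table is hypothetical (in practice it would be built from whatever database you have), and the addresses are documentation ranges:

```python
import ipaddress

def block_of(ip, prefix=24):
    """Return the enclosing IPv4 block (default /24) for an address."""
    return ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)

# Hypothetical block-to-country table, e.g. built from a downloaded database.
COUNTRY_BY_BLOCK = {
    ipaddress.IPv4Network("203.0.113.0/24"): "AU",
    ipaddress.IPv4Network("198.51.100.0/24"): "US",
}

def country_for(ip):
    """Look up the country for the /24 containing ip, or None if unknown."""
    return COUNTRY_BY_BLOCK.get(block_of(ip))

print(block_of("203.0.113.77"))     # 203.0.113.0/24
print(country_for("203.0.113.77"))  # AU
```

`strict=False` is what lets a host address be passed in; the host bits are simply zeroed, so the lookup never needs the last octet.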



Thanks for linking. I deliberately saved all the files to archive.org because I was sure a lot of people would only notice when it was too late.

You can see all links with their hash here:

https://forum.matomo.org/t/maxmind-is-changing-access-to-fre...


This is fascinating, and I’m sure it will have lots of far-reaching effects for the courts to chew on... For example, do I “own” the IP address that my ISP assigned me? Is it really my PII? If enough people ask to be removed, have I harmed the ISP’s property? What about dynamic IPs? I like that I now have the power to opt out... but will it last?

As a developer, thinking about this is a great mental exercise for switching my thinking from “use public datasets” to “use user opt-in data”. What would I have used IP->GEO info for before? Guessing a user’s language? I can use Accept-Language headers. Guessing a user’s real location? Better to use the web/mobile GPS API and get explicit consent. I guess the internal maps we make from our server logs will get less accurate... fine?
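For the language case, the `Accept-Language` request header already carries the user's stated preferences with optional `q` weights. A minimal parser sketch that orders tags by weight, keeps header order for ties, and drops `q=0` entries; it does not validate language tags or cover every edge case in the HTTP specification:

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat malformed weights as unacceptable
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag))
    weighted.sort()  # highest weight first; header position breaks ties
    return [tag for _, _, tag in weighted]

print(parse_accept_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"))
# ['fr-CH', 'fr', 'en', 'de', '*']
```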


In the truest sense of property law, only the pre-RIR allocations (aka legacy allocations) are actually owned by companies. RIR allocations are assigned for use by companies, but ranges are still owned by the regional registrars.

You're just using/renting something.


Does it matter if I'm renting something or not? I thought the issue was the privacy of users' data. Would disclosing that a user is renting an IP address from a specific location be something that is prohibited?


OP here. My CI builds started breaking out of nowhere [1] because the public download URL of the Geolite2 database started giving 404s.

I came across the GitHub issue MaxMind opened on the package I was using [2], which recommended that every user create an account and download the database themselves. So for now I've used Git LFS to add the database to the repo manually, until I can come up with a better CI-driven solution, because one of the rules [3] is that you need to update the database as soon as a new one comes out and stop using the older version within 30 days of the update, and you might need to provide this in writing as well.

[1] https://travis-ci.org/staart/api/builds/630988787

[2] https://github.com/runk/node-geolite2/issues/17

[3] https://www.maxmind.com/en/geolite2/eula


I'd like to apologize on behalf of MaxMind for breaking your CI build. Let me assure you as a fellow programmer, I know how frustrating that can be! As you saw we've tried to reach out to the larger open source projects using GeoLite2 in advance, but we've not had time to give as much notice as we would like.

The CCPA was finalized in October, and it contains a number of ambiguities, particularly as it relates to our products. To arrive at our current interpretation, it has taken many weeks of working with privacy lawyers, following discussions in the privacy community, and observing actions other companies have taken. We're sorry for the timing, appreciate the frustration it causes, and wish we could have provided more notice.

With regard to the regular updating of the database automatically, I recommend you take a look at our own tool for automatically downloading the latest database: GeoIP Update https://dev.maxmind.com/geoip/geoipupdate/

Hope that helps.

Mark Fowler MaxMind


Seems like this is the kind of thing you'd want to cache/host locally (as a separate step) so that your builds don't break anytime the MaxMind website is unavailable (for any number of reasons)?
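A minimal sketch of that separate caching step, assuming a hypothetical `fetch(dest_path)` callable that performs the actual download (for example, a wrapper around MaxMind's GeoIP Update tool or an authenticated HTTP request; neither is shown here). The download goes to a temporary file and only replaces the cached copy on success, so a failed fetch leaves the previous copy in place:

```python
import os
import tempfile

def refresh_database(fetch, cache_path):
    """Try to refresh cache_path via fetch(tmp_path); fall back to the existing cache."""
    cache_dir = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    try:
        fetch(tmp_path)                  # hypothetical downloader writes the new file
        os.replace(tmp_path, cache_path)  # atomic swap within the same filesystem
        return "refreshed"
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        if os.path.exists(cache_path):
            return "cached"              # keep building with the last good copy
        raise                            # nothing to fall back on
```

Under the new EULA a stale fallback can't be kept indefinitely, so a real pipeline would also need to check how old the cached copy is.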


It's pretty interesting that an IP address is enough to trigger the CCPA. I thought data had to be specifically tied to identity, rather than to something broader like ZIP codes and city names, to be covered under the CCPA.

I wonder if you would be okay geolocating just the first three octets.


Any identifier that can be deemed to identify a person or household is covered. This means IP addresses, tokens, UUIDs, etc.

* IANAL


Are there no zip codes in rural areas that are specific to a household?


Zip5 codes are allocated by geography and population. See the very large rural tracts of land in AZ or NM that have a large zip5 code area. Zip-9 probably could be associated with one household. Zips are actually a pain in the ass for purposes other than mail delivery because they change from time to time and are not cleanly allocated to geography or logical features on a map. They can be discontinuous and overlapping. Many companies employ proprietary means of assigning geographic identifiers that are not dependent on zip, population size, or anything other than geo.

https://en.m.wikipedia.org/wiki/ZIP_Code


To get a rough idea of how often a ZIP-9 narrows things down to a single street address, I took a look at the sales tax rate and boundary files made available for states that are in the Streamlined Sales Tax Agreement [1].

12 of those states, Arkansas, Georgia, Iowa, Kansas, Nebraska, North Carolina, Ohio, Oklahoma, South Dakota, Tennessee, Vermont, and Washington, use address-based tax rates. (The other 12 either just have one rate for the whole state, or just go by ZIP).

For each ZIP-9 in the address files for the 12 address-based states, I found the lowest and the highest street number for that ZIP-9. I then counted how many of the ZIP-9s had the lowest street number the same as the highest street number.

There were a total of 9,311,327 distinct ZIP-9 values.

2,415,305 of them had a low to high range that only included one number.

That's about 26% of the ZIP-9s having a unique street number. Note that this does not necessarily mean a single household, because I'm not looking at the full address. Apartment buildings, for instance, will in many cases show up in that 26%.

[1] https://www.streamlinedsalestax.org/Shared-Pages/rate-and-bo...
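The counting step described above can be sketched as follows, assuming the boundary files have already been flattened into (ZIP-9, street number) pairs. The sample rows are invented, and the real Streamlined Sales Tax files have their own layout, not reproduced here:

```python
def count_single_number_zip9s(rows):
    """rows: iterable of (zip9, street_number) pairs; returns (distinct ZIP-9s, single-number ZIP-9s)."""
    low, high = {}, {}
    for zip9, number in rows:
        if zip9 in low:
            low[zip9] = min(low[zip9], number)
            high[zip9] = max(high[zip9], number)
        else:
            low[zip9] = high[zip9] = number
    single = sum(1 for z in low if low[z] == high[z])
    return len(low), single

sample = [
    ("123450001", 10), ("123450001", 10),  # one street number only
    ("123450002", 5), ("123450002", 9),    # a range of numbers
    ("123450003", 7),                      # one street number only
]
print(count_single_number_zip9s(sample))  # (3, 2)

# The totals reported above, as a share:
print(f"{2_415_305 / 9_311_327:.1%}")  # 25.9%
```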


Excellent analysis. That’s like 2% or less of households in the US? So even at the Zip9 level not ideal for identifying specific households but problematic if you’re one of those 2M Zip9s.


So, if I were to make this request from behind carrier-grade NAT, no one from that provider could be tracked?

Conversely, if my DHCP lease renews, can I be tracked?


Fairly sure this will be abused in short order by (say) parties looking to tie an IP address to an account name, which MaxMind is now saying they'll have going forward.

E.g., think BitTorrent litigation trolls.


This is deeply unfortunate. I know the folks at Maxmind fairly well, and they're good people. I'm quite sure this isn't what they wanted to do, and that they've pushed their lawyers to let them continue distributing this data as much as is possible, in a way that is as gentle as possible. I applaud them for their efforts; it is appreciated.

Unfortunately, it's also clear to me that this renders the Maxmind geoip databases non-free. I've filed bugs for the removal of the geoipupdate package from Debian main, and I believe the geoip-database maintainer has already terminated updates.


This might not be a common experience nowadays (sigh), but I encountered this problem a few weeks ago in Perl. The long-used module at https://metacpan.org/pod/Geo::IP now mentions "the GeoIP Legacy file based database" and has a link to http://dev.maxmind.com/geoip/geolite which actually redirects to https://dev.maxmind.com/geoip/geoip2/geolite2/ . Interestingly, now that I look there again, it no longer refers (at the top of the page) to the January 2019 deprecation of the version 1 database, but now to "making significant changes to how you access free GeoLite2 databases starting December 30, 2019". (I now have to also review that for my workplace...)

Anyway, the new CPAN modules are under https://metacpan.org/pod/GeoIP2 and can be confusing. One confusing thing: despite a prominent Perl developer such as Dave Rolsky working (still?) at MaxMind, it says that the "module is deprecated and will only receive fixes for major bugs and security vulnerabilities".

For porting purposes, please note that the module you want to use is https://metacpan.org/pod/GeoIP2::Database::Reader . (They also mention https://metacpan.org/pod/MaxMind::DB::Reader, which can make things unclear...) I didn't see any other info on the GitHub page about Perl support. Also, the interface is definitely more tedious and object-oriented in the worst way. I'm sure there are good reasons... And in my experience it's much slower if you use the pure-Perl interface (this is something you should read the MaxMind::DB::Reader perldoc for: it mentions installing a C library in "PURE PERL VERSUS XS").

Good luck


It's a shame this only made it into here after the restriction was already in place.

Also of interest: does anyone know how long a historic copy might be considered valid? How regularly do people generally update their "copy"?


Someone from Maxmind has responded a few times in other places about why they are giving such short notice.

I think it boils down to the law just recently being firmed up and it becoming effective at the new year.

I also believe that you have to download a new version every 30 days to stay compliant with the license.


This provision in the EULA might be troublesome when dealing with backups:

You shall cease use of and destroy (i) any old versions of the Services within thirty (30) days following the release of the updated GeoLite2 Databases

I'm sure I can get customers to otherwise agree to the GeoLite2 terms when I install the database or have them download it, but how do I get the GeoLite2 database out of their VM snapshots, which are retained for three months? I can ensure we 'cease use' of a restored database, but ensuring destruction is a problem...

(The GDPR explicitly acknowledges this problem and allows you to keep data in your backups if it's too burdensome to remove it as long as you've taken measures to reapply data deletions on restore)
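A literal reading of the thirty-day clause quoted above, sketched under the assumption that the clock starts at the first release newer than the copy you hold. The release dates are made up for illustration; a backup-retention job could compare each snapshot's database release against the resulting deadline:

```python
from datetime import date, timedelta
from typing import Iterable, Optional

GRACE_PERIOD = timedelta(days=30)  # "within thirty (30) days following the release"

def destroy_deadline(copy_release: date, release_dates: Iterable[date]) -> Optional[date]:
    """Date by which a copy must be gone, or None if no newer release exists yet."""
    newer = [d for d in release_dates if d > copy_release]
    if not newer:
        return None
    return min(newer) + GRACE_PERIOD

releases = [date(2019, 12, 3), date(2019, 12, 10), date(2019, 12, 17)]
print(destroy_deadline(date(2019, 12, 3), releases))   # 2020-01-09
print(destroy_deadline(date(2019, 12, 17), releases))  # None
```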


The biggest problem with the new Maxmind data is that it requires many MB of dependencies: https://github.com/maxmind/MaxMind-DB-Reader-java/issues/48


I do not work for any company that uses GeoLite2, but I do have a gaming server where I want to display each player's country to the staff team. I signed up 3 days ago but haven't received an email yet. Should I contact MaxMind, and will I be able to get a license for such an intended use?


Seems to only cope with fixed IP addresses, which only some places use.

For example, my IP address (not in the US) changes every ~18 hours or so and can come up in at least 30 different IP ranges that I've seen so far. (This used to make writing SSH rules for remote access a PITA, until I automated it.)


How is IP to broad location even PII?


It's not all that broad in many circumstances. You get a ZIP code, a latitude/longitude, and a specific degree of error.
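To make "a latitude/longitude and a specific degree of error" concrete: a reported point plus an error radius describes a circle, and whether someone's real location falls inside it is a great-circle distance check. A minimal haversine sketch; the coordinates and radius in the usage are illustrative, and how any particular database names or defines its error field isn't covered here:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_accuracy(reported, actual, radius_km):
    """True if the actual location lies inside the reported point's error radius."""
    return haversine_km(*reported, *actual) <= radius_km

print(round(haversine_km(0, 0, 1, 0), 1))  # one degree of latitude, about 111.2 km
```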


Last time I used Maxmind the coordinates generally just seemed to be the coordinates of the city. And apparently my current IP is in the center of another city in roughly the same part of the country. Beyond the country and possibly the ISP I wouldn't trust Maxmind's data much.



