
Changes to accessing and using Geolite2 databases - anandchowdhary
https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
======
saurik
From CC BY SA 4.0: "No additional restrictions — You may not apply legal terms
or technological measures that legally restrict others from doing anything the
license permits."

GeoLite2's new EULA "incorporate[s] into this Agreement by reference"
specifically the CC BY SA 4.0 ("Creative Commons Corporation Attribution-
ShareAlike 4.0 International License") with a statement that this EULA (as
well as their data processing addendum DPA, privacy policy PP, and website
terms of service WT) take precedence in case of conflict.

"This Agreement controls in the event of any conflict with the above-
referenced documents. Thereafter, for any conflicts among the above 4
documents, the priority and precedence of interpretation is DPA, PP, WT and
Creative Commons License."

The EULA then has a number of what are _literally_ called "additional
restrictions" (omg), which restrict (to summarize) 1) how you distribute it
(with some kind of weird statement saying "where not inconsistent with the
other terms of this Agreement, as in the Creative Commons License", which
seems to invert the prioritization?! I don't know what to do with this...), 2)
how you secure your distribution (which I guess precludes any ability to
distribute the database along with an application? this is now only for
backend use?), 3) that you will destroy old copies of the database (which is
pretty egregious per CC BY SA, but I can appreciate this is likely the goal of
this new EULA for CCPA compliance), and 4) that you won't send personal data
to MaxMind (was anyone doing that before? ;P).

So, I don't understand the goal then with respect to "incorporat[ing] into
this Agreement by reference" the CC BY SA if it is no longer a license
agreement even remotely compatible with CC BY SA. Like, I was expecting this
EULA to be an unrelated license, not some attempt to provide an awkwardly
incompatible set of provisions. I now need to send this EULA off to my lawyer,
who is probably going to come back with something like "we recommend you don't
use this for any purpose unless we can get legal clarification".

------
nahikoa
Wait a minute, now that I've registered for Maxmind there is a _Do Not Sell My
Personal Information Requests_ page with the text:

 _The following IP addresses are associated with valid 'Do Not Sell My
Personal Information' requests as required by applicable privacy regulations,
and have been removed from (or will be removed from the next releases of) the
GeoIP2 and GeoLite2 databases. None of the IP addresses listed or contained
within a listed network may be used for advertising or marketing purposes._

If I had sent Maxmind a request for data removal, the last thing I would
expect is that my IP address would be shared with any internet user who
bothered to create a Maxmind account. Even if this page were removed, it might
not be difficult to obtain the opted-out addresses by doing a diff between
GeoIP2 free releases. Perhaps a search for narrow slivers of addresses removed
that were previously in California?

Law of unintended consequences for CCPA? Bad implementation for CCPA
compliance? What interesting things could be done with a publicly
available list of opted-out IP addresses?

~~~
crazygringo
There's no other way for it to work. It's like putting yourself on a "do not
call" list -- it only works if the list is visible so callers can obey it.

Yes a list exists now of privacy-minded individuals' IP addresses, but I can't
imagine what you would do with that, and trying to identify them from it would
in itself be illegal probably?

~~~
nahikoa
There's a few huge differences here from the DNC list.

* Implementation: Access to the DNC list is provided by various carriers for paying customers, with lookups available on a per-number basis. A binary { true, false } response is sent. EDIT: Thank you PaybackTony for the correction. The entire DNC list is also available as a CSV for $$$$.

* Expectations of privacy: Does Maxmind send out a response to those fulfilled complaints stating that their IP address will be made available to anyone on the Internet who registers for an account?

* Hiding in the crowd: How many numbers are in the DNC registry? How many will request a Maxmind opt-out?

Regarding _trying to identify them from it would in itself be illegal
probably?_ : maybe. But with all these IP addresses sitting available to
anyone on the Internet who bothers to make an account, it doesn't really
matter.

~~~
Eikon
> * Expectations of privacy: Does Maxmind send out a response to those
> fulfilled complaints stating that their IP address will be made available to
> anyone on the Internet who registers for an account?

It can’t be otherwise: anyone interested in this data would just have to
diff databases to check for missing IP addresses.

~~~
nahikoa
Sure, but do you expect all consumers filing CCPA complaints to understand
this? Even the more astute consumers filing complaints probably don't
understand that Maxmind provides both flat files and API as a service for IP
lookups. They should set expectations here.

Maxmind could be completely transparent here to consumers, for those who don't
understand the consequences of requesting a removal using the current
implementation:

 _Thank you for your CCPA request which we have fulfilled. Your IP address has
been removed from our databases._

 _Furthermore, your IP address has been placed on
[https://www.maxmind.com/en/accounts/do-not-sell-requests](https://www.maxmind.com/en/accounts/do-not-sell-requests)
for the next month. During this time period, your IP address will be displayed
on this page. Of course, no other personal information you provided us will be
included. Please note that no specific privileges are required to access this
page, other than an email address to create an account. We have no realistic
countermeasures against this, or potentially hostile actors who may regularly
archive this page content._

 _We value your privacy!_

 _Maxmind Ltd_

------
crazygringo
I'm glad this was posted here because I never would have heard about it
otherwise, until one of my sites' monthly downloads would have broken.

But I still can't quite figure out if anything's changing technically except a
new file location that requires an account to access, and new license terms --
or is there something else?

If someone requests a "do not sell" for their IP address that is in the middle
of a range... is the range being split in 2, and skipping that address? Or are
the ranges staying the same, and is there a separate "blacklist" of some kind
that says, if the IP is any of these individual addresses, it's against the
law to geolocate? And is that blacklist in the same file, or something to do
with "we will... communicate all valid “Do Not Sell” requests to you as we
receive them"?

Wish this post were both clearer, and that it had been announced at least a
couple of months in advance, rather than less than 2 weeks before taking
effect today. (Still, I can't complain too much -- it's free so I'm just glad
the public databases exist at all.)

~~~
2shortplanks
We apologize for not providing more notice. The CCPA was finalized in October,
and it contains a number of ambiguities, particularly as it relates to our
products. To arrive at our current interpretation, it has taken many weeks of
working with privacy lawyers, following discussions in the privacy community,
and observing actions other companies have taken. We're sorry for the timing,
appreciate the frustration it causes, and wish we could have provided more
notice.

MaxMind will provide immediate notification of Do Not Sell requests via the
[https://www.maxmind.com/en/accounts/current/do-not-sell-
requ...](https://www.maxmind.com/en/accounts/current/do-not-sell-requests)
page on our website (login required).

Future builds of the database will exclude those IP addresses by introducing
the "split ranges" you describe.
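
A minimal sketch of what splitting a range around one removed address could look like, using Python's standard `ipaddress` module. The function name `split_around` and the documentation-range addresses are illustrative assumptions; this is not MaxMind's build process, only one way to carve a single address out of a CIDR block.

```python
import ipaddress

def split_around(network: str, removed_ip: str) -> list:
    """Return the CIDR blocks that cover `network` minus one address."""
    net = ipaddress.ip_network(network)
    hole = ipaddress.ip_network(removed_ip)  # a single address becomes a /32
    if not hole.subnet_of(net):
        # Address is outside the range, so nothing needs splitting.
        return [str(net)]
    # address_exclude yields the smallest set of blocks covering the rest.
    return [str(block) for block in sorted(net.address_exclude(hole))]

if __name__ == "__main__":
    for block in split_around("203.0.113.0/24", "203.0.113.77"):
        print(block)
```

Removing one address from a /24 this way leaves eight smaller blocks (one each from /25 down to /32), which is why a few opt-outs can noticeably fragment a range.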

Hope that helps.

Mark Fowler MaxMind

~~~
gav
Given that for home users, IP address assignment is temporary, is there a plan
to add them back after some time has elapsed?

~~~
maxmindjason
Yes. We will add them back if we are confident they've been reallocated.

Jason Ketola MaxMind

------
dsign
I'm not sure I understand how the logic of this works.

For those that don't know, GeoLite2 databases are for the most part prefix
trees on IP address space, they _can_ contain concrete IP addresses, but more
often they just map an IP address range to some metadata. This is particularly
true about GeoLite2, which is a very coarse database.
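
To make the "prefix tree" idea concrete, here is a small sketch of a longest-prefix lookup over IPv4 ranges. The class `PrefixTree`, its methods, and the sample metadata are hypothetical; this illustrates the general technique and is not the actual on-disk GeoLite2 format or any MaxMind API.

```python
import ipaddress

class PrefixTree:
    """Binary trie keyed on the leading bits of an IPv4 network."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr, metadata):
        net = ipaddress.ip_network(cidr)
        # Keep only the bits covered by the prefix length.
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["meta"] = metadata

    def lookup(self, ip):
        bits = format(int(ipaddress.ip_address(ip)), "032b")
        node = self.root
        best = node.get("meta")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            # Remember the most specific range seen so far.
            best = node.get("meta", best)
        return best

if __name__ == "__main__":
    tree = PrefixTree()
    tree.insert("198.51.100.0/24", {"country": "AA"})    # made-up metadata
    tree.insert("198.51.100.128/25", {"country": "BB"})  # more specific range
    print(tree.lookup("198.51.100.200"))
```

A lookup walks the address bit by bit and returns the metadata of the most specific range that contains it, so individual addresses only appear when someone inserts a /32.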

To me, this is equivalent to a database saying which state a given zip code
is in. How can that imply any kind of personal data?

~~~
endorphone
"but more often they just map an IP address range to some metadata"

How did they acquire that metadata? This isn't a rhetorical question -- I am
online with a provider that covers the entire country and services like
MaxMind know my specific province/city (actually my little town just outside
the city). I am unaware of any mechanism that the providers publish that
declares specific locations -- or why they would even want to do that (the
privacy implications are large, and it seems likely to be regulated) -- so how
do these companies know?

Some providers, especially in Europe, cover multiple countries.

Traditionally it was by users running tools that would use either GPS
coordinates or user-input and these geolocation providers would reverse
engineer up a provider's tree. It is not a fundamental property of IPv4 and
was not an intended use -- it just came about because there is commercial
value in locating users. And now there's a massive perpetual data trawl
monitoring massive amounts of data to keep it up to date from that nascent
beginning.

And really it has always been a remarkable privacy intrusion that crept up and
we simply accepted.

edit: I edited this a bit to more clearly convey that while providers clearly
know the geolocation of every single IP, it's unclear how that data makes its
way out.

~~~
toast0
A lot of providers include some geographic info in their reverse dns. My IP is
currently mapped to X-X-X-X.tukw.qwest.net (ip redacted) which means I'm near
Tukwila, WA. You can use that to put me in the Seattle metropolitan area which
is usually close enough. Getting better accuracy than that means using other
sources -- either data from the ISP, or from services that have IP and Geo
information together and provide it somehow.
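
A rough sketch of pulling a location hint out of a reverse DNS name. The table `HINTS` and the function `location_hint` are hypothetical; the only entry comes from the example above, and real naming schemes vary by provider. In practice the hostname would come from a reverse lookup such as `socket.gethostbyaddr`.

```python
# Hypothetical mapping from hostname labels to places. Only "tukw" is taken
# from the comment above; a real table would need one entry per provider code.
HINTS = {"tukw": "Tukwila, WA"}

def location_hint(hostname):
    """Return a place name if any label of the hostname is a known code."""
    for label in hostname.lower().split("."):
        if label in HINTS:
            return HINTS[label]
    return None

if __name__ == "__main__":
    print(location_hint("X-X-X-X.tukw.qwest.net"))
```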

~~~
endorphone
Possibly, but I'd wager that it doesn't play any part in this. Not only is it
completely non-standard, using arbitrary divisions that often vary at
completely different geographical levels, providers are woefully slow and
inaccurate at actually updating it -- the few that still have such vestiges
often publish completely inaccurate mappings (e.g. Bell Canada and Rogers both
have this with their intermediary nodes, with descriptions that have no
correlation with their actual locations).

My provider -- a nationwide provider -- simply uses a no-additional-
information IP quartet with a suffix as the DNS. Zero geographical
information. Many others follow the same trend. And the nodes in between vary
and change frequently as they update their infrastructure and pathways.

I do remember when this whole exercise began and it started as a bunch of
users running a desktop tool that would provide self-geolocation and publish
it back. But...why? Why did we ever need this on the desktop, and is it worth
the privacy intrusion?

------
Eikon
Interestingly enough, some of the information maxmind is providing can be
found in RIRs databases, such as RIPE for instance:
ftp://ftp.ripe.net/ripe/dbase/split/, ARIN:
[https://ftp.arin.net/pub/rr/](https://ftp.arin.net/pub/rr/)

This is the same information that can usually be accessed using the "whois
ip_address" command.

I wonder how this is going to be handled as it's necessary that this
information remains public for network operators.

Would ping / mtr / traceroute need to be banned on a per-ip basis too as these
tools can be used to triangulate?

The whole thing feels like "please remove my address from maps, it's private
data", well yes, ok...

~~~
judge2020
I'm pretty sure maxmind does more than simply keeping a whois database; they
likely aggregate this data from multiple sources, but of course we don't know
since it's a trade secret.

~~~
Arnt
They do more than aggregate public sources. I looked into it in 2013, and
found data in maxmind that was available from no public sources.

[https://rant.gulbrandsen.priv.no/digital-envoy-maxmind](https://rant.gulbrandsen.priv.no/digital-envoy-maxmind) jfyi

------
stevenicr
Well this sucks.

I depend on these to keep thousands of users' IPs private. IPGeoBlock on many
of my WordPress installs keeps a lot of bad bots (and humans) out - and there
are already options baked into that plugin to query several online DBs with
the user IP to find the country code... On most of my sites that allow other
users to log in, I keep their data private by telling IPGeoBlock to download
the DB onto our server, check that, and NOT query the other online services.

Sure would be nice if you could still provide GeoLite2 Country and GeoLite2 ASN

just remove all the USA ones :) - then I could have it query the downloaded DB:
if it finds a result, block them - and if no result, then let them try to
log in..

So they are going to offer the DB if registered and agree to whatever so
called terms.. we should be able to get someone in a country without a
jurisdiction that considers USA agreements legal - to get the DB and put it
online right?

having to sign up with maxmind alone is reducing privacy for me - I guess
unintended consequences - but sheesh!

~~~
jlawer
Since the smallest globally routable address is a /24 (256 address block),
surely you can just use the block to resolve to country as there would be few
situations where a block would be split across an international boundary
(excluding point to point links).
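
As a rough illustration of keying a country lookup on the /24 block rather than the full address: the table `COUNTRY_BY_BLOCK` and both function names are hypothetical, and the country code is made up.

```python
import ipaddress

# Hypothetical table keyed by /24 block; the entry is made up for illustration.
COUNTRY_BY_BLOCK = {"203.0.113.0/24": "AA"}

def containing_slash24(ip: str) -> str:
    """Return the /24 network that contains an IPv4 address."""
    # strict=False lets ip_network mask off the host bits for us.
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def country_for(ip: str):
    """Look up a country code using only the first three octets."""
    return COUNTRY_BY_BLOCK.get(containing_slash24(ip))

if __name__ == "__main__":
    print(containing_slash24("203.0.113.77"), country_for("203.0.113.77"))
```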

------
mpetroff
The Wayback Machine has copies of the last CC BY-SA 4.0 version:

[https://web.archive.org/web/20191227182209/https://geolite.m...](https://web.archive.org/web/20191227182209/https://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz)

[https://web.archive.org/web/20191227182412/https://geolite.m...](https://web.archive.org/web/20191227182412/https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz)

[https://web.archive.org/web/20191227182527/https://geolite.m...](https://web.archive.org/web/20191227182527/https://geolite.maxmind.com/download/geoip/database/GeoLite2-ASN.tar.gz)

[https://web.archive.org/web/20191227182816/https://geolite.m...](https://web.archive.org/web/20191227182816/https://geolite.maxmind.com/download/geoip/database/GeoLite2-City-CSV.zip)

[https://web.archive.org/web/20191227183011/https://geolite.m...](https://web.archive.org/web/20191227183011/https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country-CSV.zip)

[https://web.archive.org/web/20191227183143/https://geolite.m...](https://web.archive.org/web/20191227183143/https://geolite.maxmind.com/download/geoip/database/GeoLite2-ASN-CSV.zip)

And the last copy of the download page before the download links were removed,
for reference:
[https://web.archive.org/web/20191222130401/https://dev.maxmi...](https://web.archive.org/web/20191222130401/https://dev.maxmind.com/geoip/geoip2/geolite2/)

~~~
Findus23
Thanks for linking, I intentionally saved all files to archive.org as I was
sure a lot of people would only notice when it is too late.

You can see all links with their hash here:

[https://forum.matomo.org/t/maxmind-is-changing-access-to-
fre...](https://forum.matomo.org/t/maxmind-is-changing-access-to-free-
geolite2-databases/35439/3)

------
evantahler
This is fascinating and I’m sure will have lots of far-reaching effects for
the courts to chew on... For example, do I “own” the IP address that my ISP
assigned me? Is it really my PII? If enough people ask to be removed, have I
harmed the ISP's property? What about dynamic IPs? I like that I now have the
power to opt-out... but will it last?

As a developer, thinking about this is a great mental exercise for switching
my thinking from “use public datasets” to “use user opt-in data” - what would
I have used IP->GEO info for before? Guessing a user’s language? I can use
Accept headers. Guessing a user’s real location? Better to use the web/mobile
GPS api and get explicit consent. I guess the internal maps we make from our
server logs will get less accurate... fine?
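
For the language case, a simplified sketch of ranking the tags in an `Accept-Language` header by their `q` weights. The function name `preferred_languages` is an assumption, and the parsing deliberately ignores edge cases such as extra parameters or wildcard semantics.

```python
def preferred_languages(header):
    """Return language tags from an Accept-Language value, best first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        weighted.append((tag.strip(), quality))
    # sorted() is stable, so equal weights keep the order the client sent.
    ranked = sorted(weighted, key=lambda item: item[1], reverse=True)
    return [tag for tag, quality in ranked if quality > 0]

if __name__ == "__main__":
    print(preferred_languages("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"))
```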

~~~
scurvy
In the truest sense of property law, only the pre-RIR allocations (aka legacy
allocations) are actually owned by companies. RIR allocations are assigned for
use by companies, but ranges are still owned by the regional registrars.

You're just using/renting something.

~~~
bestnameever
Does it matter if I'm renting something or not? I thought the issue was
surrounding privacy of users' data. Would disclosing that a user is renting an
ip address from a specific location be something that is prohibited?

------
anandchowdhary
OP here. My CI builds started breaking out of nowhere [1] because the public
download URL of the Geolite2 database started giving 404s.

I came upon the GitHub issue opened by MaxMind on the package I was using
[2], which recommended that every user create an account and download the
package, so I used Git LFS to manually add the package to the repo for now,
until I can come up with a better CI-driven solution, because one of the rules
is that you need to update the database as soon as a new one comes out, and
stop using the older version within 30 days of update, and you might need to
provide this in writing as well.
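
A CI job could guard against the 30-day rule with a check along these lines. This is a sketch only: the function name, the way the release date is obtained, and the exact boundary handling are assumptions, not a reading of the EULA that has been legally reviewed.

```python
from datetime import date, timedelta

# Assumed grace period, based on the 30-day rule described above.
GRACE = timedelta(days=30)

def old_copy_expired(newer_release: date, today: date) -> bool:
    """True once more than 30 days have passed since a newer release."""
    return today > newer_release + GRACE

if __name__ == "__main__":
    # Example dates are illustrative only.
    print(old_copy_expired(date(2019, 12, 30), date(2020, 2, 1)))
```

A build step could fail when this returns true, which forces a database refresh instead of silently shipping a stale copy.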

[1] [https://travis-ci.org/staart/api/builds/630988787](https://travis-ci.org/staart/api/builds/630988787)

[2] [https://github.com/runk/node-geolite2/issues/17](https://github.com/runk/node-geolite2/issues/17)

[3] [https://www.maxmind.com/en/geolite2/eula](https://www.maxmind.com/en/geolite2/eula)

~~~
2shortplanks
I'd like to apologize on behalf of MaxMind for breaking your CI build. Let me
assure you as a fellow programmer, I know how frustrating that can be! As you
saw we've tried to reach out to the larger open source projects using GeoLite2
in advance, but we've not had time to give as much notice as we would like.

The CCPA was finalized in October, and it contains a number of ambiguities,
particularly as it relates to our products. To arrive at our current
interpretation, it has taken many weeks of working with privacy lawyers,
following discussions in the privacy community, and observing actions other
companies have taken. We're sorry for the timing, appreciate the frustration
it causes, and wish we could have provided more notice.

With regard to the regular updating of the database automatically, I recommend
you take a look at our own tool for automatically downloading the latest
database: GeoIP Update
[https://dev.maxmind.com/geoip/geoipupdate/](https://dev.maxmind.com/geoip/geoipupdate/)

Hope that helps.

Mark Fowler MaxMind

------
tyingq
That's pretty interesting that an IP address is enough to trigger CCPA. I
thought it had to specifically tie to identity versus something broader like
zip codes and city names to be covered under CCPA.

I wonder if you would be okay geolocating just the first three octets.

~~~
throwaway95914
Any identifier that can be deemed to identify a person or household is
covered. This means IP address, tokens, uuid's, etc.

* IANAL

~~~
frenchyatwork
Are there no zip codes in rural areas that are specific to a household?

~~~
rubyfan
Zip5 codes are allocated by geography and population. See very large rural
tracts of land in AZ or NM that have a large zip5 code area. Zip-9 probably
could be associated with one household. Zips are actually a pain in the ass
for purposes other than mail delivery because they change from time to time,
are not cleanly allocated to geography or logical features on a map. They can
be discontinuous and overlapping. Many companies employ proprietary means of
assigning geographic identifiers that are not dependent on zip, population
size or anything other than geo.

[https://en.m.wikipedia.org/wiki/ZIP_Code](https://en.m.wikipedia.org/wiki/ZIP_Code)

~~~
tzs
To get a rough idea of how often a ZIP-9 narrows things down to a single
street address, I took a look at the sales tax rate and boundary files made
available for states that are in the Streamlined Sales Tax Agreement [1].

12 of those states, Arkansas, Georgia, Iowa, Kansas, Nebraska, North Carolina,
Ohio, Oklahoma, South Dakota, Tennessee, Vermont, and Washington, use address-
based tax rates. (The other 12 either just have one rate for the whole state,
or just go by ZIP).

For each ZIP-9 in the address files for the 12 address-based states, I found
the lowest and the highest street number for that ZIP-9. I then counted how
many of the ZIP-9s had the lowest street number the same as the highest street
number.

There were a total of 9,311,327 distinct ZIP-9 values.

2,415,305 of them had a low to high range that only included one number.

That's about 26% of the ZIP-9's having a unique street number. Note that this
does not necessarily mean a single household, because I'm not looking at the
full address. Apartment buildings, for instance, will in many cases show up in
that 26%.
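
The counting step described above can be sketched as follows. The input shape (an iterable of `(zip9, street_number)` pairs) and the function name are assumptions for illustration; the real rate and boundary files have their own layout, which is not reproduced here. The final lines only redo the arithmetic on the two totals reported above.

```python
def count_single_number_zip9(rows):
    """Count ZIP-9s whose lowest and highest street numbers are equal.

    `rows` is an iterable of (zip9, street_number) pairs (assumed shape).
    Returns (single_number_count, distinct_zip9_count).
    """
    ranges = {}
    for zip9, number in rows:
        low, high = ranges.get(zip9, (number, number))
        ranges[zip9] = (min(low, number), max(high, number))
    single = sum(1 for low, high in ranges.values() if low == high)
    return single, len(ranges)

# Totals reported in the comment above.
share = 2_415_305 / 9_311_327

if __name__ == "__main__":
    print(f"{share:.1%}")  # prints 25.9%
```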

[1] [https://www.streamlinedsalestax.org/Shared-Pages/rate-and-bo...](https://www.streamlinedsalestax.org/Shared-Pages/rate-and-boundary-files)

~~~
rubyfan
Excellent analysis. That’s like 2% or less of households in the US? So even at
the Zip9 level not ideal for identifying specific households but problematic
if you’re one of those 2M Zip9s.

------
hlieberman
This is deeply unfortunate. I know the folks at Maxmind fairly well, and
they're good people. I'm quite sure this isn't what they wanted to do, and
that they've pushed their lawyers to let them continue distributing this data
as much as is possible, in a way that is as gentle as possible. I applaud them
for their efforts; it is appreciated.

Unfortunately, it's also clear to me that this renders the Maxmind geoip
databases non-free. I've filed bugs for the removal of the geoipupdate package
from Debian main, and I believe the geoip-database maintainer has already
terminated updates.

------
kwoff
This might not be a common experience nowadays (sigh), but I encountered this
problem a few weeks ago in Perl. The long-used module at
[https://metacpan.org/pod/Geo::IP](https://metacpan.org/pod/Geo::IP) now
mentions "the GeoIP Legacy file based database" and has a link to
[http://dev.maxmind.com/geoip/geolite](http://dev.maxmind.com/geoip/geolite)
which actually redirects to
[https://dev.maxmind.com/geoip/geoip2/geolite2/](https://dev.maxmind.com/geoip/geoip2/geolite2/)
. Interestingly, now that I look there again, it no longer refers (at the top
of the page) to the January 2019 deprecation of the version 1 database, but
now to "making significant changes to how you access free GeoLite2 databases
starting December 30, 2019". (I now have to also review that for my
workplace...)

Anyway, the new CPAN modules are under
[https://metacpan.org/pod/GeoIP2](https://metacpan.org/pod/GeoIP2) and can be
confusing. One confusing thing: despite a prominent Perl developer such as
Dave Rolsky working (still?) at MaxMind, it says that the "module is
deprecated and will only receive fixes for major bugs and security
vulnerabilities".

For porting purposes, please note that the module you want to use is
[https://metacpan.org/pod/GeoIP2::Database::Reader](https://metacpan.org/pod/GeoIP2::Database::Reader)
. (They also mention
[https://metacpan.org/pod/MaxMind::DB::Reader](https://metacpan.org/pod/MaxMind::DB::Reader)
which can make things unclear...) I didn't see any other info on the github
page about Perl support. Also the interface is definitely more tedious and
object-oriented in the worst way. I'm sure there are good reasons... And in my
experience it's much slower if you use the pure-Perl interface (this is
something you should read the MaxMind::DB::Reader perldoc for: it mentions
installing a C library in "PURE PERL VERSUS XS").

Good luck

------
alias_neo
It's a shame this only made it into here after the restriction was already in
place.

Out of interest, does anyone know how long a historic copy might be considered
valid? How regularly do people update their "copy" generally?

~~~
d1str0
Someone from Maxmind has responded a few times in other places about why they
are giving such short notice.

I think it boils down to the law just recently being firmed up and it becoming
effective at the new year.

I also believe that you have to download a new version every 30 days to stay
compliant with the license.

------
unilynx
This provision in the EULA might be troublesome when dealing with backups:

 _You shall cease use of and destroy (i) any old versions of the Services
within thirty (30) days following the release of the updated GeoLite2
Databases_

I'm sure I can get customers to otherwise agree with the GeoLite2 terms when I
install/get them to download the database, but how to get the GeoLite2
database out of their three-months-retained VM snapshots ? I can ensure we
'cease use' of a restored database, but ensuring destruction is a problem...

(The GDPR explicitly acknowledges this problem and allows you to keep data in
your backups if it's too burdensome to remove it as long as you've taken
measures to reapply data deletions on restore)

------
bullen
The biggest problem with the new Maxmind data is that it requires many MB of
dependencies:
[https://github.com/maxmind/MaxMind-DB-Reader-java/issues/48](https://github.com/maxmind/MaxMind-DB-Reader-java/issues/48)

------
WGeorge480
I do not work for any company that uses GeoLite2 but I do have a gaming server
that I want to display the country of a player to the staff team. I signed up
3 days ago but no email received as of now. Should I contact maxmind and will
I be able to get a license for such an intended use?

------
justinclift
Seems to _only_ cope with fixed IP addresses, which only some places use.

For example, my IP address (not in the US) changes every ~18 hours or so and can
come up in at least 30 different IP ranges that I've so far seen. (Used to
make writing out SSH rules for remote access a pita, until automating it)

------
foota
How is IP to broad location even PII?

~~~
SanchoPanda
It's not all that broad in many circumstances. You get a zip code, a
latitude/longitude and a specific degree of error.

~~~
jeltz
Last time I used Maxmind the coordinates generally just seemed to be the
coordinates of the city. And apparently my current IP is in the center of
another city in roughly the same part of the country. Beyond the country and
possibly the ISP I wouldn't trust Maxmind's data much.

