He makes it sound as though this isn't particularly difficult or expensive. Does anyone want to speculate on the method he is using to compile this data? For his zipcode database he seems to be running mass queries on the Google Maps API.
Simply doing mass WHOIS queries to the regional registrars only gets you data about the corporate headquarters of the ISPs, which I suppose could be enough depending on the level of granularity that is acceptable to you.
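For what it's worth, the raw mechanics of a mass WHOIS run are simple: WHOIS (RFC 3912) is just a plain-text query over TCP port 43. Here's a rough Python sketch; the server choice and the key/value parsing are my own assumptions (real responses vary a lot between registrars), not anything the author has confirmed:

```python
import socket

def whois_query(ip, server="whois.arin.net", port=43, timeout=10):
    """Send a raw WHOIS query (RFC 3912): write the query plus CRLF,
    then read until the server closes the connection."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((ip + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def parse_whois_fields(response):
    """Naively pick out 'Key: value' pairs, skipping comment lines.
    Real registrar output is messier than this; first value wins."""
    fields = {}
    for line in response.splitlines():
        if ":" in line and not line.startswith(("%", "#")):
            key, _, value = line.partition(":")
            fields.setdefault(key.strip(), value.strip())
    return fields
```

Even if you ran this over every allocated block, you'd mostly get registrant addresses, which is exactly the headquarters-granularity problem mentioned above.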
(presently a similar question on his forum, posted by someone else, is unanswered: http://blogama.org/node/105)
I haven't verified that claim.
wc -l ip_group_city.csv
wc -l GeoLiteCity-Location.csv
This is also the frequency at which Maxmind releases updates. So...
Anyway, I'd still like to know what those sources are.
I'm guessing that you're merely pretending not to understand what's being implied, but I don't understand why...
Akamai has a (much more expensive) geolocation product that's based on doing triangulation, since they have a lot more information on routing (all those edge servers) but I'm not sure it's appreciably more accurate.
My file is updated once a day and is basically one giant PHP array. It works quite well for parsing through logs, and I've used it on many of my high-demand sites with some caching.
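The lookup itself is basically a binary search over sorted IP ranges. Here's a sketch in Python rather than PHP, with a made-up three-column layout (range start, range end, city) standing in for whatever the real file contains:

```python
import bisect
import ipaddress

# Hypothetical rows: (start, end, city) with IPs stored as integers,
# sorted by range start -- mirroring a CSV like ip_group_city.csv.
RANGES = [
    (int(ipaddress.IPv4Address("8.8.8.0")),
     int(ipaddress.IPv4Address("8.8.8.255")), "Mountain View"),
    (int(ipaddress.IPv4Address("81.2.69.0")),
     int(ipaddress.IPv4Address("81.2.69.255")), "London"),
]
STARTS = [row[0] for row in RANGES]

def lookup_city(ip):
    """Find the range containing `ip` via binary search on the starts,
    then confirm the candidate range actually covers it."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```

With the whole table in memory this is O(log n) per lookup, which is why a flat array plus caching holds up fine even on busy sites.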
Is anyone interested in taking this data and turning it into a legit open source project? Let me know and I'll donate the domain name :)
You'd have to contact the author and make sure the data has an open license, but my guess is that he's OK with it.
Be nice to have a better idea of why this is more accurate than say MaxMind GeoIP, how much more accurate it is, where the data comes from etc.
I haven't seen anything that is all that accurate beyond that. Most geolocation (for the UK, anyway) pegs people at the ISP's headquarters rather than their actual location, which isn't really very useful.
"How accurate is the data?
Very accurate. The database is updated during the first week of each month."
To me this is not acceptable. Firstly, I want numbers, not a vague assurance. Secondly, I'm dubious that a brand-new service in this space is going to be what I would consider very accurate.
This isn't to say that the service is without merit just that I want a little more info and a little less marketing hyperbole.
The claim is 99.5% accuracy. Whether that is reproducible and whether it holds outside the US is unknown.
I spent a lot of time a few years back looking for something like this, and the existing solutions were just terrible. Like the author says, there just wasn't a usable dataset out there. Mostly there were a bunch of bad web APIs wanting to charge money for wildly inaccurate data.
So yeah, this might still be inaccurate, but it's a huge step forward.