Hacker News | reincoder's comments

I work for IPinfo. We provide a free country and ASN database on a free tier with an unlimited amount of requests. You can download the entire database or use the API services. For country and ASN, it is free.

However, we do not offer city-level data for free.

> How would one even go about verifying it?

We believe we are the most accurate IP data provider out there, but you should come to that conclusion yourself.

I can tell you why our data is super accurate compared to the rest of the industry. The industry as a whole uses self-reported information offered by ASNs and ISPs, called a "geofeed". The issue with geofeeds is that IP geolocation providers do not tend to verify their accuracy. Many providers just aggregate these public records and repeat whatever the ISPs and ASNs tell them. This is quite a bad practice.

So we built a network of distributed servers (currently 1360 servers across 160 countries) that run ping, traceroute, and other internet measurements to infer the location of IP addresses. This means that when you ask how you can know we are accurate, we can share our active measurement data as the evidence.

Now comes the question of how you can assess accuracy yourself.

First, if you have access to a large pool of known locations of IP addresses, you can run comparisons across different vendors. You need a GPS-backed device to locate IP addresses.
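
A minimal sketch of such a comparison, assuming you hold ground-truth coordinates per IP and each vendor's answer as a (lat, lon) pair. The function names, the 50 km radius, and the sample data are all hypothetical, chosen only for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_within(ground_truth, vendor, radius_km=50.0):
    """Fraction of ground-truth IPs the vendor places within radius_km.

    IPs the vendor has no answer for count as misses.
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    hits = 0
    for ip, (lat, lon) in ground_truth.items():
        guess = vendor.get(ip)
        if guess is not None and haversine_km(lat, lon, *guess) <= radius_km:
            hits += 1
    return hits / len(ground_truth)

# Made-up example data (documentation IP ranges, not real measurements).
truth = {"192.0.2.1": (51.5074, -0.1278), "198.51.100.7": (48.8566, 2.3522)}
vendor_a = {"192.0.2.1": (51.51, -0.13), "198.51.100.7": (52.52, 13.405)}
print(accuracy_within(truth, vendor_a))
```

Running the same function over each vendor's answers for the same IP pool gives directly comparable numbers.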

If you do not have a large pool of well-known location IPs, you can take a sample of IP addresses and check them yourself across multiple vendors. You can then use a tool like ping.sx or our own tool ipinfo.io/probenet/live to see evidence of where these IP addresses are located based on latency.
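
One way to read latency evidence is as a physical upper bound on distance. The sketch below is a generic speed-of-light check, not IPinfo's method; the 200 km/ms figure is the usual approximation for light in fiber (about two thirds of c), and the function names are hypothetical:

```python
# Light in optical fiber covers roughly 200 km per millisecond (about 2/3 of c).
FIBER_KM_PER_MS = 200.0

def max_distance_km(rtt_ms):
    """Upper bound on probe-to-target distance implied by a round-trip time."""
    if rtt_ms < 0:
        raise ValueError("rtt_ms must be non-negative")
    # Halve the RTT to get one-way time, then convert to distance.
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS

def claim_is_feasible(rtt_ms, claimed_distance_km):
    """True if a target claimed_distance_km from the probe could answer in rtt_ms."""
    return claimed_distance_km <= max_distance_km(rtt_ms)

# A 4 ms RTT bounds the target to within 400 km of the probe,
# so a claimed location 10,000 km away cannot be right.
print(max_distance_km(4))
print(claim_is_feasible(4, 10_000))
```

The bound only works in one direction: a low RTT rules out distant locations, but a high RTT does not prove the target is far away, since routing detours and queuing add delay.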

Do not bet on consensuses among IP geolocation providers; run your own tests.

Our data was evaluated by peer-reviewed academic research. You can take a look at that as well, if you want.

> I am not really using this data for anything other than have enough data to troubleshoot customer support/fraud.

Now, I will be honest... you should not pay us anything. From the way you have described your issue, it seems the free services we already offer should satisfy your need.

Do you really need large scale IP address enrichment of all the IP addresses that visit your website? If yes, then for the first layer use our free data that provides ASN and country information.

Then, when you need troubleshooting with your customers, you can look up those individual IP addresses for free on our website, where we provide all our data for free access.

---

Let me know if you need any help, always happy to answer questions.


So many of my open questions answered in one answer. Thank you.

A follow up based on new information - if 'geofeed' identifies something with wrong geo location, and your method detects different geolocation, what do I see as the consumer consuming your API? I am assuming the inferred data, but that also feels counter-intuitive (since the data does not align with what ASN/ISP are reporting).

How often does your active measurement data disagree with geofeed data?

How do you handle mobile/cellular IPs?

> Do you really need large scale IP address enrichment of all the IP addresses that visit your website? If yes, then for the first layer use our free data that provides ASN and country information.

If I am troubleshooting a support case that is days/weeks/months old, wouldn't this mean that enriching this information at a later date may give me different data than what it was associated with at the time the requests were made? My understanding was that IPs get re-assigned.

How frequently do IP-to-location mappings change in practice?

Do you offer historical IP data snapshots?


> I am assuming the inferred data, but that also feels counter-intuitive (since the data does not align with what ASN/ISP are reporting).

That is a very good question. Geofeed does not have a verification system. Active measurement is what we use to verify what the ASN or ISP itself reports.

Even active measurement has its own limitations. In those cases where we see active measurements not producing reliable data, we reach out to ISPs and ASNs to purchase a server in their facility. Geofeed as a system is voluntary, and most major ISPs do not actually maintain or even publish one. For example, today I found out a major UK-based telecom geolocated 500k IP addresses in a town with 200k people. ISPs are not inherently incentivized to maintain the accuracy of their self-reported, voluntarily published location data. So, we do proactive outreach to purchase a server from them so we can provide consistently accurate data for their IP addresses.

On the matter of advertised locations not matching actual location, I highly recommend reading this: https://ipinfo.io/blog/vpn-location-mismatch-report

For residential ISPs, we do a lot of outreach and open communication to build a good partnership with them. The goal is that we pay for the privilege to report accurate data for them.

> How often does your active measurement data disagree with geofeed data?

Very frequently.

Here is a summary of the peer-reviewed research paper on this matter: https://community.ipinfo.io/t/ip-geolocation-and-geofeeds-wh...

Active Measurement (1,330 probes, 27.7M RTTs):

  - Country-level: 92.0% accurate → 8% wrong country
  - City-level: 79.6% accurate → 20.4% wrong city

Mobile Device GPS (169 devices, 24 countries):

  - Country-level: 84.5% accurate
  - City-level: 29.9% accurate → 70% wrong city

> How do you handle mobile/cellular IPs

Primarily through active measurement. We are also running a lot of research around more reliable mobile geolocation data.

Because our data is updated daily, I think the refresh rate gives us an accuracy advantage.

> If I am troubleshooting a support case that is days/weeks/months old, wouldn't this mean that enriching this information at a later date may give me different data than what it was associated with at the time the requests were made? My understanding was that IPs get re-assigned.

You will be surprised to know that historical IP location does not have much demand.

If you are evaluating a support case after some time, you should work with your current data. If the customer raises a question, you address this in real time with their current IP address.

Usually, I do not recommend storing historic IP geolocation information. In most operations, the enrichment happens in real time within the day. Unless you want to do periodic reporting of some sort.

Internally, we of course have the data, but because our IP geolocation is so accurate, it currently sits at around 700 MB. If you add a historical layer to that data, it will be a terabyte of data. There is not much consumer need for it.

> How frequently do IP-to-location mappings change in practice?

https://ipinfo.io/blog/how-many-ips-change-geolocation-over-...

At the city level, it is 1.3% each day and 16% each month.
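
A back-of-the-envelope check on how these two figures relate. The 30-day month and the assumption that each day's changes hit an independent random set of IPs are illustrative assumptions, not claims from the linked post:

```python
daily_change = 0.013      # city-level daily change rate quoted above
monthly_reported = 0.16   # city-level monthly change rate quoted above
days_in_month = 30        # assumption for this sketch

def compounded_change(daily_rate, days):
    """Share of IPs that would change at least once in `days` days,
    if every day's changes were independent of the previous days'."""
    return 1.0 - (1.0 - daily_rate) ** days

independent = compounded_change(daily_change, days_in_month)
print(f"monthly change if daily churn were independent: {independent:.1%}")
print(f"monthly change as reported: {monthly_reported:.1%}")
```

Under the independence assumption the monthly figure would come out at roughly twice the reported 16%. One possible reading is that much of the daily churn comes from the same IPs moving repeatedly, but that is an inference from the arithmetic, not something the post states.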

> Do you offer historical IP data snapshots?

I highly recommend that you work with current day's data.

In cases where we provide historical data, it is usually for academic research.

---

Let me know if you have any more questions.


> On the matter of advertised locations not matching actual location, I highly recommend reading this: https://ipinfo.io/blog/vpn-location-mismatch-report

Good read

Do you happen to know if anyone is compiling all of this data about VPNs into one place? It would be super interesting to know which VPNs are providing genuine services vs masquerading the locations. Maybe even an SEO for you.

> I highly recommend that you work with current day's data.

Just to clarify: You are suggesting that we don't pro-actively enrich every IP address, store IPs, and only enrich them when troubleshooting something?


> Do you happen to know if anyone is compiling all of this data about VPNs into one place? It would be super interesting to know which VPNs are providing genuine services vs masquerading the locations. Maybe even an SEO for you.

We made that report independently and, according to our analysis, we identified only three VPNs that do not have virtual VPN server locations: Windscribe, Mullvad, and iVPN.

> Just to clarify: You are suggesting that we don't pro-actively enrich every IP address, store IPs, and only enrich them when troubleshooting something?

I think you should experiment with this yourself a little. The Lite API is completely free, so you can try both enrichment at ingestion and enrichment after the fact. See what works best for you.
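
A rough sketch of the two approaches being compared, enriching when the request is ingested versus enriching later on demand. Everything here is hypothetical: `fake_lookup` stands in for whatever IP data source you use, and the field names are made up for the example:

```python
from datetime import datetime, timezone

def enrich_at_ingest(event, lookup, now=None):
    """Attach the lookup result as of ingestion time, plus a timestamp."""
    now = now or datetime.now(timezone.utc)
    return {**event, "geo": lookup(event["ip"]), "geo_as_of": now.isoformat()}

def enrich_on_demand(event, lookup):
    """Leave the stored event untouched; look the IP up only when needed.

    The answer reflects the data at lookup time, not at request time.
    """
    return lookup(event["ip"])

def fake_lookup(ip):
    # Hypothetical stand-in for a real country/ASN lookup.
    return {"country_code": "GB", "asn": "AS64500"}

event = {"ip": "192.0.2.1", "path": "/login"}
print(enrich_at_ingest(event, fake_lookup))
print(enrich_on_demand(event, fake_lookup))
```

Storing `geo_as_of` alongside the result makes it explicit which day's data a stored record reflects, which matters if you compare it with a fresh lookup later.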


Did a quick dive to explore viability of migrating to ipinfo. My idea was: use lite version for enriching everything and then use pay-as-you-go for enriching authenticated user sessions.

I couldn't get /lite/ to work. In a sample of IPs I've tried with, multiple are returning 404. Your website for the same IPs is returning information. Looks like these are just not included in the lite dataset?

Turns out there is no pay-as-you-go tier. Subscription is the only option. Not a deal breaker, but a disappointing setup.


> I couldn't get /lite/ to work.

Email me: abdullah@ipinfo.io

I think there is an issue with setting up our API.


Just to close the (public) loop, the issue was that we were using the wrong API endpoint: ipinfo.io/lite instead of api.ipinfo.io/lite.
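
For anyone hitting the same 404, a small sketch of building the lookup URL against the correct host. Only the `api.ipinfo.io/lite` host and path come from this thread; the `https` scheme, the `/{ip}` path layout, and the `token` query parameter are assumptions to verify against IPinfo's documentation, and `build_lite_url` is a hypothetical helper:

```python
import ipaddress
from urllib.parse import urlencode

# Host and path as stated in the comment above; scheme is an assumption.
LITE_BASE = "https://api.ipinfo.io/lite"

def build_lite_url(ip, token):
    """Build a Lite lookup URL, rejecting malformed IPs before any request is made.

    Raises ValueError if `ip` is not a valid IPv4 or IPv6 address.
    """
    addr = ipaddress.ip_address(ip)  # validates and normalizes the address
    # The `token` query parameter name is an assumption, not confirmed here.
    return f"{LITE_BASE}/{addr}?{urlencode({'token': token})}"

print(build_lite_url("8.8.8.8", "YOUR_TOKEN"))
```

Validating the address locally helps separate a genuinely malformed IP from a wrong-endpoint problem like the one described above.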

Thank you Abdullah


I tried using the /lite/ endpoint to get country data, but it is giving 404 errors for valid IPs.

{ "status": 404, "error": { "title": "Wrong ip", "message": "Please provide a valid IP address" } }


I work for IPinfo. The accuracy you see actually comes from inferred data. Our IP address location should not perfectly pinpoint anyone, unless that IP address is a data center of some sort. The highest accuracy for a non-data center IP address is usually at the ZIP code level. In terms of carrier IP addresses, currently we do one data update per day. If we did more, I guess the accuracy of mobile IP addresses would improve, but on an overall scale, the improvement would be quite minuscule.

Our country-level data (which is free) is 10-15 times larger than the free/paid country-level data out there. We constantly hear that the size of the database is an issue. The size is a consequence of accuracy in the first place. So, it is a balancing act.


> Our IP address location should not perfectly pinpoint anyone, unless that IP address is a data center of some sort.

By perfectly, I meant it got my city and zip correct, but I looked up the lat/lng and it's a 5 min drive away. So pretty dang close!

Not sure how you got it that close if it's only supposed to point to the nearest data center.


I work for IPinfo. Has our data been inconsistent for you? We actually invest heavily and continuously in data accuracy. I think for hosting IP addresses we are nearing the highest level of accuracy possible, especially with data center addresses. We are investing in novel, cutting-edge research for carrier IP geolocation.

I am curious about your experience with us so far.


I work for IPinfo. We track close to a hundred resproxy providers. So, if OP's router is compromised, the device IPs will likely be flagged.

From what I know, whenever a router is backdoored or a resproxy SDK gains access to a device to use their bandwidth, the access to that pool of devices is often shared among multiple resproxy vendors. Many resproxy vendors do not have their own SDKs for their services.

Also, as far as I know, not many resproxy operators manage their sim farms or hardware pools. It is mostly based on compromised devices or SDK access.


This is called a geofeed. Companies that own or operate IP addresses can customarily share the location of those IP addresses. This is less "IP-based Geolocation" and more "Geolocation of IP addresses".


Thank you, Dimitry. Everyone at IPinfo really appreciates the shoutout!

---

Our research scientist, Calvin, will be giving a talk at NANOG96 on Monday that delves into active measurement-based IP geolocation.

https://nanog.org/events/nanog-96/content/5678/


I work for IPinfo. We are launching a collaborative project with IXPs and major internet organizations to share raw measurement for routing and peering data for this purpose.

Latency variability is a huge issue. We collect both traceroute and ping data, and we observe that there are entire countries that peer with IXPs thousands of miles away on a different continent.

We bought a server from the oldest telecom company in the country and recently activated it. Currently, there is a 20 ms latency when traffic is directed towards the second oldest telecom. The packets have to travel outside the country before coming back in. This is a common phenomenon that occurs frequently. So, we usually have multiple servers in major cities since various ASNs have different peering policies.

Because we can map those behaviors, and because we have algorithms and other data sources, we can make measurement-based geolocation perform well.

We are hoping to support IXPs, internet governance agencies, and major telecoms in identifying these issues and resolving them.


What is your path towards 'resolving' these issues?

I've done some mapping while comparing TURN servers my org hosted on cloud VMs vs a commercial offering, and it's pretty easy to find very different routing from point A to point B, but sometimes it's pretty clear that not every transit network has access to every submarine cable, so traffic from say Brazil to South Africa might go from Brazil directly to Africa, or it might go to Florida, then Europe, then Africa. It'd be nice to take a more direct route, but maybe the Brazil -> Africa hop doesn't transit all the way, so BGP prefers the scenic route as it has a shorter AS path.

I didn't have any leverage to motivate routing changes though, so other than saying hmm, that's interesting, there wasn't much to do about it.
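
A toy illustration of the "scenic route" effect described above. This is a deliberate simplification: real BGP best-path selection weighs local preference and other attributes before AS path length, and latency is not an input at all. The AS numbers (from the documentation range) and RTT values are made up:

```python
# Two hypothetical routes from Brazil to South Africa.
routes = [
    {"via": "direct Brazil -> Africa", "as_path": [64500, 64501, 64502, 64503], "rtt_ms": 110},
    {"via": "Florida -> Europe -> Africa", "as_path": [64500, 64510, 64503], "rtt_ms": 290},
]

def bgp_like_choice(candidates):
    """Pick the route with the shortest AS path (all other attributes equal)."""
    return min(candidates, key=lambda r: len(r["as_path"]))

def lowest_latency_choice(candidates):
    """Pick the route a latency-aware selector would prefer."""
    return min(candidates, key=lambda r: r["rtt_ms"])

print("shortest AS path:", bgp_like_choice(routes)["via"])
print("lowest latency:  ", lowest_latency_choice(routes)["via"])
```

With these made-up numbers the two selectors disagree, which is the situation the comment describes: the path with fewer AS hops wins even though it is geographically longer.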


From our data side, we focus on network diversity and conduct continuous measurements. Due to the nature of our measurements and our knowledge of the precise locations of all 1330 servers, we understand how network packets travel across the internet. We distill this information into algorithms and know how to accommodate detours that packets may take. There are specific patterns that we can identify and map; for example, some African servers route their traffic through LINX or a French IXP. If you are not connecting to private networks or even major telecoms on EU-based IXPs.

To help the system, we are reaching out to IXPs, major telecoms, and peering agencies to advise them on how to peer and make critical internet routing decisions. We want to advise them on how to engage in data-focused peering, how their IXP is perceived from a broader internet data perspective, and how their packets from the IXP travel across the internet. We hope this collaboration will bring much-needed efficiency to internet routing.


So, there is a dashboard internally for that. When we do ProbeNet PoP assessment, we have a high-level overview of the frequent and favored connections. We have a ton of servers in Africa, and there is a strong routing bias towards France, Germany, and the UK instead of neighboring connections.

Everyone in our engineering and leadership is very close with various CDN companies. We do echo this idea to them. It is not IP geolocation; we actually have a ton of routing data they can use.


Some of our (IPinfo) services are hosted on GCP, and because our service is widely used (with 2 trillion requests processed in 2024) people sometimes say they cannot access our service. It is usually due to how Google's device-based IP geolocation is used. The user's IP address is often mistakenly identified as being located in a country where Google does not offer service.

I have seen a Europe-based cloud hosting provider's IP ranges located in countries where Google does not provide service. This is because these IP ranges are used as exit nodes by VPN users in that country.

Device-based IP geolocation is strange. We prefer IP geolocation based on the last node's IP geolocation. We hope to collaborate with Google, Azure, and other big tech on this if they reach out to us.


Yeah. This can be a problem.

Because the algorithm is so sensitive and the result can be altered with only a few devices behind the IP (at least for Google), device-based IP geolocation could theoretically be used to steer or trick big tech companies into believing that the IP is at a location where it is not, just as the VPN providers in your article do by publishing "bogon" geofeeds. This defeats the purpose of doing this in the first place: geolocking and regulatory requirements.

The "tech" is already there: browser extensions [1] that override the JS Geolocation API to show "fake" locations to the website (designed for privacy purposes). Dongles are also available on the gray market that can be attached to iPhone / Android devices to alter the geolocation API result by pretending to be some kind of higher-precision GPS device while instead providing bogon data to the OS. And after jailbreaking / rooting your device, you can provide whatever geolocation you like to the apps.

[1] https://github.com/chatziko/location-guard


That is really interesting. I wonder if we have any internal data on this. I will check.

We are trying to work with ISPs everywhere, so if port level geolocation of the IP address is common, we surely need to account for that. I will flag this to the data team. To get the ball rolling, I would love to talk to an ISP operator who operates like this. If you know someone please kindly introduce me to them.

