The focus is identifying the undisclosed use of virtual locations by some popular VPN services. But that's not why I've submitted it to HN.
I'd appreciate comment and criticism about the criterion that I've used for virtual locations. That is, where apparent signal velocity is greater than the speed of light, I conclude that the disclosed server location is implausible. And so it must be a virtual location.
I calculate apparent signal velocity by dividing twice the calculated great circle distance between server and ping probe, by the minimum round-trip time (minrtt). For server locations, I rely primarily on whatever the VPN service has disclosed.
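As a concrete sketch of that criterion (my own illustrative code and function names, with great-circle distance via the haversine formula):

```python
import math

C_KM_S = 299_792.458  # speed of light in vacuum, km/s

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in km (Earth radius ~6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def apparent_velocity_km_s(distance_km, minrtt_ms):
    """Apparent signal velocity: twice the great-circle distance over minrtt."""
    return 2 * distance_km / (minrtt_ms / 1000.0)

def location_implausible(distance_km, minrtt_ms):
    """True when the claimed location would require superluminal signaling."""
    return apparent_velocity_km_s(distance_km, minrtt_ms) > C_KM_S
```

For example, New York to London is roughly 5,570 km great-circle, so a 10 msec minrtt would imply over a million km/s, far above c, and the claimed location gets flagged; 80 msec is entirely plausible.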
I'd also appreciate comment and criticism about my explanation of how VPN services achieve virtual locations ("How Virtual Server Locations Are Possible"). Basically that they exploit discrepancies between location information that's associated with IP addresses, and BGP advertisements based on address announcements.
Finally, for 195 servers that are apparently located near Nuland, I found an interesting pattern in the ping data.
For ping probes in Europe, minrtt is 1-30 msec. Then there’s a gap in observed minrtt across the Atlantic, with apparently much faster transmission. I’m guessing that's because trans-Atlantic cables don’t have many routers. Next there are groups for eastern and western North America, with faster transmission than within Europe, but slower than across the Atlantic. That probably also reflects router density. Last, there's a jump to Asia and Australia.
Does that make sense?
The core issue in the article is one of (mis)representation. The VPN providers are trying to present broad network coverage with a minimum of expenditure, while the clients are concerned with actual geophysical location. "Virtual locations" seems to be a term designed to commingle the two concepts, and to obfuscate actual location and connectivity.
The clustering is because there simply aren't that many physical locations with connectivity to various networks (Internet Exchange Points, "IXPs"). Those IXPs are themselves often clustered and associated with a few nearby physical facilities (data centers or colocation sites) for hosting general compute servers.
You mention companies "leasing" address blocks. It's probably more correct to say they're "delegated" from NSPs to clients like the VPN companies.
In general I'd find whois/RIR data to be of very little use. There's little to no association or restriction between those records and where the IP addresses are actually announced or used.
It may not matter for your use/methodology, but latency for ICMP is unreliable. Many networks will use a lower traffic priority for it, introducing variable loss/jitter/delay. If at all possible, actual TCP or UDP is better for evaluating RTT.
One thing you might look at is the VPN provider's network adjacencies in different locations. Many/most NSPs handily include some sort of location identifier (like an IATA airport code) in their device names. From those adjacent devices you can probably infer the city, at least. PeeringDB and/or looking glass portals are another way to deduce network topology and connectivity, possibly narrowing it down to specific facilities.
Lastly, your ICMP replies are inherently limited in what you see. It's very likely that there are underlying networks (i.e., MPLS or GRE tunnels) that you can't see. It'd be very easy to have an interconnect at AMS-IX and tunnel it over to Belgium or Germany. You wouldn't see that in DNS or ICMP replies, but could infer it through latency.
It seems so. I just used it because others have. But yes, technically they're PoPs.
Thanks for the clarification about clustering, and saying "delegated" vs "leased". Still, VPNs pay for address delegation, I would guess.
That's a good point about ICMP being unreliable. I'll need to check whether any of the ping services allow TCP or UDP. But yes, ICMP being slow just reduces the chance of finding superluminal signal velocity.
> Many/most NSPs handily include some sort of location identifier (like iata code) in their device names.
I'm not sure what that means. Where would I get that information?
> PeeringDB and/or looking glass portals are another way to deduce network topology and connectivity, possibly narrowing it down to specific facilities.
I'm not familiar with that, either. Although I at least sort of know what looking glass portals are. So hey, something to learn, clearly.
I did find quite a few minrtts under 1 msec, suggesting that the VPN server and ping probe are very close, perhaps hosted in the same facility. But I didn't know where to look for relevant information.
> Lastly, your ICMP replies are inherently limited in what you see. It's very likely that there are underlying networks (i.e., MPLS or GRE tunnels) that you can't see. It'd be very easy to have an interconnect at AMS-IX and tunnel it over to Belgium or Germany. You wouldn't see that in DNS or ICMP replies, but could infer it through latency.
I'm not sure what this means. Wouldn't adding a tunnel just increase latency? Or rather, is there any way that using "underlying networks (ie MPLS, or GRE tunnels)" could obscure the use of PoPs?
Depends on how you look at it. I haven't been in the NSP industry for a decade, but usually customers would get a /24 or /22 of the NSP's allocation for free as part of the actual network service cost. Once you're bigger than that, the customer (e.g., the VPN provider) should be getting allocations/assignments directly. https://www.arin.net/resources/registry/reassignments/#alloc...
> I'll need to check whether any of the ping services allow TCP or UDP. But yes, ICMP being slow just reduces the chance of finding superluminal signal velocity.
For your methodology it's probably fine. But you can see interesting artifacts in advanced networks. Like ICMP might be terminated (replied to) early at the network edge, where actual TCP session setup happens deeper inside, and then layer 7 (HTTP) is tunneled even further. So ping (ICMP), traceroute (UDP), and Wireshark (TCP/HTTP) would show different latencies, by a few ms.
>> Many/most NSPs handily include some sort of location identifier (like an IATA airport code) in their device names.
> I'm not sure what that means. Where would I get that information?
The DNS records (PTRs) on the 'point to point' interconnects are where I would look. Both sides are probably /32s owned by the NSP. For general sanity and ease of use, they populate those PTRs with operational data. Try https://www.caida.org/publications/papers/2019/learning_rege... or http://www.caida.org/~mjl/pubs/rnc.pdf
The ICMP TTL-exceeded replies from traceroute (hopefully) include those device or port addresses. And a quick DNS lookup will give you the PTR/A record. From here in Australia to eu1.vyprvpn.com I pass through:
> 4 be10-3999.core1.vdc01.syd.aussiebb.net (220.127.116.11) 7.169 ms 5.369 ms 6.191 ms
> 5 be1.bdr1.coresite-sv1.sjc.aussiebb.net (18.104.22.168) 164.582 ms 161.045 ms 153.770 ms
This is a single layer 3 link from Sydney Australia (SYD) to CoreSite SV1 in San Jose (SJC) California.
> 6 ce-0-17-0-0.r01.snjsca04.us.bb.gin.ntt.net (22.214.171.124) 153.749 ms 153.840 ms 157.461 ms
Where it's handed off to an NTT backbone device; probably site #4 in San Jose, CA (snjsca04), router #1 (r01), interface ce-0-17-0-0.
> 7 sjo-b21-link.telia.net (126.96.36.199) 157.553 ms 164.782 ms 166.938 ms
> 8 nyk-bb2-link.telia.net (188.8.131.52) 892.775 ms *
> 9 ldn-bb3-link.telia.net (184.108.40.206) 528.821 ms 427.296 ms 405.795 ms
> 10 adm-bb3-link.telia.net (220.127.116.11) 311.039 ms
And I don't know Telia naming offhand, but this looks like San Jose border router #21 (sjo-b21), across their backbone via New York (and/or Newark?) router #2 (nyk-bb2), London, and Amsterdam.
> 11 adm-b2-link.telia.net (18.104.22.168) 308.913 ms
> 12 ic-311014-adm-b2.c.telia.net (22.214.171.124) 307.252 ms 410.241 ms 409.498 ms
And here we see a single Amsterdam border device or site (adm-b2) with what looks like two different interfaces. I'm guessing #12 has a circuit identifier or similar, with the actual end customer on the other side.
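To show what that inference looks like in practice, here's a rough sketch of pulling location hints out of router PTR names like the ones in the traceroute above. The hop names are from the trace; the parsing heuristic and the lookup table are my own guesses, since every NSP has its own naming scheme:

```python
import re

# Token -> city, covering codes seen in the trace above (IATA-style or
# provider-specific abbreviations). Real tables would be built per-NSP.
LOCATION_HINTS = {
    "syd": "Sydney",
    "sjc": "San Jose",
    "sjo": "San Jose",
    "snjsca": "San Jose",
    "nyk": "New York",
    "ldn": "London",
    "adm": "Amsterdam",
}

def location_guess(ptr_name):
    """Return the first recognized location token in a router hostname."""
    for token in re.split(r"[.\-]", ptr_name.lower()):
        # Strip trailing digits (e.g. 'snjsca04' -> 'snjsca').
        bare = token.rstrip("0123456789")
        if bare in LOCATION_HINTS:
            return LOCATION_HINTS[bare]
    return None
```

So `location_guess("nyk-bb2-link.telia.net")` gives "New York", and `location_guess("be1.bdr1.coresite-sv1.sjc.aussiebb.net")` gives "San Jose". The CAIDA papers linked above describe learning these naming conventions systematically rather than hand-coding them.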
> I did find quite a few minrtts under 1 msec, suggesting that the VPN server and ping probe are very close, perhaps hosted in the same facility. But I didn't know where to look for relevant information.
Yes. As a rule of thumb, <1 ms is the same building or location; ~2-3 ms is the same metro area, or a very large network topology between the two. For more information I'd probably look at the device path (traceroute) or the BGP announcements, to see what addresses are announced and by which networks. You can infer network topology from the AS path in the BGP announcements. All of that should be available from looking glasses or groups like CAIDA & Cymru.
> I'm not sure what this means. Wouldn't adding a tunnel just increase latency? Or rather, is there any way that using "underlying networks (ie MPLS, or GRE tunnels)" could obscure the use of PoPs?
The tunnel itself wouldn't necessarily add meaningful latency or interfere with your methodology. But it would obscure the topology that you'd see at the IP/ICMP (traceroute or BGP) level. So maybe the router is at AMS-IX, but that router could tunnel the packets to another device in Germany. Traceroute or DNS might not show that, but you'd see a latency anomaly there. Related to my geopolitical comment: if a user cares about law enforcement, the physical location of the end device could very much matter.
>> VPNs pay for address delegation, I would guess.
> Depends on how you look at it. I haven't been in the NSP industry for a decade, but usually customers would get a /24 or /22 of the NSP's allocation for free as part of the actual network service cost. Once you're bigger than that, the customer (e.g., the VPN provider) should be getting allocations/assignments directly. https://www.arin.net/resources/registry/reassignments/#alloc...
OK, but I was thinking of HideMyAss, for example. Its cluster near Nuland, NL has 109 IPv4 addresses. RIPE tells me that they're all over the world. But latency tells me that they're all near Nuland.
So where did HideMyAss get all those IPv4 addresses? I'm guessing that some firm has obtained delegated IPv4 blocks from NSPs in all those places, and then allocated addresses to HideMyAss. That has apparently become a huge business sector, especially with IPv4 exhaustion. But also for providing VPN services, spammers, etc. with deceptive PoPs.
I looked into that using https://bgp.he.net/ but all that's on another VM that I lack the RAM to run right now. It might even be HideMyAss' NSP for that facility.
> The tunnel itself wouldn't necessarily add meaningful latency or interfere with your methodology. But it would obscure the topology that you'd see at the IP/ICMP (traceroute or BGP) level. So maybe the router is at AMS-IX, but that router could tunnel the packets to another device in Germany. Traceroute or DNS might not show that, but you'd see a latency anomaly there. Related to my geopolitical comment: if a user cares about law enforcement, the physical location of the end device could very much matter.
Right. Latency for a direct tunnel might well be lower than for Internet transport. And as you say, the tunnel can't hide from latency testing. It's just that traceroute sees a discontinuity.
Center for Applied Internet Data Analysis (CAIDA) <https://www.caida.org/> data collection, curation and distribution
Team Cymru <https://www.team-cymru.com/> query interfaces for mapping IP addresses to BGP prefixes and ASNs
Are they, though? This is, of course, completely anecdotal, but the only real reason I will change the 'location' of my VPN endpoint is exactly that. My suspicion is that most of the VPN users concerns about geography match mine and actually boil down to whether or not it will trick a geofence.
There's going to be some ways in which you can conflate those concepts, since obviously 'virtual locations' will be detectable by some means, as evidenced by the fact that mirmir managed it, but I suspect for a lot of consumers good enough is good enough.
I was thinking more along the lines of law enforcement as well. Where civil & criminal law, data retention, LEO access, and due process may vary significantly in different countries.
Naively, I'd expect that it's where the VPN service is incorporated that determines that stuff. But on the other hand, they would need to find a server to impound it. So I suppose that PoPs could be used strategically to impede investigations.
I don't have adversarial experience with law enforcement, but from what I've seen it's normally around incorporation, assets, and where the principals are. The government gets access by serving you papers, restricting your assets, or restricting your freedoms.
That said, the physical location of the servers will matter for the same reasons. Two adjacent countries may have different standards of evidence, due process, compelled information, data retention, 'know your customer', etc. Both the country of physical location and the country of incorporation will effectively have access; one directly and one via compulsion.
But I don't know how geofences avoid being tricked. Maybe they check signal velocity. I would, anyway.
And then there's the use of residential IPs for geofence tricking. But those are pretty clearly where claimed.
Companies like MaxMind collect and repackage this based on information like RIRs, SWIP, DNS PTRs, or even end-user billing/shipping addresses in online transactions. I don't think latency is used much these days. Check out https://www.maxmind.com/en/geoip2-databases for some examples. I'd say that level of detail is about industry standard, and most companies would simply pay MaxMind or another vendor for this data.
One thought I had. At the beginning, where you talk about reasons for VPN virtual locations, I was reminded of an article I read maybe two years ago that talked a little about VPNs using IPs apparently registered in North Korean IP space. Since then I have scratched my head wondering why exactly a VPN provider would offer something like that. I never looked through the data to confirm that these IPs were abused, but I assume these IPs are (or would be) attractive to black hats who want to obfuscate attack patterns and confuse/alarm unknowing threat analysts. Would VPN providers like HMA actually offer such a thing to boost revenue? Or maybe they were just gimmicks. I don't know...
 https://blog.trendmicro.com/trendlabs-security-intelligence/... (I am pretty sure none of the IPs mentioned in the article are geolocated to NK anymore; also idk if HMA still offers IPs virtually located in NK)
I really have no clue why HMA etc offer so many odd locations. I guess it's the "at least one in every country" brag. Maybe, as you say, to confuse people. But then, HMA doesn't have a good track record for obfuscating abuse ;)
That's correct. Also you get the fastest long-haul transmission with fiber optic, c/1.45, vs. roughly c/2 for wire.
The cutoff does seem to be about 0.7c overall. Although I allowed some slack above that, to be conservative.
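Concretely, that puts a hard floor under the plausible RTT for any claimed distance. A quick sketch (my own numbers and names, not from the article):

```python
# Minimum physically plausible RTT for a claimed server distance, assuming
# the signal travels the great-circle path at c/1.45 (light in fiber).
# Real paths are longer and add device delays, so real RTTs are higher; an
# observed minrtt below this bound makes the claimed location implausible.
C_KM_S = 299_792.458          # speed of light in vacuum, km/s
FIBER_KM_S = C_KM_S / 1.45    # ~206,750 km/s in glass

def min_plausible_rtt_ms(distance_km, velocity_km_s=FIBER_KM_S):
    return 2 * distance_km / velocity_km_s * 1000.0
```

For example, a trans-Atlantic span of ~5,600 km can't be pinged in much under ~54 msec, however few routers are in the way. Passing a velocity of 0.7c instead reproduces the slightly more lenient cutoff.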
I’m curious if you used three or more probe servers in distinct locations if you might be able to roughly triangulate server location?
To increase accuracy, it seems the underlying topology should ideally be taken into account. Probably more work than it's worth, but it would be cool to have some kind of skeleton model of the physical cables, to calculate a distance metric with higher accuracy than a great-circle arc.
I played some with triangulation, with no joy.
From the SE thread "Triangulate with Ping [closed]" I get that it's impossible. Latency depends more on device delays than on distance. Within densely populated continents, router/switch count does depend more or less on distance. But there's too much variability.
Or at least, none of the models that I tried converged.
It might be doable, with enough information about Internet structure. So you could deconvolute device count and distance. But that would be a lot of data, and I have no real clue where to get it. The NSA?
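For what it's worth, a toy planar version of such triangulation looks roughly like this (my own illustrative sketch, not the models I actually ran; on the real Internet this fails because minrtt is dominated by device delays, not distance):

```python
import math

def implied_distance_km(minrtt_ms, v_km_per_ms=200.0):
    """Distance implied by a round-trip time at an assumed signal
    velocity (~200 km/ms is roughly light in fiber)."""
    return minrtt_ms / 2.0 * v_km_per_ms

def multilaterate(probes, step=5.0, extent=600.0):
    """Brute-force grid search. probes: list of (x_km, y_km, minrtt_ms).
    Returns the grid point minimizing the squared error between geometric
    distance and latency-implied distance, summed over probes."""
    best, best_err = None, float("inf")
    n = int(extent / step)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i * step, j * step
            err = sum(
                (math.hypot(x - px, y - py) - implied_distance_km(rtt)) ** 2
                for px, py, rtt in probes
            )
            if err < best_err:
                best, best_err = (x, y), err
    return best
```

With synthetic, noise-free RTTs this recovers the true point exactly; with real minrtts the residual error swamps the geometry, which matches my experience that the models don't converge.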
It's quite normal to have a router in Toledo talking to a router in Detroit but the packets ride a transparent SONET circuit to Chicago and back, so the two hops look 50 miles apart but are about 500 fiber-miles apart with consequent speed-of-light delays. And the folks who play at the IP router level are often entirely ignorant of the layers below, to much confusion. (I worked on these systems for years. A conversation with another network admin about this ignorance became the genesis of the Anything But Ethernet contest, which we ran for several years at a midwest tech con. Hijinks ensued.)
Learning anything about the layers below, that don't interact with IP packets, is gonna be hard, I concur. You're flying blind except for speed-of-light, and have a lot of other delays to solve out. "Might be doable" is about my conclusion, too. You'd need a whole crapton of probes, with accurate locations, and a whole crapton of math, which is way over my head. I feel like the equations are gonna start to resemble the way GPS works, but that's the thing — GPS does work! My gut feeling is that it's worth a try.
I also played around with placing my own probes, using VPS hosting that peered with VPN server hosting. Using information from https://bgp.he.net/. But that's too tedious for general use.
In this work, I basically ignored it. And focused on testing for plausible signal velocity.
Still, I do agree that it's an interesting problem. Maybe I can learn enough to make it doable.
However, I'm just looking at signal velocity for the probe with the smallest minrtt. So probes where routing to the server is less direct will just have greater minrtt for a given geographic separation. That is, lower signal velocity. So that doesn't affect my analysis.
The code is pretty simple, really. Just scripts for collecting data. I did the analysis in Calc, mostly. And some in Excel. But I can explain in more detail on GitHub.
I'll look into open sourcing all of the data.
I'll do a Show HN when the GitHub repo, with methods and data, is up. It may be a while, though.
I'd love to learn more about the mechanics of gathering the data. You say you used Ping.pe, CASM, and MapLatency. It sounds like you had to scrape the Ping.pe results because they don't provide an API? Did you consider using RIPE ATLAS probes? Did you do any cross-verification between services where they both claimed to have probes in the same location?
It's all fascinating stuff and very, very thorough! I hope the story gets picked up by an outlet with a journalist who can distill the salient points into an article short enough for casual reading.
I did scrape ping.pe using a shell script with numerous lines like this:
> google-chrome --headless --run-all-compositor-stages-before-draw --virtual-time-budget=25000 --print-to-pdf='[IPv4-date].pdf' http://ping.pe/[IPv4] && sleep 4
Also, if I hit it too long, it started providing bullshit results. So I ended up doing it from Whonix, so each request came from a different IPv4. And I did try to contact them to arrange payment. But got no reply.
I did read about RIPE ATLAS. But it seemed necessary to host a probe. And then get credits to use other probes. But maybe I misunderstood. So is there an API?
I didn't do any systematic cross-verification between ping services. For each VPN server IPv4, I just got the minimum minrtt value, for all measurements from all ping services. However, I did verify that ping probes generally yielded plausible minrtts for VPN server IPv4 that didn't show implausible minrtt with any ping probe.[0 vs 1]
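The aggregation itself is trivial; something like this sketch (field names are illustrative, not my actual schema):

```python
from collections import defaultdict

def best_minrtt(measurements):
    """For each (server, probe) pair, keep the single smallest minrtt seen
    across all ping services. measurements: iterable of
    (server_ip, probe_id, service, minrtt_ms) tuples."""
    best = defaultdict(lambda: float("inf"))
    for server_ip, probe_id, service, minrtt_ms in measurements:
        key = (server_ip, probe_id)
        best[key] = min(best[key], minrtt_ms)
    return dict(best)
```

Taking the minimum across services is conservative for this purpose: it can only make apparent signal velocity higher, so it never hides a superluminal result.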
I read the commercial-use permission differently — I think it's granted by default if you give credit and link to them as per these terms. IANAL but do your own reading: https://atlas.ripe.net/get-involved/commercial-use/
The ATLAS toolset is here, including APIs: https://atlas.ripe.net/measurements-and-tools/tools/
Speaking of GitHub though, if you're looking to take this even further, sharing your scraping and analysis scripts might find even more collaborators and ideas. But I understand if that's not permissible under your work-for-hire agreement or whatever.
Rereading the commercial-use terms, I get that the rules for default permission all work for me.
I got approval to share the data. So I'll do a GitHub repository with that and complete explanation of how I collected and analyzed the data. Before doing more of this, I must get a MySQL instance up, and learn enough R and Python to do the final analysis and charting.
You can also receive credits as a donation, e.g. by someone who hosts a probe.
This is an interesting article and I’m sure someone with a constant supply of credits would surely donate a few of them, for you to continue this research.
I gather that there are lots of RIPE Atlas probes. So yes, it'd be very cool if someone would donate credits. However, there is the fact that commercial use requires approval from RIPE. And that would likely include work done for hire.
I know about the recent hack, but it seems like all of their precautions actually worked in protecting the consumers, and no logs were ever shown to have existed or been stolen.
Or at least that's what I gathered reading the recent HN threads.
This is not the behavior of an organization deserving ANY level of trust.
It's a problem for me. Their ads use scare tactics and make deceptive if not outright false claims. Unethical ads come from unethical corporations. I can't see any reason to give such a corporation the benefit of the doubt by assuming their unethical behavior is limited to how they advertise.
And their US residential proxies do work very reliably for Disney+. There is the possibility that they're using services that obtain those proxies deceptively. But I tracked down one of them, and the guy confirmed that he's knowingly selling his bandwidth. And yes, it's just anecdotal.
IMHO, VPN makes sense for geo-blocking or for corporate use, but for privacy? No way.
Orchid is a P2P VPN network that offers dynamic multi-hop connectivity. Users buy bandwidth with supposedly anonymous Ethereum-based cryptocurrency. But so far, it's only available on Android and (soon) iOS.
Some years ago, I wrote a guide for IVPN for doing static nested VPN chains (aka multi-hop connectivity) using pfSense VMs as gateway routers for various VPN services.
And not long ago, I published on GitHub about doing dynamic nested VPN chains, with recursive NAT forwarding within a single Debian router VM. There are iptables rules that forward one VPN through another, and restrict traffic to the chain. That's lighter, and easier to set up. And it's dynamic, more like Tor. But there's no compartmentalization between OpenVPN processes.
Unlike Orchid, that's not limited to smartphones. But there's no ~automatic payment and reputation management.
I was going to recommend ProtonVPN but they have a pretty sketch background/history too. 
Cloudflare's WireGuard-based Warp VPN might be worth keeping an eye on, but Cloudflare makes it clear that users should not expect the privacy guarantees of a "traditional VPN":
> WARP is not designed to allow you to access geo-restricted content when you’re traveling. It will not hide your IP address from the websites you visit. If you’re looking for that kind of high-security protection then a traditional VPN or a service like Tor are likely better choices for you.
Perhaps we simply shouldn't use VPN services. 
I always did wonder how they could offer so much for so little money.
But now there's this with PIA. Which decreases trust. But yet they were one of the few that had actually "proved" in court that they didn't retain logs. So I guess that I still trust them some.
I also think it's important to understand that despite the marketing, using a VPN isn't a hard privacy improvement. All it does is move the trust you'd otherwise place in your ISP elsewhere. Your traffic is still visible to a third party with unverifiable practices.
Back to the subject of providers, personally I use Mullvad, ProtonVPN, IVPN and NordVPN at various times. Mullvad is great for privacy, IVPN has WireGuard and plenty of locations, ProtonVPN is one of the only providers with a server in Russia (I use it for region-unblocking) and NordVPN has an insane number of locations.
Every service that is widely available/permissionless AND offers real privacy will end up getting degraded, CAPTCHA'd or blocked. That shows it works!
But I don't trust Tor either, so I'd first use a nested VPN chain. Just in case the entry guard is malicious. And to complicate the traffic analysis a little.
I could go on about this. But I'll be writing more about all this.
So the key is not attracting their attention.
If your threat model includes your ISP, but does not extend to nation-state level adversaries, then a good private VPN should be a decent enough solution, although a public VPN might be easier and still adequate.
Source: I personally pissed off the Director of the NSA in November of 1992 (see http://www.shub-internet.org/brad/cacm92nov.html ). At the time, my clearance was Top Secret/SCI, and I had been read onto multiple compartments — including the ones for ECHELON, KEYHOLE, etc.... So far as I know, I am still on their shit list, albeit not as high as Snowden or Binney.
So the best bet is staying off the shit lists of the NSA and other global adversaries.
However, I have used personal VPNs tunneled through Tor. But I was very careful to be anonymous about the VPS I used. And it was just to get around blocking of Tor exits.
And since you're not, I suppose, reselling your traffic to other users, the liability for problematic traffic will land on you.
Also, the bandwidth rates all the big cloud providers charge are just extortionate.
Tor might have its vulnerabilities as well, like all software. Not to forget that pretty much anyone can run an entry guard today, and at least associate your IP with usage of Tor.
And yes, Tor provides far more anonymity than any VPN service ever could.
However, some Tor relays are malicious. And we have no way of verifying which ones are or aren't, except by trusting the Tor Project. It's true that there's lots more independent oversight, however. But that CMU exploit of the "relay early" bug is a red flag. Because the Tor Project didn't detect the malicious relays for at least weeks.
Anyway, I use nested VPN chains to access the Tor network. So if I get pwned through a malicious entry guard, at least they'll only learn the final VPN exit address.