Mapping the internet with Hilbert curves (benjojo.co.uk)
218 points by randomdrake 60 days ago | 29 comments

Nice! Perhaps a Gibson quote is appropriate here:

> Program a map to display frequency of data exchange, every thousand megabytes a single pixel on a very large screen. Manhattan and Atlanta burn solid white. Then they start to pulse, the rate of traffic threatening to overload your simulation. Your map is about to go nova...

I don't think straight hilbert curves:IP ranges are the best mapping, and would prefer something built around ping times or hop check routings in 3 dimensions. Or integrate it with geolocation data. But! It's still super useful.

> On top of all of this, I also did a bonus scan of a few APNIC IP blocks every 30 mins for 24 hours. The data from that allows you to see the internet “breathe” as clients come online in the morning and offline at night.

Really, I'm surprised there isn't a distributed/crowdsourced system to do this all the time and allow people to study the 'weather' in the datasphere.

H-curves have better locality, might be interesting to use those:

[0] http://www.akt.tu-berlin.de/fileadmin/fg34/publications-akt/...

[1] http://hint.fm/papers/158-wattenberg-final3.pdf

> I don't think straight hilbert curves:IP ranges are the best mapping,

It is a good idea, as IP ranges are a simple (discrete) linear range.

However, maybe this is not the best explanation:

> The problem with displaying IP addresses, is that they are a single dimensional, they only move up and down, however humans are not good at looking at a large amount of single dimensional points.

But rather: Hilbert curves are great because they ensure that any two consecutive points are adjacent in space (i.e., no gaps).
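To make the "no gaps" property concrete, here is a minimal Python sketch of the usual index-to-coordinate conversion for a Hilbert curve. It is not the article's code. The function names and the choice of one cell per /24 (order 12, a 4096x4096 grid) are my own assumptions for illustration.

```python
import ipaddress


def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_xy(ip, order=12):
    """Hypothetical helper: place an IPv4 address on the grid, one cell per /24 at order 12."""
    d = int(ipaddress.IPv4Address(ip)) >> (32 - 2 * order)
    return hilbert_d2xy(order, d)


# Order 1 visits the four cells as (0, 0), (0, 1), (1, 1), (1, 0).
print([hilbert_d2xy(1, d) for d in range(4)])
print(ip_to_xy("10.0.0.0"), ip_to_xy("10.0.1.0"))  # neighbouring /24s land in neighbouring cells
```

Every step along the index moves exactly one cell horizontally or vertically, which is why consecutive address blocks end up next to each other on the map.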

I've done it a few times w/ geolocation data. Makes it easier to see changes in developing countries but other things are much harder to see. A mix of visualizations is probably the best approach depending on the target audience/use case:

2014: https://imgur.com/aQUHzgu

2016: https://imgur.com/p43QH6v

Also for fun, ipv6 exhaustion counter.


Note: A power law is definitely not the same thing as an exponential fit. Using the two interchangeably is disingenuous.

I need to make an exhaustion counter for the /64 at my house :p

I wish this used a good color mapping, like Viridis or cubehelix, or at least used HSLuv or HPLuv to map the parameters to colors. I bet we could see a lot more patterns in this then.

Edit: I made a github issue for this:


I did something similar a few years back, mapping ipv4 address space owners.


You can scan the whole internet in about an hour. I had luck using AWS and zmap.


I'm surprised there haven't been more of those high cadence observations he presents at the end when the scans are that fast now.

If nothing else, his little gif shows that just scanning at different times of day could be used to estimate how many personal devices there are on a given subnet.

>ZMap can scan the IPv4 address space in under 5 minutes.

That’s misleading, because you’re quoting half of a sentence. Full quote:

> On a typical desktop computer with a gigabit Ethernet connection, ZMap is capable of scanning the entire public IPv4 address space in under 45 minutes. With a 10gigE connection and PF_RING, ZMap can scan the IPv4 address space in under 5 minutes.
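Those two figures are about what line rate allows. A rough back-of-the-envelope check in Python, assuming one minimum-size Ethernet frame per probe (64 bytes plus 8 bytes of preamble and a 12-byte inter-frame gap) and a round 3.7 billion for the publicly routable space; both are my assumptions, not numbers from the ZMap docs:

```python
# Bytes on the wire per probe: minimum Ethernet frame + preamble + inter-frame gap.
WIRE_BYTES_PER_PROBE = 64 + 8 + 12


def scan_minutes(addresses, link_bits_per_second):
    """Minutes needed to send one minimum-size probe to each address at full line rate."""
    packets_per_second = link_bits_per_second / (WIRE_BYTES_PER_PROBE * 8)
    return addresses / packets_per_second / 60


ALL_IPV4 = 2 ** 32
PUBLIC_IPV4 = 3.7e9  # assumed round figure after excluding reserved ranges

print(round(scan_minutes(ALL_IPV4, 1e9), 1))      # all 2**32 addresses at 1 Gbit/s
print(round(scan_minutes(PUBLIC_IPV4, 1e9), 1))   # assumed public space at 1 Gbit/s
print(round(scan_minutes(PUBLIC_IPV4, 1e10), 1))  # assumed public space at 10 Gbit/s
```

Under these assumptions the full 2**32 space takes roughly 48 minutes at gigabit, and the public portion comes in under 45 minutes at 1 Gbit/s and under 5 minutes at 10 Gbit/s, so the quoted claims are plausible only if the link is kept saturated the whole time.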

According to [1], IPv6 adoption has slowed down significantly, so I think we'll be stuck with NAT for at least another decade.

[1] https://www.google.com/intl/ru/ipv6/statistics.html

The 9MB PNG is unoptimized. By passing it through optipng and advdef I managed to losslessly squish it down to 7MB.

Also, I would be remiss if I did not point out that this:

cat ping.txt | pcregrep -o1 ': (\d+\.\d+\.\d+\.\d+)'

is a Useless Use Of Cat.[1]

It should be rewritten:

pcregrep -o1 ': (\d+\.\d+\.\d+\.\d+)' <ping.txt

[1] http://porkmail.org/era/unix/award.html

>I managed to squish it down to 7 MB

wow, what a stellar compression ratio

>Useless Use Of Cat

Oh My God No One Cares

> wow, what a stellar compression ratio

It's pretty good when compared to uncompressed RGB of the same size, which would be 48M.
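For anyone checking the 48M figure: it works out if the image is 4096x4096 (one pixel per /24, which is my assumption about the PNG's dimensions) at 3 bytes per pixel. A quick sketch:

```python
def raw_rgb_bytes(width, height, bytes_per_pixel=3):
    """Size of an uncompressed 8-bit-per-channel RGB bitmap, ignoring headers."""
    return width * height * bytes_per_pixel


SIDE = 4096  # assumed image side: 2**24 pixels, one per /24
raw = raw_rgb_bytes(SIDE, SIDE)

print(raw / 2 ** 20)                  # raw size in MiB
print(round(7 / (raw / 2 ** 20), 3))  # 7 MB PNG as a fraction of the raw size
print(round(1 - 7 / 9, 3))            # saving from the optipng/advdef pass (9 MB to 7 MB)
```

So both comments hold up: the PNG is about 15% of the raw bitmap, while the extra optimization pass itself only saves around 22%.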

you cared enough to respond, syrrim

You can do the same with LBAs of a block device. It's interesting to see where different file systems place the (meta) data.

How does the number of internet connections relate to the number of nodes? Building fat pipes is not the answer, just as more highways is not the answer to more destinations. The increase in traffic will consume resources exponentially (factorially?) faster than the increase of address space.

A Hilbert map of active IPv6 webhosts also exists, based on Akamai data. I found this d3 block by Vasco Asturiano:


The author also has some other cool d3 visualizations of IPv6 Routes, AS, as well as IPv4 allocations.

Surprised that he missed out

He also left out a bunch of other networks that one wouldn't really need to scan: [0] and [1] (and a bunch of /24s [2,3,4] too but, relatively speaking, those are pretty insignificant), although one might find some "interesting" (CGN) stuff in if their ISP was making use of it.

There are also large portions of the 13 /8s (218 million IPs!) assigned to the US Department of Defense [5] that you wouldn't need to scan since there are no routes to them at all: the,,,,,, and networks are, for all intents and purposes, "missing" from the public Internet.

Additionally, there are only four /24s in that are reachable from the public Internet. Out of the 16,777,216 IP addresses that make up, only 255 are reachable ( [6].

There's pretty much no point in scanning -- "mapping" -- this address space (unless you are looking specifically for US government/military stuff).

ETA: In the interest of time, you probably wanna skip over [7] also.

[0]: https://tools.ietf.org/html/rfc6598

[1]: https://tools.ietf.org/html/rfc2544

[2]: https://tools.ietf.org/html/rfc5737

[3]: https://tools.ietf.org/html/rfc3068

[4]: https://tools.ietf.org/html/rfc7534

[5]: https://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_addre...

[6] Interestingly, the ASN (27651) that is advertising into BGP appears to be registered to a company in Chile -- and they're also advertising I would not be surprised to find out that neither of these advertisements are legitimate.

[7]: https://en.wikipedia.org/wiki/AMPRNet
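If you wanted to turn a list like this into a scanner exclusion check, a minimal Python sketch with the standard `ipaddress` module could look like the following. The prefixes are the ones defined in RFC 6598, RFC 2544 and RFC 5737 themselves. They are an illustrative subset picked by me, not a reconstruction of the exact networks named above, and `SKIP` and `should_skip` are hypothetical names.

```python
import ipaddress

# Illustrative skip list, taken from the RFCs cited above (an assumed subset).
SKIP = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "100.64.0.0/10",    # RFC 6598: shared address space (CGN)
        "198.18.0.0/15",    # RFC 2544: benchmarking
        "192.0.2.0/24",     # RFC 5737: TEST-NET-1
        "198.51.100.0/24",  # RFC 5737: TEST-NET-2
        "203.0.113.0/24",   # RFC 5737: TEST-NET-3
    )
]


def should_skip(ip):
    """Return True if the address falls inside any network on the skip list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SKIP)


print(should_skip("100.64.1.1"), should_skip("8.8.8.8"))
print(sum(net.num_addresses for net in SKIP))  # addresses saved by this subset
print(13 * 2 ** 24)                            # size of thirteen /8s, the "218 million" above
```

The last line is just the arithmetic behind the 218 million figure: thirteen /8s at 16,777,216 addresses each.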

Why? These are non-routable...

It is missing from the table of reserved ip addresses.

He mentions: Local System, Local LAN, Loopback, “Link Local”, Local LAN, Multicast, “Future use”

That was infinity0's point -- there's no reason to scan them.

It strikes me that we've "run out" of IPv4 address space, but there are entire large blocks of space allocated to entities that don't appear to be using them.
