Fun with IP address parsing

geoffpado · on Dec 26, 2020

> This is the same IP address: 3232271615. You get that by interpreting the 4 bytes of the IP address as a big-endian unsigned 32-bit integer, and print that. This leads to a classic parlor trick: if you try to visit http://3232271615 , Chrome will load http://192.168.140.255.

This was the source of one of my favorite “bugs” ever. I was working on multiple mobile apps for a company, and they had a deep link setup that was incredibly basic: <scheme>://<integer>, which would take you to an article with a simple incrementing ID. This deep link system “just worked” on iOS and Android; take the URL, grab the host, parse it as an int, grab that story ID. Windows Phone, however… the integers we were parsing out were totally wrong, returning incredibly old stories!

Turned out that the host we were given by the frameworks from the URL was auto-converted to an IP in dotted-quad format, and then the int parser was just grabbing the last segment… which meant that we were always getting stories <256, instead of the ~40000 range we were expecting.
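The arithmetic behind this bug can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and `last_segment` is an assumption about how the integer parser ended up behaving, not the actual Windows Phone code.

```python
import ipaddress


def dotted_quad_from_int(value: int) -> str:
    """Render a 32-bit integer as a big-endian dotted quad."""
    return str(ipaddress.IPv4Address(value))


def last_segment(host: str) -> int:
    """Mimic the failure mode: parse only the text after the final dot."""
    return int(host.rsplit(".", 1)[-1])


# The integer from the quoted article maps back to the familiar address.
print(dotted_quad_from_int(3232271615))

# A hypothetical story ID in the ~40000 range becomes a host like 0.0.x.y,
# and only the final segment (the ID modulo 256) survives.
host = dotted_quad_from_int(40000)
print(host, last_segment(host))
```

The last segment of the dotted quad is always the ID modulo 256, which matches the "stories <256" symptom.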

GlitchMr · on Dec 27, 2020

Curiously, this appears to be a bug in Windows Phone. In URIs, the part following `//` is called the authority, which is essentially a host with some optional additional stuff (like a port number).

According to RFC 1123, a hostname can legally be entirely numeric, and a web browser shouldn't attempt to "correct" it (as it is a valid URI) for schemes it doesn't know anything about, as it doesn't know the rules for hostnames for a given protocol. This is also not a valid IP address according to RFC 3986 (which specifies URI syntax), as this specification requires the #.#.#.# format with three dots.

That said, using authority for something that isn't technically a hostname is misusing the field. I think using `<scheme>:<integer>` would have been a better idea.

namanyayg · on Dec 27, 2020

Funny bug :) I often prefix integers with a char, e.g. maybe "u12345" here, in places where I'm using integers as id to force a string conversion and avoid any code accidentally doing math on it.

hajhatten · on Dec 27, 2020

Huh, I'm gonna try to remember this. You'd be surprised how lazy people get, even when money's involved.

akoncius · on Dec 27, 2020

I love those kinds of stories. They show how high a level of abstraction we work at on a daily basis, when we have no clue what is going on with the stuff we touch constantly.

I bet it was not “I love it” sentiment when you had to debug this kind of issue though, haha :)

geoffpado · on Dec 27, 2020

Any bug you can learn something from is better than the alternative. :)

Thankfully, I caught this one while building the feature in the first place; I don’t imagine I’d have such fond memories of it if I’d had to recreate it from user reports!

slim · on Dec 27, 2020

maybe it was more appropriate to use a urn instead of a uri. something like urn:namespace:id:scheme:number

https://tools.ietf.org/html/rfc8141

whereistimbo · on Dec 27, 2020

Well, I read about how the foobar2000 dev (Peter) developed his apps on Windows Phone. He said it was so annoying compared to the other two major mobile platforms that he was considering refunding the crowdsourced money that came from Windows Phone pledges.

arkadiyt · on Dec 26, 2020

These different representations also lead to frequent server side request forgery (SSRF) bypasses - someone might be blocking local IPv4 but you can still access their AWS metadata endpoint at ::ffff:169.254.169.254, etc.

For anyone using Ruby, I'm the author of a gem [1] that comprehensively protects against SSRF bugs. For anyone using Golang I recommend this [2] blog post.

[1]: https://github.com/arkadiyt/ssrf_filter

[2]: https://www.agwa.name/blog/post/preventing_server_side_reque...
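As a rough illustration of the bypass described above, here is a minimal Python sketch of an address check that unwraps IPv4-mapped IPv6 addresses before applying its rules. The function name `is_internal` is hypothetical, and this is not how the linked gem or blog post implement it. It also ignores hostname resolution, redirects, and DNS rebinding, which a real SSRF filter has to handle.

```python
import ipaddress


def is_internal(text: str) -> bool:
    """Return True if the literal address should not be fetched on a user's behalf."""
    ip = ipaddress.ip_address(text)
    # Unwrap ::ffff:a.b.c.d so that the IPv4 rules also apply to the mapped form.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


for candidate in ("169.254.169.254", "::ffff:169.254.169.254", "8.8.8.8"):
    print(candidate, is_internal(candidate))
```

The explicit unwrapping step is the point of the sketch: without it, the outcome for the mapped form depends on how the IPv6 classification rules of a given library version treat `::ffff:0:0/96`.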

stevekemp · on Dec 27, 2020

For golang I wrote this:

https://github.com/skx/remotehttp

I've found, and reported, a whole bunch of services which take user-supplied URLs and don't filter out access to localhost:8080/server-status, and similar local resources.

A common route to attacking these is to access the AWS metadata URL endpoint. Something at least the Google cloud prevents, by forcing the use of the `Metadata-Flavor: Google` header.

jamespwilliams · on Dec 26, 2020

> ::ffff:169:254:169:254

Just to note, this should be ::ffff:169.254.169.254

proactivesvcs · on Dec 26, 2020

I wonder how many of these bugs are the result of people thinking "Well I've read the spec but most of it is 'cursed' so I'll just implement this subset which fits my idea of 'acceptable'".

jitl · on Dec 26, 2020

It’s more like “I firewalled everything using iptables, job done” but, there are no firewall rules in ip6tables. IPv6, what’s that?

(My solution at home is to blanket block IPv6 entirely)

the_mitsuhiko · on Dec 26, 2020

Unfortunately the blacklisting approach that works on IPv4 is completely broken for IPv6 since you can't really know where your own services are. I still did not find a good generic way to protect IPv6 and ended up just disallowing it so far everywhere.

arkadiyt · on Dec 26, 2020

IPv6 has internal ranges defined just like IPv4 does - anything in an internal range should be blocked, and anything in an external range is safe to pass through.

the_mitsuhiko · on Dec 26, 2020

Since there is no NAT in an IPv6 deployment, the unsafe services you typically want to prevent access to look like they are in non-internal ranges. Whereas in your normal IPv4 deployment you might have your protected service on 192.168.4.111, with IPv6 it will just share the same prefix (potentially) as your host.

icedchai · on Dec 27, 2020

NAT was meant to work around IP exhaustion issues, not act as a security layer. By accident, it often happens to provide some additional security. But keep in mind those "internal" IP addresses may be routable in some cases, due to either accidental or deliberate misconfiguration.

IPv6, with end-to-end connectivity, is how the Internet is supposed to work. It's how it did work in the early 90's, even with IPv4.

If you want to secure your servers, use a firewall. Maybe it's a host-based firewall.

cle · on Dec 27, 2020

What are the practical reasons we should rearchitect our systems to remove NAT?

(I know the weaknesses of NAT but the cat’s out of the bag at this point...the question isn’t really “why should we use NAT”, it’s “why should we go through the pain of breaking it”.)

lmm · on Dec 27, 2020

IPv6 does nothing to break NAT, you could deploy the exact same kind of NAT and it would have the exact same behaviour, if you really want to make your router use a bunch more memory/CPU and make it a pain for users to do anything that needs a direct connection. But you gain nothing from doing that.

the_mitsuhiko · on Dec 27, 2020

It's nice that this is how the internet is "supposed to work". In practice not having a NAT makes "automatic" internal protection of web services hard to impossible.

> If you want to secure your servers, use a firewall. Maybe it's a host-based firewall.

Firewalls do not solve this problem because you do want service-to-service communication. What you do not want is code that crawls user-supplied URLs to access your internal services. So you need application-level protections. With IPv6 you're basically forced to declare your CIDR explicitly, whereas with IPv4 you could easily achieve a secure-by-default system.

icedchai · on Dec 27, 2020

In these situations, store your IPv6 prefixes in a config. This doesn’t sound like a hard problem to solve.
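A minimal sketch of what "prefixes in a config" could look like in Python. The prefixes below are placeholders (an IPv6 documentation prefix and a unique-local prefix), and the names `OWN_PREFIXES` and `targets_own_network` are invented for this example.

```python
import ipaddress

# Hypothetical configuration: the prefixes your own services live in.
OWN_PREFIXES = [
    ipaddress.ip_network(prefix)
    for prefix in ("2001:db8:1234::/48", "fd00:abcd::/32")
]


def targets_own_network(text: str) -> bool:
    """Return True if the literal address falls inside a configured prefix."""
    ip = ipaddress.ip_address(text)
    return any(ip.version == net.version and ip in net for net in OWN_PREFIXES)


print(targets_own_network("2001:db8:1234:5::10"))
print(targets_own_network("2001:db8:ffff::1"))
```

The check itself is cheap. The objection raised in the replies is about keeping the list correct by hand, which this sketch does nothing to solve.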

the_mitsuhiko · on Dec 27, 2020

Which is a manual process and because it is one it leaves many systems unprotected.

icedchai · on Dec 27, 2020

Systems that crawl user provided URLs are in the minority. For most systems it is irrelevant.

the_mitsuhiko · on Dec 29, 2020

I strongly disagree. Almost every single system has webhooks which by definition are user supplied URLs.

octoberfranklin · on Dec 27, 2020

> IPv6, ..., is how the Internet is supposed to work

No, the way the Internet is supposed to work is that you have one routable address space. If you need to expand it, the previous address space is imported as a subset of the new one.

https://cr.yp.to/djbdns/ipv6mess.html

I will never forgive the IPv6 committee for not making the 32-bit IPv4 space a subrange of the 128-bit IPv6 space. Years after winning the IPng wars they admitted their mistake and standardized NAT64, but it was too late. NAT64 should have been part of IPv6 from day one, and every IPv6 router acting as a default route gateway should have been mandatorily-required to offer NAT64.

lmm · on Dec 27, 2020

Mandating that every router has to do stateful connection tracking would have been an enormous, wasteful burden. NAT64 is there for those who need it; 464XLAT setups with IPv6-only clients are quietly the reality on networks that don't have too much legacy infrastructure (mostly mobile).

octoberfranklin · on Dec 27, 2020

And having two completely separate internets (IPv6 and IPv4) forever isn't a wasteful burden?

Not requiring backwards compatibility in IPv6 guaranteed that IPv4 would be around forever. IPv4 is never, ever going away because of this.

The only people who couldn't see this coming were from Bell System backgrounds where you could use centralized schemes like "Ma Bell says tomorrow is the Flag Day, flip the switch". In a decentralized system people don't stop using the old system until you give them a new system that is backwards compatible. Then you drop the backwards compatibility in a second, separate upgrade much later, on a timetable dictated by adoption, not flag days.

lmm · on Dec 28, 2020

> And having two completely separate internets (IPv6 and IPv4) forever isn't a wasteful burden?

Your proposal would force maintaining IPv4 for longer and in more networks: every IPv6 router would have to have IPv4 connectivity and probably a routeable IPv4 address, so it wouldn't even solve the address exhaustion problem for long (perhaps not even at all).

> Not requiring backwards compatibility in IPv6 guaranteed that IPv4 would be around forever. IPv4 is never, ever going away because of this.

IPv4 has already been eliminated from newer edge networks, and for those networks the vast majority of upstream traffic is IPv6. No doubt those networks will have to maintain 464XLAT for a long time for the long tail of upstream sites that are only v4-accessible, but they'll be able to have a smaller and smaller pool of 464XLAT servers and outsource the v4 connectivity support further and further upstream (just as with Usenet), until eventually v4 connectivity becomes a paid add-on and then goes away entirely. Home routers for use with PCs will probably have to offer 4over6 for a long time, because it's hard for an ISP to be confident all their users are up to date, but that doesn't actually reduce the benefits that much (all your internal network management can still be v6, only the little home user LANs are v4), and organisations that manage all their endpoint devices don't even need that much.

growse · on Dec 27, 2020

Maybe I'm misunderstanding you, but one of the points of the article is that you can represent IPv4 addresses in IPv6. In other words, IPv4 is a subset of IPv6.

If I'm on an ipv6-only host and blast UDP at ::ffff:1.2.3.4, they should get delivered to 1.2.3.4, no?

The actual, real-world problem with 4 being a subrange of 6 is that 4-only hosts are blissfully unaware that the super-range exists, so have no mechanism to send packets there. This is of course where you're right about NAT64 and the state requirements.

octoberfranklin · on Dec 27, 2020

> If I'm on an ipv6-only host and blast UDP at ::ffff:1.2.3.4, they should get delivered to 1.2.3.4, no?

No, it won't necessarily! That's precisely the problem. Until NAT64 was introduced it was in fact impossible for an IPv6 router to deliver your packet to the IPv4 host 1.2.3.4. NAT64 still isn't mandatory (and likely never will be), so if you're writing software you can't assume those packets will get through even if you have an IPv6 network connection with a default route.

NAT64 didn't come about until long after IPv6 was finalized, and NAT64 support from default-route IPv6 routers is still not mandatory. That's why we have this mess with dual-stack hosts: you cannot safely assume that your IPv6 router is willing to deal with the IPv4 world on your behalf.

The ::ffff:1.2.3.4 address space does, in fact, date back to the early days of IPv6 (it came from RFC 2765, about one year after IPv6 was finalized), but it was not meant for letting IPv6 clients share a single IPv4 address -- it was only for servers which for some reason had their own IPv4 address but couldn't speak IPv4. Yeah, back in the early 2000s people thought this problem might happen.

The IPv6 committee was viciously hostile to NATs. The way they saw it, NATs were the problem that made IPv6 necessary, so no way were they going to allow any NATs to pollute their precious IPv6. If that meant that the whole world had to run two separate internets (IPv4 and IPv6) for the rest of eternity just to keep the IPv6 network puritanically NAT-free, then so be it!

It took them more than a decade to realize how stupid this mindset was.

growse · on Dec 27, 2020

Ah! I see what you mean. Thanks for clarifying and correcting me.

I possibly have some sympathy with the anti-NAT view taken at the time, even if it ended up being the wrong thing to do in hindsight. Adding more mandatory complexity for implementors would have harmed adoption rates, and I've seen some weird edge cases with NAT64 - it's not necessarily a trivial thing to implement correctly.

octoberfranklin · on Dec 27, 2020

Yes, but having two entirely separate internets, like we do today, is much more complex than any amount of NATting!

I fault the IPv6 proponents for not foreseeing our current situation. DJB saw it with crystal clarity in 2001. Lots of people warned them that this would happen.

Dagger2 · on Dec 28, 2020

::ffff:0:0/96 is for representing v4 addresses in v6 APIs. If you tell the kernel you want to send a packet to ::ffff:1.2.3.4, you're actually telling it you want to send a (v4!) packet to 1.2.3.4, you're just doing it using an AF_INET6 socket rather than an AF_INET one.

Since packets aren't APIs, you should never see ::ffff:0:0/96 in packets on the wire. A v6-only host can't use this prefix to send v4 packets to v4 hosts.

(What would the source address of those packets even be?)

Dylan16807 · on Dec 27, 2020

Isn't this what the fd/8 local addresses are for?

philsnow · on Dec 26, 2020

This is awesome; do you know if anybody has written a rails plugin to use ssrffilter by default for all requests?

arkadiyt · on Dec 26, 2020

Rails doesn't provide any standard mechanism/library for sending http requests, so I don't think there's anything in Rails to apply the gem to

lrossi · on Dec 26, 2020

Can confirm that visiting http://127.1 on ipad indeed works and redirects to http://127.0.0.1. This is very surprising and, at least for me, humbling.

I think I will quote this article any time I see someone using regex to validate or parse IPs.
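To see the gap between lenient and strict parsing without hand-writing a regex, here is a small Python sketch. It assumes a typical Unix libc, where Python's `socket.inet_aton` defers to the C `inet_aton(3)` and therefore accepts short forms; the wrapper names are invented for this example.

```python
import ipaddress
import socket


def lenient_parse(text: str) -> str:
    """Parse the way inet_aton(3) does and return the dotted-quad form."""
    return socket.inet_ntoa(socket.inet_aton(text))


def strict_parse(text: str):
    """Return the dotted quad, or None unless the text is exactly four decimal octets."""
    try:
        return str(ipaddress.IPv4Address(text))
    except ipaddress.AddressValueError:
        return None


for text in ("127.1", "10.1", "127.0.0.1"):
    print(text, lenient_parse(text), strict_parse(text))
```

If the validator is strict but something downstream is lenient (or the other way round), the two disagree on inputs like `127.1`, which is where the surprises come from.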

FreshFries · on Dec 27, 2020

This is one of the reasons why I appreciate the geekiness of Cloudflare with their DNS service IP addresses, particularly:

1.1 which to me is the shortest useful IP address I am aware of.

knorker · on Dec 27, 2020

0 connects to localhost. It's shorter.

firethief · on Dec 27, 2020

0 is 0.0.0.0, which is not a valid address for most purposes. Some programs, like iputils ping, have special handling for that case (i.e. using it as an alias for the unroutable host address); some programs, like FreeBSD's ping, do not [1]. Unlike most of these address tricks, it's not standardized, except that treating it as a normal address is technically disallowed.

1: https://unix.stackexchange.com/questions/99336/how-does-ping...

knorker · on Dec 28, 2020

Not quite. Yes, that may be true for ping, but ping is very much special. It builds its raw packets.

But do a "strace -econnect nc 0 22" and you'll see that yes, actually, a connect() syscall to "0.0.0.0" does connect to localhost.

firethief · on Dec 29, 2020

Interesting. That doesn't seem to be specified in the RFC1122 standard or the Linux ip(7) docs, but it's an explicit special case in the kernel (ip_route_output_key_hash_rcu):

        if (!fl4->daddr) {
                fl4->daddr = fl4->saddr;
                if (!fl4->daddr)
                        fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
                ...

smusamashah · on Dec 27, 2020

wow this is shortest way to localhost

kazinator · on Dec 27, 2020

Would you be further humbled if the ipad accepted http://CXXVII.I also?

I'm never writing anything that positively accepts 127.1, or 0127.000.000.0001 as a valid address no matter what garbage implementations do.

The issue we have with this are situations when we have to accept only inputs that are domain names which are sure not to be treated as an IP address by some software downstream of us.

sunsetMurk · on Dec 27, 2020

All my regex (are now) a lie

bigiain · on Dec 27, 2020

... now you've got two^h^h^hthree problems.

z3t4 · on Dec 26, 2020

I'm now going to change my LAN to use 10.0.0.1 instead of 192.168.0.1 so that I can just type 10.1. This will help not only when testing stuff on mobiles only to have to rewrite the whole address again because you forgot http:// but also when telling the kids what IP to connect to when setting up LAN games. Or coworkers when telling them some LAN/router IP. Time server is on 10.36.

jackewiehose · on Dec 27, 2020

I like the idea and will do the same.

But we'll see how well that works... I just fed the first 4 google results for "ip address converter" with 10.1: Three converters gave an error message and one came up with 0.0.0.10.

mitchs · on Dec 27, 2020

The real question is who is using inet_aton(3). I'm betting none of the online converters.

firethief · on Dec 27, 2020

Here I am naming my machines like a chump when I could be memorizing their numbers like oh yeah, I went to MIT

samoa42 · on Dec 27, 2020

> I'm now going to change my LAN to use 10.0.0.1 instead of 192.168.0.1 so that I can just type 10.1

people doing lan-partys around 1995 called ...

just kidding, great you discovered it

chungy · on Dec 26, 2020

> I’m on the fence about that last one, the “IPv6 with an embedded dotted decimal” form. My reference parser (Go’s net.ParseIP) understands it, but it’s not really that useful any more in the real world. At the dawn of IPv6, the idea was that you could upgrade an address to IPv6 by prepending a pair of colons, as in ::1.2.3.4, but modern transition mechanisms no longer offer anything as clear-cut as this, so the notation doesn’t really show up in the wild.

I have to disagree with this conclusion. I see it very frequently on Linux. It turns out that programs can bind their listen address to just ::, and the kernel will still allow connections from IPv4, with the address mapped to ::ffff:0.0.0.0/32 -- outbound connections use the same notation.

thwarted · on Dec 26, 2020

> It turns out that programs can bind their listen address to just ::, and the kernel will still allow connections from IPv4, with the address mapped to ::ffff:0.0.0.0/32 -- outbound connections use the same notation.

This is only true if the sysctl bindv6only or socket option IPV6_V6ONLY is 0, and is defined by RFC3493.

IgorPartola · on Dec 27, 2020

I definitely frequently used this in code I had written and ran. It is very nice to not have to worry about both stacks and IPv6 is the future anyways. It’s nice to make this configurable for your daemons but I think the default should be true. And also this allows you to not have two separate bind address config lines and all the confusion that comes with that.

ArchOversight · on Dec 26, 2020

Also, some applications have built-in filtering of allowed IP addresses and they don't take into account IPv4-mapped on IPv6 and thus rules may be bypassed without the admin knowing because they dutifully entered their filters in IPv4 only and forgot to tell it to bind to IPv4 only by default.

octoberfranklin · on Dec 27, 2020

> At the dawn of IPv6, the idea was that you could upgrade an address to IPv6 by prepending a pair of colons, as in ::1.2.3.4

No, IPv6 explicitly rejected that idea at first. Most of the other IPng proposals did have a backwards compatibility mechanism like that. I'm still sore that the least backwards-compatible proposal was the one that won.

Later the IPv6 cabal admitted their mistake and published NAT64, but at that point it was too late to make it a mandatory required service offered by any default-route router. So now we have all of this crap about dual-stack hosts instead of simply being able to upgrade to IPv6 and trust that you will not lose any connectivity.

This is basically why, twenty years after it was standardized, IPv6 is still merely the "internet of cellphones" and no closer to replacing IPv4.

As usual, DJB saw all of this decades ahead of time:

https://cr.yp.to/djbdns/ipv6mess.html

AnthonyMouse · on Dec 26, 2020

> 1:2:3:4:5:6:77.77.88.88 means 1:2:3:4:5:6:7777:8888

Wait, what? 77.77.88.88 is in dotted decimal. It doesn't correspond to 7777:8888 in hex.

edit: Somebody else already noticed on Twitter:

> And as @alanjmcf noticed, I messed up one of the representations above.

> 1:2:3:4:5:6:77.77.88.88 means 1:2:3:4:5:6:4d4d:5858, not 1:2:3:4:5:6:7777:8888. I missed out a decimal-to-hex conversion in there.
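The corrected conversion is easy to check in Python, whose `ipaddress` module accepts a dotted-decimal suffix in IPv6 text. The helper name `hex_groups_for_quad` is made up for this sketch.

```python
import ipaddress


def hex_groups_for_quad(quad: str) -> str:
    """Convert a dotted quad into the two 16-bit hex groups it occupies in IPv6 text."""
    packed = ipaddress.IPv4Address(quad).packed
    high = int.from_bytes(packed[:2], "big")
    low = int.from_bytes(packed[2:], "big")
    return f"{high:x}:{low:x}"


# 77 decimal is 0x4d and 88 decimal is 0x58.
print(hex_groups_for_quad("77.77.88.88"))
print(ipaddress.IPv6Address("1:2:3:4:5:6:77.77.88.88"))
```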

j1elo · on Dec 27, 2020

> It does not process Class A/B notation, or hex or octal notation.

I got to find that notation useful once, to make a shorter one-liner... without even knowing that there were different classes of IPv4 address, and that I was looking at one of them.

It's a tiny function that gives me the IP address of my machine in the LAN, for either Linux or Mac:

  # Get main local IP address from the default external route (Internet gateway)
  iplan() {
      # Note: "1" is shorthand for "1.0.0.0"
      case "$OSTYPE" in
          linux*) ip -4 -oneline route get 1 | grep -Po 'src \K([\d.]+)' ;;
          darwin*) ipconfig getifaddr "$(route -n get 1 | sed -n 's/.*interface: //p')" ;;
      esac
  }

(sorry to people reading on small screens)

Full disclosure, I got the "1 is shorthand for 1.0.0.0" from here (which didn't get into explaining why it is a shorthand): https://stackoverflow.com/a/25851186

dave_universetf · on Dec 27, 2020

Oh no, that's another shorthand that's different from all the others. A single number should be interpreted as a big-endian uint32, and so "1" should be "0.0.0.1". However, I can confirm that `ip` interprets it as "1.0.0.0", even though you should have to write "1.0" for that.

Ugh.

j1elo · on Dec 27, 2020

Well I did think of that, it technically is not a Class-A because it should have 2 parts. My conclusion was that maybe what happens is that "1", while incorrect, is flexibly parsed as "1.0" and thus it would become "1.0.0.0". But you're right that, given the uint32 representation does exist, the most correct thing to do seems to interpret it as "0.0.0.1"...

unless an exception to the rule exists somewhere, and 'ip' is actually doing it right!

gravitas · on Dec 27, 2020

I was curious as well, turns out they're using a non-standard parsing with a comment in the function explaining why:

    /* This uses a non-standard parsing (ie not inet_aton, or inet_pton)
     * because of legacy choice to parse 10.8 as 10.8.0.0 not 10.0.0.8
     */

src: https://git.kernel.org/pub/scm/network/iproute2/iproute2.git...

(the entry point to start tracing down to the above inner function is right around here: https://git.kernel.org/pub/scm/network/iproute2/iproute2.git... )
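The two competing readings of a short address can be compared side by side. The Python below is a decimal-only sketch written for this thread: `parse_aton_style` follows the inet_aton rule that the last part fills all remaining bytes, and `parse_zero_padded` follows the behaviour described in the iproute2 comment. Neither is the real implementation, and both names are invented.

```python
def _join(octets):
    return ".".join(str(octet) for octet in octets)


def parse_aton_style(text: str) -> str:
    """inet_aton rule: the final part fills every byte that is left over."""
    parts = [int(part) for part in text.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected one to four parts")
    *leading, last = parts
    width = 4 - len(leading)  # number of bytes the final part must cover
    if any(not 0 <= part <= 255 for part in leading) or not 0 <= last < 256 ** width:
        raise ValueError("part out of range")
    return _join(leading + list(last.to_bytes(width, "big")))


def parse_zero_padded(text: str) -> str:
    """Legacy rule from the comment above: missing trailing octets are zero."""
    parts = [int(part) for part in text.split(".")]
    if not 1 <= len(parts) <= 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError("expected one to four octets in 0..255")
    return _join(parts + [0] * (4 - len(parts)))


for text in ("1", "10.8"):
    print(text, parse_aton_style(text), parse_zero_padded(text))
```

Both rules agree on full four-part addresses and disagree on everything shorter, which is why spelling out all four octets sidesteps the question.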

j1elo · on Dec 27, 2020

Conclusion: explicit is better than implicit, and what's more, in this case the implicit alternative was depending on a non-standard choice made in the specific tool for obscure, legacy reasons.

gravitas · on Dec 27, 2020

This fits almost any situation (explicit vs. implicit) and I'm a big fan - when mentoring I tend to say "yes that was the default when you looked today, how do you know it won't change tomorrow? If you want specific behaviour, be explicit don't trust defaults." (more or less, depends on subject - commandline switches to code loops, same advice)

FreshFries · on Dec 27, 2020

macOS here:

    % ping 1
    PING 1 (0.0.0.1): 56 data bytes

    % ping 1.0
    PING 1.0 (1.0.0.0): 56 data bytes

anderskaseorg · on Dec 27, 2020

Did you really gain anything here, given that the omission of those 12 characters required a 38 character comment to explain what’s going on?

j1elo · on Dec 27, 2020

Absolutely no :)

What I wanted to express here (and did badly) is that crossing paths with this arcane Class-A style IP address is something so strange nowadays... in my case in more than 10 years professionally working as a developer, I had seen it exactly once and even then, didn't recognize it for what it was.

The code snippet was just an extra curiosity in case anyone found it useful.

gumby · on Dec 26, 2020

> So, it’s a de-facto standard that boils down to mostly “what did 4.2BSD understand?“

By the way 4.2BSD was being compatible with older or contemporary implementations, like ITS which was running TCP before any Unix was.

For example plenty of machines back then used octal as a preferred human representation. In fact that’s why octal is the default format of numeric constants in C: C, like Unix, was initially developed for an 18-bit (six octal digits) PDP-7. The smaller 16-bit PDP-11 version came later.

lucb1e · on Dec 26, 2020

"All possible notations of this IPv4 address" https://lucb1e.com/rp/php/funnip.php?link&ip=80.100.131.150

It was a surprising amount of work to figure out all the different formats an IP address can be shown in and convert a given IP into all those formats.

jsrcout · on Dec 27, 2020

That's impressive. And somewhat scary :-)

octoberfranklin · on Dec 27, 2020

How about the PGP word list? https://en.wikipedia.org/wiki/PGP_word_list

    $ ping stairway scavenger tracker upcoming

    PING 209.216.230.240 (209.216.230.240) 56(84) bytes of data.
    64 bytes from 209.216.230.240: icmp_seq=1 ttl=50 time=68.2 ms
    64 bytes from 209.216.230.240: icmp_seq=2 ttl=50 time=69.5 ms
    64 bytes from 209.216.230.240: icmp_seq=3 ttl=50 time=67.2 ms

ECBicalho · on Dec 28, 2020

Interesting article. If I understand correctly, the equivalent for the decimal 216 (hexadecimal D8) must be "stormy" or "stupendous".

And in this case "209.216.230.240" translates to "stairway stupendous tracker upcoming".

Thanks for sharing.

friend-monoid · on Dec 27, 2020

This is neat but I'm so disappointed by the chosen word list

phoe-krk · on Dec 26, 2020

> Fully canonically, :: is 0000:0000:0000:000:0000:0000:0000:0000.

Nitpick: missed a single zero in the middle there.

skissane · on Dec 26, 2020

The following comment "My apologies to trypophobic readers" makes me think that the mistake was intentional.

dave_universetf · on Dec 26, 2020

It wasn't, but I'm glad to have plausible deniability :) I fixed the typo.

Dagger2 · on Dec 28, 2020

Bigger nitpick: as per RFC 5952, canonically :: is ::. 0000:0000:0000:0000:0000:0000:0000:0000 is a valid way of writing the same address, but it's not the canonical way.
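For what it's worth, Python's `ipaddress` module exposes both spellings, which makes the distinction between "fully written out" and "canonical" easy to see. This is just a quick illustration, not a statement about any other parser.

```python
import ipaddress

unspecified = ipaddress.IPv6Address("0000:0000:0000:0000:0000:0000:0000:0000")

# The shortened form, which is what RFC 5952 treats as the canonical text.
print(unspecified.compressed)

# The fully expanded form: valid input, but not the canonical representation.
print(unspecified.exploded)

# The same applies to ordinary addresses.
print(ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001").compressed)
```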

keleftheriou · on Dec 26, 2020

You are a hero

jpxw · on Dec 26, 2020

As Go’s net package IP parsing was mentioned, here’s a fun fact: under their API it is impossible to distinguish between an IPv4-mapped IPV6 address and the equivalent normal IPv4 address.

daenney · on Dec 27, 2020

I find this to be a great feature. net.IPNet.Contains takes this into account, so you don’t have to worry about or deal with shenanigans like IPv4 mapped addresses. It makes implementing SSRF protection much easier.

strenholme · on Dec 26, 2020

Since I write a Lua-parsed DNS server which works with IPv6, even when compiled for an ancient version of MINGW on Windows XP (which has IPv6 support but no built-in IPv6 parser), I had to write an IPv6 address parser (no inet_pton(), which is what most programs use for IPv6 parsing, on that system).

No, I did not add dotted quad notation to the parser. No, you can not have more than four hex digits in a single quad; 00000001:2::3 is a syntax error. It supports “normal” stuff like ::, ::1, 2001:db8::1, and even non-normal stuff like “2001-0db8-1234-5678 0000-0000-0000-0005” (to be compatible with the really basic IPv6 parser I put in MaraDNS’s recursive resolver nearly two years ago), but does not support any of the IPv6 corner cases in the linked article.

The IPv6 test cases in the automated test for the parser are at: https://github.com/samboy/MaraDNS/blob/master/deadwood-githu... (The final three lines are supposed to return errors)
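A parser with the restrictions described above (no dotted quad, at most four hex digits per group, a single `::`) is short enough to sketch. The Python below is an illustration only: it is not the MaraDNS code, it does not handle the dash-separated form mentioned above, and the function names are invented.

```python
import string


def parse_ipv6_strict(text: str) -> bytes:
    """Parse colon-hex IPv6 text into 16 bytes, rejecting the corner cases."""
    if "." in text:
        raise ValueError("dotted-quad notation is not supported")
    if text.count("::") > 1:
        raise ValueError("'::' may appear at most once")
    if "::" in text:
        head, tail = text.split("::")
        head_groups = head.split(":") if head else []
        tail_groups = tail.split(":") if tail else []
        missing = 8 - len(head_groups) - len(tail_groups)
        if missing < 1:
            raise ValueError("'::' must stand for at least one group")
        groups = head_groups + ["0"] * missing + tail_groups
    else:
        groups = text.split(":")
        if len(groups) != 8:
            raise ValueError("expected eight groups")
    packed = bytearray()
    for group in groups:
        if not 1 <= len(group) <= 4 or any(c not in string.hexdigits for c in group):
            raise ValueError(f"bad group: {group!r}")
        packed += int(group, 16).to_bytes(2, "big")
    return bytes(packed)


def accepts(text: str) -> bool:
    """Convenience wrapper: True if the strict parser takes the input."""
    try:
        parse_ipv6_strict(text)
        return True
    except ValueError:
        return False


for text in ("::", "::1", "2001:db8::1", "00000001:2::3", "::ffff:1.2.3.4"):
    print(text, accepts(text))
```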

thomashabets2 · on Dec 26, 2020

I especially love it when address parsers on the same OS don't agree:

http://openbsd-archive.7691.n7.nabble.com/inet-net-pton-seem...

cnst · on Dec 26, 2020

> https://marc.info/?l=openbsd-bugs&m=124425104531501&w=2

Love it! No conversation about SUS is complete without Theo bashing up the absurdity of some historic bugs being documented as features. :-)

---

I do like the hex specification, though. Especially in the age of /29 and such, it's way easier to deal with space using such notation than the decimal numbers, which make little sense for network boundaries in such case. It looks like ping supports most of these (try `ping 0x08080808`, or `ping 0x08.0x080808`, but note that 0x0808.0x0808 is not valid, only 0x08.0x08.0x0808 would be), but `dig @` doesn't.

BTW, I guess this finally explains why the netmask is often shown as `inet 127.0.0.1 netmask 0xff000000` on the BSDs, which is actually a valid IP address notation, as it turns out!
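The link between a prefix length and the hex netmask the BSDs print can be sketched in Python. The helper names are made up for this example, and it only does the arithmetic; it says nothing about which tools accept hex input.

```python
import ipaddress


def netmask_hex(prefix_len: int) -> str:
    """Return the IPv4 netmask for a prefix length as a 32-bit hex literal."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return f"0x{mask:08x}"


def dotted_from_hex(mask_hex: str) -> str:
    """Read a hex literal as a big-endian 32-bit value and print it as a dotted quad."""
    return str(ipaddress.IPv4Address(int(mask_hex, 16)))


for prefix_len in (8, 29):
    mask = netmask_hex(prefix_len)
    print(prefix_len, mask, dotted_from_hex(mask))
```

With a /29 the hex form shows the boundary directly in the last digit (`f8`), whereas the decimal form needs a mental conversion of 248.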

proactivesvcs · on Dec 26, 2020

I'm not convinced these are "cursed". They may be the result of bygone networking conventions, implementation ideas that never came to mainstream fruition, flexibility for use-cases etc. Just because we don't understand something that looks strange, doesn't mean it's cursed, nor that one can simply turn one's nose up and say "I don't understand why these exist so I'll just ignore them when I implement x".

dave_universetf · on Dec 26, 2020

I can help here: these definitely aren't cursed, because curses aren't real. I was exaggerating for comic effect, because this was just a twitter rant that got out of control :)

That said, many of those representations no longer make sense in the modern world, and I'm actively choosing to not support them. That doesn't mean I don't understand why they came about in the first place, au contraire! I'm explicitly deciding that their historical reason for existing no longer applies.

proactivesvcs · on Dec 26, 2020

Thank you for the clarification, it does sound like you've done more background research than the linked blog entry may explain. Was this the result of simply reading the RFCs or did you come across other resources that expand on the obsolete IP address representations?

paledot · on Dec 27, 2020

I do think "curse" is a valid technical term, but to me a cursed IP address (or number, or edge case, etc.) is one that behaves significantly differently from other addresses for no self-evident reason. None of these examples are cursed, but 127.0.0.1 is definitely cursed.

AmericanChopper · on Dec 26, 2020

The bygone implementers clearly cursed us with their peculiar decision making. Just as we occasionally curse the implementers of the future (either knowingly or unknowingly) with our peculiar decision making.

mirthflat83 · on Dec 26, 2020

He’s joking with that word choice.

proactivesvcs · on Dec 26, 2020

Yes. I do understand the modern-day usage of the word "cursed" in this context :-)

secondcoming · on Dec 26, 2020

Well, adding support for hex or octal IP addresses was a bit much with hindsight!

Dylan16807 · on Dec 27, 2020

Refusing to implement them is probably more good than bad for most of these.

skeletonjelly · on Dec 26, 2020

I think they've got Class A/B/C wrong? Or at least they're using it in a way that I never learnt

> The familiar 192.168.140.255 notation is technically the “Class C” notation. You can also write that address in “class B” notation as 192.168.36095, or in “Class A” notation as 192.11046143. What we’re doing is coalescing the final bytes of the address into either a 16-bit or a 24-bit integer field.

According to this:

https://www.digitalocean.com/community/tutorials/understandi...

Which details my understanding, classes refer to the ranges, not so much grouping the latter part

Happy to be corrected!
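
For reference, the coalescing the quoted passage describes can be sketched in a few lines of Python. The function name is made up for illustration, and this only shows the arithmetic, not any particular parser's behaviour:

```python
def coalesced_forms(dotted: str) -> list:
    """Return the 4-, 3-, 2- and 1-part spellings of a dotted-quad IPv4 address."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return [
        f"{a}.{b}.{c}.{d}",                         # four 8-bit fields
        f"{a}.{b}.{(c << 8) | d}",                  # 8 + 8 + 16 bits
        f"{a}.{(b << 16) | (c << 8) | d}",          # 8 + 24 bits
        str((a << 24) | (b << 16) | (c << 8) | d),  # one 32-bit integer
    ]

print(coalesced_forms("192.168.140.255"))
# ['192.168.140.255', '192.168.36095', '192.11046143', '3232271615']
```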

voxic11 · on Dec 26, 2020

from the linked article

> Traditionally, each of the regular classes (A-C) divided the networking and host portions of the address differently to accommodate different sized networks. Class A addresses used the remainder of the first octet to represent the network and the rest of the address to define hosts. This was good for defining a few networks with a lot of hosts each.

skeletonjelly · on Dec 27, 2020

There you go, thanks! Should have properly read the article I linked. So it's been repurposed to be as OP's linked article states? Not so much ranges but the amount of bits in the netmask?

dfox · on Dec 27, 2020

It is the other way around: in the original classful internet, the numerical range of the first octet directly implied what CIDR calls the netmask length. The original IPv4 implementations probably did not even have a concept of a netmask; it was hardcoded instead. Implementing the routing decision as a netmask is a nice optimization, which then probably inspired the CIDR concept, because at a sufficiently high level the only thing you need for that to work is making the netmask (or at least its length) freely configurable.
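
As a rough illustration of "the first octet implies the netmask length", here is a small Python sketch of the classful rule. The function name is hypothetical, and this is the textbook rule, not code from any historical stack:

```python
def classful_prefix_length(dotted: str) -> int:
    """Return the network prefix length implied by the address class."""
    first_octet = int(dotted.split(".")[0])
    if first_octet < 128:   # leading bit 0     -> class A
        return 8
    if first_octet < 192:   # leading bits 10   -> class B
        return 16
    if first_octet < 224:   # leading bits 110  -> class C
        return 24
    raise ValueError("class D/E addresses have no classful network/host split")

print(classful_prefix_length("10.1.2.3"))         # 8
print(classful_prefix_length("172.16.0.1"))       # 16
print(classful_prefix_length("192.168.140.255"))  # 24
```

With CIDR the same prefix length is carried explicitly (the "/24") instead of being derived from the address itself.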

m463 · on Dec 27, 2020

A "fun" use of IP addresses is in NTP.

in the ntp config file, you will have stuff like this:

  server 127.127.1.0 # local clock

or:

  server 127.127.20.0 minpoll 4 iburst prefer  # gps clock

where the "ip address" is of the form: 127.127.<clocktype>.<instance>

here's a page explaining the clock types:

https://www.eecis.udel.edu/~mills/ntp/html/refclock.html

but basically it's a weird anachronism. I'm not sure if NTP will actually bind to those addresses using the TCP/IP stack, or if someone just got lazy and co-opted the IP address parser for off-label use.
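
To show how little is encoded there, a Python sketch that pulls the clock type and instance back out of such a pseudo-address. The function name is made up, and this says nothing about how ntpd itself handles these internally:

```python
def parse_refclock(address: str):
    """Split a 127.127.<clocktype>.<instance> pseudo-address into (clocktype, instance)."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or parts[:2] != [127, 127]:
        raise ValueError(f"{address!r} is not of the form 127.127.<clocktype>.<instance>")
    return parts[2], parts[3]

print(parse_refclock("127.127.1.0"))   # (1, 0)
print(parse_refclock("127.127.20.0"))  # (20, 0)
```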

kortilla · on Dec 26, 2020

What is the use-case of a decimal representation of a v6 address or a 32-bit int representation of an ipv4 address?

I’ve never had someone tell me, “see if you can ping 143267841”. I’ve worked in networking for coming up on 30 years now and just haven’t found the use.

soneil · on Dec 26, 2020

I suspect it's actually the other way around. On the wire, a v4 address is four bytes. uint_32 is the natural type for this. So when we start looking at cidr scopes, /24 means the first 24 bits of those 32. "The first 24 bits of 4 bytes" sounds wrong to me, "the first 24 bits of 32 bits" sounds logical.

So as I see it - 143267841 (or 0x88A1801) is the address, and quad-dotted decimal is a (slightly more) human-readable representation of it.
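
To make that concrete, a small Python sketch (helper name is mine) that treats the address as a plain integer, and the /24 as a mask over its top 24 bits:

```python
def to_dotted(addr: int) -> str:
    """Render a 32-bit integer as dotted-quad decimal, most significant byte first."""
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))

addr = 143267841
mask_24 = (0xFFFFFFFF << (32 - 24)) & 0xFFFFFFFF  # top 24 bits set

print(hex(addr))                  # 0x88a1801
print(to_dotted(addr))            # 8.138.24.1
print(to_dotted(addr & mask_24))  # 8.138.24.0
```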

gizmo686 · on Dec 26, 2020

>32-bit int representation of an ipv4 address

Internally, I would imagine that almost every IPv4 stack uses 32-bit ints to represent an address. It's not that crazy to think this would leak out somewhere.

I've written (un)parsers where we would just treat IPv4 addresses as integers because A) that is how they were treated in the binary data and B) given what we were doing with the data, we didn't actually care about the IPv4 field.

augusto-moura · on Dec 26, 2020

Serialization for non human readable code. Usually IPv4 addresses are stored as int32 in databases or memory

morelisp · on Dec 26, 2020

Perhaps rather "ideally" than "usually." I have worked on several codebases that wasted gigabytes of memory / traffic on this.

mordechai9000 · on Dec 26, 2020

Ugh. I would hate to see the code to enumerate a network, or calculate masks, or determine broadcast addresses without using unsigned ints.
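
For comparison, this is roughly what it looks like with unsigned 32-bit values in Python. The names are illustrative, and the sketch ignores the /31 and /32 special cases:

```python
def network_info(addr: int, prefix_len: int):
    """Return (mask, network, broadcast) for a 32-bit address and prefix length."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    network = addr & mask
    broadcast = network | (~mask & 0xFFFFFFFF)
    return mask, network, broadcast

def hosts(addr: int, prefix_len: int) -> range:
    """Enumerate usable host addresses (everything between network and broadcast)."""
    _, network, broadcast = network_info(addr, prefix_len)
    return range(network + 1, broadcast)

mask, network, broadcast = network_info(0xC0A88C11, 24)  # 192.168.140.17/24
print(hex(mask), hex(network), hex(broadcast))
# 0xffffff00 0xc0a88c00 0xc0a88cff
print(len(hosts(0xC0A88C11, 24)))  # 254
```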

augusto-moura · on Dec 26, 2020

I have, unfortunately. Bitwise operations are scary for some people, so they prefer working with 4 ints or, worse, using a string.

morelisp · on Dec 27, 2020

> 4 ints or worse, using a string

Or worse, in one case I've had to deal with both (plus another surprise twist):

    public class IP {
        byte[] value;
        // If true, value is a variable-length ASCII dotted-octet string;
        // if false, it is exactly 4 bytes, with the LSB in value[0].
        bool isString;
    }
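
A Python sketch of what every consumer of that class ends up writing, assuming "LSB in value[0]" means little-endian byte order. The function is hypothetical and not taken from the codebase in question:

```python
def decode(value: bytes, is_string: bool) -> int:
    """Turn either representation into one big-endian-style 32-bit integer."""
    if is_string:
        fields = [int(f) for f in value.decode("ascii").split(".")]
        if len(fields) != 4 or any(not 0 <= f <= 255 for f in fields):
            raise ValueError(f"not a dotted quad: {value!r}")
        return int.from_bytes(bytes(fields), "big")
    if len(value) != 4:
        raise ValueError("binary form must be exactly 4 bytes")
    return int.from_bytes(value, "little")  # least significant byte first

print(decode(b"192.168.140.255", True))              # 3232271615
print(decode(bytes([255, 140, 168, 192]), False))    # 3232271615
```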

Sharlin · on Dec 27, 2020

IPC at least. If you want to pass an IP address (whose natural native representation is a uint32) from program to program as text, having to format it as dotted decimal would be just unnecessary and inconvenient.

esnard · on Dec 26, 2020

Not the answer you're expecting I guess, but I've used it to bypass some anti-XSS filters.

jeffbee · on Dec 26, 2020

There is no use case. It's a meaningless outcome of the fact that `strtoul` is involved somewhere.

tomcooks · on Dec 26, 2020

Boomers like me know all of the IPv4 obfuscation techniques thanks to Fravia' Searchlores, may he forever rest in peace.

https://www.theoryforce.com/fravia/searchlores/obscure

kaoD · on Dec 26, 2020

Ohhh you brought so many memories. Fravia and +ORC marked my teenage reverse-engineering years.

Not a boomer myself (I'm just a poor millennial) but I was lucky enough to enjoy the early days of the internet.

May he rest in peace.

alasdair_ · on Dec 27, 2020

Not a boomer but still saddened every time I remember +Fravia is dead. I remember checking his site every day for most of 1995 and 1996.

tomcooks · on Dec 27, 2020

Same here. Pity that many things I learned through that jewel of a site have been rendered useless by the very same constant updates Fravia himself warned us against.

abotsis · on Dec 26, 2020

Wow, this. One thing I didn’t see mentioned was “0”. You mentioned it, but didn’t connect it to something I know works in some implementations: “ping 0” behaves like “ping 127.0.0.1”.

nealabq · on Dec 27, 2020

Maybe ping is treating 0 like 0.0.0.0 aka INADDR_ANY ( https://en.wikipedia.org/wiki/0.0.0.0 ). And interpreting it as all the IPv4 addrs mapped to the local machine (including localhost).

vzaliva · on Dec 27, 2020

That's why things like the textual representation of IP addresses need to be rigorously and formally specified using an unambiguous syntax notation. Implementations can then be formally verified to comply with this syntax spec. In the end, I would love to have a formally verified IP address parser library for the major mainstream programming languages, which everybody could rely on instead of writing their own parser. That's a dream.

jtvjan · on Dec 26, 2020

I wrote a little applet where you can put in a class A decimal IP address, and it gives you the 3×4 representations mentioned in the article: https://jtvjan.nl/tools/cursed_ipv4.html

If you count mixed representations, there would be 120 possibilities, but the tool doesn't generate those.

beaugunderson · on Dec 28, 2020

I maintain a JavaScript library that does exactly this (called ip-address). Unit tests are very important for handling the esoteric formats, though there are a couple that were new to me in David's post.

One of my motivations for writing the library was being able to grep for IPv6 addresses in text files; it's surprisingly difficult to match all valid representations of a simple IPv6 address as seen in the example here:

https://twitter.com/beaugunderson/status/527393872909828096

I also maintain a site for examining IPv6 addresses that may be useful to people working with IPv6:

http://v6decode.com/
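
To give a feel for why grepping is hard, here is a sketch using Python's standard `ipaddress` module (not the ip-address library above) showing several spellings that all denote the same address:

```python
import ipaddress

# Four textual spellings of one and the same IPv6 address.
spellings = [
    "2001:db8::1",
    "2001:0db8:0000:0000:0000:0000:0000:0001",
    "2001:DB8:0:0:0:0:0:1",
    "2001:db8::0.0.0.1",  # embedded dotted-quad for the last 32 bits
]

canonical = {ipaddress.IPv6Address(s) for s in spellings}
print(len(canonical))        # 1
print(canonical.pop())       # 2001:db8::1
```

A regex has to accept all of these (and more), which is why parsing and then comparing tends to be easier than matching text.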

peteretep · on Dec 27, 2020

Raging debate recently at our coworking space about if 24.7.365 is a valid IP (you can certainly ping it)

FreshFries · on Dec 27, 2020

It is a valid representation of the IP address 24.7.1.109. Which will be what you are pinging.
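
The arithmetic, as a Python sketch of the short-form rule (last field fills all remaining bytes). Function names are mine, and it handles decimal fields only, whereas real inet_aton-style parsers also accept octal and hex:

```python
def parse_legacy(text: str) -> int:
    """Parse a.b.c.d, a.b.c, a.b or a, letting the last field fill the remaining bytes."""
    fields = text.split(".")
    if not 1 <= len(fields) <= 4:
        raise ValueError("expected one to four fields")
    if not all(f.isascii() and f.isdigit() for f in fields):
        raise ValueError("decimal digits only in this sketch")
    *head, last = (int(f) for f in fields)
    last_bits = 8 * (4 - len(head))  # 8, 16, 24 or 32 bits for the final field
    if any(p > 255 for p in head) or last >= (1 << last_bits):
        raise ValueError("field out of range")
    addr = 0
    for p in head:
        addr = (addr << 8) | p
    return (addr << last_bits) | last

def to_dotted(addr: int) -> str:
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(to_dotted(parse_legacy("24.7.365")))  # 24.7.1.109
print(to_dotted(parse_legacy("127.1")))     # 127.0.0.1
```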

jweather · on Dec 27, 2020

I spent hours debugging an issue that boiled down to an IPv4 parser that treated leading zeroes as octal. Connections to 192.168.123.100 worked as expected. Connections to 192.168.123.034 went to 192.168.123.28. I thought for sure it was an issue in my TCP client code, which was handling connections to hundreds of different devices.

Guilty party was Poco::Net library if I recall correctly. I can maybe see this making sense if you provide four octal digits (0377), but not three, and I have a hard time believing anybody has ever used this on purpose.
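
For anyone who hasn't been bitten yet, a Python sketch of the C-style per-field rule (what `strtoul` with base 0 does). This is an illustration with a made-up function name, not Poco's actual code:

```python
def parse_field_c_style(field: str) -> int:
    """Interpret one field C-style: 0x.. is hex, a leading 0 is octal, else decimal."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    if len(field) > 1 and field.startswith("0"):
        return int(field, 8)
    return int(field, 10)

print([parse_field_c_style(f) for f in "192.168.123.034".split(".")])
# [192, 168, 123, 28]
```

So zero-padding an address for alignment silently changes which host you talk to.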

ccakes · on Dec 27, 2020

That’s correct behaviour fwiw - there are many ways to write valid IPv4 addresses which aren’t necessarily intuitive

SoSoRoCoCo · on Dec 26, 2020

> a big-endian unsigned 32-bit integer

This is how embedded stacks (lwIP) store IPv4. Didn't know browsers could respond to it, though.

Mixing IPv4 and IPv6 is just evil.

Sharlin · on Dec 27, 2020

It’s how any reasonable software represents IPv4 addresses. Dotted decimal is only for human convenience (and honestly, I’d argue that 0xDEADBEEF would be just as convenient, after all people turned out to handle HTML/CSS hex colors just fine!)

Sami_Lehtinen · on Dec 27, 2020

Reminds me of email addresses; most sites are doing it wrong. *

There clearly should be a common library to take care of these things, which are way too complex for most of developers.

* https://en.wikipedia.org/wiki/Email_address#Examples

sloshnmosh · on Dec 26, 2020

This dec to hex to ASCII online converter might be helpful:

https://www.rapidtables.com/convert/number/ascii-hex-bin-dec...

rkagerer · on Dec 26, 2020

This is great! If I'm honest with myself, one thing keeping me from configuring IPv6 as an option locally was the intimidating addresses. This is a great explainer, I finally feel like I "get it".

ChrisMarshallNY · on Dec 26, 2020

Might find this project interesting: https://github.com/RiftValleySoftware/RVS_IPAddress

intc · on Dec 27, 2020

Somewhat related: A simple IPv6 subnet calculator written in Lua: https://github.com/intc/ip6snetc

ipv4dhcp · on Dec 27, 2020

At what point is the format parsed? Is http://36475893 sent to the router or converted to 192.168.56.12 in the browser?

tzs · on Dec 27, 2020

I've long thought it would be amusing to arrange to have both the phone number xxx-yyy-zzzz and the IP addresses xxx.yyy.zzzz and xxx.yyyzzzz.

jancsika · on Dec 26, 2020

Where are the weirdo IPv4 forms used in practice?

capitainenemo · on Dec 26, 2020

I don't know if it counts as in practice, but I use the notation he chose not to parse quite a lot on internal networks.. ssh 10.0.0.123 is already a nice quick address to type out, but ssh 10.123 or ping 10.123 is even quicker. Works in all kinds of random things. Web browsers of course, but games work just fine too usually, if they hand it off to the system to look up.

ben0x539 · on Dec 26, 2020

I write 127.1 all the time when I'm too lazy to type 127.0.0.1. Then I'm sad when it doesn't work because the nearest ip address parser wasn't written in the previous millennium.

Oh, yeah, and 1.1 is the only DNS server address I memorized.

chrismorgan · on Dec 26, 2020

I use mtr 1.1 all the time. (Like, literally all the time, I normally have it running in the background so I can see whether it’s my computer’s wi-fi adapter, the wi-fi router or the local ISP that’s playing up this time.)

I remember it was a few days after they came out with 1.1.1.1 and 1.0.0.1 that it dawned on me that I could drop the zeroes. I’d been wondering why they hadn’t chosen 1.2.3.4, but once I realised 1.0.0.1 was just 1.1, it became fairly obvious why they had chosen it.

(P.S. mtr’s stripchart with latency information is super great for this sort of thing; I have MTR_OPTIONS=--displaymode=2 set in my environment.)

sillysaurusx · on Dec 26, 2020

8.8.8.8, 8.8.4.4, 1.1.1.1

I’m sad that there probably won’t be any memorizable addresses in ipv6.

tomc1985 · on Dec 26, 2020

I once used some of these weird notations to pack config data (mainly IP addresses) for remote installations into a product-key-like string that field techs could receive over the phone

kpcyrd · on Dec 27, 2020

It would be nice if we could deprecate some of them instead of embracing those "cute" standards.

erk__ · on Dec 26, 2020

They wrote it into a blog that may be nicer to read https://blog.dave.tf/post/ip-addr-parsing/

dang · on Dec 26, 2020

Changed from https://twitter.com/dave_universetf/status/13426858222863605... above. Thanks!

bryant · on Dec 26, 2020

Can we set the title accordingly? "Fun with IP address parsing"

dang · on Dec 26, 2020

Ah yes. Changed. Thanks!

known · on Dec 27, 2020

https://www.php.net/manual/en/function.ip2long and https://www.php.net/manual/en/function.long2ip.php in PHP

londons_explore · on Dec 26, 2020

Writing a parser and saying "I'm dropping support for all these old ways of doing things" seems like poor form.

Unless there is a big reason, never drop backwards compatibility. In this case, supporting all those forms would be very do-able. The best way to support them would be to find some old BSD parsing code and port it, then you can be sure every corner case is handled the exact same way. Handling corner cases differently is a great way to introduce security vulnerabilities and crash/DoS bugs that every user of your library will have to be aware of.

Maintaining such code isn't really a good excuse here either - the code is only going to be a few thousand lines, is self contained with no dependencies, is easy to test, not going to change much with time, etc.

Basically, there is no benefit to removing this feature, so don't break what isn't broken.

psim1 · on Dec 26, 2020

There is a good reason: many of the unusual forms are unused except as tricks and exploits. The whole internet uses IPv4 classless routing. There is no value in keeping pre-CIDR forms. Graybeards might object because they have been typing "127.1" for forty years. It's merely an old habit. Who is to say how big a reason is required to "never drop backwards compatibility"?

strenholme · on Dec 26, 2020

The way to handle security problems with corner cases is to just return a parse error if something unusual is seen. With security, the rule is to be conservative with what you accept; anything unusual should be rejected.

In cases where backwards compatibility is needed, just use inet_pton() and let the libc maintainers deal with the bug reports (I believe inet_pton() dropped octal and hex support for ipv4 addresses)
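
A minimal Python sketch of the "reject anything unusual" approach, assuming the only accepted form is four decimal fields without leading zeros (the function name and exact rules are my choice):

```python
def parse_strict(text: str) -> int:
    """Accept only d.d.d.d with decimal fields 0-255 and no leading zeros."""
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("expected exactly four fields")
    addr = 0
    for field in fields:
        if not (field.isascii() and field.isdigit()):
            raise ValueError(f"non-decimal field: {field!r}")
        if len(field) > 1 and field[0] == "0":
            raise ValueError(f"leading zero in field: {field!r}")
        value = int(field)
        if value > 255:
            raise ValueError(f"field out of range: {field!r}")
        addr = (addr << 8) | value
    return addr

print(parse_strict("192.168.140.255"))  # 3232271615
# "127.1", "192.168.123.034", "0x7f.0.0.1" and "3232271615" all raise ValueError.
```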

secondcoming · on Dec 26, 2020

> I believe inet_pton() dropped octal and hex support for ipv4 addresses

Correct.

It also doesn't support truncation unlike inet_aton. e.g. inet_aton considers "1.2.3" and "1.2.0.3" to be the same address.
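
You can poke at the difference from Python, whose `socket.inet_aton` and `socket.inet_pton` wrap the platform's C functions. The results depend on your libc, so treat this as something to try rather than a guarantee:

```python
import socket

def pton_accepts(text: str) -> bool:
    """Report whether the platform's inet_pton accepts text as an IPv4 address."""
    try:
        socket.inet_pton(socket.AF_INET, text)
    except OSError:
        return False
    return True

# Compare the truncated and the full spelling through inet_aton.
print(socket.inet_aton("1.2.3") == socket.inet_aton("1.2.0.3"))
# Check whether inet_pton takes the truncated spelling at all.
print(pton_accepts("1.2.3"), pton_accepts("1.2.0.3"))
```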

bnjms · on Dec 27, 2020

Disagree. We want the four octet form to remain since it mirrors the four octet wildcard form which does not have an equivalent CIDR form.
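
To illustrate, as I read the point: a mask written as four octets can have holes in it, while a CIDR prefix length can only describe a contiguous run of leading 1 bits. A Python sketch of that check (function name is made up):

```python
def is_cidr_expressible(mask: int) -> bool:
    """True if the 32-bit mask is a run of 1 bits followed only by 0 bits."""
    inverted = ~mask & 0xFFFFFFFF
    # A contiguous mask inverts to 2**k - 1, and x & (x + 1) == 0 only for such x.
    return (inverted & (inverted + 1)) == 0

print(is_cidr_expressible(0xFFFFFF00))  # 255.255.255.0 -> True (it is /24)
print(is_cidr_expressible(0xFFFF00FF))  # 255.255.0.255 -> False (no prefix length fits)
```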