Ha, weirdly enough I was thinking about this recently! I worked with big Pharma in the 2010s (so: old versions of IE, weird intranet-related security policies) and this issue took several days of my life.
I was building a realtime video player in this new cool thing called HTML5 at the time. HTML5 video support was (/still is, to some extent) a huge PITA due to the uneven codec support, minor implementation differences etc... So naturally I spent days troubleshooting that... Turns out `_` in domain names were not handled properly by some versions of IE running on intranet.
The only more annoying bug like this I can remember now is fighting a suspected DoS attack on our servers when I worked in publishing. The culprit: our Android dev contractor decided to save some time by skipping the push notifications meant to trigger reading the content and using the alarm service instead, thus making sure that almost every Android user fired a bunch of expensive requests at the same effin' time. Happy times.
> domain names were not handled properly by some versions of IE running on intranet
Your choice of words implies there was a bug in IE, but there likely was not. People quickly forget that, before Google dumbed down the landscape with its Unibrowser and forced its ways and whims upon the world (as it continues to do), IE had user-accessible settings for EVERYTHING.
IE also had the concept of security zones with different settings for each zone including Intranet, a feature ahead of its time - quirkily enough and nearly undocumented, Chrome on Windows quietly uses IE's security zones for a few purposes of its own.
Indeed, there are separate settings in IE for "Send IDN host names for [non] Intranet URLs" with different default values depending on whether the URL is in the Intranet zone or not, and the _ was likely interpreted as part of an internationalized domain name, triggering this setting. Why does this setting exist? Knowing Microsoft, almost surely for some backwards-compatibility edge case.
Reminds me of a similar IE issue, I can't recall whether you were capped to 2 or 4 parallel requests max while on the intranet and using a VPN (or some similar nonsense, the memory escapes me), but those were not fun performance issues to debug.
The real question is why would anything allow you to use an illegal hostname?
I first ran into this problem 20 years ago, and have kept running into it ever since. Usually *nix BIND DNS servers warn you against self-inflicted gunshot wounds, but Windows DNS gladly lets you do so. Over the years I've figured out that this was largely due to silly Windows NT4 admins who built domains, hostnames, and other things with underscores. When those setups evolved to Active Directory with Windows 2000, which used real DNS, Microsoft would have forced admins all over (and should have) to rebuild their domains from scratch to conform to the DNS RFCs if it hadn't let them keep their underscores.
The easy (Microsoft) answer: "Extend" the standard to allow underscores for AD DNS... And allow bad habits to extend perpetually and indefinitely, world and standards be damned. As usual, thanks for all the fish Microsoft.
I've run into several organizations over the years that use Windows domains with underscores in them, and every time I laugh, usually explaining to them that it's technically illegal and breaks compatibility with, oh, everyone but them. It's usually shrugged off: because it's "internal only", they don't have to care. At least until they set up external partner and extranet networks outside of their own.
Any Unix/Linux folks who have ever set up a BIND server themselves have probably run into this fact, but Windows admins tend to remain blissfully ignorant about the evils they do.
Similarly, during the 2000s, Microsoft recommended the use of .local as a top-level domain for internal networks. This domain name is actually reserved by the IETF for multicast DNS so this practice has also been the source of many interoperability problems.
Part of the problem is that NETBIOS allowed _ in hostnames but forbade - and Microsoft didn’t address the incompatibility by doing any hostname character translation between the Internet and NETBIOS worlds.
Exactly, and what lazy lug of a windoze admin in 2000 wanted to rebuild NT4 domains to a whole new naming standard that actually complied with the choice to use "modern" DNS instead of Netbios? Those admins would have been at Microsoft's door with pitchforks and torches for making work for them, so instead they capitulated and relaxed the standard.
A funny story is that my father's legal first name is Nil. He moved to the US in the late 70s. Computer systems in the 80s would register the input as null and he had to then temporarily change his name to include his middle name. Corner cases are fun.
how about a first name with a dash in it (compound first names are very very common in France... Jean-Pierre, Jean-Bernard, Pierre-Alexandre, etc...) ?
I'd rather see underscore as standard than hyphen. Underscore to me is a space that is not a space because a space might be a problem or misleading. Hyphens on the other hand are more complicated and have various different meanings. Just my 2 cents.
Once you get with the new gods, it can feel limiting and antiquated to go back.
But you might smile and feel a little nostalgic seeing an old timer plowing data in the modern era, just be careful not to spook him with any of your new runes.
I refute thee. My nickname of choice has underscore in it, or at least a minus/hyphen. At the very worst, I'll settle on a period or a space, but some punctuation mark has to be there.
edit: I felt something was off and double-checked. I had misremembered the rule on dashes: consecutive dashes in URLs are apparently fine, and it depends on whether the registrar allows them or not
=================================
yes, non-punycode domains may not start or end with dashes and must not have consecutive dashes
the consecutive dashes in punycode were done on purpose so that the new internationalizing punycode domains couldn't conflict with any existing domain.
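For anyone who wants to see that prefix in action, here is a minimal sketch using Python's built-in `idna` codec. Note that the codec implements the older IDNA 2003 rules, so take it as an illustration rather than a reference encoder. The helper name `to_ace` is made up for this example.

```python
def to_ace(domain: str) -> str:
    """Encode a Unicode domain into its ASCII (punycode) form.

    Uses the stdlib "idna" codec, which follows IDNA 2003.
    """
    return domain.encode("idna").decode("ascii")


ace = to_ace("münchen.de")
first_label = ace.split(".")[0]

print(ace)                      # xn--mnchen-3ya.de
print(first_label[2:4] == "--")  # True: the dashes sit in positions 3 and 4
```

Plain ASCII labels pass through unchanged, so only the internationalized ones pick up the `xn--` prefix.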
One of my favourite support calls was a sev 1, database wasn't happy & it was involved in call setup so nobody could make calls on the mobile network. It hadn't failed over because of a HA quirk.
Somebody had been working on the box logged in as root and accidentally changed the hostname to "-", which royally messed things up.
OMG one of the classic Linux/Solaris footguns! On Linux, hostname queries the host’s name; on Solaris, hostname sets the host’s name. If (say) you have a favourite shell prompt that includes the hostname, and your PS1 line is old enough to predate bash’s \h escape and/or is designed to work with other shells, it might contain `hostname -s`. Carelessly copy your shell prompt to a Solaris box, and fun ensues when it changes its name to -s ! Hopefully you are suspicious enough to notice your prompt has gone weird and discover the problem before breaking a telephone service.
I remember this from work, Windows networking accepts underscores apparently just fine for file shares and I think browsers can access it as well (not sure), but Java didn’t want to receive connections under that name because it was illegal.
I think there are multiple RFCs trying to define hostnames and one of them lifted restrictions on digits-only labels because TLDs contain at least one letter so there’d be no confusion with IP addresses. However I was not able to find out who guarantees that there is at least one letter on the TLD level.
Country-code TLDs are defined as ISO 3166-1 alpha-2 [1] (with the grandfathered exceptions .ac and .uk). Generic TLDs are subjected to a DNS stability check before approval [2] which, amongst other things, requires that they be either made up of characters a-z only or be valid IDNs.
Yes ccTLDs are not a problem. But the ICANN rules you linked are even more restrictive, they don’t allow digits at all on the top level (see bottom of page 2-13, unless it’s an IDN) and say it’s due to the referenced RFCs but I don’t see how it follows from them.
The seminal definition (must start with a letter) was from RFC 952. Then it was lifted in RFC 1123: “One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. […] If a dotted-decimal number can be entered without such identifying delimiters, then a full syntactic check must be made, because a segment of a host domain name is now allowed to begin with a digit and could legally be entirely numeric (see Section 6.1.2.4 [sic]). However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.”
Note that section 6.1.2.4 is completely irrelevant, it looks like a referencing mistake. The later RFC 3696 then calls the rule the LDH rule and says: “There is an additional rule that essentially requires that top-level domain names not be all-numeric.”
But it never says where such a rule comes from.
RFC 2181 references section 6.1.3.5 of RFC 1123 (maybe that’s an old 6.1.2.4) but it doesn’t state that rule either.
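To make the rules quoted above concrete, here is a rough Python sketch of a hostname check: LDH labels, a leading digit allowed per RFC 1123, and an all-numeric top-level label rejected per the RFC 3696 wording. The function name `is_ldh_hostname` is made up, and the 253-character overall cap is the commonly cited limit for the textual form, used here as an assumption rather than something taken from this thread.

```python
import re

# One LDH label: letters, digits and hyphens, 1-63 characters,
# with no hyphen at the start or the end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name: str) -> bool:
    """Rough hostname check following the rules discussed above."""
    # A fully qualified hostname may carry a single trailing dot.
    if name.endswith("."):
        name = name[:-1]
    # Assumption: 253 characters as the overall limit for the text form.
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if not all(_LABEL.fullmatch(label) for label in labels):
        return False
    # "Top-level domain names not be all-numeric" (RFC 3696 wording).
    return not labels[-1].isdigit()


print(is_ldh_hostname("3com.com"))             # True
print(is_ldh_hostname("my_host.example.com"))  # False
print(is_ldh_hostname("1.2.3.4"))              # False
```

The last check is the only thing keeping a dotted quad like `1.2.3.4` from passing as a hostname, which is why it matters where that rule is actually specified.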
> The seminal definition (must start with a letter) was from RFC 952. Then it was lifted in RFC 1123: “One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit.
This relaxation was for 3Com because you couldn’t very well exclude Bob Metcalfe of all people in your networking protocol! I remember there was a bit of chatter at the time about a domain beginning with a digit.
It would still create ambiguities for resolvers that use search zones. Also... there are hexadecimal digits and all kinds of acceptable IP address representations. You can ping 0xbabecafe or visit it in your browser (by specifying the full URL).
Now if it was mandatory to use, say, brackets (like we do for email) or something...
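To show what a single-number address like that works out to, here is a small Python sketch using the stdlib `ipaddress` module. The helper name `single_number_ipv4` is made up, and it only handles the hex and decimal single-number forms, not the octal or mixed dotted variants that some resolvers also accept.

```python
import ipaddress


def single_number_ipv4(text: str) -> str:
    """Convert a single-number IPv4 literal (0x-hex or decimal) to dotted-quad."""
    base = 16 if text.lower().startswith("0x") else 10
    value = int(text, base)
    # IPv4Address raises an error if the value does not fit in 32 bits.
    return str(ipaddress.IPv4Address(value))


print(single_number_ipv4("0xbabecafe"))  # 186.190.202.254
print(single_number_ipv4("2130706433"))  # 127.0.0.1
```

So a string with no dots at all can still be an IP address, which is part of why the "at least one letter in the TLD" rule alone does not remove every ambiguity.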
Defined in RFC 1123 (Requirements for Internet Hosts - Application and Support) and updated in RFC 5321 (Simple Mail Transfer Protocol) to call out the removal of underscores in hostnames when specifically dealing with SMTP.
Note that mail domains are subtly different from hostnames:
- fully qualified hostnames can have a trailing dot; mail domains must never have a trailing dot
- hostnames can be unqualified (dotless), but RFC 2821 accidentally forbade dotless mail domains so they are even more of an interop minefield than you might expect
- NETBIOS hostname conventions (- forbidden, _ allowed) tend to leak into internet hostnames more than mail domains
Both are subsets of domain names. I have found it useful to be pedantic about the differences, but I have worked as a postmaster and hostmaster …
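As a rough illustration of the first two differences in that list, here is a Python sketch that checks only the shape of a mail domain: no trailing dot and at least one dot. The function name `has_mail_domain_shape` is made up, and it deliberately ignores the character rules (underscores, hyphens and so on).

```python
def has_mail_domain_shape(domain: str) -> bool:
    """Check only the two shape rules from the list above."""
    # Mail domains must never end with a dot, unlike fully qualified hostnames.
    if domain.endswith("."):
        return False
    labels = domain.split(".")
    # Dotless domains are the interop minefield, so require at least two
    # labels, and reject empty labels such as the one in "a..b".
    return len(labels) >= 2 and all(labels)


print(has_mail_domain_shape("example.com"))   # True
print(has_mail_domain_shape("example.com."))  # False
print(has_mail_domain_shape("localhost"))     # False
```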
The logic is that a large proportion of mail servers will reject messages as specified in that paragraph, so if you want to avoid mysterious delivery failures, your mail software and its configuration should follow suit - better to discover the problem sooner rather than later.
My parents gave me an underscore in my name. They wanted to teach me a lesson that life is hard and outdated computer systems like the one used for plane tickets will never be kind to me.
I think they were inspired by Johnny Cash’s “A Boy Named Sue” but felt that the whole giving a boy a girl’s name trope hasn’t aged well (and besides, it’s trendy now anyway).
> Why is it that every single input in AWS has some arbitrary character limitations?
Because AWS is not one thing.
It's a federation of products built by fabled “two pizza teams” relatively independently for uncoupled velocity, without a heavy dose of "enterprise architecture" imposed on it.
That's the point. It's OK to have independence and local decision making authority. At the same time, some amount of centralized guidance is needed to make the combined product or service feel good.
AWS is a group of largely-independent teams.
It's also very much a "thing". Pretty sure all the AWS teams have the same logo on their t-shirts, get yelled at by the same HR department, and get paid by the same company.
Amazon doesn't even have a QA department, so what's there to be proud of in terms of their software engineering practices? The customers are literally the QA. This is not hyperbole.
Modern reality: see what horrible horrible things modern day macOS / iOS will do to a device’s ‘friendly name’ in order to turn it into a conservative hostname.