
Buying a single character domain – and 3 character FQDN – for £15 - edent
https://shkspr.mobi/blog/2020/08/buying-a-single-character-domain-and-3-character-fqdn-for-15/
======
chrismorgan
I must quibble on points of technical precision (because otherwise the article
is both amusing and interesting): the title is pretty much entirely incorrect.

I’ll pick on the FQDN part first, because it’s the (only) part that is
unequivocally wrong. FQDN is a very specific technical term in the domain name
system. A _fully qualified_ domain name includes a trailing dot, so Ⅷ.ﬁ. would
be four characters, even if Ⅷ and ﬁ _were_ the actual labels. But they’re not:
DNS is strictly ASCII-only, so this normalisation is happening at a higher
level (as the OP notes in another response here, tools are applying IDNA2008,
per RFC5895). The FQDN is viii.fi., which is eight characters long.
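To see the mapping chrismorgan describes, here is a minimal Python sketch. It assumes Python's built-in `idna` codec. That codec implements the older IDNA2003 rules (RFC 3490 with nameprep), not the IDNA2008/UTS #46 processing that browsers use, but for these two particular characters both end up at the same ASCII name. `ascii_fqdn` is a hypothetical helper name.

```python
# Map a display-form domain name to its ASCII (DNS) form and measure the FQDN.
# Caveat: Python's "idna" codec implements IDNA2003, not IDNA2008/UTS #46.

def ascii_fqdn(display_name: str) -> str:
    """Return the ASCII form of display_name, with the trailing root dot."""
    ascii_name = display_name.encode("idna").decode("ascii")
    # Normalise to exactly one trailing dot, i.e. the fully qualified form.
    return ascii_name.rstrip(".") + "."


fqdn = ascii_fqdn("\u2167.\uFB01")  # "Ⅷ.ﬁ" written as escapes
print(fqdn, len(fqdn))  # viii.fi. 8
```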

Next I deny the claim that it’s a single-character domain. Perhaps I’m getting
petty here, but even if people do colloquially speak of example.com as a
seven-letter domain, counting only the label at the level you register the
domain, so that I would grudgingly allow Ⅷ.ﬁ to be considered a single-character
domain (per proletariat vernacular), the domain name that was _“bought”_ was
not that, but viii.fi, which is a four-letter domain. Hair splitting is fun.

But my pettiness knows no bounds. Domain names aren’t _bought_, they’re
registered for such-and-such an amount per annum. And I bet it wasn’t exactly
£15.00 that was paid.

:-)

———

I got thinking about other TLDs that would work, and ℡ (TEL → .tel) and № (No
→ .no) occurred to me off the top of my head. Haven’t seen a .tel domain name
in yonks. I never did quite see the point of .tel.

~~~
edent
I appreciate your pettiness! I have edent.tel

I also have a list of TLDs that can be shortened by this process.
[https://shkspr.mobi/blog/2018/11/domain-hacks-with-unusual-u...](https://shkspr.mobi/blog/2018/11/domain-hacks-with-unusual-unicode-characters/)

And, yes, £15.30. But let's not quibble :-)

~~~
nwellnhof
The explanation in your blog post isn't quite correct. The conversion of "Ⅷ"
to "VIII" happens when Unicode text is normalized using a "compatibility"
mapping, that is, normalization forms NFKC or NFKD, not when text is
lowercased. While IDNA2003 required a custom mapping based on NFKC, IDNA2008
doesn't specify a mapping phase. It does allow custom mappings, though, and it
seems that browsers apply NFKC at some point in the process. See
[https://unicode.org/reports/tr46/#Mapping](https://unicode.org/reports/tr46/#Mapping)
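The difference nwellnhof points out is easy to check with Python's standard `unicodedata` module. Lowercasing swaps Ⅷ for another single code point, and only compatibility normalization (NFKC) expands it into plain letters. The last line combines NFKC with case folding, which is only a rough approximation of the UTS #46 mapping, not the real mapping table.

```python
import unicodedata

numeral = "\u2167"  # ROMAN NUMERAL EIGHT, displayed as Ⅷ

# Lowercasing maps it to another single code point: U+2177, SMALL ROMAN NUMERAL EIGHT.
lowered = numeral.lower()

# Compatibility normalization (NFKC) is what expands it into ASCII letters.
expanded = unicodedata.normalize("NFKC", numeral)

# Rough approximation of a UTS #46-style mapping: NFKC plus case folding.
mapped = unicodedata.normalize("NFKC", numeral).casefold()

print(f"lower: U+{ord(lowered):04X}")  # lower: U+2177
print("NFKC:", expanded)               # NFKC: VIII
print("mapped:", mapped)               # mapped: viii
```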

------
as1mov
The more I learn about text representation and Unicode, the more it looks like
a complete clusterfuck, and it boggles my mind that somehow all this works
almost perfectly while hiding all the complexities from the end user.

I suppose this is inevitable when you're tasked with representing literally
every symbol in existence. You couldn't pay me enough to touch this problem
with a ten-foot pole (this and text rendering).

~~~
gumby
It’s not a clusterfuck and IMHO it’s an unfair characterization. It is
insanely complicated and shouldn’t be touched except when wearing appropriate
hazmat gear.

Writing seems simple (children do it routinely), but like a biological system
it evolved over millennia in a ton of different directions. It’s coupled with
emotional, practical, and even, yes, moral issues that operate on both deeply
personal and social levels. This is hard to capture in software.

Unicode made a couple of hard decisions right up front. I hate them, but they
were smart, and Unicode would not have survived had they not made them. One was
round-trip compatibility with legacy character sets, which meant encoding a lot
of redundant characters (English and German “A” have the same code point, but
Greek “A” and Russian “A” do not, nor does an “A” that appears in a Japanese
code table). Second was abandoning attempts at Han unification, which had its
own linguistic, emotional and political issues.
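The distinct "A"s gumby lists can be inspected with Python's `unicodedata` module. The fullwidth A (U+FF21) is the kind of compatibility character that exists so East Asian code tables can round-trip. NFKC folds it back to Latin A but leaves the Greek and Cyrillic letters alone, because those are separate letters, not compatibility variants.

```python
import unicodedata

# Latin A, Greek capital Alpha, Cyrillic capital A, and fullwidth Latin A.
SAMPLES = ["A", "\u0391", "\u0410", "\uFF21"]

# Map each sample to its NFKC form.
results = {ch: unicodedata.normalize("NFKC", ch) for ch in SAMPLES}

for ch, nfkc in results.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKC U+{ord(nfkc):04X}")
```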

People are complicated, and so are their languages, so wrestling the whole
thing into a tractable system has been worth the effort.

~~~
canjobear
> abandoning attempts at Han unification

Huh? Han unification happened.

~~~
gumby
It is quite different from what was originally proposed, but you are right: I
should not have phrased it that way.

------
dweekly
For brevity and wit, the most impressive email address I ever saw was up@3.am
- and the most impressive website was [http://ws./](http://ws./)

Sadly, ws. doesn't seem to be serving right now. I had no idea a ccTLD could
vend an A record, but there you go. I guess technically that means the root
servers themselves could vend A records, which would let you have the ultimate
website at [http://./](http://./)

~~~
macintux
A friend of mine, Garrett Smith, has g@rre.tt, which always impressed me.

I’m still waiting for someone to launch a .ux domain so I can grab macint.ux

~~~
chrismorgan
Two-letter TLDs are reserved for country codes (ccTLD), so barring policy
change and ignoring minor subtleties in the actual rules, .ux will only happen
if a new country comes around and gets assigned the ISO 3166-1 alpha-2 country
code “ux”.

A more probable course of events would be “tux” being registered as a new
gTLD.

~~~
dweekly
Well, for the low low price of $185k and a bunch of paperwork you could be the
proud new owner of .tux:

[https://newgtlds.icann.org/en/applicants/agb/guidebook-full-...](https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf)

And it looks like that TLD isn't currently issued:

[https://data.iana.org/TLD/tlds-alpha-by-domain.txt](https://data.iana.org/TLD/tlds-alpha-by-domain.txt)
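That IANA file is plain text: a `#` header line followed by one TLD per line in upper case, with IDN TLDs in their `XN--` form. Here is a minimal Python sketch that checks a TLD against it. `delegated_tlds` is a hypothetical helper, and the sample below is a made-up excerpt in the same shape, not the real file.

```python
def delegated_tlds(listing: str) -> set:
    """Parse the text of tlds-alpha-by-domain.txt into a set of lowercase TLDs."""
    tlds = set()
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the version header and any blank lines
        tlds.add(line.lower())
    return tlds


# Made-up excerpt: a "#" header, then one TLD per line.
SAMPLE = "# Version header (made up for this example)\nFI\nTEL\nWS\n"
tlds = delegated_tlds(SAMPLE)
print("tux" in tlds)  # False
print(sorted(tlds))   # ['fi', 'tel', 'ws']
```

Against the real list, you would pass in the downloaded text instead, for example from `urllib.request.urlopen(url).read().decode("ascii")`. The answer only reflects the snapshot you fetched.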

~~~
macintux
I oft rue my failure to be vastly wealthy.

------
mcherm
Congratulations -- that's a really clever hack. Exactly the kind of thing I
love to read about on this site.

------
dahfizz
You seem to use "character" and "codepoint" interchangeably, but it's
important to note that you are not saving on bytes.

You mention this can be used to avoid filters, so I guess this is specifically
to trick "string".length?

~~~
edent
You are quite right. Ⅷ in UTF-8 is 0xE2 0x85 0xA7

That's still shorter than V I I I though.
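A quick Python check of the sizes being compared here. It assumes the filter in question behaves like JavaScript's `.length`, which counts UTF-16 code units. Ⅷ sits in the Basic Multilingual Plane, so it counts as 1 there, even though it needs 3 bytes in UTF-8.

```python
numeral = "\u2167"  # Ⅷ

code_points = len(numeral)                          # what Python's len() counts
utf16_units = len(numeral.encode("utf-16-le")) // 2 # what JavaScript's .length counts
utf8_bytes = numeral.encode("utf-8")                # what goes over the wire in UTF-8

print(code_points)                   # 1
print(utf16_units)                   # 1
print(utf8_bytes.hex(" "))           # e2 85 a7
print(len("VIII".encode("utf-8")))   # 4
```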

------
jpxw
Is the other character you hint at ㎉?

~~~
edent
Yes!

------
rakic
How about $12, like [https://xn--bj8a.com](https://ꑮ.com) or ꑮ.com (which
currently resolves in Safari on macOS and iOS only).
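The `xn--bj8a` form that HN shows is the ASCII-compatible encoding of the label: `xn--` followed by the Punycode (RFC 3492) encoding of the Unicode text. Here is a minimal sketch using Python's built-in `punycode` codec. It is only the bare encoding step: the hypothetical `to_ace` helper skips IDNA mapping and validity checks entirely, so it will happily encode labels a registry or browser would reject.

```python
def to_ace_label(label: str) -> str:
    """Encode one label as 'xn--' + Punycode. No IDNA mapping or validation."""
    if label.isascii():
        return label  # pure-ASCII labels are left unchanged
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ace(name: str) -> str:
    """Apply to_ace_label to every dot-separated label."""
    return ".".join(to_ace_label(part) for part in name.split("."))


print(to_ace("\uA46E.com"))  # xn--bj8a.com
```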

~~~
mdpye
I can highlight the domain and open it fine in Firefox on Android. It doesn't
render as the Unicode glyph in the address bar after resolution, though.

------
bugmen0t
If you search regularly, you'll find some cheap two-dot-two domains that are
ASCII. A bit longer in string length, but shorter on the wire.

I own 0e.vc, which is on GitHub as a general-purpose XSS domain if you need
it. IIRC it does eval(window.name), or location.hash. Whatever works for you.
It’s also on the public suffix list, which makes it almost like a top-level
domain for security purposes. So I can have subdomains that can’t ever share
cookies :-))
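Why that blocks cookie sharing: browsers roughly treat "public suffix plus one label" (the registrable domain) as the boundary a cookie may be scoped to. If 0e.vc is itself a suffix, then a.0e.vc and b.0e.vc are different registrable domains. Here is a minimal Python sketch of the longest-match lookup. It assumes a hypothetical two-entry excerpt of the list, and it ignores the real list's wildcard (`*.`) and exception (`!`) rules.

```python
from typing import Optional, Set


def registrable_domain(host: str, public_suffixes: Set[str]) -> Optional[str]:
    """Return public suffix + one label, or None if host is itself a suffix.

    Simplified: ignores the Public Suffix List's wildcard and exception rules.
    """
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in public_suffixes:
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No rule matched: the default rule treats the last label as the suffix.
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


# Hypothetical excerpt of the list, for illustration only.
SUFFIXES = {"vc", "0e.vc"}

for host in ["a.0e.vc", "b.0e.vc", "www.example.vc"]:
    print(host, "->", registrable_domain(host, SUFFIXES))
```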

~~~
zjs
> It’s also on the public suffix list

A bit of a tangent, but: how and why?

~~~
bugmen0t
How: File an issue at
[https://github.com/publicsuffix/](https://github.com/publicsuffix/)

Why: It's fun. Also, web security testing gets easier when you can make pages
for all likely and unlikely scenarios (cross-site/domain/origin).

------
jpxw
Technically, the shortest domain name is 2 letters:

[http://dk](http://dk)

~~~
snthd
Is http://. _possible_?

~~~
unicodepepper
Apparently it is possible to do it with local DNS resolution, using your hosts
file. Not sure how possible it would be on a remote DNS server, or whose
authority you'd need to actually do it.

  [root@host ~]$ curl -v http://.
  * About to connect() to . port 80 (#0)
  * Trying 127.0.0.1...
  * Connected to . (127.0.0.1) port 80 (#0)
  > GET / HTTP/1.1
  > User-Agent: curl/7.29.0
  > Host: .
  > Accept: */*
  >
  < HTTP/1.1 400 Bad Request
  < Date: Sat, 15 Aug 2020 15:38:54 GMT
  < Server: Apache
  < Content-Length: 347
  < Connection: close
  < Content-Type: text/html; charset=iso-8859-1
  <
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>400 Bad Request</title>
  </head><body>
  <h1>Bad Request</h1>
  <p>Your browser sent a request that this server could not understand.<br />
  </p>
  <p>Additionally, a 400 Bad Request
  error was encountered while trying to use an ErrorDocument to handle the
  request.</p>
  </body></html>
  * Closing connection 0

------
beeforpork
What about ﷺ? Arabic letters in domains should be OK, right? It decomposes
into a lot(!) of characters.
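To put a number on "a lot", here is a small Python check using only the standard `unicodedata` module. NFKC expands U+FDFA into a phrase of 18 code points, three of which are ordinary ASCII spaces. A space is not allowed in a hostname label, so under the usual (STD3) hostname rules that alone should keep it out of a domain. Unicode's normalization stability policy keeps this decomposition fixed across versions.

```python
import unicodedata

ligature = "\uFDFA"  # ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
expanded = unicodedata.normalize("NFKC", ligature)

print(len(expanded))         # 18 code points after expansion
print(expanded.count(" "))   # 3 of them are ASCII spaces
```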

~~~
sp332
Here are the rules for the .sa TLD [https://www.iana.org/domains/idn-tables/tables/man_ar_1.0.tx...](https://www.iana.org/domains/idn-tables/tables/man_ar_1.0.txt)

I suppose other TLDs could have different rules. But in general you'd want to
have a canonicalization step so you don't have two domains that are just
different ways of composing the same thing.

~~~
slim
it looks like .man

~~~
sp332
Oh true... I must have misread the comment I copied it from.

------
maxrmk
I tried to go one character smaller by using the combined Unicode character
"⒕", in an attempt to eliminate the ".". I combined it with the "ms" TLD,
which can also be represented as a single character. Unfortunately all the
browsers I tested refuse to treat it as a domain name without a full stop, so
I'm stuck using the three-character [http://xn--1rh.xn--ryk/](http://⑭.㎳/)
instead of the two-character [http://xn--6sh056c/](http://⒕㎳/) I was hoping
for.

On the bright side, even the three-character version is unlinkable on Facebook
-- it just redirects me to [http://invalid.invalid/](http://invalid.invalid/).
I'll take that as a win. I still managed to get a pretty cool domain name for
around $40, and it was definitely fun to mess around with this idea.

EDIT:

Interestingly, HN has automatically punycoded the URLs. That should be ⑭.㎳ and
⒕㎳
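NFKC shows why the two-character version was tempting: ⒕ (NUMBER FOURTEEN FULL STOP) expands to "14.", dot included, so both spellings normalize to the same ASCII name. A plausible reason browsers still refuse it is that UTS #46 disallows characters whose mapping would introduce a dot. The Python check below only shows the normalization, not any browser's actual validation.

```python
import unicodedata

# ⑭.㎳ and ⒕㎳, written as escapes.
candidates = {
    "three-character": "\u246D.\u33B3",
    "two-character": "\u2495\u33B3",
}

for label, name in candidates.items():
    print(label, name, "->", unicodedata.normalize("NFKC", name))  # both -> 14.ms
```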

------
AndyMcConachie
It's not clear what domain he actually registered.

xn--jxb.fi would appear to be unregistered.

(U+2167) (U+002E) (U+FB01) is not a valid domain name.

viii.fi (all ASCII) is registered, and the registrar is gandi. But so what?

I don't get it.

~~~
dom96
Yeah, I read the article and just now went back and read the Minimum Viable
XSS article linked as well. I am also rather puzzled.

This seems to be only useful if you manage to find a website with an XSS flaw
and one that also limits the input to 20 characters? Are these situations
really common enough to warrant this attack? It all seems rather arbitrary to
me.

~~~
edent
I'm a rather amateur bug hunter but, yes, some sites do use string length
limitations as a way of filtering out dodgy code.

They shouldn't - but they do.

(See
[https://www.openbugbounty.org/researchers/edent/](https://www.openbugbounty.org/researchers/edent/))

------
Ndymium
Note that you are responsible for checking some company and trademark
registers for clashes with existing names before registering a .fi domain.
With a quick check there is at least one trademark close enough that it could
theoretically cause you a headache, though I doubt they care.

My brother's old .fi domain was taken by a company with the same name (because
they have priority) and they didn't even do anything with it...

~~~
dividedbyzero
Where do you check for such things?

------
JimWestergren
I own n.nu - is that the shortest domain possible?

~~~
jsjohnst
I also own a four-character domain, but the domain part is a single Unicode
character and HN seems to strip it from text here.

The punycode version is xn--x6h.ws

~~~
Thorrez
[http://xn--x6h.ws](http://xn--x6h.ws)

Recycling Symbol for Type-4 Plastics

~~~
jsjohnst
Yep! I chose it as the domain’s total character length is 4 (when visible in a
browser obviously, not in punycode form) and the recycling symbol felt
appropriate for url shortening. I hosted it on Bitly (hence you see a Bitly
landing page if the short url doesn’t exist).

I don’t use it as much anymore because of the previously mentioned Unicode
filtering some sites use, like HN.
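The gap between the length a user sees and the length that actually travels as the DNS name is easy to measure. This Python sketch assumes the displayed form is ♶.ws and builds the wire form with the built-in `punycode` codec plus the `xn--` prefix, skipping any IDNA validation.

```python
display = "\u2676.ws"  # ♶.ws, as shown in the address bar
wire = "xn--" + "\u2676".encode("punycode").decode("ascii") + ".ws"

print(len(display))     # 4
print(wire, len(wire))  # xn--x6h.ws 10
```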

------
Arkdy
You can use emoji with a .to domain.

Does that count?

------
tyrion
It's not as hard or expensive as you say to find two-character domains on
two-character TLDs. You may want to do some better research. Hint: search on
hn.algolia for "short domain names" :)

------
fegu
Four-character names were readily available when Norway eased regulations. I
bought a couple just last year when .no opened up to two-letter domains. Some
are still available, I believe.

~~~
wartijn_
Do you do anything with them or are you just keeping them for fun or future
profits?

------
blunte
If you don't mind mixing one letter with one number, and you then choose a
two-letter TLD, you can obviously have a very short and still inexpensive
domain.

------
lepouet
In French, Ⅷ.ﬁ is pronounced "wifi".

------
Giorgi
Is there a list somewhere of which symbols decompose into multiple characters?
(other than that 4-character symbol example)
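There is no need for a curated list to get a rough answer: you can scan every code point with Python's `unicodedata` module. The sketch below keeps characters whose NFKC form, lowercased, is two or more ASCII letters or digits. That is only an approximation of how browsers map IDNs (the real rules are the UTS #46 mapping table), and the results depend on the Unicode version bundled with your Python. `ascii_expansions` is a hypothetical helper name.

```python
import sys
import unicodedata


def ascii_expansions(min_len: int = 2) -> dict:
    """Map non-ASCII code points to their NFKC + lowercase form when that form
    is min_len or more ASCII letters/digits. Approximation, not UTS #46."""
    found = {}
    for cp in range(0x80, sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not characters
        ch = chr(cp)
        mapped = unicodedata.normalize("NFKC", ch).lower()
        if len(mapped) >= min_len and mapped.isascii() and mapped.isalnum():
            found[ch] = mapped
    return found


table = ascii_expansions()
# Ⅷ, ﬁ, ㎉, ℡ and № from this thread.
for ch in ["\u2167", "\uFB01", "\u3389", "\u2121", "\u2116"]:
    print(ch, "->", table[ch])  # viii, fi, kcal, tel, no
```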

------
he991z
I tried opening viii.fi (no special characters), and that also directs to this
website. How did that happen?

~~~
jrochkind1
Because `viii.fi` is indeed what OP registered. They are counting on browsers
to turn `Ⅷ.ﬁ` into `viii.fi` before resolving, by running `Ⅷ.ﬁ` through a
Unicode normalization routine first.

That may be a standard thing to do with Unicode in domain names, running it
through standard Unicode normalization first? Understanding what browsers are
"supposed" to do with Unicode in domain names (and URLs generally) is very
confusing for me.

I would be curious to learn more about what standards govern how browsers
handle Unicode in domain names, the history of it, how compliant browsers are,
etc. I also don't entirely understand the goal here -- the original `Ⅷ.ﬁ`
isn't actually only two _bytes_ in any encoding... what is the value of having
something that shows up as two "glyphs" even though it's more bytes and
normalizes to something else with a yet different number of bytes?

~~~
fireattack
I don't get it then.

Since the browser already turns it into ASCII before resolving, how would it
help an XSS payload get past the server-side maximum-length limit he mentioned
in his other article, "Minimum Viable XSS" [1]?

[1] [https://shkspr.mobi/blog/2016/03/minimum-viable-xss/](https://shkspr.mobi/blog/2016/03/minimum-viable-xss/)

~~~
jrochkind1
Yeah, I don't really get that either; I don't understand what these
normalizing domain names add for XSS.
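One plausible answer to fireattack's question, sketched under stated assumptions. The length check runs on the raw text the server receives and stores. The browser only maps the host to ASCII later, when it fetches the script. The 20-character limit comes from dom96's comment. The payload, the `passes_length_filter` helper, and the idea that the filter counts code points are all hypothetical, not taken from the article. The last line approximates the browser's host mapping with NFKC plus lowercasing.

```python
import unicodedata

LIMIT = 20  # hypothetical limit, as in dom96's example


def passes_length_filter(user_input: str, limit: int = LIMIT) -> bool:
    """Hypothetical naive server-side check that counts code points."""
    return len(user_input) <= limit


prefix = "<script src=//"
short_payload = prefix + "\u2167.\uFB01>"  # <script src=//Ⅷ.ﬁ>
ascii_payload = prefix + "viii.fi>"       # the same host spelled in ASCII

print(len(short_payload), passes_length_filter(short_payload))  # 18 True
print(len(ascii_payload), passes_length_filter(ascii_payload))  # 22 False

# Roughly what the browser resolves later (approximated as NFKC + lowercase).
host = short_payload[len(prefix):-1]
print(unicodedata.normalize("NFKC", host).lower())  # viii.fi
```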

------
karthik_m21
Is there any site where I could get a list of single character or a double
character domain names ?

------
wyxuan
Speaking more maliciously about Unicode, I know that there are ways to attack
domain names, for example replacing the o in Google with a Cyrillic o, or
some other character that looks like an o, for the purposes of phishing.
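Here is a minimal Python sketch of that homograph trick and one crude way to spot it. The spoofed label differs from the real one only in code points, and its IDNA form starts with `xn--`. The hypothetical `scripts_in_label` helper guesses each letter's script from the first word of its Unicode character name, which is only a heuristic. Real browsers apply more elaborate display policies, for example ones built on Unicode Technical Standard #39 (Unicode Security Mechanisms).

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Guess the script of each letter from its Unicode name (hypothetical heuristic)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


genuine = "google"
spoof = "g\u043E\u043Egle"  # uses CYRILLIC SMALL LETTER O (U+043E) twice

print(genuine == spoof)           # False, although both render alike
print(scripts_in_label(genuine))  # {'LATIN'}
print(scripts_in_label(spoof))    # {'LATIN', 'CYRILLIC'} (set order may vary)
print(spoof.encode("idna"))       # an xn-- label, unlike plain b'google'
```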

