
Domain Name Analysis - pcopley
http://datagenetics.com/blog/march22012/index.html
======
matt1
_When coming up with a new business name, sure, it's probably possible to find
a suitable name in .net space, but these days, why bother? Unless it's unique
you'd not be able to find the same name free in the .com space, which is where
everyone would probably look in first. Better to simply research/brainstorm
further and find a name you can acquire/repurchase in the .com arena and
bypass all the confusion/customer education._

On that note, I recently launched a new domain search tool called Lean Domain
Search [1] which makes finding available .com's infinitely easier than it's
ever been. It pairs your search term with 2,500 other keywords commonly found
in domain names and instantly shows you which are still available, returning
on average 1,200 available domain names per search.
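
As a rough illustration of the pairing idea described above (a sketch only, not Lean Domain Search's actual implementation): the keyword list, the set of registered names, and the function names below are all made up for the example. A real tool would check availability against zone data or a registrar.

```python
def candidate_domains(term, keywords, tld="com"):
    """Pair a search term with each keyword, both as prefix and as suffix."""
    term = term.lower().strip()
    seen = set()
    candidates = []
    for kw in keywords:
        for name in (term + kw, kw + term):
            if name not in seen:  # skip duplicates such as term == kw
                seen.add(name)
                candidates.append(f"{name}.{tld}")
    return candidates


def available(candidates, registered):
    """Keep only the candidates that are not in the registered set."""
    return [d for d in candidates if d not in registered]


# Hypothetical inputs: a real keyword list would hold about 2,500 entries.
KEYWORDS = ["hub", "labs", "my", "get"]
REGISTERED = {"leanhub.com", "mylean.com"}

names = candidate_domains("lean", KEYWORDS)
free = available(names, REGISTERED)
print(len(names), "candidates,", len(free), "still available")
```

With 2,500 keywords used as both prefix and suffix, one search term yields up to 5,000 candidates, which is in line with a result list of over a thousand available names.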

Given the abundance of great .com's still out there, there is no good reason
_not_ to use a .com for your site over any of the other TLDs, especially since,
as the author points out, for most normal people websites === .com.

[1] <http://www.leandomainsearch.com>

~~~
bostonpete
I don't know much about the business of domain names, so this may be a dumb
question, but... if this service got popular, could you use some of the common
(and/or recent) search terms to inform speculative domain purchases? If so,
are you concerned that would make your target audience less likely to want to
use a 3rd party service like this (b/c they don't want their potential domains
snatched up)?

~~~
matt1
Of course I _could_ do that. As a web app developer, I could also sell details
about who my users are, what their passwords are (if I didn't salt+hash them),
what actions they've taken on my sites, etc. Both would ruin my businesses and
be highly unethical, which is why I would never do either.

------
eevilspock
They misdefined "TLD" as the "Amazon" in "Amazon.com" when it actually refers
to the "com".

Such glaring ignorance makes it hard to trust the author's domain expertise.
Pun intended. Reading the rest of the article confirmed my instincts.

~~~
jackalope
Where do you see flaws in the analysis?

------
squeakynick
I am the article author.

Thank you to the numerous people who took the time to email me and correct me
about a definition. In this article, I refer to the entire root of a domain
name, e.g. Amazon.com, as the TLD. I made a mistake: it is just the .com
component of this name that is the TLD. I hope this error didn’t mask the
enjoyment of the article for you. I appreciate all the feedback I receive.

~~~
eevilspock
You're welcome.

Your graphs of the frequency of each domain name length are misleading, as it
is too easy to interpret them to mean that the most popular lengths are 10 or
11 characters, when in fact shorter names are more popular but in limited
supply, since shorter means fewer combinations. You discuss saturation later,
but it would be more informative to combine the two pieces of information. For
example, on the same graph you could plot a ceiling line representing the
total available combinations for each length. It would be obvious that the
frequency bars for lengths less than 10 are shorter only because they're
bumping up against the ceiling.
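
A minimal sketch of how such a ceiling could be computed, assuming the usual hostname rule that a label may contain the 26 letters, 10 digits, and the hyphen, and may not start or end with a hyphen. It ignores finer restrictions (such as reserved hyphen positions), and the function names are made up for the example.

```python
def possible_labels(length):
    """Number of distinct labels of a given length under the simplified rule:
    36 choices (a-z, 0-9) for the first and last character,
    37 choices (a-z, 0-9, hyphen) for every character in between."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return 36
    return 36 * 37 ** (length - 2) * 36


def saturation(registered_count, length):
    """Fraction of all possible labels of this length that are registered."""
    return registered_count / possible_labels(length)


# Ceiling values that could be drawn over the frequency bars.
for n in range(1, 13):
    print(f"length {n:2d}: {possible_labels(n):,} possible names")
```

Under this rule there are 1,296 possible two-character names and 47,952 three-character names, and the ceiling grows by a factor of 37 with each added character.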

Secondly, a lot of experts disagree with you on the importance of having a
.com domain name, and many successful sites are on different domains.

Thirdly, what actual utility do the couplet/triplet and start/end character
data and graphs provide?

If I were looking for an expert to select a domain name, I would choose
someone who understands what matters, not someone buried in inconsequential
minutia.

~~~
squeakynick
Tough Crowd, Tough Crowd :)

The graphs aren't misleading (IMHO); they show the distribution of lengths.
However you slice it, if you have to type the domain name, you have to push
that number of keys. It's far more important to know that than the ratio of
the length normalized by the possible combinations of characters at that
length. (I did try looking at that as well, but the graphs were almost
meaningless - even at log scales, the increase in the number of combinations
of characters dwarfs the number of names, and once you get past 10
characters, the percentages drop so low that comparing them is meaningless).

"a lot of experts disagree with you on the importance of having a .com domain
name". Well, yes they can! So go with the wisdom of the masses. It's a free
market, and as a company you can select your own domain suffix. Yet most go
for .COM because, right or wrong, that's what people expect. (I will agree
that, in the end, it's not as important, because, as I state in my article,
many people now find web sites by typing keywords into search engines, and the
exact domain is not important.) But ask yourself this question: if you are
company XYZZY, and there is an XYZZY.COM domain out there, and it's not yours,
would you be just as happy with XYZZY.NET and not worry about it (listening to
those experts)? I think not. You want to preserve your brand, avoid
confusion, and make it as easy as possible for the masses to find your site
(the non-experts who make up the majority of your consumers). It's the tastes
of the fish, not the tastes of the fisherman, after all :)

This article was written for fun. The couplet/triplet was generated out of
interest and to see common combinations of letters. I find it fascinating, and
I'll be happy to explain some business utility of it if you want to send me a
personal email.

I'm not trying to sell my services as an expert domain name seller; it's not
what I do. I make a living as someone who mines data and helps find
'inconsequential minutia' in data to leverage (when dealing with hundreds of
millions of users, moving the needle just a fraction of a percent can make a
difference to a bottom line).

But anyway, the article was created as a trivia/fun article. I'm sorry you
don't find it interesting/relevant. (Though again, wisdom of the crowd: since
someone posted it here this morning, my inbox/twitter has been alive with
comments/retweets about how fun and interesting an article it is - to date,
it's been one of the most promiscuous articles I've written).

------
mproud
"It’s interesting to note that the distribution differs from the
traditional pattern used in the English language: E,T,A,I,O,N,S,H,R,D,L … Some
of this can be explained by the fact that domain names are not just for the
consumption of English speaking people. Even though other regions have their
own domains, since .com has become the lingua franca, many businesses simply
default to .com (For those interested, there is an interesting article on
Wikipedia about the differing relative frequencies of letters in other
languages)."

That may be part of it, but the author doesn’t acknowledge the likelihood
that the letter I is used more frequently because of Apple’s product naming
influence, imitation by other companies prepending the letter to their
products and services, and the fact that ‘I’ is a strong, powerful pronoun.

~~~
squeakynick
The "i" does make a difference, sure, but not as big a one as you might
think. You can only put the "i" in front of so many words, and if you look at
the initial letter charts, it's not massively dominant there.

Far more important, for instance, are substrings like "FREE" (which can
apply to all things, not just computer-related ones, and has a couple of "E"s)
or anything that contains the "%ING%" substring (which is a very common letter
combination in the English language).

~~~
mproud
I won’t cite any sources, but ETAOINSHRDLU is well-established as a fairly
accurate English letter frequency. The article's point is that domain names
have a "higher" frequency of I’s and a relatively lower number of T’s,
despite ETAOINSHRDLU, but it doesn’t really explore why. (-ING endings are
already taken into consideration with ETAOINSHRDLU.)

Also, I have to disagree with you; there are thousands of companies that have
capitalized on the Apple product ecosystem (iSkin, iLounge, iPodResQ, etc.)
and on the common abbreviation of “Internet” to i. I would say there are many
more names prefixed with ‘i’ than with ‘e’ or any other letter.

~~~
squeakynick
Please do cite sources; it's what adds weight to your comments and
differentiates them from speculation.

Yes ETAOIN SHRDLU CMFWYP VBGKQJ XZ is an accurate distribution of letters in
common English text _BUT_ this is a distribution of letters in written
English. As it turns out, however, written English is full of very common
little glue words, like THE, OF, AND, A, TO, IN, IS, YOU, THAT, IT ...

One third of all printed English material is made up of the top 25 words, and
the most common 100 words account for almost half. Domain names are not
typically sentences, and are often just one or two words. Instead, the
frequency of occurrence of letters in distinct words should be used. For all
distinct words in the English dictionary, this distribution is a little
different: ESIARN TOLCDU PMGHBY FVKWZX QJ

Already "T" is much further down the list.

Interestingly, this distribution varies by word length. By the time we get
to words of length 13, for instance, the order has changed to IENTS ... (So
the letter "I" is already _the_ most common letter in longer words, without
any influence from Apple).
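
To make the distinction between the two kinds of counting concrete, here is a small sketch. The sample sentence and function names are made up for the example, and a real comparison would use a large corpus and a full dictionary. It ranks letters once over running text, where repeated glue words count every time, and once over the set of distinct words.

```python
from collections import Counter
import re


def letter_order(words):
    """Return letters sorted from most to least frequent across the words.
    Ties are broken alphabetically so the result is deterministic."""
    counts = Counter(ch for w in words for ch in w)
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))


def compare(text):
    """Rank letters in running text versus in the distinct words only."""
    tokens = re.findall(r"[a-z]+", text.lower())
    running = letter_order(tokens)        # every occurrence counts
    distinct = letter_order(set(tokens))  # each word counts once
    return running, distinct


# Hypothetical sample: "the" appears four times, which inflates T, H and E.
SAMPLE = "the cat and the dog and the bird sat in the sun"
running, distinct = compare(SAMPLE)
print("running text  :", running)
print("distinct words:", distinct)
```

Even in this tiny sample, "T" leads the running-text ranking only because "the" is repeated. Once each word is counted a single time, it no longer stands out.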

You can read about a full analysis of the distribution of letters and see a
complete table of letter frequency against word length here:
<http://www.datagenetics.com/blog/april12012/index.html>

There may be "thousands of companies" that have added an "I" to their company
name (though it would help your argument if you quoted sources). But even so,
this is dwarfed by the 102 million names. Even tens of thousands of new "I"
companies are a fraction of a percent of this denominator.
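
The arithmetic behind that claim can be checked quickly. The 102 million total comes from the comment above, while the counts of "I" names tried below are purely hypothetical values chosen to span "tens of thousands".

```python
TOTAL_NAMES = 102_000_000  # total names, as stated in the comment above


def share_percent(count, total=TOTAL_NAMES):
    """Share of the total that `count` names represent, in percent."""
    return 100.0 * count / total


# Hypothetical counts of "i"-prefixed company names.
for count in (10_000, 50_000, 90_000):
    print(f"{count:,} names = {share_percent(count):.3f}% of all names")
```

Every value in that range stays below a tenth of one percent of the total.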

There are millions of companies/organisations that have domain names, and not
all are tech related.

I'm happy to continue debating and will gladly run any queries you suggest
using the entire domain name database and the English language database to
generate numbers.

------
3pt14159
I don't know if it is fair to say that a period is part of the domain name. It
just separates off the subdomain: news.ycombinator is the subdomain 'news' of
the ycombinator name, whereas dashes carry no structural information of their
own.

~~~
squeakynick
I'll not agree or disagree :)

I simply processed the file provided by the good folks at Verisign, and used
however they classified things.

------
jaylevitt
Random fun fact: Sanford Wallace once crashed AOL's mail server by sending
mail that was allegedly from
"howmuchwoodcouldawoodchuckchuckifawoodchuckcouldchuckwood.com".

------
taylorbuley
_Having access to a database of domain names, I decided to run some more
analysis on the .com and .net databases._

Anyone know of such a database that is also public?

~~~
squeakynick
If you sign a legal agreement with Verisign and have a legitimate need, they
will grant access. Good Luck.

------
hessenwolf
Looks familiar:

<http://blog.nametoolkit.com/domain-names-taken>

~~~
huggyface
Heh, I pioneered this space -
<http://blog.yafla.com/Interesting_Facts_About_Domain_Names>

I accidentally discovered that any chimp could sign up to receive the
database, did that basic analysis, and have watched as it rinses and repeats
every six months or so.

~~~
squeakynick
Cool!

------
alinspired
I'd love to compare the domain metrics with the English language as a whole,
or with a popular encyclopedia.

------
pcopley
A couple months old, but there's some really interesting data analysis in
here.

