from collections import Counter
from math import log

corpus = """training corpus here"""
n = 1  # context size: letters of memory

def trans(w):
    # yield (context, next letter) pairs for a word or corpus
    return ((w[i:i+n], w[i+n]) for i in range(len(w) - n))

tokens = Counter(t for t, l in trans(corpus))
transitions = Counter(trans(corpus))

def score(w):
    # log-likelihood with add-one smoothing over a 26-letter alphabet
    return sum(log(transitions[t, l] + 1) - log(tokens[t] + 26**n)
               for t, l in trans(w))

for w in [" llmyw ", " domyh ", " tretz ", " qenis ", " debts "]:
    print w, ': ', score(w)
llmyw : -25.9316238289
domyh : -23.311005645
tretz : -21.4220068707
qenis : -20.7421233042
debts : -15.3287127006
The variable `n` sets the number of letters of memory the Markov chain has. If you set `n` higher, you need a bigger corpus. If you set n=2, use a larger corpus, and preprocess it to filter out anything that isn't [a-z]+, it would probably work fairly well (I just copied the article in as-is).
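A minimal sketch of that preprocessing step (the function name and the space-padding convention are my own; padding with spaces keeps word boundaries as contexts, matching the padded test words above):

```python
import re

def preprocess(text):
    # lowercase first, then keep only runs of a-z
    words = re.findall(r"[a-z]+", text.lower())
    # join and pad with spaces so word boundaries still count as transitions
    return " " + " ".join(words) + " "

print(preprocess("Hello, World! 42"))  # -> " hello world "
```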
Lli is a variant of llif - life.
Ll is a letter - [phlegm production noise].
Myw is a mutation of byw - tide.
So in Welsh it's most likely as hard to pronounce as "ktide" in English. Maybe he's using a multilingual dictionary.
I discovered that pool.com maintains a list of domains that are set to expire. I download and filter the list and then email myself a list of domains that match my requirements (.coms under a certain length, no numbers or other funny characters, maybe .coms with a specific word in them). I actually just wrote this script, it had been on my to-do list for over a year. The daily email contains hundreds of domains so I might have to filter it more.
Here's my script; it only uses PHP to get tomorrow's date, otherwise it's standard Linux utilities like wget, egrep, unzip, cut, sed...
I have it set up as a daily cron job.
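For reference, the filtering step could be sketched in Python like this. The regex, the length limit, and the optional keyword are assumptions standing in for "under a certain length, no numbers or other funny characters, maybe a specific word", not the author's actual rules:

```python
import re

def filter_domains(lines, max_len=8, must_contain=None):
    """Keep short .com domains made of plain letters only.

    max_len counts the part before .com; must_contain optionally
    requires a specific word somewhere in the name.
    """
    pattern = re.compile(r"^[a-z]{1,%d}\.com$" % max_len)
    for d in (line.strip().lower() for line in lines):
        if pattern.match(d) and (must_contain is None or must_contain in d):
            yield d

candidates = ["Example.com", "my-site.com", "toolongdomainname.com", "app.net", "cats.com"]
print(list(filter_domains(candidates)))  # -> ['example.com', 'cats.com']
```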
I'm interested in suggestions on how to snipe/reserve/etc domains as soon as they become available.
It includes member auctions etc.
I'd guess other services have similar?
I have a somewhat off-topic question. How do you get access to these domain names?
In terms of obtaining information, most whois queries can be performed via command-line utilities. To start you off, here is a good list of whois servers (http://code.google.com/p/whois-servers-list/). Finally, check out each service; some will allow queries that return true or false for whether a domain is registered, and you can generally make a lot more of these requests than complete lookups (without being IP blacklisted).
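Raw whois is just a TCP conversation on port 43: send the domain, read the reply until the server closes the connection. A minimal sketch (the default server and the "No match" heuristic are assumptions; both vary by registry):

```python
import socket

def whois(domain, server="whois.verisign-grs.com", port=43):
    # open a TCP connection to the whois server and send the query
    s = socket.create_connection((server, port), timeout=10)
    try:
        s.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # server closes when the reply is complete
                break
            chunks.append(data)
        return b"".join(chunks).decode("ascii", "replace")
    finally:
        s.close()

def looks_unregistered(reply):
    # heuristic only: the .com/.net thin whois answers "No match for ..."
    return "No match for" in reply
```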
Finally, in terms of building and managing an index, I believe manual crawling is the only option available; start with dictionary terms and work outward.
Edit: Read this as well - http://www.dotweekly.com/pending-delete-domain-name-drop-lis...
import urllib2

domain_file_url = "http://www.odditysoftware.com/download/dldoms.php?domdate="

def fetch_domains():
    return urllib2.urlopen(domain_file_url + today()).read()
http://instantdomainsearch.com has this same problem when deciding if a domain is actually available or not.