Show HN: Sift through recently dropped .com's in order of pronounceability, etc. (swola.com)
111 points by lolkittens on Oct 18, 2012 | 35 comments

OK, that's a good start, but I hope you're devoting a fair amount of effort to improving your decision algorithm. At the moment, it considers tretz.com to be 60% pronounceable, but llmyw.com, qenis.com and domyh.com to be 100%. I can kind of see how a naive algorithm would calculate that, and I'm pretty sure you can tweak it to make it better. Bonus points for using some sort of neural net or Bayesian technique to improve scores, perhaps with a button next to the score to allow people to adjust it.

Here's a Python program that builds a Markov chain model of a corpus (with Laplace smoothing) and then calculates the probability that a given domain name was generated by that chain.

    from collections import Counter
    from math import log

    corpus = """training corpus here"""

    n = 1 # token size

    def trans(w):
        # all (n-letter context, next letter) pairs in w
        return ((w[i:i+n], w[i+n]) for i in range(len(w)-n))

    tokens = Counter(t for t, l in trans(corpus))
    transitions = Counter(trans(corpus))

    def score(w):
        # log-probability of w under the chain, with Laplace smoothing
        return sum(log(transitions[t, l]+1) - log(tokens[t]+26**n) for t, l in trans(w))

    for w in [" llmyw "," domyh "," tretz "," qenis "," debts "]:
        print w, ': ', score(w)
When I use the European sovereign-debt crisis Wikipedia article as the corpus, I get

    llmyw  :  -25.9316238289
    domyh  :  -23.311005645
    tretz  :  -21.4220068707
    qenis  :  -20.7421233042
    debts  :  -15.3287127006
Higher numbers are considered more pronounceable, so that's in order of pronounceability. Note that scores of words of different length are incomparable.

The variable `n` sets the number of letters of memory the Markov chain has. If you set `n` higher, you need a bigger corpus. If you set n=2, get a larger corpus, and preprocess it to filter out anything that isn't [a-z]+, it would probably work fairly well (I just copied the article in as is).
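That preprocessing step is a one-liner with a regex (a sketch; note that if you leave spaces and punctuation in the corpus, the `26**n` smoothing term undercounts the real alphabet a little):

```python
import re

def preprocess(text):
    """Lowercase and keep only a-z runs, joined by single spaces
    so word boundaries still contribute transitions."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))

print(preprocess("The EU's debt-to-GDP ratio rose 12%."))
# -> "the eu s debt to gdp ratio rose"
```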

Doing a naive dictionary lookup would also help your algorithm, e.g. limoship.com should rank higher than leonadare.com in my opinion. Cool website, though! I've always wondered how often decent domain names lapsed.

I'm fairly sure qenis's high ranking is the result of using Levenshtein distance on dictionary words. You want phonetics based analysis instead. Then again qenis isn't that hard to pronounce either, so I don't know.
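Phonetics-based analysis doesn't have to be fancy. A minimal Soundex (a simplified sketch, not a tuned phonetic matcher) already groups similar-sounding words by their consonant pattern:

```python
# Minimal Soundex sketch: map letters to digit classes, collapse
# adjacent duplicates, and pad/truncate to four characters.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    word = word.lower()
    out = [word[0].upper()]
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":          # h and w are skipped entirely
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code
    return ("".join(out) + "000")[:4]
```

For example, soundex("qenis") is "Q520" and soundex("genus") is "G520": the same digit pattern with a different first letter.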

queenis... kewnis... kenis... Yeah, I don't know about that.

None of those are hard to pronounce. Not knowing which pronunciation to use is secondary.

LeonaDare.com would be an exact match for a name check whilst limoship would be a made-up word but pretty good for a local limo-to-port service or a high class pleasure-cruise or something.

Llmyw is probably easy to pronounce in Welsh. Apparently ...

Lli is a variant of llif - life. Ll is a letter - [phlegm production noise]. Myw is a mutation of byw - tide.

So in Welsh it's most likely as hard to pronounce as "ktide" in English. Maybe he's using a multilingual dictionary.

I quite like the idea. However, of the roughly 10 domains I checked (5-letter ones), only one was actually available. You should probably use a different, more reliable way of checking for availability?

Great site, excellent idea. If you want to make this useful to a slightly larger audience, you could add pagerank and backlink information for the domains. Pagerank is simple enough to determine (libraries will check it against the Google toolbar). For backlinks you can easily create a link to reports on majesticseo.com or a similar service.

Great idea, how do you find recently dropped domains?

I think the more important question is, how do you find domains that are about to expire? I don't know what the domain hoarders do but here is what I do.

I discovered that pool.com maintains a list of domains that are set to expire. I download and filter the list and then email myself a list of domains that match my requirements (.coms under a certain length, no numbers or other funny characters, maybe .coms with a specific word in them). I actually just wrote this script, it had been on my to-do list for over a year. The daily email contains hundreds of domains so I might have to filter it more.
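The "under a certain length, no funny characters, optional keyword" filter is only a few lines of Python (a sketch; the one-domain-per-line input format is an assumption about the pool.com list):

```python
import re

def matching_domains(lines, max_len=6, keyword=None):
    """Keep .com names made of letters only, at most max_len characters
    before the dot, optionally containing a keyword."""
    pat = re.compile(r"^[a-z]{1,%d}\.com$" % max_len)
    for line in lines:
        name = line.strip().lower()
        if pat.match(name) and (keyword is None or keyword in name):
            yield name

hits = list(matching_domains(["tretz.com", "ex4mple.com", "toolongname.com"]))
# hits == ["tretz.com"]
```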

https://gist.github.com/3914495 Here's my script; it only uses PHP to get tomorrow's date, otherwise it's standard Linux utilities like wget, egrep, unzip, cut, sed...

I have it set up as a daily cron job.

I'm interested in suggestions on how to snipe/reserve/etc domains as soon as they become available.

Would love to know this too -- and am curious how the players in the market end up snatching these sites.

Check my reply to the article as I provide a source for domain lists.

godaddy has some on their ftp: ftp://ftp.godaddy.com/

It includes member auctions etc.

I'd guess other services have similar?

Update the .com TLD zone files? Look for the ones that got removed... that's my guess.

Couple of sources. http://www.freshdrop.net/ & http://www.premiumdrops.com/ both require subscription, but the data is pretty good.

Very cool and useful. Could be very useful once you add keyword search and more filtering. Good job.

I have a somewhat off-topic question. How do you get access to these domain names?

Posted the question to StackOverflow[1]. Debated with myself whether to post there or Server Fault, but here we are. Maybe it will be closed by the powers that be, as I'm not sure of the focus (I do want a programmable solution, i.e. an API).

[1] http://stackoverflow.com/questions/12954630/how-does-one-pro...

I've done a little bit of domaining myself, and the first thing to take away from any list is that recently dropped doesn't always equal available. There are also guys running massive lists who perform thousands of buy requests per second across a huge number of domain name sellers to make sure they get to purchase the domain as soon as it's released.

In terms of obtaining information, most whois queries can be performed via command-line utilities... so to start you off, here is a good list of whois servers (http://code.google.com/p/whois-servers-list/). Also, check out each service; some allow queries that return true or false for whether a name is registered, and you generally get a lot more of these requests than complete lookups (without being IP blacklisted).
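WHOIS itself is just a plain-text protocol over TCP port 43 (RFC 3912), so a minimal raw lookup needs nothing beyond the standard library (a sketch; whois.verisign-grs.com is the usual .com server, but check the linked list for other TLDs):

```python
import socket

def build_query(domain):
    # a WHOIS request is just the name followed by CRLF (RFC 3912)
    return domain.encode("ascii") + b"\r\n"

def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Raw WHOIS lookup: connect to TCP port 43, send the query,
    and read until the server closes the connection."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")
```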

Finally, in terms of building and managing an index, I believe manual crawling is the only option available... start with dictionary terms and work outward.

Edit: Read this as well - http://www.dotweekly.com/pending-delete-domain-name-drop-lis...

Some companies, such as GoDaddy, have access to DNS zone files; these are then resold to even more companies. Look at my reply to this article to see where you can get a list for free.

To everyone looking for expiring/dropping domain lists: I've been building an app called dropparser that uses a free source (though I don't see it mentioned here yet). Some Python code for you:

  import datetime
  import urllib2

  domain_file_url = "http://www.odditysoftware.com/download/dldoms.php?domdate="

  def today():
      return str(datetime.date.today())

  def get_raw_domain_file():
      # fetch the raw list of domains dropping today
      return urllib2.urlopen(domain_file_url + today()).read()

Great idea! However, I have a problem: let's say I want all the domains starting with the letter "C" with 10 characters max. It tells me "Limit of 1500 results reached. Narrow your search parameters to view more results", but there is no way to narrow my search parameters while still getting all the domains starting with "C" with 10 characters max. Does anybody have a hint, or is this a bug?

Nice work - It's very effective: I went to buy like, 3 of the domains. Then realised I had no need for them and put my credit card away!

You can force it to show 4-letter domains http://www.swola.com/index.php?limit=4 but none are actually available, they just don't have DNS records.

http://instantdomainsearch.com has this same problem when deciding if a domain is actually available or not.

Just bought punypic.com with it. Thought it was a cool name. No idea what to do with it (except the usual image hosting service)

Picnics for toddlers

More useful than pronounceability would be spellability. I'd score domains with only a single spelling higher, e.g. twang.com over base.com (bass.com). To do this, the algorithm would just need to scan for graphemes that share a common phoneme with other graphemes and weight those lower.
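A crude version of that scan (a toy sketch; the homophone table here is an illustrative assumption, not real data, and a real version would be built from a homophone dictionary):

```python
# Toy spellability check: penalize names containing substrings that are
# commonly heard one way and spelled another.
AMBIGUOUS = {"base": "bass", "sea": "see", "four": "for", "right": "write"}

def spellability_penalty(name):
    """Count ambiguous substrings; lower is easier to spell from hearing."""
    return sum(1 for a in AMBIGUOUS if a in name)

print(spellability_penalty("basecamp"))  # contains "base" -> 1
print(spellability_penalty("twang"))     # -> 0
```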

Would this be a good way to name a startup or product?

This is so pro. I don't mind that the algorithm's a little wonky, or the availability isn't quite right, it's just a sweet little mashup.

Pretty neat. I ended up grabbing a domain, now I need to figure out a project for it.

babymail.com: your site says it expired on 10/14/12. However, http://who.godaddy.com/whois.aspx?domain=babymail.com&pr...

I'm not getting a scrollbar on your site...is that deliberate?

Site renders like ass on my phone, galaxy nexus, both in terms of speed and appearance.

Another way to say this could be, "The site does not render well..." That would take the edge off your criticism, and be a more polite way to help your fellow hacker.

