

Ask YC: How does News.YC figure out domains for a story? - nrao123

Does anybody know how PG/YC figures out the domain name for a story?

Is it as simple as: take the first occurrence of "/" and then count backwards to the 2nd "."?

Therefore:
www.wordpress.com/12345-65758 = wordpress.com
john.wordpress.com/12345-65758 = wordpress.com
123.john.wordpress.com/12345-65758 = wordpress.com

If that is the case, quite a few blogs (wordpress, tumblr, posterous, etc.) would show only the blog hosting service's domain (e.g. tumblr.com, posterous.com, wordpress.com).

But this doesn't always seem to be the case:
http://news.ycombinator.com/item?id=592268

Is there a whitelist of domains for which it counts back to the 3rd "."?
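
For concreteness, here is a minimal Python sketch of the heuristic guessed at above: cut the host off at the first "/" and keep the last two dot-separated labels. The function name `naive_domain` is made up for illustration, and stripping a leading scheme such as "http://" is an added assumption so that full URLs also work. This is only a guess at the behaviour, not the actual News.YC code.

```python
def naive_domain(url):
    """Return the last two dot-separated labels of the host part of url."""
    # Assumption: drop a scheme such as "http://" so its slashes are not
    # mistaken for the first "/" after the host.
    rest = url.split("://", 1)[-1]
    # Everything before the first "/" is treated as the host.
    host = rest.split("/", 1)[0]
    # Counting back to the 2nd "." is the same as keeping the last two labels.
    labels = host.split(".")
    return ".".join(labels[-2:])


for example in ("www.wordpress.com/12345-65758",
                "john.wordpress.com/12345-65758",
                "123.john.wordpress.com/12345-65758"):
    print(example, "=", naive_domain(example))
```

All three example URLs collapse to wordpress.com under this rule, which is exactly the problem with hosted blogs described above.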
======
dbul
I've only skimmed the Arc code, but there is a list of exceptions for domains.
The major blog hosts are on this list.

OK, I checked:

    (= long-domains* '("blogspot" "wordpress" "livejournal" "blogs" "typepad"
                       "weebly" "blog-city"
                       ; "sampasite"  "multiply" "wetpaint" ; let's just try banning
                       "eurekster" "blogsome" "edogo" "blog" "com"))

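The snippet only shows the list, not how it is consulted. As a rough Python sketch of one way such an exception list could be applied (an assumption, not the actual Arc logic): keep a third label when the second-level label is on the list and the third label is not "www". The names `LONG_DOMAINS` and `display_domain` are hypothetical.

```python
from urllib.parse import urlsplit

# Labels copied from the long-domains* list quoted above.
LONG_DOMAINS = {"blogspot", "wordpress", "livejournal", "blogs", "typepad",
                "weebly", "blog-city", "eurekster", "blogsome", "edogo",
                "blog", "com"}


def display_domain(url):
    """Guess the domain to display for url (assumed rule, see lead-in)."""
    # urlsplit only recognises the host when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    if len(labels) < 2:
        return host
    # Assumed rule: show three labels for listed hosts, unless the extra
    # label is just "www".
    if (len(labels) >= 3 and labels[-2] in LONG_DOMAINS
            and labels[-3] != "www"):
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])
```

Under these assumptions, john.wordpress.com/12345-65758 would display as john.wordpress.com, while www.wordpress.com/12345-65758 would still display as wordpress.com. Having "com" on the list would also keep three labels for hosts such as example.com.au.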
~~~
nrao123
Thanks. This is great.

------
byoung2
It seems to have changed recently, as it always used to just show domain.tld,
cutting off the subdomain even when it was meaningful. I would imagine that it
shows the subdomain if it's not www.

