

How do news aggregators get a list of news sites? - kevinfat

If I wanted to make a news aggregator site, how would I get a curated list of news sites such as http://www.latimes.com/ and http://www.washingtonpost.com/?

It's not realistic for me to compile a list of them all by hand. How did the popular news aggregator sites build a comprehensive list?
======
ScottWhigham
I think, logically, the question we should ask you is, "Why are you
considering making a news aggregator site when you don't have the time to
curate/determine which sources are good sources?" As a user, when I see a site
that has "a list of news articles", I'm utterly underwhelmed and/or
intimidated by the sheer number of articles. However, when I see a site that
has a list of articles from good sources that I'd find interesting, I'll
bookmark it.

------
ig1
Google news sources as of 2011:

[http://img.labnol.org/files/Google-News.txt](http://img.labnol.org/files/Google-News.txt)
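If you start from a plain-text dump like that one, the first chore is reducing each entry to a bare domain and dropping duplicates. Here's a minimal Python sketch using only the standard library. It assumes one source per line and `#` comments, which may not match the real file's layout, and the function names are just illustrative:

```python
from __future__ import annotations

from urllib.parse import urlparse


def normalize_domain(entry: str) -> str | None:
    """Reduce a URL or bare hostname to a lowercase domain without a leading 'www.'."""
    entry = entry.strip()
    # Skip blank lines and lines starting with '#' (assumed comment syntax).
    if not entry or entry.startswith("#"):
        return None
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in entry:
        entry = "http://" + entry
    host = urlparse(entry).hostname  # urllib already lowercases this
    if not host:
        return None
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def parse_source_list(text: str) -> list[str]:
    """Return the unique domains in the order they first appear."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for line in text.splitlines():
        domain = normalize_domain(line)
        if domain is not None:
            seen.setdefault(domain, None)
    return list(seen)
```

You'd pass it the downloaded file's contents as one string; "http://www.washingtonpost.com/" and "washingtonpost.com" collapse to the same entry.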

~~~
hugovie
Great job. Thank you so much!

------
al1x
Why not scrape Alexa's list of the top 500? --
[http://www.alexa.com/topsites/category/Top/News](http://www.alexa.com/topsites/category/Top/News)

As a side note, not to ruin your party or anything, but over the years a
handful of HN users have made news aggregators as side projects and none of
them have really gone anywhere. You might want to think about putting your
effort into something else. Google News is a pretty sweet product.

~~~
ScottWhigham
Alexa's list is a good idea. You'd still need to curate it considerably,
though; that would be the only issue. For example, Shutterstock is listed as
the #16 news site on Alexa.
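Curating a ranked list can start as simply as walking it in rank order and skipping anything a human has flagged. A rough sketch; the blocklist and the sample domains below are made-up inputs, not actual Alexa data:

```python
from __future__ import annotations

from typing import Iterable


def curate(ranked_domains: Iterable[str], blocklist: set[str], limit: int) -> list[str]:
    """Walk a ranked list in order, skipping flagged domains, until `limit` are kept."""
    kept: list[str] = []
    for domain in ranked_domains:
        if len(kept) >= limit:
            break
        if domain in blocklist:
            # e.g. a stock-photo site that a traffic ranking happens to file under "News"
            continue
        kept.append(domain)
    return kept
```

For example, curating ["latimes.com", "shutterstock.com", "washingtonpost.com", "example-news.com"] with {"shutterstock.com"} blocked and a limit of 2 returns ["latimes.com", "washingtonpost.com"]. The blocklist is where the human curation ends up living; the code just keeps applying it.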

------
aviv
It's your lucky day. There are two data sets you can purchase for a decent
price:

- 30M news headlines and 500K web sources, 30 GB of JSON data ($300)

- 15K news domains that are the most popular in the US market ($100)

These were gathered by Andrew Montalenti, co-founder of Parse.ly. See more
info here: [http://pixelmonkey.org/pub/python-crawling-slides/](http://pixelmonkey.org/pub/python-crawling-slides/)
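With a dump that size you'd want to stream it rather than load it all into memory. If it's JSON Lines (one object per line), a sketch like this counts records per source domain so you can pick out the most prolific sources. Both the line-per-record layout and the `url` field name are assumptions; the actual schema may differ:

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def top_sources(lines: Iterable[str], n: int) -> list[tuple[str, int]]:
    """Count records per domain from JSON Lines input, one line at a time."""
    counts: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            continue
        url = record.get("url")  # hypothetical field name; check the real schema
        if not url:
            continue
        host = urlparse(url).hostname
        if host:
            if host.startswith("www."):
                host = host[len("www."):]
            counts[host] += 1
    # Highest counts first; ties keep first-seen order.
    return counts.most_common(n)
```

Because a file object iterates line by line, something like `top_sources(open("headlines.jsonl", encoding="utf-8"), 20)` (hypothetical filename) never holds the whole file in memory.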

------
steerpike
You might find something useful in this list of news-related APIs:

[http://www.programmableweb.com/apis/directory/1?apicat=News](http://www.programmableweb.com/apis/directory/1?apicat=News)

