

Ask HN: I need a brainstorm on how to collect several thousand quality RSS feeds - iamelgringo

Google and Bing aren't doing right by me. All I'm finding are crappy SEO-optimized sites. As you might surmise, there's a lot of web scraping and natural language processing in my near future, but I need a corpus of decent RSS feeds to start scraping.

What I'm looking for is a way to collect several thousand decent economic and financial news RSS feeds for my startup, Newsley.com.

Aside from gathering these by hand, has anyone found a great directory of RSS feeds? (If you have a list of a whole lot of economic and financial feeds in your Google Reader account, export the list and email it to me. I'll buy you a beer or your beverage of choice.)

Alternatively, I'd love to hear a brainstorm or suggestions on where to find collections of RSS feeds, or ways in which a bunch of RSS feeds could be gathered.

Suggestions?
======
IgorPartola
I wonder if you could use services like StumbleUpon and XMarks to find
financial blogs that users have ranked and then just grab their RSS feeds.
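
The "grab their RSS feeds" step could be sketched roughly like this in Python.
It assumes you already have each blog's HTML and URL in hand, and that the
blog advertises its feed with the usual `<link rel="alternate">` autodiscovery
tag in the page head (not every site does). The names `FeedLinkParser` and
`discover_feeds` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used in feed autodiscovery <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        mime = a.get("type", "").lower().strip()
        href = a.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the feed URLs a page advertises, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Feeding it the HTML of each ranked blog (fetched however you like, and
politely) should give a candidate feed list that you can then filter for
quality.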

~~~
iamelgringo
That's interesting. I'll definitely have to look into that. That's a great
idea. Thanks.

------
jacquesm
Spider FeedBurner?

wget -r Google News and see what their sources are?

<http://www.feedage.com/categories/News/1/4>

~~~
iamelgringo
FeedBurner: I've thought about this one, but I haven't been able to find a
good starting point. Their vast collection of RSS feeds doesn't seem to
cross-reference nicely or lend itself to an obvious spidering strategy the
way web pages do. Suggestions on how to start are welcome.

karma++ for wget Google News. I looked at Feedage, but the signal:noise ratio
kinda sucks.

~~~
jacquesm
Tread carefully when spidering Google. Mail me for some more info on this.

Being IP banned from google is not funny.

~~~
scrollbar
Ouch, the Google ban stick does not sound fun. An alternative for spidering
high-risk avenues like that might be 80legs.com, in essence a legal/opt-in
crawling botnet =P I used them when they were in open beta and the service
was free -- now you need to pay, although it's pretty cheap for smallish jobs.

------
Barnabas
Have you considered using the PostRank Topic API?

<http://www.postrank.com/developers/api#topic>

~~~
iamelgringo
Very cool. Thanks.

------
willwagner
Bloglines has a pretty decent feed search.

[http://www.bloglines.com/search?q=finance&t=f&ql=en&...](http://www.bloglines.com/search?q=finance&t=f&ql=en&s=f&pop=l&news=m)

plus a top 1000 feeds:

<http://beta.bloglines.com/topfeeds>

~~~
iamelgringo
I had not thought of bloglines. Nice choice. Thank you sir.

------
jplewicke
<http://seekingalpha.com/> syndicates a whole bunch of financial blogs. They
don't have feeds for individual contributors and I'm not sure of the quality
of their topics feeds, but if you crawl carefully you may be able to find your
way to the original blogs of many of their contributors.

------
dpritchett
Marshall Kirkpatrick asked his internet friends for OPML files for his
birthday:

[http://marshallk.com/its-my-birthday-you-should-make-me-a-pr...](http://marshallk.com/its-my-birthday-you-should-make-me-a-present)

It could work for you too...
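
If people do send OPML exports, merging them into one deduplicated feed list
could look roughly like the sketch below. It assumes the exports follow the
usual OPML layout, where each subscription is an `<outline>` element carrying
an `xmlUrl` attribute and folders are outlines without one. The function
names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Map feed URL -> title for every subscription in one OPML document."""
    root = ET.fromstring(opml_text)
    feeds = {}
    # iter() walks nested outlines too, so feeds inside folders are found.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("title") or outline.get("text") or ""
            feeds.setdefault(url.strip(), title)
    return feeds


def merge_opml(opml_texts):
    """Merge several OPML documents, keeping the first title seen per URL."""
    merged = {}
    for text in opml_texts:
        for url, title in feeds_from_opml(text).items():
            merged.setdefault(url, title)
    return merged
```

Deduplicating on the exact URL string is the simplest choice; the same feed
can still show up under slightly different URLs, so a real pipeline would
probably want some normalization on top.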

------
blakeweb
Check Delicious? I myself sometimes tag blogs I find that only _might_ be
useful later, rather than subscribing to them. On a first pass I pulled a
bunch using economics+finance+blogs as tag filters, with a pretty low false
positive rate, at least on the first few.

------
dabent
Maybe Regator?

<http://regator.com/#feedsearch:section:20:0>

------
fauxfauxpas
You might find YQL (Yahoo Query Language) useful for parts of what you want to
do.

------
pramit
Postrank.com has plenty of feeds.

~~~
iamelgringo
Good call, sir. Thank you.

------
apowell
Have you tried Alltop.com?

~~~
iamelgringo
Great collection of feeds. I hadn't tried those. Thanks.

