
My Hacker News firehose - davewiner
http://scripting.com/stories/2010/11/08/myHackerNewsFirehose.html
======
tptacek
Hacker News isn't a potentially valuable Internet resource waiting to be mined
for interesting new applications. That's Twitter. Hacker News is an actual
community.

I don't know or care why the API got blocked, but I've been as happy with as
many HN add-on apps as I have been incredibly irritated by them (paging: guy
who scraped all HN job postings and made his own job site with them). From
what I can tell, it's not actually part of this site's philosophy to be a
building block for other people's software ideas.

~~~
davewiner
You misunderstood what I was saying. Start with the firehose. I don't want it
to make money, I want it so I can link this flow in with other flows that I'm
following without having to visit all the sites. Not trying to make a
business, just scratching my own itch having no idea where it leads.

~~~
mikeryan
I'm confused why you needed a feed of all stories as they're submitted.

Is that substantially different from the new feed?
<http://news.ycombinator.com/newest>

~~~
davewiner
Same content but instead of being HTML it's RSS 2.0.

<http://static.scripting.com/hackernews/rss.xml>

That feed stopped updating this morning. That's the issue. I want the API that
it's built from to be turned back on.

Hope this helps alleviate the confusion. :-)
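
For anyone who wants to link a feed like this in with other flows, here is a minimal sketch of pulling the items out of an RSS 2.0 document in Python. The sample XML and the `parse_items` helper are illustrative assumptions; the actual contents of the feed above are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the real firehose feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example firehose</title>
    <link>http://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    # In RSS 2.0 the items sit under the single <channel> element.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


for title, link in parse_items(SAMPLE_RSS):
    print(title, link)
```

Because the output is plain tuples, merging several feeds is just a matter of concatenating the lists and sorting them however you like.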

------
jrockway
I think the idea behind HN is to have a place where you can go to chat when
you are bored or feeling distractible. Having every story posted as it's
submitted is counterproductive -- it makes HN become a job (like reading
email) instead of something to do once in a while when you feel like it.

I miss probably about 90% of the stories on HN. I enjoy it anyway.

HN is a place to go, not a list of articles.

------
elq
Why is it that every time I visit scripting.com (about twice per year), I'm
met with niggling little issues like reload loops or the lovely javascript
link to the loopback address -

[http://127.0.0.1:5337/scripting2/editor/controls?username=da...](http://127.0.0.1:5337/scripting2/editor/controls?username=davewiner&url=http://scripting.com/stories/2010/11/08/myHackerNewsFirehose.html)

I think Dave needs a new tool.

~~~
davewiner
Just curious, does that link cause you any problems in your browser? If so,
what browser and on what platform?

~~~
elq
Yes. Caused both firefox 3.6.12 and safari 5.0.2 on osx to have fits.

firefox on ubuntu and chrome seem to work without trouble.

~~~
davewiner
Sorry about that. What do the fits look like?

How much can they penalize you for opening a web page that has (what amounts
to) a broken link?

BTW, I use Firefox on the Mac, on machines that don't have my CMS running on
them (that's what the link connects to), without problems.

------
daxelrod
>Apparently that's because Hacker News has blocked his API.

Does anyone know if this is actually the case? <http://api.ihackernews.com>
gives me a 404, but I'm posting this from ihackernews's browser interface,
which obviously is getting updates.

~~~
ronnier
I took down the site after YC blocked my IP address. I believe it was blocked
as a result of increased usage from the explosion in traffic, resulting in
some really heavy usage of the API. My guess is that YC's software did this
automatically to prevent abuse.

In addition, many folks weren't happy with me distributing the HN database. So
there really wasn't a reason to keep it up.

I'm going to rework the API such that it will work within the boundary of an
acceptable scraping rate. I'm not putting the database back up.
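
One common way to keep a scraper inside an acceptable request rate is to enforce a minimum gap between fetches. The sketch below is only an illustration of that idea: the `Throttle` class is hypothetical, and the 30-second interval is a made-up number, since nobody in this thread states what rate YC would accept.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock   # injectable so the class can be tested
        self._sleep = sleep
        self._last = None     # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call wait() before every request to the scraped site.
throttle = Throttle(min_interval_s=30.0)  # assumed value, not an HN policy
throttle.wait()  # the first call returns immediately
```

Caching each fetched page and serving API clients from the cache would keep the load on the scraped site constant no matter how many people use the API.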

~~~
ajg1977
That's a real shame, I was thinking of building something on it too. I wish
there was a way to get HN pages in simple JSON/XML.

~~~
ronnier
That's exactly what my API did. JSON, JSONP, and XML for pages, comments,
profiles.

------
nir
I'm curious about Giles Bowkett's comment on that story - is he really blocked
on HN? If so, why?

~~~
allenbrunson
yep, he really is blocked. based on his comments, i'm guessing the reason is
because he wasn't able to remain civil and respectful.

<http://news.ycombinator.com/user?id=gilesgoatboy>

that's just the first one i found, i'm sure there are others.

~~~
blasdel
There were other accounts too that are still active but that he may be locked
out of. That particular account got silently hellbanned for sarcasm:
<http://news.ycombinator.com/item?id=551664>

Some historical threads: <http://news.ycombinator.com/item?id=196390>
<http://news.ycombinator.com/item?id=1015591>

------
davewiner
Update: the firehose feed is working again.

[http://scripting.com/stories/2010/11/09/myHackerNewsFirehose...](http://scripting.com/stories/2010/11/09/myHackerNewsFirehoseIsFlow.html)

------
scott_s
In the past, clever things that others have done have been blocked because
they placed a substantial burden on the servers.

------
lwhi
What incentive does Hacker News have to allow access via an API? At the
moment, introducing an API would cost YC money because it would take time to
develop and it would potentially increase server costs.

However, allowing others to do the job seems less troublesome.

Creating artificial restrictions to _prevent_ other people from doing so
seems like effort.

I would imagine that the owners of a site would only seek to stem the
wholesale flow of information to another property, when inaction is likely to
lead to a loss in value.

Where is the value in HN? It could be argued that the value lies in the posts
and the related combined wisdom, but I think the true value is related to
attention and focus. We come to HN to participate. If the content is allowed
to spread to third-party applications (via an API), this primary source of
value is lost.

~~~
davewiner
How many Y Combinator startups build on the APIs of other sites? Wouldn't it
be reasonable to expect them to reciprocate?

~~~
lwhi
It would be nice to expect HN to reciprocate, but I don't think that it's
likely.

API access always needs to be part of a strategy, because granting it will
change the landscape within which your business operates.

------
ElbertF
The page doesn't load for me, but Google Cache has it:

[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://scripting.com/stories/2010/11/08/myHackerNewsFirehose.html&hl=en&strip=1)

------
itsnotvalid
As we are seeing here, there are numerous unofficial APIs available (like the
one made available via a torrent download, Twitter bots, search, or
alternative look & feel), all of which must be doing their own parsing of this
site. Would it actually be good to have an official API for all sorts of
hacking needs?

------
adrianwaj
The best way to get around the ban I think is to have low-latency distributed
scraping.

Basically, a plugin that submits to RR's site whenever a user is surfing on
HN. RR's site then parses it as usual.

The plugin could have some setting like max KB submitted per minute, and RR's
site would only request the full page if it was needed at the time.
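
The "max KB submitted per minute" setting could be modelled as a rolling byte budget on the plugin side. This is a rough sketch under assumptions: the `ByteBudget` class, its method names, and the 64 KB figure are all hypothetical and not part of any existing plugin.

```python
import time
from collections import deque


class ByteBudget:
    """Allow at most max_bytes of uploads in any rolling window (hypothetical)."""

    def __init__(self, max_bytes, window_s=60.0, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.window_s = window_s
        self._clock = clock
        self._events = deque()  # (timestamp, size) of accepted uploads
        self._used = 0          # bytes accepted inside the current window

    def try_spend(self, size):
        """Return True and record the upload if it fits in the budget."""
        now = self._clock()
        # Drop uploads that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, old_size = self._events.popleft()
            self._used -= old_size
        if self._used + size > self.max_bytes:
            return False
        self._events.append((now, size))
        self._used += size
        return True


# Usage: check the budget before submitting a page the user just loaded.
budget = ByteBudget(max_bytes=64 * 1024)  # assumed 64 KB per minute
page = b"<html>...</html>"
if budget.try_spend(len(page)):
    pass  # submit the page to the collecting site here
```

Pages that do not fit are simply skipped, so the plugin never queues work or slows down the user's browsing.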

------
nym
I think it's interesting that Dave Winer decided to do this as a blog post
instead of an HN discussion.

Instead of having a conversation with the HN community he decided to have a
conversation with his readers.

~~~
davewiner
That's not what I "decided" at all.

I do my writing on my blog. That's the way I work.

Happy to participate in a discussion here.

~~~
nym
That's still a decision.

~~~
kyro
What is your point?

------
mike-cardwell
FWIW, either your feed or the feed you're pulling from is not handling
encoding correctly.

"Announcing Brightbox Cloud â the UKu0027s first true cloud hosting
platform"
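
Symptoms like these usually come from two separate mistakes, which the sketch below reproduces in Python. It assumes the original headline used an en dash and an ordinary apostrophe, which is a guess based on the garbled output, and it does not claim to show where in either feed the bug actually lives.

```python
import json

# Assumed original headline (en dash and apostrophe are guesses).
original = "Brightbox Cloud \u2013 the UK's first true cloud hosting platform"

# Mistake 1: UTF-8 bytes read as Latin-1. The three bytes of the en dash
# become "â" followed by two invisible control characters.
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the mistake recovers the text, provided no bytes were dropped.
repaired = garbled.encode("latin-1").decode("utf-8")

# Mistake 2: "u0027" looks like the JSON escape \u0027 (an apostrophe)
# that lost its backslash instead of being decoded.
escaped = '"the UK\\u0027s"'   # a JSON string literal containing \u0027
decoded = json.loads(escaped)  # a JSON parser turns it back into '

print(repr(garbled))
print(repaired == original)
print(decoded)
```

The practical fix is to decode the upstream response with the charset it declares and to run JSON through a real parser rather than stripping backslashes by hand.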

------
aditya
Am I the only one who wants the firehose and the API to come back? As long as
it can be done without taking the servers down and killing HN, that is :-)

------
FluidDjango
Regarding the "API blocked"...

Seems to be feeding fine as of Mon eve (Nov 8)

------
alttab
What all of these posts fail to consider is that philosophically pg may want
you writing code instead of consuming his community 101 different ways.

Just my $.02. I visit HN daily, but I'm busy enough on my own work that the
website suits my consumption needs.

Anyone else like me out there?

~~~
alttab
I almost expected to get down-voted. What I was trying to convey is managing
your time wisely.

