
Hacker News RSS Feed with Full Text Articles - talison
http://www.nirmalpatel.com/hacks/hnrss.html
======
robotrout
I love that you did this. Good work. I intend to use it, and look at the code
as well.

It does raise ethical concerns. Obviously, the folks can't sell any
advertising if we're accessing their site content another way...

Which, as you correctly state, is exactly what RSS already allows. I've never
understood why people put RSS feeds on their blogs. It's insanity. Give people
a way to NOT come to my site and view my ads. Fine if ads are not your model,
but for 95% of blogs, ads are the model. RSS is a fad, and eventually, saner
heads will prevail.

Until that time comes, as you say, you're presenting the same info that an RSS
feed does, but with the benefit of having been ranked and vetted by the HN
community first.

~~~
Jebdm
RSS may go away, but something better will take its place.

People put RSS on their blogs for multiple reasons. I use it on mine because
it greatly increases the usability of my site. Another way it helps is that it
makes sites "stickier"; without RSS, people would not have something
automatically telling them "Hey, that site you liked updated! Go check it
out!". They would have to either remember the URL, or remember a link path to
the site, or bookmark it, _and_ they'd have to periodically check back. (Not
that I particularly care—while it is nice to have readers, I write mainly for
my own amusement, and my site is not commercialized at all.)

And I think that these two factors are a net win for most sites, whether
commercial or not. There are ways around the advertising issue—offer truncated
feeds, put ads in your feeds, provoke comments, etc. Plus, I thought the
conventional wisdom was that regular readers don't click ads much; it's the
ones who stumble onto your site from Google who do. I haven't seen a study,
though.

Also, for the record, I basically don't check sites that don't offer RSS. If
it's particularly good, it'll get bookmarked, and maybe I'll come back to it
in a few months when I've got some free time. But most sites don't even get
bookmarked. On the other hand, if I find a site that had a good article, I'll
often add it to my RSS reader on a trial basis.

------
knieveltech
Counting the seconds until someone flings a takedown order. Aggregating full
content without permission definitely exceeds the bounds of "fair use".

~~~
RiderOfGiraffes
For what it's worth I agree with you, but that also makes me wonder how it is
that scribd can copy an entire copyrighted article without asking permission,
and then say that to take it down they require a request that "meets all
criteria for validity as a DMCA affidavit."

That seems wrong to me. If it's copyrighted, and they copy the entire thing
without permission, how is that not wrong?

EDIT: I see this has been discussed before many times, but this time I ask -
scribd are apparently effectively immune because users upload things to them.
They then claim that if it's copyrighted, it's not their fault. HN has
uploaded content to them without the author's permission. Does that make HN a
copyright violator?

~~~
qeorge
IANAL, but here's my understanding of the DMCA's Safe Harbor provision, which
is why scribd and other UGC sites (e.g. YouTube, isohunt, flickr) aren't held
responsible for copyrighted material appearing on their sites. UGC websites
can be protected by the Safe Harbor provision under the following conditions:

- The site provides a legal, legitimate service (i.e. fair use)

- The amount of UGC on the site is too much to reasonably police

- They have an employee registered with the US govt as the agent for handling
all copyright claims. Simply providing an email address, contact form, or even
a mailing address doesn't count.

- When they receive legitimate removal requests _from the copyright holders_
they remove the content in a timely manner.

As long as these conditions are met they can't be held liable for copyright
violations which occur through the use of their product.

HN is a different case, because the amount of content copied from the source
site (the headline, and only in some cases) is a small portion of the original
work. Furthermore, HN is adding value to the content through ranking and
community, which makes it a distinct piece of intellectual property.
Similarly, you can legally make a collage which contains copyrighted material
because the collage itself is a distinct entity (more than the sum of its
parts).

How much of the page content can be scraped and presented, or how large the
pieces of a collage can be, is hotly debated. For me personally, it's a bit
of a Potter Stewart situation, and this feed is absolutely copyright
infringement.

------
talison
I didn't realize this had been submitted before (by nirmal). It was hosted on
a different domain in the previous submission which is why this wasn't caught
as a dupe.

Anyway, you can see some of the author's comments in this thread:

<http://news.ycombinator.com/item?id=542334>

------
jjames
I brought up some concerns about full content scraping last time this was
discussed: <http://news.ycombinator.com/item?id=542552>

I used the feed for a few months but found the feed-readerized articles
lifeless, frozen in time and free of discussion. I'm also not a fan of the
title-only RSS feed. Some sites simply lose too much in feed form imo.

------
dawie
I love it! I have been looking for a RSS feed like this for quite a while now.
Good Job!

------
schwanksta
Thanks, I was looking for this the other day!

------
trezor
Either it is a bug in Firefox 3.5 RC or the MIME type provided is wrong. It
just renders as plain text, not as an RSS feed I can click to subscribe to.

Since RSS feeds seem to work on other sites, I'll assume the problem is with
the feed. The correct MIME type should be application/rss+xml.

Confirmed:

       $ wget -O /dev/null http://nirmalpatel.com/fcgi/hn_feed.fcgi
       --2009-06-22 21:01:56--  http://nirmalpatel.com/fcgi/hn_feed.fcgi
       Resolving nirmalpatel.com... 207.210.105.86
       Connecting to nirmalpatel.com|207.210.105.86|:80... connected.
       HTTP request sent, awaiting response... 200 OK
       Length: 211754 (207K) [text/plain]
       Saving to: `/dev/null'
       
       100%[==================================================>] 211,754      254K/s   in 0.8s
       
       2009-06-22 21:01:57 (254 KB/s) - `/dev/null' saved [211754/211754]
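The wget output above shows the server answering with `[text/plain]`, which is why Firefox dumps the raw XML instead of offering a subscription. A minimal sketch of the fix in a Python CGI/FastCGI handler (the function names and the sample feed body here are illustrative, not taken from the actual hn_feed.fcgi):

```python
# Sketch: a CGI/FastCGI response must declare the RSS MIME type in its
# Content-Type header, or browsers will treat the feed as plain text.

RSS_MIME = "application/rss+xml"

def build_response(feed_xml: str) -> str:
    # A CGI response is headers, a blank line, then the body.
    # Emitting "text/plain" here is exactly the bug seen in the wget output.
    return f"Content-Type: {RSS_MIME}\r\n\r\n{feed_xml}"

def content_type(response: str) -> str:
    # Parse the Content-Type header back out of a raw response,
    # the same check wget reports in brackets after the length.
    headers, _, _ = response.partition("\r\n\r\n")
    for line in headers.split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            return value.strip()
    return ""

feed = '<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>'
print(content_type(build_response(feed)))  # application/rss+xml
```

With the header corrected, Firefox should recognize the document as a feed and present its subscribe UI rather than rendering the markup as text.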

