

Hacker News RSS Feed + Readability - nirmal
http://hacketal.com/#hnrss

======
nirmal
Now that this made it to the front page, I just realized that my own page's
content is not pulled correctly by the Python code. :)

The Readability bookmarklet seems to do it just fine. I will need to
investigate.

It seems to work fine with most blogging systems. I don't use one. I use
Markdown, Python and rsync.

UPDATE: I received a request to add a Comments link to the top and did that.
Makes sense if you like to check the comments before deciding what's worth
reading. It will probably take a while before the code is reloaded.

~~~
SomeIdiot
Yeah, apostrophes seem to turn into some weird jumble (e.g. â€™) and so do
other characters. Aside from that, very nice work. Added to my reader.

~~~
nirmal
Any clues as to why that happens? The code simply inserts the HTML of the
content sections of the linked sites into the RSS feed. It doesn't seem to
happen with all sites.
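
For illustration only, here is a minimal sketch of what "inserting the HTML into the feed" can look like using just the Python standard library. This is not the actual script; `make_item` and its parameters are hypothetical names. Note that the HTML goes in as text and gets escaped, so the characters have to be decoded correctly before this step.

```python
import xml.etree.ElementTree as ET


def make_item(title, link, content_html):
    """Build one RSS <item> whose <description> carries the page HTML.

    Hypothetical helper for illustration; not taken from the real script.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # ElementTree escapes <, > and & on output, so the HTML travels as text.
    ET.SubElement(item, "description").text = content_html
    return item


html = "<p>It\u2019s a test</p>"
item = make_item("Example post", "http://example.com/post", html)
# encoding="unicode" returns a str rather than bytes.
xml_text = ET.tostring(item, encoding="unicode")
```

A feed reader unescapes the description and renders it as HTML, so any wrongly decoded character shows up exactly as it was inserted.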

~~~
bd
Probably it happens when the real text encoding is different from the declared
one (at least that's the problem I encountered when aggregating content from
the unfiltered wild web).

If it really bothers you, you can use chardet [1] to try to detect the real
encoding (BeautifulSoup should use it if it's installed). But even this is not
100% foolproof.

[1] <http://chardet.feedparser.org/>
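
A rough sketch of that fallback idea, assuming you have the raw bytes of the page and whatever encoding the server declared. `decode_page` is a hypothetical helper, not part of chardet or BeautifulSoup; the only chardet call used is `chardet.detect`, and it is skipped if chardet is not installed.

```python
def decode_page(raw, declared=None):
    """Decode fetched bytes, distrusting the declared encoding.

    Hypothetical helper; the order of attempts is an assumption.
    """
    # 1. UTF-8 is strict: bytes in another encoding rarely decode cleanly.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 2. Fall back to what the page or server claimed.
    if declared:
        try:
            return raw.decode(declared)
        except (UnicodeDecodeError, LookupError):
            pass
    # 3. Ask chardet for a guess, if it is available.
    try:
        import chardet
    except ImportError:
        chardet = None
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")
        if guess:
            try:
                return raw.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass
    # 4. Last resort: never raise, mark undecodable bytes instead.
    return raw.decode("cp1252", errors="replace")
```

Trying UTF-8 first catches the common case where a page is really UTF-8 but is declared as ISO-8859-1. As said above, none of this is foolproof.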

~~~
jauco
Yep, â€™ means that the page contains the UTF-8 quote character, but the
browser renders the bytestream as if it's a single-byte character stream.

The crux is that the basic alphabet is encoded the same. So you only notice it
with special characters such as the curly quote and the em-dash.
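
This is easy to reproduce in Python 3. The snippet below is only a demonstration and assumes the single-byte encoding is Windows-1252 (cp1252), which matches the characters shown above.

```python
# U+2019 is the curly apostrophe; in UTF-8 it is three bytes: E2 80 99.
curly = "\u2019"
raw = curly.encode("utf-8")

# Reading those bytes as Windows-1252 yields one character per byte.
garbled = raw.decode("cp1252")
print(garbled)  # â€™

# Plain ASCII text is byte-identical in both encodings, so it survives.
plain = "don't".encode("utf-8").decode("cp1252")

# The damage is reversible as long as no bytes were dropped along the way.
repaired = garbled.encode("cp1252").decode("utf-8")
```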

------
jjames
Is there any concern here for the missed click-throughs for content creators
for purposes of metrics and/or advertising revenue?

I've not been in a situation to have to worry about either but I know that
site owners make a conscious decision when they formulate the contents of
their own RSS feeds. Some clearly don't want to provide all the goods in feed
readers without an ad stream to complement it.

I haven't gone through a lot of the feed yet so I apologize if I am missing
some way you've already addressed this.

Regardless, thanks! I really appreciate this feed.

~~~
nirmal
I suppose I could try to go to each linked article and find their RSS feed and
see if they normally expose the content but that seems quite complicated. I
don't know if missed click-throughs are why pg doesn't do something similar for
the existing HN RSS feed. I suspect that it's hard to make it always have the
correct content. This is clearly evident in the fact that my page doesn't get
parsed correctly. :)

~~~
jjames
I'm not super familiar with pg's RSS feed but doesn't it just provide a link
to the article? That doesn't bypass the click-through to the content provider.
It encourages it, just like the HN homepage.

~~~
nirmal
Right, so maybe that's why pg doesn't do something like this feed for the
regular feed. Also, there's the fact that no heuristic will be 100% reliable
in determining the actual content area.

------
nirmal
I just replied to an email about my server hanging when trying to retrieve the
Python code. If you have a similar problem, just email me and I will reply and
attach the code.

Email address and Twitter are in my profile.

Also, any fixes are much appreciated. Patch file not required. I've gotten a
few emails that just show me where I should add code and why. :)

------
hsvieira
As also pointed out by BeautifulSoup's author, I would suggest you use lxml
(<http://codespeak.net/lxml/>). It is much faster. I've migrated a couple of
my projects to it.

------
adrianwaj
Check out:
<http://pipes.yahoo.com/pipes/pipe.info?_id=672ce7db13a3ac1ec3e22c92eadf748f>

~~~
nirmal
There seems to be a lot of extraneous HTML and CSS in the feed. Good to know
about pipes though.

------
maarek
Nice. I only follow HN in Reader, and this makes that even easier. Thanks a
lot. Now if we could only get full feeds from the NYTimes...

~~~
nirmal
Same for me. This combined with the Google Reader keyboard shortcuts makes it
much easier and faster to go through HN content.

------
djhomeless
Wow, combining my favorite bookmarklet with my favorite news site. Well done!

------
mikecuesta
This is great, thanks so much for this!

