

RSS mistakes: let's not make them again - julien
http://blog.superfeedr.com/rss-mistakes/

======
lwf
> PubSubHubbub evolved and is now able to work with any kind of data (not just
> RSS or Atom), opening the door to a JSON based syndication format.

It is not made clear why this is desirable. The author makes no case for why a
JSON-based system would be any better than the current XML system. Perhaps the
implied "eww XML", but XML is _the standard_.

~~~
bct
People are (slightly?) less likely to do the stupid "let's generate this
structure by gluing strings together" thing that produced the need for liberal
Atom and RSS parsers.

(But yes, I agree that "turn this already-existing thing into JSON just
because" is a stupid trend.)

~~~
saurik
The thing that screwed RSS was not it being XML: I've seen the exact same
issues happen any time someone has a text field in any file format. (Hell, I'm
guilty of this.) They are more related to poor stewardship of a shared
protocol than due to the usage of any particular transport encoding.

The first issue is that you have a field, and that field is rendered in a text
box, and is defined to be text; at some point you go "man, I wish I could add
a hyperlink to my text", and so you now want to put some HTML in there.
However, the field is just text, so what do you do?

Let's say this was JSON, and this text is a string; do you put HTML in the
string? This is somewhat equivalent to putting escaped HTML into the text node
of the XML document. Alternatively, you could replace the string with an
object (vaguely equivalent to putting HTML elements in the RSS).

The result of some people choosing the first option (which seems more
reasonable at first glance than the second option, as it provides a better
experience for existing readers and better fits the existing protocol) is
having to look at a string and guess whether it should be parsed as HTML or
not.
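
To make the ambiguity concrete, here is a minimal sketch (not taken from any feed reader; the function names and the sample title are made up for illustration) of one string rendered under both readings:

```python
import html


def render_as_text(value: str) -> str:
    # Reading 1: the field is plain text, so escape it for display.
    return html.escape(value)


def render_as_html(value: str) -> str:
    # Reading 2: the field already contains markup, so pass it through.
    return value


# Hypothetical title: is the author talking about tags, or using them?
title = "Why I use <b> instead of <strong>"

as_text = render_as_text(title)
as_html = render_as_html(title)
```

Nothing in the string itself says which reading the publisher meant, which is the guessing game described above.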

The second problem is that as people make changes in the second direction, if
they are not carefully organized and centralized in a specification, you end
up with haphazard and incompatible changes: someone decides to add a "type"
field with a mime type, someone else adds a different element.

Having a ton of people generate the format, and having the thing parsed by
some liberal canonical language, leads to too much flexibility in fields like
dates: someone puts a weird date format into their date field, and it is
parsed correctly, and then tons of other people do it, and you're screwed.

This also isn't helped by going with JSON: the field is just more likely to
end up being "anything that a JavaScript Date object is willing to convert
from a string to a date", which assuredly supports irritating corner cases
that are not supported by the Date parsers from other random languages.
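
As a rough illustration of the alternative (a sketch only; `parse_feed_date` is a made-up helper), a consumer can accept one date grammar and reject everything else instead of guessing. Python's `email.utils` parser is itself fairly forgiving within the RFC 2822 shape, so this only shows the "one grammar, reject the rest" idea:

```python
from email.utils import parsedate_to_datetime


def parse_feed_date(value):
    """Parse an RFC 2822-style date, or return None if it has another shape."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError here, newer ones ValueError.
        return None


# A date in the RFC 2822 shape is parsed into a datetime.
conforming = parse_feed_date("Mon, 05 Aug 2013 14:30:00 +0000")

# An ISO 8601 date is perfectly sensible, but it is a different grammar,
# so this helper refuses it rather than guessing.
other_shape = parse_feed_date("2013-08-05T14:30:00Z")
```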

With RSS, it was really "the people's protocol", with the specifications only
encoding random changes that had become popular over time. What we were left
with was a total mess: I can't find it now, but someone once wrote a proof
that you couldn't actually parse RSS due to conflicting standards.

------
zanny
> For RSS, I have first to “guess” (or rather hope) that there is a feed,
> since browsers now hide the infamous orange icon. I can then select the url
> of that tab and copy it. Then, open a new tab, go to the reader I chose to
> use and paste the url. Hopefully, that reader is smart enough to actually
> find the feed url from this page’s url. If not, I’m screwed anyway and I’ll
> have to look into the HTML code of the page!

On sites that provide a feed link, I just click that link, which opens that
feed in feedly, and I hit add to my feedly. Process is as simple as twitter.
The choice of some sites to hide their feeds and provide no links is the
burden of site designers.

~~~
jessaustin
If one doesn't like how her browser handles certain links, it's easy enough to
get an extension or bookmark for that browser that does exactly what she
wants. This is especially true for feeds.

~~~
julien
Do you really think people who don't know what their browser is will install
an extension or a bookmarklet?... Again, compare that experience to a click on
"follow me on twitter" button.

~~~
mpweiher
I don't have to install any extensions in my browser. It Just Works™.

------
th0ma5
Oblig XKCD: [http://xkcd.com/927/](http://xkcd.com/927/)

That said, I think this post brings up a lot of good points. Specifically, it
subtly touches on a very important one: we now have a "web of data", even if
it isn't exactly what the W3C specified. Now what?

------
haakon
It's too bad FeedTree ([http://www.feedtree.net/](http://www.feedtree.net/))
never caught on. Decentralised, peer-to-peer push distribution of RSS-style
updates. Instead we ended up either hammering servers checking for updates, or
using centralized hubs like FeedBurner, and eventually moving to completely
proprietary services like Twitter and Google+.

~~~
wmf
A feed is fundamentally centralized anyway, so I don't see the problem with
relying on the feed to push updates to you. Unfortunately, because Google built
a super-centralized crutch into PuSH, everybody thought mooching off Google was
the point of PuSH, so sites never installed their own hubs.

------
mpweiher
Hmm...when I click on an RSS feed link in Safari, my RSS reader (NetNewsWire)
opens and asks me if I want to add this feed. (Worked for the comments feed,
didn't see a link for the blog, and the "subscribe" button didn't work).

Yes, polling has issues, but the author doesn't explain how changing the file
format from RSS to some JSON-based format makes those problems go away.

Maybe I am just dense, but neither does the author show real problems, nor
does he offer solutions. Well, the latter being somewhat unsurprising
considering the former.

------
jrochkind1
_The “level 0” solution is to periodically fetch each feed, parse it, diff it
and hopefully find something new._

Um, solutions to this exist, and they are much simpler than 'PubSubHubbub
or RSSCloud'. And they don't require any changes to RSS.

You are requesting this thing over HTTP. You (and the server delivering the
feed) simply need to use standard HTTP caching headers: ETag, Last-Modified,
etc.

No need to fetch the XML and parse it and diff it.

~~~
julien
I don't see how using headers will remove the need to fetch the resources, parse
them and diff them. Maybe you'll do that less often, but you'll still do it a
lot...

~~~
jrochkind1
Are you familiar with how HTTP caching headers work?

From an etag in the header, you can tell if the current remote resource is
identical to the one you have locally or not, without any
fetching/parsing/diffing.

Same with a Last-Modified header, right?

In fact, you send a conditional GET rather than first retrieving only a HEAD
and then deciding whether to continue with a full GET, but either way you
aren't fetching/parsing/diffing to tell if there's new content.
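
A minimal sketch of the conditional GET described above, using only Python's standard library. The function names are made up for illustration, and it assumes the server actually sends `ETag` and/or `Last-Modified` and honors the matching request headers:

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET that asks the server to reply 304 if nothing changed."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on 304 Not Modified."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Unchanged: nothing to download, parse, or diff.
            return None, etag, last_modified
        raise
```

A poller would store the returned validators next to each feed and pass them back on the next call, so it only parses when the body is not `None`.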

------
stephen_mcd
The main criticism here doesn't really apply to any readers that apply the
slightest effort in making the subscription process streamlined.

Take [https://kouio.com](https://kouio.com) for example (a Google Reader
replacement I've built): you just enter a website's address and it'll
discover the feed automagically.

~~~
seanzieapples
I like your app so far but I input
[https://news.ycombinator.com/](https://news.ycombinator.com/) and it's not
able to locate an RSS feed (Just says "Loading"). I input
[https://news.ycombinator.com/rss](https://news.ycombinator.com/rss) and it
works.

~~~
stephen_mcd
Ha!

Well as jerf mentioned below, HN doesn't correctly provide a link reference to
its RSS feed. Now that's fine in our case, as a surprisingly large number of
websites fail this test, and simply have a visible link with "rss" or "atom"
in either the link's text or URL.

Now we pick these up, but in this case the front page of HN has an invalid
link with "rss" in the text, namely this very thread itself. So normally the
HN front page URL works fine, but right now it doesn't. This gives us a great
chance to refine our discovery code a bit, so thank you!
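
For readers wondering what this kind of discovery looks like, here is a rough sketch. It is not kouio's code; the class and function names are made up. It prefers a declared `<link rel="alternate">` and falls back to visible links whose URL mentions "rss" or "atom". Unlike what is described above, it does not look at link text, which is a simplification:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect declared feed links and, separately, guessed ones."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.declared = []  # <link rel="alternate" type="...+xml">
        self.guessed = []   # <a href="..."> whose URL mentions rss/atom

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rel and mime in FEED_TYPES:
                self.declared.append(urljoin(self.base_url, href))
        elif tag == "a":
            lowered = href.lower()
            if "rss" in lowered or "atom" in lowered:
                self.guessed.append(urljoin(self.base_url, href))


def discover_feeds(page_html, base_url):
    """Return declared feed URLs, or guessed ones if none are declared."""
    finder = FeedLinkFinder(base_url)
    finder.feed(page_html)
    return finder.declared or finder.guessed
```

Keeping the guessed links in a separate list makes it easy to treat them as lower-confidence candidates that still need to be fetched and checked.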

~~~
malkarouri
That is one of the coolest bugs I've come across lately.

------
MatthewPhillips
> The “level 0” solution is to periodically fetch each feed, parse it, diff it
> and hopefully find something new.

Wait, do feed readers really not do a HEAD request first and check
Last-Modified?

~~~
gboudrias
I don't think those values are always reliable.

~~~
jrochkind1
So which is easier, getting servers to fix them to be reliable, or getting
everyone to change to a new json pubsubhub whatever thingy, that they'll
probably do unreliably anyway too?

------
jimbobimbo
Polling may suck, but it just works. Throw an XML file on your server and you
are done, without writing a single line of code.

~~~
julien
Same as hunting for/picking your own food or sending snail mail to tell things
to people. It works, but is not very efficient! Efficiency opens the door for
innovation and new behaviors.

------
ajanuary
I've been wondering if feeds would be more discoverable if browsers integrated
it with bookmarks. Bookmarking a page with an RSS feed becomes a 'live
bookmark' that acts like a bookmark folder with unread counts.

But there's probably a billion edge cases that make it too difficult.

~~~
opminion
[http://www.mozilla.org/en-US/firefox/livebookmarks.html](http://www.mozilla.org/en-US/firefox/livebookmarks.html)

------
chenster
That the RSS spec doesn't change is actually a great benefit to services built around
it. Both [http://dealbert.net](http://dealbert.net) and
[http://joboyster.com](http://joboyster.com) use RSS as the main data source
format.

------
asdf3
We should add query parameters onto RSS requests, and gradually upgrade while
remaining backwards compatible. Create clients and server features that are
compelling enough, and people will update their server code.

~~~
ancarda
What do you mean? /feed?version=1

That's a bad idea if you cache RSS in an xml file, i.e /rss.xml. It would make
it more complex if you had to handle query parameters. The parser should just
check for the version attribute on the rss node; <rss version="2.0">. Or, even
better, it could just check for the availability of server nodes such as
"<cloud>".

Or did you mean something completely different?
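
The parser-side check suggested above could look roughly like this. It is a sketch, with a made-up `describe_feed` helper and a made-up sample feed; the `<cloud>` attribute values are placeholders:

```python
import xml.etree.ElementTree as ET


def describe_feed(xml_text):
    """Report the declared RSS version and whether a <cloud> element exists."""
    root = ET.fromstring(xml_text)
    if root.tag != "rss":
        raise ValueError("not an RSS document")
    cloud = root.find("channel/cloud")
    return {
        "version": root.get("version"),
        "has_cloud": cloud is not None,
    }


# Hypothetical feed served as a static file such as /rss.xml.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example</title>
    <cloud domain="rpc.example.com" port="80" path="/RPC2"
           registerProcedure="notify" protocol="xml-rpc"/>
  </channel>
</rss>"""

info = describe_feed(SAMPLE)
```

The client learns what the feed supports from the document itself, so a statically cached file needs no query-parameter handling.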

------
ancarda
SubToMe doesn't load for me. I just see a blank page and the subscribe button
doesn't work.

~~~
julien
Weird. Any JS error?

------
bmelton
Offtopic somewhat, but if you're building a feed reader in Python, you can use
feedfinder.py[1], to allow for feed discovery. The last feed reader I wrote,
years ago, utilized it to great effect.

Every new feed reader on the block shocks me when I have to enter the exact
URL to the feed (e.g., domain/xml/feed.xml, or what have you).

[1] -
[http://www.aaronsw.com/2002/feedfinder/](http://www.aaronsw.com/2002/feedfinder/)

