Atom vs RSS is a great example of how technical correctness is trumped by social factors, in this case namely support from makers of popular software and content as well as social influence and documentation skills of creators.
The person who pushed RSS to success (IMO), Dave Winer, was superb at communicating and evangelizing his goals, connecting partners like Netscape and NYT, and documenting his work, including the RSS-related tools he built.
His spec was “worse” in the sense that it was underspecified but better in the sense that it achieved wide support (both in text and podcast form) among people who made content. This is partly because Dave had first an influential email newsletter and Wired column (DaveNet) and second an influential, very early blog, Scripting News. He had also been working with news companies for years at prior startups. He could write well. He showed up for and arranged meetings with people who did not at first understand the need for something like RSS. He was clear and relentless in his promotion, which was born out of what seemed to be a genuine desire for open standards in this area rather than greed / trying to do lock-in.
People with technical backgrounds in places like this tend to fixate on the technical aspects of Atom vs RSS. There is no question Atom is more technically correct. There is also no question (IMO) it came too late and focused on the wrong things — being correct and complete at the expense of being complicated and hard to understand — and more importantly was led and promoted by people who lacked the social skills to make it popular outside of technical circles. (These folks could be brutal about rss's flaws without seeming to have awareness of this shortcoming in their own effort.)
I’d also add Google dropping Reader and trying to force open web content into various proprietary schemes. Atom’s many improvements over RSS didn’t matter as much when the oxygen was getting sucked out of the room and few people were investing more time in feeds.
IIRC RSS was not originally an XML document, so CDATA tags (to prevent XSS) didn't work; and the issue remains with content syndication: feed elements should somehow HTML escape their content to prevent XSS (arbitrary JS on a different Origin)
The whole internet was broken, and RSS helped us realize it: the one-way, one-time syndication advantage.
These days it's all about https://schema.org/CreativeWork JSON-LD instead of RDFa, which you can try to sanitize with Mozilla/bleach like arbitrary HTML in comments on the page.
AFAIK RDF Site Summary (the original meaning of "RSS") used XML from the start¹. You may be thinking of spiritual predecessors like the Apple-developed MCF².
It really amused me that people think "RDF never caught on" when it is the basis of so many major standards such as RSS, Adobe's XMP metadata, and, I just found out, ActivityPub.
I was working on a standards committee for years and made the discovery that you can turn most XML Schema Definitions into OWL ontologies and thus automatically transform conforming XML documents into RDF documents.
(People had attempted this before but all the systems I saw were lossy or didn't make valid and decidable ontologies.)
For better and worse, my write-up is about to become one of those ISO standard documents that costs 166 Swiss francs, but it looks like I'll get the opportunity to apply this method to a major financial standard and release the software that does it as open source.
I'd contrast that to the truly awful RDF/XML spec where people never really understood where the XML stopped and the RDF began. It turned a lot of people off to RDF and people never got to see how easy it is with Turtle. Unfortunately JSON-LD hasn't got the love it deserves, because it fixes most of the major problems with JSON except for the lack of /* comments */, and you can frequently add a touch of JSON-LD to a JSON document you find on the street and get instant RDF you can query w/ SPARQL.
It looks like the W3C has started the process of a SPARQL 1.2 spec which could be a very good thing if it catches up with what is possible w/ document database query languages like N1QL, AQL and such.
> It really amused me that people think "RDF never caught on" when it is the basis of so many major standards such as RSS, Adobe's XMP metadata, and, I just found out, ActivityPub.
What does it mean to have caught on or not, though? How many RSS, XMP, ActivityPub implementations are actually considering things as triples and not just considering parent-child relationships within the XML tree?
Pretty sure I can remember the actionscript XML parser failing on RSS in like 2000 and the feed was to spec.
With that app, at least, I don't remember trying to just include RSS feed elements in HTML4 or XHTML; that app tried to parse the RSS and failed, it wasn't copying the feed elements without escaping href=javascript: links etc.
Note that it's an XML file with <channel> elements, which contain <item> elements, which contain <title>, <link>, and <description> elements. Pretty familiar, no?
Dave famously introduced the <enclosure> element in 2001. This has made a lot of people very happy and has been widely regarded as a good move. http://backend.userland.com/rss092
DonHopkins on May 11, 2014, on: The Unix Haters Handbook (1994) [pdf]
[...]
The worst use of the <BLINK> tag ever was the discussion held in the early days of RSS about escaping HTML in titles, whose attention-grabbing title went something like this: "Hey, what happens when you put a <BLINK> tag in the title???!!!"
The content of that notorious discussion went on and off and on and off for weeks, giving all the netizens of the RSS community blogosphere terrible headaches, with people's entire blogs disappearing and reappearing every second, until it finally reached a flashing point, when Dave Winer humbly conceded that it wasn't the user's fault for being an idiot, and maybe just maybe there was a tiny teeny little design flaw in RSS, and it wasn't actually such a great idea to allow HTML tags in RSS titles.
Ya I remember this period. A lot of people got caught up in these types of questions. “What if you want to put an unescaped greater than symbol in your post title??” People spent years on this sort of pedantry.
Meanwhile Dave added enclosures and popularized podcasting making RSS even more important. He knew where to focus and what mattered.
The craziest thing I see though is people importing the Atom namespace into RSS feeds to get some of the elements. At that point, I can't fathom what advantage there is to the producer in not just producing an Atom feed.
RSS 1.0 was basically the first attempt to produce something like Atom without reinventing too much, but Dave threw his toys out of the pram, so the only option was to reinvent it with a different name and clean up the remaining sharp corners.
It's tragically funny. Atom is both more correct and easier to produce and parse than any RSS variant.
Were I to join Automattic in the morning, the first thing I'd try is to attempt to get them to abandon their weird RSS mashup as their default feed format. There's no good reason why Wordpress still generates RSS feeds.
Yes, I'm a bit bitter, but I was in the feed wars, and it still stings.
> The craziest thing I see though is people importing the Atom namespace into RSS feeds to get some of the elements.
Interestingly enough, you can’t do the reverse: because RSS never got a proper XML namespace, its elements can’t be embedded into other XML. I can understand that the early RSS 0.9x efforts didn’t do XML namespaces since those were rather new then, but the publication of RSS 2.0 could have been the right moment to introduce one, I think.
Atom is a great feed format with a rock-solid spec.
I wouldn't call it a weird mashup for WordPress to use atom:link in their RSS feeds. It does things no RSS element can do, such as allow a feed to identify its own URL.
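For anyone curious what that looks like from the consumer side, here is a minimal sketch of pulling the self URL out of an RSS 2.0 feed that borrows atom:link. The function name `feed_self_url` and the sample feed in the usage note are made up for illustration; only the Atom namespace URI and the rel="self" convention are real.

```python
import xml.etree.ElementTree as ET

# The Atom namespace, in ElementTree's "{uri}localname" notation.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_self_url(rss_text):
    """Return the href of <atom:link rel="self"> in an RSS channel, or None.

    Hypothetical helper: it assumes the usual <rss><channel>...</channel></rss>
    layout and ignores everything else in the feed.
    """
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return None
    for link in channel.findall(ATOM + "link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None
```

Given a feed whose channel contains `<atom:link href="https://example.com/feed" rel="self" type="application/rss+xml"/>`, this returns that href; a plain RSS feed without the borrowed element returns None.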
For those who don't know, every WordPress site with an RSS feed also has an Atom feed available by adding "/atom" to the end of the RSS feed URL, such as this:
> Atom vs RSS is a great example of how technical correctness is trumped by social factors
I don't follow. Out there in the world, RSS feeds provide their feeds in Atom format. The technical format is called "Atom" and the functionality that the Atom format implements is called "RSS".
In what sense did technically-correct Atom get trumped by anything? This is like complaining that social factors caused "the SAT" to get trumped by "standardized testing".
Your confusion probably comes from the fact that RSS is older, so it's sometimes used as the name of the functionality but it's improper. They're 2 different formats.
There are feed formats named "RSS". There is a feed format named "Atom". And there is a concept named "RSS". The RSS formats, like the Atom format, are all implementations of the RSS concept. Generally, when you subscribe to an RSS feed somewhere, it will be in the Atom format. It's still called an "RSS feed" for the simple reason that that is the name of the concept.
> it's sometimes used as the name of the functionality but it's improper.
No it isn't. What would you gain by insisting that RSS feeds delivered via Atom have to be called something different than RSS feeds delivered in legacy formats? Did we rename TLS when we updated the cipher suites?
Why, in your opinion, is rssboard.org even bothering to write about Atom?
I wouldn't say that an Atom feed is an implementation of the "RSS concept." That muddies the water too much. Because RSS and Atom are distinct feed formats, calling an Atom feed "RSS" would confuse a lot of people.
Instead I'd say that Atom feeds and RSS feeds are each an implementation of the syndication concept.
> Out there in the world, RSS feeds provide their feeds in Atom format.
I just checked a few of the ones I follow, and ... turns out I don't immediately know how to distinguish when there's not a specific xml namespace reference in the doc or such.
But according to Wikipedia the RFC822 timestamps I'm seeing suggest they're RSS2 instead of Atom?
> turns out I don't immediately know how to distinguish
Atom feeds will almost always have <feed xmlns="http://www.w3.org/2005/Atom"> as the root element. In some cases they may use namespace prefixes but that tends to be rarer and less interoperable.
RSS feeds have an <rss version="2.0"> root element.
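If you'd rather check this programmatically, a rough sketch along these lines should do. The function name `detect_feed_format` and its return strings are my own invention, and the RDF branch is only recognised by its rdf:RDF root element, not validated any further.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text):
    """Guess the feed format from the root element alone (a heuristic)."""
    root = ET.fromstring(xml_text)
    # ElementTree expands qualified names to "{uri}localname", so an Atom
    # root written with a namespace prefix is matched the same way.
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    if root.tag == "rss":
        return "rss " + root.get("version", "unknown")
    if root.tag == "{%s}RDF" % RDF_NS:
        return "rss (rdf branch)"
    return "unknown"
```

This only looks at the root element, so an RSS feed that imports a few Atom elements still reports as RSS, which is the right answer for deciding how to parse it.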
1) Atom has separate <updated> and <published> fields, while RSS just has <pubDate>. Moreover, RSS wants you to add a redundant day of the week in the date, i.e., "Sat, 07 Sep 2002 0:00:01 GMT", which is dumb.
2) Atom allows you to use <content type="html"><![CDATA[]]></content> where you can just stick in HTML, whereas RSS <description> just specifies "entity-encoded HTML is allowed".
3) RSS has redundant <guid isPermaLink="true"> vs. <link>. Which one is a feed reader supposed to use?
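To make point 1 concrete, here is a small sketch of parsing both date styles with only the Python standard library. The helper names and the sample timestamps in the usage note are made up for illustration, and the Atom helper only covers the common RFC 3339 shapes (it rewrites a trailing "Z" because `datetime.fromisoformat` did not accept it before Python 3.11).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_rss_pubdate(value):
    """Parse an RSS <pubDate>, which uses the email (RFC 822 style) date format."""
    return parsedate_to_datetime(value)


def parse_atom_timestamp(value):
    """Parse an Atom <updated>/<published> value (RFC 3339), common shapes only."""
    if value.endswith(("Z", "z")):
        # Older Pythons reject the "Z" suffix, so spell out the UTC offset.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```

Both helpers return timezone-aware datetimes for inputs like "Sat, 07 Sep 2002 00:00:01 GMT" and "2002-09-07T00:00:01Z", so the results can be compared directly regardless of which format the feed used.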
Is it really bifurcated for the end user though? It's the same tooling to view both feeds. The same workflow to add either feed to your feed reader. You don't even realize whether it's an Atom or RSS feed unless you start looking into it.
> Is it really bifurcated for the end user though?
Yes. Suppose you are building a feed search engine. The user enters a query that matches some blog. This blog exposes both RSS and Atom feeds (a very common scenario). Which one should your engine show in the results? If it shows both, which one should the user select for reading? And what if they differ, but only slightly? Why should the user be exposed to that?
Who's relying on one, who's relying on the other? Dunno! I guess we'll support both, because we're good Internet citizens that believe that breaking links is the absolute worst thing you can do.
As a consumer, I don't really have any preference. Both work fine with my software. My personal blog uses Atom only because my static site generator had an Atom plugin.
The RSS pubDate format is the date-time spec from RFC 822 for email, so it is very widely supported. CDATA is supported in RSS too; it’s in the XML spec, so it would be a bit redundant for RSS to explicitly support it. You’ll find it in description elements very, very often. I never understood why a link would not be a guid, but the difference seems clear enough: use guid if you need a guaranteed unique ID and use link if you need a URL to the list.
These specs and the discussion about them at the time really are from a different era of the web. The Syndication Protocol fully embraced REST which was also white hot then. There was a real feeling that with a good format and a standardized way to consume and interact with the resources, it would allow for easier sharing of not just blog posts but other data as well.
As intense as the discussion was around the development of RFC-5023, it was basically ignored from the moment it was released and even the main spec author declared it basically dead not very long afterward:
There was more stuff thought of, as far as I recall, and I bookmarked a lot of it. But on Delicious. Somewhere in some backup there must be my archive.
Whenever I need to provide a feed, I always choose Atom because (a) the spec is better and (b) anything that can handle RSS will generally also handle Atom.
Does ActivityPub these days replace the traditional RSS/Atom feeds? Feels like it would be the natural successor. Is there anything missing besides people publishing and consuming?
ActivityPub, despite using JSON rather than XML, is a much more complex protocol with a lot more moving parts. You can technically write an RSS feed by hand, drop it on a virtual hosting provider via FTP, and you have a feed. On the server side, all you need is static file hosting, which is ubiquitous, available for free and amenable to things like CDNs. On the client side, all you need is a single device, which may be behind NAT, and is capable of occasional network connections to pull the updated feeds.
With ActivityPub, you're supposed to have an account at an instance and receive content through that instance. This means that the server needs to keep track of followers and automatically broadcast all new content to all of them, which requires a database, some kind of content-publishing API, and probably a job queue and Redis to boot. On the client side, you need a box that is online 24/7 and can be connected to, so that you can receive your content. You get much faster delivery, but at a much higher operational cost and with many more scalability issues. Hosting static content at scale is a solved problem, and you can reuse all that existing knowledge for RSS. Sending AP activities at scale is technically solvable, but the infrastructure just isn't there yet.
IMHO no. I thought about this as I run a feed-reader and it would be cool to support ActivityPub as a feed protocol.
The main issue is that subscribing shows up on follower lists. Maybe for individual users this is fine but as I ran a service I didn't want to do this. I ended up with a number of reasons why ActivityPub push wouldn't work well.
1. I didn't want to appear to be advertising the service with a generic account subscribed to many feeds.
2. I didn't want a generic account to provide access to "followers only" toots to unintended users. To properly allow access approval I would need to put some subscriber info into the account. It would also be important to make sure that this can't be used as a form of spam (for example if I allowed them to put whatever name/message they wanted).
3. I didn't want to reveal who was subscribed to what.
4. I didn't want to have dozens of different accounts subscribed to popular feeds.
5. If the user also wants to comment on a post themselves they will need a separate account.
You can still poll a user's outbox like any other feed, but now you are back to an equivalent to Atom/RSS with no WebSub support. (I mean you could use WebSub, it works for any URL but no one does, and why would you when you already have ActivityPub for push?). So it seems that the anonymity of the older feed formats can be useful in some scenarios.
So in the end if I was just going to poll as any other feed format, and most services that support ActivityPub also have other feeds there was really no point to doing this. Feature requests welcome if there is a use case that I missed.
> So it seems that the anonymity of the older feed formats can be useful in some scenarios.
Huge understatement; it's more than just "some". None of the blogs in my feed reader are written by people that I have a public (reified) follower/followee connection with over social media. Nor do I have that kind of relationship with the authors of the books I check out from the library, for that matter.
I'm in the same boat, and came to the same conclusion.
One more point I'd add:
6. Not every ActivityPub enabled service allows for federation. (Some of the more popular block all federation, unless you get allowlisted). So you're stuck polling the outbox regardless.
Simplicity. Atom/RSS can perfectly be a static file (it doesn't even need internet connectivity, an Atom file can be passed around on a usb key with 100% of functionality). ActivityPub requires a live server with computation. Think about the step it requires for every website and every account and every service to have an atom feed vs an activitypub actor.
I'm writing some ActivityPub stuff, and wanted to make sure to get RSS in, done right, etc. However i never use RSS so i have little insight to good/bad/irrelevant RSS impls.
Any wants or must-haves to an RSS implementation over a Fediverse instance?
My rough plan is to just include RSS in any places i expose JSON endpoints for data. Though i'm a bit undecided if there's any value in user activity stuff, like comments on an article, etc.
XMPP does use Atom as its (micro)blogging format (XEP-0277), and making followers/following lists (subscribers/subscribed in XMPP terms) public is opt-in with XEP-0465.
Note: I'm very involved in XMPP, and the author of the latter XEP.
Edit: forgot to mention that it's also available to ActivityPub thanks to the XMPP <=> AP gateway (that I've authored too)
Atom is an order of magnitude more complex and strict, a standard by people who really love XML, in contrast to the really simple and less strict RSS 2.0. For example, almost everything is optional in RSS 2.0, so you can have a reasonable feed for stuff like tweets or linkblogs where there is no obvious title. In contrast, Atom enforces a title for every item, which makes this a messy experience.
I have implemented an RSS 2.0 parser faster than understanding the Atom specification. Atom can do stuff like encode HTML inline in the XML instead of as a CDATA string. In theory this sounds great, but it ends up in a big mess of complexity (e.g. a blog post with handwritten invalid HTML).
These days there is also JSONFeed which is really easy to parse, simple and flexible, but it is not supported everywhere yet.
> An item may also be complete in itself, if so, the description contains the text (entity-encoded HTML is allowed; see examples)
Note "is allowed", not "is required". This caused SO MANY problems back in the day, because the spec didn't clarify if you should or should not include HTML in that element - and there was no way of telling, when parsing a feed, if the author was in the "entity-encoded HTML" or "YOLO and just stick plain text in there" camp.
IIRC, Atom came about precisely because the RSS specifications didn't provide the level of detail needed for a spec to be truly interoperable.
In practice, the description is universally considered to be html encoded. Everything is decoded. If you try to stick unencoded html in there it gets rendered as text. If you really don’t want to encode you can stick it in CDATA and it will just work per the xml spec. I’m trying to remember what the downside of this approach is - I think maybe it kept people from sticking unencoded ampersands in plain text or something.
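A quick way to convince yourself that the two spellings are the same thing to a consumer is to run both through an XML parser. This is just a sketch with made-up sample strings; `description_text` is a hypothetical helper, not anything from a feed library.

```python
import xml.etree.ElementTree as ET

# The same HTML payload, once entity-encoded and once wrapped in CDATA.
ESCAPED = "<description>&lt;p&gt;Fish &amp; chips&lt;/p&gt;</description>"
CDATA = "<description><![CDATA[<p>Fish & chips</p>]]></description>"


def description_text(xml_text):
    """Return the character data of a <description> element after XML parsing."""
    return ET.fromstring(xml_text).text
```

Both `description_text(ESCAPED)` and `description_text(CDATA)` hand back the same string of HTML source, which is why readers can treat the element as HTML without caring how the producer wrote it. What the parser cannot tell you is whether the author meant that string as HTML or as plain text.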
But I think it’s worth noting that a cultural tradition emerged that papered over the flawed spec. I think that is actually pretty common with specs, even if the rss2 one is extra loose.
This is definitely a good example of "RSS culture."
(This particular one isn't papering over flaws in the spec; many of these are advising against doing things that violate either the RSS or XML spec, or are subjective opinions additive to the spec (e.g. always have a datetime). But ya, this is basically what I mean.)
Atom is not a magnitude more complex or strict. It has _two_ places where it requires something even slightly onerous, and that's in the summary and content elements, where, shockingly, it allows you to specify if the content is XHTML, HTML, or text, and for HTML, it's just a matter of escaping the contents or putting it in CDATA. That's it.
I don't know what you're doing that RSS 2.0 is somehow faster to parse than Atom. I've written parsers for both over the past twenty years with a negligible difference between the two besides the fact that the RSS feeds often need hacks. I've also written a whole bunch of blog and linkblog backends that produce Atom feeds, and have never had an issue with any. Let's look at the required elements of an entry: updated, title, id. Nothing remotely onerous there. In fact, it's purposely minimal, more minimal than RSS. And in RSS 2.0, title is a required element (because if something isn't explicitly noted as optional in the RSS 2.0 spec, it's assumed to be required).
In my personal linklog, I use the title of the target page of the link as the title, because it's the sensible option. With tweets, you have half a point. Only half a point, because title is required, but Twitter also post-dates the early 2000s considerably. But here's the thing: 'title' is required in RSS and Atom, but there's nothing saying it can't be empty. I know, I've blown your mind!
And then there's JSONFeed, which, of course, can somehow gracefully cope with people dropping '"' in random parts of the file because people generate JSON files like that by hand, right?
> I have implemented an RSS 2.0 parser faster than understanding the Atom specification. Atom can do stuff like encode HTML inline in the XML instead of as a CDATA string. In theory this sounds great, but it ends up in a big mess of complexity (e.g. a blog post with handwritten invalid HTML).
The same thing can also happen in RSS feeds (and JSON Feeds): Entity-encoded HTML strings or CDATA HTML strings do not have any guarantee of well-formed-ness. The direct embedding of XHTML into Atom as namespaced elements just surfaces potential invalid markup higher up.
I was talking about the (X)HTML in that RSS feed and its well-formed-ness.
In a perfect world people would construct their XML documents with an API which guarantees that the generated serialisation is a well-formed XML document. E.g. the API guarantees that the element tree is nested, that namespaces are declared and that the serialiser escapes any text nodes. Then people could add their well-formed XHTML fragments as a child to <atom:content type="xhtml"> and then serialise the whole document, guaranteeing well-formed-ness across namespaces.
In practice people have a tagsoup string from their data store which they concatenate inside their RSS template in <description>. If you’re lucky, they replace "<" and "&" beforehand or do the CDATA thing. But in XML terms that is just a string, not well-formed markup.
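Here is a rough sketch of the difference between the two approaches, using Python's standard ElementTree as the "API that guarantees well-formedness". Both function names are hypothetical, and the item layout is just the minimal title/link/description shape discussed in this thread.

```python
import xml.etree.ElementTree as ET


def build_item(title, link, html_body):
    """Build an RSS <item> through the tree API; the serialiser escapes text nodes."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # The HTML is stored as a plain string; "<" and "&" are escaped on output.
    ET.SubElement(item, "description").text = html_body
    return ET.tostring(item, encoding="unicode")


def build_item_by_concatenation(title, link, html_body):
    """The template approach: whatever is in the strings ends up in the XML as-is."""
    return (
        "<item><title>" + title + "</title><link>" + link + "</link>"
        "<description>" + html_body + "</description></item>"
    )
```

Feed both functions a tagsoup body such as `<p>Tom & Jerry <b>unclosed` and the first one still yields a well-formed item whose description round-trips to the original string, while the second produces something an XML parser rejects. Note that neither makes the HTML itself valid; the tree API only guarantees the XML wrapper.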
Interesting, thank you. Every podcast RSS feed (a tiny subset of RSS feeds) I've seen in the wild is well-formed in the strict XML sense, so the tagsoup problem must be more endemic on the text syndication side.
I can imagine that that is potentially a result of Apple’s dominant podcast directory. Podcasters submit their feeds to Apple’s Podcast Connect, which I think flags warnings and errors. Other forms of feed don’t have that big motivation to validate.
The TLDR is RSS is messy shit while Atom is well specified and works for anything you need outside of podcasts.
As long as you don’t need to consume feeds, just use atom.
The less TL is that RSS1 and RSS2 are basically two different branches of the original:
- Netscape released RSS 0.90 as an RDF application (RSS literally stood for RDF Site Summary)
- RSS 1.0 was an update / direct evolution of RSS 0.90 by a dedicated working group using final RDF 1.0 semantics (as RSS 0.90 had been based on an earlier working draft)
- RSS 1.1 an evolution of RSS 1.0 by unrelated people
This is called the RDF branch, for obvious reasons.
However a few months after RSS 0.90 Netscape also released RSS 0.91, which dropped RDF entirely, rebranded to “Really Simple Syndication”, and added some elements from Userland’s own syndication format.
This is the start of the “Harvard” (formerly “Userland”) branch, Userland / Dave Winer released his own variant of 0.91 (timeline with netscape has never been super clear to me), then went on to release 0.92 with an <enclosure> element, followed by 0.93 and 0.94. He then released RSS 2.0 to mark a bit of a compatibility break, as RSS 2.0 adds namespace support and removes some elements from his 0.9, and also to fuck with the RSS WG’s 1.0 release.
Because the Harvard branch was the first to support enclosures (embedding audio) and Userland had built support for that, it became the de facto format for podcast feeds. Atom also supports enclosures, but I’m not sure any podcast client (or podcasting source) supports them.
While RSS is, as you say, quite messy, Atom has also brought lots of headaches to this poor feed aggregator developer over the years. Its specification is a lot tighter than RSS', but there is still enough wiggle room for feed generators to get creative and make you want to strangle somebody.
It's been a long time since I spent my time on that, so I do not remember anything specific. But it was difficult figuring out reliably if a post in a given feed matched an already existing (locally cached) post, whether it was an updated version of that post, or whether it was a completely new, hitherto unseen post, so it had something to do with the post IDs, timestamps and URLs.
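For what it's worth, the naive version of that matching logic looks something like the sketch below. Everything here is an assumption for illustration (the `Entry` shape, the fallback from ID to link, comparing the raw updated string and title), and it is not how any particular aggregator does it; real feeds break each of these assumptions in their own way.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical normalised entry: Atom <id> or RSS <guid>, plus link and date."""
    entry_id: Optional[str]
    link: Optional[str]
    updated: Optional[str]
    title: str = ""


def entry_key(entry):
    """Prefer the declared ID and fall back to the link when there is none."""
    return entry.entry_id or entry.link


def classify(entry, cache: Dict[str, Entry]):
    """Return "new", "updated" or "seen" for an entry against a local cache."""
    key = entry_key(entry)
    if key is None:
        return "new"  # nothing stable to match on, so treat it as unseen
    cached = cache.get(key)
    if cached is None:
        return "new"
    if entry.updated != cached.updated or entry.title != cached.title:
        return "updated"
    return "seen"
```

The trouble described above starts exactly where this sketch stops: feeds that regenerate IDs, change link URLs, or bump timestamps on every fetch will make every entry look new or updated.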
I believe iTunes/Podcasts since its beginning also supported Atom podcasts until they deprecated it some years ago. But I believe they were some of the very few.
If you want to expand even more, there are even more options:
• RSS 1.0 – the RDF/XML serialisation.
• JSON Feed
• h-feed/hAtom – embedding the “feed” as Microformats markup inside the HTML.
• schema.org/Blog – embedding the “feed” inside the HTML, either with RDFa or with JSON-LD or with Microdata.
• If you want to annoy the most people at once, there is a great solution: The data model from RSS 1.0 is of course RDF-based. The modern serialisation of RDF is JSON-LD – simply use the RSS 1.0 vocabulary in a “JSON-LD-Feed”.
My strong impression at the time was that the primary impetus for establishing Atom was that Dave Winer was abrasive and opinionated, and some people just didn't like him very much.
"If and when it reaches closure, I will recommend to UserLand that they support the format, both for input and output, and I will help to the extent I can to write drivers for the software. I will continue to answer questions for the people working on Echo, and offer my opinion when it appears to be welcome, but will step back when it is not." — http://backend.userland.com/2003/06/26
And, true to his word, Dave added support for Atom in UserLand products in 2007.
It did seem extremely petty, yeah -- on both sides tbh. As an outside observer who just wanted to be able to build a feed and have it work, it was really disheartening to watch.
I and another old mutual friend of Dave's were both on his pre-RSS DaveNet mailing list for a long time past when it stopped being interesting, and we would always joke about how we were both afraid to ask him to remove us from DaveNet, because whenever somebody else would ask to be removed, it would hurt Dave's feelings, and he'd talk smack about them behind their back after begrudgingly removing them, so we suspected we weren't the only ones afraid to unsubscribe. At least RSS made it possible to unsubscribe from his feed without hurting his feelings. (My friend's mom was his high school teacher in New York, and his wife was Dave's first kiss in high school, so they had some hilarious stories!)
In case anyone is looking for a free rss reader for mobile here's the one I built for myself after Feedly became unbearably slow https://www.justfeed.io/
I think the inspiration was that an atom is a foundational building block of matter and a lot of things could be built with a well-specified universal feed format and publishing format.
The time spent debating names, disqualifying names, requalifying names and voting on names was like the battle of the newscasters on Anchorman. Pure carnage.
What format do you think the website you're reading is written in? Sure, HTML is not 100% super duper pure XML, but for all intents and purposes, it's XML.