RSS Feed Best Practices (kevincox.ca)
220 points by nalgeon on May 7, 2022 | 67 comments



Re: Discovery

A pet peeve of mine is when some enterprising web developer has either built a theme or hacked an existing one and removed (or never added) the <link> tag for the RSS feed, even though the site engine in question DOES have a working RSS feed.

You see this a lot on customised WordPress sites. I end up having to try all variants of feed URLs I can think of until I find the feed. Surprisingly, most non-bespoke site engines still have working feeds, so the hit rate is quite high.

I also miss the days of browsers auto-extracting the <link> tag and showing an RSS icon when a feed is found.
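For reference, autodiscovery boils down to scanning the page for `<link rel="alternate">` tags whose `type` is a feed MIME type. Below is a minimal Python sketch using only the standard library's `html.parser`. `discover_feeds` is a hypothetical helper, and real readers handle more cases (other feed types, `<base>` tags, broken markup):

```python
from html.parser import HTMLParser

# MIME types commonly used in <link rel="alternate"> feed autodiscovery tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects (title, href, type) for every feed <link> in a page, in page order."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "alternate home".
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and a.get("href"):
            self.feeds.append((a.get("title"), a["href"], mime))


def discover_feeds(html):
    parser = FeedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.feeds
```

The returned `href` is often relative (like the kevincox.ca tag quoted further down the thread). A reader would resolve it against the page URL with `urllib.parse.urljoin` before fetching.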


I'm disappointed browsers removed that functionality, but extensions solve the problem. I've been using Feedbro, which is also a reader, but there are multiple options.


Vivaldi does the auto-extracting, and displays the RSS icon. You can click on it to subscribe. Vivaldi’s built-in feed reader is a bit clunky, but it works.


Yay for another vivaldi user!


It’s full of weird bugs, and not too snappy, but it’s the only highly configurable Chrome-based browser.


If it’s WordPress-based, the feed is usually found by adding /feed, and if it is a Tumblr website, by appending /rss.


Possibilities include:

  .../rss , .../rss.xml , .../.rss , .../rss_full.xml , .../feed , .../rss-feed , .../feed/all/ , .../MySection.xml , .../MySection.atom , feedserver.example.com/section/index ...
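When there is no `<link>` tag, you end up guessing. The Python sketch below turns a site URL into candidates from the generic paths listed above. The helper name and the list order are assumptions (WordPress-style /feed first, then Tumblr-style /rss). Actually fetching each URL and checking that it parses as a feed is left out:

```python
from urllib.parse import urljoin

# Generic fallback paths from the list above, most common first (assumed ordering).
# Section-specific names like MySection.xml can't be guessed generically, so they're omitted.
COMMON_FEED_PATHS = ["feed", "rss", "rss.xml", ".rss", "rss_full.xml", "rss-feed", "feed/all/"]


def candidate_feed_urls(site_url):
    """Return candidate feed URLs at the root of the site that `site_url` belongs to."""
    root = urljoin(site_url, "/")  # e.g. https://example.com/blog/post -> https://example.com/
    return [urljoin(root, path) for path in COMMON_FEED_PATHS]
```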


I've yet to see a Tumblr site that didn't have working autodiscovery. But maybe I just haven't seen a custom-enough theme yet.


I work more with podcast feeds than article feeds, but the most important lesson I've learned about RSS is that what readers actually do is not necessarily what you would think, and sometimes not even what the spec says. It's the wild west, even with major readers, and if you're producing your own RSS you have to do the work and test in different players.

We have a huge internal document of how RSS elements map to UI in every player and all the gotchas, and we still discover new things years after initial development.


The problem is that there are huge swaths of use cases that are not covered by the current RSS standard. For example, want to use standard RSS for your video feed? Ooops! RSS does not support thumbnails, duration and a bunch of other features.

So everyone goes ahead and makes their dialect of RSS and pushes that. Apple did this with iTunes and so did YouTube.

What clearly needs to happen is that RSS gets a refresh. The common use cases need to be integrated into the standard.


“Dialect” implies that this is ad hoc, but the RSS 2.0 standard specifies a way to extend the format and that is through namespaces, which is exactly what Apple does with the iTunes namespace. This is a good way to extend the format IMO: no breaking changes and the opportunity to grow.
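As an illustration of namespace-based extension, here is a Python sketch that emits a podcast `<item>`. It carries the core RSS `<enclosure>` plus `itunes:duration` from Apple's namespace. The helper name and sample values are placeholders, and a real feed would declare the namespace once on the `<rss>` root:

```python
import xml.etree.ElementTree as ET

# Namespace URI Apple uses for its podcast extensions.
ITUNES = "http://www.itunes.com/dtds/podcast-1.0.dtd"
ET.register_namespace("itunes", ITUNES)


def podcast_item(title, audio_url, length_bytes, duration):
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Core RSS 2.0 element: readers that ignore extensions still see the audio.
    ET.SubElement(item, "enclosure", url=audio_url,
                  length=str(length_bytes), type="audio/mpeg")
    # Extension element in its own namespace; readers that don't know it can skip it.
    ET.SubElement(item, f"{{{ITUNES}}}duration").text = duration
    return item


print(ET.tostring(podcast_item("Episode 1", "https://example.com/ep1.mp3",
                               1234, "00:42:10"), encoding="unicode"))
```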


In practice the incompatibility comes less from newly invented fields (that happens rarely, and when it does, a namespace is attached most of the time) and more from how certain fields are interpreted, which fields are used, etc. Something as simple as showing a short summary of an item varies significantly by player.


With podcasts at least there's a solid effort behind this with the podcast namespace (https://github.com/Podcastindex-org/podcast-namespace) - unfortunately there's very little uptake on the player side.


>We have a huge internal document of how RSS elements map to UI in every player

From another comment:

>The problem is that there are huge swaths of use cases that are not covered by the current rss standard.

Have you considered turning that document into an RFC to establish a standard?


The problem is there's not really even a de facto standard. At least as far as podcasts are concerned, some players will use certain fields in completely different ways, ignore other fields, or include their own vendor-specific fields (with some other players adopting those vendor-specific fields).


It would be nice if you’d made those findings publicly available.


Yeah, that's a really good idea, and it made me put publishing it on my list (there are a couple of issues with making it public in its current state).


This sounds like a lot of fun (to me!), where do you work, if you don't mind me asking, and are you hiring by any chance?


Always! Reach out at ryan at supercast dot com



This list should include an exhortation to add an xml-stylesheet processing instruction to your feed, so people who have no idea what RSS is aren't dumped into a blob of raw XML before backing out and concluding that they're not supposed to be there. There's an already-baked XSL template available on aboutfeeds.com that you can pretty trivially copy onto your Web server and link to from your feed. We're better off if more people are aware of RSS, and having a simple way to teach them is useful.
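The instruction itself is one line that has to come after the XML declaration and before the root element. Below is a small Python sketch that adds it to an already-rendered feed. The `/pretty-feed.xsl` path is an assumption standing in for wherever you copy the aboutfeeds.com template. `xsl_href` is inserted unescaped, so it should be a fixed, trusted path:

```python
def add_stylesheet_pi(feed_xml, xsl_href):
    """Insert an xml-stylesheet processing instruction into a serialized feed."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>'
    if feed_xml.startswith("<?xml"):
        # The XML declaration must stay first, so the PI goes right after it.
        end = feed_xml.index("?>") + 2
        return feed_xml[:end] + "\n" + pi + feed_xml[end:]
    # No declaration: the PI can simply precede the root element.
    return pi + "\n" + feed_xml


print(add_stylesheet_pi('<?xml version="1.0" encoding="utf-8"?>\n<rss version="2.0"></rss>',
                        "/pretty-feed.xsl"))
```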


That sounds interesting. Do you have an example of this in the wild?


I offer my site as another example: https://chrismorgan.info/blog/tags/make/feed.xml (a subset chosen, just because the feed with everything is fairly large and loads stuff from a few origins). Stylesheet at https://chrismorgan.info/atom.xsl.

This is considerably more correct, robust and featureful than any others I’ve seen, just because I could (I don’t even use most of the stuff I’ve implemented for it, though some like enclosures I have draft content using). It handles Atom’s text constructs (where content can be provided in XHTML, HTML or text format), and the common attributes xml:lang and xml:base, none of which I’ve seen any other stylesheet handle correctly.

I also produced an RSS variant, since you basically have to use the inferior RSS for podcasts. That’s not in use on my site at this time, so I’ve dropped a copy at https://temp.chrismorgan.info/2022-05-10-rss.xsl for now. But please, use Atom. RSS is a lousy format that causes some genuine problems and should have been completely retired fifteen years ago, and the only thing that actually needs it still is podcasts.


Here's the RSS for the blog of the aboutfeeds.com maintainer (genmon here on HN—including here in the comments to this submission): <https://interconnected.org/home/feed>

If you wanted an informal list of people making use of this pretty-feed.xsl specifically, it would be possible to solicit that sort of thing through GitHub and see if others come out of the woodwork to identify themselves.


Thank you, this is helpful. :)


FeedBurner is well known for doing this. For example https://feeds.feedburner.com/GoogleAppsUpdates


I know it used to be that, but it looks like an ordinary HTML web page now.


Hmm, I swear that I viewed source and saw XML but you are right. It is definitely just HTML now.


Another best practice would be for a page to prioritize the order of multiple feeds: e.g. many WordPress sites list the comments feed first, and feed readers assume that it is the page's feed.
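Until sites do that consistently, a reader can hedge against the WordPress ordering. Here is a hypothetical heuristic in Python, not taken from any particular reader. It takes discovered feeds in page order and prefers the first one that doesn't look like a comments feed:

```python
def pick_main_feed(feeds):
    """Pick the feed to subscribe to by default.

    `feeds` is a list of (title, href) pairs in page order. Heuristic (an assumption,
    not a standard): skip anything that looks like a comments feed, otherwise fall
    back to the first link.
    """
    for title, href in feeds:
        text = f"{title or ''} {href}".lower()
        if "comment" not in text:
            return title, href
    return feeds[0] if feeds else None
```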


I’ve fallen for this so many times. I like an article and decide to subscribe to the rss feed only to have a bunch of comments show up in my reader. If I have to view source to find the right url, I’m probably not going to subscribe.


Thanks, I'll add a note on the page. I don't know how consistent ordering is but it seems like a good idea anyways.


That blog post is packed with useful info.

See also https://discovery.thirdplace.no/?q=kevincox.ca for feed discovery. Disclaimer: I built it.


RSS Feed Best Practice #1: make sure your service/site/resource has a feed.

Am I missing something or is this site missing one?


The site definitely has a feed, although I don't have a visible link to it. I am intentionally ignoring my own recommendation to provide a link and RSS icon, as an experiment to see how well feed discovery is supported and used. I wouldn't recommend this, but I've been going back and forth on it for my personal blog, as it isn't very important that the maximum number of people can find the feed.

If you just paste the page into your feed reader it should find the feed for you.


By the way, Kevin: it may seem overly picky, but would it not be cleaner to have

   <link href=/feed.atom rel=alternate title="Blog Posts" type=application/atom+xml>
instead of

  <link href=../../../../feed.atom rel=alternate title="Blog Posts" type=application/atom+xml>

?


It could, but my site is available via IPFS so I use relative links everywhere in case a gateway is used. For example https://ipfs.io/ipns/kevincox.ca/2022/05/06/rss-feed-best-pr.... That being said with subdomain gateways being widely available now it is pretty safe to go back to absolute links as https://kevincox-ca.ipns.dweb.link/2022/05/06/rss-feed-best-... handles absolute references just fine.

And of course none of this matters much if you have a non-IPNS reference because the feed will never change making it mostly useless.
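The trade-off is easy to see with `urllib.parse.urljoin`. The post path below is hypothetical. It is four directories deep, to match the `../../../../` in the tag. It is resolved once for the site itself and once through a path-based IPFS gateway:

```python
from urllib.parse import urljoin

# Hypothetical post URL, served directly and via a path-based IPFS gateway.
direct = "https://kevincox.ca/2022/05/06/some-post/"
gateway = "https://ipfs.io/ipns/kevincox.ca/2022/05/06/some-post/"

relative = "../../../../feed.atom"
absolute = "/feed.atom"

print(urljoin(direct, relative))   # https://kevincox.ca/feed.atom
print(urljoin(direct, absolute))   # https://kevincox.ca/feed.atom
print(urljoin(gateway, relative))  # https://ipfs.io/ipns/kevincox.ca/feed.atom
print(urljoin(gateway, absolute))  # https://ipfs.io/feed.atom (outside the site)
```

On a subdomain gateway the site sits at the root again, so both forms resolve to the same feed. That is why absolute links become safe once those gateways are the norm.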


NetNewsWire seems to have a problem discovering the feed from index pages (e.g. https://kevincox.ca/2022/05/).



Vivaldi discovers the feed.


> is this site missing one

Which site? For the submitted one, you have to search for 'atom' in the source markup; for HN, for 'rss'.

--

kevincox.ca

  <link href=../../../../feed.atom rel=alternate title="Blog Posts" type=application/atom+xml>
HN

  <link rel="alternate" type="application/rss+xml" title="RSS" href="rss">


This is indeed the first rule. And offer useful feeds too. If you’re a newspaper, hardly anyone will be interested if you offer just one “firehose” feed of everything you publish.

A nice example on how to do it is The Guardian. Basically any category page you can visit is also a feed.


> Other common formats are earlier RSS standards and JSON Feed or Microformats h-feed. I would avoid using these—or even less common formats—as they are less widely supported.

Agree that RSS is imperative; don’t agree about excluding JSON feeds. One can do both.


One can, but I've found little value to this. What are you giving your readers with JSON feeds that the RSS or Atom feed isn't giving them?


Some prefer it. It offers another two or three fields that RSS doesn't.

In any event, it requires virtually zero extra effort after initial setup, so I see no reason not to offer both. Different strokes...
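Since the JSON Feed can be rendered from the same entry data as the Atom/RSS feed, the extra effort really is small. Below is a Python sketch emitting JSON Feed 1.1. The input dict keys (`id`, `url`, `title`, `html`, `published`) are assumptions about how your generator stores entries:

```python
import json


def to_json_feed(title, home_url, feed_url, items):
    """Render the same entries used for the Atom/RSS feed as JSON Feed 1.1."""
    return json.dumps({
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_url,
        "feed_url": feed_url,
        "items": [
            {
                "id": it["id"],  # reuse the Atom <id> / RSS <guid> so IDs stay stable
                "url": it["url"],
                "title": it["title"],
                "content_html": it["html"],
                "date_published": it["published"],  # RFC 3339 string
            }
            for it in items
        ],
    }, indent=2)
```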


I wish creators offered trending RSS feeds with a new title & guid every time the views or comments/interactions double above the norm.

    RSS Feed Best Practices
    10 | RSS Feed Best Practices ~~~
    37 | RSS Feed Best Practices ~~~~
Many already have the comment count on hand and it's not really respam in your combined feed as you get fresh data at the old position. The ~ just lets you search for hottest things when you are pressed for time as most search boxes require at least 3 chars.
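Here is a sketch of the feed-side logic in Python, under clearly stated assumptions. The "norm" is a per-site baseline count you pick, and a new entry is minted each time the count doubles past it. The tilde run is two plus the doubling level, so it always meets the three-character minimum. With a norm of 5, this reproduces the two example titles above:

```python
def trending_entry(guid, title, count, norm):
    """Return a (guid, title) pair for a 'trending' re-post, or None if not trending.

    Assumptions (not any standard): a new entry is minted each time `count` doubles
    past `norm`, and the tilde run is 2 + the doubling level (always >= 3 chars).
    """
    if norm <= 0 or count < 2 * norm:
        return None
    level, threshold = 0, 2 * norm
    while count >= threshold:
        level += 1
        threshold *= 2
    # A fresh guid per level makes readers treat each jump as a new item.
    return f"{guid}#trending-{level}", f"{count} | {title} {'~' * (level + 2)}"
```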


This sounds like it would be better as a reader feature than a feed feature. However I am not aware of any extensions for providing this information to readers.

A simple proposal would be something like <comments count="23" last="TIMESTAMP"> and <score>72</score>. Then the reader could re-surface these items to you if they pass particular thresholds.

Of course there are some details such as ensuring that entries with recently updated comment counts appear back on the first page (or that old pages are scanned for updated entries).

Also I think WebSub doesn't work here because it excludes entries that have been seen before (which won't be an issue for your original proposal).
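The reader side of that proposal could look like this in Python. `<comments count=...>` and `<score>` are the hypothetical elements from the proposal. They are not part of RSS 2.0, whose own `<comments>` element holds a URL. The thresholds are arbitrary:

```python
import xml.etree.ElementTree as ET


def items_to_resurface(feed_xml, min_comments=50, min_score=100):
    """Return titles of items whose (proposed, non-standard) counters pass a threshold."""
    hits = []
    for item in ET.fromstring(feed_xml).iter("item"):
        comments = item.find("comments")
        score = item.find("score")
        n_comments = int(comments.get("count", 0)) if comments is not None else 0
        n_score = int(score.text) if score is not None and score.text else 0
        if n_comments >= min_comments or n_score >= min_score:
            hits.append(item.findtext("title"))
    return hits
```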


I was hoping for an article with best practices for readers of RSS feeds. Every past attempt of mine to use RSS has resulted in a useless feed of way too many unread, stale items.

For me, the goal of RSS is that everything I am interested in comes to me. This breaks down the moment a high-frequency feed (such as news) is in the mix, which is why I don't use it (outside podcasts).


I either don't sub to high-frequency feeds, or have them not appear in the general unread feed, so I can go look at them when I want, but they don't swamp everything else.


Both RSS 2.0 and Atom have official validators. I also have a linter here: https://roastidio.us/lint that checks the metadata consistency between the blog and the feed.


RSS feed best practices? While it's nice that some may be giving more serious thought to blogging/revisiting blogging... what year is it? Aren't there 500 posts like this from 2008?


Is it typical to have non-blog site updates distributed as a feed? I have a blog page and a link aggregator page and I provide a feed for when the link aggregation page has updated. I just can’t tell if people ever use it like that?


Definitely! It doesn't hurt much to have more feeds. Sure, you don't want to advertise 100 feeds on each page but using it smartly can be very helpful for users. Some of the non-blog feeds I am subscribed to:

- Videos. (Basically the same use case as blogs though)

- Releases of various projects. (GitHub for example supports this)

- FeedBurner has a feed for problems detected in your feeds. (Although I don't use FeedBurner anymore)

- Hacker News posts on the front page with >400 points.

- A handful of Reddit searches.

- My Reddit inbox.

- A feed of WebMentions to my blog.

- A feed of packages that I maintain for nixpkgs that are out-of-date.

Really anything that someone may want to be notified about can be a useful feed. For example I can imagine a feed of price changes for a product so people can wait for it to go on sale.

The search feeds can be expensive for the site if they aren't careful. But between "materialized" and caching it can be made pretty cheap. If you are careful you can even make these work with WebSub. (If you run your own hub you can know what the subscriptions are and either run them periodically or actually check the queries against new items in real-time)
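One reading of that last parenthetical, sketched in Python: when you run your own hub, you know which search feeds have subscribers. Each newly published item can then be checked against those saved queries, and only the matching topics need a WebSub notification. The naive all-terms matching and the example URLs are assumptions, and the actual publish-to-hub step is omitted:

```python
def matching_topics(item_text, subscriptions):
    """Return the search-feed topics a newly published item belongs to.

    `subscriptions` maps a feed URL (the WebSub topic) to its saved query. Matching is
    a naive all-terms-present check; a real site would reuse its search engine's logic.
    """
    words = set(item_text.lower().split())
    return sorted(
        topic for topic, query in subscriptions.items()
        if all(term in words for term in query.lower().split())
    )
```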


I use RSS to follow software releases. This is handy for example for packages hosted on github.


On a similar note Repology is very useful for package maintainers. It has feeds for out-of-date packages that you maintain: https://repology.org/maintainer/kevincox%40kevincox.ca


put your feed at /feed/

As every WP site (that didn't switch it off) has a feed there, it is more reliable than favicon.ico. You will also find lots of feeds there that aren't linked anywhere; I think many authors aren't even aware they have one.

I like it as you don't have to visit the domain, you can simply point a feed reader at a list of (slightly modified) domain names and discover many precious shiny things.


I wish it covered how to properly do images. All the SO posts on the topic suggest embedded HTML.


I develop Unread (https://www.goldenhillsoftware.com/unread/).

I have some complex logic to find a hero image that is sometimes in a feed entry outside the HTML, and prepend it to the top of the article -- but only if there is not another reference to that image in the article. It is complicated because sometimes there will be hero-image.jpeg in the article and hero-image-1200x500.jpeg outside the article, and I have to judge whether the images are the same.

I am in favor of embedding the image into the HTML, and not adding it elsewhere.
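Below is a rough Python sketch of the kind of image-matching check described above. This is not Unread's code. It assumes WordPress-style `-<width>x<height>` size suffixes and compares file names only, which is just one possible way to judge that two URLs show the same image:

```python
import re
from posixpath import basename
from urllib.parse import urlsplit

# WordPress-style resized copies append "-<width>x<height>" before the extension.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def image_key(url):
    """Normalize an image URL to a comparable file name."""
    name = basename(urlsplit(url).path).lower()
    name = SIZE_SUFFIX.sub("", name)
    # Treat .jpg and .jpeg as the same file.
    return re.sub(r"\.jpe?g$", ".jpg", name)


def should_prepend_hero(hero_url, article_image_urls):
    """Prepend the out-of-HTML hero image only if the article doesn't already show it."""
    key = image_key(hero_url)
    return all(image_key(u) != key for u in article_image_urls)
```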


I'm curious to see the point of view of another feed reader developer. Does all of this advice make sense to you and work well in your app? Is there anything that seems backwards or anything that I missed?


I mostly nodded as I was reading your advice. I particularly appreciate that you recommend including full article content in feeds, not changing entry IDs, and adding discovery `link` elements to HTML pages.


I don't think I have enough experience to write that advice. But from what I have seen embedded HTML is the most reliable.

You can also use enclosures but not all readers will display these. Notably WordPress puts every embedded image into an enclosure so a lot of places just ignore them to avoid duplicates.


It’s just crazy to me that RSS doesn’t have a property specifically for an image URL. Like, what’s the problem with that? Does Atom support this?


The question is what do you mean by "image URL". Is this a poster, hero image, site logo, the content of the post itself? This is what is lacking.

In RSS the main media attachment is the <enclosure> element. However like a lot of things in RSS it is poorly specified. For example it isn't even clear if you can have multiple. If you have multiple are they different representations of the same resource or different resources?

https://validator.w3.org/feed/docs/rss2.html#ltenclosuregtSu...

The solution to this is the Media RSS spec. (Which is also commonly used in Atom. XML namespaces are actually kinda nice.) This is a lot better if only because they contain the <group> element which makes it clear what different things mean. (Different representations or different resources.) Unfortunately support is still varied and it isn't always clear if this is just a copy of every image in the post (WordPress...), only semantically relevant images, or images that are the point of the entry but don't appear in the post.

https://www.rssboard.org/media-rss

So unfortunately the best path forward is probably yet another standard that makes these clear from the outset, and hoping that people actually follow the rules. Not an easy path forward.

Again, I'm not an expert on media in feeds, but based on the feeds I have seen trying to write a feed reader I would do the following if I was creating a feed with media in it.

1. Use Media RSS. Don't use <enclosure> at all unless you are a podcast.

2. Only reference interesting images directly. You are welcome to include whatever you want in the HTML for stylistic reasons, but don't reference it outside of the HTML unless it is actually an interesting image.

3. Put every media element in a group. Basically because WordPress puts everything into a media element and it is unclear if these are the same images, duplicates or even interesting at all (I don't need the author's avatar listed in the media elements WordPress...). Even if you only have one representation put it in a group just because it is unambiguous what you mean.
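Following those recommendations, here is a Python sketch that attaches a `media:group` to an item, with every rendition of one image as a `media:content` child. The namespace URI is the Media RSS one; the helper name and sample URLs are hypothetical:

```python
import xml.etree.ElementTree as ET

MEDIA = "http://search.yahoo.com/mrss/"  # Media RSS namespace
ET.register_namespace("media", MEDIA)


def add_media_group(item, renditions):
    """Attach one <media:group> holding alternate renditions of a single image.

    `renditions` is a list of (url, width, height). Even a single rendition is
    wrapped in a group so its meaning is unambiguous.
    """
    group = ET.SubElement(item, f"{{{MEDIA}}}group")
    for url, width, height in renditions:
        ET.SubElement(group, f"{{{MEDIA}}}content", url=url, medium="image",
                      width=str(width), height=str(height))
    return group
```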

If anyone has more experience here I would love to hear what you think.


Presumably the parent commenter means some way to specify some additional per-item pizazz in the vein of Twitter cards.

Nothing stopping a reader from fetching the linked resource and pulling useful stuff out of the existing Open Graph annotations, of course...
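That fallback is cheap to sketch with Python's standard `html.parser`. It pulls `og:*` properties from the entry's linked page and uses `og:image` as the card image. Fetching the page is omitted, and the helper names are hypothetical:

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:*" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property") or ""
        if prop.startswith("og:") and a.get("content") is not None:
            # Keep the first value: pages may repeat og:image for extra images.
            self.og.setdefault(prop, a["content"])


def og_image(html):
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.og.get("og:image")
```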


I love RSS. I've built news aggregators that consumed RSS feeds in the past, i.e. before social media took over as news aggregators and news outlets stopped caring about their RSS feeds (not sticking to standards, garbage feeds, etc.).

I'd like to get into news aggregation with RSS again, now that I think there is a significant population who think social media may not be the best way to consume news. So I made a cursory exploration of RSS feed support among major news outlets.

To my surprise, many still support RSS, including those with paywalls. It's surprising because AFAIK Google doesn't require RSS to index their news, and apps like Feedly, which started out as pure RSS aggregators, had moved on to HTML parsing of major news websites by the time I abandoned my RSS feed aggregation apps.

Why are the news outlets still supporting RSS?

1. Legacy systems which depend upon RSS.

2. There has been a rise in demand for RSS feeds.

3. It's the right thing to do, for journalism.

4. None of the above.


Where's your feed icon then? ;)


Oh, feed icons. That is a good one to add!


I want to follow your website's feed (at https://kevincox.ca/feed.atom). Good article!



