
RSS Box – RSS for websites that do not support RSS - mrzool
https://rssbox.herokuapp.com/
======
latexr
> FYI: If you enable JavaScript then you be able to access additional options
> in the dropdown menus. The website should still be somewhat usable, but
> recent versions of Firefox will try to download the RSS feeds.

This is how you communicate with people with JavaScript disabled! Kudos. Most
sites either present you with a blank page with no information, a blank page
asking to enable JavaScript (even when the content is just text), or silently
break some features.

That kind of respectful messaging alone is making me want to take a closer
look. Though it should be “then you will be able”.

> Important: Please do not overload this service. Do not make more requests
> than you need.

What if we don’t have control over the frequency of requests (e.g. using a
service like Feedly)? Do those happen often enough that we’d need to host the
app ourselves?

~~~
oefrha
Feed aggregation services tend to minimize the frequency of requests
especially for unpopular feeds since more requests = higher load for them as
well. The incentives on the publishing and consuming sides align. Many
services only offer to increase crawling frequencies for premium users, and
even then only for a limited number of feeds. Not to mention they only need to
crawl once for however many subscribers.

It’s really people who don’t use aggregation services and set their clients to
update very frequently (say every minute) that pose a problem.
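Well-behaved pollers can also lean on HTTP conditional requests, so even frequent polls stay cheap for the publisher. A minimal sketch of the client side, assuming the feed server returns `ETag`/`Last-Modified` validators (the cache-entry shape here is invented):

```python
# Sketch of polite feed polling with HTTP conditional requests. The
# helper only builds request headers from validators saved after the
# previous fetch; the server can then answer 304 with an empty body.
def conditional_headers(cache_entry):
    headers = {}
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers

cache = {"etag": '"abc123"',
         "last_modified": "Tue, 26 May 2020 06:00:00 GMT"}
print(conditional_headers(cache))
```

A server that honors these validators replies `304 Not Modified` with no body when nothing changed, so even a minute-by-minute poller costs it almost nothing.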

~~~
hinkley
An aggregator most likely would prioritize feeds by popularity and then run
most of their crawlers 24/7. Telling some users that a popular feed has new
entries and not telling the free tier sounds complicated. Unless the free tier
is getting a read replica of the database, in which case it’s throttled only
by how far behind the read replicas get.

Bumping the priority of low-subscribership feeds when a few premium customers
add them makes some sense, though.

~~~
oefrha
Of course popular feeds are crawled constantly, and no, there’s no “telling
some users that a popular feed has new entries and not telling the free tier”.
But you bet services don't want to crawl those one-subscriber feeds (e.g. some
websites have personal feeds for paid subscribers) constantly for free. And
RSS Box feeds apparently tend to fall into that category.

------
pedro1976
I tried to tackle this issue in a more general way, so I wrote rss-proxy [0],
which analyzes the DOM structure and derives feed candidates from it. Feel
free to try the demo [1].

[0] [https://github.com/damoeb/rss-proxy/](https://github.com/damoeb/rss-proxy/) [1] [https://rssproxy.migor.org/](https://rssproxy.migor.org/)

~~~
endlessvoid94
This is amazing. I've been looking for this exact solution. Thanks for your
work!

------
s0l1dsnak3123
You can actually get RSS feeds for YouTube: emacs users like myself who
consume YouTube via elfeed have been doing it like so:

[https://www.youtube.com/feeds/videos.xml?playlist_id=<THE_ID...](https://www.youtube.com/feeds/videos.xml?playlist_id=<THE_ID>)

Or

[https://www.youtube.com/feeds/videos.xml?channel_id=<THE_ID>](https://www.youtube.com/feeds/videos.xml?channel_id=<THE_ID>)

More info here: [https://joshrollinswrites.com/help-desk-head-desk/20200611/](https://joshrollinswrites.com/help-desk-head-desk/20200611/)

It really helps to break away from the addictive properties of YouTube's "Up
Next" algorithm.
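The two URL patterns above differ only in the query parameter, so a tiny helper can build either one; the ID in the example below is a placeholder, not a real channel:

```python
# Build YouTube RSS feed URLs from a channel or playlist ID.
# The endpoint is the one quoted above; the IDs are placeholders.
FEED_BASE = "https://www.youtube.com/feeds/videos.xml"

def youtube_feed_url(channel_id=None, playlist_id=None):
    if channel_id:
        return f"{FEED_BASE}?channel_id={channel_id}"
    if playlist_id:
        return f"{FEED_BASE}?playlist_id={playlist_id}"
    raise ValueError("need a channel_id or playlist_id")

print(youtube_feed_url(channel_id="UCxxxxxxxxxxxxxxxxxxxxxx"))
```

The channel ID is the `UC...` string in the channel URL; channels with only a vanity name need a quick view-source to find it.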

~~~
bscphil
Wow... I wrote my own WebSub receiver and put it on an always-on server
because I assumed YouTube doesn't have RSS; I couldn't find this information
anywhere.

~~~
hombre_fatal
It's crazy how rare it's become to see an RSS link on websites that actually
have a feed if you add /feed or /feed.xml to the URL.

I have to wonder how much more dead RSS would be if WordPress (like, 90% of
blogs/news sites) didn't create a /feed by default.

~~~
vezycash
Check this out: it puts an RSS/Atom subscribe button back in the URL bar.
[https://addons.mozilla.org/en-US/firefox/addon/awesome-rss/](https://addons.mozilla.org/en-US/firefox/addon/awesome-rss/)

It helps discover RSS feeds for sites.

~~~
hombre_fatal
That is great, thanks. I got back into RSS lately, and doing the ol' "look for
RSS link in footer and then guess at /feed" rigmarole every time gets old fast.

The screenshot of the RSS icon in the url bar brings back a memory I don't
know if I had. Didn't Firefox used to do this by default or am I
misremembering?

------
bdz
Does anyone remember Yahoo Pipes? That was awesome, I mostly used it (around
2008-2010) to make RSS for sites where it wasn't available
[https://en.wikipedia.org/wiki/Yahoo!_Pipes](https://en.wikipedia.org/wiki/Yahoo!_Pipes)

~~~
diablo1
If you dig around, you can find some alternatives to YP

[https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...](https://hn.algolia.com/?dateRange=all&page=0&prefix=false&query=yahoo%20pipes%20&sort=byPopularity&type=story)

~~~
onli
I run one of those, [https://pipes.digital](https://pipes.digital). It
supports scraping sites that don't have RSS, creating a feed via XPath or CSS
selectors. Of the sites this cool project supports, it has integrated support
for YouTube and Twitter. But I think it would be a great extension to embed
rssbox and offer blocks for all those services.
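The scrape-to-feed idea can be sketched with nothing but the standard library. The `<h2 class="title">` markup below is a made-up example of the kind of selector a user would configure; real services like pipes.digital use proper XPath/CSS engines:

```python
# Turn headline links scraped from HTML into a minimal RSS document.
# The <h2 class="title"><a href=...> structure is an invented example.
from html.parser import HTMLParser
from xml.sax.saxutils import escape

class HeadlineParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.items = []          # collected (title, url) pairs
        self._href = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and attrs.get("class") == "title":
            self._in_title = True
        elif tag == "a" and self._in_title:
            self._href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self._href:
            self.items.append((data.strip(), self._href))
            self._href = None

def to_rss(items, site="https://example.com"):
    entries = "".join(
        f"<item><title>{escape(t)}</title><link>{escape(u)}</link></item>"
        for t, u in items
    )
    return ('<?xml version="1.0"?><rss version="2.0"><channel>'
            f"<title>Scraped feed</title><link>{site}</link>{entries}"
            "</channel></rss>")

html = '<h2 class="title"><a href="/post-1">First post</a></h2>'
p = HeadlineParser()
p.feed(html)
print(to_rss(p.items))
```

A real implementation would also fetch the page, handle dates, and escape more carefully, but the shape of the problem is just this: selector in, feed items out.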

------
qznc
Off topic but since we are talking about RSS feeds: Is there a web service to
replay feeds?

The use case is an old blog archive one wants to (re-)read sequentially from
the start, but not in binge mode. So maybe one post per day or week. I'm
thinking of the old posts of Aaron Swartz or Steve Yegge.

Just adding the feed to a feed reader is often not sufficient because the feed
only contains the last 20 entries or so.

~~~
identity0
Well if the feed doesn’t contain older entries, it’s impossible without
something like the Wayback machine (and I doubt they store rss feeds)

~~~
qznc
Some feeds are just paginated. Following "next" links would work. Feed
readers don't do this, though, as they assume people are usually interested
in the new posts.
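For feeds that do paginate, the mechanics are simple: parse each page and follow its `rel="next"` link (RFC 5005 standardizes this for archived feeds). A sketch with an inline Atom document standing in for a real feed:

```python
# Find the rel="next" pagination link in an Atom feed, the mechanism
# a hypothetical "replay" reader would loop over to walk the archive.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def next_page(feed_xml):
    root = ET.fromstring(feed_xml)
    for link in root.findall(f"{ATOM}link"):
        if link.get("rel") == "next":
            return link.get("href")
    return None  # last page reached

page = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example archive</title>
  <link rel="self" href="https://example.com/feed?page=3"/>
  <link rel="next" href="https://example.com/feed?page=4"/>
</feed>"""
print(next_page(page))
```

A replay service would walk these links once to collect the backlog, then drip the entries out on whatever schedule the user picked.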

------
heavyset_go
There's an open-source service that does something similar; does anyone
remember its name?

 _edit_ : Both rssbox in the OP, and RSS-bridge[1] are open source. I was
thinking of the latter. There's also RSSHub[2].

[1] [https://github.com/RSS-Bridge/rss-bridge](https://github.com/RSS-Bridge/rss-bridge)

[2] [https://github.com/DIYgod/RSSHub](https://github.com/DIYgod/RSSHub)

~~~
hombre_fatal
There's
[https://easylist.to/easylist/easylist.txt](https://easylist.to/easylist/easylist.txt)
for universal content-blocking rules, and youtube-dl for universal video
extraction methods.

While building a feed reader of my own, I recently had an idea for a project
for universal content-crawling rules: how the content hierarchy is organized
on each site, and how to extract it from each content page. A single community
project that any other project could use to crawl websites for their content.
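Such shared crawling rules could be as plain as a data file mapping domains to selectors. A sketch of what that might look like; every domain and selector here is invented:

```python
# A made-up sketch of shared per-site crawling rules: a plain mapping
# from domains to extraction selectors that any crawler could consume.
SITE_RULES = {
    "example-blog.com": {
        "list_page": "/archive",
        "item_selector": "article h2 a",
        "body_selector": "div.post-body",
    },
    "example-news.com": {
        "list_page": "/latest",
        "item_selector": "li.headline a",
        "body_selector": "section.story",
    },
}

def rules_for(url_host):
    # Fall back to the registrable domain so subdomains share rules.
    parts = url_host.split(".")
    domain = ".".join(parts[-2:]) if len(parts) >= 2 else url_host
    return SITE_RULES.get(domain)

print(rules_for("www.example-blog.com"))
```

The value of a community project would be in the data, not the code: thousands of vetted rule entries, the way easylist is thousands of vetted blocking rules.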

Looks like rss-bridge comes close to that.

~~~
k1m
To help extract article content, you might be interested in this collection I
help maintain: [https://github.com/fivefilters/ftr-site-config/](https://github.com/fivefilters/ftr-site-config/)

It's used, in addition to an automatic article extractor, in Full-Text RSS:
[http://ftr.fivefilters.org](http://ftr.fivefilters.org)

------
gravitas
The service [https://feed43.com](https://feed43.com) will enable you to build
an RSS feed out of pretty much anything with a URL. I use it to build RSS
feeds out of sha256 release files, vendor client download release pages,
changelogs, etc.

~~~
harias
Similar service: [https://politepol.com/en/](https://politepol.com/en/)

~~~
gravitas
I would say this is a simpler service overall and not an even competitor: it
appears to use HTML elements as keys and create entries based on that, but it
stops there. I would not consider it on par with Feed43 based on the few
samples I tried; it lacks the in-depth parsing required to handle the expected
result formatting.

------
derefr
Has anyone ever done something like this for Facebook?

I know it'd have to be subjective per user (security ACLs ⇒ different accounts
seeing differing subsets of other accounts' posts); but I'd be fine with just
getting my own account's subjective view, by logging into such a service using
Facebook OAuth (or, if that isn't enough, then I'd be fine with handing over
my Facebook creds themselves, ala XAuth, provided the service is a FOSS one
I'm running a copy of myself in e.g. an ownCloud instance.)

I also know that it'd likely require heavyweight scraping using e.g.
Puppeteer, to fool Facebook into thinking it's real traffic. But that's not
really _that_ much of an impediment, as long as you don't need to scale it to
more than a dozen-or-so scrapes per second. (Which you'd automatically be safe
from if it was a host-it-yourself solution, since there'd only be one
concurrent user of your instance.)

Anyone done this?

~~~
stefansundin
RSSBox used to have Facebook support (but only for public pages, no personal
content), but when Facebook started cordoning off their API two years ago, I
had to turn it off since I was unable to get my application approved. The code
is still there, but I am doubtful it would work even if you manage to get an
API key that works. I think the best option may be to scrape the web content
now, unfortunately.

~~~
trog
I have assumed for a while that the only way to convert FB -> RSS would be to
scrape the home page, but from what I recall the HTML and DOM are all kinds of
messed up, intentionally obfuscated to prevent adblocking. From a quick look
just now it does seem like it would be a nightmare to try to parse it as-is,
and I would guess FB changes a lot of the output regularly anyway to defeat
adblockers, making efforts to keep up pretty challenging.

~~~
derefr
It almost sounds like a problem best solved with OCR, rather than scraping per
se. Build a simple model to recognize "posts" from screenshots, and output the
rectangular viewport regions of their inner content; then build some GIS-like
layered 2D interval tree of all the DOM regions, such that you could ask
Puppeteer et al to filter for every DOM node with visibility overlap with that
viewport region; extract every single Unicode grapheme cluster within those
nodes separately, annotated with its viewport XY position; and finally, use
the same kind of model that lets PDF readers highlight "text" (i.e.
arbitrary bags of absolutely-positioned graphemes), to "un-render" the
DOM nodes' bag of positioned graphemes back into a stream of
space/line/paragraph-segmented text.
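The final "un-render" step is the most tractable part of this. A toy sketch that clusters absolutely-positioned graphemes back into lines of text (all coordinates invented):

```python
# Group (char, x, y) graphemes into text lines: cluster by y within a
# tolerance, then sort each cluster by x. A toy version of the
# PDF-style "un-rendering" described above.
def unrender(graphemes, y_tolerance=2):
    lines = []  # each entry: (representative_y, [(x, char), ...])
    for char, x, y in graphemes:
        for line in lines:
            if abs(line[0] - y) <= y_tolerance:
                line[1].append((x, char))
                break
        else:
            lines.append((y, [(x, char)]))
    lines.sort(key=lambda line: line[0])
    return ["".join(c for _, c in sorted(line[1])) for line in lines]

scattered = [("i", 10, 21), ("H", 0, 20), ("k", 10, 41), ("o", 0, 40)]
print(unrender(scattered))  # ['Hi', 'ok']
```

Real PDF text extraction also has to infer word spacing from glyph widths and handle multi-column layouts, but the core trick is the same two-axis sort shown here.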

------
goblin89
A somewhat unconventional UI for following content, which incidentally works
with RSS Box, is Fraidycat[0]. It groups recent posts under “individuals” with
a visualization of how much recent activity there is in a given feed, and lets
you choose a “follow intensity”, which works in a nice and transparent way.

[0]
[https://news.ycombinator.com/item?id=22545878](https://news.ycombinator.com/item?id=22545878)

~~~
kickscondor
Hey thanks for this lovely pitch, goblin89. <3 all my goblin friends out there

~~~
hombre_fatal
I learned of Fraidycat in an RSS related HN comment yesterday and have been
trying it out. Love some of the homemade quirkiness that I forgot software
used to have -- the video is great too.

Only question I have: do you really have to assign every Github issue to
yourself, the sole developer? Something about it cracks me up:
[https://github.com/kickscondor/fraidycat/issues](https://github.com/kickscondor/fraidycat/issues)

> "Thanks for the bug report. Fortunately for you, our best man is on the
> job!"

> _kickscondor has assigned the issue to kickscondor_

Anyways, just playing. Great product and great shepherding of the GitHub
project.

~~~
kickscondor
Oh I feel such a sense of progress just assigning bugs to myself. When I get
around to writing a blog post about it, I certainly hope you will be there to
upvote it, hombre f.

------
carapace
Related (maybe) but tangential: does anyone know of a good web-to-text
converter? Back in the day you used to just use Lynx; is that still the way,
or has it been surpassed?

~~~
input_sh
I personally use postlight/mercury-parser[0] to convert articles to Markdown
files, plus a small script that adds extracted metadata (like author, featured
image, original link, date it was scraped) to the top of each Markdown file,
and I put those Markdown files into Hugo for a DIY Pocket alternative.

You can use the --format flag to pick between Markdown/text/HTML output, so it
should serve your purpose.

If anyone here has looked for a reader view on Chrome, odds are you've
probably stumbled upon Mercury Reader[1]. This is what powers it.

[0] [https://github.com/postlight/mercury-parser/](https://github.com/postlight/mercury-parser/)

[1] [https://chrome.google.com/webstore/detail/mercury-reader/okn...](https://chrome.google.com/webstore/detail/mercury-reader/oknpjjbmpnndlpmnhmekjpocelpnlfdi)

------
maple3142
There is also RSSHub, which supports more sites, but many of them are Chinese
websites.

[https://docs.rsshub.app/en/](https://docs.rsshub.app/en/)

------
orblivion
Soundcloud actually does have RSS feeds. I'm not sure how they're exposed to
users (I forgot how I got this URL) but they exist:

[https://feeds.soundcloud.com/users/soundcloud:users:16977412...](https://feeds.soundcloud.com/users/soundcloud:users:169774121/sounds.rss)

------
diablo1
I worry about this being swarmed by traffic and hugged to death. Since it's
popular on HN, I imagine the particular Heroku instance is overwhelmed. I was
surprised that it worked when I used it. I guess I'm gonna have to pony up and
donate then...

~~~
stefansundin
You are correct in that it is somewhat starved of resources. The instance
that I host is running on a free Heroku dyno (512 MB RAM). I do not have a
good caching solution currently, which is why Twitter and Instagram are
almost always returning errors now. I suspect a single person is responsible
for most of the issues (see GitHub issue #38). It's actually amazing how well
it runs considering how much traffic is thrown at it.

At some point I hope to get enough time to implement a caching solution, which
should hopefully resolve most of these issues.
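For what it's worth, even a tiny in-process TTL cache in front of the rate-limited upstream calls can go a long way. A sketch, not the app's actual design; the 60-second TTL is an arbitrary choice:

```python
# Minimal in-process TTL cache: serve a stored response while it is
# fresh, otherwise call the (rate-limited) fetcher and store the result.
import time

class TTLCache:
    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self.store = {}             # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self.store.get(key)
        if entry and entry[0] > now:
            return entry[1]         # cache hit, upstream not touched
        value = fetch()
        self.store[key] = (now + self.ttl, value)
        return value

cache = TTLCache(ttl_seconds=60)
calls = []
fetch_feed = lambda: calls.append(1) or "<rss>...</rss>"
cache.get_or_fetch("user:123", fetch_feed)
cache.get_or_fetch("user:123", fetch_feed)
print(len(calls))  # upstream fetched once
```

Since most load on a shared instance is many readers polling the same handful of feeds, collapsing those polls into one upstream request per TTL window is usually the big win.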

------
sebsauvage
In the same vein : RSS-Bridge

[https://github.com/RSS-Bridge/rss-bridge](https://github.com/RSS-Bridge/rss-bridge)

(you can find multiple instances on the web)

~~~
Havoc
Nice one. As much as I approve of services, I'd rather self-host. Tired of
people pulling the rug out from under my feet.

~~~
latexr
You can self-host this one as well. They mention it on the page and link to
the GitHub repo:
[https://github.com/stefansundin/rssbox](https://github.com/stefansundin/rssbox)

------
WCityMike
I don't know if anyone will particularly care, but both Substack and MailChimp
newsletters have RSS feeds, in case you prefer those over email. With
Substack, you merely append "feed/" to the end of the URL.

With Mailchimp, you look for a "view in browser" or "share this issue with
friends" link in the newsletter. On the archive page it takes you to, an RSS
link is in the right-hand corner.

------
k1m
RSS Box is great - perfect for the kind of sites where scraping from the HTML
is problematic because the HTML changes so much.

I work on a somewhat similar project called Feed Creator which can be used for
less popular pages where you can select elements for the feed using CSS
selectors:
[https://createfeed.fivefilters.org](https://createfeed.fivefilters.org)

------
dewey
I wrote a similar tool (but not as polished) that requires you to write custom
plugins. This works well if you have websites that are hard to scrape in an
automated way. Maybe it's useful to someone else:
[https://github.com/dewey/feedbridge](https://github.com/dewey/feedbridge)

~~~
johnx123-up
Is it necessary to write a plugin for each website?

~~~
dewey
You could in theory combine some, but it was just a very specific use case I
built it for. Just a fun project, and it's on GitHub in case anyone else has a
similar niche problem.

------
pmoriarty
My own problem with services like this is that I don't want to tell their
owners what blogs I read. It's a privacy concern.

I'd feel much more comfortable using a standalone tool that I could run on my
own laptop (ideally one that didn't require running a web server or even a web
browser).

~~~
cascader
I wanted to check out the service in the OP’s post, and realized I didn’t have
an iOS based RSS reader app on my tablet. In line with your privacy concerns,
I wanted an app that didn’t require creating an account. I couldn’t find one
in the first several I loaded. Any tips?

~~~
hombre_fatal
None of the first few apps that show up for me in the iOS store need
accounts. I've used Reeder in the past, but now I use NetNewsWire, which is
free and available on GitHub.

[https://apps.apple.com/us/app/netnewswire-rss-reader/id14806...](https://apps.apple.com/us/app/netnewswire-rss-reader/id1480640210)

------
jadell
All I want is something that lets me get an RSS feed of Instagram accounts I
follow, by giving nothing except the URL or username. I've tried 4 separate
services that all work at first, then -- a week, a month, an indefinite time
in the future -- stop working and never resume again.

~~~
zufallsheld
[https://github.com/RSS-Bridge/rss-bridge](https://github.com/RSS-Bridge/rss-bridge)
provides this and works for me. It sometimes throws an error but recovers
after that.

------
EvgeniyZh
Is there any way to turn a twitter feed (i.e., multiple users) into a single
RSS feed?

~~~
benrapscallion
If you use NewsBlur, it has an inbuilt twitter client and allows subscribing
to twitter lists, which could accomplish this.

------
null_deref
How would you guys use it? What're the use cases for this website?

~~~
dewey
If you want to subscribe to a website / blog that doesn't offer an RSS feed.

------
zouhair
[https://feed43.com/](https://feed43.com/) does it for almost any website,
provided you fiddle with a bit of "code".

------
unicornporn
I self-host this: [https://github.com/RSS-Bridge/rss-bridge](https://github.com/RSS-Bridge/rss-bridge)

I highly recommend it.

------
todsacerdoti
If you want to process an RSS feed programmatically, you have to run code to
poll the feed and keep track of items already processed. This isn't hard to
write, but it's often not core to your app's logic.

You probably just want to run code on each new item in the feed.
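The "keep track of items already processed" half is small but easy to get subtly wrong. A sketch of the dedup loop, assuming each feed item carries a GUID; in a real deployment the seen set would be persisted between runs:

```python
# Deduplicate feed items across polls: remember seen GUIDs and hand
# only unseen items to a callback.
def emit_new_items(items, seen, on_new):
    for item in items:
        guid = item["guid"]
        if guid not in seen:
            seen.add(guid)
            on_new(item)

seen = set()
poll1 = [{"guid": "a", "title": "First"}, {"guid": "b", "title": "Second"}]
poll2 = [{"guid": "b", "title": "Second"}, {"guid": "c", "title": "Third"}]

new_titles = []
emit_new_items(poll1, seen, lambda i: new_titles.append(i["title"]))
emit_new_items(poll2, seen, lambda i: new_titles.append(i["title"]))
print(new_titles)  # only "Third" is new on the second poll
```

The fiddly parts in practice are feeds without GUIDs (you end up hashing links or titles) and bounding the seen set so it doesn't grow forever.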

Pipedream lets you treat an RSS feed as an event source. Pipedream runs the
code to poll the feed, emitting new items as the feed produces them.

RSS for Hackers - [https://rss.pipedream.com](https://rss.pipedream.com)

------
theblackcat1002
"There was a problem talking to Instagram. Please try again in a moment."
blocked by Instagram?

~~~
stefansundin
Instagram used to have an open API, but that is closed down now. The app is
currently using some private-ish endpoints, but they are rate-limited. More
people have started using my app recently, and I have not had time to add
caching yet.

------
mromanuk
This is really cool. Twitter didn't work, though.

~~~
superkuh
For Twitter I use the Perl backend scripts
([https://github.com/ciderpunx/twitrssme/tree/master/fcgi](https://github.com/ciderpunx/twitrssme/tree/master/fcgi))
from [http://twitrss.me/](http://twitrss.me/) by themselves. It's pretty easy
to scrape Twitter users/searches and generate RSS feeds on disk for my native
reader.

In a bash script called by cron every handful of hours there are many, many
lines like this:

    perl twitter_user_to_rss.pl gnuradio > ~/limbo/www/rss/gnuradio.xml
    perl twitter_search_to_rss_wtf.pl "rtlsdr" > ~/limbo/www/rss/rtlsdr.xml

------
hankchinaski
serious question: does anybody use RSS nowadays?

~~~
anoncake
Nope! That's why someone made that service, because neither they nor anyone
else needs it.

