That kind of respectful messaging alone is making me want to take a closer look. Though it should be “then you will be able”.
> Important: Please do not overload this service. Do not make more requests than you need.
What if we don’t have control over the frequency of requests (e.g. using a service like Feedly)? Do those happen often enough that we’d need to host the app ourselves?
I know the wording is a bit vague, and I know that most services don't let you customize this. I added it after I suddenly started receiving tons of traffic caused by, I suspect, a single user. This person was purposefully fetching feeds multiple times a minute.
Anyhow, if you aren't actively trying to abuse the service, you should be good. Some RSS readers have "boost" features to fetch feeds more frequently (often a paid feature).
Once I am able to add some good caching, I may be able to remove that notice. But right now the service is kinda overloaded, which is why some of the integrations (Twitter and Instagram in particular) may give you errors at the moment.
It’s really people who don’t use aggregation services and set their clients to update very frequently (say every minute) that pose a problem.
Prioritizing feeds with fewer subscribers when a small number of premium customers add them makes some sense, though.
I believe most RSS reader clients and aggregator backends are programmed to respect HTTP Cache-Control headers, so as long as the developer of this service sets those headers appropriately for their endpoints, there shouldn't be a problem.
The warning is likely more for people's custom scripting using curl(1) et al, where there isn't an HTTP cache in the code path.
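If the developer wanted to lean on that, it's a one-line change per endpoint. A minimal sketch, assuming a Flask-style app (the route, the 15-minute max-age, and build_feed are all made up for illustration):

    from flask import Flask, Response

    app = Flask(__name__)

    def build_feed(username):
        # Stand-in for the real feed generation.
        return f"<rss><channel><title>{username}</title></channel></rss>"

    @app.route("/twitter/<username>")
    def twitter(username):
        resp = Response(build_feed(username), mimetype="application/rss+xml")
        # Well-behaved readers and intermediate caches will then wait
        # at least 15 minutes between refetches of the same feed.
        resp.headers["Cache-Control"] = "public, max-age=900"
        return resp

curl users receive the same header, of course; they just tend to ignore it.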
For example, the button for “exclude retweets” would be:
<button type="submit" name="include_rts" value="0">Exclude retweets</button>
And download links like the Ustream one would use formaction="/ustream/download" on their submit button.
I run a similar service where I rate limit. I've whitelisted the IPs of centralised feed readers like Feedly (at least the ones I've been able to identify), but the rate limit for non-whitelisted IPs is generous enough that it's really only there to catch the problematic scripts which crop up from time to time.
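The core of such a rate limiter is tiny, for anyone curious. A rough Python sketch (the whitelist entry and limits are placeholders, not my real values):

    import time
    from collections import defaultdict

    WHITELIST = {"203.0.113.7"}   # e.g. identified Feedly/centralised-reader IPs
    LIMIT = 60                    # max requests per IP...
    WINDOW = 3600                 # ...per hour

    hits = defaultdict(list)

    def allow(ip):
        if ip in WHITELIST:
            return True
        now = time.time()
        # Keep only the timestamps still inside the sliding window.
        hits[ip] = [t for t in hits[ip] if now - t < WINDOW]
        if len(hits[ip]) >= LIMIT:
            return False
        hits[ip].append(now)
        return True

Generous limits mean legitimate readers never notice it, while a script hammering the endpoint every second trips it within a minute.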
Fixed. Thanks! :)
More info here: https://joshrollinswrites.com/help-desk-head-desk/20200611/
It really helps to break away from the addictive properties of YouTube's "Up Next" algorithm.
I have to wonder how much more dead RSS would be if WordPress (powering something like 90% of blogs/news sites) didn't create a /feed by default.
Helps discover RSS feeds for sites.
The screenshot of the RSS icon in the URL bar brings back a memory I didn't know I had. Didn't Firefox use to do this by default, or am I misremembering?
The use case is an old blog archive one wants to (re)read sequentially from the start, but not in binge mode. So maybe one post per day or week. I'm thinking about the old posts of Aaron Swartz or Steve Yegge.
Just adding the feed to a feed reader is often not sufficient because the feed only contains the last 20 entries or so.
This would also be useful for going through historical incidents - e.g. replaying the top 10 politics blogs, day by day, during momentous events. It's simple enough to just treat it as an offset; the display would clearly say the original date of publication. This is a lot better than simply adding old RSS feeds since it comes in at the same rate it would have happened in the moment.
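Concretely, "treat it as an offset" is just date arithmetic. A sketch using the feedparser library (the feed URL is an example, and it assumes every entry carries a publication date):

    import calendar
    from datetime import datetime, timezone
    import feedparser

    feed = feedparser.parse("https://example.com/old-blog/feed.xml")
    entries = sorted(feed.entries, key=lambda e: e.published_parsed)

    def as_dt(t):
        # feedparser hands back UTC struct_time values
        return datetime.fromtimestamp(calendar.timegm(t), timezone.utc)

    # Shift every date forward by a fixed offset, so the oldest post lands
    # today and the rest replay at their original pace.
    offset = datetime.now(timezone.utc) - as_dt(entries[0].published_parsed)

    for e in entries:
        original = as_dt(e.published_parsed)
        if original + offset <= datetime.now(timezone.utc):
            print(original.date(), e.title)   # display still shows the real date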
Something similar could be set up for newspapers. Imagine receiving all space-related stories from Life, NYT, Guardian, Spiegel, for the time period from say 1965-1970.
The current best version of this requires a huge amount of research into old newspapers, and also reading books and then manually connecting each book's timeline. If instead you could in parallel consume multiple sources, the correlation would be natural.
Even better would be adding in later commentary about those events.
Example: take "History of the Decline and Fall of the Roman Empire", and annotate each page with contemporary thought from each era. So every page would a section on what the scholarly response was when it came out, then 50 years later, then 100, each adding in new methods of investigation and validation/testing of the claims as archaeology, linguistics, carbon-dating, anthropology, etc. developed in the background.
Sounds like internet archive for RSS.
I don't intend to maintain it further but all source code is available and it's not terribly complicated (all the hard stuff is done by other python libraries).
edit: Both rssbox in the OP, and RSS-bridge are open source. I was thinking of the latter. There's also RSSHub.
While building a feed reader of my own, I recently had an idea for a project: universal content crawling rules, i.e. how the content hierarchy is organized on each site and how to extract it from each content page. A single community project that any other project could use to crawl websites for their content.
Looks like rss-bridge comes close to that.
It's used, in addition to an automatic article extractor, in Full-Text RSS: http://ftr.fivefilters.org
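To make the community-rules idea concrete, the shared data could be as simple as a per-site map of CSS selectors. A hand-wavy sketch using requests and BeautifulSoup (the site and selectors are invented; FiveFilters' site patterns and RSS-Bridge's bridges are existing takes on the same idea):

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical community-maintained extraction rules, one entry per site.
    RULES = {
        "example.com": {
            "title": "h1.entry-title",
            "body": "div.entry-content",
            "date": "time.published",
        },
    }

    def extract(url):
        host = url.split("/")[2].removeprefix("www.")
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        # select_one returns None when a rule no longer matches, which is
        # exactly the signal that a site's rules need updating.
        return {field: el.get_text(strip=True)
                for field, sel in RULES[host].items()
                if (el := soup.select_one(sel)) is not None}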
I tried a whole lot of solutions before discovering this service; it's the only one I've found flexible enough to handle these random tech endpoint formats. It's basically like RSS-Bridge (mentioned in other comments), but with a visual regex parser built in to avoid having to write actual code like RSS-Bridge requires. $0.02 YMMV :)
I know it'd have to be subjective per user (security ACLs ⇒ different accounts seeing differing subsets of other accounts' posts); but I'd be fine with just getting my own account's subjective view, by logging into such a service using Facebook OAuth (or, if that isn't enough, I'd be fine with handing over my Facebook creds themselves, a la XAuth, provided the service is a FOSS one I'm running a copy of myself, e.g. in an ownCloud instance).
I also know that it'd likely require heavyweight scraping using e.g. Puppeteer, to fool Facebook into thinking it's real traffic. But that's not really that much of an impediment, as long as you don't need to scale it to more than a dozen-or-so scrapes per second. (Which you'd automatically be safe from if it was a host-it-yourself solution, since there'd only be one concurrent user of your instance.)
Anyone done this?
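For what it's worth, the scraping half is the smaller problem; a bare sketch using Playwright's Python bindings in place of Puppeteer (the selectors are guesses that Facebook changes regularly, and this ignores 2FA, consent dialogs, and bot detection entirely):

    from playwright.sync_api import sync_playwright

    def fetch_feed_html(email, password):
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto("https://www.facebook.com/login")
            page.fill("input[name=email]", email)   # selector guesses
            page.fill("input[name=pass]", password)
            page.click("button[name=login]")
            page.wait_for_load_state("networkidle")
            page.goto("https://www.facebook.com/")
            html = page.content()
            browser.close()
            return html

Parsing the feed items out of that HTML is the part that would need constant maintenance.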
Only question I have: do you really have to assign every Github issue to yourself, the sole developer? Something about it cracks me up: https://github.com/kickscondor/fraidycat/issues
> "Thanks for the bug report. Fortunately for you, our best man is on the job!"
> kickscondor has assigned the issue to kickscondor
Anyways, just playing. Great product and great shepherding of the Github project.
You can use the --format flag to pick between Markdown/text/HTML output, so it should serve your purpose.
If anyone here has looked for a reader view on Chrome, odds are you've stumbled upon Mercury Reader. This is what powers it.
Also I reckon pandoc is worth a try.
At some point I hope to get enough time to implement a caching solution, which should resolve most of these issues.
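Even a naive in-process TTL cache would take the edge off. A minimal sketch (the 15-minute TTL is arbitrary):

    import time

    CACHE = {}        # url -> (fetched_at, body)
    TTL = 15 * 60     # serve cached copies for up to 15 minutes

    def fetch_cached(url, fetch):
        entry = CACHE.get(url)
        if entry and time.time() - entry[0] < TTL:
            return entry[1]
        body = fetch(url)   # the real upstream request, e.g. requests.get
        CACHE[url] = (time.time(), body)
        return body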
(you can find multiple instances on the web)
With Mailchimp, well, you look for a "view in browser" or "share this issue with friends" link in the newsletter. On the archive page it takes you to, there's an RSS link in the right-hand corner.
I work on a somewhat similar project called Feed Creator which can be used for less popular pages where you can select elements for the feed using CSS selectors: https://createfeed.fivefilters.org
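The idea is simple to sketch. Roughly (this is not Feed Creator's actual code; the URL and selector are examples), using requests, BeautifulSoup, and feedgen:

    import requests
    from bs4 import BeautifulSoup
    from feedgen.feed import FeedGenerator

    URL = "https://example.com/news"        # a page with no feed of its own
    ITEM_SELECTOR = "ul.headlines li a"     # user-supplied CSS selector

    soup = BeautifulSoup(requests.get(URL, timeout=10).text, "html.parser")

    fg = FeedGenerator()
    fg.title("Example headlines")
    fg.link(href=URL)
    fg.description("Feed generated from CSS selectors")

    for a in soup.select(ITEM_SELECTOR):
        fe = fg.add_entry()
        fe.title(a.get_text(strip=True))
        fe.link(href=a.get("href"))

    print(fg.rss_str(pretty=True).decode())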
I'd feel much more comfortable using a standalone tool that I could run on my own laptop (ideally one that didn't require running a web server or even a web browser).
Even stripping everything out but plaintext with an HTML parser to put it in a text view, I realized I could wrap the links with native Cocoa labels that act as hyperlinks. And then do the same with images. Hmm, what about tables and stuff? Soon I realized, why would I even want this? It's annoying to visit the origin site when the RSS reader can just render it, and it kinda defeats the purpose.
It's a feature-rich RSS reader that runs completely in the terminal, presenting text-only views of each RSS feed.
The links open in the browser of your choice (which for me is a text-only version of emacs-w3m, which I also run exclusively in the terminal).
However, some RSS items can be read in their entirety within the RSS reader, without opening any links. This is my preferred method of consuming RSS.
 - https://newsboat.org/
 - https://github.com/newsboat/newsboat
 - i.e. those RSS items for which the author has chosen to make their entire article/post available over RSS, instead of merely posting a teaser and requiring a visit to their website to read the rest
I highly recommend it.
You probably just want to run code on each new item in the feed.
Pipedream lets you treat an RSS feed as an event source. Pipedream runs the code to poll the feed, emitting new items as the feed produces them.
RSS for Hackers - https://rss.pipedream.com
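Under the hood, polling is the classic seen-set pattern. A plain-Python equivalent with feedparser (the feed URL and 15-minute interval are examples):

    import time
    import feedparser

    FEED = "https://news.ycombinator.com/rss"
    seen = set()

    def handle(entry):
        print(entry.title)   # stand-in for "run code on each new item"

    while True:
        for e in feedparser.parse(FEED).entries:
            key = e.get("id") or e.link   # GUID when present, else the link
            if key not in seen:
                seen.add(key)
                handle(e)
        time.sleep(15 * 60)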
# in a bash script called by cron every handful of hours there are many, many lines like this:
perl twitter_user_to_rss.pl gnuradio > ~/limbo/www/rss/gnuradio.xml
perl twitter_search_to_rss_wtf.pl "rtlsdr" > ~/limbo/www/rss/rtlsdr.xml
1. You get a personalized view of what you have and have not read.
2. You can scan over a lot of posts very quickly, and pick out what you want to read.
3. You can aggregate a lot of different websites in a single place. No need to visit each website individually.
4. Increased privacy.
5. Less tracking.
6. Increased control.
7. Fewer ads.
There are probably more, but these are the primary reasons why I still prefer RSS.
I'd say 95% of my content discovery comes from RSS; the few sites that don't fit that tend to be high-volume sites, like news, where I get value from visiting the home page to see how editors have prioritised stories.