Hacker News
How far I'll go to make an RSS feed of your website (chrishardie.com)
6 points by ChrisHardie 10 hours ago | 7 comments





Love all of these RSS resources. Thanks for sharing!

Last week, I spent a couple of hours at a local hack event putting an RSS aggregator[1] together for our community. Just something fun to do.

One thing I realized when I deployed is that Substack returns a 403 if you try to read its RSS feeds from a GitHub Action. The only obvious workaround to me is to pull the content locally on a schedule, commit it, and then deploy. But I'd much rather have this site update itself via GitHub Actions and cron.

Have you run into this situation before?

[1]: https://github.com/astoria-tech/subcurrent-astro/


The usual case I run into is that a site will block requests with User-Agent header strings that don't at least try to look like a regular browser, or that appear on some list of known bots/automation tools. (If the site is using Cloudflare, this is a very easy state for it to get into.) I'm not sure if GitHub Actions lets you customize the user agent at the spot where you're hitting the issue, but that's where I'd start.
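
For illustration, here is a minimal Python sketch of sending a browser-like User-Agent when fetching a feed. The feed URL, the User-Agent string, and the function names are all assumptions made up for this example, not anything taken from the aggregator in question. It also may not help if the block is based on the runner's IP range rather than the header.

```python
import urllib.request

# Assumption: any plausible browser-style string; this exact value is illustrative.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_feed_request(url: str, user_agent: str = BROWSER_USER_AGENT) -> urllib.request.Request:
    """Build a GET request that overrides urllib's default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_feed(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw feed body; raises urllib.error.HTTPError on a 403."""
    with urllib.request.urlopen(build_feed_request(url), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical feed URL, used only to show the call shape.
    print(len(fetch_feed("https://example.substack.com/feed")))
```

If the 403 persists with a browser-like header, that suggests the block is keyed on something other than the User-Agent.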

Every small town deserves someone like this. And also someone moderating local Facebook groups to tamp down the scams. For every elderly victim of scams, there’s an honest, hard-working young person who discovers that the corresponding economic opportunity moved elsewhere.

Isn't it time-consuming to build a scraper for every website you want to get updates from? What if the HTML is a mess and full of dynamic front-end-framework classes etc?

Once I created the structure to support lots of different kinds of scraping-to-feed conversions, it’s usually fast to add a new target site into the mix. There are definitely exceptions, and there's definitely the occasional maintenance when someone updates their CSS.
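
As a rough idea of what such a structure could look like (this is a hypothetical sketch, not the actual setup described above), one option is a registry of small per-site extractor functions feeding a shared RSS builder. The site key, the markup pattern, and every name below are invented for the example, and the regex stands in for what would more likely be a proper HTML parser.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeedItem:
    title: str
    link: str


# Registry mapping a site key to the function that extracts items from its HTML.
EXTRACTORS: Dict[str, Callable[[str], List[FeedItem]]] = {}


def extractor(site: str):
    """Decorator that registers a site-specific extractor under a key."""
    def register(func):
        EXTRACTORS[site] = func
        return func
    return register


@extractor("example-news")
def extract_example_news(html: str) -> List[FeedItem]:
    # Hypothetical markup: <a class="headline" href="...">Title</a>
    pattern = re.compile(r'<a class="headline" href="([^"]+)">([^<]+)</a>')
    return [FeedItem(title=t.strip(), link=h) for h, t in pattern.findall(html)]


def build_rss(site: str, site_url: str, html: str) -> str:
    """Run the registered extractor for a site and wrap the items in RSS 2.0."""
    items = EXTRACTORS[site](html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Scraped feed for {site}"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item.title
        ET.SubElement(node, "link").text = item.link
    return ET.tostring(rss, encoding="unicode")
```

With this shape, adding a new target site means writing one more decorated function, and a CSS or markup change on a site only touches that site's extractor.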

For apps, static analysis and reverse engineering can be a good alternative (or complement) to the proxy-in-the-middle technique.

Thanks!


