Show HN: Changedetection.io detect changes in websites and JSON feeds (github.com/dgtlmoon)
147 points by biggunz on Jan 3, 2022 | 29 comments


There seem to be a lot of these services ("scrape X in order to generate an RSS/webhook/etc. change feed for X"), but I feel that I can't rely on any of them.

For anything that I would need this particular service for, my highest-concern requirement is that I'm able to "set and forget" the service for literal years at a time. But almost always, services like this just drop dead eventually; or they tweak their scraping algorithms in ways that make previously-working configs break; etc.

Is there a change-detection service like this, being provided with stable SLAs by one of the cloud IaaS providers?

I suppose I could always find a self-hosted software product for this, and deploy it myself. But though the software would be stable, that's not very "set and forget" either; I'd be on the hook for applying updates, etc.

I want something that is to the "HTTP response change detection" abstraction, as S3-alike object-lifecycle event notifications are to the "object-storage change detection" abstraction. Some simple plumbing primitive that does exactly one thing, exactly one (obvious) way, that can be used to build the higher-level abstractions I want, with nine nines of uptime behind it, and the expectation it'll stick around for at least the next decade.
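For illustration, a minimal sketch of what such a one-job primitive could look like, assuming "change" simply means the response bytes hash differently than last time. The names `fingerprint` and `check_once` are hypothetical, and the fetch step is passed in as a callable so the comparison logic stays independent of any HTTP client:

```python
import hashlib
from typing import Callable, Optional, Tuple


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def check_once(fetch: Callable[[], bytes], previous: Optional[str]) -> Tuple[bool, str]:
    """Fetch once and return (changed, new_fingerprint).

    The first observation (previous is None) is reported as "not changed",
    since there is nothing to compare against yet.
    """
    current = fingerprint(fetch())
    changed = previous is not None and current != previous
    return changed, current
```

In practice `fetch` could be something like `lambda: urllib.request.urlopen(url, timeout=30).read()`, with the returned fingerprint stored between runs. Hashing raw bytes flags every difference, including ones that carry no meaning.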


The linked software is for self-hosting. In my experience self-hosting is the safest bet. When self-hosting it's of course best to use software packages that automatically get security updates from the distribution. On Debian, for example, urlwatch would be an option.


Self-hosting is the safest bet; alternatively, share a real server somewhere with some friends, or however you like.


So you just need to set up two of these services, and have each monitor the other... Have to agree, it does feel like a case where rolling your own may be less elegant, but at least you'll know how the plumbing works (or if it doesn't).


It sounds like you value permanence and reliability over feature set. You may want to consider https://www.followthatpage.com/

I have been using it for years, and judging from its visual design, it has been around for years longer.


I think you're right that the services themselves break, but more often than not, the failure mode is that the page you're monitoring changes its structure, and then every refresh gives you data that is incommensurate with what came before. It's why we version APIs, but there's no such versioning we expose on websites.

I think that also points to why what you propose would be very hard to do: in HTML pages, even if the page structure (a new div level) changes, it's hard to say if that's actually a fundamental change to how they've architected the information hierarchy you're trying to get notified about, or if that is indeed the information itself. It's ambiguous.

Maybe that's okay? Like, if all you want is a notification about every time a thing changes, regardless of the meaning of the change, then what you're describing is possible. But I think the meaning of the change is important, and that's ultimately what shifts over time in ways computers cannot grok.


"But I think the meaning of the change is important": this is exactly right. Is it whitespace? Is it the text? What is a "change"?
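To make the whitespace-versus-text distinction concrete, here is a rough sketch, assuming that a "whitespace" change is one that disappears once runs of whitespace are collapsed. The function names and the three labels are made up for illustration and are not taken from any particular tool:

```python
import re


def normalize_text(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def classify_change(old: str, new: str) -> str:
    """Return "none", "whitespace", or "text" for a pair of snapshots."""
    if old == new:
        return "none"
    if normalize_text(old) == normalize_text(new):
        # Only spacing or line breaks differ.
        return "whitespace"
    return "text"
```

This only settles the easy case; whether a changed word is meaningful still depends on what the reader cares about.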


Visualping.io works pretty well as a SaaS.

They have an okay free tier (65 checks per month).

https://visualping.io/pricing

I just wish their pricing model was less check-based. I would use it a lot more if I could pay $10–20/month for unlimited checks, but that's not something they offer. Their pricing model scales up really aggressively compared to the underlying resource usage.


Feels like you need something like a non profit providing this service, similar to Let's Encrypt, with enough funds to keep a two person team (bus factor) compensated for maintenance and ongoing support.


I've been running this software locally as a docker container on my laptop since about February 2021, obviously with a few updates/restarts, but generally it is set and forget


StackDriver supports health checks: https://cloud.google.com/monitoring/uptime-checks


> Is there a change-detection service like this, being provided with stable SLAs by one of the cloud IaaS providers?

You're looking for Zyte (formerly ScrapingHub).


> You're looking for Zyte (formerly ScrapingHub).

Are you sure? It seems focused on scraping/crawling/data extraction, and I'm not seeing any built-in capabilities to (for example) simply trigger a callback when a resource changes.


changedetection.io supports filters and triggers, all of which can result in a notification via a JSON call or any one of the hundreds of other supported services (email, Discord, etc.).
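As a generic illustration of a "JSON call" notification, the sketch below builds an HTTP POST with a JSON body using only the Python standard library. The payload fields (`url`, `summary`) and the function name are assumptions for the example, not changedetection.io's actual notification format:

```python
import json
import urllib.request


def build_notification(webhook_url: str, watched_url: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request describing a change."""
    # Hypothetical payload shape; a real receiver defines its own schema.
    payload = {"url": watched_url, "summary": summary}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(request, timeout=30)`; building and sending are kept separate so the request can be inspected without touching the network.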


I used this to replace Visualping.io. The only features it lacks compared to Visualping are a GUI for selecting page elements to monitor and fully rendered diff results. It either monitors the whole page or you can add a CSS selector manually.


It accepts CSS filters: click on EDIT, then "Filters & Triggers", and there you can fill in a CSS filter. More stuff coming soon :)


Nice and easy to self-host with Docker, thanks. Might save me more than a few bookmarks that I check every couple weeks manually.

Small feature wish: optionally fetch and display favicons next to the URLs or <title>s in the website list.


Great idea!


I'm using urlwatch for this:

https://thp.io/2008/urlwatch/


Looks nice! Good luck.

We're in a similar space, coming out with a SaaS in a few weeks.


Nice little project. I played around with it for a bit, but couldn't figure out how to extract snippets (e.g. a list of headlines, like from the hackernews home page). It seems to be good at grabbing a dumb dump of text within a given CSS selector, but no regex pattern matching within a line (the regex fields appear to toss out all lines that match the pattern).
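For reference, the kind of per-line extraction described here (keeping what the pattern matches instead of dropping the line) could be sketched as follows. This is a standalone illustration with a hypothetical function name, not how changedetection.io's regex fields behave:

```python
import re
from typing import List


def extract_matches(text: str, pattern: str) -> List[str]:
    """Return the matched snippets from each line of text.

    If the pattern has a capture group, the first group is returned;
    otherwise the whole match is returned.
    """
    regex = re.compile(pattern)
    results = []
    for line in text.splitlines():
        for match in regex.finditer(line):
            results.append(match.group(1) if regex.groups else match.group(0))
    return results
```

For a text dump where each headline line looks like `1. Some title (example.com)`, a pattern such as `^\d+\.\s+(.+?)\s+\(` would keep only the titles and ignore lines that do not match.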


How did you create the screenshot difference? https://raw.githubusercontent.com/dgtlmoon/changedetection.i...


Just wait for a change to be detected and then click the "DIFF" button in the main list, easy. The screenshot is just from my browser, showing my local installation of the software.


This would fit well as a browser extension - generate your own feeds with or without RSS. Coupled with a simple topic classifier with user defined topics it would bring some order to the chaos.


I tried this project recently, but it didn’t seem to handle pages that load content dynamically via XHR. At least the couple pages I tried at the time.



Indeed, I had it set up using a separate Selenium Chrome container. Still had the same issue.


Nice! Does it have an API for adding URLs to monitor? Would be nice to have a browser extension to do that easily.


Yeah, coming in the next week or so actually, or feel free to open a PR! Also adding an API to retrieve snapshots (text/JSON) too.



