
Show HN: Modular (serverless) property scraper in Rust with notifications - floschnell
https://github.com/floschnell/properwatcher
======
ilovefood
Good job Flo, any particular reason you used Rust? Some years ago I wrote the
same thing [0] in Python, which would also automatically apply to the flats that fit certain criteria. I then had to scale it up Germany-wide for Bayerischer Rundfunk and Spiegel Online, which resulted in a nice little study [1]. The issue I had with Lambdas was that the IPs were blacklisted and one had to solve a captcha to get to the site. Is that still the case? Anyway, good job. I'm happy to see we kinda took the same path for the design decisions, good times. :)

[0] https://funnybretzel.com/datamining-a-flat-in-munich/

[1] https://www.hanna-und-ismail.de/

~~~
floschnell
Hey, thank you for your response and the kind words. Very interesting reads you referenced! Your article confirms on a big scale what a few friends have been reporting on an individual basis :(. I still want to read the discussion on HN that you referred to from your blog post ... I couldn't find your crawler's source code, did you publish it somewhere?

To answer your questions, I chose Rust mainly for two reasons: 1. I knew it was popular and I wanted to learn it, plus the typical benefits of the language itself. 2. Performance. I wanted to use query selectors for scraping and I knew that Rust was at the core of Firefox's Quantum engine. Kuchiki, the DOM manipulation library that I use, builds on top of html5ever and cssparser, both of which are used in Servo (Firefox).
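
Roughly, the scraping part of a crawler module boils down to something like this (just a sketch; the selector is a placeholder, not taken from the actual repo):

    use kuchiki::traits::TendrilSink;

    // Hypothetical helper: pull price strings out of a results page.
    // The CSS selector is made up; every portal needs its own.
    fn extract_prices(html: &str) -> Vec<String> {
        // html5ever parses the raw HTML into a DOM tree.
        let document = kuchiki::parse_html().one(html);

        let mut prices = Vec::new();
        // Query selectors backed by cssparser, just like in the browser.
        if let Ok(matches) = document.select(".result-entry .price") {
            for entry in matches {
                prices.push(entry.as_node().text_contents().trim().to_string());
            }
        }
        prices
    }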
Also, the crawlers can run in parallel and thus make use of multiple cores.
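
Conceptually that part is as simple as the following (plain threads just for illustration, not necessarily how properwatcher does it, and fetch_and_parse is only a stub):

    use std::thread;

    // Stub standing in for a real per-portal crawler.
    fn fetch_and_parse(url: &str) -> Vec<String> {
        vec![format!("listing scraped from {}", url)]
    }

    // Hypothetical: one thread per portal, results merged at the end.
    fn crawl_all(urls: Vec<String>) -> Vec<String> {
        let handles: Vec<_> = urls
            .into_iter()
            .map(|url| thread::spawn(move || fetch_and_parse(&url)))
            .collect();

        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap_or_default())
            .collect()
    }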
So far I haven't had any issues with the Lambda IPs. Maybe it's because I use the Frankfurt region. Or maybe it's because more and more business runs serverless (or on AWS in general) these days, and blocking those IPs without losing business integrations is no longer feasible.
I would like to add a filter module that just checks hard facts like prices, square meters and rents (rough sketch of what I have in mind at the end of this comment). However, for now it is enough, or even gives more fine-grained control, to set the filters on the property sites directly and then scrape the specific results pages. For the sites I have seen so far, the filters were always encoded in the URL.

Sending automatic replies is a very neat feature. What currently scares me off a bit is that it would need to be implemented for the different portals individually.
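
For the filter module, I'm picturing something roughly like this (purely a sketch, names and fields made up, nothing of it exists in the repo yet):

    // Hypothetical hard-facts filter; not part of properwatcher (yet).
    struct Property {
        cold_rent: f64,      // EUR per month
        square_meters: f64,
    }

    struct HardFactsFilter {
        max_cold_rent: f64,
        min_square_meters: f64,
    }

    impl HardFactsFilter {
        // Keep a property only if it satisfies the hard limits.
        fn matches(&self, p: &Property) -> bool {
            p.cold_rent <= self.max_cold_rent
                && p.square_meters >= self.min_square_meters
        }
    }

Properties that fall through the filter would simply be dropped before the notification module runs.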

