In Chrome/Chromium there is a WebRTC Network Limiter [1] extension that lets you set the "Use only my default public IP address" policy, which renders the method I presented ineffective.
If you use Chrome-exclusive links, please at least also link to the closest standard section [0] and, preferably, mention the Chrome-linked text directly.
That said, they don't say anything about security. I obviously forgot about fingerprinting, but I still don't see any security issues.
Sure, it can be! Also, as some people have already pointed out, this is often a gray area where people go beyond violating the ToS. Some good examples are privacy violations (scraping personal data), credential stuffing, etc.
Recently there has been a boom in "anti-bot" services. These are essentially SaaS businesses that "protect" websites from being scraped by automated software. As soon as you onboard the first customer who wants to extract data from a bot-protected website, you are going to run into an endless stream of stupid troubles. Your bots will be blocked, consume excessive amounts of data, and kill your CPU/GPU performance.
I recently shared some highlights on how to bypass these on HN [1], but it is sadly only the tip of the iceberg. On the other hand, since the post was featured on HN, I have been contacted by more than 50 companies and individuals whose business model is based solely on data extraction/automated scraping. These are (in my opinion) successful companies, and two of them are part of YC.
Some parts of Chromium seem to be intentionally exposing fingerprinting surfaces and, because it's changing quickly with new features and add-ons, keeping up with patches like Bromite does is an incredibly challenging task.
I thought about it too, but when you consider the cost of running headless Puppeteer (let's say on AWS) and the cost of a good proxy that is charged per GB, it's often as expensive as (if not more expensive than) some of these SaaS products. This is especially the case for websites with heavyweight JS/CSS/img assets.
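To make that comparison concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name `monthly_scraping_cost`, its parameters, and the numbers in the usage example are hypothetical placeholders, not real AWS or proxy prices. Plug in your own figures and compare the result against the SaaS quote.

```python
def monthly_scraping_cost(
    pages_per_month: int,
    mb_per_page: float,
    proxy_usd_per_gb: float,
    compute_usd_per_hour: float,
    seconds_per_page: float,
) -> float:
    """Rough monthly cost (USD) of self-hosted headless scraping.

    All inputs are assumptions supplied by the caller; no real
    price list is encoded here.
    """
    # Proxy traffic is billed per GB, so heavy JS/CSS/img assets dominate.
    traffic_gb = pages_per_month * mb_per_page / 1024
    proxy_cost = traffic_gb * proxy_usd_per_gb

    # Headless browser time, billed per instance hour.
    compute_hours = pages_per_month * seconds_per_page / 3600
    compute_cost = compute_hours * compute_usd_per_hour

    return proxy_cost + compute_cost


if __name__ == "__main__":
    # Placeholder values only; replace with your own measurements and quotes.
    estimate = monthly_scraping_cost(
        pages_per_month=500_000,
        mb_per_page=3.0,
        proxy_usd_per_gb=10.0,
        compute_usd_per_hour=0.10,
        seconds_per_page=5.0,
    )
    print(f"Estimated self-hosted cost: ${estimate:,.2f} per month")
```

The point of splitting the two terms is that the proxy term scales with page weight while the compute term scales with render time, so blocking heavy assets changes one much more than the other.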
That's true when it's a one-time job: pull the data and disappear. I also see how this is the case for most freelancers on Fiverr or Freelancer. This is the tool they know, so they use it. However, I imagine there are a number of companies that rely heavily on continuous data scraping, say for price comparison, and I've seen one that uses Puppeteer heavily.
@jjgreen I am genuinely interested in what the existing solutions are and how people deal with the problem. This is why it's an "Ask HN". If there are none and someone would be interested in using our tool, why create two topics?
@qiyuxuan96 Hi there. I am developing a cross-browser extension development/deployment SaaS. Would you be interested in hearing more?
We make life easier with things like:
- versioning,
- packaging extensions for different extension galleries,
- collecting payments,
- gathering analytics (extension views, installations etc.).