
I was a professional web scraper. I still keep up to date with the industry.

These days, you do not make money by doing web scraping; you make money by selling services to web scrapers. There are tons of web scraping SaaS products and services out there, as well as dozens of residential proxy providers.

Most anti-bot mechanisms evolve so quickly that you can make a decent income in a traditional software engineering role dedicated entirely to building anti-anti-bot solutions. Because of that constant churn, working for a web scraping company is more stable than pursuing web scraping as a freelance profession.

Web scrapers get paid per project, which makes it an unstable job in the long run. High-level web scraping requires operational investment in residential proxies and rented servers, and low-end jobs pay very little. Brightdata hosts a conference on web scraping, which should tell you how profitable it is to sell services for large-scale web scraping.




I've long thought that residential proxies are a necessity for things like scraping and operating large-scale bot networks, but I've never really dabbled in using them, so I've never confirmed my suspicions about how they are sourced at this scale. Do you know if insecure IoT devices and malware-infected consumer hardware are as common for this as one might think? I can't imagine it would be either profitable or even possible to work with an ISP to acquire residential IPs, which leaves me thinking that the only option for a residential proxy service would be pretty clandestine.


If you just search for "residential proxy", you'll find that a lot of them are basically Raspberry Pis or similar devices shipped to people who are then paid for the amount of traffic that goes through them. Others are agents running on users' computers, and I suspect at least some of these proxy providers aren't overly thorough in their due diligence on how that agent got installed.
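From the customer's side, whichever way the exit nodes are sourced, a residential proxy typically shows up as an ordinary HTTP(S) proxy endpoint with credentials. The sketch below is a minimal illustration, not any provider's actual API: the gateway URLs and credentials in `PROXY_POOL` are hypothetical placeholders, and it only shows how a client might rotate requests across a small pool using the standard library's `urllib.request.ProxyHandler`.

```python
import itertools
import urllib.request

# Hypothetical gateway endpoints; a real provider issues its own
# hosts, ports and credentials.
PROXY_POOL = [
    "http://user:pass@proxy-a.example:8000",
    "http://user:pass@proxy-b.example:8000",
    "http://user:pass@proxy-c.example:8000",
]


def make_rotator(pool):
    """Return a function that hands out (proxy, opener) pairs, cycling through pool."""
    cycle = itertools.cycle(pool)

    def next_opener():
        proxy = next(cycle)
        # ProxyHandler sends plain HTTP requests through the proxy and
        # tunnels HTTPS requests through it with CONNECT. Credentials
        # embedded in the proxy URL become a Proxy-Authorization header.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)

    return next_opener


if __name__ == "__main__":
    next_opener = make_rotator(PROXY_POOL)
    for _ in range(4):
        proxy, opener = next_opener()
        print("would route via", proxy)
        # A real request would be: opener.open("https://example.com/", timeout=10)
```

Round-robin rotation is only the simplest policy. The per-GB billing described above is why operators usually add retry and ban tracking per exit IP instead of cycling blindly.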


Is there a conference you would suggest that is the closest to scraping, generally speaking? As far as I know there isn't a scraping conference or strong community anywhere, and I'd like to learn and improve my skills.


The scientific aspects of Web crawling, including focused crawling (algorithms and their implementations, performance evaluation), are covered by conferences like WWW, ACM SIGIR, BCS ECIR, ACM WSDM and ACM CIKM.

But if you mean informal meetups or trade fairs, google "Web Data Extraction Summit", "OxyCon Web Scraping Conference", "ScrapeCon 2024" (all past) or the forthcoming: https://www.ipxo.com/events/web-data-extraction-summit-2024/


The edge that every web scraper has is the knowledge they possess. In my opinion, conference presentations are usually too generalized or geared towards pitching services related to web scraping solutions.

There are some communities on Discord and Telegram, and most professional web scrapers are pretty active on LinkedIn and Twitter. The fun communities, though, are small groups of people with shared values and interests.


I've been writing scrapers on Upwork for many years. I'm sick of doing project based work and want to work at/start a scraping SaaS. Any advice?


I would recommend checking Google to see if you can find any job openings. Remember that it is a niche industry, so there may not be many companies currently hiring. But honestly, if you are looking to make a full-time living, consider choosing another niche, as web scraping jobs require you to consistently stay on top of your game. Most full-time jobs involve scraping data from big tech companies, and you are on your own when it comes to finding ways to bypass anti-bot measures.


The irony is that before I realized it was so easy, I would just open source the code. Not on GitHub, mind you, since the likes of Akamai would DMCA it pretty quickly; instead, playing a little jurisdictional arbitrage, I put it on Gitee, the Chinese copycat of GitHub. I don't have a background in any of this, but companies like to brag, and it's not hard to put two and two together. It was also a practical way to place wagers on sports automatically, which was more or less my actual day job, and it was a pretty good way to learn programming quickly in your late 20s.

Instead, almost immediately I got inundated by sneaker botters writing in Chinese, and in English from somewhere that doesn't use it as a native language, judging from the idiosyncratic usage. I kept the code up for a bit but took it down, and not because of any legal threats (good luck DMCA-ing a platform endorsed by the CCP; even though I have no love for the party, from my experience as a defense attorney I also find the American attitude that in practice places intellectual property over real property to be just as screwed up in its priorities, just a matter of degree). What made me take it down was that I did not want to work a customer service job, or really work for anyone, and judging by the requests, they mostly consisted of "you do the work but we'll split the profits", which I can't believe anyone would fall for.

But since the internet is forever, some parts of the code that specifically worked to emulate Cyberfed-Akamai from 0.8 to 2.3 are probably still floating around. My bad. I don't normally wear shoes (flip flops or nothing, after having to wear a suit to work for a decade) and have no idea about sneakers beyond what happens in NBA2K. That said, cybersecurity firms whose products were beaten by code that someone who learned to program in their mid 20s put online within three years should be pretty ashamed of how much they charge, considering that I haven't taken a math course since 11th grade and had too much of an ADHD problem to watch videos or read more than blog posts or documentation. Everything I learned, I learned by copying from GitHub and similar services until it worked. There must be a lot of snake oil being sold out there, maybe most of it, and the insidious part is that selling bunk solutions seldom gets you in trouble anyway, while enforcement against actual crime (rape, murder, robbery and the like) is largely lagging because the police simply prefer to complain about culture war bs instead of actually, you know, doing their jobs. Who knew Judith Butler was THIS spot on.


Thank you very much for sharing your story. From what I know, sneaker bots as an industry have pretty much gone downhill these days. Not because of anti-bot measures, but because the entire market has essentially shifted from retail stores to eBay resellers. Everyone is competing to buy the first batch, to the point that it is no longer worth building a sneaker bot.


How do you keep up with the industry?


It is kind of like Fight Club. There are 2-3 good communities that I lurk in. The people there won't walk you through your scraping problems, but if you ask the right person politely, they often help.

Many residential proxy and scraping experts are pretty active on LinkedIn. But they do not talk about scraping data, just news around web scraping.



