
1. Find many examples of these nofollow links

2. Create a webpage with these links, not including the nofollow

3. ...

4. Profit!



Cynical-me suspects step three is something to do with:

"while allowing legitimate users and verified crawlers to browse normally."

and probably involves renting access to your website to AI grifters who pay to become "verified crawlers".


The best part about "verified crawlers" is that there's no easy way to discover how to become one. Or if you need to become one.


Everybody knows how to become one. It's just like every "enterprise SaaS" out there. There's no 3-tier pricing plan with lists of features. You need to contact enterprise sales so they can work out how much you can afford to pay, then take all your money.

And you _know_ if you need to become a "verified crawler": you just need to remember the developers you demoted or fired when they brought up the ethical problems with the way you've configured your crawlers.


How does that second paragraph work? I run engineering at Common Crawl, and Common Crawl is ethical and has never fired a developer over ethics.

During the End of Term 2024 crawl[1], we discovered a lot of blocking on US government websites. Many of these sites were also blocking the Internet Archive and the US National Archives. The US National Archives is a government agency.

1: https://eotarchive.org/



