Ask HN: Best practices for ethical web scraping?
270 points by aspyct on April 4, 2020 | 107 comments
As part of my learning in data science, I need/want to gather data. One relatively easy way to do that is web scraping.
However, I'd like to do it in a respectful way. Here are three things I can think of:
1. Identify my bot with a user agent/info URL, and provide a way to contact me
2. Don't DoS websites with tons of requests.
3. Respect robots.txt.
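The three rules above can be combined in one small fetcher. This is a minimal Python sketch that uses only the standard library (`urllib.robotparser`, `urllib.request`). It is one possible approach, not a reference implementation. The bot name, info URL, e-mail address, the `PoliteFetcher` class and its 5-second default delay are all hypothetical placeholders. The fetcher sends an identifying User-Agent, checks robots.txt before each request, and keeps a per-host gap between requests. That gap is the larger of its own minimum and any `Crawl-delay` the site declares. Python's parser only accepts integer `Crawl-delay` values.

```python
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical bot identity: replace the name, info URL and e-mail with your own.
USER_AGENT = ("ExampleResearchBot/0.1 "
              "(+https://example.com/bot-info; mailto:you@example.com)")


class PoliteFetcher:
    """Sketch: identify the bot, rate-limit per host, obey robots.txt."""

    def __init__(self, user_agent=USER_AGENT, min_delay=5.0):
        self.user_agent = user_agent
        self.min_delay = min_delay  # assumed default gap (seconds) per host
        self._robots = {}           # host -> parsed robots.txt
        self._last_hit = {}         # host -> time.monotonic() of last request

    def load_robots(self, url, robots_txt=None):
        """Load robots.txt for url's host; pass its text to skip the network."""
        parts = urlsplit(url)
        parser = urllib.robotparser.RobotFileParser()
        if robots_txt is None:
            parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
            parser.read()
        else:
            parser.parse(robots_txt.splitlines())
        self._robots[parts.netloc] = parser
        return parser

    def _parser(self, url):
        # Fetch robots.txt once per host, then reuse the cached parser.
        host = urlsplit(url).netloc
        if host not in self._robots:
            self.load_robots(url)
        return self._robots[host]

    def allowed(self, url):
        return self._parser(url).can_fetch(self.user_agent, url)

    def delay(self, url):
        """Use the site's Crawl-delay when it asks for more than our minimum."""
        crawl_delay = self._parser(url).crawl_delay(self.user_agent)
        return max(self.min_delay, crawl_delay or 0)

    def seconds_to_wait(self, url):
        last = self._last_hit.get(urlsplit(url).netloc)
        if last is None:
            return 0.0  # first request to this host
        return max(0.0, self.delay(url) - (time.monotonic() - last))

    def fetch(self, url, timeout=30):
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        time.sleep(self.seconds_to_wait(url))
        self._last_hit[urlsplit(url).netloc] = time.monotonic()
        request = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
```

`load_robots()` also accepts robots.txt text directly, so you can test the rules offline. In normal use, `fetch()` downloads robots.txt automatically the first time it sees a host. A real crawler would likely also want to cache pages so it never requests the same URL twice.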
What else would be considered good practice when it comes to web scraping?