Web Scraping Using ChatGPT – Complete Guide with Examples (proxiesapi.com)
20 points by anticlickwise 7 months ago | 7 comments



Side note: Add this to your robots.txt if you don't want ChatGPT to scrape your website:

  User-Agent: ChatGPT-User
  User-Agent: GPTBot
  Disallow: /
Sources:

- https://platform.openai.com/docs/gptbot

- https://platform.openai.com/docs/plugins/bot
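
For anyone who wants to sanity-check those rules before deploying them, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `example.com` URL and the `SomeOtherBot` agent name are placeholders I made up for illustration. It only shows how a compliant parser reads the file; it says nothing about whether a given crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# The rules from the comment above: two user agents sharing one Disallow group.
ROBOTS_TXT = """\
User-Agent: ChatGPT-User
User-Agent: GPTBot
Disallow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# example.com and SomeOtherBot are placeholders, not real targets.
for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
    print(agent, is_allowed(agent, "https://example.com/page"))
```

Both listed agents should come back as blocked, while an agent that is not named in the file stays allowed because there is no `*` group.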


Side note to your side note: the official beta ChatGPT browser plugin’s user agent is Python/3.9 aiohttp/3.8.4 (I just tested it on a domain that I control)

And like a sibling comment says, any plugins/agents that are trying to scrape will have various fingerprints.
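
Since that user agent never identifies itself as ChatGPT, a robots.txt rule keyed on the name would not match it. A rough server-side sketch, assuming the string keeps the shape reported above (the function name and regex are mine, and any generic aiohttp script would match too, so treat it as a heuristic rather than a ChatGPT detector):

```python
import re

# Matches strings shaped like "Python/3.9 aiohttp/3.8.4".
# Assumption: the version numbers vary but the overall format stays the same.
AIOHTTP_UA = re.compile(r"^Python/\d+\.\d+ aiohttp/\d+(\.\d+)*$")

def looks_like_aiohttp_client(user_agent: str) -> bool:
    """Return True if the User-Agent header has the aiohttp-style shape."""
    return bool(AIOHTTP_UA.match(user_agent.strip()))

print(looks_like_aiohttp_client("Python/3.9 aiohttp/3.8.4"))
print(looks_like_aiohttp_client("Mozilla/5.0 (X11; Linux x86_64)"))
```

A user agent is trivial to change, so this is only one fingerprint among the several the sibling comment mentions.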


Well, this has little to do with the article here. ChatGPT is not accessing the website itself; I even think that this feature is disabled at the moment.


They opened the browsing feature for Plus and Enterprise users 2 days ago.


If you're feeding it the specific nodes to scrape, iterating, and bug fixing, what is ChatGPT doing other than giving you someone to talk with while you code?

Call me when I can ask an LLM to pull structured data in CSV form from website X and deliver it to me each morning. And it does it.
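
For scale, the non-LLM half of that request is small when the page already has a plain HTML table. A standard-library sketch, assuming the data sits in simple `<tr>`/`<td>` rows (the sample HTML and all names here are made up; fetching the page and scheduling the job, for example with cron, are left out):

```python
import csv
import io
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def html_table_to_csv(html: str) -> str:
    """Convert every table row found in html into CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

# Hypothetical sample input, standing in for "website X".
SAMPLE = (
    "<table><tr><th>name</th><th>price</th></tr>"
    "<tr><td>widget</td><td>9.99</td></tr></table>"
)
print(html_table_to_csv(SAMPLE))
```

The hard part the comment is pointing at is everything this skips: finding the right table on an arbitrary site and coping when the markup changes.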


https://agenthub.dev - not my project (but I have a similar project in my bio if anyone is interested)


This doesn't seem too difficult of an ask. Let me see if I can scrape something together.



