Some companies publish the IP ranges of their crawlers (OpenAI [1] and Mistral [2], for example), but many, like Anthropic, don’t.
I'm not sure those lists can be fully trusted, though. Perplexity, for instance, was caught crawling from IPs outside its declared list [3].
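
For what it's worth, checking a request IP against a published list is easy enough. Here is a rough sketch in Python using only the standard `ipaddress` module. The CIDR blocks are placeholder documentation ranges, not any vendor's real list, and the names `DECLARED_RANGES` and `in_declared_ranges` are made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT a real
# crawler list. In practice you would load the ranges a vendor publishes.
DECLARED_RANGES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def parse_ranges(cidrs):
    """Turn CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def in_declared_ranges(ip, networks):
    """Return True if `ip` falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Compare only against networks of the same IP version (v4 vs v6).
    return any(
        addr.version == net.version and addr in net
        for net in networks
    )


networks = parse_ranges(DECLARED_RANGES)
for candidate in ("192.0.2.10", "203.0.113.5", "2001:db8::1"):
    print(candidate, in_declared_ranges(candidate, networks))
```

A match only tells you the request came from a declared range. As the Perplexity case suggests, a miss doesn't prove the traffic isn't that company's crawler.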
[1] https://platform.openai.com/docs/bots
[2] https://docs.mistral.ai/robots/
[3] https://blog.cloudflare.com/perplexity-is-using-stealth-unde...