> Can you give some rough indications of how many pages you index in total?
I index like 300 million documents right now, though I crawl something like 1.4 billion (and could index them all). The search engine is pretty judicious about filtering out low quality results, mostly because this improves the search results.
> How many pages do you crawl each day?
I don't know if I have a good answer for that. In general the crawling isn't really much of a bottleneck. I try to refresh the index completely every ~8 weeks, and also have some capabilities for discovering recent changes via RSS feeds.
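As a rough illustration of what RSS-based change discovery could look like (this is my sketch, not the engine's actual code — the function name and feed handling are assumptions), here's a minimal Python snippet that pulls links from a feed whose `pubDate` is newer than the last crawl:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

def recent_links(rss_xml: str, since: datetime) -> list[str]:
    """Return links of RSS <item>s published after `since`.

    Hypothetical helper: a crawler could feed these URLs into its
    fetch queue between full index refreshes.
    """
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        link = item.findtext("link")
        # RSS dates use RFC 2822 format, e.g. "Mon, 01 Jan 2024 00:00:00 GMT"
        if pub and link and parsedate_to_datetime(pub) > since:
            links.append(link)
    return links
```

In practice you'd fetch each feed URL on a schedule and only re-crawl the pages that show up as new, which is far cheaper than a full refresh.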
> Size of the machine(s) in RAM and HDD?
It's a dual-socket EPYC 7543 SMP machine with 512 GB RAM and something like 90 TB of disk space, all NVMe storage.