Log parsing seems like the logical choice for the static-site crowd, but there appears to be little interest in it. I must be missing something.
Any site is constantly being accessed by bots, and only some of them announce themselves in the user agent. Some are deliberately designed to mimic human browsing, and you can only tell by carefully following their access patterns.
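To make the first point concrete, here's roughly what I mean by log parsing: a sketch that reads a combined-format access log and separates hits whose user agent self-identifies as a bot. The log path and keyword list are placeholders, and as noted above this won't catch bots that disguise themselves.

```typescript
// Sketch: parse a combined-format access log and split bot vs. human hits.
// The log path and bot keyword list are illustrative, not a real taxonomy.
import * as fs from "fs";
import * as readline from "readline";

const BOT_KEYWORDS = ["bot", "crawler", "spider", "curl", "python-requests"];

function looksLikeBot(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return BOT_KEYWORDS.some((kw) => ua.includes(kw));
}

async function countHits(logPath: string): Promise<void> {
  const lines = readline.createInterface({
    input: fs.createReadStream(logPath),
    crlfDelay: Infinity,
  });

  let human = 0;
  let bot = 0;
  for await (const entry of lines) {
    // In the combined log format the user agent is the last quoted field.
    const quoted = entry.match(/"([^"]*)"/g);
    const userAgent = quoted ? quoted[quoted.length - 1] : "";
    if (looksLikeBot(userAgent)) bot++;
    else human++;
  }
  console.log(`human-ish: ${human}, self-announced bots: ${bot}`);
}

countHits("/var/log/nginx/access.log").catch(console.error);
```

Catching the disguised ones is the hard part, since it means looking at request timing and crawl order per IP rather than a single header.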
We host websites, and the bots are super annoying: even the well-behaved ones throttle requests per domain, which means they just hit all of our customers at once. If our cache architecture were a little more rotten, like I've seen at other jobs, bot-driven evictions would get ugly; as it is they just spike our traffic, increase our overhead, and make it harder to get clear metrics.
Others have pointed out:
- Client side SPAs sometimes don't hit server logs
- Some static sites are hosted in places where you don't have access to the logs (GitHub Pages, Netlify, etc.)
- Bots are sometimes defeated by a simple JS file (see the sketch below)
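On that last point, the "simple JS file" trick is basically a beacon that only fires when a real browser executes the script, so most crawlers never show up in the count. The `/hit` endpoint and the payload shape here are made up for illustration.

```typescript
// Sketch of the "simple JS file" idea: a tiny client-side beacon.
// Only visitors that actually run this script get counted.
function recordPageView(): void {
  const payload = JSON.stringify({
    path: window.location.pathname,
    referrer: document.referrer,
    // a coarse signal that a full browser environment is present
    screen: `${window.screen.width}x${window.screen.height}`,
  });
  // sendBeacon survives page unloads; fall back to fetch if unavailable
  if (navigator.sendBeacon) {
    navigator.sendBeacon("/hit", payload);
  } else {
    fetch("/hit", { method: "POST", body: payload, keepalive: true });
  }
}

recordPageView();
```

Of course that just moves the problem: headless browsers do run the script, and you lose the visitors who block third-party or first-party JS, which is exactly why log parsing still appeals to me.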