We are designing an analytics logging API for our JS application. What bothers us is spam prevention. I can't come up with a better way than rate limiting to prevent someone from using a fake client to spam the API. Is there something I missed here? What are the best practices for this? How do generic services, say Google Analytics, solve the problem?
I often have the same concern. What I ended up doing was issuing a token to each user and verifying the token on the logging endpoint. That way, if someone decides to fill the logs with spam, we can easily delete all the events tied to that token. From what I've seen, major analytics services seem to do nothing at all to prevent a client from pretending to be any user, so this kind of abuse is probably rare enough that it shouldn't be a big concern. Some basic rate limiting is always a good idea, though.
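For the rate limiting part, here is a rough sketch of a per-client limiter in Python, not what any particular service actually runs. It uses a simple bucket of credits that refills over time, keyed by whatever identifies the client (the user token, or an IP address for anonymous users). The class name `BucketLimiter` and its parameters are made up for illustration, and the state lives in process memory, so a multi-server setup would need a shared store instead.

```python
import time
from typing import Callable, Dict, Tuple


class BucketLimiter:
    """Hypothetical in-memory limiter: each key gets a bucket of credits."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate          # credits refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the limiter can be tested
        # key -> (credits left, time of last request)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        """Return True if the request for this key may proceed."""
        now = self.clock()
        credits, last = self._buckets.get(key, (self.capacity, now))
        # Refill according to elapsed time, capped at the bucket capacity.
        credits = min(self.capacity, credits + (now - last) * self.rate)
        if credits >= 1.0:
            self._buckets[key] = (credits - 1.0, now)
            return True
        self._buckets[key] = (credits, now)
        return False
```

The logging endpoint would call something like `limiter.allow(user_token)` and drop or reject the event when it returns `False`. The numbers for `rate` and `capacity` depend on how chatty a legitimate client is, so they are left to the caller.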
But what prevents a spammer who is determined to reverse engineer your analytics code from faking the tokens? If it's only about filtering, wouldn't it be easier and more effective to filter by IP address?
>so this kind of abuse is probably rare enough that it shouldn't be a big concern
Yes, I agree that it's pretty rare, since webmasters are more cautious around fishy-looking sites. However, I have noticed 2-3 instances where someone spammed a referral into my Google Analytics data.
The tokens are cryptographically signed with a secret shared between the main server and the analytics server, so an attacker can't forge another user's token. Filtering by IP address would also be a good solution, especially if you need to track anonymous users.
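To make the signing idea concrete, here is a minimal sketch in Python, assuming an HMAC-SHA256 signature over the user id. The token layout (`<user_id>.<signature>`), the function names, and the `SHARED_SECRET` value are all assumptions for illustration, not a description of a specific production setup.

```python
import base64
import hashlib
import hmac
from typing import Optional

# Hypothetical secret known only to the main server and the analytics server.
SHARED_SECRET = b"replace-with-a-long-random-secret"


def _sign(user_id: str) -> str:
    """HMAC-SHA256 of the user id, encoded as URL-safe base64 text."""
    digest = hmac.new(SHARED_SECRET, user_id.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def issue_token(user_id: str) -> str:
    """Main server: build '<user_id>.<signature>' for the client to attach to events."""
    return f"{user_id}.{_sign(user_id)}"


def verify_token(token: str) -> Optional[str]:
    """Analytics server: return the user id if the signature matches, else None."""
    user_id, sep, signature = token.rpartition(".")
    if not sep or not user_id:
        return None
    expected = _sign(user_id)
    # Compare as bytes in constant time; bytes also tolerate non-ASCII input.
    if hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return user_id
    return None
```

Without the secret, a client can't produce a valid signature for a different user id, so spam can only arrive under the spammer's own token, and that token's events are easy to find and delete. This sketch has no expiry or revocation, so a real deployment would probably want a timestamp inside the signed part as well.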