Rate limit per website (e.g. don't download more than 10 images per domain per second).
Limit the total number of images it downloads per document, so a single user cannot cause too much traffic.
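
To make the two limits above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class name `ImageFetchPolicy`, the default of 50 images per document, and the sliding one-second window are not taken from any existing service.

```python
import time
from collections import defaultdict, deque
from urllib.parse import urlsplit


class ImageFetchPolicy:
    """Hypothetical policy: sliding-window limit per domain plus a cap per document."""

    def __init__(self, per_domain_per_second=10, per_document_total=50,
                 clock=time.monotonic):
        self.per_domain_per_second = per_domain_per_second
        self.per_document_total = per_document_total
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._recent = defaultdict(deque)      # domain -> timestamps of recent fetches
        self._per_document = defaultdict(int)  # document id -> images fetched so far

    def allow(self, url, document_id):
        """Return True and record the fetch if both limits permit it."""
        now = self.clock()
        domain = urlsplit(url).hostname or ""
        window = self._recent[domain]
        # Forget fetches that happened one second ago or earlier.
        while window and now - window[0] >= 1.0:
            window.popleft()
        if len(window) >= self.per_domain_per_second:
            return False
        if self._per_document[document_id] >= self.per_document_total:
            return False
        window.append(now)
        self._per_document[document_id] += 1
        return True
```

A caller would check `policy.allow(url, document_id)` before each download and skip or defer the image when it returns `False`. A real deployment would probably also want to expire the per-document counters, which this sketch does not do.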
(I know that servers can be configured not to send ETags, or to break caching by sending a random one every time, but this could reduce the data usage considerably, since most of the responses would then contain only the headers.)
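
As a rough sketch of what such a conditional request could look like in Python with only the standard library: the function name `fetch_if_changed` and the `opener` parameter are made up for this example, and it assumes the caller has stored the ETag from an earlier response. `urllib.request.urlopen` reports a 304 as an `HTTPError`, so that case is caught explicitly.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cached_etag=None, opener=urllib.request.urlopen):
    """Return (body, etag); body is None when the server answers 304 Not Modified."""
    request = urllib.request.Request(url)
    if cached_etag:
        # Ask the server to send the body only if the ETag no longer matches.
        request.add_header("If-None-Match", cached_etag)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed: only headers were transferred.
            return None, cached_etag
        raise
```

If the server sends no ETag, the second element of the result is `None` and every later call simply downloads the full body again, which is the fallback behaviour described above.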
If we ignore that ETags are tied to URLs rather than 'files', the ETag approach suggested by userbinator might work in some cases. But if the large file is dynamically generated, it's unlikely to have an ETag at all. Also, the defaults in many servers build the ETag from the inode of the file rather than from properties of the file itself, so if there are multiple servers behind a load balancer, they're likely to return different ETags for the same file.
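
To illustrate the load balancer problem, here is a small Python sketch comparing two ways of deriving an ETag. Both functions and their output formats are assumptions made up for this example (the first is only loosely modelled on an inode/size/mtime style of tag); they are not the exact scheme of any particular server.

```python
import hashlib
import os


def inode_style_etag(path):
    """Illustrative ETag built from inode, size and mtime (format is an assumption)."""
    st = os.stat(path)
    return '"%x-%x-%x"' % (st.st_ino, st.st_size, int(st.st_mtime))


def content_etag(path, chunk_size=1 << 20):
    """Illustrative ETag built only from the file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return '"%s"' % digest.hexdigest()
```

Two copies of the same file on two machines would normally have different inode numbers, so `inode_style_etag` would differ between backends even though `content_etag` would be identical. The content-based version costs a full read of the file, though, which matters for large files.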