One way is for clients to first ask the server for the latest checksum/ETag and compare it to what they have locally before requesting the resource.
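To make that flow concrete, here is a minimal sketch of the check-then-fetch idea. Everything in it is hypothetical: `FakeServer`, `head_etag`, and `fetch_with_precheck` are illustrative names, and the in-memory "server" stands in for real HTTP requests.

```python
from dataclasses import dataclass, field


@dataclass
class FakeServer:
    """Hypothetical stand-in for an origin server: path -> (etag, body)."""
    resources: dict
    requests_seen: list = field(default_factory=list)

    def head_etag(self, path):
        # Cheap request: returns only the current ETag, no body.
        self.requests_seen.append(("HEAD", path))
        return self.resources[path][0]

    def get(self, path):
        # Full request: returns the ETag together with the body.
        self.requests_seen.append(("GET", path))
        return self.resources[path]


def fetch_with_precheck(server, cache, path):
    """Return the body for path, downloading it only if the ETag changed."""
    latest = server.head_etag(path)
    cached = cache.get(path)
    if cached is not None and cached[0] == latest:
        return cached[1]  # local copy is current; no download needed
    etag, body = server.get(path)
    cache[path] = (etag, body)
    return body
```

In this sketch the client never sends its own ETag; it only asks for the server's and compares locally.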
But that adds an extra round-trip. I wonder if a different design of HTTP could have avoided all this.
Either the client requests a linked resource or it does not. If it does not, the server can assume the client has it cached. Since the server can tell these two outcomes apart, some information leaks no matter what.
But we can make tracking much harder by not having the client send an ETag. E.g. the server could send the ETags of all of the page's resources in the initial response and let the client decide which ones to fetch. This assumes that the server knows about all the related resources, the page structure, etc. That is not unreasonable, but it does special-case HTML and complicates the server quite significantly.
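The client side of that idea could look roughly like this. It is only a sketch under assumptions: the "manifest" (a mapping from resource path to current ETag, delivered with the initial response) and the function name `resources_to_fetch` are made up for illustration and are not part of HTTP.

```python
def resources_to_fetch(manifest, cache):
    """Return the paths whose cached ETag is missing or differs.

    manifest: path -> current ETag, as sent by the server (hypothetical format)
    cache:    path -> ETag of the locally cached copy
    """
    return [path for path, etag in manifest.items() if cache.get(path) != etag]


# Example: one resource is current, one is stale, one was never cached.
manifest = {"/style.css": "a1", "/app.js": "b2", "/logo.png": "c1"}
cache = {"/style.css": "a1", "/app.js": "b1"}
stale = resources_to_fetch(manifest, cache)
```

The comparison happens entirely on the client, so no ETag ever travels back to the server.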