The hash is a pictorial representation of the image, not a checksum of the raw file data (like MD5 etc.). I would expect that even a photo of a printed photo would still produce the same pictorial hash, provided the two are properly aligned. The cryptographic hash would obviously be completely different, since the file is not an exact replica of the original, but as far as the ML model is concerned (the pictorial hash is generated through machine learning, afaik) there would be a very strong match between visually similar images.
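To make the difference concrete, here is a rough sketch in Python. It is not the ML-based hash under discussion; it swaps in a toy "average hash" (one bit per pixel, set when the pixel is brighter than the image mean) purely to show the idea. The 4x4 pixel grid, the +5 brightness shift standing in for a re-photographed copy, and the function names are all made up for illustration.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, '1' if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def crypto_hash(pixels):
    """Cryptographic hash (SHA-256) of the raw pixel bytes."""
    flat = bytes(p for row in pixels for p in row)
    return hashlib.sha256(flat).hexdigest()

# Hypothetical 4x4 grayscale image, values 0-255.
original = [
    [200, 210, 40, 30],
    [190, 220, 50, 20],
    [30, 40, 210, 200],
    [20, 50, 220, 190],
]

# Slightly brighter copy, standing in for a re-photographed print.
rephotographed = [[min(255, p + 5) for p in row] for row in original]

same_visual = average_hash(original) == average_hash(rephotographed)
same_crypto = crypto_hash(original) == crypto_hash(rephotographed)

print("pictorial hashes equal:", same_visual)
print("cryptographic hashes equal:", same_crypto)
```

Every byte changed, so the SHA-256 digests differ, but the light/dark pattern is the same, so the toy pictorial hash is unchanged. A real system would presumably be far more robust than this (cropping, rotation and so on), but the principle is the same.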

I suppose it's a bit like doing a reverse image search on your favourite search engine. When you upload an image, the engine tries to find images that the ML thinks look the same, even if the bits and bytes that make up the files are different. From what I can see, the similarity detection here will be much more specific than that, so as not to generate false positives. As you theorise though, if the match is specific enough, it might be possible to modify an image just enough to evade detection.
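The trade-off between false positives and evasion can be sketched like this, assuming (and it is only an assumption) that matching works by counting how many bits differ between two hashes and comparing that count to a threshold. The bit strings and threshold values below are invented for the example.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("hashes must be the same length")
    return sum(x != y for x, y in zip(a, b))

def is_match(a, b, max_distance):
    """Treat two hashes as the same image if they differ in few enough bits."""
    return hamming_distance(a, b) <= max_distance

# Hypothetical 16-bit pictorial hashes.
known          = "1100110000110011"
lightly_edited = "1100110000110111"  # differs from known in 1 bit
heavily_edited = "1010110001110111"  # differs from known in 4 bits

# Strict matching: almost no false positives, but a tiny edit slips through.
strict_light = is_match(known, lightly_edited, max_distance=0)

# Looser matching: catches the light edit, still misses the heavy one.
loose_light = is_match(known, lightly_edited, max_distance=2)
loose_heavy = is_match(known, heavily_edited, max_distance=2)

print(strict_light, loose_light, loose_heavy)
```

The stricter the threshold, the fewer unrelated images get flagged, but the smaller the change needed to fall outside it.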

Bear in mind too that the pictorial hash is supposedly designed to be a one-way function, so that people who know the hashes can't work out what the original contents of the files are.