
Impact of metadata on Image Performance - inian
https://blog.dexecure.com/impact-of-metadata-on-image-performance/
======
sbierwagen

      On an average, this kind of metadata occupies 16% of size 
      of the JPEG file.
    

Ho ho. You think that's bad? Back in 2011, Tumblr didn't strip metadata from
_avatar_ images. That results in some funny files, like this one:
[http://28.media.tumblr.com/avatar_c5ee131b70d0_40.png](http://28.media.tumblr.com/avatar_c5ee131b70d0_40.png)

That PNG has a 3325 byte IDAT chunk and a 106022 byte iCCP chunk. The metadata
is roughly 32 times the size of the image data itself.
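
For anyone who wants to check a file like this themselves, here is a minimal sketch that sums chunk sizes in a PNG. The function name `chunk_sizes` is made up for illustration, and it assumes a well-formed file (it does not verify CRCs).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk_sizes(png_bytes):
    """Return {chunk_type: total data bytes} for a PNG byte string.

    Each PNG chunk is laid out as: 4-byte big-endian length, 4-byte type,
    `length` bytes of data, 4-byte CRC. CRCs are skipped, not checked.
    """
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    sizes = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        name = ctype.decode("ascii")
        # A file may contain several IDAT chunks, so accumulate per type.
        sizes[name] = sizes.get(name, 0) + length
        pos += 8 + length + 4  # header + data + CRC
        if name == "IEND":
            break
    return sizes
```

Calling `chunk_sizes(open("avatar.png", "rb").read())` (with a hypothetical local file) and comparing the `iCCP` entry against `IDAT` gives the kind of ratio quoted above.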

Personally, I think websites _should_ strip metadata from thumbnails and
resized images, but should _also_ let you download the original, unmodified
image, complete with original filename. Why?

Instagram and others always recompress and strip metadata when you submit an
image. This results in shitpics, images so mangled by recompression that they
look like visual gravel:
[https://theawl.com/the-triumphant-rise-of-the-shitpic-e25d8e...](https://theawl.com/the-triumphant-rise-of-the-shitpic-e25d8e5af9bc#.bkxh5tln3)
This is a complete own goal; there's no technical reason this has to happen.
Digital files aren't supposed to decay!

And, of course, stripping authorship tags would make the dream of automated
attribution impossible:
[https://eev.ee/blog/2016/08/15/attribution-on-the-web/](https://eev.ee/blog/2016/08/15/attribution-on-the-web/)

~~~
inian
I just looked at JPEG files for this. I should look at PNG files too.
Hopefully things are much better than that image you posted, haha.
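
A rough sketch of how the JPEG side could be measured, assuming "metadata" means the APP1..APP15 and COM segments that precede the image data (the article's exact definition may differ, and `jpeg_metadata_bytes` is a hypothetical name). Note that this also counts ICC profiles, which live in APP2.

```python
import struct


def jpeg_metadata_bytes(data):
    """Return (metadata_bytes, total_bytes) for a JPEG byte string.

    Walks the segments between SOI and the first SOS marker and sums the
    APP1..APP15 and COM segments. APP0 (the JFIF header) is not counted.
    Assumes a well-formed file with no padding bytes between segments.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    meta = 0
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        # The 2-byte length includes itself but not the 2 marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xE1 <= marker <= 0xEF or marker == 0xFE:
            meta += 2 + length
        pos += 2 + length
    return meta, len(data)
```

Dividing the first value by the second gives the metadata share of a single file; averaging that over a corpus would be one way to arrive at a figure like the 16% quoted from the article.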

------
soamv
From my experience hosting a bunch of user-provided images:

1\. Strip all metadata, but provide downloads of the originals somewhere.

2\. Keep it simple: just use ImageMagick's convert to remove profiles (but
don't use ImageMagick for file type detection).

3\. If the image has EXIF orientation tags, rotate the image to the right
orientation (-auto-orient) before removing the EXIF profile.

4\. Don't remove the color profile data, or else convert to sRGB first.
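
Steps 2 to 4 could be sketched roughly like this. The function names are made up, the command is assumed to be `convert` (ImageMagick 7 installs it as `magick`), and the `+profile "!icc,*"` pattern for "remove every profile except ICC" is taken from my reading of the ImageMagick docs, so check it against your installed version.

```python
import subprocess


def build_convert_command(src, dst, keep_icc=True):
    """Build an ImageMagick argv list that auto-orients, then drops metadata."""
    # -auto-orient must come before profile removal, because it reads the
    # EXIF orientation tag that the next step throws away.
    cmd = ["convert", src, "-auto-orient"]
    if keep_icc:
        # Assumption: removes all profiles except the ICC color profile.
        cmd += ["+profile", "!icc,*"]
    else:
        # -strip removes all profiles and comments, the color profile included.
        cmd += ["-strip"]
    cmd.append(dst)
    return cmd


def strip_metadata(src, dst, keep_icc=True):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(build_convert_command(src, dst, keep_icc), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-supplied filenames.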

------
huphtur
ImageOptim is a handy little tool to strip all the metadata
[https://imageoptim.com/mac](https://imageoptim.com/mac)

------
laurent123456
There's some use to this metadata, for example GPS coordinates to locate where
the photo was taken, author info, camera parameters, etc. It might not be
needed all the time, but it probably also shouldn't be stripped off all the
time.

~~~
inian
Yup, this information is indeed useful in a lot of cases (for photo editing
software, etc.). But for images delivered on the web it makes sense to
preprocess them to strip off the EXIF data, since it is mostly not used by
browsers.

~~~
tombrossman
Exif data is particularly useful for preserving copyright metadata and
(optionally) contact info for the photographer. Stripping too much metadata
perpetuates the 'orphan works' problem and creates lots of photos floating
around the internet that can never be used commercially, because the
photographer cannot be identified.

More info here from the US Copyright Office. Choice quote: _"For good faith
users, orphan works are a frustration, a liability risk, and a major cause of
gridlock in the digital marketplace."_
[http://www.copyright.gov/orphan/](http://www.copyright.gov/orphan/)

Also, the UK has effectively given everyone the green light to steal photos
lacking metadata because it's basically too difficult to find the
photographer.
[http://www.bbc.co.uk/news/technology-22337406](http://www.bbc.co.uk/news/technology-22337406)

And from my perspective, I release many images as CC0 Public Domain with my
email or name in the metadata, and I'm annoyed that this metadata is not
preserved because it means people may be reluctant to re-use my images due to
(non-existent) copyright concerns.

~~~
inian
Yup, I have covered this use case in the article.

------
steaminghacker
Does Google index the metadata within images?

~~~
inian
AFAIK Google doesn't use this data for indexing or SEO purposes.

~~~
eyelidlessness
They do capture the information. I'd be shocked if it isn't considered in
PageRank.

~~~
steaminghacker
Thanks. For web pages, I usually clear any existing metadata within images,
then insert some simple (but correct) keywords about the image. Wondering if
I'm wasting my time.

