
A dumb reason computer vision apps aren’t working: Exif Orientation - ageitgey
https://medium.com/@ageitgey/the-dumb-reason-your-fancy-computer-vision-app-isnt-working-exif-orientation-73166c7d39da
======
mceachen
Getting image and video orientation correct, which should be trivial, is
decidedly not. In writing PhotoStructure [1], I discovered

1) Firefox respects the Orientation tag using a standard [2] CSS tag, but
Chrome doesn't [3]

2) That videos sometimes use "Rotation", not "Orientation" (and Rotation is
encoded in degrees, not the crazy EXIF rotation enum). Oh, and some
manufacturers use "CameraOrientation" instead, just for the LOLs [4]

3) That the embedded images in JPEGs, RAW, and videos _sometimes_ are rotated
correctly, and sometimes they are not, depending on the Make and Model of the
camera that produced the file. If Orientation or Rotation say anything other
than "not rotated," you can't trust what's in that bag of bits.

[1] https://blog.photostructure.com/introducing-photostructure/

[2] https://www.w3.org/TR/css-images-4/#image-notation

[3] https://bugs.chromium.org/p/chromium/issues/detail?id=158753

[4] https://exiftool-vendored.js.org/interfaces/makernotestags.html#rotation (caution, large page!)
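
For still images, reading the raw tag is only a couple of lines. A rough sketch, assuming Pillow (the filename is just a placeholder); the video Rotation/CameraOrientation variants need a metadata tool like exiftool instead:

    # Read the EXIF Orientation enum from a still image and translate the
    # pure-rotation values into degrees (values 2, 4, 5 and 7 also mirror).
    from PIL import Image

    ORIENTATION_TAG = 274  # 0x0112
    ENUM_TO_DEGREES_CW = {1: 0, 6: 90, 3: 180, 8: 270}

    with Image.open("photo.jpg") as img:
        orientation = img.getexif().get(ORIENTATION_TAG)

    print(orientation, ENUM_TO_DEGREES_CW.get(orientation, "mirrored or unknown"))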

~~~
jcims
They should just put a gravity vector in the image.

~~~
gregmac
This would (maybe) help with a common thing my wife does when taking videos:
Start recording in portrait mode, then realize you did that, and rotate the
phone 90 degrees to get widescreen video (but without restarting the
recording).

When you play it back on a phone (with auto-orientation mode on), starting
from holding the phone in portrait mode (as you normally do):

* it starts playing back as portrait, which looks fine

* the video rotates (because the camera was physically rotated), so now you're watching a widescreen video that's 90 degrees off

* Your natural reaction is to flip the phone 90 degrees to make down "down" again, but this changes the phone into widescreen mode, and because it thinks it's playing a portrait-style video, it changes to portrait-in-widescreen mode, and now the video is again tilted 90 degrees but 1/3 the size with huge black bars on either side

If you play it back on a computer/TV, you get the same end result: a
widescreen video that's rotated 90 degrees, and 1/3 the size with huge black
bars on either side.

~~~
taneq
Can't help you with the video-taking technique but when playing back the
videos, if you hold the phone so that the video matches the screen, then
rotate it so the screen is upwards, you can then spin it so it looks the right
way up as long as you keep the screen vertical enough.

------
6gvONxR4sf7o
One of my pet peeves in ML/stats/data science is people who hardly look at
their data. Unless there are privacy reasons not to, you really need to look
at some data. You'll learn so much more from looking at a few hundred samples
than you will from different metrics. You'll get a feel for how complex the
problem is, or whether something simple will do. Check your assumptions. You
might even realize that your images are sideways.

~~~
avip
That’s what my first numeric mentor taught me. You _have to look at raw data_.
The first q he’d ask any programmer was “did you look at the data, the actual
data?”. He was a PhD in Physics and his approach really sticked with me.

But it’s not always straightforward to “look at data”

~~~
barry-cotter
Stuck. Everything else is perfect idiomatic English.

~~~
avip
Thanks!!

------
enlyth
I ran into this headache while writing a web app.

I made an image upload widget that provided a preview, and when users selected
the "take a picture" option on their phones, I showed the preview with a blob
link and CSS background-image property. The images were showing sideways on
some phones.

I looked at the EXIF data of those photos and of course Orientation: 90 showed
up.

It was easy to fix on the backend when processing the images, but I struggled
to do it in a performant way on the front-end. One solution involved reading
the EXIF data with JS and rotating with canvas and .toBlob(), but it proved too
slow for large 10MB photos as it blocks the main UI thread.

One thing I thought of is just reading the orientation and then using CSS
transforms to rotate the preview, but I never got around to trying it.

~~~
Waterluvian
Try web workers! I've been doing all kinds of crazy expensive spatial data
validation in many parallel threads. It's amazing. I think OffscreenCanvas in
workers has been gaining browser support recently.

~~~
inetknght
> _Try web workers!_

NO, please do not try web workers. I don't want any _more_ running on my
device than absolutely necessary.

~~~
BubRoss
You don't want to use all your cores to run things faster?

~~~
inetknght
Nope. I want things to look plain and simple. I want to use fewer cores for
longer battery life.

~~~
BubRoss
The faster a CPU finishes, the faster the CPU can sleep, which is how you save
battery life. The more cores you use the lower the CPU frequency can be, which
saves power, since frequency increases do not use power linearly.

I think it goes without saying that javascript workers have nothing to do with
the design and layout of a webpage.

~~~
inetknght
> _The faster a CPU finishes, the faster the CPU can sleep, which is how you
> save battery life._

Yup.

> _The more cores you use the lower the CPU frequency can be, which saves
> power, since frequency increases do not use power linearly._

More cores in use has little to do with frequency and more to do with heat.
More heat means more thermal throttling, which lowers frequency. Lower
frequency means that the CPU _doesn't_ sleep sooner.

> _I think it goes without saying that javascript workers have nothing to do
> with the design and layout of a webpage._

Yup. That's exactly why I don't want them. Why should I execute something
which doesn't, and shouldn't, have anything to do with rendering page content?

Don't get me wrong, I'm _fine_ with using more cores if it's actually
beneficial. But every use I've ever seen for a web/service/javascript worker
has _always_ been user hostile by taking what should be done on a server and
offloading it onto the user's device instead.

~~~
BubRoss
Using all your cores for the same workload would mean it finishes faster or
finishes in the same time with significantly lower frequency. It saves power
and heat. Your example would mean using more cores for the same amount of
time, which makes no sense in this comparison.

Do you also buy single core computers to save power?

It is clear that what you are saying has nothing to do with javascript
features at all and just boils down to not liking bloated web pages.

------
LeoPanthera
ImageMagick will "apply" the exif rotation to the image data for you:

    convert input -auto-orient output

Or in place:

    mogrify -auto-orient *
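
If you're already in Python, a rough equivalent (assuming Pillow 6.0+; note that unlike a lossless transform this re-encodes the JPEG):

    # Apply the EXIF Orientation to the pixels and drop the stale tag,
    # similar in spirit to ImageMagick's -auto-orient.
    from PIL import Image, ImageOps

    img = Image.open("input.jpg")
    upright = ImageOps.exif_transpose(img)  # rotates/flips pixels, clears the tag
    upright.save("output.jpg", quality=95)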

------
rkagerer
Even outside of machine vision this issue has caused me all sorts of
headaches. Especially when switching between tools that honor or ignore the
tag.

 _" But the tricky part is that your camera doesn’t actually rotate the image
data inside the file that it saves to disk."_

Cameraphones are so powerful these days. I don't understand why the app can't
simply include a behavior setting to always flip the original pixels.

~~~
joosters
It might not be up to the app? If the phone has hardware-accelerated JPEG
compression, then potentially the image will already be compressed in the
'wrong' orientation before the app gets its hands on it. So rotating the image
data could involve re-compressing the image again, leading to quality loss. Or
if you choose to get the raw sensor data instead, and rotate the data before
doing the compression yourself, you might lose out on the hw acceleration
entirely.

(I've not developed any camera apps, so this is just a guess!)

~~~
pwg
> So rotating the image data could involve re-compressing the image again,
> leading to quality loss.

With JPEG, no, 90 degree rotations can be accomplished in a lossless manner.

See jpegtran from the libjpeg library. A version of its manpage is here:

https://linux.die.net/man/1/jpegtran

Quoting:

"... It can also perform some rearrangements of the image data, for example
turning an image from landscape to portrait format by rotation.

jpegtran works by rearranging the compressed data (DCT coefficients), without
ever fully decoding the image. Therefore, its transformations are lossless:
there is no image degradation at all, which would not be true if you used
djpeg followed by cjpeg to accomplish the same conversion. ..."
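
For example, a quick way to drive it from Python (assuming the jpegtran binary is installed; the file names are placeholders):

    # Lossless 90-degree clockwise rotation; -copy all keeps the EXIF block.
    import subprocess

    subprocess.run(
        ["jpegtran", "-copy", "all", "-rotate", "90",
         "-outfile", "rotated.jpg", "original.jpg"],
        check=True,
    )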

------
jeroenhd
I'm not sure how good your algorithm is if it can only work in a specific
orientation. A slightly tilted image can already cause problems for such an
algorithm, and many people have a hard time keeping a picture to within only a
few degrees of rotation.

It would probably be better to deal with this in an elegant way, e.g. set up
the algorithm to work regardless of orientation. This seems like a (mostly)
solved problem: https://d4nst.github.io/2017/01/12/image-orientation/

~~~
cookingrobot
That’s not an option if you’re trying to detect left-arrows vs right-arrows,
on street signs for ex.

~~~
jeroenhd
Depends. Based on the linked article, the picture can be completely upside
down if it was taken in landscape mode. The article I linked is capable of
rotating it upright regardless. After rotating to the right orientation, left
and right are suddenly perfectly workable.

Of course using exif data for such rotations is easier, but a tilted picture
of a tilted sign can create a lot of tilt that human vision copes with fine
but an orientation-dependent network cannot.

------
sandos
If the algorithm fails with a rotated image, then I would claim it's a bad
algorithm that is over-fit and not at all generalising what it has learnt.
What about slight rotation? Where does it start failing, and would a human?

Also, making an algorithm for detecting rotated images should be easy if it
affects the results so much.

~~~
DougBTX
Yea, there’s some truth to that, but then we wouldn’t get these gems:

Is it a duck or is it a rabbit?

https://www.reddit.com/r/dataisbeautiful/comments/aydqig/is_it_a_duck_or_a_rabbit_for_google_cloud_vision/

------
planis
The topic of image orientation always reminds me of this article:
https://www.daveperrett.com/articles/2012/07/28/exif-orientation-handling-is-a-ghetto/

------
quietbritishjim
One of the common image processing libraries _does_ take into account EXIF
rotation: OpenCV [1]. I would tend to use that over manually rotating using the
code from the article. Beware, though, that OpenCV cannot open quite as many
formats as Pillow; most notably it cannot open GIFs due to patent issues. You
can get a prebuilt wheel of OpenCV by pip installing the package opencv-python
[2].

[1] https://docs.opencv.org/trunk/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56

[2] https://pypi.org/project/opencv-python/
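
For reference, a minimal sketch of what that looks like with the opencv-python wheel (imread applies the EXIF orientation by default for JPEG/TIFF; pass IMREAD_IGNORE_ORIENTATION to get the pixels as stored):

    import cv2

    upright = cv2.imread("photo.jpg", cv2.IMREAD_COLOR)
    raw = cv2.imread("photo.jpg", cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)

    # For a photo with a 90-degree orientation tag, the two shapes differ.
    print(upright.shape, raw.shape)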

------
mark-r
There's a big photography site I use that treats the EXIF inconsistently. I
rotate the image in my editor to work on it, then save and upload. In some
contexts it looks OK, but in other contexts the site rotates the image again
and it's wrong. I don't want to strip the EXIF because it has interesting
information such as the camera model and lens, and the exposure settings. My
editor doesn't correct the EXIF rotation setting, so I have no choice but to
use a utility to strip that single value from the file before I upload it.

------
eggie5
CNNs are translation and scale invariant thanks mostly to the pooling
operation. Good data augmentation (rotating images, for example) would have
built a model more robust to this effect.

------
Fission
This is functionally a coordinate system problem. Thankfully, it's pretty easy
here. Just wait until we start getting more models for 3D data. I worked with
3D data for the longest time, and it was incredibly painful — different
libraries can use wildly different coordinate systems (e.g. I've seen the up-
direction be +z, -z, +y, and -y). At that point, it's nontrivially difficult
to even figure out what the right way to convert between coordinate systems
is.

------
m3kw9
Train a model that recognizes left-, right-, or upside-down-rotated images to
reorient them all. Run it once.

------
anilakar
Some websites also strip EXIF data from user submitted content but do not
rotate the image to match the now-missing information about which side is up.

------
layoutIfNeeded
Why don’t you train your classifiers to be orientation-agnostic? My brain
certainly had no problem recognizing the geese on the sideways image...

~~~
6gvONxR4sf7o
Why make it harder than it has to be?

~~~
variaga
I'd say you're not making the problem "harder" - rather, requiring that
object detection be orientation-agnostic makes the problem exactly as hard as
the problem actually is. Allowing the network to train only on images with a
known, fixed, (correct) orientation makes the object detection too _easy_, so
the network results will likely fail if you feed it any real-world data.

i.e. you should be training your image application with
skew/stretch/shrink/rotation/color-palette-shift/contrast-adjust/noise-
addition/etc. applied to all training images if you want it to be useful for
anything other than getting a high top-N score on the validation set.
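
Something along these lines, as a sketch with Keras' ImageDataGenerator (the parameter values are illustrative rather than tuned, and train_images/train_labels are assumed to exist):

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    augmenter = ImageDataGenerator(
        rotation_range=25,          # random rotation in degrees
        width_shift_range=0.1,      # horizontal translation
        height_shift_range=0.1,     # vertical translation
        shear_range=10,             # skew
        zoom_range=0.2,             # stretch/shrink
        channel_shift_range=30.0,   # crude color shift
        brightness_range=(0.7, 1.3),
        horizontal_flip=True,
        fill_mode="nearest",
    )

    # train_images: (N, H, W, 3) array, train_labels: (N,) array
    batches = augmenter.flow(train_images, train_labels, batch_size=32)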

------
peterhj
Bonus round:

(1) Find the ImageNet (ILSVRC2012) images possessing EXIF orientation
metadata.

(2) Of those images, find which ones have the "correct" EXIF orientation.
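
A brute-force sketch of step (1), assuming the ILSVRC2012 JPEGs are unpacked under a local directory (the path is hypothetical) and Pillow is available:

    # Report every JPEG whose EXIF Orientation is set to something other
    # than "normal" (value 1) or absent.
    import pathlib
    from PIL import Image

    ORIENTATION_TAG = 274

    for path in pathlib.Path("ILSVRC2012_img_train").rglob("*.JPEG"):
        try:
            with Image.open(path) as img:
                orientation = img.getexif().get(ORIENTATION_TAG)
        except OSError:
            continue  # skip unreadable files
        if orientation not in (None, 1):
            print(path, orientation)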

~~~
yeldarb
Has anyone done this yet? I'd be curious to know if their dataset is "cleaned"
or not.

~~~
peterhj
Spoiler, ish:

The last time I measured ImageNet JPEGs with EXIF orientation metadata, the
number of affected images was actually quite small (< 100, out of a dataset of
1.28M). There are also some duplicates, but altogether it seems fairly
"clean."

------
spullara
When I built a consumer site that processed tons of photos I ran into this all
the time. Ended up doing it all myself by parsing the exif data and doing the
rotation. Also ended up writing some pretty extensive resize code that worked
much better than what was built-in and more like the great scaling effects you
see in Preview.app.

~~~
kccqzy
It always bothers me that many libraries still use nearest-neighbor to resize
images instead of some kind of Lanczos filtering.

------
unnouinceput
Hi Adam, if you read this, you have a small mistake.

Your quote: "This tells the image viewer program that the image needs to be
rotated 90 degrees counter-clockwise before being displayed on screen"

should be:

"This tells the image viewer program that the image needs to be rotated 90
degrees clockwise before being displayed on screen". Cheers.

~~~
ageitgey
Thanks!

------
theon144
>Most Python libraries for working with image data like numpy, scipy,
TensorFlow, Keras, etc, think of themselves as scientific tools for serious
people who work with generic arrays of data. They don’t concern themselves
with consumer-level problems like automatic image rotation — even though
basically every image in the world captured with a modern camera needs it.

What a snotty attitude. The tools are already complex enough without taking on
the responsibility of parsing the plethora of ways a JPEG can be "rotated". This
thread is a testament to the non-triviality of the issue and I certainly don't
want a matrix manipulation or machine learning library to bloat up and have
opinions on how to load JPEGs just so someone careless out there can save a
couple of lines.

------
kwhitefoot
If your computer vision application can't even recognize which way up a
picture like the ones in the article is, then surely you have bigger problems
than EXIF orientation.

As others have said: what about pictures that are simply not aligned with the
horizon?

~~~
sicariusnoctis
None of the top architectures are rotation-invariant. I think some of the
blame lies in how successful CNNs are (as Hinton claims).

------
faceshapeapp
I ran into this issue while developing https://www.faceshapeapp.com, where a
user uploads their photo and a face detector is run on top of the picture.
Shortly after launching, about 10% of users complained about rotated images;
after some debugging, I discovered that it was due to exif rotation.

You could parse the exif and rotate the image using canvas, but thankfully,
there's already a JS library which does it for you:
https://github.com/blueimp/JavaScript-Load-Image.

------
NeoBasilisk
Would it really be too much of a burden for phones to just save photos at the
correct orientation now? I understand the hardware limitations that were
present in 2007, but surely these can't still be a factor?

~~~
ryandrake
Is there any good reason to save a photo in an orientation that does not match
the orientation that the device is being held in? Shouldn’t up be up? If that
results in a NxM photo, save it NxM. If it results in a MxN photo save it that
way!

The only edge case is a camera pointed straight up or straight down. Or a
camera in space.

~~~
TeMPOraL
Correct orientation is relative to the photographer, not to the ground. Most
of the times the photographer is in the usual, vertical orientation but
sometimes you really want to take a shot at an angle, and you have to fight
with your phone to do it correctly. I really dislike these kinds of
"convenience" optimizations.

~~~
uoaei
Coming back to ancient software design principles:

> _The user is always right_

Let them have a setting. It's really that easy.

------
jonshariat
Isn't it better to train the model that way though? In fact it might be good
to load every image in different orientations to add to the training data.

------
lolc
Only goes to show how far we have to go still when computers don't even tilt
their heads automatically when given rotated input.

------
mister_hn
Well, if you are using the input image as JPEG, the information is stored in a
tag, and you can remove it with any kind of language. There's also a C++
library for it: https://github.com/mayanklahiri/easyexif

------
octorian
I should add that this very problem also exists with video. What makes it
worse is that "smartphone" video apps started regularly using video
orientation metadata years before desktop video apps even seemed to
acknowledge that it was a thing.

~~~
BEEdwards
It's the simplest way to do it and has so few downsides that it's not going to
change until someone invents a googly eye camera that rotates with the phone.

------
brlewis
As an alternative to the python code in the article, there's this command-line
tool:
https://linux.die.net/man/1/exiftran

------
superjan
I just could not stop laughing reading this. It does remind me that my
professor once showed us a slide of the coastline of Africa, which none of us
recognized until he rotated it to the correct orientation.

------
UglycupRawky
Who thought Exif Orientation was a good idea? It's a needless complication of
the file format. It's not like iPhones (or any camera) lack the computing
power to rotate the image before saving it.

~~~
bradfa
Wikipedia says EXIF was released in 1995
(https://en.wikipedia.org/wiki/Exif). If you were shooting with a DSLR, say 6
megapixel with 8 bits per color channel, the raw output would be 18MB in size
(https://en.wikipedia.org/wiki/Kodak_DCS_400_series).
In order to rotate this raw 6Mp image you would need 36MB of RAM (input and
output buffers of the same size, non-overlapping). Then, after the rotation
you could perform the JPEG compression, so that the rotation is lossless.
Finally, you could store the JPEG image to disk.

36MB of RAM just for raw image buffers would have been quite expensive in
1995. Simply tagging some extra data onto the image to say which orientation
it should be presented in takes almost no extra memory or processing within
the camera, and some big desktop PC could easily rotate the uncompressed JPEG
to perform a "lossless" rotation after the fact (i.e.: uncompress JPEG in wrong
orientation, rotate, present to user).

~~~
bradfa
Technically, you wouldn't need a full 18MB for the output buffer so long as
you perform the JPEG compression in-line with the rotation and are willing to
deal with slicing the image into swaths. So in theory you could get away with
something like a 1MB output buffer, but then your rotation time would depend on
your JPEG timing and you couldn't take another picture with the main raw
buffer until rotation and JPEG compression were both complete. It's a
tradeoff, time versus memory.

------
pequalsnp
https://docs.fast.ai/vision.transform.html

------
ensiferum
Maybe the first step before doing image classification or object detection is
to run an image rotation detector ;)

------
rhizome
Sure, that would reduce the problem space by one, but that ain't the reason CV
doesn't work.

------
raveenb
Some of the object detection AI systems do actually rotate the images in 4 to
8 ways, like in fast.ai. When trained with such input data, the AI detects the
object in almost all orientations.

------
yellowapple
Sounds like the training sets should include more rotated samples.

------
moonbug
that or train models to be rotationally invariant.

------
secraetomani
Surely someone can create a neural network that reorients the image, even if
the Exif orientation is wrong (JPEG lets you do this without re-encoding).
Sounds like it should be a very simple problem for the vast majority of
"regular" images.

~~~
machinelearning
Or how about just using the damn exif information to orient it correctly, as
the article outlines? The actual article is far less interesting than what the
title seems to imply. While the title suggests a problem with computer vision,
this is more of a programmer logic error.

~~~
magicalhippo
There are many times my phone detects I'm holding it horizontally while taking
a picture, but I'm actually trying to take a vertical picture, and vice versa.

For those images, the exif information is technically correct, but actually
wrong.

------
machinelearning
Also known as "programmer error".

~~~
acdha
What value do you think this comment contributed? The entire article clearly
describes it as a common mistake and is trying to raise awareness so people
stop repeating it.

~~~
machinelearning
I think it contributed the warning that the title is clickbait. It's very clear
based on the discussion in the thread that people who've just read the title
are discussing computer vision and machine learning strategies to solve this
problem, when the article describes an almost elementary programming issue.

------
egfx
Nonsense. Image recognition engines are very capable of detection even at
the most extreme angles. Sure, it won't rotate the image for you, but it will
certainly tell you there is a goose in it. What tricks image recognition
technology more than anything is lighting.

~~~
egfx
Let me clarify and say googles IR engine is capable. I actually ran the first
set of tests developed around image recognition technology. I worked at Neven
Vision which got acquired by google in 2006.

------
MrStonedOne
This article is all wrong.

At the core, there is something that is converting this JPEG or otherwise-encoded
data into raw pixel data, and this process MUST account for the orientation.

Either the app is reading the image and converting it before passing it to the
CV/ML/AI library, in which case the conversion step _needs_ to respect this tag
and either transfer the tag or apply it to the transformed object; OR the
CV/ML/AI library is getting the encoded image data, and it needs to check for
this tag.

Those are the two options: either the CV/ML/AI library sees the tag and should
consider it, or it doesn't, and the library that is stripping it away
shouldn't be doing that.

~~~
laughinghan
This article helps people use the libraries they have, and it does so
correctly. If you want to fix the libraries, submit a PR, don't complain about
this article.

