1) Firefox respects the Orientation tag via a standard CSS property (image-orientation), but Chrome doesn't
2) That videos sometimes use "Rotation", not "Orientation" (and Rotation is encoded in degrees, not the crazy EXIF rotation enum). Oh, and some manufacturers use "CameraOrientation" instead, just for the LOLs 
3) That the embedded images in JPEGs, RAW, and videos sometimes are rotated correctly, and sometimes they are not, depending on the Make and Model of the camera that produced the file. If Orientation or Rotation say anything other than "not rotated," you can't trust what's in that bag of bits.
 https://exiftool-vendored.js.org/interfaces/makernotestags.h... (caution, large page!)
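In practice I end up asking exiftool for all three and deciding afterwards; something like `exiftool -Orientation -Rotation -CameraOrientation -n file` (with `-n` to get raw numeric values rather than the friendly descriptions) usually shows which tag, if any, a given Make/Model actually populated.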
I'm interested in your PhotoStructure application, just subscribed to the beta!
By default it just writes the new orientation to an XMP or MIE sidecar. The downside of this approach is that most applications don't respect sidecars.
Not technically degrees. MP4 encodes it as a matrix. (Now, there are matrices corresponding to degree transformations...)
(The output of `exiftool -Rotation example.mp4` would be great!)
The "Matrix Structure" is in the 'mvhd' atom.
[Some of the best documentation for MP4 that I've seen on the web is from Apple, since it grew out of QuickTime.]
I really don't understand why it was decided that most photo viewing applications would honor EXIF rotation, but web browsers would not.
Many years ago I accidentally deleted all the metadata from many images that needed rotating.
> Copy only comment markers. This setting copies comments from the source file but discards any other data that is inessential for image display.
> The default behavior is -copy comments.
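(For anyone else bitten by this: passing `-copy all` to jpegtran carries the EXIF and other markers through the transformation; the default `-copy comments` is what silently drops them.)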
Argh! Thanks! So my "lossless" transformations... weren't, either.
When you play it back on a phone (with auto-orientation mode on), starting from holding the phone in portrait mode (as you normally do):
* it starts playing back as portrait, which looks fine
* the video rotates (because the camera was physically rotated), so now you're watching a widescreen video that's 90 degrees off
* your natural reaction is to flip the phone 90 degrees to make down "down" again, but this changes the phone into widescreen mode, and because it thinks it's playing a portrait-style video, it changes to portrait-in-widescreen mode, and now the video is again tilted 90 degrees but 1/3 the size with huge black bars on either side
If you play it back on a computer/TV, you get the same end result: a widescreen video that's rotated 90 degrees, and 1/3 the size with huge black bars on either side.
And uptime seconds (!?)
And estimated distance to subject, AGPS information, GPS acquisition time, depth field metadata, current battery level, operating system version, ...
I've been looking for a replacement for Google Photos. It's the only thing keeping me on Google products at this point.
1. Become one with the data.
The first step to training a neural net is to not touch any neural net code at all and instead begin by thoroughly inspecting your data. This step is critical. I like to spend copious amount of time (measured in units of hours) scanning through thousands of examples, understanding their distribution and looking for patterns. Luckily, your brain is pretty good at this. One time I discovered that the data contained duplicate examples. Another time I found corrupted images / labels. I look for data imbalances and biases. I will typically also pay attention to my own process for classifying the data, which hints at the kinds of architectures we’ll eventually explore. As an example - are very local features enough or do we need global context? How much variation is there and what form does it take? What variation is spurious and could be preprocessed out? Does spatial position matter or do we want to average pool it out? How much does detail matter and how far could we afford to downsample the images? How noisy are the labels?
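Even a throwaway script goes a long way for the duplicate/corrupted part of that. A minimal sketch, with the folder layout assumed and "duplicate" meaning byte-identical (near-duplicates would need a perceptual hash instead):

    import hashlib
    from pathlib import Path
    from PIL import Image

    seen = {}
    for path in sorted(Path("data/train").rglob("*.jpg")):
        try:
            with Image.open(path) as img:
                img.verify()  # cheap integrity check; raises on truncated/corrupt files
        except Exception as exc:
            print(f"corrupt: {path} ({exc})")
            continue
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        if digest in seen:
            print(f"duplicate: {path} == {seen[digest]}")
        else:
            seen[digest] = path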
Plus an animated version that includes a T-Rex.
But it’s not always straightforward to “look at data”
Do just a few bad predictions skew your score? What does the best prediction look like? What does the worst look like?
Are all your results just shifted by 2 pixels to the left due to some bug? Are there mislabeled examples in the test set? Etc. etc.
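One cheap way to answer those questions (sketch only; `model` and `val_dataset` are placeholder names, and a PyTorch classifier is assumed) is to rank the validation examples by their individual loss and look hard at both ends of the list:

    import torch
    import torch.nn.functional as F

    model.eval()
    ranked = []
    with torch.no_grad():
        for i, (x, y) in enumerate(val_dataset):
            logits = model(x.unsqueeze(0))                     # one example at a time
            loss = F.cross_entropy(logits, torch.tensor([y]))
            ranked.append((loss.item(), i))

    ranked.sort()
    print("best:", ranked[:10])    # sanity-check the easy wins
    print("worst:", ranked[-10:])  # mislabeled? shifted by 2 pixels? systematically broken?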
Practitioners frequently do cursory data analysis and data exploration to gain insight into the data, corner cases and which modeling approaches are plausible.
Just to give some examples, Bayesian Data Analysis (Gelman et al.), Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill), Doing Bayesian Data Analysis (Kruschke), Deep Learning (Goodfellow, Bengio, Courville), Pattern Recognition and Machine Learning (Bishop) and the excellent practical blog post by Karpathy all list graphical data checking, graphical goodness of fit investigation, descriptive statistics and basic data exploration as critical parts of the model building workflow.
If you are seeing people produce models without this, it's likely because companies try to have engineers do this work, or hire from bootcamps or other sources that don't produce professional statisticians with grounding in a proper professional approach to these problems.
When people mistakenly think models are commodities you can copy paste from some tutorials, and don’t require real professional specialization, then yes, you get this kind of over-engineered outcome with tons of “modeling” that’s disconnected from the actual data or stakeholder problem at hand.
People wonder why it’s hard and expensive to hire ML engineers... because they actually solve these problems with craft. Meaning, they systematically grow understanding of the data, start with simple models, and have well articulated reasons explaining cases when complexity is justified.
Did you look at the actual data?
Then they are wrong. Find a way to cheaply visualize data/state and call me back when you're done.
I made an image upload widget that provided a preview, and when users selected the "take a picture" option on their phones, I showed the preview with a blob link and the CSS background-image property. The images showed up sideways on some phones.
I looked at the EXIF data of those photos and of course Orientation: 90 showed up.
It was easy to fix on the backend when processing the images, but I struggled to do it in a performant way on the front-end. One solution involved reading the EXIF data with JS and rotating with canvas and .toBlob(), but it proved too slow for large 10MB photos, as it blocks the main UI thread.
One thing I thought of is just reading the orientation and then using CSS transforms to rotate the preview, but I never got around to trying it.
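If it helps, the backend half really is just a couple of lines with Pillow (assuming Pillow is what the pipeline uses; filenames here are placeholders):

    # Bake the EXIF Orientation into the pixel data and re-save, so nothing
    # downstream has to care about the tag at all. Note this re-encodes the
    # JPEG, so it is not lossless like the jpegtran approach mentioned elsewhere.
    from PIL import Image, ImageOps

    img = ImageOps.exif_transpose(Image.open("upload.jpg"))
    img.save("normalized.jpg", quality=90)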
Basically entirely because of this ridiculous issue where Chrome refuses to respect EXIF tags.
Wait, I know it's a big company, but you're at the same company...
Rather than reencode the video just to preview it, I can just apply the same transforms (scale, rotation, translation) to the view that's displaying the preview. I then mask that view with another view of the same size so it doesn't go outside the edges.
Of course I still need to encode the video with those transforms if I want it to show up in their camera roll later.
NO, please do not try web workers. I don't want any more code running on my device than absolutely necessary.
> The more cores you use the lower the CPU frequency can be, which saves power, since frequency increases do not use power linearly.
More cores in use has little to do with frequency and more to do with heat. More heat means more thermal throttling, which lowers frequency. And lower frequency means the work takes longer, so the CPU can't go back to sleep as soon.
Yup. That's exactly why I don't want them. Why should I execute something which doesn't, and shouldn't, have anything to do with rendering page content?
Do you also buy single core computers to save power?
That way you'd only need to read the EXIF data, but wouldn't need to go all the way to rotating the pixels yourself.
At my previous job I did the same thing, although I never noticed a significant slowdown. I also made the file size smaller since we wanted to have predictable upload times and mitigate excessive usage of storage space.
The other reason was that the EXIF data was weird on some devices and the back end library didn't rotate them correctly.
Oddly, it seems like it wasn't as big a problem until cell phones. (I honestly couldn't tell which way was up for some images, since they were abstract.)
We ended up using a JS library called "croppie", but I'm not sure it helps with large images.
convert input -auto-orient output
mogrify -auto-orient *
"But the tricky part is that your camera doesn’t actually rotate the image data inside the file that it saves to disk."
Cameraphones are so powerful these days. I don't understand why the app can't simply include a behavior setting to always flip the original pixels.
(I've not developed any camera apps, so this is just a guess!)
With JPEG, no, 90 degree rotations can be accomplished in a lossless manner.
See jpegtran from the libjpeg library. A version of its manpage is here:
"... It can also perform some rearrangements of the image data, for example turning an image from landscape to portrait format by rotation.
jpegtran works by rearranging the compressed data (DCT coefficients), without ever fully decoding the image. Therefore, its transformations are lossless: there is no image degradation at all, which would not be true if you used djpeg followed by cjpeg to accomplish the same conversion. ..."
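So in practice (filenames as placeholders) something like `jpegtran -rotate 90 -copy all -perfect input.jpg > output.jpg` rotates without recompressing and keeps the metadata, and `-perfect` makes it refuse to run at all if the image dimensions would make the transform not quite lossless at the edges.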
You'll notice that an orientation sensor is nowhere in that list. So what happens is the camera hardware spits out a JPEG. The app then combines it with the orientation sensor & produces the EXIF headers. It could choose to decode, rotate, re-encode, but that's slow (~100ms) and hurts shot-to-shot latency. And it loses quality. And, hey, since everything supports EXIF orientation anyway, why bother?
Or it could simply rotate without decoding or re-encoding, which has the added advantage of being lossless.
Obviously it's still added processing time and (probably more importantly) development time, so it's generally not worth bothering; however, it's important to point out that JPEG rotation can (in the case of 90 degree increments) be done losslessly.
This is a major oversight by the companies that develop camera apps. The major ones even have whole teams dedicated to that single app.
Lots of other complaints in this thread, and it's also something I have encountered on Android with a small side feature allowing users to upload some pictures.
It would probably be better to deal with this in an elegant way, e.g. set up the algorithm to work regardless of orientation. This seems like a (mostly) solved problem: https://d4nst.github.io/2017/01/12/image-orientation/
Of course using EXIF data for such rotations is easier, but a tilted picture of a tilted sign can create a lot of tilt that human vision copes with fine but an orientation-dependent network cannot.
Also, making an algorithm for detecting rotated images should be easy if it affects the results so much.
Is it a duck or is it a rabbit?
If your dataset consists of nothing but isolated 'd's and 'p's in unknown orientation, you won't be able to classify them correctly because that is an impossible task. But it would be more common for your dataset to consist of text rather than isolated letters, and in the context of surrounding text it's easy to determine the correct orientation, and therefore to discriminate 'd' from 'p'.
Incidentally, how does that work for mirroring, when all that surrounding text gets mirrored too? (Consider the real example of the lolcats generated by Nvidia's StyleGAN, where the text captions are completely wrong, and will always be wrong, because it looks like Cyrillic, due to the horizontal data-flipping StyleGAN has enabled by default...)
[edit: https://www.reddit.com/r/dataisbeautiful/comments/aydqig/is_... tried it out]
i.e. you should be training your image application with skew/stretch/shrink/rotation/color-palette-shift/contrast-adjust/noise-addition/etc. applied to all training images if you want it to be useful for anything other than getting a high top-N score on the validation set.
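For a concrete version of that list (torchvision flavored; the ranges are made up and should be tuned to what your domain actually sees):

    import torch
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomAffine(degrees=20, translate=(0.1, 0.1),
                                scale=(0.8, 1.2), shear=10),      # rotation/skew/stretch/shrink
        transforms.ColorJitter(brightness=0.3, contrast=0.3,
                               saturation=0.3, hue=0.05),         # palette/contrast shifts
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0, 1)),  # noise
    ])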
Regardless, the article isn't really shifting blame, in so much as explaining what's happening in the real world, with the real tools. The tools don't care about EXIF. Consumer software uses EXIF to abstract over reality. A lot of people playing with ML don't know about either.
I think this is closer to the performance we should expect of current neural models: a child a few months old, not an adult. NNs may be good at doing stuff similar to what some of our sight apparatus does, but they're missing additional layers of "higher-level" processing humans employ. That, and a whole lot of training data.
(1) Find the ImageNet (ILSVRC2012) images possessing EXIF orientation metadata.
(2) Of those images, find which ones have the "correct" EXIF orientation.
The last time I measured ImageNet JPEGs with EXIF orientation metadata, the number of affected images was actually quite small (< 100, out of a dataset of 1.28M).
There are also some duplicates, but altogether it seems fairly "clean."
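For anyone who wants to reproduce that count, a sketch (the dataset path is a placeholder; 0x0112 is the standard EXIF Orientation tag ID):

    from pathlib import Path
    from PIL import Image

    affected = []
    for path in Path("ILSVRC2012/train").rglob("*.JPEG"):
        with Image.open(path) as img:
            orientation = img.getexif().get(0x0112)
        if orientation not in (None, 1):   # 1 means "already upright"
            affected.append((path, orientation))

    print(len(affected), "images carry a non-trivial Orientation tag")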
Your quote: "This tells the image viewer program that the image needs to be rotated 90 degrees counter-clockwise before being displayed on screen"
"This tells the image viewer program that the image needs to be rotated 90 degrees clockwise before being displayed on screen".
What a snotty attitude. The tools are already complex enough to take on the responsibility of parsing the plethora of ways a JPEG can be "rotated". This thread is a testament to the non-triviality of the issue, and I certainly don't want a matrix manipulation or machine learning library to bloat up and have opinions on how to load JPEGs just so someone careless out there can save a couple of lines.
As others have said: what about pictures that are simply not aligned with the horizon?
The only edge case is a camera pointed straight up or straight down. Or a camera in space.
> The user is always right
Let them have a setting. It's really that easy.
36MB of RAM just for raw image buffers would have been quite expensive in 1995. Simply tagging some extra data onto the image to say which orientation it should be presented in takes almost no extra memory or processing within the camera; some big desktop PC could easily rotate the uncompressed JPEG to perform a "lossless" rotation after the fact (i.e.: uncompress the JPEG in the wrong orientation, rotate, present to user).
For those images, the exif information is technically correct, but actually wrong.
For example, if I hold the phone parallel with the ground (aimed at the floor), what is the correct EXIF orientation?
Edit: the link also closed a long-forgotten loop for me about the meaning of an annoying Windows XP-era error message. Thanks for making me look it up!
He also has a very useful utility (I normally use it to check DCT tables) called JPEGSnoop.
At the core, there is something that converts this JPEG (or otherwise encoded) data into raw pixel data, and this process MUST account for the orientation.
Either the app is reading the image and converting it before passing it to the CV/ML/AI library, in which case the conversion step needs to respect this tag and either carry it forward or apply it to the converted output; OR the CV/ML/AI library is getting the encoded image data itself, and it needs to check for this tag.
Those are the two options: either the CV/ML/AI library sees the tag, and should consider it, or it doesn't, and whatever library is stripping it away shouldn't be doing that.
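As a sketch of the first option (Pillow and numpy standing in for whatever layer does the decoding; names are made up): decode, apply the tag, and only then hand raw pixels onward.

    import numpy as np
    from PIL import Image, ImageOps

    def load_upright(path):
        """Decode, bake in the EXIF Orientation, return plain RGB pixels."""
        with Image.open(path) as img:
            img = ImageOps.exif_transpose(img)   # apply the tag to the pixels
            return np.asarray(img.convert("RGB"))

    pixels = load_upright("photo.jpg")  # now safe to feed to any downstream library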