
Deep-learning algorithm predicts photos’ memorability at “near-human” levels
http://news.mit.edu/2015/csail-deep-learning-algorithm-predicts-photo-memorability-near-human-levels-1215
======
stkni
Uploading the URL of a white-noise image from Wikipedia [1] gave a high
memorability score (0.82) with lots of areas of interest. Next, uploading a
plain white image [2] produced high areas of interest in the top corners and a
moderately memorable score (0.62).

I thought the tests might reveal something useful, like Jakob Nielsen's
eye-tracking heat maps [3], but I'm not convinced.

[1] [https://upload.wikimedia.org/wikipedia/commons/f/f6/White-noise-mv255-240x180.png](https://upload.wikimedia.org/wikipedia/commons/f/f6/White-noise-mv255-240x180.png)

[2] [http://images.all-free-download.com/images/graphiclarge/plain_white_background_211387.jpg](http://images.all-free-download.com/images/graphiclarge/plain_white_background_211387.jpg)

[3] [https://www.nngroup.com/books/eyetracking-web-usability/](https://www.nngroup.com/books/eyetracking-web-usability/)

~~~
svantana
A machine-learned system is only as good as its training data, at best. In
this case, it was trained on natural-looking images, so results on "unnatural"
images will be unpredictable/random/wrong.

One way to fix this would be to provide "bug bounty"-style rewards for
producing images that make the system deviate significantly from Mechanical
Turk workers performing the same task. I wouldn't be surprised to see
Google/FB etc. starting such programs in the near future, as their ML systems
reach maturity.

------
nicklo
The applications for this in the advertising industry are exciting and a tad
bit scary.

I can imagine a feedback system that takes the output of this algorithm and
modifies the image slightly along the error gradient to optimize the
memorability score of the brand/item being advertised in the photo. Then it
could feed the new image back in and repeat, like the Deep Dream algorithm.
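That loop can be sketched in a few lines of NumPy. Everything here is made up for illustration: a fixed linear-plus-sigmoid scorer stands in for the real MemNet network, and the step size and image size are arbitrary. It just shows the shape of the iterate-on-the-input idea:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a memorability model (hypothetical; the real MemNet is a
# deep CNN): score(img) = sigmoid(<w, img> / n_pixels), in (0, 1).
w = rng.normal(size=(64, 64))

def mem_score(img):
    return 1.0 / (1.0 + np.exp(-np.sum(w * img) / img.size))

def score_gradient(img):
    # d sigmoid(z)/d img = s * (1 - s) * dz/d img, with z = <w, img>/n
    s = mem_score(img)
    return s * (1.0 - s) * w / img.size

img = rng.uniform(size=(64, 64))   # random starting "photo"
before = mem_score(img)
for _ in range(100):               # Deep Dream-style ascent on the input
    img = np.clip(img + 0.5 * score_gradient(img), 0.0, 1.0)
after = mem_score(img)
```

After the loop, `after` is (slightly) higher than `before`: the image has been nudged along the gradient toward a higher score.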

Side note: I'm so happy that CSAIL is finally embracing deep learning. I'm an
undergrad at MIT, and this semester was the first in which deep learning was a
major part of both the computer vision class and the NLP class.

~~~
arethuza
In another thread, someone mentioned a mobile app that paused ad videos if you
weren't looking at the device. Maybe the approach you mention could be applied
in a similar way - to morph images in ads until you _do_ look at them.

~~~
danso
This problem has mostly been solved. You don't need a deep-learning algorithm
to detect a face. An entire company has sprouted from online education by
providing exam-proctoring solutions (the webcam is turned on, and facial
movement that suspiciously turns toward offscreen areas is flagged). I'm not
even sure the phone's camera would have to be on to do the mobile-app
detection you describe.

~~~
arethuza
That's what I meant - applications that detect whether you are looking at an
ad are already out there. My point was that rather than simply pausing the
video, it could change the image iteratively until you _do_ look at it.

------
danso
I don't see this being of huge use to most media campaigns in which a human
editor is involved...though perhaps it could be one of several first-pass
filters used to go through a digital photographer's memory card and filter
out the weakest images. (Personally, as a photographer, a much simpler tool
that can weed out obviously blurry or unfocused images would be much, much,
much more useful than something that does what I hope I already know how to
do, and enjoy doing: picking out my favorite photos.)

However, this algorithm would be immediately useful for people who need to
auto-crop photos more intelligently than "just fit this dimension and
ratio"...but this function has already been implemented to some degree by
various other computer-vision systems, such as Microsoft's Project Oxford:
[https://www.projectoxford.ai/vision](https://www.projectoxford.ai/vision)
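A toy version of that auto-crop idea: given a per-pixel memorability heat map (hand-built below; a real one would come from a model like MemNet), slide a fixed-size window over it and keep the placement with the most total heat:

```python
import numpy as np

def best_crop(heat, ch, cw):
    """Return (row, col) of the ch x cw window with the largest total heat.

    Brute-force search over all placements; fine for small maps.
    """
    h, w = heat.shape
    best, best_rc = -np.inf, (0, 0)
    for r in range(h - ch + 1):
        for c in range(w - cw + 1):
            s = heat[r:r + ch, c:c + cw].sum()
            if s > best:
                best, best_rc = s, (r, c)
    return best_rc

heat = np.zeros((8, 8))
heat[5:7, 2:4] = 1.0           # "memorable" hot spot in the lower-left area
crop = best_crop(heat, 4, 4)   # a 4x4 window that covers the hot spot
```

Real auto-croppers add constraints (aspect ratio, keeping faces whole), but the core "maximize what the map says matters" step looks like this.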

------
ioeu
> For each image, the algorithm produces a heat map showing which parts of the
> image are most memorable. By emphasizing different regions, they can
> potentially increase the image’s memorability.

Most memorable according to human subjects' subjective judgment on the matter?

> The team then pitted its algorithm against human subjects by having the
> model predicting how memorable a group of people would find a new never-
> before-seen image. It performed 30 percent better than existing algorithms
> and was within a few percentage points of the average human performance.

Who's to say human subjects are any good at objectively judging how memorable
a photo is? I feel like I'm missing something.

Edit: Riight, I guess it could be based on observing neural activity in human
subjects while they look at photos. That makes a lot more sense.

~~~
florianletsch
Not based on human subjects' opinions but on measured performance:

> The images had each received a “memorability score” based on the ability of
> human subjects to remember them in online experiments.

~~~
ioeu
In retrospect that seems much simpler than what I proposed in my edit, thanks.

------
ZeroGravitas
The point they seem to be going out of their way to avoid is whether humans
are any good at this task. Anyone know if they are?

~~~
visarga
They don't ask humans how memorable a photo is. They run experiments on human
memory to measure it.

~~~
ZeroGravitas
I was going to quote the bit from the article, but on re-reading it, it's
possibly just mangled English in the article that's confused me.

------
cheriot
Forget marketing, I just want to spend less time reviewing travel photos.

~~~
knughit
Google Photos "Stories" already does this.

------
eveningcoffee
These are interesting bits:
[http://memorability.csail.mit.edu/download.html](http://memorability.csail.mit.edu/download.html)

[https://people.csail.mit.edu/khosla/papers/iccv2015_khosla.p...](https://people.csail.mit.edu/khosla/papers/iccv2015_khosla.pdf)

------
ioeu
> While deep-learning has propelled much progress in object recognition and
> scene understanding, predicting human memory has often been viewed as a
> higher-level cognitive process that computer scientists will never be able
> to tackle

Seriously?

Has it seriously often been viewed like that?

~~~
eli_gottlieb
The AI Effect: if a computer can do it, it must not be thinking.

The Reverse AI Effect: if it's thinking, then surely it can't be done by a
computer.

Because Humans Are Special.

------
pseud0r
Sounds like maybe this could be used for deep-learning networks to learn what
to pay attention to. The memorable regions should be the salient or important
regions in the image. Basically the more you are paying attention, the more
memorable something is.
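A sketch of that attention idea, with made-up shapes: treat a (hypothetical) memorability map as attention weights via a softmax, then pool per-location feature vectors with those weights so memorable regions dominate the image descriptor:

```python
import numpy as np

rng = np.random.default_rng(1)
heat = rng.uniform(size=(4, 4))      # hypothetical memorability map
feats = rng.normal(size=(4, 4, 8))   # an 8-dim feature vector per location

# Numerically stable softmax over all spatial positions.
w = np.exp(heat - heat.max())
attn = w / w.sum()                   # non-negative, sums to 1

# Attention-weighted average pooling: one 8-dim descriptor for the image.
pooled = (attn[..., None] * feats).sum(axis=(0, 1))
```

In a trained network the map itself would be learned, but this is the basic "weight features by where attention falls" computation.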

------
pmontra
Maybe the whole world is uploading pictures right now, but after several
minutes the site is still "Computing..." on the one I uploaded. Is it only me?
I tried with Firefox and Opera.

------
dang
Also
[https://news.ycombinator.com/item?id=10747490](https://news.ycombinator.com/item?id=10747490).

~~~
danso
There's an interesting tidbit there:

> _The latest version of MemNet is available online. Being an amateur cat
> photographer myself, I decided to give this a try. Apparently, the most
> memorable part of Mr. Tango Tangerine’s face is his left ear_

The cat photo is pretty ordinary, as far as photos of cats taken by their
loving owners go. Though I could imagine why the algorithm behaved the way it
did, I'd be interested in hearing anyone try to argue that it picked
something remotely relevant to the human experience. I mean, if the ear were
deformed or on fire, sure...but it's not interesting in any way, even if you
take the tack of "well, all cats look the same anyway, so no one will
remember the cat's face".

That said, huge kudos to the MIT researchers for not only open-sourcing their
work but also releasing a straightforward REST API that makes it easy for
anyone to test out their algorithm.

