
Facebook’s automatic alt-text for images - boyter
https://www.facebook.com/zuck/videos/10102762457220381/
======
hacker_9
Accurate, automatic descriptions of snapshots of people's lives?? This is
surely sending shock waves down the data mining community. Additionally,
Facebook says this was built as an aid for blind people, but this is
surely just a cover for taking targeted ads to the next level.

~~~
ma2rten
_This is surely sending shock waves down the data mining community._

Not really. From a technical point of view there is nothing impressive about
this, except that they spent the time and money to collect all the training
data. Also, the fact that it says "this image may contain:" tells you
something about how accurate this actually is.
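That hedging maps directly onto a confidence threshold: only labels the classifier scores highly enough make it into the alt text. A toy sketch in Python (the labels, scores, and the 0.8 cutoff are all made up):

```python
def alt_text(labels, threshold=0.8):
    """Build a hedged alt-text string from (label, confidence) pairs,
    keeping only labels the classifier is reasonably sure about."""
    kept = [label for label, conf in labels if conf >= threshold]
    if not kept:
        return "Image may contain: no description available."
    return "Image may contain: " + ", ".join(kept) + "."

# Hypothetical classifier output for one photo
tags = [("two people", 0.95), ("smiling", 0.91), ("outdoor", 0.85), ("dog", 0.42)]
print(alt_text(tags))  # Image may contain: two people, smiling, outdoor.
```

The low-confidence "dog" tag is silently dropped rather than risked in the description, which is presumably the same trade-off Facebook is making.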

~~~
lucaspiller
TensorFlow + Inception is about this accurate. For high-level categories and
popular items it's pretty good, but if you want to dig deeper (e.g. identify
the type of animal in a picture) it'll need more training.

Google Photos is a lot smarter in that it can identify what a set of photos
contains. As an example I noticed all pictures of my graduation (which had no
metadata saying it was that) have been grouped into a 'Graduation' album. It's
also kind of scary...

~~~
r00fus
Did you have EXIF data on your pictures? Similar location + even a single
picture with an academic cap could easily provide enough information for
categorization...
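To illustrate: with nothing but timestamps and GPS coordinates you can already cluster photos into "events". A toy sketch (thresholds invented, coordinates treated naively as raw degrees):

```python
def group_events(photos, max_gap_hours=6.0, max_dist_deg=0.05):
    """Group photos, given as (hours_since_epoch, lat, lon) tuples, into
    events: start a new group whenever the time gap or the coordinate
    distance to the previous photo exceeds a threshold."""
    groups = []
    for p in sorted(photos):
        if groups:
            t, lat, lon = groups[-1][-1]
            close_in_time = p[0] - t <= max_gap_hours
            close_in_space = abs(p[1] - lat) + abs(p[2] - lon) <= max_dist_deg
            if close_in_time and close_in_space:
                groups[-1].append(p)
                continue
        groups.append([p])
    return groups

# Two photos minutes apart at one place, then one a week later elsewhere
photos = [(0.0, 40.0, -74.0), (1.5, 40.001, -74.001), (170.0, 34.0, -119.4)]
print(len(group_events(photos)))  # 2 events
```

Add one "academic cap" classification to an event like that and labelling the whole cluster "Graduation" is easy.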

------
speedyapoc
I always find Facebook's example feed to be funny since it's a completely
unrealistic depiction of what their site actually is for most users.

Just quickly looking at the top few posts in my feed, I see someone
celebrating their two year friendship with someone I don't know, one person
sharing a link to a new airplane, four people sharing videos, one person
liking a sponsored video, and finally one person updating their profile
picture.

I wish I could see actual updates from people instead of being kept abreast of
what piece of third-party content they've liked at some point in time, what
third-party content they're sharing, etc.

~~~
manigandham
How is this unrealistic? This is exactly the kind of stuff my feed shows. The
people you follow probably aren't posting any updates (that you like to
see)... and sharing content _is_ an update, that's what that person decided to
post.

Your feed is what you make it.

~~~
NeutronBoy
Every time I see someone complain that their feed is full of memes, junk, and
chain posts, I think 'Maybe you should just remove that person from your
Facebook?'.

------
ospfer
Last July, I led a project in support of a federal agency to analyze current
business processes and identify weaknesses in the agency's Section 508 office.
My work focused primarily on externally accessible internet sites, and one of
the most common 508 violations that we encountered was the lack of ALT text on
images. This agency utilized a number of automated scanning tools and
processes, but lacked any ability to efficiently remediate these errors. While
we never got beyond a conceptual standpoint, a coworker and I discussed
something along the lines of what Facebook has accomplished here through the
use of neural networks. Very cool to see this advancement come to life.

~~~
dr_zoidberg
This kind of system/algorithm also makes it possible to assign a certain
semantic component to images (taken with a grain of salt, of course), which
might enable further developments that weren't considered possible yet.

Sadly, it also brings a whole new set of cases to the oh-so-annoying "but
Facebook/Google/Twitter/Amazon does it!" clichés that we'll now have to deal
with...

------
verusfossa
I'm waiting for the day this is just a library you pass an image to and it
returns an array. No, not a SaaS. Then on my own pump.io, diaspora, redMatrix
etc. it just works. My data, my images, my network. I'm not against the tech
at all though. Neat

~~~
ma2rten
There are already pre-trained networks out there. TensorFlow comes with an
example command line tool that you can pass any image and it will tell you
what is in the image.

The classes that it can detect are from ImageNet, so that might be limiting.
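The bundled demo (classify_image.py) prints the top five ImageNet classes with their scores; the decoding step at the end is just a top-k over the softmax output. A toy sketch with made-up labels and scores:

```python
def top_k(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first,
    mimicking the output format of TensorFlow's classify_image.py demo."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical softmax output over a toy label set
labels = ["tabby cat", "golden retriever", "laptop", "coffee mug"]
scores = [0.71, 0.12, 0.09, 0.08]
for label, score in top_k(scores, labels, k=2):
    print(f"{label} (score = {score:.2f})")
```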

~~~
dr_zoidberg
If you were to train your own net, ImageNet is one of the biggest and most
complete datasets and you'd surely use it for training. The alternative is to
make your own training set, which will cost you money and/or time. For a proof
of concept or initial prototype (until your business can pay for it), those
classes should be enough.

~~~
ma2rten
ImageNet is not going to have something like "smile" in its dataset like they
showed in the video. It has all kinds of possible dog breeds instead.

Maybe someone should create a website that lets volunteers label images for
this purpose.

~~~
dr_zoidberg
Yeah, in "Person, individual, someone, somebody, mortal, soul" there's the
deeper category "smiler" which contains the categories "smirker" and
"simperer". Some of the images are (note: my idea was to link to the images
directly, but it isn't loading, so I had to take a screenshot and upload to
imgur):

* [http://i.imgur.com/Wex6pSR.png](http://i.imgur.com/Wex6pSR.png)

Which I'd say is pretty much what you'd need to train a net to detect people
smiling (amongst other things). Of course, there are refinements you can
make to improve accuracy and the presentation of the results. My point was: you
should begin with datasets that are readily available, and then improve as
needed (and if resources are available to justify the investment).

------
skrjon
There is more information on what Facebook is doing on the research site.

[https://research.facebook.com/blog/how-blind-people-
interact...](https://research.facebook.com/blog/how-blind-people-interact-
with-visual-content-on-social-networking-sites/)

Including a link to the publication that was written on the technology here.

[https://research.facebook.com/publications/how-blind-
people-...](https://research.facebook.com/publications/how-blind-people-
interact-with-visual-content-on-social-networking-services/)

I think it's exciting and an honest attempt to make people's lives better.

------
sidcool
With all due respect to conspiracies, this is a cool feature.

------
bla2
Warning, that page has an auto-play video with sound.

------
shogun21
This might be asking for too much, but why not use more of the image metadata
alongside these computer vision techniques?

If I were blind, I really wouldn't care that this is an image of "two people,
smiling". Facebook has facial recognition, tagging, and locations. It would be
much more valuable to me to say "Peter and Laura smiling at Channel Islands
State Park."
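The glue would be trivial if that data were exposed. A hypothetical sketch (the function and all inputs are invented; the face tags, detected action, and check-in would come from Facebook's existing systems):

```python
def rich_alt_text(names, action, place=None):
    """Compose alt text like "Peter and Laura smiling at Channel Islands
    State Park" from face tags, a detected action, and a check-in location."""
    if not names:
        subject = "People"
    elif len(names) == 1:
        subject = names[0]
    else:
        subject = ", ".join(names[:-1]) + " and " + names[-1]
    text = f"{subject} {action}"
    if place:
        text += f" at {place}"
    return text + "."

print(rich_alt_text(["Peter", "Laura"], "smiling", "Channel Islands State Park"))
# Peter and Laura smiling at Channel Islands State Park.
```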

~~~
visarga
They are just tagging, but there are ML solutions for describing images in
natural language.

Demo: [http://googleresearch.blogspot.ro/2014/11/a-picture-is-
worth...](http://googleresearch.blogspot.ro/2014/11/a-picture-is-worth-
thousand-coherent.html)

------
chippy
I'd like to compare Facebook's image tagging with Google Cloud Vision API
[https://cloud.google.com/vision/](https://cloud.google.com/vision/) I think
it would be interesting to see which one is more accurate or verbose.

------
TazeTSchnitzel
I suppose it'll be like YouTube's automatic subtitles for audio. It'll do a
bad, but passable, job: at least the blind and visually impaired have _some_
idea of what the image contains.

------
whatever_dude
"Cat. Cats. Cat. Baby. Dog. Baby. Cat. Baby with dog."

~~~
visarga
Bag. Duck face. Nails. Duck face.

------
SimeVidas
Glad to see Mark explaining what a screen reader is to millions of people :-D

------
buro9
This is really what I wanted to use the Google Image API to do.

But it's _way_ too expensive.

All I wanted was keywords for alt-text, dimensions for placeholder, and the
dominant colour for placeholder background.

[https://cloud.google.com/vision/](https://cloud.google.com/vision/)

The price for that would be $7.50 per 1,000 images for the first million
images.

I have some 60,000 images on the site I run and don't happen to have $450 in
loose change lying around (the whole site costs less than that to run each
month).

I guess I don't care about alt tags that much.
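For what it's worth, two of those three (dimensions and dominant colour) don't need an ML API at all: any image library gives you the dimensions for free, and the dominant colour is a small counting exercise. A toy sketch, assuming you've already decoded the image into a flat list of RGB tuples:

```python
from collections import Counter

def dominant_colour(pixels, bucket=64):
    """Estimate an image's dominant colour from its RGB pixels by coarse
    quantisation: snap each channel to a bucket, count the buckets, and
    return the centre of the most common one."""
    def quantise(p):
        return tuple((c // bucket) * bucket + bucket // 2 for c in p)
    counts = Counter(quantise(p) for p in pixels)
    return counts.most_common(1)[0][0]

# Mostly dark-blue pixels with a couple of red outliers
pixels = [(10, 20, 200)] * 8 + [(250, 5, 5)] * 2
print(dominant_colour(pixels))  # (32, 32, 224)
```

That leaves only the keywords needing a paid API (or one of the pre-trained nets mentioned elsewhere in this thread).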

~~~
fudged71
Re-upload them to facebook then scrape the generated descriptions ;)

------
cphoover
Is this tool open source? would be a great contribution to the accessibility
community.

------
tlrobinson
Very cool.

Obvious next step: build this into the OS/browser/screen reader.

------
Spearchucker
I do wonder when somebody will use this technology to troll.

~~~
Crespyl
The techniques already exist[0] for some sophisticated trolling, though it may
be hard to achieve in practice without direct access to the classifier being
used.

[0] [http://karpathy.github.io/2015/03/30/breaking-
convnets/](http://karpathy.github.io/2015/03/30/breaking-convnets/)
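The core trick in [0], boiled down to a linear classifier instead of a convnet (all numbers below are invented): nudge every input dimension by a fixed step in the direction that lowers the score, and a confidently positive prediction flips.

```python
def fgsm(x, w, eps):
    """Fast-gradient-sign-style perturbation against a linear classifier
    score(x) = w . x + b: shift each input dimension by eps against the
    sign of its weight, lowering the score by eps * sum(|w|)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * sign(wi) for xi, wi in zip(x, w)]

def score(x, w, b):
    return sum(xi * wi for xi, wi in zip(x, w)) + b

w, b = [2.0, -3.0, 1.0], 0.5
x = [1.0, -0.5, 0.2]           # score = 4.2 -> confidently class 1
x_adv = fgsm(x, w, eps=1.0)    # score drops by eps * (2 + 3 + 1) = 6
print(score(x, w, b), score(x_adv, w, b))  # positive before, negative after
```

Against a deep net the same idea needs the gradient of the loss with respect to the pixels, which is why access to the classifier (or a good substitute model) matters.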

------
nickysielicki
that's honestly very creepy

------
odinduty
Ah, the Twitter app for Android (beta version) recently added a feature that
allows you to add a description to pictures you upload, for visually impaired
people.

------
d33
Obligatory: [https://www.youtube.com/watch?v=_wXHR-
lad-Q](https://www.youtube.com/watch?v=_wXHR-lad-Q)

It's impressive that people don't really connect the dots and see that as a
huge threat to their freedom.

