
The ImageNet dataset transformed AI research - ozdave
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
======
komali2
>Matthew Zeiler built Clarifai based off his 2013 ImageNet win, and is now
backed by $40 million in VC funding.

I used Clarifai at a hackathon a couple of months ago. It was certainly
impressive, but I didn't really see the difference between it and Google's
Cloud AI offerings, other than that Google seemed to have a better handle on
deployment and production scaling.

I guess I'm wondering how companies like Clarifai and the handful of other
similar "AI API cloud service" companies intend to take on behemoths like
Google?

~~~
goberoi
I tried out the top 5 computer vision API vendors last year. See my findings,
along with examples of output from all of them, here:
[https://goberoi.com/comparing-the-top-five-computer-vision-a...](https://goberoi.com/comparing-the-top-five-computer-vision-apis-98e3e3d7c647)

At the time, Clarifai was the "best" one (I put that in quotes because this
was for a small corpus, with subjective results, not a real train-test
cycle). I re-ran the results about a month ago (linked from the post), and
found that Google and others have continued to invest and improve.

~~~
contingencies
Great overview. Clarifai is certainly extremely impressive.

Do you play the tablas? My wife and I studied sitar, but our instrument was
destroyed by the movers in our latest relocation to Shenzhen :( The tabla
teacher where we studied was able to play a very complex taal while chewing
betel nut and rolling his eyes back in their sockets, immediately switch to a
pitch-, bend- and time-perfect rendition of the 'Pink Panther' melody, then
switch back to a very complex taal without skipping a beat. Brilliant to see.

~~~
goberoi
Yeah, Clarifai did well. I am keen to learn how well their custom model
feature works. Per their FAQ[0], you only need to supply 20-50 images per
concept. That seems remarkable to me, given that a concept like 'cow' has
~1500 images on ImageNet[1]. Perhaps they are using some sort of transfer
learning to facilitate this? I.e. using a pretrained model, and then only
retraining the last few fully connected layers, or retraining parts of the
entire network?

I am not a deep learning practitioner, but I would be curious to hear from
experts how their custom model feature might work, and from any of their
users how well it actually does.

Tablas: haha, great description of your teacher. I do play, with enthusiasm,
but poorly. For those in Seattle, there is an amazing teacher who teaches up
on Cap Hill [2].

[0] [http://help.clarifai.com/custom-training/custom-training-faq](http://help.clarifai.com/custom-training/custom-training-faq)

[1] [http://image-net.org/synset?wnid=n01887787](http://image-net.org/synset?wnid=n01887787)

[2] [http://www.acitseattle.org](http://www.acitseattle.org)

~~~
cmarschner
It is not necessary to train things from scratch; you take the largest
ImageNet model available and fine-tune it for the task. This way it reuses
much of the lower layers that have seen lots of data.
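
For the curious, here's a minimal sketch of that kind of fine-tuning in
PyTorch, freezing the pretrained layers and retraining only a new
classification head. This is one common recipe, not a claim about what
Clarifai actually does; the model choice and layer sizes are illustrative.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a ResNet pretrained on ImageNet.
    model = models.resnet50(pretrained=True)

    # Freeze the pretrained layers so their weights stay fixed.
    for param in model.parameters():
        param.requires_grad = False

    # Swap in a new final layer sized for the custom task
    # (e.g. 10 concepts); only its weights are trainable.
    num_concepts = 10
    model.fc = nn.Linear(model.fc.in_features, num_concepts)

    # Hand only the new head's parameters to the optimizer,
    # then run a standard training loop over the small dataset.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

With the convolutional features frozen, only the small new head has to be
fit, which is plausibly why 20-50 images per concept can be enough.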

------
jjawssd
Never forget about the most important person in all recent neural network
object recognition research: Mark Everingham.

Mark was the key member of the VOC project, and it would have been impossible
without his selfless contributions.

[https://www.computer.org/csdl/trans/tp/2012/11/ttp2012112081...](https://www.computer.org/csdl/trans/tp/2012/11/ttp2012112081.pdf)

[http://www.bmva.org/w/doku.php?id=obituaries:mark_everingham](http://www.bmva.org/w/doku.php?id=obituaries:mark_everingham)

Mark Everingham passed away in 2012, bringing the VOC project to an end.

------
relate
Shameless plug: we have a workshop today @ CVPR 2017 that explores the
classification task of ImageNet using crawled (and noisy) data:
[https://www.vision.ee.ethz.ch/webvision/workshop.html](https://www.vision.ee.ethz.ch/webvision/workshop.html)

In contrast to ImageNet, only the validation and test data were manually
labeled. Impressively, even with such noisy training data, the winning
solution this year achieves ~95% top-5 accuracy.

------
Maven911
ImageNet is the big one for image recognition; are there other AI/ML/DL
competitions that people have heard of?

~~~
jonbaer
The Loebner Prize (on the NLP side) should be interesting this year -
[http://www.aisb.org.uk/events/loebner-prize](http://www.aisb.org.uk/events/loebner-prize).
On the AI planning side there are usually the StarCraft competitions -
[https://sscaitournament.com](https://sscaitournament.com) - interested to
see how DeepMind performs.

------
contingencies
Tangential anecdote: the article mentions that ImageNet was conceived through
WordNet[0]. I actually downloaded WordNet again last night with the idea of
relocating qualifying adjectives from certain noun phrases in a system to a
separate list of qualifiers, i.e. "adj1 adj2 noun" ->
{"noun",["adj1","adj2"]}

What I found was that WordNet has crap coverage, e.g. large numbers of real
adjectives were unlisted. In my ancient Chinese translation work (a hobby), I
have basically concluded that Wiktionary[1] is the best thing going, and also
the easiest to contribute corrections and background data to. So I thought
"why not adapt English Wiktionary data[2] to produce an English adjective
list?". This proved far more complicated than a quick hack, since the only
way the data seemed to be supplied was as a huge XML dump mixing wiki and XML
markup.[3] In the end I downloaded a Wiktionary-derived dataset known as
dbnary[4] and parsed wordlists out of that, resulting in a satisfying list of
103,125 English adjectives. Total time invested: under 2 hours.

[0] [http://wordnet.princeton.edu/](http://wordnet.princeton.edu/)

[1] [http://en.wiktionary.org/](http://en.wiktionary.org/)

[2] [https://dumps.wikimedia.org/backup-index.html](https://dumps.wikimedia.org/backup-index.html)

[3] [https://dumps.wikimedia.org/enwiktionary/20170720/enwiktiona...](https://dumps.wikimedia.org/enwiktionary/20170720/enwiktionary-20170720-pages-articles.xml.bz2)

[4] [http://kaiko.getalp.org/about-dbnary/](http://kaiko.getalp.org/about-dbnary/)
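
The transformation itself is then trivial; a minimal sketch in Python,
assuming a plain-text adjective wordlist, one word per line (the filename is
hypothetical, standing in for the list parsed out of dbnary):

    def load_adjectives(path="english_adjectives.txt"):  # hypothetical file
        with open(path) as f:
            return {line.strip().lower() for line in f if line.strip()}

    def split_noun_phrase(phrase, adjectives):
        """e.g. 'large red barn' -> ('barn', ['large', 'red'])"""
        words = phrase.lower().split()
        qualifiers = []
        for i, word in enumerate(words):
            if word not in adjectives:
                # The first non-adjective word starts the noun head.
                return " ".join(words[i:]), qualifiers
            qualifiers.append(word)
        return None, qualifiers  # the whole phrase was adjectives

    adjectives = load_adjectives()
    print(split_noun_phrase("large red barn", adjectives))
    # -> ('barn', ['large', 'red'])

(Naive, of course: words like "light" that are both adjective and noun will
be mis-split.)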

~~~
PeterisP
BabelNet [http://babelnet.org/](http://babelnet.org/) or Open Multilingual
Wordnet
([http://compling.hss.ntu.edu.sg/omw/](http://compling.hss.ntu.edu.sg/omw/))
might be interesting for you.

However, the prime use case of WN is as a _verified_ inventory of
_machine-readable senses_, providing structured information about the
semantic attributes of a word via semantic links to other words/senses.
Wiktionary will tell you that a word is an adjective, but it doesn't tell a
machine what that adjective means or how it relates to other adjectives.

E.g., the Wiktionary entry on 'corgi' has a few sentences that mention that
it's a type of dog, but they're not structured enough for this information to
be used, and if you just extract it yourself without manual verification it
won't have high accuracy; WN, on the other hand, has an explicit structured
link (hypernym) between corgi and dog.
([http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=...](http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=corgi&i=2&h=100#c))
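
That link is directly machine-readable, e.g. via NLTK's WordNet interface (a
quick sketch; it assumes the wordnet corpus has been downloaded):

    from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet')

    corgi = wn.synsets('corgi')[0]  # Synset('corgi.n.01')
    print(corgi.hypernyms())        # the explicit corgi -> dog link

    # Walk the full hypernym chain up to the root ('entity').
    print([s.name() for s in corgi.hypernym_paths()[0]])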

~~~
contingencies
Interesting, though those features were not required for yesterday's
application. Now I'm getting on to something more related to syntax/grammar
interpretation (i.e. to a basic level only,
parse-the-structure-of-the-sentence: this part is a "verb phrase", that part
is a "noun phrase", etc.)... any idea what a good software library would be?
I found [https://spacy.io/](https://spacy.io/) +
[https://github.com/emilmont/pyStatParser](https://github.com/emilmont/pyStatParser)
+ [http://spes.sourceforge.net/](http://spes.sourceforge.net/) +
[https://stackoverflow.com/questions/2699646/how-to-get-logic...](https://stackoverflow.com/questions/2699646/how-to-get-logical-parts-of-a-sentence-with-java/2703107#2703107)

~~~
PeterisP
spaCy is an option; for syntactic parsing, you might also want to take a look
at Stanford CoreNLP
([https://stanfordnlp.github.io/CoreNLP/](https://stanfordnlp.github.io/CoreNLP/)),
NLTK ([http://www.nltk.org/](http://www.nltk.org/)), or SyntaxNet
([https://github.com/tensorflow/models/tree/master/syntaxnet](https://github.com/tensorflow/models/tree/master/syntaxnet)).
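
For the basic parse-the-structure level described above, spaCy's noun chunks
and dependency labels may already be enough. A minimal sketch, assuming the
small English model is installed:

    import spacy

    # Install the model first: python -m spacy download en_core_web_sm
    nlp = spacy.load('en_core_web_sm')
    doc = nlp("The tabla teacher played a very complex taal.")

    # Noun phrases come straight off the parse.
    for chunk in doc.noun_chunks:
        print(chunk.text)  # "The tabla teacher", "a very complex taal"

    # Each token carries a part-of-speech tag and a dependency label,
    # from which verb phrases can be pieced together.
    for token in doc:
        print(token.text, token.pos_, token.dep_, token.head.text)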

------
unityByFreedom
Looking forward to when someone makes a significantly sized, fully labeled
medical dataset. Currently, the main thing holding us back on diagnostics
using machine learning isn't the software or hardware - it's the lack of
labeled data, which is costly to prepare because it requires technologists to
work alongside doctors. Still, the results could be huge.

I think machine learning in medicine will save more lives than self-driving
cars.

~~~
PeterisP
It's _highly_ problematic due to privacy issues. It would be a nightmare to
get a public release of the underlying unlabeled data, especially since we
know how well deanonymization can work.

~~~
unityByFreedom
There already is some that is publicly available, like the Kaggle lung cancer
competition's dataset.

I'm not saying it's easy on any front. It is _possible_, and it would be
hugely beneficial to humans to have such a dataset. Raising awareness about
the problem may be one step towards solving it.

~~~
nonbel
The problem with a lot of medical labeling is the labels are not very
consistent/accurate to begin with. Just go to three doctors with the same
symptoms and see what they say...

~~~
unityByFreedom
That is surmountable. You can use annotator agreement, and algorithms don't
need perfectly labeled data in order to improve.

Any effort to create additional medical datasets would probably help enhance
machine learning diagnostics.

The data that you want to be perfect is the test set, which is much smaller
than the training set.
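
As a concrete example of measuring annotator agreement, here is a minimal
sketch of Cohen's kappa for two annotators; the labels are made up:

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Agreement between two annotators, corrected for chance."""
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Chance agreement from each annotator's marginal label frequencies.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
        return (observed - expected) / (1 - expected)

    a = ["malignant", "benign", "benign", "malignant", "benign"]
    b = ["malignant", "benign", "malignant", "malignant", "benign"]
    print(cohens_kappa(a, b))  # ~0.615: decent but imperfect agreement

Kappa near 1 means the annotators agree far more than chance would predict;
disagreements flag the cases worth adjudicating before they go into a test
set.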

------
eanzenberg
"Li said the project failed to win any of the federal grants she applied for,
receiving comments on proposals that it was shameful Princeton would research
this topic, and that the only strength of proposal was that Li was a woman."

Jesus..

~~~
scottLobster
Stories like this are part of the reason I never went into academia. It's
amazing how many highly educated people are judgmental and inflexible, and in
academia one person like that can sink you. As an engineer in the private
sector, worst case I can find a better team/company to work with.

------
bra-ket
FYI Kaggle is hosting all 3 ImageNet challenges now:
[https://www.kaggle.com/c/imagenet-object-detection-from-vide...](https://www.kaggle.com/c/imagenet-object-detection-from-video-challenge)

------
dmix
Not just a dataset, but a hacking community and contest:

> Matthew Zeiler built Clarifai based off his 2013 ImageNet win, and is now
> backed by $40 million in VC funding.

On a side note, this looks like a fun company to build: hacking on ML,
exposing it via APIs, developers as customers, paid for on a resource basis
like a VPS-style per-use payment plan.

It's interesting how many of these AI startups are raising huge rounds. The
word "AI" seems to draw investors in. At least this one has an obvious
business model.

~~~
eli_gottlieb
I went to an AI talk at MIT CSAIL (the same Vicarious one I mentioned in
another thread). A guy and I got talking on the way out. He figured I seemed
pretty clever, so he started asking me what sorts of companies to invest in.

It turned out he was an investor with some VC firm. I briefly mentioned the
place I worked at the time (which isn't an AI company, but wanted to be when
it was young), but didn't have a business card to give him at the time.

I was actually kinda surprised an investor was attending a talk and asking,
quite so uncritically, "hmmm, who do the smart people at talks say I should
invest with?"

------
nightcracker
Is it possible to hold an image classification challenge without secret test
data, in a way that doesn't lead to overfitting?

~~~
Govindae
Yeah, just specify the test set and ask everyone not to train on it.

------
ouid
ImageNet is a truly terrible dataset. Any image recognition task that humans
are not >99.9% accurate on is not an image recognition task.

~~~
PeterisP
99.9% is an _extremely_ high bar. People aren't machines; people make errors.
If you asked them to say whether a square is black or white, you would get
less than 99.9% accuracy.

~~~
ouid
99.9% is a very low bar for image recognition. If one out of every thousand
times you saw a car you misclassified it as a chair, you would be dead.

~~~
mturmon
It sounds like you may not have experience in this domain? You're
benchmarking an active, time-based task (a person "seeing a car", implicitly
allowing eye saccades, multiple images, context, depth) against a static task
("classify this single image based on no context"). They are not comparable.

~~~
ouid
Have you actually tried the ImageNet task yourself?

The reason ImageNet is popular is that it has allowed Google to claim they
can beat human performance on image recognition by artificially lowering
human performance on image recognition.

