Hacker News
The ImageNet dataset transformed AI research (qz.com)
198 points by ozdave 11 months ago | 55 comments

>Matthew Zeiler built Clarifai based off his 2013 ImageNet win, and is now backed by $40 million in VC funding.

I used Clarifai at a hackathon a couple of months ago. It was certainly impressive, but I didn't really see the difference between it and Google Cloud's AI offerings, other than that Google seemed to have a better handle on deployment and production scaling.

I guess I'm wondering how companies like Clarifai and the couple of other similar "AI API cloud service" companies intend to take on behemoths like Google?

I tried out the top 5 computer vision API vendors last year. See my findings, along with examples of output from all of them here: https://goberoi.com/comparing-the-top-five-computer-vision-a...

At the time, Clarifai was the "best" one (I use quotes because this was for a small corpus, with subjective results, not a real train-test cycle). I re-ran the results about a month ago (linked from the post), and found that Google and others have continued to invest and improve.

Great overview. Clarifai is certainly extremely impressive.

Do you play the tablas? My wife and I studied sitar but our instrument was destroyed by the movers in our latest relocation to Shenzhen :( The tabla teacher where we studied was able to play a very complex taal while chewing betel nut and rolling his eyes back in their sockets, immediately switch to a pitch, bend and time-perfect rendition of 'pink panther' melody, then switch back to a very complex taal without skipping a beat. Brilliant to see.

Yeah, Clarifai did well. I am keen to learn how well their custom model feature works. Per their FAQ[0], you need only supply 20-50 images per concept. That seems remarkable to me, given that a concept like 'cow' has ~1500 images on ImageNet[1]. Perhaps they are using some sort of transfer learning to facilitate this? I.e. starting from a pretrained model, then retraining only the last few fully connected layers, or fine-tuning parts of the entire network?

I am not a deep learning practitioner, but would be curious to know from experts how their custom model feature might work; and from any of their users on how well it actually does.

Tablas: haha, great description of your teacher. I do play, with enthusiasm, but poorly. For those in Seattle, there is an amazing teacher who teaches up on Cap Hill [2].

[0] http://help.clarifai.com/custom-training/custom-training-faq

[1] http://image-net.org/synset?wnid=n01887787

[2] http://www.acitseattle.org

It is not necessary to train things from scratch; you take the largest ImageNet model available and fine-tune it for the task. This way it reuses much of the lower layers, which have already seen lots of data.
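To make the fine-tuning idea concrete, here's a toy, stdlib-only sketch. The "frozen" feature extractor below is a hand-written stand-in for a pretrained backbone (in reality it would be, say, a ResNet's convolutional layers); only the linear head is trained, on just six labeled examples, which is why so few images per concept can suffice:

```python
# Toy stand-in for a frozen pretrained backbone: a fixed feature map
# that is never updated during training.
def frozen_features(x, y):
    return [x, y, x * x + y * y]  # fixed nonlinear features

# "Fine-tune" only the classification head (a linear layer) on a
# handful of labeled examples, mirroring the 20-50-per-concept regime.
def train_head(samples, epochs=200, lr=0.1):
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x, y), label in samples:
            f = frozen_features(x, y)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
            err = label - pred  # perceptron-style update on the head only
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

def predict(w, b, x, y):
    f = frozen_features(x, y)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
```

Because the frozen features already make the classes linearly separable, a tiny labeled set is enough to fit the head; that is the essence of the trick.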

Can you share any thoughts on which would be best for a computer vision newbie and programming novice to get started playing with? Or are none of them really great for that?

These aren't really great for learning about computer vision or deep learning, but are great for building projects that require image classification.

E.g., recall jacquesm's awesome Lego-sorting project[0]? He built his own model using Keras and TensorFlow, but you may be able to achieve similar results by using Clarifai's feature for training your own models, with no understanding of deep learning required. This is great if your goal is to build a thing, like a Lego sorter, but not so much if you want to learn how to build a state-of-the-art image classifier.

If you're interested in learning about computer vision or deep learning, I recommend searching this site to find threads that cover that extensively. Good luck!

[0] https://jacquesmattheij.com/sorting-lego-the-software-side

API.AI was acquired by Google. I think they nailed the agent design interface.

This is the age of AI infrastructure.

IBM, Apple, Amazon, Facebook, Google will all buy these types of companies until they have what they need in house. Other bigger FXXX companies may buy a few as well if customer pilots go well.

I don't think they will buy too many of them, though. It seems like Google/Facebook are actually leading this area, and very few companies can compete with them even research-wise, let alone startups. Big companies have data (and if not, they can afford massive data annotation), machines (thousands of connected GPUs), and other infrastructure. They are the data monopolies.

The point of this kind of startup is not to sell the tech, but to assemble a team of highly capable, experienced engineers. The big corporations will then buy out the startup just to get the talent. This is called acqui-hiring.

The problem is... Google/Facebook, etc., already have the best people, and others are following them. Why would they need to acqui-hire people who are worse than what they already have? For less prestigious companies, though, those small startups might still be attractive.

Nothing, as far as I can tell; Clarifai isn't doing anything novel or pushing research in the field. It looks like an ego project for their CEO.

They launched almost two years before TensorFlow and the Google Vision APIs were released. At the time there were only a few labs doing deep learning research, so an API seemed like the best way to make the technology accessible to more people.

Publishing research papers is a great way to get acquired but it's not the best use of your time if you're a small startup that's trying to make money. They still do research but it makes it into the product instead of getting published.

AI is not for startups. Not until we have sufficiently reduced the demand for data and computational resources. Or you need to sell a product, like a car or camera, not merely an API.

Maybe custom / bespoke model training, plus they can offer a lot better customer support than Google. A sucker is born every minute...

Google allows you to train models, and I engaged with customer support when I used Google Cloud Platform. I agree their other services have terrible customer support, but it seems like once you start paying, it actually exists.

They also offer a search API and develop custom models for larger customers.

Never forget about the most important person in all recent neural network object recognition research: Mark Everingham.

Mark was the key member of the VOC project, and it would have been impossible without his selfless contributions.



Mark Everingham passed away in 2012, bringing the VOC project to an end.

Shameless plug: we have a workshop today @ CVPR 2017 that explores the classification task of ImageNet using crawled (and noisy) data: https://www.vision.ee.ethz.ch/webvision/workshop.html

In contrast to ImageNet, only the validation and test data were manually labeled. Impressively, on such data, an accuracy of ~95% (top5) is achieved by the winning solution this year.

ImageNet is the big one for image recognition; are there other AI/ML/DL competitions that people have heard of?

The Loebner Prize (on the NLP side) this year should be interesting - http://www.aisb.org.uk/events/loebner-prize - and on the AI planning side there are usually the StarCraft competitions - https://sscaitournament.com - interested to see how DeepMind performs.

On NLP side, many subfields tend to run annual shared task competitions, e.g. SemEval (http://alt.qcri.org/semeval2017/index.php?id=tasks) or TAC (https://tac.nist.gov/) for semantic analysis, WMT (http://www.statmt.org/wmt17/index.html) for machine translation, etc etc.

(Disclaimer: I work at Nexar)

Nexar just released Nexet, a large dataset of 55K road images (5K of which are labeled). I think it's an interesting one, as the images were captured by mobile phones (set up as dashcams), so you get "real world" data. Of course, there's an accompanying competition coming up: https://www.getnexar.com/challenge-2/

Kaggle, The Netflix Challenge, CASP

The non-profit platform https://www.crowdAI.org currently hosts a NIPS 2017 challenge as well as a genetic challenge, and constantly adds new challenges. Btw, it's completely open source, and keen on having contributors to help build the open challenge platform the community wants the most.

There's WIDERFace(http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/index.html) which is specific to facial recognition.

MS COCO, although I think that's part of the ImageNet contest now.

Slightly different, but I wanted to add numer.ai to the great list others have compiled.

Tangential anecdote: Incidentally, the article mentions that ImageNet was conceived through WordNet[0]. I actually downloaded WordNet again last night with the idea of relocating qualifying adjectives from certain noun phrases in a system to a separate list of qualifiers, i.e. "adj1 adj2 noun" -> {"noun", ["adj1", "adj2"]}

What I found was that WordNet has crap coverage, e.g. large numbers of real adjectives were unlisted. In my ancient Chinese translation work (hobby), I have basically concluded that Wiktionary[1] is the best thing going, and also the easiest to contribute corrections and background data to. So I thought, "why not adapt English Wiktionary data[2] to produce an English adjective list?" This proved far more complicated than a quick hack, since the only way the data seemed to be supplied was as a huge XML file mixing Wiki and XML markup.[3] In the end I downloaded a Wiktionary-derived dataset known as dbnary[4] and parsed wordlists out of that, resulting in a satisfying list of 103,125 English adjectives. Total time invested: under 2 hours.

[0] http://wordnet.princeton.edu/

[1] http://en.wiktionary.org/

[2] https://dumps.wikimedia.org/backup-index.html

[3] https://dumps.wikimedia.org/enwiktionary/20170720/enwiktiona...

[4] http://kaiko.getalp.org/about-dbnary/
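For anyone wanting to try the same trick, the core of the extraction is just filtering entries by part of speech. A minimal sketch, assuming a hypothetical tab-separated `word<TAB>pos` dump (the real dbnary data is RDF, so in practice you'd convert or parse that first):

```python
def extract_adjectives(lines):
    """Collect the unique words tagged as adjectives from word<TAB>pos lines."""
    adjectives = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[1] == "adjective":
            adjectives.add(parts[0])
    return sorted(adjectives)
```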

BabelNet http://babelnet.org/ or Open Multilingual Wordnet (http://compling.hss.ntu.edu.sg/omw/) might be interesting for you.

However, the prime use case of WordNet is as a verified inventory of machine-readable senses, providing structured information about the semantic attributes of a word via semantic links to other words/senses. Wiktionary will tell you that a word is an adjective, but it doesn't tell a machine what that adjective means or how it relates to other adjectives.

E.g., the Wiktionary entry on 'corgi' has a few sentences mentioning that it's a type of dog, but that isn't structured enough for the information to be used, and if you just extract it yourself without manual verification it won't have high accuracy; WordNet, on the other hand, has an explicit structured link (hypernym) between corgi and dog. (http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=...)
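To make the "structured link" point concrete, here's a toy sketch with a hand-built taxonomy (a stand-in for WordNet's hypernym pointers, not real WordNet data) showing how a machine can walk corgi -> dog -> animal:

```python
# Hand-built toy taxonomy; WordNet stores the same kind of explicit
# hypernym ("is-a") links between synsets.
HYPERNYM = {
    "corgi": "dog",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
}

def hypernym_chain(word):
    """Follow is-a links upward until reaching a root concept."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def is_a(word, concept):
    """True if `concept` appears anywhere above `word` in the taxonomy."""
    return concept in hypernym_chain(word)[1:]
```

This is exactly the kind of query a free-text Wiktionary definition can't answer without error-prone extraction.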

Interesting, though those features weren't required for yesterday's application. Now I'm getting on to something more syntax/grammar-interpretation related (i.e. to a basic level only, parse-the-structure-of-the-sentence: this part is a "verb phrase", that part is a "noun phrase", etc.) ... any idea what a good software library would be? I found https://spacy.io/ + https://github.com/emilmont/pyStatParser + http://spes.sourceforge.net/ + https://stackoverflow.com/questions/2699646/how-to-get-logic...

spaCy is an option; for syntactic parsing, you might also want to take a look at Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/), NLTK (http://www.nltk.org/), or SyntaxNet (https://github.com/tensorflow/models/tree/master/syntaxnet).
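To give a concrete picture of the "this part is a noun phrase" output, here's a toy chunker over tokens whose POS tags are hand-supplied (a real pipeline would get the tags, and better chunks, from spaCy or CoreNLP):

```python
def noun_chunks(tagged):
    """Return maximal runs of DET/ADJ/NOUN tokens that contain a noun.
    `tagged` is a list of (word, tag) pairs with hand-supplied tags."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in ("DET", "ADJ", "NOUN"):
            current.append((word, tag))
        else:
            # A run only counts as a noun phrase if it contains a noun.
            if any(t == "NOUN" for _, t in current):
                chunks.append(" ".join(w for w, _ in current))
            current = []
    if any(t == "NOUN" for _, t in current):
        chunks.append(" ".join(w for w, _ in current))
    return chunks
```

spaCy exposes essentially this as `doc.noun_chunks`, computed from a full dependency parse rather than a greedy tag scan.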

Looking forward to when someone makes a significantly sized, fully labeled medical dataset. Currently, the main thing holding us back on diagnostics using machine learning isn't the software or hardware; it's the lack of labeled data, which is costly to prepare because it requires technologists to work alongside doctors. Still, the results could be huge.

I think machine learning in medicine will save more lives than self-driving cars.

It's highly problematic due to privacy issues. It would be a nightmare to get a public release for the underlying unlabeled data, especially since we know how well deanonymization can work.

There already is some that is publicly available, like the Kaggle lung cancer competition's dataset.

I'm not saying it's easy on any front. It is possible, and it would be hugely beneficial to humans to have such a dataset. Raising awareness about the problem may be one step towards solving it.

The problem with a lot of medical labeling is the labels are not very consistent/accurate to begin with. Just go to three doctors with the same symptoms and see what they say...

That is surmountable. You can use annotator agreement, and algorithms don't need perfectly labeled data in order to improve.

Any effort to create additional medical datasets would probably help enhance machine learning diagnostics.

The data that you want to be perfect is the test set, which is much smaller than the training set.
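On the annotator-agreement point above: a minimal from-scratch sketch of Cohen's kappa, which corrects raw agreement between two labelers for chance (values near 1 suggest labels consistent enough to train on):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if each annotator labeled at random according
    # to their own marginal label frequencies.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

In practice you'd compute this per label pair (e.g. per diagnosis) and route low-kappa items to an adjudicator; scikit-learn also ships this as `cohen_kappa_score`.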

"Li said the project failed to win any of the federal grants she applied for, receiving comments on proposals that it was shameful Princeton would research this topic, and that the only strength of proposal was that Li was a woman."


Stories like this are part of the reason I never went into academia. It's amazing how many highly educated people are judgmental and inflexible, and in academia one person like that can sink you. As an engineer in the private sector, worst case scenario I can find a better team/company to work with.

I found getting halfway decent funding since 1980 to be quite a chore. - Alan Kay (2016)

... from my fortune clone @ http://github.com/globalcitizen/taoup

I picked up on that too. It's the biggest failing "high up" (their own arse) many academics have.

FYI Kaggle is hosting all 3 Imagenet challenges now: https://www.kaggle.com/c/imagenet-object-detection-from-vide...

Not just a dataset but a hacking community and contest:

> Matthew Zeiler built Clarifai based off his 2013 ImageNet win, and is now backed by $40 million in VC funding.

On a side note, this looks like a fun company to build: hacking on ML, exposing it via APIs, developers as customers, paid for on a resource basis like a VPS style per-use payment plan.

It's interesting how many of these AI startups are raising huge rounds. The words "AI" seem to draw investors in. At least this one has an obvious business model.

I went to an AI talk at MIT CSAIL (the same Vicarious one I mentioned in another thread). A guy and I got talking on the way out. He figured I seemed pretty clever, so he started asking me what sorts of companies to invest in.

Turned out he was an investor with some VC firm. I briefly mentioned the place I worked at the time (which isn't an AI company, but wanted to be when it was young), but didn't have a business card to give him at the time.

I was actually kinda surprised an investor was attending a talk and saying, quite so uncritically, "hmmm, who do the smart people at talks say I should invest with?"

Is it possible to hold an image classification challenge without secret data that doesn't lead to overfitting?

Yeah, just specify the test set and ask everyone not to train on it.

ImageNet is a truly terrible dataset. Any image recognition task that humans are not >99.9% accurate on is not an image recognition task.

A "truly terrible" dataset that led to AlexNet/VGG/Neural Style Transfer/GoogLeNet/ResNet, and facilitated so much research, is still a good dataset. Actually no, it is a phenomenal dataset. Not to mention, it is free and open to all.

You can make an assessment even if you are not an expert, but I would think you need to put a bit more effort into backing up your claim if you want to convince people and avoid downvotes.

You mean you can't personally identify all 120 dog breeds in imagenet?

99.9% is an extremely high bar. People aren't machines, people make errors. If you asked them to tell whether a square is black or white, you would get less than 99.9%.

99.9% is a very low bar for image recognition. If one out of every thousand times you saw a car you misclassified it as a chair, you would be dead.

It sounds like you may not have experience in this domain? You're benchmarking an active, time-based task (a person "seeing a car" - implicitly allowing eye saccade, multiple images, context, depth) versus a static task ("classify this single image based on no context"). They are not comparable.

Have you actually tried the ImageNet task yourself?

The reason ImageNet is popular is that it has allowed Google to claim they can beat human performance on image recognition by artificially lowering human performance on image recognition.

You should know that for a lot of menial tasks machines are more accurate because there is no boredom, and boredom breeds mistakes.
