This is not a rhetorical question by the way, I genuinely don't know the state of the art in this field. If it's indeed possible to do that today I'll be extremely impressed.
That's impressive, but I'll point out that the bird photos in this video are all clean, well-focused close-ups, which are probably easier to process than random pictures.
If you wanted a general algorithm working on non-curated data (like tagging Facebook photos, for instance), I'm sure it would be significantly harder.
Check out the (deliberately blurry) examples in https://arxiv.org/pdf/1703.05393.pdf, where the model can distinguish between blurred, low-resolution pictures of different types of crows.
It's only ~50% accuracy, but the photos are terrible. Much worse than Facebook pics.
OTOH, this is classification into hundreds of classes, not millions as in the case of FB face recognition (although FB can of course use the connectivity graph as a filter too). That said, ~50% across hundreds of classes is far above the sub-1% chance baseline.
This is all doable today. As an example, check out some bird photos[1] from the Visual Genome[2] project that are similar to your examples. I selected the photos and hosted them on Imgur in hopes we don't kill Visual Genome with traffic ;) The systems to do this today are not highly efficient or flawless, but it can certainly be done.
The research group I am part of, Salesforce Research (formerly MetaMind), has a model that does this "accidentally" - and there's even an example image of a bird[3]! The model is only meant to provide a caption for an image, not to segment the image into its various objects, but it learns to "focus" on the bird as part of describing the image. For those particularly interested, check out the paper "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"[5].
Systems made specifically to segment an image into objects would obviously do far better. For an example of that, check out the "CRF as RNN - Semantic Image Segmentation Live Demo"[4]. There are many more systems of this style floating about.
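To be concrete about what "segmenting an image into objects" means at the output end: after all the CRF/CNN machinery, the final step is just a per-pixel class decision. A minimal numpy sketch of that last step, using made-up score maps rather than a real model (the class list and the biased region are both my own invention):

```python
import numpy as np

# Hypothetical per-pixel class scores, e.g. the output of a segmentation
# network: shape (num_classes, height, width).
num_classes, h, w = 3, 4, 4  # pretend classes: 0=background, 1=bird, 2=cat
rng = np.random.default_rng(0)
scores = rng.normal(size=(num_classes, h, w))

# Bias a 2x2 region toward the "bird" class to simulate a detected bird.
scores[1, 1:3, 1:3] += 10.0

# The segmentation mask is simply the argmax over the class axis.
mask = scores.argmax(axis=0)
print(mask)

# Answering the xkcd question then reduces to checking the mask.
print("contains a bird:", bool((mask == 1).any()))
```

Everything hard is hidden inside producing good `scores`, of course; the point is only that "is there a bird in this photo" falls out of the mask for free.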
I think you underestimate the problem, which is not to get an output that says "Bird", but one that says "Specific breed of bird."
Human experts can get enough clues from the bird's shape and the context to do that in the sample photos. I doubt your captioning system can.
This is a good example of a standard problem in ML - underestimating the complexity of the problem domain.
You could argue that your system only needs to do the simpler task to be useful, and that's likely true. But if the goal is to approach human expert levels of classification, it needs to improve by at least a few levels.
I suspect getting it there would run into some interesting performance constraints, and possibly some theoretical issues too.
No, ML is very, very good at identifying breeds. See, for example, https://arxiv.org/pdf/1603.06765.pdf, which gets 88.9% accuracy on the Stanford Dogs dataset and 84.3% on the Caltech Birds dataset.
These results are way better than anything a non-expert human can manage. For example, the model can distinguish between the Rhinoceros Auklet and the Parakeet Auklet.
I'm not sure what expert performance is, but around 94% is where humans top out on most tasks.
A single NN can predict more than one class of object. The full ImageNet dataset spans over 20,000 categories (the ILSVRC competition itself uses a 1,000-class subset).
There's also image segmentation as another poster has pointed to.
In the case of FB face tagging, they'd have to learn an embedding space for faces; when a new image comes in, they'd place it in the embedding space along with all of the person's connections and find the nearest neighbors.
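That nearest-neighbor lookup is simple once you have embeddings. A toy sketch, with random vectors standing in for real face embeddings and cosine similarity as the metric (both assumptions on my part; the names are obviously invented):

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 128  # a common face-embedding dimensionality

# Pretend these are precomputed embeddings of the user's connections.
friends = ["alice", "bob", "carol"]
gallery = {name: rng.normal(size=dim) for name in friends}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# A new photo arrives; its embedding lands near alice's (alice + small noise).
query = gallery["alice"] + 0.1 * rng.normal(size=dim)

# Tag the face with whichever connection is the nearest neighbor.
best = max(friends, key=lambda name: cosine(query, gallery[name]))
print(best)  # alice
```

Restricting the gallery to the person's connections is exactly the "connectivity graph as a filter" idea mentioned upthread: you never search over a million faces, only over a few hundred candidates.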
The problem posed in the xkcd is "check if the photo is of a bird", not "identify the bird in question". Identifying the bird species would probably be harder, because I'd guess very few human experts could reliably do that across a wide spectrum of species without knowing the context of the photo.
Others have pointed out that this problem has not been solved in the general case.
More importantly, the progress made in recent years builds very heavily on work dating back to the early 1990s, so not only is the problem not solved, what has been achieved took a great deal longer than 5 years.
Well, that's because we had to invent the massively parallel GPU in between. Work that would have taken an entire supercomputer cluster in the 90s can now be done on my desktop with 4 high-end GPUs stuck in it.
Now that we have the right hardware, the whole "it's taking decades" issue will go away.
GPGPU was definitely part of the success of recent years, but there was also a lot of experimentation and hard work, e.g. on CNN designs. Lots of trial and error, and that took a lot of time. Fundamental changes in the structure and training of NNs have also helped bring about the step change in success.
> to look like geniuses when they solve ahead of schedule
No, it is because estimating software tasks is difficult, the penalty for underestimating is that people think you are dishonest/flaky, and there isn't anywhere to get an education in how to do it well. The default advice given to junior engineers is therefore: "take your intuition and triple it." I hate that this is the state of the industry. My interactions around estimation over the past 5 years since uni have literally made me feel nauseated and near fainting on multiple occasions. I would love for Joel or Kalzumeus or Uncle Bob or someone else to fix it and produce a good course on how to create estimates.
Agreed, agile seems like the only way, but it does indeed require experienced managers. A lecturer once pointed out that business people always expect some kind of point estimate; they are never satisfied with a distribution or an interval. Personally, I'd say it's even sadder than that: the point estimates are always taken at the extreme values, whichever suits the person wanting the estimate more, never the average value.
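For what it's worth, giving a distribution instead of a point estimate doesn't have to be complicated. The classic three-point (PERT-style) estimate collapses optimistic/likely/pessimistic guesses into a mean and a spread; a minimal sketch (the day counts are made up for illustration):

```python
# Three-point (PERT) estimate: optimistic, most likely, pessimistic, in days.
optimistic, likely, pessimistic = 10.0, 15.0, 40.0

# PERT weights the most-likely value 4x; the spread is (p - o) / 6.
mean = (optimistic + 4 * likely + pessimistic) / 6
std = (pessimistic - optimistic) / 6

print(f"estimate: {mean:.1f} days, +/- {std:.1f} days")
```

Even this crude version makes the asymmetry visible: the mean here lands well above the "most likely" 15 days, which is exactly the information a bare point estimate throws away.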
Of course, all this leads to bad blood between techies and the business side: how long will it take? -> probably about 3 weeks, but it requires a library we haven't used before, so in the worst case even 2 months -> what? so long? get it done in 4 days, this is required next week -> no, that's not really possible -> make it happen -> it happens, and either it sucks when (if) it's delivered, so the deadline gets extended anyway to iron out all the bugs, or it causes lots of problems down the line.
Or when you allow for hilarious false positives/negatives. Sometimes birds are birds, sometimes they are cats and cats are birds, and sometimes they are dogs. Everything is possible with the right training set and machine learning.
Check whether a photo is of a bird.