
Create an algorithm to distinguish dogs from cats - willis77
https://www.kaggle.com/c/dogs-vs-cats
======
chris_mahan
All it needs is a robot that says: "Come Here Boy, Come! That's a good
doggie."

If the animal comes, it's a dog. If it continues without looking at you, it's
a cat.

~~~
josefresco
Lots of good cat jokes here. One thought would be to automatically upload the
images to Reddit and gather the # of upvotes. The higher the upvotes the more
likely it's a cat.

/joke session

~~~
ktr100
Feed a dog and it thinks you're god. Feed a cat and it thinks IT is god:

    if (human.feedanimal()) {
      animal.type = "Dog"; human.name = "GOD";
    } else {
      animal.type = "Cat"; animal.name = "GOD";
    }

~~~
dragonwriter
> Feed a cat and it thinks IT is god

Incorrect. The cat doesn't depend on your validation of its status; if you
feed it, you just increase the chance that it thinks you are a subject worthy
of its time and attention.

~~~
ktr100
good point

------
GotAnyMegadeth
When we can get computers to tell the difference between animals accurately we
can make a real life pokedex app. I can't wait.

EDIT: If anyone thinks we can start working on this now, I'm game.

~~~
wf
Oh my god you're right, wow, I need to start working on this now and fulfill
my dream to be the very best...

EDIT: I would definitely be interested in building something like this.
iOS/mobile app? I have basic experience in ML and have written an ANN in C++
to classify letters (they were 'pixelated' images of 1s and 0s).

~~~
creatio
If I had seen this post earlier, maybe I would have taken more of an AI route
with my classes.

~~~
eli_gottlieb
Ha! If I hadn't thought ML was a fad used for making book recommendations on
Amazon I wouldn't be sitting here kicking myself for never learning AI/ML
techniques.

Bloody ML Summer...

------
apu
The sample images are of two types: images which are mostly of the subject
(cat or dog), and images which have a cat or dog in them, but are not
necessarily focused on them.

In computer vision, these two types of images are traditionally handled
separately. First, a detector for a class (like "dog" or "cat") is run across
the image at all locations and multiple scales to find _where_ the things are.
Once you have the locations, then an image classification algorithm is run for
each detection window to either confirm it, or to give you more information
about the object.
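
A rough sketch of the "all locations and multiple scales" scan described above. The base window size, scale factors, and stride are assumptions picked for illustration, and the function name `sliding_windows` is hypothetical. In a real system each yielded window would be cropped and handed to a detector, and the surviving windows to the classifier.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in image pixels


def sliding_windows(img_w: int, img_h: int, base: int = 64,
                    scales=(1.0, 1.5, 2.0),
                    stride_frac: float = 0.5) -> Iterator[Window]:
    """Yield square candidate windows at every location and scale."""
    for scale in scales:
        size = int(round(base * scale))
        if size > img_w or size > img_h:
            continue  # window does not fit in the image at this scale
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size, size)


# Candidate windows for a 256x192 image (image size chosen arbitrarily)
windows = list(sliding_windows(256, 192))
print(len(windows), "candidate windows")
```

Even with a coarse stride, the number of windows grows quickly as the stride shrinks or scales are added, which is one reason the detection stage is usually kept cheap.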

The latter often takes the form of giving more fine-grained category
information, such as what species of dog/cat it is. Both leafsnap [1] and
dogsnap [2] take the form of this type of program; i.e., they both assume that
you've captured a single subject, roughly centered in the photo window, and
that you already know that it's a plant/dog.

Sometimes you don't have to run a detector even if the object is not the focus
of the image, if the context/setting can narrow down the answer for you. For
example, if you were deciding between dogs and airplanes, it would be pretty
unlikely to see a dog on a runway or a plane in a living room, so just by
classifying the entire image, you can do reasonably well. That's not the case
here, as dogs and cats will, for the most part, appear in pretty similar
environments.

So if I were attacking this problem, I'd first see how many images were of the
non-focused type. If not many, I'd basically ignore them and focus on building
a classification system. Note also that if you're constrained to make a hard
choice between only two classes, that's a much easier problem than a more
open-ended "what is this?"

As many have pointed out, deep learning approaches seem to be the current
state of the art on classification tasks such as these. But deep learning
requires a lot of training data to be effective. A procedure I've been hearing
many people use to great success is to use the Imagenet [3] hierarchy and
images to train a deep learning classifier (i.e., as if you were going to
compete in the Imagenet Large Scale Visual Recognition Challenge [4]). Then
use the trained network, chop off the last stage (which makes the final
prediction), and replace it with an SVM trained on your specific training
data. In this way, you'd be using the network only as a feature extractor.
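
To make the "network as feature extractor plus SVM" idea concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: `extract_features` is only a stand-in (a fixed random projection with ReLU) for a real ImageNet-trained network with its last stage removed, the data is synthetic, and the linear SVM is trained with plain subgradient descent on the hinge loss rather than a library solver.

```python
import random

rng = random.Random(0)

# ASSUMPTION: stand-in for a pretrained network with its final layer chopped
# off. A real pipeline would load ImageNet-trained weights here instead.
IN_DIM, FEAT_DIM = 4, 16
FROZEN_W = [[rng.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]


def extract_features(x):
    """Frozen 'network': its weights are never updated below."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]


def train_linear_svm(feats, labels, epochs=50, lr=0.01, lam=0.01):
    """Linear SVM via subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(feats[0]), 0.0
    order = list(range(len(feats)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            f, y = feats[i], labels[i]
            margin = y * (sum(wi * fi for wi, fi in zip(w, f)) + b)
            for j in range(len(w)):
                grad = lam * w[j] - (y * f[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
            if margin < 1:
                b += lr * y
    return w, b


def predict(w, b, f):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b >= 0 else -1


def make_toy_data(n_per_class=20):
    """Synthetic two-class data: +1 ('cat') and -1 ('dog') clusters."""
    data, labels = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            data.append([centre + rng.gauss(0, 0.5) for _ in range(IN_DIM)])
            labels.append(label)
    return data, labels


data, labels = make_toy_data()
feats = [extract_features(x) for x in data]   # network used only for features
w, b = train_linear_svm(feats, labels)        # SVM replaces the last stage
accuracy = sum(predict(w, b, f) == y for f, y in zip(feats, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The point of the design is that only the small linear model is fit to the dog/cat data, so far fewer labelled images are needed than for training the whole network.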

I'm happy to try and answer other questions.

[1] [http://leafsnap.com](http://leafsnap.com) or see my project page for more
details on how it works:
[http://homes.cs.washington.edu/~neeraj/projects/leafsnap/](http://homes.cs.washington.edu/~neeraj/projects/leafsnap/)

[2]
[https://itunes.apple.com/app/dogsnap/id532468586?mt=8](https://itunes.apple.com/app/dogsnap/id532468586?mt=8)

[3] [http://www.image-net.org/](http://www.image-net.org/)

[4] [http://www.image-net.org/challenges/LSVRC/2013/index](http://www.image-net.org/challenges/LSVRC/2013/index)

~~~
unoti
As so often happens, the top rated comment on HN is more interesting than the
article itself. Leafsnap and Dogsnap are so awesome!

------
dvt
I think that if I were to do this, I would use facial landmark recognition
(using something like a Haar classifier). Haar-like features have been used to
aid in (human) facial recognition since 2001 to great success[0]. And
recently, people have been thinking about using similar methods for animal
tracking[1].

If one _could_ locate the face in the test set, she could also presumably find
some landmarks of interest: eyes, nose, mouth, etc. Considering that dogs
typically have longer snouts, cats have pointier ears, etc, this data could be
used to differentiate between a dog and a cat. There would be difficulty
dealing with awkward angles and bad lighting though.

[0]
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.35...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.3549)

[1]
[http://www.eng.auburn.edu/~troppel/internal/sparc/TourBot/To...](http://www.eng.auburn.edu/~troppel/internal/sparc/TourBot/TourBot%20References/Haar/2000186.pdf)

~~~
apu
Haar wavelets are most useful for _detecting_ faces (drawing a rectangle
around the entire face). They are not very good for locating landmarks on the
face. Also, they tend to be much more sensitive to the orientation of the face
than other features, so modern face detectors are often composed of multiple
independent detectors, each specialized for different pose angles.

Standard computer vision features like HOG (histograms of oriented gradients)
or SIFT will probably do much better, or the deep learning features others
have mentioned.
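
For anyone unfamiliar with HOG, the core of it is a magnitude-weighted histogram of gradient directions per image cell. The sketch below covers only that step for a single cell, under simplifying assumptions: 9 unsigned orientation bins, no vote interpolation, and no block normalisation. The function name `orientation_histogram` is made up for this example.

```python
import math


def orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    h, w = len(cell), len(cell[0])
    for y in range(1, h - 1):          # skip the border: no centred difference
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    return hist


# 8x8 cell whose brightness increases left to right: all gradient energy
# should land in the first bin (orientation near 0 degrees).
ramp_x = [[10 * x for x in range(8)] for _ in range(8)]
hist_x = orientation_histogram(ramp_x)

# Same ramp rotated: brightness increases top to bottom (orientation 90 deg).
ramp_y = [[10 * y for _ in range(8)] for y in range(8)]
hist_y = orientation_histogram(ramp_y)
print(hist_x)
print(hist_y)
```

A full descriptor concatenates these histograms over a grid of cells, which is what makes it reasonably robust to small shifts and lighting changes.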

Your larger point of adapting a face detector for animal use is well taken,
though probably overkill for simply saying "dog" or "cat". You need that level
of detail to identify which breed (e.g., this is the approach that dogsnap
takes), but not for the base distinction.

The other way to go would be to train a deformable parts model (DPM) detector
[1] for dogs and cats. DPMs are the current state of the art in detecting
objects, e.g. as measured on the PASCAL VOC benchmark [2].

[1]
[http://www.cs.berkeley.edu/~rbg/latent/](http://www.cs.berkeley.edu/~rbg/latent/)

[2]
[http://pascallin.ecs.soton.ac.uk/challenges/VOC](http://pascallin.ecs.soton.ac.uk/challenges/VOC)

------
joe_the_user
"Hey Cool challenge dude, any relation to AI? Didn't think so..."

or

"You too could solve this problem, get a PhD, and join that overcrowded
labor market"

Just consider that if you have M categories and N PhD students who can each
spend four years creating one clever algorithm to distinguish category i from
category j, then you need M(M-1) PhD students for a complete classification
system. When you consider how many, _many_ categories there are in human
knowledge, that works out to more than can be pumped out even by today's
excess student loans, and vastly more than can find tenured positions.
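
A quick sanity check of that count, as a sketch: M(M-1) counts ordered pairs (i, j), while a single classifier that separates i from j also separates j from i, so M(M-1)/2 unordered pairs would suffice. The category list is an arbitrary example. Either way the number grows quadratically with M.

```python
from itertools import combinations


def pairwise_classifier_count(categories):
    """One classifier per unordered pair of categories: M(M-1)/2."""
    return sum(1 for _ in combinations(categories, 2))


animals = ["dog", "cat", "hyena", "fox"]          # example categories only
m = len(animals)
unordered_pairs = pairwise_classifier_count(animals)  # M(M-1)/2
ordered_pairs = m * (m - 1)                            # M(M-1)

print(f"M = {m}: {unordered_pairs} unordered pairs, {ordered_pairs} ordered pairs")
print("M = 1000:", 1000 * 999 // 2, "unordered pairs")
```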

I.e., this would only add to the "deep but not wide" algorithms of computer
vision. Twenty years ago, we might have believed that this adding-to would
lead to something broad and general, but it's been twenty years and the trend
is becoming clear.

See:

[https://news.ycombinator.com/item?id=6401026](https://news.ycombinator.com/item?id=6401026)

------
bayesianhorse
Easy: Put videos of the animal on youtube and
[http://www.cuteoverload.com](http://www.cuteoverload.com) and count the
upvotes.

To quote @BigDataBorat (Twitter): 90% of data is unstructure. Furthering
analysis reveal that 60% of unstructure data is cat video.

------
yaddayadda
While it isn't specific to dogs and cats, nor open source or publicly
available, doesn't Google already have this ability?

- [https://encrypted.google.com/search?tbm=isch&q=dogs&tbs=imgo...](https://encrypted.google.com/search?tbm=isch&q=dogs&tbs=imgo:1)
- [https://encrypted.google.com/search?tbm=isch&q=cats&tbs=imgo...](https://encrypted.google.com/search?tbm=isch&q=cats&tbs=imgo:1)

edit: I'm sure some of theirs is from metadata, but I thought I read a while
back that they were doing some graphical identification also.

~~~
habosa
There is definitely some graphical identification. You should try Google+
image search (if you have any images on there), it's really incredible. I
searched "water" on my friend's images and got pictures of water glasses, the
ocean, etc. None of the pictures had comments or metadata. Also worked
searching for things like "soccer", got a bunch of pictures of him playing
soccer.

------
wojzaremba
I'll bet $1000 that the winning team is going to use convolutional neural
networks. Anyone willing to bet (I can also bet a smaller amount if you
prefer)?

~~~
kdavis
The "state of the art" they reference is SVM's trained on color and texture
features.

Pre deep belief network I'd agree with your guess on convolutional neural
networks. However, now I'd guess you'd use a deep belief network to create a
network that would pick out better features than those picked out "by hand" in
the convolutional neural network. (See for example [1][2])

So my money would be on some deep belief network.

[1] Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning
algorithm for deep belief nets. Neural Computation, 18:1527-1554.

[2] Building high-level features using large scale unsupervised learning
arXiv:1112.6209

~~~
wojzaremba
When it comes to large datasets, unsupervised learning doesn't work! You're
better off first training your network discriminatively on ImageNet and then
switching to this cat vs. dog training, rather than doing unsupervised
learning.

~~~
MrMan
program it in Lush then.

everyone here found out about Deep Neural Networks and that is all they know.

------
dllthomas
What would it say about hyenas?

------
brainless
I understand Kaggle wants someone to make an algorithm to "identify the
entity", but if used as an alternative to CAPTCHA, is it not possible to
defeat this HIP (Human Interactive Proof) by reading the image and the
classification data from the same Petfinder.com and just doing image matching?

It may take some time to match against 3 million images, but it's doable,
right? Or am
I missing something here?

------
primaryobjects
I just gave it a try and submitted a program. I scored 64% accuracy. Currently
in 4th place, but I'm sure that won't last for long.
[http://www.kaggle.com/c/dogs-vs-cats/leaderboard](http://www.kaggle.com/c/dogs-vs-cats/leaderboard)

------
silveira
A captcha of 8 characters has a space of ~26^8 (~208 billion) possible
combinations in a brute force attack. To divide a set of 12 images between
dogs and cats has a space of 2^12 (4096) possible combinations in a brute
force attack.
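
The arithmetic above can be checked in a few lines. This assumes, as the estimate does, a 26-letter case-insensitive alphabet and that each image is an independent dog/cat coin flip for the attacker.

```python
ALPHABET_SIZE = 26    # assumption: letters only, case-insensitive
CAPTCHA_LENGTH = 8
NUM_IMAGES = 12

text_space = ALPHABET_SIZE ** CAPTCHA_LENGTH   # 26^8
image_space = 2 ** NUM_IMAGES                  # 2^12

print(f"8-character text captcha: {text_space:,} combinations")
print(f"12-image dog/cat challenge: {image_space:,} combinations")
print(f"chance of a blind guess passing the image test: 1 in {image_space:,}")
```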

------
moe_
i heard that before,
[http://www.bbc.co.uk/news/technology-18595351](http://www.bbc.co.uk/news/technology-18595351)

~~~
willis77
Sure, they may have solved the cat problem, but the well-documented challenges
of "pug face" and "slobber smudging" make dog recognition an order of
magnitude harder. Some say the Clay Institute is pondering a $1M prize for it.

------
ameoba
Clever way to crowdsource your spambot's CAPTCHA breaking routines.

------
jlengrand
Too bad it's just for swag. I'd have given it a shot :D

------
phogster
Anyone else compete on these types of sites? Are they worth it?

~~~
yankoff
I have been playing with their contests since I finished the ML course on
Coursera. I think they're worth it: pretty fun and addictive, plus a very good
way to practice your machine learning/data mining skills. The community there
is very good and helpful.

------
gnarbarian
OpenCV plus a ton of training data should do the trick.

------
robodale
I have a cat. I know it's a cat, because she bites me when I don't let her
outside, when I let her back inside, when I brush her, when I don't brush her,
etc, etc.

------
cwoods
I would have expected that putting this through a machine learning algorithm
(or one of the face recognition ones) trained with a very large dataset might
improve the odds.

~~~
yelnatz
That's the whole point of kaggle. ;)

Pitting your machine learning algorithm against everyone else's. Best one
wins the prize.

There are a lot more "interesting" competitions from different companies, of
course.

[https://www.kaggle.com/competitions](https://www.kaggle.com/competitions)

~~~
Maxious
This competition is the first "Playground" one just for fun
[http://blog.kaggle.com/2013/09/25/the-playground/](http://blog.kaggle.com/2013/09/25/the-playground/)

------
segmondy
So narrow and so useless. What exactly are dogs? Almost all cats look the same
and are almost the same size. But dogs? Dogs vary greatly in size and looks.
If you took some of what we have accepted as dogs today back to the past,
before TV/computers, people back then wouldn't recognize them as dogs, because
of their looks or size. They would have to hear it bark and see it behave like
a dog to classify it as such. If all they had was a picture, they might very
well refuse to accept, say, pugs as dogs. So an algorithm to distinguish dogs
from cats without context (behaviour, sound) will be more difficult.

~~~
VladRussian2
>if you take them back to the past before TV/Computers, people back then won't
recognize them as dogs, because of the looks or size. They would have to hear
it bark and behave like a dog to classify it as such. if all they had was a
picture, they might very well refuse and reject say pugs as dogs.

my dog, without using a TV or computer (at least to my knowledge, as i don't
know what he is up to when we are not at home), easily recognizes other dogs
of all the different breeds and sizes from a distance, like across the street,
etc...

~~~
rohansingh
To be fair this is usually based on scent.

~~~
fractallyte
From my experience, it's based on body language. So: not the superficial
appearance of the dog, but its movement and 'greeting' signals.

And, of course, cats move in a _very_ different way to dogs!

