

Images to Text – Toronto Deep Learning Demos - benanne
http://deeplearning.cs.toronto.edu/i2t

======
JacobEdelman
Looks amazing. The fact that it just returns "Cannot connect to server of
image2text models" makes me very sad.

------
YoukaiCountry
So far I keep getting the error "Cannot connect to server of image2text
models"

Anyone having any luck?

~~~
bootynuke
I think it must be getting slammed; I was able to get a couple of descriptions
out of it, but that was offset by roughly twice as many instances of the
above error.

------
finin
[http://www.skunkieacres.com/images/rabbit_box.jpg](http://www.skunkieacres.com/images/rabbit_box.jpg)

A picture of a rabbit in a wooden box => "a cat looking into a bin full of
apples"

Mistaking a rabbit for a cat is not too bad. A bin is like a box, I suppose.
I'm not sure where the apples came from.

~~~
thomasahle
Perhaps it's been trained with pictures of apples in boxes...

------
tly_alex
Rekognition released a similar image-to-text API, and it's much more reliable
than this. At least its demo works smoothly and responds fast.
[https://rekognition.com/demo/concept](https://rekognition.com/demo/concept)

~~~
teraflop
Even leaving aside the reliability issue (which can be chalked up to the fact
that this one is a demo of a non-commercial project that got overloaded),
you're comparing two entirely different things.

Check out the "static demo" pages, e.g.
[http://www.cs.toronto.edu/~nitish/nips2014demo/results/79133...](http://www.cs.toronto.edu/~nitish/nips2014demo/results/791338571.html)

For this image, the University of Toronto software generates sentences like "a
cow is standing in the grass by a car", whereas Rekognition only produces a
ranked list of categories. ("sports_car", "car_wheel", etc.)

EDIT: this is an even better example:
[http://www.cs.toronto.edu/~nitish/nips2014demo/results/89407...](http://www.cs.toronto.edu/~nitish/nips2014demo/results/89407459.html)
I'm cherry-picking the cases where the algorithm does well, of course. But
even if it's unreliable, the fact that this works at all is impressive.

~~~
modeless
The errors are fascinating. "a cow and a car are looking at the camera." "a
band plays a group of music [...]". You could almost call them metaphors
instead of errors.

~~~
vonnik
what a lovely way of thinking about it.

------
tonydiv
We are using this research to help people learn languages in VR.

Take a look here: [http://learnimmersive.com](http://learnimmersive.com)

------
CardinalAgnelo
Doesn't look to be designed for a lot of traffic, be gentle.

------
misiti3780
Very cool.

Comment: If you click on "source code" right now, it gives me two JavaScript
alerts that were trying to print out JSON objects.

------
vonnik
I'm curious to hear how much this is read as a sign of strong AI.

------
cmyr
My brief survey suggests that their training sample did not include very much
hardcore pornography.

"a man and a girl are learning to play with a small pool", while poetic, is a
stretch in this case.

~~~
JacobEdelman
Already, within an hour of this being posted on HN... A reminder of how
evolution only made us good toolmakers to help us reproduce more.

~~~
dzordzduan
This is why I love hn.

