
Descriptive Camera: A camera that prints a description, not an image - ColinWright
http://mattrichardson.com/Descriptive-Camera/?src=twitter
======
edw519

      -----------------------------------
      | |                             | |
      | |    This is a picture of     | |
      | |  30-60% off top brand name  | |
      | |    shoes like Nike, Vans,   | |
      | |    Adidas, Toms, and more.  | |
      | |   Call 1-800-TURK-GAMER to  | |
      | |   find your perfect pair.   | |
      | |                             | |
      | ------------------------------- |
      |                                 |
      |                                 |
      -----------------------------------

~~~
nitrogen
Why does the human race have to have griefers? Why can't we all just get along
and have cool sci-fi tech without people trying to game it?

~~~
drostie
Well, let's draw a distinction here. The above isn't griefing, it's simple
capitalism. It exists for some reason other than aggravating people and/or
wasting their time. You might also have griefers who would put out text-
photographs like:

    
    
         ________________________________
        /                                \
        |+------------------------------+|
        ||  This is a picture of how I  ||
        ||  just want to tell you how   ||
        ||  I'm feeling, gotta make you ||
        ||  understand, never gonna     ||
        ||  give you up, never gonna    ||
        ||  let you down, never gonna   ||
        ||  turn around and desert you, ||
        ||  never gonna make you cry,   ||
        ||  never gonna say goodbye,    ||
        ||  never gonna tell a lie and  ||
        ||  hurt you.                   ||
        |+------------------------------+|
        |                                |
        \________________________________/

~~~
RollAHardSix
Except that might not be used by 'griefers'; instead, it could be a love note
between two lovers.

'Griefers' is just a subjective term depending on which chair you're sitting
in to watch the performance.

------
ghshephard
Amazing - the meta concepts behind the "Descriptive Camera" are limited only
by the imagination, the resolution of the sensing devices, the
ubiquity/bandwidth of the data pipe, and the ability to manage/train the
backend human workforce and maintain a consistent level of quality. I'm pretty
certain that within 3-5 years, a billion-dollar company will emerge using the
basic concepts captured by the "descriptive camera." That is, a remote sensing
device feeding data into a human-backed work-management queue, and returning
some type of structured higher-level processing of the remote-sensing device's
data in close to real-time.

Just off the top of my head:

    
    
      o Flora/Fauna identification
      o Scene analysis (Car Crashes/Building Wreckages)
      o Medical analysis
      o Substance analysis
      o Structural integrity review
      o Online Translators.  Today.
    

I'm pretty certain companies would be more than happy to pay $1000+ for 3-4
hours of a real-time translator that worked through a smartphone for business
trips. Ubiquitous high-bandwidth wireless (LTE), high-resolution
mics/cameras, and smartphones are the enabling technologies that trigger this
class of application.

~~~
mbesto
_Medical analysis_

I don't think I would rely on a description from a Mechanical Turk human for
describing what they see in a picture.

 _Is that a tibia or a fibula?!?_

~~~
ghshephard
Right - you don't send to Mechanical Turk. You instead send it to a highly
trained backend workforce, presumably highly skilled in the vertical that you
are doing analysis on. And, it's not just a picture, but video/sound/other
sensing. Who knows what the next advances in remote telemetry will be.

~~~
anandkulkarni
Indeed, we do this kind of work today in MobileWorks, using exactly this fact.

Once your workforce is skilled all kinds of new opportunities of this nature
open up.

~~~
ghshephard
Nice. I guess we don't need to look too far to see that soon-to-be-billion
dollar company then. Have you considered doing real-time feedback to a device
that has remote telemetry (video/sound/pictures/etc...)?

~~~
anandkulkarni
We've done some work in this vein already, as a matter of fact: real-time
communication between the crowd and cameras / robots. We're currently
supporting a few fast camera-driven smartphone applications that developers
have built on the platform.

I think what you're after, though, is more comprehensive integration: a remote
telemetry system that has crowd intelligence baked into its circuits along
several media, rather than one: analyzing simultaneous audio and video. I
don't think this would be difficult for a developer to build, and it's a great
idea.

------
archangel_one
I was hoping that the description would be derived algorithmically rather than
via human intervention. I guess that really was a bit optimistic though :(

~~~
JohnLBevan
Agreed - the concept's great, but really you're just taking a picture, asking
someone to describe it, and printing that; that tech's been around for a
while.

A more practical approach may be to develop a camera with access to the web.
When pictures have been taken and the camera can get a 3G/4G/wifi network
connection, it uploads them to the cloud (optionally deleting the local
copies to save space on the camera's memory, or retaining them so the images
can still be viewed offline). You can then put in some serious server power
to run through all uploaded images and describe their contents (including
tagging people in the photos and using GPS data to help identify content from
location context), and have this data automatically sent back to the camera
(if desired) as well as stored against the image online.

Randomly picked images, or those with a low accuracy rating (e.g. the
algorithm has rated its own description as tenuous), could be sent to a
service such as Mechanical Turk so that the results can be verified or
improved upon, and so the algorithm has feedback to learn from.
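The routing step in that pipeline could be sketched roughly like this. Everything here is an assumption for illustration: the captioner is a faked stub standing in for a real computer-vision model, and the confidence threshold and queue are made up.

```python
# Hypothetical sketch: auto-caption an image, and fall back to a human
# review queue (e.g. Mechanical Turk) when the algorithm rates its own
# description as tenuous. auto_describe() is a stub, not a real model.

CONFIDENCE_THRESHOLD = 0.7  # arbitrary cutoff for "tenuous"

def auto_describe(image_id):
    # Stand-in for a real captioner: returns (description, confidence).
    captions = {
        "img1": ("a red bicycle against a wall", 0.92),
        "img2": ("possibly a dog, or a mop", 0.31),
    }
    return captions.get(image_id, ("unknown scene", 0.0))

def route(image_id, human_queue):
    description, confidence = auto_describe(image_id)
    if confidence < CONFIDENCE_THRESHOLD:
        # Low-confidence results go to human reviewers; their verified
        # answers can also become training feedback for the algorithm.
        human_queue.append(image_id)
        return None
    return description
```

A confident caption is returned directly; anything below the threshold is queued for human verification instead.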

~~~
anjc
Heh you're describing the whole implementation except for that pesky but
critical computer vision part.

------
drostie
It is a cute idea but unfortunately (or fortunately?) this particular
implementation of the camera comes with (a) opinions, (b) typos, and (c)
probably trolls, if they know that their text will automatically be accepted
and printed out.

I guess these factors could probably be reduced with more money? I don't mean
paying more for the job: I mean using the API to ask a second human to verify
the results of the first human and/or fix typos.

~~~
scraplab
I think you're missing the point of it.

~~~
drostie
Am I? Would the point somehow be served if you took a self-portrait and the
response back was "haha ur so ugly"...?

(I should add that I know that Amazon offers a review process, but I am not
sure how it integrates with the above camera, and I am thus assuming that the
camera automatically has to accept a description as correct before printing it
out. This might not be the case; and I don't know.)

~~~
huskyr
You are. This isn't meant to be a completely realistic project ready to be
shipped; it's a glimpse of the future, or maybe an art project. I think it's
wonderful to see out-of-the-box stuff like this appearing on Hacker News.

~~~
drostie
My apologies; I suck at communicating.

I think I agree. This camera is cute but this particular implementation has
flaws -- that's what I said before and what I hoped would be my take-home
message, and I guess I didn't highlight that enough. I certainly didn't
intend, "oh, fix this before it goes into production!" or anything like that.

------
fdb
Add speech synthesis and this would be an awesome camera for the blind.

~~~
zoul
Even with a six-minute latency?

~~~
squirrel
Absolutely. A big challenge for blind people is interacting with devices in
the environment that report status via visual signals. For example, a blind
coffee fiend I know can't tell what state his fancy espresso machine is in
("brewing", "out of beans") without sighted help to read the little display.
He would happily wait a few minutes to get someone to tell him - it's quicker
than waiting for his wife to come home.

~~~
drostie
Thanks for this. Now I'm trying to think of ways that you could actuate thin
metal rods, so that you could transform a greyscale picture into a sort of
texture-photograph, so that you could "see" details by placing a hand atop it.
If the rods are needle-thin you could probably get a reasonable resolution on
it, and even have some brightness/contrast sliders on the side to help -- but
it seems like the real problem is just moving all of those little rods
automatically.
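Setting aside the actuation problem, the mapping itself is simple to sketch. This is a toy illustration only: the grid size, height range, and brightness/contrast knobs (the "sliders" above) are all invented for the example.

```python
# Toy sketch of the rod-matrix idea: average a greyscale image (0-255)
# down to a coarse grid, and map each cell to a rod height.

def rod_heights(pixels, rods_x, rods_y, max_height_mm=10.0,
                brightness=0.0, contrast=1.0):
    """pixels: 2D list of grey values (0-255). Returns a rods_y x rods_x
    grid of rod heights in millimetres."""
    h, w = len(pixels), len(pixels[0])
    grid = []
    for ry in range(rods_y):
        row = []
        for rx in range(rods_x):
            # Average the block of pixels this rod covers.
            y0, y1 = ry * h // rods_y, (ry + 1) * h // rods_y
            x0, x1 = rx * w // rods_x, (rx + 1) * w // rods_x
            block = [pixels[y][x]
                     for y in range(y0, y1) for x in range(x0, x1)]
            avg = sum(block) / len(block)
            # Apply the contrast/brightness "sliders", clamp to 0..255.
            level = max(0.0, min(255.0,
                                 (avg - 128) * contrast + 128 + brightness))
            row.append(level / 255.0 * max_height_mm)
        grid.append(row)
    return grid
```

Bumping `contrast` above 1.0 would exaggerate edges, which might help compensate for the limited resolution of fingertips mentioned in the next reply.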

~~~
paulsilver
If you built it, I think you'd find the real problem is having enough
resolution in your fingertips to work out what the hell the picture is about.

Lots of people who become visually impaired later in life can't read Braille
as their fingertips aren't sensitive enough, especially if they did manual
work earlier in their life. Working out a picture is even harder.

As a side note about this, William Moon invented 'Moon script', a kind of
simplified alphabet in embossed writing to help visually impaired people
read. It was invented a little before Braille's raised dot patterns, and even
after Braille became popular, Moon still had its niche for those who didn't
have the fingertip resolution for Braille. As Moon was based on the Roman
alphabet, it was also easier to learn for people who'd been able to read and
then lost their sight.

------
kghose
I'm disappointed. I was thinking someone had optimized a template-matching
algorithm and implemented it in an ASIC that you could slot into a camera.

------
mcguire
Let me see if I understand this technology correctly: you take a picture of
something, mail the picture to someone, somewhere, they write a quick summary
of the picture and mail that back to be printed out on what appears to be a
thermal receipt printer.

Wat?

~~~
ChuckMcM
Yep, and if you can internalize why that makes the front page of HN you can
get a sense of what Web 3.0 is going to look like.

------
lucisferre

      It is pitch black. You are likely to be eaten by a grue.
    

Ooops, lens cap is on.

------
EastCoastLA

Here is an idea: translate the text into a picture. An API will take text and
generate a representation of the description using images from the web via
Google image search.

~~~
bazzargh
There's been some interesting efforts at that already. eg:

WordsEye constructs 3d images from text descriptions
<http://www.wordseye.com/>

Sketch2Photo takes a crude annotated sketch, and creates a composite image:
<http://cg.cs.tsinghua.edu.cn/montage/main.htm>

The output of wordseye isn't great - it looks a bit 90's POV-Ray. Sketch2Photo
does a nicer job - more like automated photoshopping - but needs more
assistance on the placement of objects in the scene.

I'm sure there's others.

~~~
sedachv
The Wordseye renderer actually happens to be a 90s vintage ray tracer.

------
alephnil
Maybe not too practical as presented here, but if you start with Mechanical
Turk workers doing the recognition, the image/description pairs could be an
invaluable source when developing and training automatic algorithms that do
this kind of image analysis.

There are already systems that can analyze wounds automatically from pictures
of them.

------
tlrobinson
My first thought was "oh great, another silly art project", but I've been
wanting something like the Mechanical Turk part of it for awhile.

My least favorite part of photography is organizing and finding the best
photos. I'd happily pay to have a bunch of people rate all my photos (if the
results are meaningful enough).

You'd probably need to show each photo to multiple Turkers to get good data.
You could run analysis on the votes to kick out the Turkers who give ratings
that deviate significantly from other Turkers. Does MTurk have anything like
that built in?
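One way to sketch that vote analysis: collect several ratings per photo, then flag raters whose scores consistently stray from each photo's consensus. The deviation threshold here is an arbitrary assumption, and as far as I know MTurk offers qualifications and majority agreement rather than exactly this.

```python
# Flag raters whose scores consistently deviate from the per-photo mean.
from statistics import mean

def flag_outlier_raters(ratings, max_avg_deviation=1.5):
    """ratings: {photo_id: {rater_id: score}}. Returns the set of rater
    ids whose average absolute deviation from each photo's mean score
    exceeds the (arbitrary) threshold."""
    deviations = {}  # rater_id -> list of |score - photo consensus|
    for photo, scores in ratings.items():
        consensus = mean(scores.values())
        for rater, score in scores.items():
            deviations.setdefault(rater, []).append(abs(score - consensus))
    return {rater for rater, devs in deviations.items()
            if mean(devs) > max_avg_deviation}
```

With enough photos per rater, a consistently contrarian (or lazy) rater stands out even though any single disagreement is harmless.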

------
daralthus
And then comes pictureless pinterest, oh wait here it is:
<http://twitter.com/picturelesspins>

~~~
stagnative
The source of their content are submissions from their website
<http://picturelesspinterest.tumblr.com>

------
antihero
Like reverse Dwarf Fortress?

------
vette982
This is like the DARPA Mind's Eye project, except the DARPA project does this
computationally. [http://www.wired.com/dangerroom/2011/01/beyond-
surveillance-...](http://www.wired.com/dangerroom/2011/01/beyond-surveillance-
darpa-wants-a-thinking-camera/)

------
gerrypez
> Here is an idea, translate the text into a picture. An API ...

This one uses a combination of computer vision and optional human vision.
<http://www.iqengines.com/omoby/>. There is an API allowing for computer
vision training.

------
scraplab
This is brilliant. It reminds me of Sascha Pohflepp's Blinks & Buttons
project, which swaps your photo with someone else's taken at the same time.
There's even an iPhone app.

<http://www.blinksandbuttons.net/>

~~~
pavel_lishin
Reminds me of Swapshot.

------
abeatnik
A camera for the blind - if the printer is adapted to print Braille.

------
joshu
Oh god, the Searle Prophesy is coming true!

------
K2h
It's too bad TinEye doesn't do a better job of matching similar images; you
could feed the picture in, find the closest match, and then scrape the
description off the corresponding pages where it was found.

------
miniatureape
Of course, these should be saved in JTEG format.

[http://whaleycopter.blogspot.com/2007/09/project-jteg-
compre...](http://whaleycopter.blogspot.com/2007/09/project-jteg-compression-
format-for-ink.html)

------
macarthy12
Now I know how Thicknesse felt.

------
zashapiro
Can someone build a version of this that prints snarky descriptions of certain
pictures? Give it a libertarian political slant and a vegan diet. Only good
things can come of this.

------
hackermom
I wonder how it holds up against a human being. They should run a test on that.

~~~
PanMan
Did you read the article? It is being done by a human being, through Amazon's
Mechanical Turk service.

~~~
hackermom
Yes I did, Mr. serious face.

------
mistercow
My camera gives me descriptions too, but they're in terms of quantized
frequency coefficients of 8x8 squares.

