| | | |
| | This is a picture of | |
| | 30-60% off top brand name | |
| | shoes like Nike, Vans, | |
| | Adidas, Toms, and more. | |
| | Call 1-800-TURK-GAMER to | |
| | find your perfect pair. | |
| | | |
| ------------------------------- |
|| This is a picture of how I ||
|| just want to tell you how ||
|| I'm feeling, gotta make you ||
|| understand, never gonna ||
|| give you up, never gonna ||
|| let you down, never gonna ||
|| turn around and desert you, ||
|| never gonna make you cry, ||
|| never gonna say goodbye, ||
|| never gonna tell a lie and ||
|| hurt you. ||
'Griefers' is just a subjective term depending on which chair you're sitting in to watch the performance.
Just off the top of my head:
o Flora/Fauna identification
o Scene analysis (Car Crashes/Building Wreckages)
o Medical analysis
o Substance analysis
o Structural integrity review
o Online Translators. Today.
I don't think I would rely on a Mechanical Turk worker's description of what they see in a picture.
Is that a tibia or a fibula?!?
Of course, you need to pay well enough to get enough people to bother, but I imagine one could build a reasonably skilled workforce for many areas if your training material is good and you have an objective measure of the correct answer.
This image description application is both perfect and terrible for Mechanical Turk, though - it's an ideal task for a human rather than a computer, but it's also impossible to score the result objectively, so you'll have to pay everyone, all the time, or introduce another level of scoring - "Is this an accurate description?", "Does this description read fluently in $LANGUAGE?", etc.
Once your workforce is skilled all kinds of new opportunities of this nature open up.
I think what you're after, though, is more comprehensive integration: a remote telemetry system with crowd intelligence baked into its circuits across several media rather than just one - analyzing audio and video simultaneously. I don't think this would be difficult for a developer to build, and it's a great idea.
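For what it's worth, a rough sketch of what I mean, treating the video frames and the audio track as separate crowd tasks - post_description_hit() is a stand-in for whatever crowd API you'd actually wire up, not a real library call:

    import subprocess

    def describe_clip(video_path):
        # Pull one frame per second and the audio track out with plain ffmpeg.
        subprocess.run(['ffmpeg', '-y', '-i', video_path, '-vf', 'fps=1',
                        'frame_%03d.jpg'], check=True)
        subprocess.run(['ffmpeg', '-y', '-i', video_path, '-vn', 'audio.wav'],
                       check=True)
        # Ask separate workers what they see and what they hear, then combine.
        visual = post_description_hit('frame_001.jpg', 'What is happening here?')
        audio = post_description_hit('audio.wav', 'What do you hear?')
        return {'video': visual, 'audio': audio}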
Try scaling this out... "highly trained" means poor scalability and lots of liability.
Article from 2005.
Honestly, as a camera, it is a step backwards. Photographs efficiently capture many times more information than a simple text description.
However, what it is a good demonstration of is the growing cognitive abilities of image-reading machines.
(And in your defense, having humans manually describe images in the image recognition domain seems so... unmodern that your assumption was probably fair.)
A more practical approach may be to develop a camera with access to the web. When pictures have been taken and the camera can get a 3G/4G/WiFi connection, it uploads them to the cloud (optionally deleting the local copies to save space on the camera's memory, or retaining them so the images can still be viewed on the camera offline). You can then put serious server power into running through all uploaded images to describe their contents (including tagging people in the photos and using GPS data to help identify content from location context), and have this data automatically sent back to the camera (if desired) as well as stored against the image online.
Randomly picked images, or those with a low accuracy rating (e.g. the algorithm has rated its own description as tenuous), could be sent to a service such as Mechanical Turk so that the results can be verified or improved upon, and to ensure the algorithm has feedback to learn from.
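In very rough terms, something like this - describe_image(), send_to_mturk(), store_metadata() and push_back_to_camera() are all placeholders for whatever recognition service and sync mechanism you'd actually use, and the 0.7 threshold is arbitrary:

    CONFIDENCE_THRESHOLD = 0.7   # below this, route the photo to a human

    def process_upload(image_path, gps=None):
        caption, confidence = describe_image(image_path, location_hint=gps)
        if confidence < CONFIDENCE_THRESHOLD:
            # Low-confidence guesses go to Mechanical Turk; the corrected
            # caption doubles as training feedback for the algorithm.
            caption = send_to_mturk(image_path, draft_caption=caption)
        store_metadata(image_path, caption)        # kept server-side
        push_back_to_camera(image_path, caption)   # optional sync to the device
        return caption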
I guess these factors could probably be reduced with more money? I don't mean paying more for the job: I mean using the API to ask a second human to verify the results of the first human and/or fix typos.
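The second-pass check is a pretty small HIT to wire up. Here's a sketch using the boto3 MTurk client - the reward, timings, and the cut-down HTML form are just illustrative values, not anything the article specifies:

    import boto3

    mturk = boto3.client('mturk')   # point at the sandbox endpoint while testing

    VERIFY_HTML = """
    <HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
      <HTMLContent><![CDATA[
        <form action="https://www.mturk.com/mturk/externalSubmit" method="post">
          <!-- assignmentId is normally filled in from the URL by a bit of JS, omitted here -->
          <input type="hidden" name="assignmentId" value="">
          <img src="{image_url}" width="400">
          <p>Proposed description: {draft}</p>
          <p>Is this accurate? Fix any typos below.</p>
          <textarea name="corrected_description">{draft}</textarea>
          <input type="submit">
        </form>
      ]]></HTMLContent>
      <FrameHeight>450</FrameHeight>
    </HTMLQuestion>
    """

    def request_verification(image_url, draft):
        return mturk.create_hit(
            Title='Check an image description',
            Description='Confirm or fix a one-sentence description of a photo.',
            Reward='0.03',
            MaxAssignments=1,
            LifetimeInSeconds=3600,
            AssignmentDurationInSeconds=300,
            Question=VERIFY_HTML.format(image_url=image_url, draft=draft),
        )['HIT']['HITId']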
I don't see how this gadget is particularly useful in any way, any more than the first "automated chess device" with a midget inside moving chess pieces around was actually revolutionary.
The comments on how it would be useful for the blind (with braille or speech synthesis) are as close as I've seen to a reasonable justification for it. Even then it's a stretch. Beyond that, basing it on Mechanical Turk means every "photo" costs money, and frankly doesn't provide much value.
If you want to caption your photos it would be much better to do it AFTER you've pruned the ones you don't care about, so you're not paying for descriptions of 100 throwaway photos for every keeper. In which case the fact that the camera itself is hooked up to Mechanical Turk is just a gimmick, since it would be easier to run a script on your desktop or on a server.
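As a back-of-the-envelope illustration (the price is made up and caption_via_mturk() is a placeholder): at, say, $0.05 a description, captioning everything at 100 throwaways per keeper is $5 of noise for every nickel of value, whereas a dumb post-prune script only pays for the keepers:

    from pathlib import Path

    PRICE_PER_CAPTION = 0.05    # made-up figure

    def caption_keepers(keepers_dir='~/Pictures/keepers'):
        keepers = sorted(Path(keepers_dir).expanduser().glob('*.jpg'))
        print('Captioning %d photos, roughly $%.2f'
              % (len(keepers), len(keepers) * PRICE_PER_CAPTION))
        for photo in keepers:
            caption = caption_via_mturk(photo)            # placeholder call
            photo.with_suffix('.txt').write_text(caption)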
(I should add that I know Amazon offers a review process, but I'm not sure how it integrates with the above camera, so I'm assuming the camera automatically has to accept a description as correct before printing it out. That might not be the case; I don't know.)
I think I agree. This camera is cute but this particular implementation has flaws -- that's what I said before and what I hoped would be my take-home message, and I guess I didn't highlight that enough. I certainly didn't intend, "oh, fix this before it goes into production!" or anything like that.
He's also a writer for Make: http://blog.makezine.com/author/makemattr/
I think the key is that he's one of those people who are experimenting with mashing up not only different technologies but also different generations of technologies.
I guess one could call this stuff "parlor tricks", as one other person said, but I agree that it's totally within the spirit of Hacker News.
> VizWiz is an iPhone app that allows blind users to receive quick answers to questions about their surroundings. VizWiz combines automatic image processing, anonymous web workers, and members of the user's social network in order to collect fast and accurate answers to their questions.
Lots of people who become visually impaired later in life can't read Braille as their fingertips aren't sensitive enough, especially if they did manual work earlier in their life. Working out a picture is even harder.
As a side note, William Moon invented 'Moon script', a kind of simplified alphabet in embossed writing to help visually impaired people read. It came before Braille's raised dot patterns were widely adopted, and even after Braille became popular, Moon still had its niche for those whose fingertips didn't have the resolution for Braille. As Moon was based on the Roman alphabet, it was also easier to learn for people who had been able to read and then lost their sight.
Someone (the "turk", or whatever Amazon calls them) keeps an open window (and is optionally paid for each minute the window is open), so new HITs appear immediately and the "turk" is rewarded based on the speed of the response.
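As far as I know MTurk has no per-minute retainer primitive, so here's a sketch of the closest approximation I can think of, using real boto3 MTurk calls with a speed bonus standing in for the per-minute pay (the bonus schedule is invented):

    import time
    import boto3

    mturk = boto3.client('mturk')

    def wait_for_fast_answer(hit_id, max_bonus=0.10, timeout=120):
        # Poll the HIT; the quicker the answer comes back, the bigger the bonus.
        start = time.time()
        while time.time() - start < timeout:
            resp = mturk.list_assignments_for_hit(
                HITId=hit_id, AssignmentStatuses=['Submitted'])
            if resp['Assignments']:
                a = resp['Assignments'][0]
                elapsed = time.time() - start
                bonus = max(0.01, max_bonus * (1 - elapsed / timeout))
                mturk.approve_assignment(AssignmentId=a['AssignmentId'])
                mturk.send_bonus(WorkerId=a['WorkerId'],
                                 AssignmentId=a['AssignmentId'],
                                 BonusAmount='%.2f' % bonus,
                                 Reason='Speed bonus for a fast response')
                return a['Answer']
            time.sleep(2)
        return None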
It is interesting in its cuteness though; it seems like it could have real applications depending on how 'powerful' the mechanical turk is.
It is pitch black. You are likely to be eaten by a grue.
Here is an idea: translate the text back into a picture. An API could take the text and generate a representation of the description using images from the web via Google image search.
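A crude version of that is mostly plumbing - search_image_urls() below is a placeholder for whatever image-search API you can get a key for, and the keyword extraction is deliberately naive:

    from io import BytesIO

    import requests
    from PIL import Image

    def text_to_collage(description, tile_size=(200, 200)):
        # Naive keyword extraction: treat the longer words as search terms.
        keywords = [w.strip('.,') for w in description.split() if len(w) > 4]
        tiles = []
        for word in keywords[:4]:                   # cap the collage at 4 tiles
            url = search_image_urls(word)[0]        # hypothetical search helper
            img = Image.open(BytesIO(requests.get(url).content))
            tiles.append(img.resize(tile_size))
        if not tiles:
            return None
        canvas = Image.new('RGB', (tile_size[0] * len(tiles), tile_size[1]))
        for i, tile in enumerate(tiles):
            canvas.paste(tile, (i * tile_size[0], 0))
        return canvas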
WordsEye constructs 3d images from text descriptions http://www.wordseye.com/
Sketch2Photo takes a crude annotated sketch, and creates a composite image: http://cg.cs.tsinghua.edu.cn/montage/main.htm
The output of wordseye isn't great - it looks a bit 90's POV-Ray. Sketch2Photo does a nicer job - more like automated photoshopping - but needs more assistance on the placement of objects in the scene.
I'm sure there's others.
There are already systems that can automatically analyze wounds from pictures of them.
My least favorite part of photography is organizing and finding the best photos. I'd happily pay to have a bunch of people rate all my photos (if the results are meaningful enough).
You'd probably need to show each photo to multiple Turkers to get good data. You could run analysis on the votes to kick out the Turkers who give ratings that deviate significantly from other Turkers. Does MTurk have anything like that built in?
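I don't know what MTurk gives you out of the box for that, but the deviation check is easy enough to run offline once you have the votes - here's a sketch (the 1.5-point threshold is arbitrary):

    from collections import defaultdict
    from statistics import mean, median

    def flag_outlier_raters(ratings, max_mean_deviation=1.5):
        # ratings: list of (worker_id, photo_id, score) tuples
        by_photo = defaultdict(list)
        for worker, photo, score in ratings:
            by_photo[photo].append(score)
        medians = {photo: median(scores) for photo, scores in by_photo.items()}

        # Average each worker's distance from the per-photo consensus.
        deviation = defaultdict(list)
        for worker, photo, score in ratings:
            deviation[worker].append(abs(score - medians[photo]))

        return {w for w, devs in deviation.items()
                if mean(devs) > max_mean_deviation}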
This one uses a combination of computer vision and optional human vision. http://www.iqengines.com/omoby/. There is an API allowing for computer vision training.
Isn't that because the real work is being done by a human?