Computer Vision in the Real World – Why SIGGRAPH Probably Won’t Help You (bitgym.com)
26 points by splendidfailure 1129 days ago | 6 comments

Great article. They very clearly describe the state of research in computer vision. When I studied the field, I was very surprised at how shallow it was, in the sense that, as they put it, computer vision algorithms "are generally only moderately sophisticated".

On the other hand, what they may not realize (and I know this as I have tried making computer vision algorithms both before and after reading the literature) is that while the computer vision literature may not contain a solution to your problem, knowing the literature can be very useful in helping to come up with your own solutions. There are a number of non-obvious themes in computer vision that provide a basis for new algorithms. For example, algorithms tend to be built from a pipeline in which the first stage is feature detection.
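
To make that concrete, here's a minimal sketch of such a pipeline (using OpenCV; the image paths and parameters are placeholders, and this is just one illustrative shape a pipeline can take, not anyone's actual system):

    # Minimal sketch of a classic CV pipeline: feature detection first,
    # then matching, then whatever the application actually needs.
    # Assumes OpenCV is installed; "frame_a.png"/"frame_b.png" are
    # placeholder paths.
    import cv2

    img_a = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

    # Stage 1: feature detection + description (ORB here; SIFT/SURF are
    # the textbook alternatives).
    orb = cv2.ORB_create(nfeatures=500)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Stage 2: feature matching.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    # Stage 3: higher-level processing built on the matches
    # (homography, tracking, recognition, ...).
    print(len(matches), "candidate matches")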

EDIT: You should really watch the video; it's quite funny and cool.

OP here - Just wanted to say that I believe a pipelined architecture is a symptom of iterative research and is not necessarily good. I think the more direct solution avoids pipelining in favor of a domain-specific hack.

Thanks for making that point. It still seems to me that most domain-specific solutions will make use of many ideas from the CV literature. E.g., looking at the last sections of your video, it seems like you settled on a histogram-based approach, although I may be wrong.
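
For what it's worth, the generic histogram-based trick (I'm only guessing at what you actually built) usually looks something like: model a region by its hue histogram, then back-project that histogram onto later frames to find where the region went:

    # Sketch of a generic histogram-based tracker -- not necessarily
    # what the OP built. "frame" is a BGR image; x, y, w, h describe
    # the region of interest (all placeholder inputs).
    import cv2

    def build_hue_model(frame, x, y, w, h):
        roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([roi], [0], None, [32], [0, 180])
        cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        return hist

    def locate(frame, hist):
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Probability map of how well each pixel matches the model;
        # feed it to e.g. CamShift to get an actual window.
        return cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)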

I think you get a bag of tricks as you work on this stuff, for sure. But I think they are more like the tricks artisans gather, share and re-use than fundamental academic truths.


I think the main thing about pipelining, modularity and such is that researchers want to produce something that someone else can use.

Domain-specific hacks do seem the most effective in practice; however, more or less by definition, they can't be reused or extended.

Which can be sad from the perspective of making progress over time.

*For example, algorithms tend to be built from a pipeline in which the first stage is feature detection.*


Computer vision is an amazingly hard field that I've only done a small amount of work in. Basically, everything here seems "broken".

A lot of the paradigm involves layers (a "pipeline" etc). The researcher is, just for example, supposed to "segment" an image into objects or boundaries and then use the segments for further processing. "Segmentation" then is "low level", to be followed by clever stuff later.
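
A deliberately naive version of that "low level" stage, just to show the shape of it (real segmenters are far fancier but sit in the same slot; "input.png" is a placeholder):

    # Naive "segmentation" stage: global threshold, then label
    # connected components. Each label becomes a "segment" handed to
    # the next pipeline stage.
    import cv2

    gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n_labels, labels = cv2.connectedComponents(binary)
    print(n_labels - 1, "segments (excluding background)")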

But the "segmentation" problem itself is entirely unsolved in any conventional sense of solved. It's not even solvable, since there are no real criteria for whether a segmentation algorithm has succeeded or failed other than what a later algorithm might think it wants. In my case, the final criterion was someone looking at the lines I drew and deciding they "looked right" - terrible relative to a "real" test, but also inevitable, since a thousand "correctly segmented" images could only be produced by a person making their judgement by drawing lines on the image (which isn't much less arbitrary).
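
And the standard way to put a number on segmentation quality - something like intersection-over-union against a human-drawn mask - just moves that arbitrariness into the ground truth:

    # Intersection-over-union against a human-drawn mask: the usual
    # "objective" segmentation score, whose ground truth is itself a
    # human judgement. Inputs are boolean arrays of the same shape.
    import numpy as np

    def iou(machine_mask, human_mask):
        inter = np.logical_and(machine_mask, human_mask).sum()
        union = np.logical_or(machine_mask, human_mask).sum()
        return inter / union if union else 1.0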

And then there's the question of whether a segmented image even carries the information the next step really needs, etc.


It seems that human (or animal) vision is an amorphous computational process, with sight, judgement and action all run together. Descriptions like this article's are interesting for exactly that reason.

But I lack the article's optimism. Any algorithm we create has to attempt to simulate these amorphous processes using modular systems that, in the end, can't really hope to do so.

Edit: Also, the video is hilarious and typical in that the effort is totally ad hoc: stare at it and tweak it till it works. Scalable and repeatable it ain't.
