On the other hand, what they may not realize (and I know this because I have tried making computer vision algorithms both before and after reading the literature) is that while the computer vision literature may not contain a solution to your problem, knowing the literature can be very useful in helping you come up with your own solutions. There are a number of non-obvious themes in computer vision that provide a basis for new algorithms. For example, algorithms tend to be built as a pipeline whose first stage is feature detection.
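To make that pipeline shape concrete, here's a toy sketch in pure Python (the image grid, threshold, and stage names are all made up for illustration, not any real CV library): a detection stage finds high-gradient pixels, and everything downstream sees only those features, never the raw pixels.

```python
# Toy illustration of the detect-then-process pipeline shape.
# The "image" is just a hypothetical grid of intensities.

def detect_features(image, threshold=50):
    """Stage 1: keep pixel positions whose horizontal gradient exceeds a threshold."""
    features = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) > threshold:
                features.append((r, c))
    return features

def describe(features):
    """Stage 2: later stages see only the features, not the raw pixels."""
    return {"count": len(features), "rows": sorted({r for r, _ in features})}

image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10,  10,  10],
]
feats = detect_features(image)
print(describe(feats))  # the vertical edge shows up in rows 0 and 1
```

The point of the shape is that each stage commits to a representation: once `detect_features` has thrown away the raw pixels, no later stage can recover them.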
EDIT: you should really watch the video, it's quite funny and cool.
I think the main thing about pipelining, modularity and such is that researchers want to produce something that someone else can use.
Domain-specific hacks do seem the most effective in practice; however, more or less by definition, they can't be reused or extended.
Which can be sad from the perspective of making progress over time.
Computer vision is an amazingly hard field that I've only done a small amount of work in. Basically, everything here seems "broken".
A lot of the paradigm involves layers (a "pipeline" etc). The researcher is, just for example, supposed to "segment" an image into objects or boundaries and then use the segments for further processing. "Segmentation" then is "low level", to be followed by clever stuff later.
But the "segmentation" problem itself is entirely unsolved in any conventional conception of solved. It's not even solvable, since there's no real criterion for whether a segmentation algorithm has succeeded or failed other than what a later algorithm might think it wants.
In my case, the final criterion was someone looking at the lines I drew and deciding they "looked right" - terrible relative to a "real" test, but also inevitable, since a thousand "correctly segmented" images could only be produced by a person making their own judgement by drawing lines on the image (which isn't much less arbitrary).
And then there's the question of whether a segmented image even carries the information the next step really needs, etc.
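The arbitrariness is easy to demonstrate even with a toy one-parameter segmenter (a pure-Python sketch; the image grid and threshold values are invented for illustration): change the threshold and you get a different number of "objects", and nothing inside the segmenter says which answer is correct - only a downstream consumer or a human judge could.

```python
# Toy segmenter: threshold + 4-connected flood fill. Changing the
# threshold changes how many "objects" come out, and the segmenter
# itself has no notion of which answer is right.

def segment(image, threshold):
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                current += 1  # start a new segment, flood-fill it
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < rows and 0 <= x < cols
                            and image[y][x] > threshold and labels[y][x] == 0):
                        labels[y][x] = current
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return current  # number of segments found

image = [
    [0, 90, 0, 200],
    [0, 90, 0, 200],
    [0,  0, 0,   0],
]
print(segment(image, 50))   # 2: the faint blob and the bright blob
print(segment(image, 100))  # 1: the faint blob disappears entirely
```

Neither answer is "the" segmentation; which one is useful depends entirely on what the next stage wanted, which is the circularity described above.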
It seems that human (or animal) vision is an amorphous computational process with sight, judgement, and action all run together. Descriptions like this article are interesting for this reason:
But I lack the optimism of the article. Any algorithm we create has to attempt to simulate these amorphous processes using modular systems that, in the end, can't really hope to do so.
Edit: Also, the video is hilarious and typical in that the effort is totally ad hoc: stare at it and tweak it till it works. "Scalable and repeatable it ain't."