
Computer Vision Research: The deep “depression” - nhegde
https://www.linkedin.com/pulse/computer-vision-research-my-deep-depression-nikos-paragios
======
goalieca
> Other than a handful number of people doing some fundamental research
> towards understanding the theoretical concepts of these methods, almost all
> the community now seems to target the development of more complex pipelines
> (that most likely cannot be reproduced based on the elements presented in
> the paper) which in most of the cases have almost no theoretical reasoning
> behind that can add 0,1% of performance on a given benchmark. Is this the
> objective of academic research? Putting in place highly complex engineering
> models that simply explore computing power and massive annotated data?

I last dabbled in image processing research around 2011. Probably most of the
papers I read during the previous five years were little epsilon papers that
added no real value. I did some work in other fields and noticed a similar
trend there. I always attributed it to the trend of PhDs being pumped through
the system in ever greater numbers and the need for researchers to publish a
paper every few months.

~~~
f9us98a
This happened to me. I developed a computer vision technique that achieved a
well-known result but without the same constraints. Although it never
surpassed the other technique for most images, it worked in a wide variety of
cases where the previous technique did not.

My major professor diluted the paper and added other content consistent with
the previous method. Not just adding prior art to the introduction, but
changing the meat of the paper so that it didn't seem like a departure.

He assured me that this would make it easier to publish, and publishing was
all that mattered. There were no bonus points for publishing a novel
technique, and there would certainly be extra work in dealing with referees.

I'm very glad to be out of that environment now. I noped out of academia and
am now happily dealing with corporate B.S.

~~~
Roritharr
Having dropped out of university while studying computer vision, I've
witnessed this much too often in former friends and colleagues.

Smart people doing remarkable things don't seem to have a place in our
society anymore, neither in academia nor in the private economy. Sure, there
are exceptions.

Microsoft Research, Google X, and of course a handful of universities that
actually work as they are supposed to - but for most of my ex-colleagues
these weren't real options, as no one ever instilled in them the courage to
find their way there or survive the competition.

It's strange that most of them are building CRUD software, writing shaders
for game engines, or working in marketing, instead of pushing us toward
breakthroughs in CV and, with that, AI.

~~~
demonshalo
I keep having that same thought. However, making a breakthrough requires 2
things:

1\. Resources. Time and money. These research ventures might take years and
sometimes need funding in addition to your own salary. It is borderline
impossible for most developers to contribute anything useful in an
environment where they need to worry about paying their bills this month.

2\. Know-how. Right now, most industries have advanced so much that you need
very specialized knowledge and a metric fuckton of math/stats to contribute
to any particular field in any significant way. A developer with a BS in CS
can most often not even understand the papers currently being published, due
to the advanced math and specialized knowledge required.

------
hyperpallium
Machine learning gives performance without theoretical understanding. Norvig
discusses Chomsky on machine learning in linguistics:
[http://norvig.com/chomsky.html](http://norvig.com/chomsky.html)

> I mean actually you could do physics this way, instead of studying things
> like balls rolling down frictionless planes, which can't happen in nature,
> if you took a ton of video tapes of what's happening outside my office
> window, let's say, you know, leaves flying and various things, and you did
> an extensive analysis of them, you would get some kind of prediction of
> what's likely to happen next, certainly way better than anybody in the
> physics department could do. Well that's a notion of success which is I
> think novel, I don't know of anything like it in the history of science.
> [from the linked transcript]

~~~
trhway
> Machine learning gives performance without theoretical understanding.

That is good experimental data, and the role of theorists here is to look at
how that performance is achieved and why. For example, one can reasonably
suspect that there is a good reason why the kernels in a well-trained image
recognition deep learning net look like the receptive fields of neurons in
the visual cortex. I'm pretty sure that there is some kind of statistical
optimality in that, something similar to how the normal distribution is the
maximum entropy distribution for a given standard deviation. In the same way,
I'd guess the Gabor shape of a neuron's receptive field is something like a
maximum entropy solution on the set of all possible edges. The point here is
that the great success of deep learning generates a lot of very good data for
theorists to consume. You can do only so much theory without good
experimental data, and in the decades before the availability of computing
power (and the resulting success of deep learning) there weren't many
computer vision theory advances to speak of, really.
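
For concreteness, a Gabor filter is just an oriented sinusoid under a
Gaussian envelope. A minimal numpy sketch (the parameter values are
illustrative, not tuned to any particular network):

    import numpy as np

    def gabor_kernel(size=21, theta=0.0, lam=8.0, sigma=4.0, gamma=0.5):
        # Oriented sinusoid modulated by a Gaussian envelope: the shape
        # that both V1 receptive fields and first-layer CNN kernels resemble.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
        yr = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
        carrier = np.cos(2 * np.pi * xr / lam)
        return envelope * carrier

Varying theta gives the kind of oriented edge-detector bank that first-layer
CNN kernels tend to converge toward.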

> leaves flying and various things

Newton did that for 20 years. With great success.

~~~
argonaut
Here's a problem: you can argue that Gabor filters arise _because we design
neural nets to encourage them_. Gabor filters mostly arise in CNNs, or in
architectures otherwise regularized to be like CNNs. Convolutional layers are
a form of regularization that restricts the space of models a network can
conform to. The Gabor filters are learned, but none of this is evidence that
they are globally "optimal", given that a human manually decided whether or
not to include convolutions in the first place.

It also goes without saying that the phrase "statistically optimal" is
meaningless in this specific context. You _can_ claim they are part of
minimizing the cost function, but, again, you have to be very careful about
the chicken-and-egg problem, because humans are the ones who manually craft
the cost function.
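
To make the "convolution as regularization" point concrete, here is a rough
parameter count for a single layer (the sizes are illustrative):

    # Mapping a 32x32x3 image to 32x32x16 feature maps:
    # fully connected: every output unit sees every input value
    dense_params = (32 * 32 * 3) * (32 * 32 * 16)  # ~50 million weights
    # convolutional: one shared 3x3 kernel per (in, out) channel pair
    conv_params = 3 * 3 * 3 * 16 + 16              # 448 weights incl. biases
    print(dense_params, conv_params)

Weight sharing collapses the hypothesis space by five orders of magnitude
here, which is exactly the kind of restriction described above.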

~~~
suchow
You might find this 1996 Nature paper by Olshausen & Field interesting. In
it, they describe how a coding strategy that maximizes sparseness when
representing natural scenes is enough to produce a family of localized,
oriented, bandpass receptive fields, like those found in the early visual
system of humans.

[https://courses.cs.washington.edu/courses/cse528/11sp/Olshau...](https://courses.cs.washington.edu/courses/cse528/11sp/Olshausen-nature-paper.pdf)
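
Roughly, the objective they minimize is reconstruction error plus a
sparseness penalty on the coefficients (notation paraphrased from the paper):

    E = \sum_{x,y} \Big[ I(x,y) - \sum_i a_i \phi_i(x,y) \Big]^2
        + \lambda \sum_i S(a_i / \sigma)

where I is an image patch, the \phi_i are the learned basis functions, the
a_i are their coefficients, and S is a sparseness cost. Training on natural
scenes drives the \phi_i toward localized, oriented, Gabor-like shapes.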

~~~
argonaut
This fits in with my point: they imposed several restrictions on the model
space, namely maximizing sparseness, and they also made several linearity
assumptions.

------
atroyn
I would hardly say that deep learning has taken over - some of the best
results in the last few years have come from 'classical' domains like
nonlinear optimization.

For example, LSD-SLAM:
[http://vision.in.tum.de/research/vslam/lsdslam](http://vision.in.tum.de/research/vslam/lsdslam)

Deep learning / ML approaches certainly have a place, and they're getting a
lot of attention right now, but the computer vision domain is about a lot more
than segmentation and classification.

Maybe in coming years we'll see some more breakthroughs from the ML side on
encoding priors - for example, making a network learn projective geometry
from data is a lot worse than structuring it so that it already 'knows' what
projective geometry is. This could result in a closer collaboration between
the two fields.
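
As a toy illustration of the kind of prior meant here: the pinhole projection
below is two lines of known math that a network would otherwise have to
rediscover from data (the intrinsics are made up):

    import numpy as np

    # Pinhole camera model: a 3D point in camera coordinates maps to pixel
    # coordinates through the intrinsic matrix K.
    K = np.array([[500.0,   0.0, 320.0],   # fx, skew, cx
                  [  0.0, 500.0, 240.0],   # fy, cy
                  [  0.0,   0.0,   1.0]])
    X = np.array([0.2, -0.1, 2.0])         # a point 2 m in front of the lens
    u, v, w = K @ X
    print(u / w, v / w)                    # -> 370.0 215.0

Structuring a network so that relations like this hold by construction is the
'encoding priors' idea above.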

~~~
bluejellybean
Wow, the video you linked is awesome. I would love to see how it would handle
a huge dataset of something like Times Square footage.

~~~
atroyn
You might be interested in Prof. Marc Pollefeys' work:
[http://www.bbc.co.uk/news/technology-11827854](http://www.bbc.co.uk/news/technology-11827854)

------
siavosh
I was a computer vision PhD student until I dropped out in 2007. I wasn't
very good, but a lot of what he says now was also true back then. A field
lacking any agreed-upon underlying theory, driven by fads, unreproducible
papers, massive volumes of forgettable papers by an inexplicably growing
population of CV researchers with dim professional prospects.

Yet there are still people out there working against the tide trying to find a
'unified' theory of what's going on, granted with limited success or support.
Some argued at the time that the problem is simply too difficult to tackle
with our statistical 'tricks' and computing power. It's somewhat disheartening
that folks are still grumbling about the same things 10 years after I left.

------
frozenport
I've seen the exact same trend in the field of optics, where engineering work
has replaced science.

The fundamental problem is that funding is given to those who promise the best
outcome ("device that can recognize cancer") rather than the truth ("Where is
the data located in an HBM").

Now, engineering work isn't bad, but today's universities still have relics
of a previous generation, like research papers. Hence, we're left with a
bunch of research papers with little scientific content. The only fix I can
think of is to offer useful alternatives to the PhD and to prefer or mandate
other markers of achievement, like patents, instead of research papers.

~~~
laichzeit0
Is this a result of "hacker culture"? I.e., there seems to be a pathological
trend on HN of the "fuck a CS degree, I can learn to code in a week" type of
mentality. You have people who believe they can "hack" their way through
complex fields without having a theoretical underpinning in them. No need to
learn mathematics; you just need to "understand the main idea", code
something up, run a bunch of simulations and tweak constants, run more
simulations, etc., until you have some incremental improvement, then write it
up and publish.

It's the difference between giving a "brute force" computer proof, like that
of the four color theorem, and coming up with new theory from which the
result simply follows.

~~~
frozenport
No. I think it's really the mindset of higher-ranking people (professors)
who, due to funding or conflicts of interest, are motivated to do engineering
rather than science.

Most folks I know are desperate to do actual science, experimental or
theoretical. Instead, they end up optimizing some procedure/protocol.

~~~
lskfks
You hit the nail on the head.

If someone wants to do science, there is no one stopping them from doing
science.

If someone comes to others with their hands out, then there are going to be
strings attached.

~~~
wolfgke
> If someone wants to do science, there is no one stopping them from doing
> science.

The copyright laws for scientific articles are. I just link to Aaron Swartz'
Guerilla Open Access Manifesto:
[https://archive.org/details/GuerillaOpenAccessManifesto](https://archive.org/details/GuerillaOpenAccessManifesto)
[https://archive.org/stream/GuerillaOpenAccessManifesto/Goamj...](https://archive.org/stream/GuerillaOpenAccessManifesto/Goamjuly2008_djvu.txt)
[https://archive.org/download/GuerillaOpenAccessManifesto/Goa...](https://archive.org/download/GuerillaOpenAccessManifesto/Goamjuly2008.pdf)

~~~
PeterisP
It is, in principle, an important point that should be solved.

However, in practice that isn't an issue:

1) At least in this domain, all publications are de facto open access, as in,
if you just google the name of a paper from a random citation, in 99% of
cases you will get a non-paywalled full-text version - if not from the actual
place of publication, then on arxiv, the author's home page, etc. It's not
_totally_ equivalent, as the versions can differ, but it's definitely enough
to say "there is no one stopping them from doing science".

2) If you actually need access to the university library databases for
paywalled articles, then just go to the library. If you want to do science,
there are options. Most people simply have or can get some kind of
university/college affiliation. If you don't, in many places you can still
use the university library to access the data without the paywalls. If not,
then you can often (depending on your country) "join" a university to audit a
single course, which would get you that affiliation and access to their
infrastructure. I'm getting into more and more obscure scenarios, but even
then there are options - the publications _are_ accessible (though at times
not conveniently enough), and _that_ is not a serious obstacle to doing
science; it's still far less effort than actually reading and understanding
these papers.

3) If you do need something that's _really_ not available to you, just email
the author. As a rule, people write articles because they want people to read
them, use them, and cite them. My advisor has a bunch of papers that he
received that way in pre-internet times, when that involved expensive mailing
across the ocean. The only realistic case where an author wouldn't send you a
preprint version is if you're either rude or haven't taken the five seconds
to click on the link on their homepage to get that paper.

------
return0
I consider deep networks as experiments. People try incrementally different
models, and the results keep getting better. At some point the theory behind
them will advance to the point where we can describe them analytically.

------
AndrewKemendo
_How from a community where all fresh incoming PhD students have never and
most likely will never hear about statistical learning, pattern recognition,
euclidean geometry, continuous and discrete optimization, etc. new ideas will
emerge._

Except they are learning this stuff prior to their graduate work...so I don't
know where the author is coming from here. All of our Computer Vision people
are very familiar with all of those topics - especially complex geometries and
topology.

~~~
argonaut
Really? My impression is that most new grad students have a superficial
understanding of this stuff, which they promptly never use, since they just
apply end-to-end deep learning solutions.

~~~
AndrewKemendo
I think it depends on the student and researcher. I would agree that most are
implementing ANNs to "brute force" around existing fundamental problems - but
to me that's as much a solution as doing it another way, and I'm not sure
it's worth the effort to do it otherwise. There may be some efficiencies to
be had, but I think getting to a more generalizable ANN-based machine vision
system outweighs the negatives.

------
drcode
Deep learning is getting all the attention because it gets the best results.
If you don't like this, you have to either (1) provide a different measure of
results or (2) give an objective mechanism for evaluating the "worth" of a
technique that doesn't involve looking at results.

I would still like to hear what the author of the post recommends as a course
of action - maybe he can write a follow-up post that provides these details
to clarify his position.

~~~
f9us98a
It gets the best results for what? Maybe for classification and recognition.

But the point of the article is that there is more to computer vision. Stereo,
optical flow, geometry, and physics can only be aided by deep learning so
much.

Another point not mentioned is the computational power required for deep
learning. Consider programming the physics for a ball rolling down an
incline. You could use (1) the math itself or (2) a neural net. It's clear
that the direct math approach could be orders of magnitude faster than a
neural net. I wouldn't be surprised if directly coding the physics achieved
1,000x the performance of a neural net.
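
For the incline example, the direct math really is just a handful of
arithmetic operations (a minimal sketch; the angle and duration are made up):

    import math

    g = 9.81                      # m/s^2
    theta = math.radians(30)      # incline angle

    # Solid ball rolling without slipping: a = g*sin(theta) / (1 + 2/5),
    # since I = (2/5) m r^2 for a solid sphere.
    a = g * math.sin(theta) / (1 + 2 / 5)
    t = 2.0                       # seconds
    distance = 0.5 * a * t ** 2   # s = (1/2) a t^2
    velocity = a * t              # v = a t
    print(a, distance, velocity)

A few multiplications per query, versus thousands of multiply-accumulates for
even a tiny network answering the same question.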

~~~
opticalflow
I would posit that a deep learning network that learns to optimize
_parameters_ for a complex algorithm _outside_ the convolutional network
itself may have immense utility outside the classification problem. Call it a
marriage of classic computer vision with deep CNN, or a hybrid approach. I
don't think it's a binary decision. A deep CNN can find the optimal parameters
(once trained) for a classic CV problem for a given image or video or other
dataset, like superresolution, patch-based inpainting, or motion tracking. The
training is the most computationally intensive part. As someone with way too
many kids, I can testify...
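
A minimal sketch of that hybrid idea, assuming PyTorch (the architecture and
the regressed parameter are invented for illustration):

    import torch
    import torch.nn as nn

    # Tiny CNN that regresses one tuning parameter for a classic CV
    # algorithm (say, a smoothing strength), instead of solving the
    # whole task end to end.
    class ParamNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(8, 1)

        def forward(self, x):
            h = self.features(x).flatten(1)
            return torch.sigmoid(self.head(h)) * 5.0  # parameter in (0, 5)

    net = ParamNet()
    img = torch.randn(1, 1, 64, 64)  # stand-in for an input image
    sigma = net(img)                 # predicted algorithm parameter
    # a classical routine (inpainting, tracking, ...) would consume sigma here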

~~~
argonaut
You seem to be describing hyperparameter optimization, which I seriously doubt
CNNs are going to be used for anytime soon.

~~~
pedrosorio
It has been done already:
[http://arxiv.org/abs/1502.05700](http://arxiv.org/abs/1502.05700)

Edit: Although what they seem to describe is replacing GPs with neural
networks in Bayesian optimization, which is supposedly more efficient.

Since the point of Bayesian optimization is to limit the number of times you
have to evaluate a new set of hyperparameters, I am not sure how useful it is
to "be able to scale" (i.e., even if maintaining the GP is O(n^3) in the
number of evaluations, the costly part should be evaluating the
hyperparameters in the first place), but I haven't read the paper, so they
may show some high-dimensional hyperparameter cases where performing a lot of
evaluations pays off.
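
For readers unfamiliar with the loop being discussed, here is a toy GP-based
Bayesian optimization of one hyperparameter (everything is illustrative; the
objective is a stand-in for an expensive training run):

    import numpy as np
    from scipy.stats import norm

    def rbf(a, b, ls=0.5):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ls ** 2))

    def gp_posterior(X, y, Xs, noise=1e-6):
        # The O(n^3) step: solving against the n x n kernel matrix.
        K = rbf(X, X) + noise * np.eye(len(X))
        Ks = rbf(X, Xs)
        mu = Ks.T @ np.linalg.solve(K, y)
        var = np.diag(rbf(Xs, Xs)) - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
        return mu, np.sqrt(np.maximum(var, 1e-12))

    def expected_improvement(mu, sigma, best):
        z = (best - mu) / sigma
        return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    f = lambda x: np.sin(3 * x) + 0.1 * x  # pretend "validation loss"
    X = np.array([0.2, 0.9])               # hyperparameters tried so far
    y = f(X)
    grid = np.linspace(0.0, 2.0, 200)
    for _ in range(10):                    # each step = one "training run"
        mu, sigma = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X, y = np.append(X, x_next), np.append(y, f(x_next))
    print("best hyperparameter:", X[np.argmin(y)], "loss:", y.min())

The cubic cost of the GP only bites once the number of evaluations gets
large, which is presumably the regime the paper targets.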

------
alayne
CV seemed to have stalled out before deep learning, at least in the late 90s.
By that I mean classifiers had hit performance limits and weren't improving,
and so on. As a non-researcher, I'm glad to see it moving again.

------
romaniv
I had very similar thoughts recently. Really glad someone took time to express
them properly.

Even on this very website... I feel there is an immense bias in favor of
anything "deep" and "neural" when it comes to AI. Can't recall any recent AI
papers that made it to the front page without having those two words in the
title.

And please, don't tell me there is nothing interesting going on in the field
outside of deep learning. Even when an approach doesn't beat SOTA in terms of
error rates, it can still contain valuable ideas or have interesting
properties.

------
egfx
I worked in computer vision QA at a small startup named Neven Vision (of
Google Goggles and Picasa fame). It's interesting to see how much things have
progressed. I think I may have developed some of the first field tests in
mobile computer vision. IR was very sensitive to lighting at the time, though
extreme angles always seemed to work. By the way, I was testing on the best
camera phone in town, the T-Mobile Sidekick.

~~~
AndrewKemendo
_IR was very sensitive to lighting at the time._

It still is. Any system using structured light will have big problems when
confronted with sunlight (windows, outdoors, etc.).

I'm not confident there is a solution to this within the structured light
domain, as the beacons will (more than likely) never overcome the sun's
intensity. We're doubling down on passive systems and reference maps.

------
hacker42
One point that I've not seen mentioned yet is that the neural revolution
somewhat aggravates economic inequality. What recent progress has basically
shown is that deep learning works better with more layers and more resources.
Geoffrey Hinton has also recently conjectured that there exists a pretty much
unexplored regime of applying gradient descent to huge models with strong
regularization trained on relatively small (but still big) data. This
inequality is alleviated to some extent by the fact that the machine learning
community fully embraces online education and open science, but still, you
need 50-150 GPUs to play human-level Go, and having several grad students who
explore a wide variety of complex and huge models is key for progress. I can
only see this aspect getting worse in the years to come.

------
daix
Research paper needs more creativity than improvement. But it's quite
difficult to come up with an innovative method, especially when the field is
more explored and developed compared to decades before, and nowadays people
need to publish a paper every few months.

------
powera
So he's upset that something _successful_ is happening in computer vision, at
long last?

~~~
varjag
He is upset at monoculture in research.

Remember that throughout the eighties and nineties, ANN classifier
performance was as underwhelming as that of other approaches, or even worse
at many tasks. Now it has reached a local optimum, and all other approaches
are being discarded.

------
lordnacho
If using loads of data and CPU is a bad thing, he should propose a new
measure of goodness that takes these into account - a bit like information
criteria in statistics, which penalise using a bunch of extra variables.
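
For reference, the two standard information criteria, both of which reward
fit but charge a price per extra parameter:

    \mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
    \mathrm{BIC} = k\ln n - 2\ln\hat{L}

where k is the number of fitted parameters, n the number of observations, and
\hat{L} the maximized likelihood. A compute- or data-aware measure of
goodness would presumably add similar penalty terms for those resources.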

~~~
wolfgke
> If using loads of data and CPU is a bad thing, he should propose a new
> measure of goodness that takes these into account.

You'd better separate using lots of data (perfectly OK, in my opinion) from
the number of parameters the model uses (say, the number of weights in the
neural network). For the latter, you'd better have a good explanation for the
existence of each parameter, for why it has this concrete value and no other,
and for why it is even necessary to introduce it. Treat any parameter you
have to introduce as some kind of physical constant: physicists invest lots
of time in explaining/reducing their number, and so should CV researchers.

------
paulsutter
He should come up with a better benchmark, then, one that requires the
methods he fears are underexplored. Perhaps demonstrate something important
that's being overlooked by the trendy crowd. If he can't come up with that,
he's just being sentimental.

~~~
argonaut
Those benchmarks exist. Anything to do with zero-shot or one-shot learning.

