
Computer Eyesight Gets a Lot More Accurate - dumitrue
http://bits.blogs.nytimes.com/2014/08/18/computer-eyesight-gets-a-lot-more-accurate/
======
liuliu
This year's result:
[http://image-net.org/challenges/LSVRC/2014/results](http://image-net.org/challenges/LSVRC/2014/results)

------
lifeisstillgood
Computer vision is one of those odd areas where I cannot see a nice gentle
slope to adoption; instead it looks like a step change. For example, NLP gives
us all sorts of add-ons to our current interaction with computers (hey, let's
do sentiment analysis of customer reviews / emails / etc.).

But there is no obvious slope for computer vision - we need an infrastructure
of cameras and bandwidth before it becomes ubiquitous.

So I struggle to see the profitable intermediate businesses between here and
there - and that troubles me.

~~~
TeMPOraL
There are many intermediate applications. From my own computer vision classes
at university I remember examples of jobs where a guy is sitting and looking at
a factory line or a machine for 8 hours a day in order to press a button if
something goes wrong. This is a kind of work that bores humans out of their
minds (thus making them extremely fallible), and that can be done much better
with a few cameras and a computer running not-very-supercomplicated computer
vision algorithms.

------
contingencies
_Science is but a perversion of itself unless it has as its ultimate goal the
betterment of humanity._ - Nikola Tesla

Does this not nearly amount to "population-scale mass surveillance
algorithms"? Do people not feel this is accelerating negative social impacts
of technology?

Is it merely a coincidence that winning teams include many from countries
criticized for their totalitarian social contracts: Hong Kong University of
Science and Technology, National University of Singapore, Microsoft Research
China, Southeast University (China), Chinese Academy of Sciences? There's also
a presence from Holland.

Oh, and guess who won the category "with additional training data"? Google.

Come on people, we can do better than this! _SHAME SHAME SHAME._

~~~
kastnerkyle
This can already be done to a large degree... see [1]. That said, _this_
contest is about recognition of items and localization, both of which are key
for the future of robotics and have little to do with your surveillance state
fears.

Ultimately, the thing stopping mass surveillance is not a limitation of
technology, but of policy. For better or worse, the days of "they don't have
the resources to do that" have been replaced by "they aren't allowed to do
that".

If you have access to the raw packets going to and from every device, and the
accelerometer in almost everyone's pocket, identification can be much simpler
than doing full face recognition all the time.

I seriously doubt the dawn of the surveillance state will be heralded by deep
neural networks recognizing faces in the streets - hardware and software
backdoors on phones are cheaper and more effective.

[1]
[https://www.facebook.com/publications/546316888800776/](https://www.facebook.com/publications/546316888800776/)

~~~
contingencies
The stated scope is _object detection and image classification at large
scale_, so I would be interested to hear your reasoning as to why that is not
applicable to mass surveillance.

It's not a stretch of the imagination to see these things being sold to
airports, seaports, mass transit stations, and storefronts as a security
feature. Next, your physical mail could be scanned. None of that seems
politically unlikely in the current climate.

~~~
kastnerkyle
To be perfectly honest, the fact that the NSA already has access to every single
packet into or out of the US (and probably most inside the states as well...)
for much cheaper with much less rollout overhead, points me away from these
types of algorithms as a "tool of mass surveillance". Think Occam's razor -
you would need massive political pull to put this in every tiny jurisdiction,
not to mention equipment maintenance and the massive attack vector exposed by
hundreds of "internet of things" devices piping data to some endpoint. The
recognition results would need to be geolocated, time tagged, and encrypted to
NSA specs. To access the data it would have to go through some kind of
unclass->classified firewall, get decrypted AND they would have to keep the
public in the dark, blah blah blah.

The tools _already revealed_ for large scale surveillance are cheaper, more
effective, and more robust to outside attack than the mentioned ideas. More
importantly, they are _already there_ - there is no rollout cost at all! And
up until recently it was also easier to keep the public in the dark...

I _do_ see applications at the places you mention, but for a very different
reason - border inspections (coupled with human oversight) are an excellent
place for automation where a small amount of effort could lead to a massive
increase in throughput per person.

The only downside is that officials who deploy these things will want
_guarantees_ on effectiveness, which you can never truly give due to
statistics. Couple this with the fact that neural networks are very difficult
to tune for false negatives and false positives and it would be a difficult
sell.

One alternative would be to use these types of networks as black-box
preprocessing, followed by a "tunable" algorithm like logistic regression
where you could effectively control the ratio of false positives - a high rate
of false positives coupled with human oversight could still lead to a large
boost in human performance if most of the border inspection process is
uninteresting.
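
A minimal sketch of that idea, under stated assumptions: the "network
features" here are just synthetic 2-D points standing in for whatever a
pretrained network would emit, and every function name (`make_data`,
`train_logistic`, `threshold_for_recall`, `rates`) is hypothetical, made up
for illustration. It fits a plain logistic regression, then picks the decision
threshold so that at least a target share of true positives gets flagged,
accepting whatever false positive rate comes with it.

```python
import math
import random


def make_data(n, seed=0):
    """Synthetic stand-in for network features: 2-D points, ~10% positives."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.1 else 0
        center = 1.0 if y else -1.0
        features.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(y)
    return features, labels


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Predicted probability that x is a positive ("interesting") case."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(features, labels, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = score(w, b, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def threshold_for_recall(scores, labels, target_recall):
    """Largest threshold that still flags at least target_recall of positives."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    needed = max(1, math.ceil(target_recall * len(pos)))
    return pos[needed - 1]


def rates(scores, labels, threshold):
    """Return (recall, false positive rate, fraction sent to human review)."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg, sum(flagged) / len(labels)
```

For example, `X, y = make_data(500)`, then `w, b = train_logistic(X, y)`,
`s = [score(w, b, x) for x in X]`, and
`rates(s, y, threshold_for_recall(s, y, 0.99))` shows how much of the stream
a human would still have to look at when almost no positives may be missed.
In practice the threshold would be chosen on held-out data, not on the
training set as in this toy.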

But still there are unions... which is a whole separate issue unto itself.

~~~
contingencies
I remain unconvinced. Fundamentally, packets and physicalia are apples and
oranges.

China, Hong Kong, Singapore, the UK, and an increasing number of cities
worldwide already have _massive_ video surveillance networks allied to local
law enforcement, traffic management, and other functions. Adding another stage
of image processing would help to leverage and extract actionable data from
those (the classic problem of CCTV is that nobody watches it) ... with
probably very little additional outlay compared to existing investment.

I can't see anything stopping this commercial progression, in fact I see it as
inevitable unless the citizenry can somehow curb their politicians. Good luck
with that...

------
brandonmenc
Minor editing nitpicks:

GPU, not G.P.U. OpenCV, not Open CV

c'mon NYT, act like you know.

~~~
thrownaway2424
Open CV is a mistake, but the style guide in force at the Times gives guidance
for the use of dots in abbreviations. G.P.U. appears as dictated by their
style guide. The Times also inserts dots into C.I.A. and F.B.I. for the same
reason.

