A Year in Computer Vision (themtank.org)
450 points by Geeshang on Nov 27, 2017 | 87 comments



It amazes me that there is a seemingly complete void of ethical consideration here.

I can come up with a handful of completely malevolent uses for facial recognition in no time flat. Automated weapons, increased surveillance, targeted marketing.

The only good use I can come up with is spotting terrorists. But they are already the one group of people who seem to have the luxury of occasionally using masks.

Involvement in this stuff could very well yield more problems with your personal conscience than Mikhail Kalashnikov ever had. "Someone will do it anyhow." Yes. But you can drive up the price and reduce the availability of malevolence by personally avoiding it.


"The only good use I can come up with is spotting terrorists"

I can come up with a handful of benevolent uses for facial recognition in no time flat.

1.) Automated ID systems for supermarkets. Walk into a store, store knows who you are. No checkout needed. Just walk out.

2.) Security for venues/work/etc. No more backstage passes getting copied - computer vision knows who's allowed where.

3.) Emergency response after a disaster. Automatically notify loved ones if you've been detected alive in a news feed.

4.) Voting systems. Got a stadium full of people? Just have them raise their hands - computer vision does a tally along with ID'ing who voted for what.

5.) Guest lists / ticketing. Show up to a venue, walk in. No ticket needed. Venue knows if you're allowed in. No more lost tickets.

6.) FaceID for the iPhone X. I'm personally loving it on mine.

Like many new technologies, it can be used for both good and bad. Harnessing nuclear reactions can be used to generate massive amounts of electricity, or to level cities. What's important is that it's regulated, not that it's banned.


> 1.) Automated ID systems for supermarkets. Walk into a store, store knows who you are. No checkout needed. Just walk out.

Of course, nobody will ever use this feature as a means of enhancing their "targeted marketing" or creating a "targeted pricing" strategy. For me, this is a big NO-NO and should go straight to the malevolent usage list.


If enough people won't tolerate price discrimination, there's money to be made if a store will pledge not to do it.

But if there's only one walk-out supermarket available, and it does price discriminate, and it's not too outrageous, I'm in. Waiting in line for checkout is very bothersome for me, and I like my job very much, so I'll pay more than my hourly rate at work to get out of each hour of line.

(extra bothersome if there's untargeted marketing blaring through some speakers while I wait, 1984-style. But I already avoid those supermarkets)

Nobody likes price discrimination, but there are advantages. Poor people pay much, much less. It's ironic that we're always coming up with inefficient measures full of collateral damage to ease things up on the poor, but we'll rail against price discrimination like it's Salem.


People are pretty OK with price discrimination all the time if it's open. Like "discount for veterans" and nobody bats an eye.


Did you know grocery delivery is available?


Not in my city, but thanks for the intention.


Now that's a presumptuous thing to say.


Wow... so not only could it correlate someone's identity with their race and possibly religion, and anything else that could be gleaned from existing social media and added to a proprietary "loss prevention" database, it would also be able to debit arbitrarily from someone's account without any intervention.

And from the point of view of the customer, it would be the most convenient thing ever.

That seems so blatantly exploitative on so many levels, with so much potential for abuse, that I have to assume someone will implement it within a decade. Probably Amazon.


You're late. Amazon is testing this with grocery stores now.


They don't do targeted pricing as far as I know, but I thought they had been doing targeted marketing with your purchasing history for a couple of decades already. They already have your info from your credit cards.


I can only see number 3.) being actually beneficial to some people. Most of those could easily go horribly wrong. Like nobody lets the police backstage while there is a crime going on. Or a supermarket doing post-purchase price adjustments, and when you look at the receipt, they analyze your micro-expressions and optimize their scamming to maximize their earnings while not pissing you off too much.

All of the rest could have been done a decade ago if people would simply accept an RFID implant. And most people are very skeptical of that, because in almost all of your examples the tech could be used for evil too.

Would you accept a free RFID implant? If the only price was that the RFID fingerprint would be completely public.


> Like many new technologies, it can be used for both good and bad. Harnessing nuclear reactions can be used to generate massive amounts of electricity, or to level cities. What's important is that it's regulated, not that it's banned.

I worry that, as we already see with NSA, GCHQ and friends, governments won't be quick to regulate a huge source of information that they can tap, and a source of technology that they can use for themselves. Recent history would seem to indicate that governments see the potential for state use of these techniques, but overlook the much wider scope for criminal uses of them. (E.g., reliable facial recognition and pervasive tracking seem like they would make undercover police work much harder.)

Having a stockpile of nuclear weapons feels like an innately dangerous thing, whereas having a computer system tracking everyone for auguries of pre-crime is light-hearted enough that it's the central plot premise of a major TV show. Many politicians who are rightly wary of the destructive power of nuclear technology may not see the same dangers in computer vision or pervasive surveillance.


> 6.) FaceID for the iPhone X. I'm personally loving it on mine.

Sorry, kinda OT from the malevolent/benevolent route, but is FaceID fast? Or rather, is it just as fast as the normal way of unlocking with a finger?

Do you have to tilt your phone to your face or is there some kinda wide-angle lens? I frequently unlock my phone while it's lying flat on my desk.


Number 3 has a much more critical use: after any disaster, the medical ops supplies have to be guarded, because people in panic will simply raid the medical supplies. Easily deployed mobile FR is critical for any disaster operation, as it frees up people that would ordinarily waste time guarding areas from the panic.


Since the parent mentioned weapons, I would add:

7) FaceID for guns.

Imho, it should be mandatory, and it could be a solution in the gun control discussion.


Gun making companies would love that. As resale would be practically impossible. No cheap guns to satiate your love of variation. Markets for new guns would increase.

Police would probably hate it, because having your sidearm malfunction in a dark alley in a life-and-death situation is probably something you want to avoid.

Organized crime would probably love it. Manufacture of unmarked firearms is not that tricky. A fresh market for illegal goods is suddenly created, as now it's not only the criminals who want these, but also survivalists and all kinds of paranoid people. If you run the numbers of murders with illegal guns / amount of illegal guns vs. murders with legal guns / amount of legal guns, you will find that illegal firearms are roughly 10X or 20X more likely to be used in homicide.
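To spell out the arithmetic being described (a sketch of the comparison only; the symbols are assumptions introduced here, and no actual figures are given in the thread): let $M_i$ and $M_\ell$ be the number of murders committed with illegal and legal guns, and $G_i$ and $G_\ell$ the number of illegal and legal guns in circulation. The claim is about the ratio of per-gun homicide rates:

```latex
R \;=\; \frac{M_i / G_i}{M_\ell / G_\ell}
  \;=\; \frac{M_i \, G_\ell}{M_\ell \, G_i},
\qquad \text{claimed: } R \approx 10 \text{ to } 20 .
```

So even if illegal guns account for fewer murders in absolute terms, $R$ can still be large when $G_i$ is much smaller than $G_\ell$.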


Police could be exempted from using the technique (just like they have special gun-carrying status in e.g. EU countries).

Regarding the other points (companies and organized crime loving this), you can also look at other countries where guns are illegal. Practice shows there are fewer fatal incidents there.


I'm from a country with very strict gun laws. That's why I know the data on illegal guns; there is apparently no such thing as an "illegal gun" in the U.S.A.

The only thing that facial recognition would help with in my country would be those cases when a licensed gun collector gets robbed and his guns are taken by organized criminals. In that situation it would only slow down the illegal use of those guns. Therefore it would possibly ramp the price up to a point where manufacture of illegal guns becomes profitable.


There are whole classes of firearms that are illegal in the U.S. Fully automatic weapons require a class 2 license which is very hard to get. No rocket launchers, RPG, missiles, cannons, and even tanks! What is a guy supposed to do for fun on the weekend!


You do not need a Federal Firearms License nor registration as a Special Occupation Taxpayer to purchase fully automatic weapons that had been registered in the National Firearms Registration and Transfer Record prior to 1986. All of the other items you mentioned are legal to purchase or privately manufacture as long as you comply with the National Firearms Act by registering and paying a tax on them (usually $200). I don't think there is any weapon which is outright illegal for a civilian to own federally in the United States except a nuclear weapon as plutonium and enriched uranium are restricted by the Nuclear Regulatory Commission.


There may not be a formal list of civilian-prohibited weapons but I'd be astonished if you can find someone who has a Title II license to own a MANPADS or even an ATGM.


Yes, but consider also that FaceID can be used in conjunction with a license. So the government could issue a license for one FaceID per gun-owner.

Problems this would solve e.g.:

- Children using their parents' gun at schools

- People using multiple guns (e.g. Las Vegas shooting)


So you could own several guns, but only use one of them? I can't see anybody voting for that.

Children using their parents' guns is pretty neatly tackled here in my country by mandatory gun safes.


What exactly is wrong with owning "only" one gun?


Obviously this doesn't work for people who fire and care for guns as a hobby.


Solution: Double FaceID.

A person at the shooting ground unlocks your guns with their FaceID. Then you can shoot with them using your own FaceID for, say, 30 minutes. If time runs out, you ask the guy at the shooting range to unlock them again.

It's an application of the two-man rule, [1].

[1] https://en.wikipedia.org/wiki/Two-man_rule
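
The scheme described above can be sketched roughly as follows. This is only an illustration: the class name, the string IDs standing in for recognized faces, and the explicit `now` timestamps (in seconds) are all assumptions made up for the example, and the face recognition step itself is not modeled.

```python
class TwoPersonLock:
    """Hypothetical two-person unlock: a supervisor opens a time window,
    and only the registered owner may fire while that window is open."""

    def __init__(self, owner_id, supervisor_ids, window_seconds=30 * 60):
        self.owner_id = owner_id
        self.supervisor_ids = set(supervisor_ids)
        self.window_seconds = window_seconds
        self.unlocked_until = None  # no window has been opened yet

    def supervisor_unlock(self, face_id, now):
        """Open (or renew) the window if face_id belongs to a supervisor."""
        if face_id not in self.supervisor_ids:
            return False
        self.unlocked_until = now + self.window_seconds
        return True

    def may_fire(self, face_id, now):
        """True only for the owner, and only inside an open window."""
        if self.unlocked_until is None or now >= self.unlocked_until:
            return False
        return face_id == self.owner_id


lock = TwoPersonLock(owner_id="owner", supervisor_ids=["range_officer"])
print(lock.may_fire("owner", now=0))          # False: nobody has unlocked yet
lock.supervisor_unlock("range_officer", now=0)
print(lock.may_fire("owner", now=10 * 60))    # True: inside the 30-minute window
print(lock.may_fire("owner", now=31 * 60))    # False: window has expired
```

Note that neither person alone is enough: the supervisor cannot fire, and the owner cannot open the window.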


What if you want to go hunting? What if you just wanna shoot skeet on your property? I predict that you are going to make zero progress on any attempt to get FaceID for guns in America.


Your suggestions are great for restricting cases in which guns could actually be used and thereby increasing safety, but not terribly practical (in the sense that there's not a chance in hell of getting national legislation passed without a massive shift in the country's culture).


Same thing that's wrong with owning "only" one knife.


Have you ever tried to go duck hunting with a revolver?


Aren't sports supposed to be challenging?

Elephant hunters should be required to use low caliber revolvers, so the elephant has a better chance of killing them.


People are missing the fact that FR is nearly 20 years old. Smart triggers were one of the first abandoned applications. It is simply not practical. The very last thing a person who understands firearms needs is something between them and the trigger.


Any "smart gun" component is an additional point of failure in an application where failure to function can easily mean death or serious bodily harm for the user.

I wouldn't be surprised if most proponents of this tech have no real-world firearms training or experience.


Just wait until you learn about gait recognition using 3D convolutions and how this identifies you better than facial recognition; and there is no way to mask yourself (besides putting random rocks in your shoes to make you walk funny).

I was recently approached to be the head of a self-driving car project; when I learned it was for a military vehicle, I left the conversation. Maybe it would have been used for peaceful purposes occasionally (natural disasters), but I don't want to have my creation drive on its own somewhere in the Middle East and kill everyone around. It is an international project as well, with the approval of leading governmental institutions, to make it more fun and to give you some food for thought...
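
For anyone wondering what a "3D convolution" adds over the usual 2D kind: the kernel also slides along the time axis, so it can respond to motion across frames rather than just appearance within one frame. Below is a minimal pure-Python sketch of that idea. It is not a gait recognition system; the function name, the tiny 2x2 "frames" and the hand-picked temporal-difference kernel are all made up for illustration, and a real model would learn many such kernels.

```python
def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over nested lists
    indexed as [time][row][col]."""
    t, h, w = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(t - kt + 1):            # slide along time
        plane = []
        for j in range(h - kh + 1):        # slide along rows
            row = []
            for k in range(w - kw + 1):    # slide along columns
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += clip[i + a][j + b][k + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Three toy frames: a bright pixel moves right once, then stays put.
clip = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 1], [0, 0]],
]
# A 2x1x1 kernel that subtracts each frame from the next one.
temporal_diff = [[[-1.0]], [[1.0]]]

motion = conv3d_valid(clip, temporal_diff)
print(motion[0])  # non-zero where the pixel moved between frames 0 and 1
print(motion[1])  # all zeros: nothing changed between frames 1 and 2
```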


I'd be the first one to point out that many military-related technologies often have an overall peaceful impact. My favorite example would be the anti-tank mine. It has to be dug in on the spot, so it's used mostly defensively. It costs around $5 to make, and a simple infantry soldier can blow up a multi-million-dollar invading tank. Stuff like that allows poor countries to be independent. Or rich countries get to spend on social causes too.

But anything that removes personnel from the front line usually makes transgression easier. The national will to fight becomes less important and wars are dubbed just "operations". A perfect example would be the MQ-1 Predator.

I think you made a very honorable choice. Good luck in your career.


I'd like to learn more about the gait recognition research. Do you have any links or references?



It's tricky. Targeted assassination has replaced carpet bombing.


I know a company that uses facial recognition in nursing homes to keep track of residents.


I'm on the team that develops the Aureus3Dv facial recognition platform. We have clients using FR for all kinds of purposes, primarily knowledge of known people's presence in a given location, or knowledge of an unknown person in a given location. These situations are typically for automated doorways, identification (or non-identification) of people in retail / hospitality / secure scenarios. Typically it is about two things: safety of the people and assets at the location, and knowledge of the composition of traffic. For small organizations, like an art gallery, the adoption of FR is triggered by theft. For medium sized orgs, the adoption of FR is often safety of staff and security of assets. For large organizations, it is all of the previous plus hospitality / retail knowledge of who is in their spaces.


Interesting use case, can you share a link? Thanks.


Every technology since the dawn of man has been used as a force multiplier: fire, sharp sticks, domesticated horses, thick clothing, language, the wheel, boats, the printing press, carrier pigeons, computers...


What’s really fascinating about the current state of the art in computer vision/deep learning is just how many of the top research papers basically say “We tried this, it works, we’re not really sure why”


This has been my experience as well. I'm also afraid that these researchers will succumb to a replication crisis similar to the one that devastated the social sciences [0]. If you can't necessarily explain why something works, it's probably difficult to explain why trying it again in the future didn't work.

[0] https://en.m.wikipedia.org/wiki/Replication_crisis


It's more like the opposite problem.

Psychology researchers try to justify deep, compelling theories of human behavior with small experiments that are hard to reproduce.

Deep Learning experiments are comparatively easy to run and reproduce because the field has both a common set of performance benchmarks and a large community of people implementing cutting-edge ideas in common frameworks. The harder part is building useful theory on top of those experiments.


This is not really the case. I have personally quite literally implemented papers the week after they appeared on arXiv and they worked. Anecdotal, yes, and there have been papers that nobody was able to replicate, but by and large the community is quite good at recognising which work can be used, and this work is then implemented and disseminated.


The situation is slightly better than social sciences though. Social science experiments, if any, can hardly be repeated. AI experiments can be easily redone if the source code and datasets are released into the public.


I'm thinking that the deep learning field will become like web development: it's becoming so simple that everybody can do it.

This article gives some evidence: [1], quoting:

> researchers were a bit perplexed when actress and director Kristen Stewart [2] appeared as an author on a machine learning paper.

[1] https://techcrunch.com/2017/01/19/kristen-stewart-co-authore...

[2] https://en.wikipedia.org/wiki/Kristen_Stewart


I might be wrong, and I don't want to question the possibility that someone from a non-technical domain can contribute to technical domains (in particular because neural networks and style transfer are basically high school mathematics; and since this happened before [1]), but it seems to me that the paper in question does not contain much technical innovation, but it is rather a demonstration of how existing techniques can be used in practice [2].

[1] https://en.wikipedia.org/wiki/Hedy_Lamarr

[2] https://arxiv.org/abs/1701.04928


I've heard that claim a couple of times now, but never read or heard experts in the field say anything like that. Do you have a source?

Of course, you can do fancy things with neural networks without fully understanding how they work, or why they behave like they do. But my impression is that common applications, like image classification, are quite well-understood now.


I think you are right to make that distinction. Research papers are probably more about other, less known applications.


What? They do explain their results, and as you can see from the references section of any paper, other researchers read the published research and then build their new models using that information and continually refining the process.


Sure, they do their best to give an explanation after discovering it works, but nobody ever publishes the hundred other models they tried that don't work out.


This is a problem in most areas of scientific research. There is no incentive to publish negative results, so no one does.


Yes, they explain it, but in a very hand-wavey way.


This is the most thorough write-up around computer vision.

Can anyone else recommend other equally broad works worth looking at?


Two Minute Papers on YouTube does short overviews of cutting edge graphics and vision research.

https://youtube.com/user/keeroyz


+100! This is absolutely amazing! Would love to see a similar high-level overview of progress in other tasks where CNNs (or other neural nets) have had such a tremendous impact.


Is it me, or does the advance in AI seem much faster than any other tech that came before it? The rapid advancement in just one year is pretty mind-boggling. Very hard to catch up.


I believe it is because many of the ideas were already formulated, but the recent advancements in silicon tech, especially GPUs, were a huge contributor.

Pair that with the enormous amounts of data being collected from/posted by users, and you've got a great catalyst for major companies to be interested, which in turn leads to more research and refinement of old ideas.


As someone who has been doing machine vision related work for most of the last 15 years, it is exceedingly gratifying to see my field 'growing up' and achieving such success.

12 years ago I was working for a machine vision startup with a Prof., two post-docs, a sales guy, and me as the only paid employee.

Now, I'm more likely to be working in a team of dozens, if not hundreds of machine vision researchers and developers.

Quite a change.


I thought it was slower. A lot of the core stuff goes back to the 60's and it's all been slow, incremental improvements since then. The biggest game changer has been hardware finally becoming powerful enough.


Yup, most of the progress, such as YOLO, while impressive, ultimately amounts to incremental innovation; there really haven't been any paradigm-upturning changes.


Remember Smartphones only started really existing in 2007. Mobile phones are from 1990s. Personal computers from 1970s. Transistor from 1920s

Pace of innovation is generally increasing as more and more complex stuff becomes commoditized. But also we forget just how recent a lot of things we take for granted really are.

Give it 10 years and AI will feel just as common and boring as a laptop does today. Remember how excited you were in the 1990s when you upgraded to a new PC?


I'd use the launch dates of the products and switch theorizing of the transistor to a more relevant milestone, after which it's less dramatic (or even slowing):

Microprocessor 1971 - Personal computer 1976 - mobile phone 1983 - smartphone 1996.

Your most recent date, smartphones, warrants a more specific correction since it's such recent history: Nokia Communicator 1996, i-mode phones 1999, Ericsson R380 2000, Danger Hiptop 2002, Blackberry 2002. 2007 in smartphone history is marked by the introduction of the iPhone 1, 11 years after the first smartphones, but it didn't become popular quickly and didn't have apps etc. In 2013, smartphone sales surpassed feature phone sales.


> Transistor from 1920s

Since 1947, when it was invented.


The field effect transistor was patented in 1926 (https://en.wikipedia.org/wiki/Field-effect_transistor) - practical devices came later.


Touché.

I must have been thinking of those glass things that came before transistors and filled a similar purpose, then.


Radio tubes, specifically the Triode (basic amplification tube, a lot like a transistor only not solid state and much higher voltage). Invented in 1904.


You get that impression because the article focuses on the problems at the bleeding edge of progress that have seen a lot of improvement right at this time. There are lots of problems that had been solved before, and there are a lot of problems that AI still can't solve.


Keep in mind that deep learning is mostly pattern matching, not true AI.


At 1:50 in the YOLOv2 YouTube video I couldn't help but laugh when the algorithm detected the gun in the woman's hand as a cell phone. It made me think of the whole E.T. walkie-talkie/gun censorship scenario.


Yeah, I saw that. It seems that the algorithm was trained and optimised only for that video. Person, bike, suitcase, tie, chair, cellphone.


Car AI scenario: classifying human as pavement with 100% certainty.


That's better than having the algorithm detect your camera as a weapon.


The standard of super-resolution algos is impressive. I wonder if anyone has tried up-ressing the video and audio of pirate videos by training models on Cam vs Blu-ray.


Could be a fun game to guess the movie based on only YOLOv2's output (https://www.youtube.com/watch?v=VOC3huqHrss).


Great summary of recent Computer Vision. Over 203 references!

One exciting topic I did not see mentioned (maybe I missed it) is image denoising. Excellent paper here - https://arxiv.org/abs/1608.03981. The CNN proposed in the paper is really compute-intensive, but the denoised images are really amazing.


This is pretty nice work, but isn't it a summary of 2016 work?

ImageNet 2017 has happened since then [1] (small incremental progress), but there has been lots of progress in things like super resolution.

[1] http://image-net.org/challenges/LSVRC/2017/results


Excellent!

I'd suggest anyone interested in jumping into building computer vision models should check out the excellent free MOOC on fast.ai


Nothing about capsules?


Capsule networks[1] have literally not been used on anything larger than what is considered a toy dataset for computer vision yet. Whilst there is potential there I'd argue it's too early to ascribe extreme levels of excitement towards it.

[1]: https://arxiv.org/abs/1710.09829


Very good explanation of capsule networks: https://www.youtube.com/watch?v=pPN8d0E3900


But Geoff Hinton...


It's new and interesting, but hasn't outperformed any existing methods yet. High representation efficiency though.


> High representation efficiency

Capsules do require far fewer parameters, they generalize 10-20% better to new viewpoints, they are much more robust to adversarial examples and can better recognize overlapping objects; but, on the other hand, capsules currently require much more training data to achieve the same performance, even though in theory (if they would actually learn inverse graphics) they should require less data, and they add a lot of expensive additional structure (roughly 10X). I am rather pessimistic about whether the approach will lead us anywhere; it seems sub-optimal to model all possible child-parent configurations explicitly. That has a quadratic nature to it and my hunch is that it can be done sub-linearly.
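
To make the "quadratic nature" concrete, here is a rough sketch, assuming the dynamic-routing formulation from the paper linked upthread (arXiv 1710.09829), where every child capsule gets its own transformation matrix for every parent capsule. The function names are made up for this example, and the layer sizes are illustrative, chosen to resemble a small MNIST-style setup rather than taken from any benchmark run.

```python
import math


def squash(s):
    """Capsule 'squash' non-linearity: keeps the direction of s and maps
    its length into [0, 1) via |s|^2 / (1 + |s|^2)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def routing_weight_count(n_child, n_parent, child_dim, parent_dim):
    """Weights needed when each (child, parent) pair has its own
    child_dim x parent_dim transformation matrix."""
    return n_child * n_parent * child_dim * parent_dim


# Illustrative sizes: 1152 child capsules (8-D) feeding 10 parents (16-D).
print(routing_weight_count(1152, 10, 8, 16))   # 1474560
# Doubling both layers quadruples the number of pairwise matrices.
print(routing_weight_count(2304, 20, 8, 16))   # 5898240

v = squash([3.0, 4.0])
print(math.hypot(*v))  # 25/26, i.e. just under 1
```

The pairwise term `n_child * n_parent` is the part that grows quadratically as both layers get bigger.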



