FingerIO: Using Active Sonar for Fine-Grained Finger Tracking (washington.edu)
637 points by jonbaer on Mar 18, 2016 | 118 comments



Regarding everyone's latency concerns, as someone who has done low-latency audio processing on Android -- in their defense, I'd bet almost anything the demo is meant only to demonstrate the math behind this. Depending on the platform (Android, cough), low-latency audio processing can be almost a dark art in itself. And hey look, they're doing this on Android.

My guess is that they decided to release the demo earlier instead of spending days/weeks getting up to speed with low-latency audio processing in the Android JNI.

It's an academic demo/press release. Not a software release for production/market.


I'm curious, besides doing things in C/C++, are there any "magic tricks" to doing low-latency audio processing on Android? Looking at the chart here [1] it still seems to be a good bit behind iOS.

[1] http://www.androidpolice.com/2015/11/13/android-audio-latenc...


In the Java layer? About the only thing you can do is ensure that you're using the device's native sampling rate: typically 48 kHz for phones, and 44.1 kHz for tablets. Non-native sampling will induce a rather large latency hit. The buffering + buffer size stuff, unfortunately, is really only accessible in native code, IIRC.

To be completely honest, it's been a long time since I've messed with audio-stuff on the Java-side, so not sure if/how much things have changed for the better.
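
To put rough numbers on the buffer-size point above: the time a buffer holds audio is its length in frames divided by the sample rate. A minimal sketch in Python (not Android code; the 240- and 1024-frame buffer sizes are assumed purely for illustration, and real devices differ):

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Time, in milliseconds, that one buffer of `frames` samples represents."""
    return 1000.0 * frames / sample_rate_hz


# Hypothetical buffer sizes, chosen only to show the scale involved.
for frames in (240, 1024):
    for rate in (44_100, 48_000):
        latency = buffer_latency_ms(frames, rate)
        print(f"{frames} frames @ {rate} Hz = {latency:.2f} ms")
```

Each buffering stage in a pipeline adds roughly this much delay, which is why small buffers at the native rate matter.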


Thanks. I actually meant in addition to doing the computation side of things in C/C++, are there any undocumented tricks or pitfalls to avoid?


This is a good explainer of the various latencies in the Android audio pipeline: http://superpowered.com/androidaudiopathlatency


And a cool follow-up article: http://superpowered.com/android-marshmallow-latency

Last time I worked on Android Audio was around the 4.1-4.2 era, and it was absolutely brutal compared to iOS. Glad to see that it's improving finally!


A lot of production input devices for stuff like waving hand gestures and pen input also have unacceptably high latency. Getting latency down is hard, and makes a huge difference to usability.


I work with sonar and the physical positioning of the sensors is important in trying to get useful results. Why is it these academic types don't release the apks or software? Just publications and maybe a video.


You should try asking these academic types for the data and source code.

A lot of researchers are more than happy to discuss their work, but a big part of the academic industry is your research's impact and references. A way to get a better handle on who is looking at or following on with your work is to implicitly ask them to have a conversation with you before getting the whole kit 'n caboodle.


These sorts of comments are why I come here. Thank you. Without prior awareness that something is common practice it can be extremely difficult to recognize it as an option. I simply lack the motivation for this behavior. If I were to release a teaser about my work it would be because I'm not ready to release my work.


I read an abstract for a paper once (on deblurring images) and could not find the original paper for free. I emailed the author about the situation, and got a full color hard copy in the mail shortly after. My best hope had been a .pdf by email, but he exceeded that by far. It's still on my shelf a decade later, while a .pdf probably would have gotten lost in the GBs.


Additionally, a number of these research programs take the source code and make a go of a small business product.


And in many cases the universities own the IP associated with the research that goes on in their departments so they keep the source and treat it like a company would treat a trade secret. It's likely Washington will patent this and try to license the patents.


What's the legal situation with these academic research papers? Am I allowed to implement an algorithm from a research paper and then either sell the software or release it as open source?

I assume other academics are allowed to reimplement methods in order to reproduce the result and to compare to their own methods. Can I do the same as a learning exercise?


I think we left the topic of "science" about 4 posts up. I don't know what's being described here, but it's not science. Yet somehow, I get the feeling that I'm paying for it.


> implicitly ask them

I think you mean "explicitly"


No he means implicitly. By not providing everything the academic implicitly requests a conversation before providing everything. The request is never stated but is implied by the assumption that there is no other way to acquire everything.


Ah I misunderstood who was implicitly requesting the conversation, that makes sense.


I actually assumed that before. The way you had quoted it was revealing. My comment reeks of nerd-rage now that I re-read it. Damn italics. I should have mentioned that it would be fair to say: The academic wants an explicit request for everything. I can see how it would have been easy to misinterpret his statement in this way. It's true. It's just not what was stated.


[deleted]


If you're reading the paper you're aware of their existence. If you want to know more, you have to ask them, but they never directly (explicit) state this. It's implied and something of a cultural (academic/research culture) unwritten rule.


Did you ask? Many years ago I decided to reproduce an algorithm used to detect copy/paste image modification.

http://blog.jgc.org/2008/02/tonight-im-going-to-write-myself...

The researcher was happy to provide me with their test images to verify that my implementation worked. Try asking.


Have you considered providing this tool as a service? It would be great to easily detect forgeries without CLI experience.


No. I wrote that 8 years ago, I've open sourced the code, anyone who wants to take it and run with it can. It's too much work for me to maintain a system that would do this for people.


It depends on the conference or journal they submit to. I typically request that authors release data and code in the review and the same is requested of me when I submit a paper for review. I don't know, maybe CHI doesn't have that sort of culture. Or maybe they do and these students just don't have the time right now and plan to do it right before the conference in May.


Indeed, there is no culture of replicability at CHI (and even less so at UIST). Reviewers usually reward novelty and cool PoC videos, not thoroughness. It is quite rare (especially for U.S. labs) to also publish source code, schematics, or raw data. There have been some initiatives advocating for replicability, and some researchers indeed publish everything, but on the whole, a quick, shiny video of a PoC implementation is often sufficient for a paper to be accepted.


Having been in this kind of situation before: because the software isn't ready for any kind of release. They probably spent a week gathering and tuning the parameters on the DSP for each individual phone they ran it on, and without their knowledge of the system it'd take you a month and a half to get it working on your phone.


If they are publishing a paper, their methods (including custom source code) are ready for release for peer review. If their software sucks, they should not rely on it for scientific results.


Here's a case that I have published under before: Their software is ready for their own use for research by them personally. However, it has no help, documentation, automation, magic, or even small amounts of assistance. It doesn't work until you have it spit out a page of numbers, which you feed into Matlab and hack on for a week before compiling a new copy of the app that has the resulting calibration matrix baked into the code. The instruction manual for the apk you're asking for would be equivalent to a bachelor's in computer science and half a PhD on DSP and machine learning. That's why they're not releasing it. It proves that their math works and the approach is valid. Replacing their expertise and making it fast enough and reliable enough for general release would require a startup, six developers three of whom need PhDs, a UX team, and a year and a half of work. The approach works and they can prove it. That's it.


Product development isn't actually the purpose of science.


I wonder how accurate it really is. The demo video didn't match up with the movements at all and the on-screen drawings looked like prerecorded video that they were trying to sync to.

It's a neat idea, but without a dedicated component or an extremely high-speed RTOS, you're not going to come close to the level of accuracy that's really needed to do the math and still allow interaction.

I don't mean to rain on the parade, but I just don't think they really have anything usable.


To me it looked like a ton (two seconds or so) of latency, not pre-recorded video.

And I would expect latency for such heavy DSP work on a phone.


This is correct. The movements match up just fine, but the reaction time there is something above 2 seconds.


They need to buffer to get a Fourier transform and to do the autocorrelation.


They say they use an "inaudible high frequency soundwave", so that should be > 20kHz. Shouldn't a buffer of a few milliseconds be more than enough then?


Presumably the buffer is longer to make the system more robust by avoiding spurious detections, not because of some fundamental limit like the Nyquist rate. You would need to set the buffer size experimentally.
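
For a sense of scale on the buffer question, here is a back-of-the-envelope sketch. The 20 kHz tone comes from the comment above; the 48 kHz sample rate and the 5 ms buffer are assumptions for illustration, not figures from the paper.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at this sample rate."""
    return sample_rate_hz / 2


def samples_in_buffer(buffer_ms, sample_rate_hz):
    """Number of samples captured in a buffer of the given duration."""
    return round(sample_rate_hz * buffer_ms / 1000)


def cycles_in_buffer(buffer_ms, tone_hz):
    """Number of full tone periods that fit in the buffer."""
    return tone_hz * buffer_ms / 1000


SAMPLE_RATE_HZ = 48_000  # assumed phone sample rate
TONE_HZ = 20_000         # lower edge of "inaudible", per the comment above
BUFFER_MS = 5            # assumed "few milliseconds"

print("Nyquist limit:", nyquist_hz(SAMPLE_RATE_HZ), "Hz")
print("Samples per buffer:", samples_in_buffer(BUFFER_MS, SAMPLE_RATE_HZ))
print("Tone cycles per buffer:", cycles_in_buffer(BUFFER_MS, TONE_HZ))
```

So a few milliseconds already holds many cycles of the tone, which is consistent with the longer buffer being about robustness and not about a sampling limit.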


> I just don't think they really have anything usable.

I'm not sure that's the purpose. I wouldn't think of this as being intended to be a fully usable product right now. They could be intending it merely as an interesting experiment to explore new possibilities for interaction with mobile and wearable tech.

Another CS student came up with a virtual keyboard using the iPhone's accelerometer[1]. It only had ~80% accuracy[2], so was it all that useful or practical? Probably not. But could it lead to another person or company refining the technique for production in the future? Certainly.

[1] Video: https://vimeo.com/49780741

[2] http://www.gottabemobile.com/2012/11/13/cs-student-turns-iph...


I bet you have RSI within a week banging on a desk like that.

If it comes down to using something other than your fingers, someone has the nose working as a user input device:

http://www.looknohands.me


I would totally disagree. I don't think it would have to have much fidelity or low latency to be insanely useful. The huge advantage is no extra hardware necessary. And maybe it might be hard for you to believe it could do what you want it to do but I'm guessing your vision is pretty narrow.


This reads like a personal attack.


If the tech is great then wonderful, but I still remember LeapMotion...


Have you tried the Orion SDK? It's an order of magnitude improvement in tracking accuracy, even with the older hardware. https://developer.leapmotion.com/orion


I tried the "massive improvement" they released before Orion and it was still terrible.

Only so many times a company can say they've got their issues ironed out before I stop believing them.


Orion really is quite impressive. I tried the improvement as well. Orion actually works.


Hmm, I'll have to try it out tonight. I have some 4-year-old v0.6 and v0.8 hardware in a drawer somewhere.


LeapMotion is not precise?


We need to immediately improve on PIN-code protection upon cash-withdrawal in ATMs. The problem has been there for a while, but man, it gets easier and easier.


Or be overly paranoid (or, it would seem, not overly) like me. My PIN patterns and entry don't involve moving my fingers horizontally: I put my hand down on the pad with my fingers on set keys, cover it with my wallet, and type it in. I also always double one key to make it that much harder to get by observation, a trick I learned from a sysadmin with major access to my school's systems back when we all had to use public terminals a lot.


By keeping your fingers longer on the keys you're actually making it easier for the person after you to just take an IR shot of the keyboard and reduce their search space to 16 combinations or less.
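
As a rough check on that search space, the candidates can be enumerated directly. This sketch assumes a 4-digit PIN and an attacker who learns only which keys were touched, not their order; both are assumptions for illustration, and the key sets below are made up.

```python
from itertools import product


def candidate_pins(touched_keys, pin_length=4):
    """All PINs of `pin_length` digits that use every touched key at least once."""
    keys = set(touched_keys)
    return [
        "".join(digits)
        for digits in product(sorted(keys), repeat=pin_length)
        if set(digits) == keys
    ]


# Hypothetical sets of keys left warm on the pad.
for touched in ("1379", "137", "13"):
    print(touched, "->", len(candidate_pins(touched)), "candidate PINs")
```

Under these assumptions four distinct keys leave 24 orderings, while a doubled key (three distinct keys) leaves 36, so the exact figure depends on the PIN. Differences in residual heat between keys, which this sketch ignores, might narrow the order further.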


This is designed to foil garden variety skimmers now and in the foreseeable future, not someone "after" me. Who's going to go to that much trouble when there are, I presume, many, many more people entering their PINs in ways that are easily skimmed?


What does it mean to “double one key”?


Have the PIN or password repeat one key in the sequence. Like "mwfabrrpg", if you type quickly an observer won't notice that the 'r' was typed twice in quick sequence.


That's a great observation -- while it's not like ATMs were super secure to start with, now anyone who can mimic this sonar tech can put any device that just looks like it's supposed to be at the ATM near the pad, and pick up people's PINs.


You don't even need the sonar, because the buttons are clicky and you should be able to triangulate the origin of the sound quite easily. Also, I guess this would work with just one microphone and simple pattern matching (I assume each click, plus its echo pattern from the structure, makes every button sound quite different). The microphone should listen for vibrations in the structure (not airborne sound). The device could be quite far away from the keypad if it's attached to the same structure and can hear the clicks.


Sorry, I'm not sure how this is relevant or if this sentence even makes sense. Can you explain?


I suppose he means that a small microphone/speaker setup near an ATM machine's keypad could allow you to track the position of the finger and get the corresponding keypresses without any intrusive modification to the ATM itself.

Of course that would only be valid if you proceed to then steal the person's card right afterward, so all in all, not that useful.


Isn't it easier to just put up a small camera? Then you can also record the card's number if you fail to steal it afterwards.


It would be easy to build a device that did something similar to this and captured pin codes on ATM machines.


Currently, ATMs everywhere ask you to visually check that the keypad and card slot look exactly like the picture. With this technology, you don't need to make a fake keypad to "hear" the PIN sequence; you could just "listen" to it from somewhere else, some place out of sight.


I guess he means that you can place a smartphone near an ATM and catch the PIN-code with this technology somehow.


Really interesting.

Reminds me of SOLI (which is radar rather than sonar): https://www.youtube.com/watch?v=0QNiZfSsPc0

Is there a way of trying this out? I know it'd only be demo line drawing applications but it'd still be interesting to try.


I love it when people find ways of using existing hardware with software innovation to make new interactions such as this!


I'm looking forward to the first theremin app.


I assume it would be impossible because the sound of the music would interfere with the sonar?


Wouldn't the music only be on the audible spectrum? Sonar works (can work?) on the inaudible spectrum.


Headphones


I wonder how much power all the processing draws. Judging from the slow movements and the delayed update on the screens in this video, it's pretty heavy on the processor.


The question is, is the multisecond latency because of processing and code efficiency limitations for an academic research project, or is it because the data is unusable without two seconds of smoothing? Given what I see in the video I could argue either way. I do note the latent signal is still a bit noisy, but then, touch screen input isn't necessarily clean either.

But it's also worth keeping in mind this is all off-the-shelf hardware. It seems very likely to me that if a cell phone or smart watch was designed to do this from the get-go that several easy hardware improvements and maybe a bit of custom DSP work would make this work much better. (By "easy hardware improvements", I mean things like speakers intended to emit frequencies for sonar, microphone arrays intended to receive them, etc.) From that perspective, even if the system we saw is fundamentally limited I'd still call it incredibly promising considering the constraints it is operating under!

If I were a smart watch manufacturer I'd be falling over myself to get one of my best engineers and one of my best recruiters an appointment with these people.


The accompanying paper[1] claims the phone lasts four hours running the current version of fingerIO, but also that improvements could be made to preserve power (such as reducing the sampling rate).

[1] http://fingerio.cs.washington.edu/fingerio.pdf


They are calculating autocorrelation and a Fourier transform so they need to buffer the data. Two seconds is probably the shortest buffer that works reliably.
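
The reason correlation needs a buffer can be sketched as follows: you slide a copy of the emitted pulse along the recorded samples and pick the lag where they line up best. This is a generic echo-delay estimate, not the algorithm from the paper; the 48 kHz sample rate, the 18-22 kHz chirp, and the 28-sample delay are all assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 48_000     # assumed microphone sample rate
SPEED_OF_SOUND_M_S = 343.0  # approximate, air at room temperature


def make_chirp(n_samples=64, f0_hz=18_000.0, f1_hz=22_000.0):
    """Short linear chirp; the band is an assumption, kept below Nyquist."""
    duration_s = n_samples / SAMPLE_RATE_HZ
    sweep_rate = (f1_hz - f0_hz) / duration_s
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        phase = 2 * math.pi * (f0_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


def find_delay(recording, pulse):
    """Lag, in samples, at which the pulse correlates best with the recording."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(recording) - len(pulse) + 1):
        score = sum(recording[lag + i] * pulse[i] for i in range(len(pulse)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_distance_m(delay_samples):
    """Round-trip delay converted to one-way distance to the reflector."""
    return SPEED_OF_SOUND_M_S * delay_samples / SAMPLE_RATE_HZ / 2


# Simulate a noise-free recording containing one attenuated echo.
pulse = make_chirp()
true_delay = 28  # hypothetical echo delay in samples
recording = [0.0] * 256
for i, sample in enumerate(pulse):
    recording[true_delay + i] += 0.5 * sample

estimated_delay = find_delay(recording, pulse)
print("Estimated delay:", estimated_delay, "samples")
print("Estimated distance: %.3f m" % delay_to_distance_m(estimated_delay))
```

The whole pulse plus the longest expected echo delay has to be in memory before the search can run, so some buffering is unavoidable; how much extra is needed for reliability with real noise is a separate, empirical question.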


Uhh.. not the best name choice in my opinion.


Yep. When I read the title I thought it was a parody on IO-names.


Cats will love it. That's why the first ultrasound remote never became standard.


Aww, this is just going to wreak havoc for whales and dolphins once divers start using these! ;-)

https://en.wikipedia.org/wiki/Marine_mammals_and_sonar


What do you mean?


I suspect he means that this would be audible to cats and have unintended side effects.


Similar to SixthSense [1] work from Media Lab.

[1] https://www.media.mit.edu/research/highlights/sixthsense-wea...


This is much like Project Soli. I started dreading devices like that. Sonars everywhere.


First thing I thought of is the horrible privacy implications when this tech becomes ubiquitous and cloud-based.


I can understand the "I", but what's the "O" part of this?


The images drawn on screen?


hm, that logic would mean a computer mouse is an input-output device, not only an input device.


This is not the same as FingerIO, since it does not use sophisticated signal processing, but it's still interesting. Make sure that you remove your earphones before using it.

https://danielrapp.github.io/doppler/


This is cool. Thinking about smartphones as sensors opens up so many possibilities, even if their capabilities aren't nearly as accurate as dedicated devices. Wondering if the sonar information can be combined with images from the camera to create a close-range depth camera?


Latency looks like a real issue in this demo. If it can be improved this could be important, but think about how impatient you are if your smartphone doesn't respond to your touch immediately. Users have been trained to be irritated by laggy interfaces.


There has been a lot of recent work with gesture based computing: Intel Real Sense, Google's Soli, Myo, Leap Motion

https://github.com/melling/ErgonomicNotes/blob/master/README...

Leap Motion made huge improvements a few weeks ago with their Orion SDK:

http://venturebeat.com/2016/03/04/leap-motions-hyper-accurat...

We must be close to actually getting something basic for our desktops.


This looks interesting. Can you try it somewhere?


Doesn't seem like it. Until someone other than the researchers has tried it, I won't really believe it works outside of controlled environments.


How about security? How does it protect against other users controlling your device?


How does your phone protect other people from pressing your volume buttons, or using your touchscreen? Security is pretty irrelevant for a HID.


it's obviously pretty different when you don't have to be touching the phone to interact with it


Not when it's a few inches away. It's less distance than voice interaction.


A lot of people are a few inches from my pocket every day. And I can easily imagine software listening for such sonars and sending back fake responses. I think debarshri has a valid point.


> A lot of people are a few inches from my pocket every day.

Really? And your phone is on? With the screen visible for them to see what they are interacting with. And as they are fumbling around in your personal space inches from your hand or pocket, no one notices.

Get real! If true you should be more worried about pickpockets than some random gestures.


According to their paper (which is written well IMO), their prototype uses a double swipe gesture to trigger or stop the detection. I suppose the idea could be extended to use something like a lock/unlock pattern, similar to the swipe patterns on Android lock screens.


I wonder about the other kind of security. I doubt there's any serious protection against a rogue background-running application tracking your finger and gathering sensitive information, like keystrokes.

You know, check what the foreground process is, notice it's a password manager or bank application (or even a lock screen would do; who knows, maybe the user has reused the same PIN elsewhere), blast the speakers with ultrasound and wait for the typing-like movement patterns.


I have a feeling this is also really dependent on the hardware. The demos were probably designed around the specific brand of watch and cellphone since they'd need to know exact distances between the microphones/speakers.

It's a really cool concept. I wish they'd open source what they have, or at least have plans to open source it. However if this came about via University funding, they'll probably claim IP on it. If it was a student's own fellowship, he/she/they might decide to create a start-up out of it.


I wonder how well it would work in a noisy environment?


How many microphones do cell phones typically have? I guess I assumed one, though background noise cancelling would certainly be improved by having more. For this kind of positioning it seems the minimum needed would be 3 - and the Android SDK can access those audio streams separately unprocessed? Pretty neat.


Isn't it too early for April Fools?


Exactly my thoughts. I was watching the video thinking this is a bit far fetched. Just like when I learned about Google's moon base.


I was thinking of something along these lines for a proximity sensor / motion detector application where you don't need very much accuracy.


I wonder if you can just run it on any smartphone or you need to configure the positions of microphones beforehand


What happens when there are multiple devices around each other emitting the signals? Great proof of concept.


I don't get it, how do they track specifically the tip of a finger? Or are they not?


They probably track the closest object to the phone. Just a guess. Then ignore subsequent echoes for a split second.


Ultrasound tracking is always problematic, because of all the noise.

I have the feeling that every few years someone has the idea again to use ultrasound for something. It starts out promising, but then the accuracy and lag problems don't go away, and dogs and cats go wild.


cool idea but the tracking will never be good enough to be practical.

if people really want this type of interaction then phones will start to incorporate specialized hardware for it.


For a second I thought it was April 1st.


It might as well be alchemy


sonar keylogger enabled


very interesting!


It's impressive!


Very cool, good job :)


Fake video.....


My intuition tells me this just doesn't hold water with respect to information theory.. i.e. the number of bits of useful information about a finger you can pull from a microphone. Putting aside human digits, has anyone even demonstrated that you can reliably detect an eighteen-wheeler rig moving toward a phone with this technique? And what about the range of the speaker? Complete nonsense.


Your intuition is probably wrong: off-the-shelf consumer-grade microphones can typically gather 44.1-96k samples per second at 16-24 bits per sample. That's a lot of potential information.

Also, consider that your ear/brain apparatus estimates object positions and occlusions from audio signals all the time.

Also also, the eighteen-wheeler problem is vastly different from the finger problem, as the latter is smaller, slower, and closer to the microphone, each by 1-2 orders of magnitude.
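
To make the "lot of potential information" point concrete, the raw data rate is just samples per second times bits per sample. A quick sketch using only the figures quoted above; note this is an upper bound on raw data, not a measure of how much of it is useful for locating a finger:

```python
def raw_bit_rate(samples_per_second, bits_per_sample):
    """Raw, uncompressed data rate of one audio channel in bits per second."""
    return samples_per_second * bits_per_sample


low_end = raw_bit_rate(44_100, 16)   # lower end of the quoted range
high_end = raw_bit_rate(96_000, 24)  # upper end of the quoted range

print(f"Low end:  {low_end:,} bits/s")
print(f"High end: {high_end:,} bits/s")
```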


Are you suggesting the video is fake? I think your intuition is wrong.

To address your comment more directly, I don't see any information theory type limit immediately applicable here for finding xyz coordinates using echolocation. That's done in a variety of contexts.


From an information theory perspective, think about the number of samples. Even 22 kHz yields a tremendous number of individual points.



