Kinect finally fulfills its Minority Report destiny (engadget.com)
120 points by shawndumas on Dec 9, 2010 | 44 comments

gorilla arm: n.

The side-effect that destroyed touch-screens as a mainstream input technology despite a promising start in the early 1980s. It seems the designers of all those spiffy touch-menu systems failed to notice that humans aren't designed to hold their arms in front of their faces making small motions. After more than a very few selections, the arm begins to feel sore, cramped, and oversized — the operator looks like a gorilla while using the touch screen and feels like one afterwards. This is now considered a classic cautionary tale to human-factors designers; “Remember the gorilla arm!” is shorthand for “How is this going to fly in real use?”.[1]


[1]: http://ftp.sunet.se/jargon/html/G/gorilla-arm.html

> humans aren't designed to hold their arms in front of their faces making small motions

Well, humans aren't designed, but we did evolve holding our arms in front of our faces making small motions.


I know it's hard to remember around here, but it's not just a UI buzzword.

(Which is why I see promise in this tech; using touchscreens is unnatural, but using gestures in front of your body to communicate with something a couple of feet away is quite human.)

With our upper-arms nearly straight down most of the time, yes. Our biceps are quite durable, and have to lift less than 1/2 our arm weight; our shoulder muscles have to lift the whole frickin' thing, and from a worse leverage standpoint if your arms are "out".

Maybe that's the solution to gestural interfaces - quit with the in-front-of-my-face stuff, and go towards more sustainable movements, like we already have with keyboards and mice; everything at elbow height.

the phrase, "for 8hrs a day" is implied...

Of course, that's a valid point. But again, lots of (perhaps most?) people engage in conversation for several hours per day, and a lot of that time is spent gesturing, mostly unconsciously.

I'm not envisioning it as a primary input device, naturally (although the deaf seem to get by fine). But I don't think I've ever been in a situation where a heated conversation made my arms tired; as a replacement for a few basic mouse tasks, I bet it could even improve ergonomics.

Just try it: raising your arms as if gesturing conversationally is way, way less stressful than reaching out to touch your monitor.

Right now I'm sitting back in my chair, with my elbows on my chair's armrests. With comfortable armrests I reckon I could gesture all day. If I had a hundred-odd inches of monitor surrounding me like in Minority Report it might be a reasonable way to control things -- easier than moving a mouse pointer all the way from one side of a vast piece of screen real estate to the other.

I'd love such an interface precisely for the ergonomics. The mouse is very bad for my wrists. I imagine waving my hand through the air wouldn't be.

Also, the gorilla arm would go away fairly quickly. This used to make me tired:


Now it doesn't. I expect people would get used to Minority Report interfaces, particularly if they didn't immediately try to transition from 8 hours of keyboard to 8 hours of Minority Report.

My real question: is source code for this demo available?

I don't think it's absurd to use this for hours a day, given that for many deaf people signing is their only way to communicate face to face, and people will find ways to talk all day.

I work in construction, I spend my days holding 5lb drills in front of me. Using the standard gesture position (forearm down) your arm won't get tired during a day. I basically live my day in the 'gesture position' often for well over 8 hours a day, always with weights in them.

When the upper arm is lifted is when your arms get not so much tired but literally drained, which is what I think the Kinect will likely have to adapt to.

So you're taking it as assumed that there is one, singular interface that people will be using for the whole day. Even now, while designing and thinking things through I usually alternate between typing in a text editor, drawing on a whiteboard, taking notes on paper, discussing verbally with other people in the office, and just pacing up and down thinking to myself. The whiteboard for example is essentially a specimen of the much-maligned vertical touchscreen interface, and I don't see why it couldn't be made into a computationally-driven dynamic display.

So Dale Herigstad, the guy who did a lot of the design work on Minority Report, actually still does a lot of work in this space.

I saw him at a small conference recently talking about the future of these types of interfaces, and I think where he's going is trying to make much smaller movements meaningful. One example was having an EPG on your TV that you could navigate by just flicking your hand. He's looking at ways that smaller, more natural movements can be used for controlling a lot of the interfaces around you.

(behind a paywall, unfortunately) http://www.wired.co.uk/magazine/archive/2010/03/start/dale-h...

I use a standing desk. I just realized that I can prop my elbows up and do all the finger-waving I want.

Boy, if my computer-use consisted exclusively of browsing snapshots and rotating them, this would be AWESOME.

Don't be afraid, it's not going to replace your mouse and keyboard. It's going to augment them.

That means you can just work normally on your computer, until that moment comes where you want to sit back comfortably and... browse snapshots and rotate them.

I predict there will also be much more interesting input assistance beyond photo browsing. I'm using a Magic Trackpad in addition to mouse/keyboard for a bunch of gestures (window management). However, the number of distinct gestures that can be performed on a flat surface is fairly limited.

I'll gladly take the additional gestures that a kinect-style device will give me. It could very well revolutionize the way we interact with window managers without forcing us to grow gorilla arms.

It's cool, but I'm pretty sure I can flick through hundreds of photos much more efficiently with my mouse and keyboard (and without the tired shoulders).

I think gesture based interfaces may have some quick utility, like remotely operating equipment for small sequences of activities (gestures operating an Asimo comes to mind since it doesn't yet have a real input system like a mouse & keyboard -- come here, go there, stop, move aside, etc.) but I've yet to see a demo of this interface method, in this context, that isn't wildly less efficient than existing input methods.

More efficiently than the example or in "Minority Report"? Sure, even if a less-sophisticated computer user (such as a cop) might not.

You cannot, however, use a mouse + keyboard faster than performing very small gestures mainly with fingers (as in, raise your hands from the keyboard and perform similar gestures).

I can hit the right arrow key on my keyboard hundreds of times a minute. Combined with a mousewheel for some spot checking and zooming, it only slows that down a hair.

I'm an amateur photographer and after a week of taking photos I have thousands of photos I need to vet. This kind of gesture-based interface would take me until the universe grows cold to go through a single week's worth of shots.

> Sure, even if a less-sophisticated computer user (such as a cop) might not.

I'm not sure how learning a set of rather specific gestures is less complicated to a novice user than hitting the right arrow key or the spacebar a bunch of times (or clicking on a button labeled "next photo"). I've tried to use gesture based systems a number of times in the past and have always ended up just turning that feature off, mostly because I couldn't remember which one of a dozen gestures meant the particular thing I wanted to do.

The idea is a nice one: use something that humans do all the time to build an intuitive interface. But the current state of the art doesn't understand normal human gestures very well. Why does two fingers vertically up mean "grab this" and not some other random gesture? In the end the problem is that we're trying to use arbitrary physical gestures, intended for a physical, 3-dimensional world, to interact with virtual objects in a 2-dimensional window, in ways that might not have any particular meaning to us as a form of normal gesturing (let alone culturally specific gestures; the number of different ways people do fairly universal gestures like pointing at something is mind-boggling).

I don't think that there isn't any value in this work; I just think that the applications these interfaces are currently being designed for simply don't make any sense.

Somebody else here mentioned that it might make a great deal of sense in 3D modeling, since mouse-keyboard combinations are rather clumsy in that application. I happen to think using gestures as a control mechanism for robotics or machine operation might make more sense (imagine controlling a crane remotely by making appropriate hand open/close, arm up/down gestures).

I agree with you that the composer-style arm flailing is not great (even if intuitive to non-techies), but I do consider smaller, closer-proximity gestures a perfectly viable direction of research.

> I'm not sure how learning a set of rather specific gestures is less complicated to a novice user than hitting the right arrow key or the spacebar a bunch of times (or clicking on a button labeled "next photo").

They are just not as good with traditional computer input methods, negating the advantage that you or I would gain there.

The point about 3D is very good, and something that can be workable even with 2D displays (in a "3D" environment, like your average FPS). Rotating, pulling things from "behind" something else etc. For certain categories of use, it is probably better than voice commands - another natural UI for humans - in the same way some of us still prefer the CLI over a GUI.

The current technology obviously is not quite there, but it is already possible to implement a virtual keyboard by observing finger movements. Combine that with gestures performed above the virtual surface, and you might have something.

Am I the only one who is thrilled more about the depth capturing and projection than the UI applications?

Since I don't own a holographic projector I just posted some images and videos of recent relevant Kinect hacks on my blog http://www.pmura.com/blog/2010/11/the-underestimated-power-o...

It was also way more efficient to drive a horse than any of the first cars.

The key here is not this precise application but the technology that it implies.

The other day I was walking on the stairs in CSAIL, and through the windows of the conference room I saw the guy filming this. I was late to my meeting (in the room above this one). "The future is here", I thought.

Even if there are some bugs and issues with the interface now, imagine that in just a few years this will be mainstream :)

All we need now is a more responsive Kinect.

As a proof of concept this kind of interface is fine but for real use it's slow and unresponsive.

Isn't the latency experienced with the games arising in software (the algorithms running on the xbox, not the Kinect itself)?

According to the web, "The depth sensor uses a monochrome image sensor. Looking at the signals from the sensor, resolution appears to be 1200x960 pixels at a framerate of 30Hz." It would perhaps be better to be able to run it at a lower resolution and a higher framerate; maybe it's possible. Processing 35M pixels per second does sound like something that could use up a CPU cycle or two. It also sounds like something that OpenCL would be good at handling, doesn't it?
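
For what it's worth, a quick back-of-the-envelope check of that figure, as a rough sketch only. The 1200x960 at 30Hz numbers are the ones quoted above; the 600x480 mode is purely hypothetical (I don't know whether the sensor can actually be run that way), and `pixel_rate` is just an illustrative helper name.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw pixels per second that would need processing for a given stream."""
    return width * height * fps


# Figures quoted in the comment above: 1200x960 at 30Hz.
quoted = pixel_rate(1200, 960, 30)

# Hypothetical mode (an assumption, not a documented sensor setting):
# halve each dimension, which leaves a quarter of the pixels per frame.
half_res_same_budget = pixel_rate(600, 480, 120)  # 4x the frame rate
half_res_60hz = pixel_rate(600, 480, 60)          # 2x the frame rate

print(f"quoted mode:       {quoted / 1e6:.2f} Mpx/s")
print(f"600x480 at 120Hz:  {half_res_same_budget / 1e6:.2f} Mpx/s")
print(f"600x480 at 60Hz:   {half_res_60hz / 1e6:.2f} Mpx/s")
```

So the "35M" is 34.56M rounded up, and halving the resolution in each dimension would in principle leave room for four times the frame rate at the same raw pixel throughput, assuming the bottleneck really is per-pixel processing.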


My guess would be that the high resolution is required to interpret the IR pattern at the maximum distance supported by the kinect (tens of feet). I'm wondering if the 30Hz is a limitation of the camera sensor or the pattern matching.

It's interesting how 3D, holographic interfaces have been in so many sci-fi movies etc., yet they all seem to get referred to as the "Minority Report" interface.

Relevant: http://tvtropes.org/pmwiki/pmwiki.php/Main/HolographicTermin...

It's good to look out for the little guy, but I think this is misplaced. The Minority Report interface was neither 3D nor holographic; just projected onto glass. What it did have was UI using individual fingertips as control points without touching the screen. So this system, for example, is pretty specifically a Minority Report interface.

I don't know for a fact that Minority Report was the first conceptualization of that specific interface, though.

What was interesting about Minority Report is that the technical consultant for the interface depicted in the movie is a guy who is actually working on a real solution: John Underkoffler. Contrast that with most movie computer interfaces that involve a lot of inane keyboard slamming or hand waving that has no real chance at ever becoming an actual product, and you can see where the difference in interest comes from. That's not to say that no other movie used a notable technical consultant, but Underkoffler's group has come closer to implementing the real thing than anyone else I know of.

Obviously it's still a bit choppy, but it's clear this is only a few software improvements away from being a very usable peripheral.

Just from waggling my hands around a little in front of my monitor, I feel like the touchscreen fatigue Apple made a big deal of isn't as much of an issue, since you're holding your arms upright, which has much more structural support than having your arm outstretched to touch the monitor. It actually feels like a very natural way to control virtual desktops or application switching. Additionally, this is a peripheral that can be added to any display of any size, or even used across multiple displays.

Is there a reason this tech hasn't gotten traction as a PC peripheral before Kinect?

Because using a mouse is faster and more convenient perhaps?

A mouse is faster and more convenient for some tasks, sure. You wouldn't use a mouse to replace keyboard input, though— or vice versa. Different interfaces for different tasks.

Is it really so hard to imagine a task for which gesturing is appropriate?

"Is it really so hard to imagine a task for which gesturing is appropriate?"

Yes. Can you think of some examples? I can't think of a single task I'd like to do on a desktop computer that wouldn't be easier and more-precise with a mouse. You know what's easier than pinch-zoom? A scroll-wheel. Multi-touch is great for phones/tablets, but this seems pretty silly to me as an interface for, y'know, computing.

"Yes. Can you think of some examples? I can't think of a single task I'd like to do on a desktop computer that wouldn't be easier and more-precise with a mouse."

Currently we have mostly just mouse+keyboard, so it's pretty much expected that the majority of present applications would be optimized to work best with these inputs.

Once we have more expressive inputs, new types of applications will start to pop up.

For example, I often work with 3D. For such applications, mouse + keyboard controls feel very clunky. Even simple tasks like setting up cameras or placing objects in the scene are a pain.

It would be much more natural to manipulate 3d objects using gestures in 3d space.

Use it for a screen on a wall in a public place, like a train station. Passengers can gesture to the viewport to see different timetables etc., but they can't actually get physically near the things and damage them. Basically: give the public access to an expensive electronic interface while keeping it well protected from physical contact.

So I'm sitting here typing on my laptop. To my right is a large monitor. Sometimes it's a little hard to type on the laptop while watching the monitor, so I have a second keyboard under it in case I need to do anything significant. I have a mouse between the two.

Let's say I'm coding on the laptop, and I want to scroll the docs that are up on the monitor. Three things need to happen here: 1) focus on the docs, 2) scroll, and 3) focus back on my work. My goal of course is to do this as swiftly and effortlessly as possible.

So, using the keyboard: I alt-tab from editor to browser, hit the space bar to scroll down, alt-tab back.

Alt-tabbing is more of a hassle the more programs I have open. Additionally, the scrolling is either a lot (page down) or a little (down arrow); usually what I want is something in between. It could be made easier if I had a specific keystroke set up to switch to Chrome, or to scroll by x amount, but that's far from intuitive.

Using the mouse: I move my hand down to the trackpad, or over to the mouse. Wiggle it around briefly to locate the cursor, then drag it across the screen to the second monitor. Use two-finger scrolling or the mouse wheel to scroll to where I want to. On OS X I blessedly don't have to click to focus scrolling, so I could start typing immediately, but to avoid confusion (and because I want to scroll my code) I need to drag the pointer back.

This provides a much better scrolling experience, but the process of moving focus across monitors with a pointing device is a huge drag.

Using a hypothetical desktop gesture reader: I lift my hand in front of the second monitor. This focuses on the window. I draw two fingers down (or up) to scroll. I drop my hand back to the keyboard and begin typing immediately.

My elbow stays on the table, so it's quite comfortable. I have excellent granularity in scrolling— not as good as the mouse wheel, since I'm not actually touching the screen, but probably about as good as two finger scrolling. Most helpfully, the task of focusing on the browser and back is about as effortless and intuitive as it possibly could be without reading my mind.

I browse the web a lot on my desktop, and I find myself missing my laptop's two-fingers-and-swipe-down scroll.

Perhaps you want one of these apple trackpads in your stocking http://store.apple.com/us/product/MC380LL/A?fnode=MTY1NDA1Mg...

I'm pretty sure that up until a month ago or so I would have said that a mouse or trackball is the "best" interface approach. However, we were away for a couple of weeks and I only used my iPad during that time.

Coming back and using a desktop PC with a mouse felt very strange - so I'm now far more willing to consider that approaches like this might be feasible after all.

Movies like Minority Report can now be made with $150 instead of expensive CGI.

I could see this on a desk, like MS Surface.

Now the CPU needs to be about 3x faster so there is no perceptible lag (it's pretty obvious right now).

Very nice of Microsoft to back down on the anti-hacking threats, this is great for everyone.

I wonder how long until someone hacks it to help certain disabilities.

Looks interesting, and maybe something that's actually useful will come of it, but I don't think I'll be trading in my mouse anytime soon.

Add a really awesome voice control interface to this and I could see it being nice in a consumer context.

I like the part of the video at 0:30 where my arms would be completely worn out.

Looks like it might be a nice iPad app.

getting closer, but not there yet.
