

Navigation for the Visually Impaired Using a Google Tango RGB-D Tablet - DanAndersen
http://www.dan.andersen.name/navigation-for-the-visually-impaired-using-a-google-tango-rgb-d-tablet/

======
DanAndersen
This is a project I've been working on for the past couple of months: using
the Project Tango tablet to build a navigation system for people with visual
disabilities.

It uses pose estimation and point cloud data to (1) build a chunk-based voxel
environment of the user's surroundings, (2) render a set of depth maps
surrounding the user, and (3) use those depth maps and OpenAL to generate 3D
audio that indicates where mapped obstacles are.
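
To make step (3) concrete, the OpenAL side boils down to something like the
sketch below. This is an illustration of the technique, not the project's
actual code; the obstacle position and beep parameters are placeholders:

```cpp
// Minimal OpenAL sketch: spatialize a beep at an obstacle position taken
// from the depth maps. Coordinates and sound parameters are illustrative.
#include <AL/al.h>
#include <AL/alc.h>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
    // Standard OpenAL setup: default device + context.
    ALCdevice* device = alcOpenDevice(nullptr);
    ALCcontext* context = alcCreateContext(device, nullptr);
    alcMakeContextCurrent(context);

    // The listener is the user; in the real system its position and
    // orientation would be updated every frame from the Tango pose.
    alListener3f(AL_POSITION, 0.0f, 0.0f, 0.0f);

    // A 0.1 s, 440 Hz beep. Mono is required for 3D panning in OpenAL.
    std::vector<int16_t> samples(4410);
    for (std::size_t i = 0; i < samples.size(); ++i)
        samples[i] = int16_t(32000 * std::sin(2.0 * 3.14159265 * 440.0 * i / 44100.0));
    ALuint buffer;
    alGenBuffers(1, &buffer);
    alBufferData(buffer, AL_FORMAT_MONO16, samples.data(),
                 ALsizei(samples.size() * sizeof(int16_t)), 44100);

    ALuint source;
    alGenSources(1, &source);
    alSourcei(source, AL_BUFFER, ALint(buffer));
    alSourcei(source, AL_LOOPING, AL_TRUE);
    // Place the cue at an obstacle found in the depth map, e.g. 2 m ahead
    // and 0.5 m to the right (OpenAL's default forward axis is -Z).
    alSource3f(source, AL_POSITION, 0.5f, 0.0f, -2.0f);
    alSourcePlay(source);

    std::this_thread::sleep_for(std::chrono::seconds(2));

    alDeleteSources(1, &source);
    alDeleteBuffers(1, &buffer);
    alcMakeContextCurrent(nullptr);
    alcDestroyContext(context);
    alcCloseDevice(device);
}
```

OpenAL handles the interaural panning and distance attenuation itself, so
each mapped obstacle can simply be a positioned source.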

I don't have it in a state where folks can try it out yet, but I did do a
writeup of my approach and wanted to share it.

Demonstration video (with quiet audio) here:
[https://youtu.be/EnNuDiJazBs](https://youtu.be/EnNuDiJazBs)

------
dm2
This is awesome; great job. I think you've just given humans echolocation.

If someone were given a similar device at an early age, semi-permanently
attached to them, would their brain be able to build up a map of the room?

There have been previous attempts, but the Tango device didn't exist then, so
the hardware was bulky and usually required a backpack.

~~~
DanAndersen
I definitely think it would be possible. I find it interesting to think about
eyesight in the same way -- even though an image is projected onto our
retinas, there's not a little homunculus looking at our retinas to see the
image; it gets translated into electrical signals that our brain interprets.
There seems to be a great amount of plasticity in the brain that lets us remap
senses and view tools as extensions of our bodies.

There has been some prior work on using depth cameras for navigation for the
visually impaired. For example, a smart cane can detect objects beyond its
reach and give haptic feedback. Microsoft Research did some work with putting
the Kinect on a helmet and giving audio cues for navigation
([http://research.microsoft.com/pubs/184208/VisionForTheBlind.pdf](http://research.microsoft.com/pubs/184208/VisionForTheBlind.pdf)).
What I'm interested in is taking that sensory input and making it less
immediate by giving it a memory -- letting it build up a picture of an
environment rather than needing to point a device at something in order to
know something about it.
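
The "memory" here is just the chunked voxel grid from step (1). A rough
sketch of the idea follows; the chunk size, voxel size, and hashing scheme
are illustrative values, not necessarily what the project uses:

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>

// One fixed-size chunk of occupancy flags.
struct Chunk {
    static constexpr int N = 16;  // 16^3 voxels per chunk (assumption)
    uint8_t occupied[N * N * N] = {};
};

// Key identifying a chunk by its integer chunk coordinates.
struct Key {
    int x, y, z;
    bool operator==(const Key& o) const { return x == o.x && y == o.y && z == o.z; }
};
struct KeyHash {
    std::size_t operator()(const Key& k) const {
        // Simple spatial hash; collisions are resolved by the map.
        return std::size_t(k.x) * 73856093u ^ std::size_t(k.y) * 19349663u ^
               std::size_t(k.z) * 83492791u;
    }
};

class VoxelMap {
public:
    // Mark the voxel containing a world-space point as occupied. Points
    // come from the Tango point cloud, already transformed into world
    // coordinates using the device pose.
    void Insert(float wx, float wy, float wz) {
        int vx = int(std::floor(wx / kVoxel));
        int vy = int(std::floor(wy / kVoxel));
        int vz = int(std::floor(wz / kVoxel));
        // Floor division so negative voxel coords land in the right chunk.
        auto fdiv = [](int a, int b) { return (a >= 0) ? a / b : -((-a + b - 1) / b); };
        Key key{fdiv(vx, Chunk::N), fdiv(vy, Chunk::N), fdiv(vz, Chunk::N)};
        Chunk& c = chunks_[key];  // creates the chunk on demand
        int lx = vx - key.x * Chunk::N;
        int ly = vy - key.y * Chunk::N;
        int lz = vz - key.z * Chunk::N;
        c.occupied[(lz * Chunk::N + ly) * Chunk::N + lx] = 1;
    }

private:
    static constexpr float kVoxel = 0.05f;  // 5 cm voxels (illustrative)
    std::unordered_map<Key, Chunk, KeyHash> chunks_;
};
```

Chunking keeps memory proportional to the space actually observed, which is
what lets the map persist and grow as the user moves around.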

One big issue is figuring out how to sonify depth information so it's useful.
One simple approach is to do a sort of sweep across each frame from left to
right, letting each row of an image correspond to a certain pitch. I don't
think this is a good approach, as it seems very vision-oriented and is likely
to sound just like noise. Maybe it would work for someone who used it from
birth, but for relatively fast training I doubt it. Other approaches do more
interpretation -- Microsoft's work detected faces, walls, and floors, giving
each a distinct sound for greater recognition.
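
For reference, the naive sweep I mean would look roughly like this. It's a
brute-force sketch; the pitch mapping, range, and sweep length are arbitrary
choices, not taken from any particular system:

```cpp
#include <cmath>
#include <vector>

// Naive left-to-right sweep sonification of a single depth frame: each
// column becomes a slice of time, each row a sine partial whose pitch
// rises toward the top of the image and whose loudness grows as the
// surface gets closer. Returns mono PCM samples.
std::vector<float> SweepSonify(const std::vector<float>& depth,  // row-major, meters
                               int width, int height,
                               int sampleRate = 44100,
                               float sweepSeconds = 1.0f,
                               float maxRange = 4.0f) {
    const float kPi = 3.14159265f;
    const int samplesPerCol = int(sampleRate * sweepSeconds) / width;
    std::vector<float> pcm;
    pcm.reserve(std::size_t(samplesPerCol) * width);
    for (int col = 0; col < width; ++col) {
        for (int s = 0; s < samplesPerCol; ++s) {
            float t = float(col * samplesPerCol + s) / sampleRate;
            float sample = 0.0f;
            for (int row = 0; row < height; ++row) {
                float d = depth[std::size_t(row) * width + col];
                if (d <= 0.0f || d > maxRange) continue;  // no return / too far
                // Top rows map to higher pitch (200 Hz up to 1600 Hz here).
                float freq = 200.0f * std::pow(2.0f, 3.0f * (1.0f - float(row) / height));
                float amp = (1.0f - d / maxRange) / height;  // closer = louder
                sample += amp * std::sin(2.0f * kPi * freq * t);
            }
            pcm.push_back(sample);
        }
    }
    return pcm;
}
```

Even at this level the problem is visible: with one sine partial per pixel
row, most frames sum to something close to broadband noise.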

~~~
bramd
> What I'm interested in is taking that sensory input and making it less
> immediate by giving it a memory -- letting it build up a picture of an
> environment rather than needing to point a device at something in order to
> know something about it.

Have you tested this approach with blind users? I think building a picture of
an environment is a good task to offload to the brain and a good skill to
have/develop for blind people.

> One big issue is figuring out how to sonify depth information so it's
> useful. One simple approach is to do a sort of sweep across each frame from
> left to right, letting each row of an image correspond to a certain pitch. I
> don't think this is a good approach, as it seems very vision-oriented and is
> likely to sound just like noise.

I think this is quite a good approach, but I agree it has a high learning
curve. However, that high learning curve might reward the end user with a
system that is more flexible. By preprocessing the input and generating audio
based only on detected patterns, you limit the applicability of such a system.
That being said, a generic system that gives "unfiltered" output but also has
additional cues you can set, for example for fast-approaching objects, might
be useful.
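
A fast-approach cue could be as simple as comparing the nearest depth in a
central window across two frames. A sketch of that idea, with a made-up
window size and threshold:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Returns true if the nearest surface in the central part of the frame is
// closing faster than speedThreshold (m/s) between two depth frames.
bool FastApproachCue(const std::vector<float>& prevDepth,
                     const std::vector<float>& currDepth,
                     int width, int height,
                     float frameDt,                  // seconds between frames
                     float speedThreshold = 1.5f) {  // illustrative threshold
    // Nearest valid depth in the central half of the image.
    auto nearestCentral = [&](const std::vector<float>& d) {
        float nearest = std::numeric_limits<float>::infinity();
        for (int row = height / 4; row < 3 * height / 4; ++row)
            for (int col = width / 4; col < 3 * width / 4; ++col) {
                float v = d[std::size_t(row) * width + col];
                if (v > 0.0f) nearest = std::min(nearest, v);  // ignore holes
            }
        return nearest;
    };
    float before = nearestCentral(prevDepth);
    float after = nearestCentral(currDepth);
    if (!std::isfinite(before) || !std::isfinite(after)) return false;
    return (before - after) / frameDt > speedThreshold;
}
```

A cue like this could be layered on top of the unfiltered stream rather than
replacing it.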

