

Microsoft Research uses Kinect to translate between spoken and sign languages - hackhackhack
http://thenextweb.com/microsoft/2013/10/30/microsoft-research-uses-kinect-translate-spoken-sign-languages-real-time/

======
mabbo
There are three primary problems with what I see in this video. I base these
arguments on the three years of American Sign Language I studied in university,
and the thousands of dollars I spent drinking and talking with Deaf people and
other ASL speakers. I'm by no means perfectly fluent, but I get by. Beer
helps.

First issue: The signing they are doing in the video is very simple and
limited. Not just in the number of signs, but in the complexity of what they
are doing. Their face remains perfectly neutral. They move slowly and
carefully. They are signing individual signs one at a time. That's not how a
signed language works!

Signing without facial expression is like speaking in complete monotone,
making no eye contact with the person you're speaking to. You can probably get
your point across, but you're not having a real conversation, and you're
missing something important. The lack of flow and motion to the signing is
also awkward as hell. In short, the people in those videos are not actually
fluent in sign, they're trained actors who have memorized certain motions.

Second issue: There are concepts in sign language that have no equivalent in
English, or many spoken languages.

One example: in ASL you have variables/registers, in the programming sense.
You can sign "John", then point to your left. From now on in this
conversation, if you point left, you mean John. If while you speak, you want
to describe John moving to your right, you can do that (explained next). Now
pointing right means "John".

How might you 'move' John to your right? You could have a single finger
pointed up, like John was a finger puppet, and walk him to the right. If he
got into a car, you could have that single finger get into another specific
hand form that generally means 'vehicle'. Then you could draw a tree with your
free hand, and drive the car into the tree. You just told the story about John
driving into a tree. There were no specific nouns or verbs used. This is how a
lot of conversations work in ASL.
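The "registers" analogy above can be pushed into actual code (a toy illustration with made-up names like `SigningSpace`, not any real linguistics software): a signer assigns a referent to a point in space, and later pointing at that locus retrieves it, like reading a variable.

```python
# Toy model of ASL spatial referents as named "registers".
# All names here are invented for illustration.
class SigningSpace:
    def __init__(self):
        self.loci = {}  # direction -> referent currently "stored" there

    def assign(self, direction, referent):
        """Sign a name, then point: that locus now means the referent."""
        self.loci[direction] = referent

    def move(self, src, dst):
        """A classifier motion relocates the referent to a new locus."""
        self.loci[dst] = self.loci.pop(src)

    def point(self, direction):
        """Pointing at a locus retrieves whatever it refers to."""
        return self.loci[direction]


space = SigningSpace()
space.assign("left", "John")   # sign JOHN, point left
print(space.point("left"))     # -> John
space.move("left", "right")    # finger-puppet John walks to the right
print(space.point("right"))    # -> John; pointing left no longer works
```

The dict-lookup mechanics are obviously a caricature, but they capture the point: the "meaning" of a pointing gesture is established earlier in the conversation, not fixed by the gesture itself.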

The point I'm trying to illustrate above is that description and conversations
happen very differently in signed languages than they do in English. It isn't
that the words come in a different order, it's that the idea you are trying to
communicate is explained in a completely different way. You're comparing
apples and coconuts.

Third issue: every city has its own dialect. Signed languages aren't written
down, there isn't much global media of people signing, and the result is that
dialects change constantly. Drive to any other major city, entire words have
changed. Talk to a family across the street, they might have different signs
for some words. Sure, the core language is the same, but you've got a lot of
different nouns and verbs.

Going even deeper, some people here have already mentioned the difference
between ASL and Signed English. Some people sign English words one at a time
in the order of English grammar. Some people go into the 'pure' ASL realm, as
I described above, and use very complex concepts not used in English. It's not
a choice, they'll have simply learned ASL that way. Most people are somewhere
in between the two extremes, and it's a continuous domain, not discrete, with
a huge variety.

I won't discourage research into this, and I think what they're doing is
awesome- I considered doing a master's thesis on this exact idea. I just want
to put a nice big disclaimer here that this isn't useful yet, and there's a
long way to go.

Edit: Wall of text much mabbo? Woah man, take a breather.

~~~
Shivetya
Well I seem to remember when they were giving computers the ability to
interpret speech they were pretty limited in what they could do. Even training
your own machine with available software at the time required you to go
through sequences.

So I see this implementation as the same; it's part of the long road. First,
start with the basics. Prove the machine can recognize what you intend. Then
step it up. From there you can broaden it to other dialects and whatnot.

Still, you cannot help but cheer them on.

------
jrochkind1
This is really neat.

> _While this is clearly a massive achievement, there is still a huge amount
> of work ahead. It currently takes five people to establish the recognition
> patterns for just one word._

I don't get it. They're saying they need five (only five!) separate people to
train the thing, and then it works pretty well for everyone? That seems to be
working pretty darn well to me, I don't see the problem or 'huge amount of
work'.

Probably something lost in the journalism.

~~~
blackkettle
Yeah, for speaker-independent ASR you typically need hundreds of speakers and
thousands of hours of data.

~~~
jrochkind1
I think the journalist must have misunderstood something told to them about
the limitations of the project, and failed in reporting it.

------
batbomb
Having worked for a leading Video Relay Service for the deaf, I can say without
a doubt that, unless this is _seriously_ flawless, it won't catch on very well
with the deaf community. Also, it's been anticipated for a long time. I don't
see it disrupting VRS services for at least a decade. Furthermore, in the
video, everyone signing is doing so at a snail's pace.

That being said, the FCC still pays $6 a minute for VRS services.

~~~
evincarofautumn
Yes, the pace is extremely unnatural, akin to enunciating every syllable in
spoken language according to “standard” pronunciation. However, this is not
yet a consumer product: it’s foundational research toward better systems in
the future. Eventually it could be as good as existing speech-to-text
solutions—which is to say not very good, but tolerable.

------
calbear81
I could see this being an awesome sign language learning tool that would give
me instant feedback on whether my signing was done correctly.

------
Benvie
Michael Crichton had this shit working for Congo with apes like two decades
ago. This is nothing! _hacker news dismiss w / flourish_

------
nobodysfool
This would be a good application for Google Glass (it would only work one way).
Other than that, I see no benefit over a simple pencil and paper for face-to-
face communication, or a simple text chat for anything other than face to
face. Although I will say that I have had text chats with deaf people before,
and the text was definitely based on sign language (at times it was difficult
to understand because the grammar is so different).

~~~
Kronopath
Care to elaborate on the grammatical differences? You've got me curious now.

~~~
robflynn
I cannot speak for other sign languages, but ASL uses a few different forms of
word order -- Object, Subject, Verb is used a good bit. There are often
rhetorical questions thrown in that are immediately answered by the signer.
Facial expression is also very important to convey meaning.

A few examples: The signs for 'wish' and 'hungry' look the same. The
difference is the facial expression.

If you wanted to sign, "All we want to do is eat your brains", the order would
most likely be: 'We want do-what? Your brains eat.'
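That reordering can be caricatured in a couple of lines (a deliberately crude toy assuming a fixed subject–modal–verb–object clause shape; real ASL grammar is far richer than any template):

```python
def topicalize(subject, modal, verb, obj):
    """Crude gloss reordering per the example above: the object phrase
    moves in front of the verb, with a rhetorical question inserted.
    This is an invented helper for illustration, not a real translator."""
    return f"{subject} {modal} DO-WHAT? {obj} {verb}".upper()


print(topicalize("we", "want", "eat", "your brains"))
# -> WE WANT DO-WHAT? YOUR BRAINS EAT
```

A real system would also have to carry the facial expression and rhetorical-question eyebrow raise, which is exactly the information a word-order template throws away.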

There is another form of signing called 'Signed English.' It borrows most of
ASL's signs but puts them in standard English order. It has been a bit
controversial in the past, though.

You also run into odd colloquialisms or regional signs. I'm not quite sure
what to call them. A quick example is the word 'grass.' I learned the sign for
grass but, for some reason, that sign means 'truck' where I live currently.

Anyway, I'm rambling. If you have any specific questions I can try to answer
them.

 _edit_ Watch this YouTube video with captions enabled. The video is captioned
in English and in "ASL":
[http://www.youtube.com/watch?v=UQYjZc7gKXc](http://www.youtube.com/watch?v=UQYjZc7gKXc)

~~~
Kronopath
Thanks a lot! That video especially is very illustrative. And Jonathan Coulton
is great, although the concept of an "ASL song" kind of baffles me.

~~~
robflynn
It seems odd at first, but 'ASL Songs' allow deaf and hard of hearing
individuals to better experience that bit of culture (music). Some folks are
simply hard of hearing and may have trouble understanding what they're
hearing.

ASL can be very expressive and can be an art form itself. There's a neat thing
in the deaf culture called 'ABC Stories.'

[http://www.youtube.com/watch?v=wBUdzGH6WbU](http://www.youtube.com/watch?v=wBUdzGH6WbU)

They are stories that are told using the alphabet finger spelling signs.

------
hoffcoder
I remember doing Sign Language recognition through Kinect in my Master's
thesis. I did it for a vocabulary of 140 words of Indian Sign Language with a
success rate of 70% using a machine learning approach. Here is a video of the
prototype that I built:
[http://www.youtube.com/watch?v=2oqD-_UCHxQ](http://www.youtube.com/watch?v=2oqD-_UCHxQ)

------
tsumnia
Good on them! I remember in my Master's program contemplating gesture
recognition with sign language. Ultimately I went with proving people are who
they are based on their body language, but this was number two on the list.

Does anyone know what techniques they are using to accomplish this? I took the
changes in Active Appearance Models, converted them into 'sound files', then
ran them through Microsoft's HTK hidden Markov model toolkit to get my results.

In theory, they might be doing a similar thing.
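For anyone unfamiliar with the HMM-per-sign recipe the parent describes (score a feature sequence against one trained model per sign, pick the best), here's a from-scratch sketch of the scoring side using the scaled forward algorithm. The two-state "HELLO"/"GRASS" models are invented toy numbers standing in for trained parameters; this is not HTK and not whatever Microsoft Research actually uses.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under one HMM,
    via the scaled forward algorithm. pi: initial state probabilities,
    A[i, j]: state transition probabilities, B[state, symbol]: emissions."""
    alpha = pi * B[:, obs[0]]
    log_prob = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for symbol in obs[1:]:
        alpha = (alpha @ A) * B[:, symbol]
        log_prob += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_prob

def classify(obs, models):
    """Pick the sign whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda sign: forward_log_likelihood(obs, *models[sign]))


# Two invented two-state models: "HELLO" tends to emit symbol 0,
# "GRASS" tends to emit symbol 1 (symbols stand in for quantized
# per-frame features, e.g. vector-quantized hand positions).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.3, 0.7]])
models = {
    "HELLO": (pi, A, np.array([[0.9, 0.1],
                               [0.8, 0.2]])),
    "GRASS": (pi, A, np.array([[0.1, 0.9],
                               [0.2, 0.8]])),
}

print(classify(np.array([0, 0, 1, 0, 0]), models))  # -> HELLO
```

Training (estimating `pi`, `A`, `B` per sign via Baum–Welch) is the part a toolkit like HTK handles; the converted-to-'sound-files' trick presumably just makes the feature sequences look like the audio streams such toolkits expect.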

------
hipaulshi
kudos to the MSFT research!

