Hacker News
Show HN: SeeMeSpeak, a crowd-sourced dictionary for sign languages (kaeff.net)
25 points by kaeff on Oct 21, 2013 | 8 comments

I am one of the authors: Ask me anything!

[In the far future] any plans to hook people up with e.g. a Kinect so you can capture point clouds and allow people to rotate the signs / see them from different perspectives? Or is this completely unnecessary?

As someone who hasn't learnt sign language myself, I'm not sure, but I think I'd find it easier to mirror a sign by seeing it from the signer's point of view (i.e. "over the shoulder") rather than from a camera positioned in front.

Also, what's the current state of the art like for detecting sign language? Could you use similar tech to test people and make sure they're performing signs correctly?

The Kinect came up in our discussions. I'm not sure how helpful it would be for learners. Our first attempt would be to implement a "mirror" view next to a given video, where you can see yourself doing what the person on screen does. The biggest problem is that everyone recording would then need a Kinect. I'll come back to that at the end.

Over the shoulder is problematic, as the face is very important. What the insides of your hands show you is negligible anyway. We also assume that this problem goes away once you can see yourself side by side with a given signer. If you hit refresh a few times, you will find some videos that show the same person from the side and from the front. I think that works very well, but it carries a huge production overhead.

Detecting sign language: the main problem there is the lack of a documented corpus. Be aware that sign languages are just as subject to dialects as spoken languages. Some signs are completely different between northern and southern Germany.

Which brings me to the problem of corpora: we only included DGS (German Sign Language) here, because it was the only language we speak for which we could find a well-tagged corpus under a permissive license (obtained by scraping a wiki...). Even that corpus is only 800 videos. Some corpora basically amount to a pile of videos that do not necessarily depict a single phrase each and are not transcribed. Others are owned by the universities that created them; they could grant us a license, but that would defeat the point of our project: we make this corpus freely available.
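To give a rough idea of what that scraping step looks like, here is a minimal sketch in Python using only the standard library. The markup and the `data-gloss` attribute are invented for illustration and do not reflect the actual wiki's structure:

```python
from html.parser import HTMLParser

class SignCorpusParser(HTMLParser):
    """Collect (gloss, video URL) pairs from <video> tags carrying a
    data-gloss attribute. Hypothetical markup, purely illustrative."""

    def __init__(self):
        super().__init__()
        self.entries = []  # list of (gloss, video URL) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            a = dict(attrs)
            if "data-gloss" in a and "src" in a:
                self.entries.append((a["data-gloss"], a["src"]))

# A tiny mock wiki page standing in for the real scraped source.
page = """
<ul>
  <li><video data-gloss="HAUS" src="/videos/haus.webm"></video></li>
  <li><video data-gloss="DANKE" src="/videos/danke.webm"></video></li>
</ul>
"""

parser = SignCorpusParser()
parser.feed(page)
print(parser.entries)
# → [('HAUS', '/videos/haus.webm'), ('DANKE', '/videos/danke.webm')]
```

In practice each extracted pair would then be stored as one tagged video entry in the corpus.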

We attribute the lack of user-generated corpora to the fact that producing very short videos (3-5 seconds) is still a huge hassle: you need a camera or a capture tool, you have to encode the resulting videos, upload them to a wiki, and transcribe them properly there. At a time when browsers are gaining recording capabilities and every laptop ships with a camera, we wanted to make this easier for users. Our recording workflow is certainly not finished yet, but this proof of concept is already much quicker.

That's also the reason why we won't look at the Kinect soon: it is far less widespread than ordinary cameras.

In regards to handling video upload and encoding, have you checked out Transloadit[1]?

[1] https://transloadit.com/

We thought of this, but using the service would have been against the Rails Rumble rules.

Oh sorry, I missed that part. What about now? Is improving the app with third-party services acceptable after the competition?

I don't think this is needed anymore. We already have an avconv pipeline working :) .
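The encoding step of such a pipeline boils down to a single avconv invocation along these lines. This is a sketch, not the project's actual settings; the filenames and flag values are illustrative:

```shell
# Transcode a browser-recorded WebM clip to H.264/AAC MP4 for broad
# playback support. CRF 23 is a common quality/size trade-off;
# "-strict experimental" was required for the built-in AAC encoder
# in avconv releases of that era.
avconv -i recording.webm \
       -c:v libx264 -crf 23 \
       -c:a aac -strict experimental \
       output.mp4
```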

There are a few Kinect projects for this, one example is this:


But sign language consists of more than hand movements. The fingers and lips are important, too, and the Kinect's resolution is not good enough for those right now. And don't forget facial expressions.
