As someone who hasn't learnt sign language myself, I'm not sure, but I think I'd find it easier to mirror a sign by seeing it from the signer's point of view (i.e. "over the shoulder") rather than from a camera's point of view in front.
Also, what's the current state of the art like for detecting sign language? Could you use similar tech to test people and make sure they're performing signs correctly?
Over the shoulder is problematic, as the face is very important. What the insides of your hands show you is negligible anyway. We also assume the mirroring problem goes away once you see yourself side by side with the signer. If you hit refresh a few times, you will find some videos that show the same person from the side and from the front. I think that works very well, but it has a huge production overhead.
Detecting sign language: the problem there is the lack of documented corpora. Be aware that sign languages are just as subject to dialects as spoken languages. Some signs are completely different between northern and southern Germany.
Which brings me to the problem of corpora: we only included DGS (German Sign Language) here, because it was the only language we speak for which we found a well-tagged corpus under a permissive license (by scraping a wiki...). Even that corpus is only 800 videos. Some corpora basically amount to a bunch of videos that don't necessarily depict a single phrase each and aren't transcribed. Others are owned by the universities that created them; they could grant us a license, but that misses the point of our project: we make this corpus freely available.
We attribute the lack of user-generated corpora to the fact that producing very short videos (3-5 seconds) is still a huge hassle. You need a camera or a capture tool, have to encode the resulting videos, upload them to a wiki, and transcribe them properly there... At a time when browsers are gaining recording capabilities and every laptop ships with a camera, we wanted to make this easier for users. Our recording workflow is certainly not finished yet, but this proof of concept is already much quicker.
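For the curious: the in-browser capture step can be done with the standard getUserMedia + MediaRecorder APIs. This is just a minimal sketch of that idea, not our actual code; the helper and function names (`pickMimeType`, `recordClip`) are made up for illustration, and the MIME-type check is injectable so the helper can be exercised outside a browser.

```javascript
// Pure helper: return the first candidate MIME type the recorder supports.
// `isSupported` is passed in (in a browser: MediaRecorder.isTypeSupported).
function pickMimeType(candidates, isSupported) {
  for (const type of candidates) {
    if (isSupported(type)) return type;
  }
  return ""; // empty string: let the browser pick its default
}

// Record roughly `seconds` of webcam video and resolve with a Blob
// that could then be uploaded to the wiki. Browser-only.
async function recordClip(seconds) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const mimeType = pickMimeType(
    ["video/webm;codecs=vp9", "video/webm"],
    (t) => MediaRecorder.isTypeSupported(t)
  );
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  return new Promise((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((t) => t.stop()); // release the camera
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start();
    setTimeout(() => recorder.stop(), seconds * 1000);
  });
}
```

That's the whole capture path; encoding happens inside MediaRecorder, so the user never touches a video tool.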
That's also the reason why we won't be looking at the Kinect any time soon: it is nowhere near as widespread as ordinary cameras.
But sign language consists of more than hand movements. The fingers and lips are important, too, and the Kinect's resolution is not good enough for those right now. And don't forget the facial expressions.