Hacker News
Hawking looking for technical help to maintain and improve his speech system (theverge.com)
67 points by shawndumas on Dec 29, 2011 | 22 comments



I've been working on http://ktype.net for 18 months now. KType is an iPad app to improve communication for individuals with speech & motor disabilities. When I talk to people, I actually mention Hawking as the most famous use-case: while he can have the best engineers and computer scientists work on his device, very few devices are available for my late grandpa who had a stroke or my cousin who is paralyzed from an accident and can't talk. I'm making KType for them. Kinda funny to see Hawking looking for assistance with improving his systems. Wonder how I could show him a demo of KType :)

My end goal is to turn KType into an integrated box like Hawking probably has but make it easy enough that anyone can set it up. I've seen old specs of his system but that was long ago. I wonder how good the word completion/prediction is. I built my own standalone prediction algo based on data-mining 10m tweets and can search through 1m words and phrases on the iPad in real-time as the user types (you can see that in the demo video). Here's how I mined the Twitter data: http://ktype.net/wiki/research:articles:progress_20110209

If anyone's interested, I could write up how my algo works. I'm quite certain that it is significantly better than typical smartphone auto-suggest and it was quite an engineering challenge to make it work on the iPad (bit-counting and bloom filters and tons of optimizations). I've thought about making it a standard C library that you call from your code - pass it a 1 to 3 word phrase and it will return 3-7 potential word matches.


As far as I can tell, the Hawking example shows that even with pretty much the best possible resources available, just about every special-use system for the disabled is more or less doomed to become a stovepipe system: a wonder when it's created but, by the very nature of its limited audience, subject to limited maintenance after a certain period - maintenance that usually can't prioritize upgrading for the next version of whatever OS it runs on. Stephen Hawking, the world's best known quadriplegic, is still running WINDOWS XP.

Given this, the choice of iOS as your OS seems kind of unfortunate. Why choose an OS where the hardware makes a point of being a black box and where support for, say, an old version ten years from now seems rather doubtful?

And the thing with Hawking is that his system works. It's a virtual certainty he has no interest in your system, 'cause getting used to a system is a lifetime affair. But someone will be used to your system. Will you be happy explaining to them that they have to upgrade or even abandon their system because backward compatibility is not a high priority for Apple?


> Given this, the choice of iOS as your OS seems kind of unfortunate. Why choose an OS where the hardware makes a point of being a black box and where support for, say, an old version ten years from now seems rather doubtful?

KType is almost entirely PhoneGap (HTML5/CSS3/CoffeeScript) except for the prediction and TTS libraries. I could probably make an Android version in a week and plan on doing so in the near future. The reason I chose iPad is the ubiquity and stability. The best part about using tablets with PhoneGap is that the hardware/OS completely gets out of the way of the user. KType is the only interface that the end-user sees and interacts with. And as long as KType does not change drastically, users will not be inconvenienced.


I guess...

Another thing to notice is that Hawking's system is hooked into his home automation system, possibly in a fairly complex way. That is something you'd want, and you'd want to think about how to do it in an OS-independent way (or again, choose an OS that at least aims for backward support) - my shallow Googling seems to indicate there's no standard support in "HTML5/CSS3/CoffeeScript" for this.


Thank you for doing something valuable for the world. I see too many people building trivial things and calling themselves innovators.


I'm interested in this kind of algorithm on principle, and analyses of successfully hurdled engineering problems are probably of interest to most readers of Hacker News. Please elaborate!


I think I posted a few of these ( http://ktype.net/wiki/ ) on HN as I wrote them but barely got any votes so I wasn't sure if HN was the best avenue for it. When I get some time, I will do a write up on how I built the suggestion engine. Here's the gist:

* Built a list of 1m words and phrases (1-4 ngrams) and gave each a weight based on how many different people mentioned it and how often: http://ktype.net/wiki/research:articles:progress_20110209

* Sort this list in the following order: first-letter ASC, word length DESC, weight DESC. About 12MB in size.

* For each word/phrase in the sorted list, compute a 4-byte (32-bit) value as follows: the first 26 bits are 0s and 1s indicating which letters are present in the word/phrase. So CAT = 101...1000000 (A = 1, C = 1, T = 1, everything else = 0). Then normalize the weight and store it in the last 6 bits. So that's 1m x 32 bits = 4MB of precomputed data.
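
A minimal sketch of that packing (the bit ordering and the 6-bit weight clamp here are my illustrative assumptions, not the actual KType code):

```python
def encode(word: str, weight: int) -> int:
    """Pack letter presence (low 26 bits) and a normalized weight (top 6 bits)."""
    bits = 0
    for ch in word.upper():
        if ch.isalpha():
            bits |= 1 << (ord(ch) - ord('A'))  # bit 0 = 'A', ..., bit 25 = 'Z'
    return bits | (min(weight, 63) << 26)      # clamp the weight into 6 bits

packed = encode("CAT", 5)  # A, C, T bits set; weight 5 in the top 6 bits
```

One 32-bit integer per entry is what keeps the whole precomputed table at 4MB for a million words.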

* Build an index containing the start-end position of all words of the same length that begin with the same letter. Words starting with 'A' of length 20: positions 1-200. Words starting with 'A' of length 19: positions 201-543. And so on. About 32KB in size.

* Upon app init, load the 12MB word list, 4MB precomputed data, and 32KB of index. Takes 10s on iPad1, 5s on iPad2.

* When user types 'CAR' (3 characters), find the range of all words of length 3-9 that begin with C. Start position P1 = Start position of 'C' of length 9. End position P2 = End position of 'C' of length 3. Everything in between P1 and P2 is fair game for suggestion. Sometimes this can be as small as 100 words (e.g. user typed XP) or as many as 150000 (user typed ENT).
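
A toy version of that index and range lookup (the word list and helper names are made up for illustration):

```python
# Toy word list, sorted the same way: first letter ASC, length DESC.
words = sorted(["CAR", "CARE", "CRY", "CAMERA", "CRAYON", "CATARACTS", "DOG"],
               key=lambda w: (w[0], -len(w)))

# Index: (first letter, length) -> (start, end) span in the sorted list.
index = {}
for i, w in enumerate(words):
    key = (w[0], len(w))
    start, _ = index.get(key, (i, i))
    index[key] = (start, i + 1)

def candidate_range(typed, max_extra=6):
    """All words starting with typed[0], of length len(typed)..len(typed)+max_extra."""
    lo = hi = None
    for length in range(len(typed) + max_extra, len(typed) - 1, -1):
        span = index.get((typed[0], length))
        if span:
            lo = span[0] if lo is None else lo
            hi = span[1]
    return words[lo:hi] if lo is not None else []
```

Because the list is sorted by length within each letter, 'CAR' yields every C-word of length 3-9 as one contiguous slice, which is what makes the subsequent scan cheap.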

* Now, create a 26-bit value from what the user typed, like we did with all the words in our 1m-long list. So CAR = 101......100000000. Here's the fun part: for each precomputed 26-bit value, AND it with the CAR-bits and count the number of 1s in the result. This tells us how many of the letters the user typed are present in each of the 3-9 length words/phrases that begin with a C. Since this is integer math, even looping through all 1m entries is near instant on the iPad. If at least 2 out of the 3 characters that the user typed are present in a word/phrase, go to the next step.
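
The filter itself is just an AND plus a popcount; a sketch (Python's `bin().count` stands in for whatever bit-counting trick the ObjC version uses):

```python
def letter_bits(word):
    """26-bit mask of which letters appear in the word."""
    bits = 0
    for ch in word.upper():
        if ch.isalpha():
            bits |= 1 << (ord(ch) - ord('A'))
    return bits

def shares_enough_letters(typed, candidate, min_frac=2/3):
    """AND the masks and count the 1s: how many typed letters appear in candidate.
    Repeated letters collapse into one bit, so this is deliberately approximate."""
    overlap = letter_bits(typed) & letter_bits(candidate)
    return bin(overlap).count("1") >= min_frac * len(typed)
```

So shares_enough_letters("CAR", "CRAYON") passes (all three letters present) while "CAR" vs "MOON" is rejected before any expensive scoring happens.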

* So the user typed 'CAR' and we now have a list of words like 'CATARACTS', 'CRIMSON', 'CARBON', 'CRAYON', 'CHARM', 'CARE', 'CAR', 'CRY' etc. How do we know which one the user is trying to type? Well, it depends on which word fits best and how high the weight is. By 'fit', I mean 'CAR' => 'CAMERA' is a better fit than 'CAR' => 'CRAYON' because the letters 'CAR' are in the correct order in the word 'CAMERA' but 'A' and 'R' are flipped in 'CRAYON'.

* If we do a distance calculation between the words at this point, it will just match the shortest words that have similar letters. So CAR will match CAR, CRY, CARE and not match CATARACT at all. We want it to match longer words also because those are much harder to spell out. E.g. PMGN should match POMEGRANATE and PYGMALION better than PONG or PIGEON. Spell-checks work by measuring distance between words - http://norvig.com/spell-correct.html - but we're not making a spell-check. We're making a word-predictor. So we need to find what words fit CAR and PMGN and how well.

* Turns out it's easy. To find out the fit, simply remove all letters in the word/phrase that the user did not type. So CAMERA => 'CARA'. 'CRAYON' => 'CRA'. If user typed PMGN, then POMEGRANATE will become PMGN, PIGEON will become PGN. Now take the http://en.wikipedia.org/wiki/Levenshtein_distance of what the user typed e.g. 'CAR' and each resulting phrase: CARA, CRA, CAAR, CR, CA etc. Do some funky math with the resulting distance x word length x word weight x exact-match-ness and give it a score. If score > min-fuzzy-value, return it. Else word does not match. The min-fuzzy-value is calculated based on how many words begin with the letter, have the same length etc. If user typed 'ENT', we want min-fuzzy-value to be quite high, if they typed 'QLX', then we want a really low min-fuzzy-value or else there won't be any positive matches.
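
The 'fit' step in code (a bare-bones sketch; the real scoring also multiplies in word length, weight, and exact-match bonuses, which I'm omitting):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def fit_distance(typed, word):
    """Drop letters the user didn't type, then compare letter order via Levenshtein."""
    kept = "".join(ch for ch in word if ch in set(typed))
    return levenshtein(typed, kept)
```

fit_distance('CAR', 'CAMERA') is 1 ('CAMERA' reduces to 'CARA') while fit_distance('CAR', 'CRAYON') is 2 ('CRA', with A and R flipped), matching the intuition above; 'POMEGRANATE' reduces to exactly 'PMGN', distance 0.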

* Once you have at most 100 (or 1000 depending on processor speed) matches that have a high enough score, sort descending by score and return top 3-7 as configured by user.

This is just the algo for a single-word match. The multi-word match is similar but has a lot more conditions to speed things up. But it works significantly better than single-word match because it really narrows down the potential matches. Typing 'T' can mean any of the 20,000 words of length 2-5. But typing 'T' after 'EIFFEL' really narrows it down.

While my approach to word suggestion may seem pretty obvious, what I didn't mention were the 50+ approaches that did not work because of performance, quality, or stability issues. I'd love to hear any questions about it and I can provide my ObjC code (after I clean it up) if someone wants to build on it.

Please note I wrote all of the above off the top of my head and not as a detailed research paper. I left out lots of minor optimizations and kludges. Also I'm pretty sure I can describe the indexing and bit-counting better given some free time.


Sounds like you have some interesting algorithms there, but I think you might want to spend some more time looking at well-known algorithms that might improve both efficiency and accuracy.

For efficiency, you might consider one of the various tree-like structures that would allow you to not bother searching every single word. That would become significantly more important with a larger database, or a less powerful device.

For accuracy, consider arithmetic coding, and take a look at some of the work and writings from the Dasher project (http://www.inference.phy.cam.ac.uk/dasher/). In addition to their impressive input system, they've done a ton of work on accessibility.

Also, depending on your input system, you should consider the possibility of common typos or equivalent; for instance, on a keyboard you'd consider adjacent characters as likely typos.


> I think you might want to spend some more time looking at well-known algorithms that might improve both efficiency and accuracy.

I went through hundreds of research papers on the topic and found nothing that could help me. Tries and BSTs are awesome for a lot of things but not for what I was trying to do. If you read above, I don't search through each of the 1m words - I narrow down to the letter based on word length and then find all possible words that match at least 66% of the letters that the user typed. This allows for typos and relaxes the filters. Going from 1m to 1000 words takes under 0.01s on iPad1 because of the bit-counting algo. The hard part is picking the best 5 out of the 1000 that match, and that is what takes some time. If the user types 'CTA', how would you pick the best match from hundreds of words like CATARACT, CITADEL, CANTINA, CRUSTACEAN etc.?

> For accuracy, consider arithmetic coding, and take a look at some of the work and writings from the Dasher project (http://www.inference.phy.cam.ac.uk/dasher/). In addition to their impressive input system, they've done a ton of work on accessibility.

Dasher was one of the inspirations for KType actually. When I came across it years ago, I tried to make it work for my cousin but it requires a lot of dexterity and timing, even when configured properly. And when I looked at it last, it did not support a high-level of word-completion / finishing sentences. On KType, if the user wants to type 'GIVE ME SOME FOOD', they type 'GV', select 'GIVE' on screen, then 'ME' on screen, then type 'S' and select 'SOME' and so on.

Additionally, KType can take input from external bluetooth devices and I'm working on making it accept input from bluetooth joysticks, so they wouldn't have to touch the screen at all. I also wrote a simple algo to ignore unintentional touches a user might make if they do not have fine motor skills: http://ktype.net/wiki/research:articles:progress_20111105

> Also, depending on your input system, you should consider the possibility of common typos or equivalent; for instance, on a keyboard you'd consider adjacent characters as likely typos.

In addition to handling typos, I offer a special keyboard layout to minimize the amount of movement a user would have to make when typing: http://ktype.net/wiki/research:articles:progress_20110228

The last thing I wanted to do was reinvent the wheel for no reason. The problem is that in this field, almost all of the technology is proprietary, and even though a lot of research is done to invent systems that help others, they are purely research and rarely ship. In the long-term, KType is a system to integrate with other hardware for I/O and provide the user with a single, easy-to-use interface.


> Dasher was one of the inspirations for KType actually. When I came across it years ago, I tried to make it work for my cousin but it requires a lot of dexterity and timing, even when configured properly.

Dasher does not necessarily require dexterity or timing. You don't have to run it in "continuous" mode, where it moves to the right constantly or relies on something pointing to the right for speed. You can put Dasher into a mode where it has a single binary input: go up-right one step, or go down-right one step. Effectively, you enter one bit of information at a time, and Dasher's arithmetic coding makes the number of bits you have to enter as efficient as possible. In a mode like that, you don't need either dexterity or timing. You can still enter any possible text, but you'll find that it takes far fewer bits to enter common English text than random gibberish. Without arithmetic coding, you'd have to enter an average of log base 2 of 26, or 4.7 bits per letter, ignoring capitalization or punctuation. With arithmetic coding, you can easily get to a few bits per word, even with Dasher's simplifying constraint of keeping things alphabetically sorted (so it won't offer you vowels versus consonants even though that might require fewer bits of information than a partition of the sorted dictionary, because the sorted dictionary provides a simpler mental model).
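
The bit arithmetic behind that claim is easy to check (the probabilities below are hypothetical model outputs, just to show the scale):

```python
import math

# With no model, a letter from a 26-symbol alphabet costs log2(26) bits.
uniform_cost = math.log2(26)  # ~4.70 bits per letter

def bits(p):
    """Arithmetic coding spends about -log2(p) bits on a symbol the model
    assigns probability p, so confident predictions become very cheap."""
    return -math.log2(p)

# bits(0.5) is exactly 1 bit; bits(0.9) is ~0.15 bits -- a well-predicted
# letter costs a small fraction of the uniform 4.7 bits.
```

This is why a good language model gets common English down to a few bits per word while random gibberish stays expensive.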

> And when I looked at it last, it did not support a high-level of word-completion / finishing sentences. On KType, if the user wants to type 'GIVE ME SOME FOOD', they type 'GV', select 'GIVE' on screen, then 'ME' on screen, then type 'S' and select 'SOME' and so on.

Dasher has supported high-level language models for a long time. You can feed it a giant corpus of text, and it'll extract not just letter models but word models as well. Once you steer into 'G', you'll find 'i' much more common than 'h' or 'j', so 'i' will have a larger region to aim for (but you can still steer into the smaller 'h' if you want to spell "ghost", or the even smaller 'j' if you want to talk about the GNOME Javascript framework "gjs"). Once you write 'Gi', 've' will have a visible region all its own. Once you write 'Give', you'll already see a visible region for 'me', and so on. When you write common phrases, you very quickly start writing significant fractions of a sentence at once.

To use one of the other examples from your twitter corpus, if you typed "hulk", you'd likely see a visible region for "smash". :)

So, yes, Dasher has very strong predictive models.

Dasher also avoids the "modal" approach of switching between typing individual letters and selecting completions; they all use the same input mechanism. I also like their philosophy that as long as you can cause your body to output one bit of information, Dasher can work for you.


I'd be interested. Also worth noting that the 9 mil tweets link seems to 404, and looking through their site it appears to no longer be available. Any chance you could suggest other resources for this type of data? Asking that you host the file wouldn't be fair, as I assume the bandwidth would kill you easily.


I think there was a licensing issue with the original data so they probably pulled it.


Tweets? At first that seemed like a somewhat biased prediction set, but on the other hand, people who have to make such an effort to communicate will want to write few words. However, writing tweets takes almost no effort at all, so it is interesting to consider.


I used tweets because they're informal and mention a lot of "what to do" and "how I feel" topics, which I think is very important for basic communication, especially for individuals with multiple disabilities. A bonus is that I get a lot of pop-cultural lingo (e.g. "Harry Potter", "Hulk SMASH") that makes my suggestions more fitting for informal communication than plain old dictionary lookups. And you are right about the terseness of tweets - it helps in building a list of common phrases like "on my way to" and "by the way."


There is no amount of money that should be considered "too much" to keep this man as productive as he can be.

Seriously. If it costs 10 million pounds a year to run that chair, it's worth it, and the rest of humanity needs to chip in to help.


But why? Do you really believe he is making important scientific contributions NOW? I think it's more a matter of sustained hype from his earlier contributions, plus a guy in a chair makes for a compelling media story.

That said, I'm all for social health and providing for the needy and handicapped, but that's regardless of whether they are "worth it" or not.


He's well known and respected for science. And I mean Michael Jordan well known. Science needs every hero it can get right now. As a spokesman alone he's worth every penny. As a role model...


Speaking purely about his brain, and nothing else, I'd wager money on him producing something useful over many other brains.



Fifteen years ago, I had an interview for this job. This was before the Internet was popular, and it was advertised in The Guardian.

Worth applying just for the interview, to have tea and a personal chat with Stephen Hawking.

One of the questions was "You're in the US, and the Professor's screen shatters. It's Thanksgiving Day, and no shops are open. What do you do?"

We quickly realised that I wasn't good enough at electronics to do the job!

I found it hard talking to Hawking, as I didn't know what to do while he was composing his sentences. It seems rude to look at his screen, or out the window, or to stare at him.


Probably too limited, but a former colleague works for the company that's developing this:

http://www.think-a-move.com/tongue.html


Oh, the fun I could have with that job. First prank: Stephen Hawking sounding like Scooby Doo. Or how about a plugin that would translate his speech to jive? Or maybe Darth Vader for the Star Wars fans?



