
HTML5 Speech Recognition (in Chrome) - philfreo
http://slides.html5rocks.com/#slide24
======
zmmmmm
Wow, this is pretty intriguing and may actually be a solid differentiator in
the browser space since it probably requires a considerable stored database of
speech samples coupled with a decent back end server farm to do it
effectively. Hard for the other players to replicate. Clever move Google!

I wonder if they will add this as a standard feature for any text field at
some point? It's probably not going to get much sunlight if it requires a
chrome-specific attribute on the field.

~~~
alextgordon
Couldn't the other browsers yank the code from Chrome and use Google's servers
too?

~~~
TomOfTTB
Not really. I mean, if Google chose to let them then sure. Otherwise it
wouldn't be that hard to have Chrome (which is based on an open source project
but is not itself open source) send an encrypted key in each packet ensuring
it came from a Chrome browser.

That said, both Microsoft and Apple have voice recognition engines built into
their client OS, which seems like a much better option latency-wise.

~~~
aaronsw
And other browsers could just extract the key and ship it as well.

~~~
TomOfTTB
I'm assuming there would be some kind of algorithm to generate the encrypted
key.

~~~
rryan
The algorithm has to be seeded with a secret (Stream ciphers work this way).
You can't get around having to have Chrome own a secret, and it's really hard
to protect a secret when you're in a hostile environment (e.g. the user's PC).
This is part of what makes DRM really hard.
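
A minimal sketch of that point, in JavaScript for Node.js using the built-in
`crypto` module. It uses an HMAC tag rather than a stream cipher, and the names
`EMBEDDED_SECRET`, `signRequest`, and `serverAccepts` are hypothetical; nothing
here reflects how Chrome actually talks to Google's servers. It only shows that
once the secret ships inside the client, anyone who extracts it can produce
tags the server cannot tell apart from the real client's.

```javascript
const crypto = require("crypto");

// Hypothetical secret baked into the official client binary.
const EMBEDDED_SECRET = "shipped-inside-the-binary";

// Tag a request body with an HMAC so the server can check who sent it.
function signRequest(secret, body) {
  return crypto.createHmac("sha256", secret).update(body).digest("hex");
}

// Server side: recompute the tag and compare in constant time.
function serverAccepts(body, tag) {
  const expected = Buffer.from(signRequest(EMBEDDED_SECRET, body), "hex");
  const received = Buffer.from(tag, "hex");
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

// The official client signs with the embedded secret.
const officialTag = signRequest(EMBEDDED_SECRET, "audio-chunk-1");

// A rival browser that pulled the same bytes out of the binary
// produces an identical tag, so the server accepts it too.
const extractedSecret = "shipped-inside-the-binary";
const rivalTag = signRequest(extractedSecret, "audio-chunk-1");

console.log(serverAccepts("audio-chunk-1", officialTag));
console.log(serverAccepts("audio-chunk-1", rivalTag));
// A client that only guesses at the secret is rejected.
console.log(serverAccepts("audio-chunk-1", signRequest("guess", "audio-chunk-1")));
```

The check only keeps out clients that never looked inside the binary, which is
the same weakness DRM schemes run into.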

------
51Cards
I suspect this slide's covert purpose is to make me (yes, just me) sit here
and say 'hello' to my computer like a moron for 3 minutes while seeing no
effect whatsoever. In this it has succeeded brilliantly.

Running Chrome, mic is on, no one is home... sigh. I so wanted to be wowed.

~~~
tompagenet2
You need to click the little microphone icon - sorry if you've already tried
this.

~~~
51Cards
Interesting... no Mic icon here. Just a text box centered in the slide. Both
in Chrome 7.0.517.44 and Safari 5.0.2. The mystery deepens.

Edit: Aha! Update to 8.0.552 and presto... Mic icon. Very slick.

Edit again: "Hello" ---> Lowes. "Hello Mr. Webpage" ---> homeless services
"This is a test" ---> test

Probably a crappy notebook mic to blame. Ah well. Recognition on my Nexus One
is quite solid so I can't blame Google's algorithms.

------
bajsejohannes
It actually does a pretty good job at simple words and sentences. So, jumping
in the deep end, I tried "The reflected binary code was originally designed to
prevent spurious output from electromechanical switches". Can anyone get it to
recognize that? I did manage to get it to respond correctly to every word by
itself (sometimes only after a couple of tries), but not the whole thing.

(non-native speaker)

~~~
zain
Californian here, and it was really close on my first try:

 _the reflected binary code was originally designed to prevent spurious output
from electrode mechanical switches_

I'm kinda blown away. Here's an mp3 of what I sound like, if anyone is
curious: <http://cl.ly/3WDv>

~~~
bajsejohannes
Thanks for this. From now on Chrome will be the judge of how clearly and
accent free I speak :)

------
bemmu
Varies from poor to amazing.

"I have met Jesus, he was a nice guy" -> "ice melt cheese"

"hacker news is amazing" -> "hacker news"

"are you afraid of santa claus?" 100% correct

"if a woodchuck could chuck wood how much wood would a woodchuck chuck" 100%
correct

------
ImJasonH
Did anybody check out the slide before this, device orientation?
<http://slides.html5rocks.com/#slide23>

That's pretty awesome too, I could see this being great for mobile web apps,
especially games.

~~~
grigory
This worked great on my aging macbook. That's just amazing, I had no idea it
came with sensors to support such functionality.

~~~
ImJasonH
All Apple laptops since 2006 have had a sudden motion sensor to detect when
the laptop is falling and lock the hard drive platters in place to prevent
damage. Apple exposes this to software basically just because they can, AFAIK.
And to support cool features like this.

<http://en.wikipedia.org/wiki/Sudden_Motion_Sensor>

~~~
mvelie
It appears the new MacBook Air no longer includes the motion sensor. The sensor
was originally designed to protect the hard drive, and the new Air doesn't have
a hard drive.

------
gilaniali
How are they accessing my laptop mic? Is it the Google voice plugin?

Shouldn't the browser ask for permission before allowing access?

~~~
gojomo
They're treating a click on the microphone icon as permission. (You did have to
click the icon, didn't you?)

~~~
johnswamps
Hm, I wonder if it's susceptible to clickjacking

~~~
nkassis
Good point.

This could easily be used for spying.

Next up wikileaks will use this when government IPs are discovered on the site
;p

------
colanderman
Why does this have anything to do with HTML5 -- isn't it up to the UA to
determine how best to accept form input? Specifying in the form that a
particular field is a "voice recognition" field seems to be encoding
presentation details in what should be structure.

I can understand that it's important to mark a particular form field as more
"important" than others (and thus more likely that a user would like to use
their voice to input text to it), but wouldn't this be better served by
semantic markup declaring the field as a "primary" field or some such?

------
varenc
Is there a speech recognition engine built in to chrome this is leveraging?

~~~
yanw
_Speech for HTML Input Elements_
<https://docs.google.com/View?id=dcfg79pz_5dhnp23f5>

~~~
varenc
That does not specify where the recognition is happening.

------
mdxch
I whipped up a Chrome extension for voice search if anyone is interested:

<http://dl.dropbox.com/u/1047706/VoiceSearch.crx>

<https://github.com/raneath/chrome-voice-search>

------
ImJasonH
Aw, curse words are censored? That's pretty ####### lame.

~~~
bajsejohannes
Indeed. I hope they did it to not accidentally offend people by falsely
recognizing a swear word. If so, they would probably be better off taking the
non-swear neighbor word.

------
Herring
I thought speech recognition was still very inaccurate & hasn't improved much
in the last 5-10 years. Has it suddenly become usable?

~~~
rbarooah
Not for me in this case. Every sentence I tried was mangled in the traditional
way:

Once upon a time in america -> ants on a time in america

The owl and the pussycat went to sea in a beautiful pea green boat. -> the owl
and pussycat when to see in a beautiful p cream

Google is not evil -> google is evil

I'm not joking about that last one.

~~~
cypherdog
I tried your sentence "The owl and the pussycat went to sea in a beautiful pea
green boat" and got "seattle hookers gatwick to see a beautiful pizza ri
boat". You win.

~~~
rbarooah
Actually I think you win!

------
GeneralMaximus
This is exactly like the speech recognition on Android. It works brilliantly
with short phrases that also happen to be popular searches on Google (or
Google Voice Search) but fails at longer or obscure sentences. It's all about
the data, baby.

I use Voice Search heavily on my Desire, but I prefer to type out my
communications because of this exact limitation.

------
ugh
That is awesome, works even for German without a problem. I couldn’t get it to
recognize an English sentence properly (which probably only means that my
English pronunciation is horrible). I’m wondering, however, how they manage to
recognize the language in the three word sentences I tried.

~~~
27182818284
A lot of it is statistical inference. I've run into weird glitches where it
chooses the completely wrong word that still fits. For example I used a
sentence that ended with "cool!" but it transcribed it to "excellent!"

Obviously, "excellent" sounds nothing like "cool" but the sentence still
worked because it was using the neighboring words to try and guess what should
go there.
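
A toy sketch of that kind of context-based guessing, in JavaScript. Every
number below is made up for illustration, and `pickWord` is a hypothetical
helper, not anything from Google's recognizer. It only shows how a strong
context score can outvote a better acoustic match.

```javascript
// Made-up acoustic scores: how well each candidate matches the audio.
const acoustic = { cool: 0.6, excellent: 0.4 };

// Made-up bigram scores: how likely each candidate is after the previous word.
const bigram = {
  "totally cool": 0.1,
  "totally excellent": 0.5,
};

// Multiply the two scores and keep the best-scoring candidate.
function pickWord(previousWord, candidates) {
  let best = null;
  let bestScore = -Infinity;
  for (const word of candidates) {
    // Small floor so unseen word pairs are unlikely rather than impossible.
    const context = bigram[previousWord + " " + word] || 0.01;
    const score = acoustic[word] * context;
    if (score > bestScore) {
      best = word;
      bestScore = score;
    }
  }
  return best;
}

// Context wins: 0.4 * 0.5 beats 0.6 * 0.1.
console.log(pickWord("totally", ["cool", "excellent"])); // prints "excellent"

// With no context data, the acoustic score decides.
console.log(pickWord("very", ["cool", "excellent"])); // prints "cool"
```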

~~~
ugh
If speech recognition can with some accuracy identify the language of speech
after only three words it already exceeds my own capabilities. Whenever I
truly don’t know which language someone is going to talk to me in, I nearly always
need more than three words to orient myself. That’s why I’m so impressed.

------
wildmXranat
It was rather good, but not even close to something I'd rely on for anything practical.
It felt a bit like this <http://www.youtube.com/watch?v=5FFRoYhTJQQ>

------
codejoust
What version of Chrome does this work on? Either I'm missing something or I'm on
an older version of Chromium: Chromium 5.0.375.127 (Developer Build 55887)
Ubuntu 10.04.

~~~
jorlow
I believe it's available in later 7 builds and all dev channel builds after
that. Note that it is not yet available in the beta or stable channel builds
(since we're still perfecting it).

Also, I really recommend you upgrade your Chromium version! I believe security
updates are only back-ported to the current stable release, which means you
haven't gotten any such updates for a while! (Stable is now at 8.)

------
zmanian
Two things I would want upon seeing this.

1\. Chrome extension to use speech recognition in every text box.

2\. Speech recognition inside the google apps: Gmail, etc.

~~~
dannytatom
<https://github.com/dannytatom/speak-it>

Right now it turns any text input into a speech input, but I might change that
later. Or, at least, have an option to disable it on certain sites (have never
created a chrome extension, no clue how long that'd take).

~~~
Sephr
<input x-webkit-speech> works for me, so you may want to just apply the
attribute to all input elements regardless of type (and textarea elements for
future support). I suspect that support for voice input on input types such as
date may also be added eventually.

Edit: More at the HTML5 speech input proposal at
<https://docs.google.com/View?id=dcfg79pz_5dhnp23f5#y1f9> . It's apparent from
this that you should use the attribute on select elements too. I can also
get x-webkit-speech working in current stable Chrome with an input type of
speech.

~~~
dannytatom
Thanks for the link, I've gone ahead and made it work on any input field and
text area that isn't in the not allowed list.

~~~
Sephr
Don't forget select elements too. An easier way for you to do your whole
extension could be to use an XPath expression or
`document.querySelectorAll('textarea, select, input:not([type="' +
notAllowed.join('"]):not([type="') + '"])')` (results in `textarea, select,
input:not([type="checkbox"]):not([type="radio"]):not([type="file"]):not([type="submit"]):not([type="image"]):not([type="reset"]):not([type="button"])`).
Also, may I ask why you are abstracting Array.indexOf away and extending the
Array prototype with a non-standard method for such a simple problem?
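
A minimal sketch of both suggestions in JavaScript. The helper names
`buildSpeechSelector` and `isAllowedType` are hypothetical and not taken from
the extension's code. The `notAllowed` list is the one implied by the
resulting selector above. Building the selector with `map` and `join` also
avoids a stray `:not([type=""])` when the list is empty.

```javascript
// Input types that should not get speech input.
const notAllowed = ["checkbox", "radio", "file", "submit", "image", "reset", "button"];

// Build one selector: textarea, select, and every input whose type
// is not in the excluded list.
function buildSpeechSelector(excludedTypes) {
  const exclusions = excludedTypes
    .map((type) => ':not([type="' + type + '"])')
    .join("");
  return "textarea, select, input" + exclusions;
}

// The standard Array.prototype.indexOf is enough for a membership check,
// with no need to extend the Array prototype.
function isAllowedType(type) {
  return notAllowed.indexOf(type) === -1;
}

const selector = buildSpeechSelector(notAllowed);
console.log(selector);
console.log(isAllowedType("text"));  // prints true
console.log(isAllowedType("radio")); // prints false
```

In a page or content script, the result could then be passed to
`document.querySelectorAll(selector)` and each match given the attribute with
`setAttribute("x-webkit-speech", "")`.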

~~~
evanrmurphy
Your line of code is so long it's causing this Hacker News comments page to
have horizontal scrolling! I've never seen that before. :)

In case you didn't know, HN will format code if you prefix it with two spaces
like this:

    
    
      document.querySelectorAll('textarea, select, input:not([type="' + notAllowed.join('"]):not([type="') + '"])') (results in `textarea, select, input:not([type="checkbox"]):not([type="radio"]):not([type="file"]):not([type="submit"]):not([type="image"]):not([type="reset"]):not([type="button"])`)
    

It can preserve indentation too:

    
    
      document.querySelectorAll(
        'textarea, select, input:not([type="' 
        + notAllowed.join('"]):not([type="') 
        + '"])')

------
nowarninglabel
"Hack the planet" -> "Mayo clinic"

You win this round Google.

------
sandipagr
wow this is really good. It recognizes almost everything and I am not even a
native speaker.

