

Google misspellings stats for "britney spears" - nilsjuenemann
http://www.google.com/jobs/britney.html

======
noahl
It seems really weird that every single one of these spells "spears"
correctly.

Is this supposed to be a representative list of misspellings? Or, given that
it's posted under /jobs, is it data for an interview exercise or something
like that? (The page doesn't say.)

~~~
aamar
As I mentioned below, it's used to illustrate that Google works on cool stuff
on a page called "Why Work at Google?".

But it's an interesting question: what kind of puzzle could use this as source
data? How about something along the following lines: Construct a simple
probabilistic program (or FSM, with probabilistically weighted transitions)
which outputs variants in proportion to the misspellings in the file.
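
A minimal sketch of the simplest version of that puzzle, in Python: a weighted choice over whole strings rather than the character-level FSM suggested above. The helper names (`parse_counts`, `make_sampler`) are made up for illustration, the `<count> <query>` line format is an assumption about how the file would be fed in, and the two sample rows are the only counts quoted elsewhere in this thread.

```python
import random


def parse_counts(text):
    """Parse lines of the form '<count> <query>' into (query, count) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        count, query = line.strip().split(maxsplit=1)
        pairs.append((query, int(count)))
    return pairs


def make_sampler(pairs, seed=None):
    """Return a function that draws queries in proportion to their counts."""
    rng = random.Random(seed)
    queries = [query for query, _ in pairs]
    weights = [count for _, count in pairs]

    def sample(k=1):
        return rng.choices(queries, weights=weights, k=k)

    return sample


# Only the two rows quoted in this thread; the real file has many more.
SAMPLE = """
1807 briney spears
2 grittney spears
"""

pairs = parse_counts(SAMPLE)
sample = make_sampler(pairs, seed=0)
draws = sample(10000)
```

A character-level model would be the more interesting answer, since it could also emit variants that are not in the file; this version can only replay the strings it was given.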

------
blauwbilgorgel
I remembered seeing these stats before. They are a few years old, right?
Google lists the publishing date for those stats as 1 April 2002. Seems like a
subset too.

I really enjoy using Google's spelling correction, but what would really blow
my mind is if I could one day search for:

    ntoymru d[rstd

and get results for "britney spears". It seldomly happens when I calibrate my
index fingers over the wrong keys.
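
For what it's worth, undoing a known one-key shift is mechanical. A rough Python sketch, assuming a US QWERTY layout and that every key was hit exactly one position to the right of the intended one; `ROWS`, `build_shift_table`, and `shift` are illustrative names, not anything Google is known to use:

```python
# Rows of a US QWERTY keyboard, left to right (an assumption about the layout).
ROWS = ["`1234567890-=", "qwertyuiop[]\\", "asdfghjkl;'", "zxcvbnm,./"]


def build_shift_table(offset):
    """Map each key to the key `offset` positions along the same row."""
    table = {}
    for row in ROWS:
        for i, ch in enumerate(row):
            j = i + offset
            if 0 <= j < len(row):
                table[ord(ch)] = row[j]
    return table


def shift(text, offset):
    """Shift every known key by `offset`; other characters pass through."""
    return text.lower().translate(build_shift_table(offset))


# Hands were one key too far right, so move every character one key left.
decoded = shift("ntoymru d[rstd", -1)
```

The hard part for a search engine is not the decoding but deciding when a query is gibberish worth decoding at all.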

~~~
Alterlife
> ntoymru d[rstd

> It seldomly happens when I calibrate my index fingers over the wrong keys.

I actually do that on purpose a lot... "unreadable password generator".

~~~
Leynos
The trouble is, I think a lot of password crackers now include dictionary
words offset on the keyboard in this way, so someone using a dictionary word
disguised like this is still at risk of having their password inferred from a
hash.

~~~
Alterlife
*@97oeh[5@9446Qg975%yq5-J6)qww294ewq43o9ht.

That's a substitution for: I wouldn't worry about that-my passwords are long.

For banking passwords, I use a substitution algorithm that's mathematically
based.

------
Jun8
Now, when I first saw this, I hypothesized that the number of mistakes would
go down roughly with edit distance. Not exactly true, see for yourself:

[https://docs.google.com/spreadsheet/ccc?key=0Ao73DTH98IRgdC1...](https://docs.google.com/spreadsheet/ccc?key=0Ao73DTH98IRgdC1BM1NrTDFJOWFaU0JWemRDeko2OFE&hl=en_US)

I used a Perl module
(<http://search.cpan.org/dist/Text-Levenshtein/Levenshtein.pm>) to calculate
the edit distances.
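
For anyone who wants to reproduce this without Perl, here is a small Python sketch of the textbook dynamic-programming Levenshtein distance (insertions, deletions, and substitutions all cost 1). It is a stand-in for the module linked above, not a port of it, and the function name is just a convenient label.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, computed row by row."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


d_briney = levenshtein("britney spears", "briney spears")
d_grittney = levenshtein("britney spears", "grittney spears")
```

On the two misspellings quoted elsewhere in this thread, "briney spears" is one edit away (a dropped "t") and "grittney spears" is two (b to g, plus an extra "t").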

~~~
d_r
Many of the differences aren't typos (classified by edit distance) but
sound-alikes.

~~~
Jun8
Updated the document with a column giving the Soundex code
(<http://perldoc.perl.org/Text/Soundex.html>) for just the first name (since
the last names are all the same). For the 593 entries, there are only 296
unique Soundex codes.
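
To show why so many spellings collapse into the same code, here is a Python sketch of the common American Soundex rules (keep the first letter, code the remaining consonants, skip repeats, pad to four characters). It is an approximation written for illustration and may differ from Text::Soundex in edge cases; `CODES` and `soundex` are just names chosen here.

```python
# Consonant groups and their Soundex digits; vowels, h, w, y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code, or '' if name has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


codes = {name: soundex(name) for name in ("britney", "brittany", "briney")}
```

Under these rules "britney" and "brittany" both come out as B635, while "briney" (no "t") gets B650, so Soundex groups some misspellings together that edit distance would keep apart.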

EDIT: BTW, the original form of the name is _Brittany_, so the top
misspelling technically is not one. This is one of the cases, like _babe_
converting to _baby_, where the misspelled and/or affectionate form became
dominant.

------
nixme
Not a single person misspelled "spears" as "speers"?

~~~
barumrho
By the time you've spelt britney wrong, I would think Google would have
completed the whole name.

~~~
brendano
I think they made this list before auto-suggest existed.

------
seagaia
Do they only have those stats for "britney spears"? (And if so...why?)

I looked (admittedly not very hard) to see if this was some kind of Google lab
where you query with some phrase and see the misspellings, but to no avail.
That would be interesting - anyone know links?

~~~
aamar
It's used as an illustrative example on the "Why Work at Google?" page:
<http://www.google.com/intl/en/jobs/swe/whygoogle/>

I thought at first it might be the source data for some kind of programming
puzzle, but searching "link:www.google.com/jobs/britney.html site:google.com"
did not immediately reveal it. Upon reflection, a puzzle of that sort seems a
bit uncharacteristic.

~~~
jleader
It's worth noting that the Wayback Machine shows the Britney misspelling page
existing almost unchanged since March 2002
([http://wayback.archive.org/web/*/http://www.google.com/jobs/...](http://wayback.archive.org/web/*/http://www.google.com/jobs/britney.html)).
The back link has varied a bit over the years.

------
josscrowcroft
That's interesting, but it might have been better if they'd discounted the
phrases where the user clicked "Show results for [mis-spelled query] instead".

Some of those look like legitimate queries..

~~~
leahculver
1807 briney spears

Probably just searching for a damn pickle.

~~~
tcarnell
Yeah, I agree! I think it's a bit bold to say that Google 'corrected' "brandi
spears" to "britney spears" - there are almost certainly numerous "brandi
spears"...

------
tcarnell
Actually, it seems these are only the spelling stats for misspelling 'britney'
- every single 'spears' is spelt correctly. I was expecting to at least see
'britney speares' listed.

------
jcr
I've always wondered if spelling could be improved through electrical shocks?

~~~
flarg
Don't overreact! Speaking for myself, I often horrify my sister when I Google
for something with a terrible misspelling because sometimes I am a lazy
typist, and Google fixes it for me like magic! If only real life were as easy
;)

~~~
yarone
I often mistype stuff because my fingers are offset on the keyboard! I try
to type "hello world" but it comes out as "y3oo9 294ie" or similar.

I always wondered how often folks do this and why Google doesn't detect it and
suggest "did you mean"?

~~~
jarin
Probably because that adds 6x more possibilities to the search space (and
that's assuming consistent transposition in a certain direction).
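
A rough Python sketch of where the 6x comes from, assuming an idealized staggered US QWERTY grid in which every key has six neighbours and the whole query is shifted consistently in one direction. The grid, the direction names, and the functions `shift` and `candidates` are all simplifications made up for this example.

```python
# Idealized grid: row r, column c. Each lower row sits half a key to the right.
ROWS = ["1234567890-=", "qwertyuiop[]", "asdfghjkl;'", "zxcvbnm,./"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}

# (row delta, column delta) for the six neighbours of a key on that grid.
DIRECTIONS = {
    "left": (0, -1), "right": (0, 1),
    "upper-left": (-1, 0), "upper-right": (-1, 1),
    "lower-left": (1, -1), "lower-right": (1, 0),
}


def shift(text, direction):
    """Move every known key one step in `direction`; leave the rest alone."""
    dr, dc = DIRECTIONS[direction]
    out = []
    for ch in text.lower():
        if ch in POS:
            r, c = POS[ch]
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < len(ROWS) and 0 <= c2 < len(ROWS[r2]):
                ch = ROWS[r2][c2]
        out.append(ch)
    return "".join(out)


def candidates(typed):
    """One candidate reading of `typed` per shift direction: six in total."""
    return {name: shift(typed, name) for name in DIRECTIONS}


mistyped = shift("hello world", "upper-left")
guesses = candidates(mistyped)
```

On this grid "hello world" typed one key up and to the left becomes "y3oo9 294oe", close to the string quoted above, and the "lower-right" candidate recovers it. Each query still needs six extra lookups to find out which candidate, if any, is a real phrase.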

------
mikemaccana
Isn't the name 'britney' itself a misspelling?

~~~
mikemaccana
Some background from WP:

"Brittany is a female given name of French and Latin origins, after Brittany,
a region of France...

Brittany is also derived from 'Britannia', a 2nd century Roman goddess."

------
cleverjake
what was the job link to that?

------
srehnborg
Really? - 2 grittney spears

~~~
joshu
g is above b on my keygoard.

------
leif
damn, I was hoping to see britney spores.

------
VBprogrammer
Oh come on, half those people are typing one handed.

Sorry.

