
Build a better spell checker, win $10,000 - willf
http://web-ngram.research.microsoft.com/spellerchallenge/
======
dstein

      However, by submitting your entry, you:
      are granting us an irrevocable, royalty-free, worldwide
      right and license 
    

I'll stop reading there. I can't honestly believe that competent programmers
would be willing to work for free for one of the world's largest multinational
software corporations, with the chance of being paid a relatively small
amount.

~~~
blahedo
What on earth? I know we all love a good MS pile-on, but if this contest
weren't being run by Microsoft, it'd be an academic shared task with results
to be written up and published in a workshop of some conference, probably ACL
or EMNLP (which will probably still happen, actually), for _no_ prize money
and yet with most or all of the authors open-sourcing their research code
anyway.

And this isn't even substantially different from putting together OSS to do
the job; even GPLed code lets MSR "use, review, assess, test, and otherwise
analyze" it.

EDIT: To be even clearer, the idea of a shared task, run a bit like a contest
(with a validation corpus and a secret test corpus, and a designated winner at
the end), is _really_ well established in the Natural Language Processing
community, and the usual expectation is that you publish at the end. When I
say in my first paragraph that this would "be an academic shared task", I'm not
speculating. The remarkable thing about the MSR contest is _not_ the
publication requirement, but that they're paying out money at all.

~~~
mseebach
Either that, or it might get built in one of these newfangled "start-ups".
Good luck buying, for $10,000, a company that has a spellchecker better than
Microsoft's.

~~~
chc
Good luck building a company around a spell-checker, no matter what level of
quality, in a world where most people already have Office.

~~~
derefr
A sufficiently-smart spell-checker would be indistinguishable from a
translation aid, sentiment analyzer, voice-recognition classifier, etc.
Basically, whatever tech is required to make spell-checkers better has other
uses than spell-checking, and _those_ might be valuable.

~~~
adrianN
A sufficiently-smart spell-checker would be indistinguishable from an
artificial intelligence. A company producing one of those things would make
billions, but not by selling a spell-checker.

~~~
derefr
I didn't mean it would be indistinguishable from _all_ of those things
(because it is an AGI and could act as any of those things if it wished), but
rather that it would likely have to incorporate one or more of those things to
spell better. I have a feeling sentiment analysis alone would be a pretty good
next step for spelling/grammar-checking.

------
vaksel
$10K? That probably wouldn't pay for a single programmer's efforts to do
this... Essentially you'll be working for Microsoft for free.

Hell, Microsoft, at least do what Netflix did and make the prize something
worth pursuing.

~~~
kgo
Almost as big of a ripoff as that company that gives you less than 20,000
bucks for a 2-10% stake in your company:

<http://tinyurl.com/yb74jsr>

Or maybe both of these deals have other less tangible benefits to the people
who end up winning.

~~~
asmosoinio
Please don't use tinyurl or other URL shorteners. People want to see the
actual link, and the posting software can handle long URLs nicely anyway.

~~~
27182818284
Though I normally would agree with you, Tinyurl was fundamental to the point
of this person's post. So much so that even before clicking the random Tinyurl
I thought to myself, "Nobody would tinyurl something in this context, it must
link back to Y-Com."

------
fredoliveira
I always find it a bit disingenuous when I see this kind of competition. I
quickly went through the official rules [1] and they're unclear about the true
motivations behind this offer. Sure, you get $10k if you develop a great spell
checking algorithm, and Microsoft claims no ownership over your
implementation. But then there are two clauses that I feel weird about:

* "are granting us an irrevocable, royalty-free, worldwide right and license to: (i) use, review, assess, test and otherwise analyze your entry and all its content in connection with this Contest; and (ii) feature your entry and all content in connection with the marketing, sale, or promotion of this Contest (including but not limited to internal and external sales meetings, conference presentations, tradeshows, and screen shots of the Contest entry in press releases) in all media (now known or later developed)"

* "understand and acknowledge that the Promotion Parties may have developed or commissioned materials similar or identical to your submission and you waive any claims you may have resulting from any similarities to your entry"

I'll admit that this kind of contest pokes my CS brain and that other people
will be at least curious enough about it to participate. But then you're
getting $10k whereas Microsoft would be getting a bunch more out of your work.
Am I wrong? Possibly. But my eyebrow moved when I read these pages.

[1] [http://web-ngram.research.microsoft.com/spellerchallenge/Doc...](http://web-ngram.research.microsoft.com/spellerchallenge/Docs/SpellerChallengeOfficialRules.pdf)

------
jodrellblank
_The Expected F1 (EF1) is the harmonic mean of expected Precision and
Recall.[..] the Expected Percision is defined as..._ (Rules page)

Could at least have run the Spell Check Challenge pages through a spell check!
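For reference, the harmonic mean named in that rule is the usual F1 formula, F1 = 2PR / (P + R). The rules' own definitions of expected precision and recall are elided in the quote and are not reproduced here; this minimal sketch simply takes the two numbers as inputs.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 formula).

    Assumption: the inputs are already the contest's expected precision
    and expected recall; how those are computed is defined in the
    official rules and is not reproduced here.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean punishes imbalance: `harmonic_f1(1.0, 0.0)` is 0.0 rather than the 0.5 an arithmetic mean would give, so a speller cannot score well by maximizing only one of the two.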

------
maximveksler
> understand that we cannot control the incoming information you will disclose
to our representatives in the course of entering, or what our representatives
will remember about your entry. You also understand that we will not restrict
work assignments of representatives who have had access to your entry. By
entering this Contest, you agree that use of information in our
representatives’ unaided memories in the development or deployment of our
products or services does not create liability for us under this agreement or
copyright or trade secret law;

Come on... What a fucked-up plan this is. Let someone work for free, then let
your whole engineering team "review" it... so oops, sorry if we remembered
your algorithm. We never claimed we wouldn't.

I can't believe anyone will be willing to participate in this. This is
daylight robbery.

~~~
chc
That is almost certainly not what this is about. It's a common accusation
in lawsuits, so Microsoft is just covering its ass. This is the same reason
a lot of Hollywood studios will return spec scripts unopened: if they
even open a package from somebody who hasn't signed an agreement like the one
MS presents here, they're in danger of getting sued for big bucks.

------
colinsidoti
I did a proof of concept for a better spell checker about 3 years ago. Query
groups of two to three words in a search engine and look at the result count.
Then replace the word in question with other words that are similarly spelled
and run a query with each. The word with the highest result count is extremely
likely to be the correct word. Really, it's surprising how accurate it is.
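A rough sketch of the approach described above, under assumptions not taken from the original: `hit_count` is a hypothetical stand-in for a search engine's result count (backed here by a small made-up table of phrase counts), "similarly spelled" is taken to mean within Levenshtein distance 2 of a candidate vocabulary, and the query is the suspect word plus up to two preceding words. The original proof of concept may have generated and ranked candidates differently.

```python
from typing import Callable, Iterable


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest(words: list[str], index: int, vocabulary: Iterable[str],
            hit_count: Callable[[str], int], max_distance: int = 2,
            top_n: int = 5) -> list[str]:
    """Rank replacements for words[index] by the hit count of the phrase
    made of the candidate plus up to two preceding words."""
    suspect = words[index]
    candidates = {w for w in vocabulary if levenshtein(w, suspect) <= max_distance}
    candidates.add(suspect)  # keep the original: it may be a correct proper noun
    lo = max(0, index - 2)

    def phrase_with(candidate: str) -> str:
        window = words[lo:index] + [candidate]
        return " ".join(window)

    ranked = sorted(candidates, key=lambda c: hit_count(phrase_with(c)), reverse=True)
    return ranked[:top_n]


# Hypothetical phrase counts standing in for search-engine result counts.
FAKE_COUNTS = {"is an island": 9000, "is an inland": 300, "is an icland": 2}
VOCAB = ["island", "inland", "iceland", "icing"]

result = suggest(["iceland", "is", "an", "icland"], 3, VOCAB,
                 lambda phrase: FAKE_COUNTS.get(phrase, 0))
```

With these made-up counts, "island" ranks first because "is an island" has by far the most hits, while "icing" never becomes a candidate because it is three edits away from "icland".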

The glory of this is that it works with proper nouns that don't occur in
dictionaries (e.g., xkcd). In Google's initial demo for Wave, they showed
"Icland is an icland" being corrected to "Iceland is an island." I'm fairly
confident they took a similar approach. There's also a good chance it could
work for other languages, because it doesn't use anything specific to English.

The disappointing part is that most of the accomplishment comes in "Suggestion
Intelligence First," meaning that from a list of 5, the top result is the
correct result. In most cases, the Suggestion Intelligence is just fine; you
will just need to pick the right one yourself.
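The distinction drawn here, between the correct word being ranked first and merely appearing somewhere in the list of 5, is easy to measure separately. A minimal sketch, assuming each test case is a ranked suggestion list paired with the known correct word (the cases below are made up for illustration):

```python
def top_k_accuracy(cases: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of cases whose correct word appears in the first k suggestions."""
    if not cases:
        return 0.0
    hits = sum(1 for suggestions, correct in cases if correct in suggestions[:k])
    return hits / len(cases)


# Made-up ranked suggestion lists paired with the intended word.
CASES = [
    (["island", "inland", "iceland"], "island"),
    (["their", "there", "they're"], "there"),
    (["form", "from", "farm"], "from"),
    (["spell", "spelt", "spill"], "spell"),
]

first_place = top_k_accuracy(CASES, 1)  # correct word ranked first
in_top_five = top_k_accuracy(CASES, 5)  # correct word anywhere in the top 5
```

Here every correct word is somewhere in its list, but only half are ranked first, which is the gap between "the suggestions are fine" and "the top suggestion is right".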

If anyone's interested, this was my presentation:
[http://soe.rutgers.edu/sites/default/files/gset/Presentation...](http://soe.rutgers.edu/sites/default/files/gset/Presentation08-Hungarian.pdf)

And this was the "research paper." Unfortunately it was a three-week program,
and the other coder and I (i.e., the ones who understood how the thing
worked) didn't contribute much to the paper. Feel free to ask if you have any
questions. The email in there isn't actually my email:
[http://soe.rutgers.edu/sites/default/files/gset/Paper08-Hung...](http://soe.rutgers.edu/sites/default/files/gset/Paper08-Hungarian.pdf)

------
xenophanes
> Q. Who is not eligible to compete in the challenge?

> Entrants who are younger than 18 years of age;

Ugh. They just ruled out a large portion of the hackers for whom 10k is a lot
of money.

~~~
chollida1
They probably didn't have much of a choice on this one, strictly for legal
reasons.

~~~
xenophanes
I don't know. You don't have to be 18 to win 10k at a chess tournament.

~~~
fletchowns
_irrevocable, royalty-free, worldwide right and license_

If words like that were used in said chess tournament, you probably would have
to be 18 to compete.

------
stonemetal
<http://norvig.com/spell-correct.html>

10K please.
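For anyone who hasn't read the linked essay: its corrector generates every string within one or two edits (deletion, transposition, replacement, insertion) of the input and returns the known word with the highest corpus frequency. Below is a condensed sketch of that idea; the tiny corpus is made up, whereas the essay counts words in a large text file.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Made-up corpus; a real corrector would count words from a large text.
WORDS = Counter("the the the spelling spelling correct correct checker".split())


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str) -> set[str]:
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words: set[str]) -> set[str]:
    """The subset of words that occur in the corpus."""
    return {w for w in words if w in WORDS}


def correction(word: str) -> str:
    """Prefer the word itself, then known words 1 edit away, then 2 edits
    away; among those, return the most frequent in the corpus."""
    candidates = known({word}) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORDS[w])
```

With this corpus, `correction("speling")` returns "spelling" and `correction("teh")` returns "the"; a word with no known neighbour within two edits is returned unchanged.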

~~~
milkshakes
Might as well link the paper: [http://www.scribd.com/doc/13863110/The-Unreasonable-Effectiv...](http://www.scribd.com/doc/13863110/The-Unreasonable-Effectiveness-of-Data)

------
ggruschow
I'll pay more with far less onerous terms. _Seriously._ I'll happily sponsor
grammar aid work too. Get in touch: <http://gruschow.org>

------
willf
I should note that I work for Microsoft, but not for Microsoft Research, and
of course, I don't speak for them.

~~~
astrodust
I hope you get paid more than a $10,000 prize for working at Microsoft.

------
rabblern
This has probably been thought of already, but it seems to me that a good
checker should look for typos arising from the proximity of certain letters on
the QWERTY keyboard.

In particular, typos that happen to be correct spellings of different,
unintended words (these pass undetected by existing checkers).
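One way to sketch that idea, under simplifying assumptions: a hand-derived map of adjacent keys on a staggered QWERTY layout (letters only; a real error model would more likely be learned from typing data), used to generate every single-key slip of a typed word and keep the ones that are themselves dictionary words. Those are exactly the real-word errors a plain dictionary lookup cannot flag, so a checker could hand them to a context model for a second look.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_adjacency(rows: list[str]) -> dict[str, set[str]]:
    """Map each letter to its neighbours on a staggered QWERTY layout.

    Simplifying assumption: each row is shifted right by half a key
    relative to the row above, so the key at (r, i) touches (r, i +/- 1),
    (r-1, i), (r-1, i+1), (r+1, i-1) and (r+1, i).
    """
    adj: dict[str, set[str]] = {c: set() for row in rows for c in row}
    for r, row in enumerate(rows):
        for i, c in enumerate(row):
            spots = [(r, i - 1), (r, i + 1), (r - 1, i), (r - 1, i + 1),
                     (r + 1, i - 1), (r + 1, i)]
            for rr, ii in spots:
                if 0 <= rr < len(rows) and 0 <= ii < len(rows[rr]):
                    adj[c].add(rows[rr][ii])
    return adj


ADJACENT = build_adjacency(ROWS)


def fat_finger_variants(word: str, dictionary: set[str]) -> set[str]:
    """Dictionary words reachable by replacing one letter with an adjacent key.

    If `word` is itself a dictionary word, these are candidate real-word
    errors that a dictionary-only checker would never report.
    """
    variants = set()
    for i, c in enumerate(word):
        for neighbour in ADJACENT.get(c, ()):
            candidate = word[:i] + neighbour + word[i + 1:]
            if candidate in dictionary:
                variants.add(candidate)
    return variants
```

For example, with a toy dictionary `{"cat", "vat", "cut", "car", "bat"}`, `fat_finger_variants("cat", ...)` yields "vat" (c and v are neighbours) and "car" (t and r are neighbours), but not "cut" or "bat", since neither change is a single adjacent-key slip.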

------
Pewpewarrows
Does this mean IE will finally get a built-in spell checker?

------
seanmcq
Try another 3 zeroes.

~~~
astrodust
Right on. Netflix forked out a million for their contest, and they have a
market cap that's puny compared to Microsoft's.

