I'd love to see how good your model gets.
> Alex went to the kitchen to store the milk in the fridge.
Corrected:
> Alex went to the kitchen to the store the milk in the fridge.
Gathering a large, high-quality dataset from the internet is probably not so easy. A lot of the content on HN/Reddit/forums is of low quality grammatically and often written by non-native English speakers (such as myself). Movie dialogues don't necessarily consist of grammatically correct sentences like the ones you'd write in a letter. Perhaps there is some public-domain contemporary literature available that could be used instead of, or alongside, the dialogues?
EDIT:
Unrelated to this project, I have a general fear of language recommendation tools trained only on low-quality comments or emails. A simple thesaurus and a grammar checker are often enough to find the right words when writing. But a tool that could understand my intent and then propose restructured sentences, or alternative words that convey the same meaning, could be a true killer application.
> A lot of the content on the HN/Reddit/forums is of the quality grammatically and non-native written by written English speakers (such myself). as UNK
Yeah. It's got a long way to go. No idea where "as UNK" came from.
I'm working on this exact thing right now, FWIW, in my app called Prompts. I imagine machine-assisted apps are going to percolate up over the coming years (or months) in ways we haven't seen yet. It's pretty exciting, if we can get it right. Written language seems like a decent place to start.
"Alex went to kitchen to store the milk"
corrects to
"Alex went to the kitchen to store the milk"
Original: The complex houses married and single soldiers and their families.
Deep Text Corrector: The complex houses married and a single soldiers and their families.
OT: does anyone know of a more substantial list of garden path sentences [0] that people use for testing NLP software?
[0] https://en.wikipedia.org/wiki/Garden_path_sentence
(https://en.wikipedia.org/wiki/Wikipedia:Database_download)
EDIT: I see this has already been suggested, along with many other sources, in another comment by daveytea.
http://www.deepgrammar.com/evaluation
https://blogs.nvidia.com/blog/2016/03/04/deep-learning-fix-g...
How about books?
The language in many of those books may be a bit archaic, though.
See also Distributed Proofreaders: http://www.pgdp.net/c/
I'm going to assume that it was meant to be ironic (or errorful).
https://en.m.wikipedia.org/wiki/Corrector
Off-topic, but I've been waiting for the third and final book in the trilogy for a long time... I've come to the conclusion that Rothfuss can't find a way to tie all the plot threads together.
I'm also wondering if anyone else thinks Rothfuss looks like Longfellow in a lot of his publicity shots.
