
Ask HN: A tool for writing English that checks “popularity” of used sentences? - twa927
As a non-native English speaker I find that the best way to check grammar is to google whole parts of sentences (in apostrophes - exact match). It's because there are multiple exceptions to language rules and some wording can just feel "not right" despite being correct.

Is there a tool that does something like this automatically?

I thought about writing such a tool myself, but it seems there are no good-quality, free search engine APIs that allow many calls. Or maybe there are some open APIs to book dumps or something similar?
======
IanCal
You might like to check out Writefull:
[http://writefullapp.com/](http://writefullapp.com/)

~~~
pythonbull
Great app. Are you working on an Android version as well?

~~~
IanCal
Not my app, I'm afraid :) I just found out about it through my company.

------
antaviana
AFAIK, an ex-Googler had that very same itch and he founded
[http://www.linguee.com](http://www.linguee.com) to try to solve it.

~~~
rossng
I've found Linguee very useful for English -> French translation.

I think it draws heavily from the huge corpus of professionally translated EU
regulations and documents.

~~~
bbotond
Agreed, it has been extremely useful to me too for translating various
Hungarian technical terms into English. Naming classes and database tables is
much easier this way because most often the right term cannot be found in even
the most detailed technical dictionaries but Linguee somehow just knows it.
And it also shows the context so you can be very confident in your choice.

------
barryhunter
There are quite a few Ngram datasets available
[https://www.google.com/search?q=download+n-gram+dataset](https://www.google.com/search?q=download+n-gram+dataset)

... these are almost certainly used in many spelling and grammar checkers (to
help where a word with the same spelling is used in different contexts).

[http://www.aclweb.org/anthology/W12-0304](http://www.aclweb.org/anthology/W12-0304)

~~~
twa927
Yes, I remember trying to use the Google Books Ngram Dataset [1], but it was too
tedious for me to set up and maintain a server with the data for the purpose of a
quick-and-dirty tool (that's why I asked for a ready API). Still, using it is
probably a nice idea for a more ambitious side project or even a startup.

EDIT. Actually I would happily pay for a tool that implements the idea.
Grammarly has paid plans but $30/month is too steep (for my types of usage),
and the types of grammar checks it performs are not exactly what I need (which
is what real people in real situations use).

[1]
[http://storage.googleapis.com/books/ngrams/books/datasetsv2....](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)

~~~
plusepsilon
We (foxtype) actually have a dev tool that does exactly this.

If we publish it as an online tool do you think people will find it useful?

We have multiple corpora, some language models built in neural networks, etc.

------
aytekin
I wonder if there is a tool like this:

1\. You enter a sentence

2\. It gives out 5 different ways to say the exact same thing.

Such a tool would not only help ESL people but would also help native
speakers find more relaxed or formal versions of a sentence.

~~~
infinitone
We're building a tool that does something similar but for email. Currently
we're targeting cold sales emails- the idea is, you enter a recipient's email
and we aggregate data about them and surface relevant, personable sentences
that you can use in the email. You'll also be able to change the tone of these
sentences (funny, professional, casual, etc.)

Learn more: [http://emailfox.co](http://emailfox.co)

~~~
taneq
Something like this will only work if your clients don't screw it up by
spamming the same targets over and over.

If you can somehow get it through to your clients that they should only ever
spam one target once, then hell, I'm for it.

~~~
infinitone
Yup, agreed. I think over time we'll also try to compensate for this by building an
ML model from all the emails being sent to provide more 'fuzziness' to it.

------
infinitone
Check out [http://foxtype.com](http://foxtype.com) \- does some of that but
more grammar-like heuristics such as conciseness, complexity.

On a side note, I'm part of a team working on
[http://emailfox.co](http://emailfox.co) which will provide 'Smart Sentences'
for you when composing an email, based on a recipient. Allowing you to write
personal, relevant emails faster.

~~~
angry-hacker
People on mobile: looks like foxtype already is a product, it's a Chrome
extension.[0]

On mobile it just asks for your email so they can tell you when they launch (on
mobile?). Horrible to have landing pages like that. Absolutely useless product
page on mobile.

[0]
[https://chrome.google.com/webstore/detail/foxtype/npcfiblhbj...](https://chrome.google.com/webstore/detail/foxtype/npcfiblhbjecbjogpdhilnigldhjnfal)

------
rtrsqrrl
Try [http://www.netspeak.org/?locale=en](http://www.netspeak.org/?locale=en)
it seems to do some of the things you asked. It is implemented on top of
n-gram corpora.

~~~
twa927
It looks helpful, but I would like to paste a whole document and be told which
fragments look suspicious because of low popularity. The site requires putting
wildcards and completes only a single n-gram.
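
A minimal sketch of that "paste a document, get the suspicious fragments" idea, assuming the n-gram counts are already loaded into a dictionary. The `toy_counts` values, the function names, and the threshold are all made up for illustration; a real tool would load counts from a corpus such as the Google Books Ngram data.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into rough word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def flag_rare_ngrams(text, counts, n=3, threshold=1):
    """Return (n-gram, count) pairs whose corpus count is below the threshold.

    `counts` maps n-gram tuples to corpus frequencies; n-grams that are
    missing from it are treated as having a count of 0.
    """
    tokens = tokenize(text)
    flagged = []
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        count = counts.get(gram, 0)
        if count < threshold:
            flagged.append((" ".join(gram), count))
    return flagged

# Toy counts, invented for illustration only.
toy_counts = {
    ("he", "knows", "a"): 120,
    ("knows", "a", "lot"): 3500,
}

common = flag_rare_ngrams("He knows a lot", toy_counts, n=3, threshold=10)
odd = flag_rare_ngrams("He knows much lot", toy_counts, n=3, threshold=10)
```

With these toy counts nothing in the first sentence is flagged, while both trigrams of the second one are. The threshold is the part that would need tuning against a real corpus.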

~~~
rtrsqrrl
I did not get that from your question. This takes parts of a sentence (I think
at most 5-grams) and a few operators e.g.:

If you ask for similar words to 'much' in a fragment.

    
    
      'and knows ... #much ...'
    
      =>
      
      and knows a lot, 3.500, 65,2%
      and knows a lot about, 2.100, 39,5%
      and knows a great deal, 690, 12,6%
      and knows much, 630, 11,5%
      and knows lots, 380, 7,1%
      and knows lots of, 300, 5,5%
      and knows a good deal, 100, 1,9%
      and knows practically, 53, 1,0%
      and knows very much, 45, 0,8%

------
Xeoncross
You could probably use some of the Ngrams datasets to figure this out. Parse
some books from [https://www.gutenberg.org/](https://www.gutenberg.org/) or
use the google ngrams corpus. Pay attention to the year(s) which you wish to
model English from - grammar and form keep changing!
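
Building your own counts from plain text could look roughly like this. It is only a sketch: the tokenizer is deliberately crude, `count_ngrams` is a name invented here, and the sample sentence stands in for a book you would download yourself.

```python
import re
from collections import Counter

def count_ngrams(text, n=3):
    """Count every n-gram (as a tuple of lowercase words) in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Stand-in for the contents of a downloaded book, e.g.
# text = open("book.txt", encoding="utf-8").read()  # hypothetical local file
sample = "It was the best of times, it was the worst of times."
trigrams = count_ngrams(sample, n=3)
```

Counts built from public-domain books will mostly reflect older usage, which is where the point about years comes in.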

------
rebelde
I have been thinking of doing something like this (using Ngrams for grammar
check for non-natives) for a while. I would be happy to fund development if
you or somebody else is interested in working on it.

------
franciscop
From XKCD themselves, an editor that only allows for common words:
[https://xkcd.com/simplewriter/](https://xkcd.com/simplewriter/)

------
0xdeadbeefbabe
www.grammarly.com (haven't tried it though) In the demo they showed it turning
a sentence into a more colloquial sentence.

I'm a native English speaker, and I'd like to know appropriate punctuation for
a given combination of words. I'd like to search through a list.

~~~
vittore
Thank you, I especially like the macOS editor.

------
ChicagoBoy11
When I'm conflicted about different phrasings of things (for instance, whether
or not there is a hyphen when writing compound words), I usually
just use a Google search and go with whichever result has the most
hits. That could be a suitable enough proxy for your use case, and perhaps you
could just use the Google search service as an API...

Of course, the RIGHT way to do this would be to use the n-gram datasets that
people here have suggested :-)
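
Either way the comparison itself is tiny. A sketch, assuming some `hit_count` function that returns a number for a phrase (search hits, n-gram counts, whatever you have). The numbers below are invented and `most_popular` is just an illustrative name.

```python
def most_popular(variants, hit_count):
    """Return the variant with the highest count, plus all the counts."""
    counts = {variant: hit_count(variant) for variant in variants}
    best = max(counts, key=counts.get)
    return best, counts

# Invented numbers standing in for search hits or n-gram counts.
fake_hits = {"e-mail": 40, "email": 900, "e mail": 5}

best, counts = most_popular(
    list(fake_hits),
    lambda phrase: fake_hits.get(phrase, 0),
)
```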

------
mrtimuk
In FAQ: "Why does Google Books only provide feedback on 5 tokens or less?"

You mean "..feedback only for 5 tokens or FEWER?" Use your app! ;) //runs away

------
camelite
Something like this: [http://corpus.byu.edu/bnc/](http://corpus.byu.edu/bnc/)?

------
hendler
To improve the qualitative aspects of writing, in this case for job listings
primarily, check out [https://textio.com/](https://textio.com/). There's no
API, but I think it will help you think about what "popular" language means.

------
nl
What you want is a language model. This will give you the probability on a
word by word basis.

Something like [1] is pretty much state-of-the-art. It's worth noting that the
kind of writing you are doing changes the probability significantly. [2] shows
this quite well.

[1] [https://colinmorris.github.io/lm-sentences/#/billion_words](https://colinmorris.github.io/lm-sentences/#/billion_words)

[2] [https://colinmorris.github.io/lm-sentences/#/brown_romance](https://colinmorris.github.io/lm-sentences/#/brown_romance)
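
To make "probability on a word by word basis" concrete, here is a toy bigram model with add-one smoothing. This is far simpler than the neural models linked above and is not how they work. The function names and the three training sentences are invented for illustration.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase word tokens, with a start-of-sentence marker in front."""
    return ["<s>"] + re.findall(r"[a-z']+", sentence.lower())

def train(sentences):
    """Collect context counts, bigram counts and vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens[1:])        # real words only, not "<s>"
        contexts.update(tokens[:-1])    # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)

def word_probabilities(sentence, contexts, bigrams, vocab_size):
    """Estimate P(word | previous word) for each word, add-one smoothed."""
    tokens = tokenize(sentence)
    return [
        (word, (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    ]

# Tiny invented corpus; a real model needs vastly more text.
model = train(["he knows a lot", "she knows a lot", "he knows much"])
scores = word_probabilities("he knows lot", *model)
```

Here "lot" directly after "knows" gets a lower score than "knows" after "he", because that pair never occurs in the training sentences. Train on a different kind of text and the scores shift, which is the point [2] illustrates.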

------
adrianratnapala
Bah, if you have good reason to be confident that your sentence is correct
even if English speakers might feel it is wrong, then I say you should just
write it anyway.

I like to read such things because it makes me think about what is being said
and how the language works. If we always use "popular" patterns then our
writing becomes cliched and boring and people's eyes will glide right over it.

~~~
QuantumRoar
You have a point. But as a non-native speaker trying to learn a language, you
aim to become so fluent that people will not notice you're foreign. You want
to be able to play with the language.

A big part of learning a language is to become familiar with frequent speech
patterns and slang. A language is not a sterile set of words with attached
grammar but a slippery gelatinous blob that molds itself to the culture and
people. Spoken languages are quite lively. If I want to integrate myself and
joke around with natives, I need to learn to mold it the same way natives
do. In order to learn how to do that, you first have to start imitating.

~~~
adrianratnapala
Perhaps it also depends on your learning history.

Right now I live in Germany, and speak pretty ungrammatically, but from being
here I copy a lot of everyday idiom without really understanding it. So what I
would like is the opposite of what you are looking for: confidence that my
German sentences (especially written ones) are formally correct.

I don't mind if that makes me look like a well taught foreigner. Right now I
sound like a badly taught foreigner.

------
KayL
If you can read Chinese, there's an interesting tool:

[http://www.pigai.org/guest2016.html](http://www.pigai.org/guest2016.html)

It extracts common phrases from the sentences, with explanations and suggestions,
and counts their usage in a corpus.

------
ecesena
Never found it, but if you build it count me in as a user. Same issue, same
solution.

------
plusepsilon
Thanks for the mention above (foxtype.com).

We're currently building an online editor checks

~~~
plusepsilon
Oops, accidentally sent.

We're currently building an online editor that:

1\. checks for compatibility of words in a sentence (essentially popularity)

2\. gives example sentences for a certain word

3\. suggests words depending on context.

Language models would be a decent way to check popularity though it would be
noisy. Sentence level rewrites would be hard unless you make it template
driven.

------
hyperpallium
\incidental Use quotes (") for exact match, not apostrophes (').

------
0b01
[https://github.com/rickyhan/bodine](https://github.com/rickyhan/bodine)

This is a tiny tool I wrote a long time ago. There's also writefullapp.com
which is closed source.

------
kamillarott
I can suggest [http://samedaypapers.com/](http://samedaypapers.com/). It
always helps me)))

