
Ask HN: What's with the state of open-source dictionaries? - Bodell
I've been writing recently, using LibreOffice. I'm not very far in, but I've already had to add a fairly long list of words to the dictionary. I've also noticed that the browser highlights a lot of these as well. So my question is: what's going on with the state of these open-source dictionaries, and how do we fix it? Words listed below:

amongst
another's
cohabitate
conception
decontextualized
depersonalization
easeled
else’s
experientially
gamified
incompatibilist
incongruencies
oxymoronic
pathologies
replicable
tangly
telecom
unimpassioned
unmoving
untransmittable
======
Someone1234
Firefox's dictionary has the same issues/limitations, with most of the same
words missing. It is actually a legitimate weakness of the browser
(particularly compared to Chrome, which has a fantastic dictionary).

You could use something like Grammarly, but their privacy policy is
frankly shocking[0] (they take & store all text you ever write for as long as
they like, for any usage that they like, and record every site you visit, your
geographical location, device info, third-party cookies, etc).

> we may keep some of your Personal Data for as long as reasonably necessary
> for our legitimate business interests

#

> you may contact Grammarly to request deletion of your data. Grammarly will
> evaluate such requests on a case by case basis, pursuant to our legal
> obligations.

#

> Does Grammarly review User Content? to improve our algorithms as described
> in the User Content section of our Terms of Service.

#

> Does Grammarly share my Information? We need to do so in connection with a
> merger, acquisition, bankruptcy, reorganization, sale of some or all of our
> assets or stock, public offering of securities, or steps in consideration of
> such activities (e.g., due diligence). In these cases some or all of your
> Personal Data may be shared with or transferred to another entity, subject
> to this Privacy Policy.

Etc etc etc etc.

[0] [https://www.grammarly.com/privacy-policy](https://www.grammarly.com/privacy-policy)

~~~
deskamess
Is there a Grammarly alternative? Their product is pretty useful.

~~~
thekyle
I use LanguageTool which is open source, although I don't know how bad their
privacy policy is compared to Grammarly.

It also has native LibreOffice and Google Docs extensions which I believe
Grammarly lacks. [https://languagetool.org/](https://languagetool.org/)

------
natmaka
AFAIK many dictionaries don't have all forms (past participles, possessive
"'s"-suffixed forms of all words...)

Debian Linux "wbritish" contains 99k+ words, but only 3 entries match the words
you listed (amongst, conception, conception's).

    $ dpkg -S /usr/share/dict/british-english
    wbritish: /usr/share/dict/british-english
    $ wc -l /usr/share/dict/british-english
    99156 /usr/share/dict/british-english
    $ egrep -we $(echo "amongst another's cohabitate conception decontextualized depersonalization easeled else’s experientially gamified incompatibilist incongruencies oxymoronic pathologies replicable tangly telecom unimpassioned unmoving untransmittable"|tr ' ' '|') /usr/share/dict/british-english
    amongst
    conception
    conception's
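
For anyone who wants to repeat this check against a different word list, here is a rough Python sketch of the same idea. The function names (`load_wordlist`, `split_known`) are made up for illustration, and it assumes a plain one-word-per-line file like the Debian one. Unlike `egrep -w`, it matches whole lines exactly, so "conception's" would only be reported if it were in the candidate list.

```python
from pathlib import Path


def load_wordlist(path):
    """Read a one-word-per-line list (assumed format) into a set."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def split_known(words, wordlist):
    """Partition words into (found, missing) by exact, case-sensitive match."""
    found = [w for w in words if w in wordlist]
    missing = [w for w in words if w not in wordlist]
    return found, missing


if __name__ == "__main__":
    candidates = "amongst another's cohabitate conception".split()
    # Path taken from the shell session above; it may not exist on your system.
    path = Path("/usr/share/dict/british-english")
    if path.exists():
        found, missing = split_known(candidates, load_wordlist(path))
        print("found:", found)
        print("missing:", missing)
```

Swapping in another file path is enough to compare dictionaries, as long as the file really is one word per line.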

------
sorryforthethro
I believe these use the FreeDict project
[https://github.com/freedict/fd-dictionaries](https://github.com/freedict/fd-dictionaries)
but I'm not sure how change requests flow downstream or upstream.

------
chrido
LanguageTool[1] is open source and can be used either with a subscription or
self-hosted. It works well in combination with a few GB of n-grams [2].
There is also a LibreOffice plugin and, most importantly, one for Emacs.

[1] [https://languagetool.org/](https://languagetool.org/)
[2] [https://languagetool.org/download/ngram-data/](https://languagetool.org/download/ngram-data/)

------
NikkiA
Some of this must be a US-dictionary-specific thing, because the GB dictionary
(in Firefox) isn't quite so awful[0]; not far off, but a bit better.

e: It appears that M-W lists amongst as a 'less common' variant of among, so
I'm guessing that whatever was the original source for the US dictionaries in
LO/OO/Firefox just took the most common words - it's possible that this was a
space optimisation that made sense in 2002 or something, but doesn't today.

[0] [https://i.imgur.com/5D0SDXH.png](https://i.imgur.com/5D0SDXH.png)

------
aibara
What English dictionary are you using? Using LibreOffice (US English) with no
custom words, 10 of the words on your list appear as correctly spelled
(amongst, another's, conception, depersonalization, else’s, experientially,
pathologies, tangly, unimpassioned, and unmoving).

But in general I do have problems with this, and I've added hundreds of words
to my custom dictionary.

------
Bodell
After downloading the dictionary from LibreOffice, it seems there are only
~53,000 entries. Compared to the OED's second edition of nearly 300,000
entries, I would say that is staggeringly small.
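
One caveat on counting: assuming the downloaded file is a Hunspell-style `.dic` (the format LibreOffice dictionaries normally ship in), each line is a stem, optionally followed by `/FLAGS` that refer to affix rules in a companion `.aff` file. A raw line count may therefore understate how many surface forms are accepted. The sketch below only counts stems and flagged stems; `summarize_dic` is a hypothetical helper, and it does not expand the affix rules or handle escaped slashes.

```python
def summarize_dic(lines):
    """Count stems in Hunspell-style .dic lines and how many carry affix flags.

    Simplification: any '/' is treated as the start of a flag field.
    """
    entries = [ln.strip() for ln in lines if ln.strip()]
    # By convention the first line of a .dic file is an entry count.
    if entries and entries[0].split()[0].isdigit():
        entries = entries[1:]
    flagged = sum(1 for e in entries if "/" in e)
    return len(entries), flagged


if __name__ == "__main__":
    sample = ["3", "cat/S", "dog/MS", "amongst"]  # made-up sample data
    stems, flagged = summarize_dic(sample)
    print(f"{stems} stems, {flagged} with affix flags")
```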

------
sansnomme
A more interesting question is, how do you do spellchecks lazily and
efficiently on arbitrarily large documents that are being edited in real time?
Rerun on the modified sentence? Paragraph?

~~~
saghm
Given that in most editors a cursor can only be in a single place at a given
time, you really only need to check the word where the cursor is (i.e. where
the most recent change happened) in real time to make it appear instant in
most use cases. I'd bet that if you pasted a large amount of text into the
average editor at once, the spell check would take a bit longer to "catch up".
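
A minimal sketch of the "only check the word under the cursor" idea, in Python. Everything here is an assumption for illustration: the text is a plain string, the cursor is a character offset, and `known_words` is a set of lowercase words. No real editor is claimed to work this way.

```python
APOSTROPHES = "'’"


def is_word_char(ch):
    """Letters and apostrophes count as part of a word."""
    return ch.isalpha() or ch in APOSTROPHES


def word_at(text, cursor):
    """Return (start, end, word) for the word touching the cursor, or None."""
    start = cursor
    while start > 0 and is_word_char(text[start - 1]):
        start -= 1
    end = cursor
    while end < len(text) and is_word_char(text[end]):
        end += 1
    # Trim quote marks that merely surround the word.
    while start < end and text[start] in APOSTROPHES:
        start += 1
    while end > start and text[end - 1] in APOSTROPHES:
        end -= 1
    if start == end:
        return None
    return start, end, text[start:end]


def check_at_cursor(text, cursor, known_words):
    """Return the (start, end, word) span to underline, or None if fine."""
    hit = word_at(text, cursor)
    if hit is None or hit[2].lower() in known_words:
        return None
    return hit
```

The scan only walks outward from the cursor, so the cost depends on the length of one word rather than the size of the document.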

~~~
Someone
Also, even if you paste a large amount of text, spell checking can likely be
done almost instantaneously.

Think of it:

- Whatever you use to indicate spelling errors should not affect text layout,
so you can lay out the text on the screen in a single thread while another
thread spell-checks it and adds markers where needed.

- You only have to spell-check what is on the screen, say 2,500 words max.

- Short words are likely spell-checked in about zero time, and longer words
fill the screen more rapidly, so you can have fewer of them.

- Modern CPUs are crazily fast.

I suspect sansnomme’s question is about grammar-checking/more advanced
linguistic analysis, though.

For that, “you only have to check what’s on the screen” and “modern CPUs are
crazily fast” still apply.
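
To make the points above concrete, here is a hedged Python sketch that checks only the visible slice of a document on a worker thread and returns marker offsets without touching the text. The names (`misspelled_spans`, `view_start`, `view_end`) are invented for this example, and it assumes the visible window starts and ends on whitespace (for example at line boundaries), so no word is cut in half.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Runs of letters, optionally joined by an apostrophe (e.g. "another's").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def misspelled_spans(text, view_start, view_end, known_words):
    """Return (start, end) offsets of unknown words inside the visible window.

    Offsets are relative to the full text, so markers can be drawn on top
    of an already laid-out document without changing its layout.
    """
    spans = []
    for m in WORD_RE.finditer(text, view_start, view_end):
        if m.group().lower() not in known_words:
            spans.append((m.start(), m.end()))
    return spans


if __name__ == "__main__":
    known = {"the", "cat", "sat"}  # toy dictionary for the demo
    text = "The cat szat. " * 1000  # large document, small window
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(misspelled_spans, text, 0, 14, known)
        # A UI thread would keep laying out text here while the check runs.
        print(future.result())
```

Only the window is scanned, so the work per refresh stays bounded by what fits on screen, however long the document is.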

