

Writing Smell Detector (WSD) - a tool for finding problematic writing - john_horton
http://onlinelabor.blogspot.com/2012/02/word-smell-detector-wsd-tool-for.html?spref=tw

======
GertG
I like the idea, but have some remarks/questions:

1\. I don't understand some of the rules at all, possibly because I haven't
read "Style: Toward Clarity and Grace". In particular, the that/which rules
seem to detect any that or which after two words. Why?

2\. With just the rules that you've implemented and from your example output,
it's already very clear that this approach is too crude to be practical. It
simply gives too many false positives. For example, of the 5 "smells" found by
the a/an rule, 4 are false positives.

To fix this rule, and many others, in a satisfying manner, you'll need to go
beyond "knowledge-free" regular expression parsing and add linguistic
knowledge to the system (word lists, part-of-speech tagging, pronunciation,
syntactic parsing, etc.).

For one example where you'd really need not just annotated word lists but a
proper grammar, look at the false positive for the "No serial comma" rule: "In
traditional markets, buyers and sellers are responsible for making".

3\. I really like the fact that the rules are easily extensible. It might be
challenging to maintain that while allowing for the kind of linguistically
rich rules proposed above.

4\. Even in its current form, the system could do with some indication of
weight and/or certainty. Obviously, not all rules are equally important, nor
equally reliable. After adding linguistic knowledge, this would become all the
more important, since linguistic ambiguity makes many rules probabilistic by
definition.
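
To make points 2 and 4 concrete, here is a minimal sketch of an a/an rule that consults a small exception list instead of looking only at the first letter, and that attaches a weight and a certainty to each hit. Everything in it is an assumption for illustration: the word lists, the `Smell` record, the 0..1 scales and the specific numbers are hypothetical and are not taken from WSD's actual rule files.

```python
import re
from dataclasses import dataclass

# Hypothetical exception lists; a real system would use a pronunciation dictionary.
VOWEL_LETTER_CONSONANT_SOUND = {"university", "user", "one", "european", "unique"}
CONSONANT_LETTER_VOWEL_SOUND = {"hour", "honest", "heir", "honor"}
KNOWN_EXCEPTIONS = VOWEL_LETTER_CONSONANT_SOUND | CONSONANT_LETTER_VOWEL_SOUND


@dataclass
class Smell:
    rule: str
    text: str
    weight: float     # how much the rule matters (assumed scale 0..1)
    certainty: float  # how reliable this particular match is (assumed scale 0..1)


# An article followed by the next alphabetic word.
ARTICLE = re.compile(r"\b(a|an)\s+([A-Za-z]+)", re.IGNORECASE)


def starts_with_vowel_sound(word):
    """Guess the initial sound: exception lists first, first letter as fallback."""
    w = word.lower()
    if w in VOWEL_LETTER_CONSONANT_SOUND:
        return False
    if w in CONSONANT_LETTER_VOWEL_SOUND:
        return True
    return w[0] in "aeiou"


def check_articles(text):
    """Return a Smell for every a/an that disagrees with the guessed sound."""
    smells = []
    for m in ARTICLE.finditer(text):
        article, word = m.group(1).lower(), m.group(2)
        expected = "an" if starts_with_vowel_sound(word) else "a"
        if article != expected:
            # Matches backed by the exception list are trusted more than
            # matches that rely on the first-letter fallback.
            certainty = 0.9 if word.lower() in KNOWN_EXCEPTIONS else 0.6
            smells.append(Smell("a/an", m.group(0), weight=0.3, certainty=certainty))
    return smells
```

With this sketch, `check_articles("She met a user at an university for a hour.")` flags "an university" and "a hour" but leaves "a user" alone. Words missing from the lists still fall back to the crude first-letter guess, which is why those hits get a lower certainty.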

~~~
js2
that/which is probably there to highlight the dubious restrictive vs non-
restrictive "rule": <http://andromeda.rutgers.edu/~jlynch/Writing/t.html#that>

~~~
john_horton
yep - and I agree about the dubiousness (see my comment above).

------
lutusp
It looks like a nice tool, but I bet it can't detect the really difficult-to-
detect but important kinds of errors, like passive voice (e.g. "the grass was
eaten by the cow" -> "The cow ate the grass").

I look forward to the day when these programs become so clever that they cross
a threshold and begin to actually teach people how to write effective prose.
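
For what it's worth, a regex can catch the easy cases of passive voice. Below is a rough heuristic sketch, not something from WSD itself: it flags a form of "to be" followed by something that looks like a past participle. The participle list and the `find_passive` name are made up for illustration, and the list is far from complete.

```python
import re

BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
# Hypothetical short list of irregular participles; far from complete.
IRREGULAR = r"(?:eaten|written|taken|given|made|done|seen|known|found|built)"

# "to be" + optional -ly adverb + (-ed word or a listed irregular participle).
PASSIVE = re.compile(
    rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?(?:\w+ed|{IRREGULAR})\b",
    re.IGNORECASE,
)


def find_passive(text):
    """Return substrings that look like passive constructions (candidates only)."""
    return [m.group(0) for m in PASSIVE.finditer(text)]
```

It finds "was eaten" in "The grass was eaten by the cow." and nothing in "The cow ate the grass." It also flags "was tired" in "She was tired.", which is an adjective and not a passive, so without part-of-speech information the output is a list of candidates to look at, not a list of errors.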

~~~
jmah
That's not an error, it's a valid stylistic choice.

~~~
alttag
In scientific writing and inter-office memos the use of passive voice is
strongly discouraged by pretty much every style guide. <smirk>

Sure, it's a stylistic choice, but one best adopted by writers whose purpose
is entertainment, not conveying information. Active voice is nearly always
simpler and clearer.

~~~
sunchild
Anything to do with lawyers, too.

------
lazylland
The best way I've seen to get feedback on such issues is while you type. I
just saw the result list, and I'm a bit overwhelmed with the output.

I've resorted to writing the draft in MS Word where I get basic grammar checks
as I write.

------
Eventh
Did you release it under an open-source license? I couldn't find any license
on the GitHub repo.

Without a license we don't know how we are allowed to use and/or modify it.

~~~
john_horton
yeah - sorry. I'll add a GPL notice to the repository tomorrow.

~~~
andreasvc
In the betesnoire.json rule file "irregardless" is misspelled as "iregardless"
(funny that a non-word can also be misspelled...). Didn't feel like forking
for that one character, and unfortunately I couldn't comment via GitHub.

------
pepijndevos
It'd be fun to have this as a WordPress plugin.

