
Shell scripts to improve your writing - jaybosamiya
http://matt.might.net/articles/shell-scripts-for-passive-voice-weasel-words-duplicates/
======
erlehmann_
There is also GNU style and GNU diction:
[https://www.gnu.org/software/diction/](https://www.gnu.org/software/diction/)

; cat <<EOF >/tmp/testfile

Diction and style are two old standard Unix commands. Diction identifies wordy
and commonly misused phrases. Style analyses surface characteristics of a
document, including sentence length and other readability measures.

These programs cannot help you structure a document well, but they can help to
avoid poor wording and compare the readability (not the understandability!) of
your documents with others. Both commands support English and German
documents.

EOF

; LANG=C diction --lang en --suggest /tmp/testfile

/tmp/testfile:2: These programs cannot help you structure a document well, but
[they -> (do not use as substitute for "each, each one, everybody, every one,
anybody, any one, somebody, some one")] [can -> (do not confuse with "may")]
help to avoid poor wording and [compare -> "Compare" to points out
resemblances, "compare with" points out differences.] the readability (not the
understandability!) of your documents with others.

3 phrases in 5 sentences found.

~~~
eriknstr
I like the way you format commands using semicolons at the beginning. I might
start doing this too when writing online.

------
empath75
Nobody should be using Strunk and White as a style guide after primary school.
There's nothing fundamentally wrong with passive voice, or adverbs, or any of
the other things that he mentions here.

[http://www.chronicle.com/article/50-Years-of-Stupid-Grammar/...](http://www.chronicle.com/article/50-Years-of-Stupid-Grammar/25497)

He doesn't even correctly identify what weasel words are (phrases such as "Some
people say" or "It is believed"). I'm not sure why 'very close match' is any
more opinionated than 'close match' is. It's not as if the latter is precisely
defined.

~~~
gkya
I really don't get what's wrong with passive voice at all. In my mother tongue
(Turkish) and my L3 (Italian), passive voice is part of educated speech.
Certainly it is harder to use than direct speech (in most cases), but why
neglect it?

~~~
schoen
I agree with the criticism of the criticism of the passive voice (that is, I
think it's often appropriate to use it, and a blanket prohibition is wrong).

A couple of ideas:

* There's an idea that passive voice is inappropriate because it "avoids responsibility" (for example, because it does not say who made a decision or performed an action, where that information might be important). An example could be when an organization says "your application was denied" (where it would somehow seem more honest or more relevant to say "we denied your application" or "the vice president denied your application"), or in a political context referring to violence without referring to the perpetrator of that violence -- "thirty people were killed" (by whom?).

However, critics have pointed out that these concerns don't correspond
perfectly to the active/passive voice distinction, among other things because
we can still state who was responsible when using the passive voice and
because we can still avoid stating who was responsible when using the active
voice. Also, sometimes clear or honest writing wouldn't need to assign
responsibility at every moment, in every context, or in every sentence.

* There's an idea that the passive voice is inappropriate because it sounds too formal and hence makes writing less accessible, less enjoyable to read, or lends the writing an unwarranted air of authority. The active voice may sound more direct or straightforward in many contexts, while the passive voice may sound unduly formal, abstract, or academic.

This is probably also true, but doesn't appear to justify a blanket
prohibition either.

* Edit: also compare [https://en.wikipedia.org/wiki/E-Prime](https://en.wikipedia.org/wiki/E-Prime), a writing style that tries to avoid using the copula (certain uses of the verb "to be"), based on the view that it makes philosophically unjustified observer-independent claims that could be made more precise by showing _whose perception or belief_ is being described (like "spinach is yucky" vs. "George H. W. Bush dislikes spinach" or "George H. W. Bush finds spinach yucky"). The copula isn't the same as the passive voice, but this is a (controversial) example of another way in which people have suggested constraining their writing in support of "taking responsibility" for certain propositions or observations.

~~~
gkya
A response for each of your points, not really for starting an argument, but
for stating my view of the matter.

1) The language allows you to emphasise either the object or the subject.
Either way, though, both a passive and an active version of a given sentence
can hide or tell some information equally. Compare:

The pizza was eaten.

Somebody ate the pizza. (The amount of info these sentences give is
practically equivalent.)

It's more about the author, whether or not he wants to tell something to the
reader.

2) Passive need not be formal in all its uses, nor does every text need be
formal and accessible, and a formal text is not necessarily inaccessible.

3) That's crazy nitpicking and a silly exaggeration. Copula-heavy text is
boring to read, but while the copula verb and the auxiliary verb for passive
forms are the same (to be), in its second role, it is not also acting as a
copula. The verb 'to be' is not the copula, but one of its uses is as the
copula. So if one wants to give up on copula, however mad that may be, he need
not give up on all the uses of the verb to be. That said, if it is an artistic
choice to avoid copula to the extent possible, I can't really criticise that.
It can't be affirmed as a general rule though.

Edit: Oh, also, in a sentence like "Bob seems terrible.", 'to seem' is
basically a copula. Copula is basically any verb that links the subject to a
predicative.

~~~
schoen
> A response for each of you points, not really for starting an argument, but
> for stating my view of the matter.

Thanks for responding.

> [...] while the copula verb and the auxiliary verb for passive forms are the
> same (to be), in its second role, it is not also acting as a copula. The
> verb 'to be' is not the copula, but one of its uses is as the copula. So if
> one wants to give up on copula, however mad that may be, he need not give up
> on all the uses of the verb to be.

I agree that E-Prime users should try to distinguish between copulative and
non-copulative uses of "to be" and that passives are non-copulative. By
mentioning E-Prime, I was just trying to draw an analogy with another way of
restricting language in the name of "taking responsibility".

> Edit: Oh, also, in a sentence like "Bob seems terrible.", 'to seem' is
> basically a copula. Copula is basically any verb that links the subject to a
> predicative.

According to E-Prime users, using "seems" instead of "is" could typically make
the scope and basis for disagreements clearer (because then you can talk more
readily about _to whom_ something seems a certain way?).

------
randomstring
I read the whole article thinking "this needs to be an emacs mode!" only to
get to the punchline at the bottom.

> Benjamin Beckwith has contributed a "writegood" mode for emacs inspired by
> these scripts.

This is going into my .emacs right now.

~~~
mortenlarsen
Funny coincidence that it was named "writegood" by a guy named Benjamin and
"Silence Dogood" was a pen name of Benjamin Franklin.

------
seanwilson
Is there anything like this for Google Docs, Gmail or Atom?

We've had spellcheckers for decades but I find it really surprising that
automated grammar and proofreading checkers aren't in common use yet. For
example, having my email client highlight overly long sentences, duplicate
words, ambiguous references (e.g. what noun does "it" refer to) and more would
undoubtedly save proofreading time and doesn't sound that difficult to
implement. Every few days I see an online comment pointing out that "loose" or
"lose" was used incorrectly, for instance.

I recall that many grammar checkers suffer from false positives, though. Has
the technology not advanced?
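
To give a sense of scale for the "overly long sentences" check, here is a minimal Python sketch. The function name `long_sentences` and the 30-word threshold are assumptions made for illustration, and the sentence splitter is deliberately naive: it also splits after abbreviations such as "e.g.", which hints at where the false positives come from.

```python
import re


def long_sentences(text, max_words=30):
    """Return (word_count, sentence) pairs for sentences longer than max_words.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    Abbreviations such as "e.g." will therefore be split incorrectly.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    sample = "Short one. " + " ".join(["word"] * 12) + ". Done!"
    for count, sentence in long_sentences(sample, max_words=10):
        print(f"{count} words: {sentence}")
```

Resolving what "it" refers to is a much harder problem than this and is not attempted here.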

~~~
confounded
VC funded browser extension:
[https://www.grammarly.com/](https://www.grammarly.com/)

~~~
jgalt212
These guys are huge YouTube advertisers. Or at least YouTube thinks my
writing skills could use some improvement.

------
carlosbarreto
Hello, I wrote a similar script to do the Belcher diagnostic test in LaTeX
documents. This test consists of highlighting parts of the text with potential
problems (e.g., vague pronouns, weak verbs, and passive voice, among others).

You can find the instructions to do the test (and some good examples) in the
book Writing your journal article in twelve weeks: A guide to academic
publishing success.

The script and its documentation are here:

[https://github.com/carlobar/BDT_latex](https://github.com/carlobar/BDT_latex)
[https://github.com/carlobar/BDT_latex/blob/master/docs/docum...](https://github.com/carlobar/BDT_latex/blob/master/docs/documentation.pdf)

------
mrob
You may also be interested in LanguageTool, which catches many potential
problems in several languages. It's the best Free Software style and grammar
checker I've seen:

[https://languagetool.org/](https://languagetool.org/)

------
killercup
Oh, that reminds me: I rewrote that bash script (and a bit more) as a Rust lib
(+ CLI) for fun last year. If anyone wants to build on it:
[https://github.com/killercup/english-lint](https://github.com/killercup/english-lint)

------
raverbashing
These look like good guidelines overall; I just have one complaint:

> Bad: We used various methods to isolate four samples.
> Better: We isolated four samples.

The first sentence is right if different methods were used to obtain different
samples. That is relevant information.

~~~
bryanrasmussen
I thought replacing 'quite difficult' with 'difficult' was unfair because I
have always supposed 'quite' to mean very or extremely when used in this manner.

~~~
hyperpape
Later he cautions against adding 'very', so he's consistent at least.

~~~
bryanrasmussen
What about 'slightly difficult'? It's basically a language denuded of gradation.

~~~
hyperpape
I think absolutism on this point is a bad idea. But I do think that many
people, including myself, overuse qualifications in contexts where they don't
add anything. In the example of a close match, what distinguishes a "very
close match" from a "close match"? Better to omit the "very" unless you can
quantify it, or otherwise make it clear what it adds.

------
sdegutis
So, on a related note, someone wrote a blog post a few years ago that just
perfectly epitomized the annoyingly pretentious writing style everyone in the
tech world seems to have, and I recommended to him that he use simpler phrases
and words and sentences. Everyone in that chatroom criticized me as both an
idiot and an asshole. Skip ahead a year or two, and PG writes the same fucking
thing in a blog post and posts it here, and everyone praises him. What I took
from this is that I'm not actually an idiot after all, and I should probably
stop listening to people who say that I am.

~~~
teach
First of all, this isn't written by Paul Graham; it was written by Matt Might
and posted by someone else entirely. Just because it got upvotes on Hacker News
doesn't mean that PG was involved.

Secondly, there is a difference in context, timing and tone in an article like
this. Here, a PhD supervisor is speaking generally about the sorts of errors
his students tend to make.

In your situation, you were "attacking" a single person publicly. If you had
had the social grace to make your comments _privately_ and in person (rather
than in a chatroom) they probably would have been better received.

If your takeaway from this post is just to "ignore the haters" then I'm afraid
you're missing out on some real opportunity for self-improvement.

~~~
bdowling
He was probably referring to this article.

[http://paulgraham.com/talk.html](http://paulgraham.com/talk.html)

~~~
sdegutis
Yep that's exactly it. And I had the same thought of "this is the last straw"
too that Paul's talking about there. I just got fed up with seeing everyone
write like that, _everywhere_. Thanks for the link.

------
ScottBurson
In the footnote, what is that comma after "Regehr" doing there? Delete it!

But yes, "note that" is a bugaboo I have to battle in my own writing. With
rare exceptions it can just be deleted; occasionally it indicates that the
following point deserves more emphasis than I have given it.

------
feld
I've kept these in my GitHub for a while now

[https://github.com/feld/technical-writing](https://github.com/feld/technical-writing)

------
emmelaich
[edit: I see @erlehmann_ beat me to it but am leaving it here anyway]

It would be great if someone could enhance the programs `style` and `diction`
[1] to incorporate these hints. And make a browser add-in for good measure!

I used `diction` religiously back when I used a UNIX System V system -- it
helped me a lot.

[1] [https://www.gnu.org/software/diction/](https://www.gnu.org/software/diction/)

------
BuuQu9hu
Another similar tool: [http://proselint.com/](http://proselint.com/)

------
5706906c06c
Love this! English is my third language; I often fall for using passive voice,
adverbs or fillers. Much like this script, I found Grammarly to be extremely
helpful in forcing me to rethink the above when composing.

------
danso
I do something similar to the OP. When I'm working on a long project with
multiple pieces that I might have left incomplete, I'll use a grep-like tool
(such as ack, which supports PCRE) to quickly look for placeholders or cusswords:

    
    
           ack -C -i 'tk|to ?do|lorem|fu.k|shit|wt[fh]|[!?.]{3,}'
    

Maybe the OP's goal is to have his PhD students practice more shell scripting and
syntax. But it seems the same effect could be achieved with grep and the flag
to filter from a file of patterns, rather than creating an unwieldy single
string to enumerate all the possible words. Instead of having to write if/else
logic to provide a lackluster CLI, have students create a repo of weasel words
and use git clone/curl with grep.
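
To make the pattern-file idea concrete, here is a rough Python sketch of what `grep -f` does for this use case: phrases live one per line in a plain text file, and the checker reports each hit with its line number. The function names and the file format (blank lines and `#` comments ignored) are assumptions for illustration, not anything taken from the article's scripts.

```python
import re
from pathlib import Path


def parse_patterns(text):
    """Return the phrases in text, one per line, skipping blanks and '#' comments."""
    phrases = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            phrases.append(line)
    return phrases


def load_patterns(path):
    """Read a phrase file from disk (hypothetical format, see parse_patterns)."""
    return parse_patterns(Path(path).read_text(encoding="utf-8"))


def find_matches(text, phrases):
    """Return (line_number, phrase) pairs for whole-word, case-insensitive hits."""
    if not phrases:
        return []
    # Longest phrases first, so a multi-word phrase wins over its first word.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(0).lower()))
    return hits


if __name__ == "__main__":
    words = parse_patterns("# weasel words\nvery\nquite\nvarious\n")
    draft = "We used various methods.\nIt is a very close match."
    for number, phrase in find_matches(draft, words):
        print(f"line {number}: {phrase}")
```

Keeping the word list in its own file means it can be shared and versioned separately from the checker.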

I didn't read through his third script for detecting duplicate words, but
couldn't it be achieved by using PCRE regex and backreferences?

[http://stackoverflow.com/questions/2823016/regular-expressio...](http://stackoverflow.com/questions/2823016/regular-expression-for-consecutive-duplicate-words)
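
As a sketch of the backreference approach (not the OP's script, which I haven't read closely), the following Python uses the `re` module rather than PCRE itself; the regex syntax used here is the same in both. The name `find_duplicates` is made up for illustration. Because `\s+` also matches newlines, a word repeated across a line break is caught too.

```python
import re

# (\w+) captures a word; (?:\s+\1\b)+ then requires that same word to
# repeat one or more times, separated only by whitespace (newlines included).
DUPLICATE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def find_duplicates(text):
    """Return (line_number, word) for each run of consecutive repeated words."""
    results = []
    for match in DUPLICATE.finditer(text):
        # Line number of the first word in the run (1-based).
        line = text.count("\n", 0, match.start()) + 1
        results.append((line, match.group(1).lower()))
    return results


if __name__ == "__main__":
    draft = "This is is a test.\nThe end of the\nthe line."
    for line, word in find_duplicates(draft):
        print(f"line {line}: repeated '{word}'")
```

Legitimate repeats such as "had had" are flagged as well, so the output still needs a human eye.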

Off-topic, but I've been meaning to write a post on how learning the command-
line made me a significantly more productive writer. I do most of my writing
on sites built from static site generators, such as Jekyll and Middleman and
Sphinx. For many of my tutorials, I have to describe graphical elements which
require taking screenshots.

I of course know the OSX keyboard shortcut to turn on the screen grab utility
and interactively make a selection. But this saves the screenshot to a default
location with a generic file name. To include that image in my blog, I have to
move it over to my working directory, rename it, and then write the img code
and src attribute to my blog post. It's enough annoying small steps that
including images in my posts was a huge chore.

Some time ago, this blog post on OS X Terminal Utilities [0] made it to HN's
front page and I learned that screencapture could be invoked from the
Terminal. So I wrote a little Ruby wrapper that, when invoked from the
command-line with an argument for output path, would call screencapture after
a 2-second delay -- enough time for me to Cmd-Tab from a Terminal to the
application I want to screensnap -- and then save the snap to the specified
destination and output HTML/Markdown that I could paste into my blogpost.

Sample usage:

    
    
          $ screenpy images/path/to/screenshot.jpg
    

stderr:

    
    
    Writing to: images/path/to/screenshot.jpg
        Format: jpeg
        quality: 75
        optimize: True
    ![image screenshot.jpg](images/path/to/screenshot.jpg)
    

stdout:

    
    
           <img src="images/path/to/screenshot.jpg" alt="screenshot.jpg">
    

I've iterated the tool, converting it to Python and including the AWS SDK so I
could upload to S3 if I need an absolute URL. And I've written plenty of other
utilities since...but it's hard to overstate how much being able to operate
via CLI has smoothed my writing experience. It's not just that it saves me
time, but I'll write visual-heavy posts that I would have never even tried,
especially back in my Wordpress days.

[0] [http://www.mitchchn.me/2014/os-x-terminal/](http://www.mitchchn.me/2014/os-x-terminal/)

[1]
[https://gist.github.com/dannguyen/bfb45408d43986eefdf83b59bc...](https://gist.github.com/dannguyen/bfb45408d43986eefdf83b59bc9e8629)
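
For anyone who wants the general shape of the wrapper without opening the gist, here is a stripped-down Python sketch. It is not the code in [1]: the function names are invented for illustration, it assumes macOS with the `screencapture` command on the PATH, and it leaves out image optimization and S3 upload. It follows the sample above in sending Markdown to stderr and HTML to stdout.

```python
import html
import os
import subprocess
import sys
import time


def markdown_snippet(path):
    """Markdown image tag using the file name as the alt text."""
    return f"![image {os.path.basename(path)}]({path})"


def html_snippet(path):
    """HTML img tag with the path and file name escaped for attribute use."""
    src = html.escape(path, quote=True)
    alt = html.escape(os.path.basename(path), quote=True)
    return f'<img src="{src}" alt="{alt}">'


def capture(path, delay=2.0):
    """Wait, grab the screen with macOS screencapture, then print both snippets."""
    print(f"Writing to: {path}", file=sys.stderr)
    time.sleep(delay)  # time to Cmd-Tab to the target application
    subprocess.run(["screencapture", path], check=True)
    print(markdown_snippet(path), file=sys.stderr)
    print(html_snippet(path))


# Does nothing when run without an output path argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    capture(sys.argv[1])
```

Splitting the snippets across stderr and stdout means the HTML can be piped straight to a clipboard tool while the Markdown stays visible in the terminal.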

~~~
ams6110
For Linux, the scrot utility will do that.

[https://en.wikipedia.org/wiki/Scrot](https://en.wikipedia.org/wiki/Scrot)

~~~
icebraining
They say it follows the UNIX philosophy, but that's clearly not right;
delaying is a job for sleep(1). xwd is the true UNIX-philosophy-abiding
screenshot tool :)

[https://en.wikipedia.org/wiki/Xwd](https://en.wikipedia.org/wiki/Xwd)

------
kazinator
> _My Ph.D. advisor, Olin Shivers, ..._

And you're proudly shell scripting. :)

------
plg
A fantastic book for improving your writing:

On Writing Well by William Zinsser

