
One sentence per line, please - kibwen
http://rhodesmill.org/brandon/2012/one-sentence-per-line/
======
CJefferson
Or, you can use a diff command which understands word diffs. git has
'--word-diff'; svn unfortunately doesn't have such an option, but
<http://www.sable.mcgill.ca/~cpicke/swd/> provides a nice script (there are
others). Don't know about mercurial.

I hate working with authors who try to force line breaks into text
unnaturally. I have heard many justifications for it over the years, but I
find it hard to understand why anyone would do it, other than because their
tools (soft word wrap in the editor, word-based diffing) are terrible.

~~~
keithpeter
The original article did say "In the tutorial, I ask students whether or not
the Sphinx text files in their project will be read by end-users."

I certainly would not send a file with 'semantic linefeeds' to _anyone else_.
I use markdown quite a lot and LaTeX a little, so I'm happy with

    
    
      source file -> formatter -> file for reading
    

The humble fmt (Linux) deals with a file with 'semantic line feeds' and
produces reasonable paragraphs. I think I will try this out as a lot of text
editors have line oriented tools. It might be that the result of a lot of
rearranging that way will resemble a Burroughs cut-up, but I shall see.

I'm sure someone will come up with a regexp that can take a 'standard' text
file and split the lines on full stops and commas to restore an edited
readable text file to 'semantic linefeed' source, thus allowing round trip
copy editing.
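
A rough sketch of what such a regexp might look like, in Python. The function name `to_semantic_linefeeds` is made up for illustration, and the rule (break after a full stop, comma, `!` or `?` followed by whitespace) is an assumption, not the article's method. It will split wrongly on abbreviations such as "e.g. this".

```python
import re

def to_semantic_linefeeds(text):
    """Put each sentence or comma-delimited clause on its own line.

    Blank lines are treated as paragraph breaks and kept.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        flowed = " ".join(para.split())  # undo any existing hard wraps
        # Break wherever '.', '!', '?' or ',' is followed by whitespace.
        out.append(re.sub(r"(?<=[.!?,])\s+", "\n", flowed))
    return "\n\n".join(out)

sample = "I use markdown quite a lot, so I am happy. The humble fmt does the rest."
print(to_semantic_linefeeds(sample))
```

Going the other way (back to flowing paragraphs) is the job fmt already does.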

~~~
lutusp
> I'm sure someone will come up with a regexp that can take a 'standard' text
> file and split the lines on full stops and commas to restore an edited
> readable text file to 'semantic linefeed' source, thus allowing round trip
> copy editing.

Yes to the first, no to the second. The reason is that the act of recombining
lines into paragraphs makes the assumption that lines broken by single
linefeeds need to be merged into a paragraph. But in text, a list of items,
meant to be read as a list, is also broken by single line feeds, and _must
not_ be turned into a paragraph.

One often sees posts here by beginners that include a list of items, but the
rendered version assembles the list into a (typically unreadable) paragraph.
More experienced hands know to break the list up with double linefeeds to
defeat the "intelligent" reformatting algorithm.

The bottom line is that a recombining algorithm cannot distinguish a list of
items from a paragraph of individual lines. The act of breaking text into
lines loses information irretrievably.
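
To make the point concrete, here is a minimal sketch of a naive recombiner (the name `naive_reflow` is hypothetical). It treats every single linefeed as a soft wrap, which is right for the prose and wrong for the list, and nothing in the input tells it which is which.

```python
import re

def naive_reflow(text):
    """Join lines separated by single linefeeds; blank lines stay paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)

prose = "The quick brown fox\njumps over the lazy dog."
shopping = "eggs\nmilk\nbread"

print(naive_reflow(prose))     # The quick brown fox jumps over the lazy dog.
print(naive_reflow(shopping))  # eggs milk bread
```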

Allow me a prediction: All these conventions that break text into individual
lines and then try to reassemble them, i.e. this forum, the e-mail convention,
and a thousand other examples, will eventually be abandoned in favor of
leaving the text alone. This will happen when people realize they're throwing
away information that cannot be recovered.

When I wrote Apple Writer in the late 1970s, the first change I made to common
practice was to retain the paragraph structure people naturally used in entering
text (even though the displayed text was broken into lines on word
boundaries). At the time, this was a bigger departure than it is now, and it
helped make my program successful. But if I had been told then that people
would still be defending the practice of breaking text into individual lines
35 years later, I would have laughed out loud.

~~~
keithpeter
Point taken for plain text files, but I use markdown, where list items have
asterisks at the beginning of the line, so I have a pattern to distinguish list
items from flowing text.
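
Something like the following is what I have in mind, as a sketch only. The `LIST_ITEM` pattern and the function name are my own assumptions, and it only covers simple markdown bullets and numbered items. A wrapped continuation line of a list item would still be treated as ordinary text.

```python
import re

# Assumed marker pattern: '*', '+', '-' or '1.' followed by whitespace.
LIST_ITEM = re.compile(r"^\s*([*+-]|\d+\.)\s+")

def reflow_keeping_lists(text):
    """Join runs of ordinary lines into one line each, but leave list items alone."""
    out = []
    buffer = []

    def flush():
        if buffer:
            out.append(" ".join(buffer))
            buffer.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()
            out.append("")            # keep the paragraph break
        elif LIST_ITEM.match(line):
            flush()
            out.append(line.rstrip()) # list item: keep on its own line
        else:
            buffer.append(line.strip())
    flush()
    return "\n".join(out)

source = "Semantic linefeeds\nare fine.\n* eggs\n* milk\n\nDone."
print(reflow_keeping_lists(source))
```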

"But if I had been told then that people would still be defending the practice
of breaking text into individual lines 35 years later, I would have laughed
out loud."

Laughing is good for you! If I read you right, you invented the soft line
wrap? Excellent!

------
lutusp
> One sentence per line, please

I beg to differ: One paragraph per line, please. The natural lexical unit is
not the sentence, but the paragraph. These three sentences belong together as
a unit, and should be separated from other paragraphs by a double linefeed.

If someone wants to later break paragraphs into separate sentences for some
reason, it's child's play, and that option is implicit in this formatting. But
if someone wants to reassemble individual sentences into paragraphs, as anyone
knows who has tried to reassemble lines into paragraphs (as with an e-mail in
its delivered form), it's nearly impossible to get right.

Complete thoughts reside in paragraphs, groupings of sentences, not in the
sentences. One paragraph per line, please.

<http://en.wikipedia.org/wiki/Wikipedia:Dont_use_line_breaks>

A quote: "Do not use manually entered hard line breaks within paragraphs when
editing articles."

A long list of justifications follows in the article.

~~~
jmmcd
Diff and merge algorithms are line-oriented. If a whole paragraph is on one
line, and I edit one word, then the diff consists of the whole paragraph,
which is bad. Some tools are able to do word-by-word diffs, as mentioned
below. But none are able to merge correctly in the scenario where two branches
have each edited one word in the same line.
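
A small illustration of the first point, using Python's `difflib` as a stand-in for a line-oriented diff (an assumption: svn and git use their own implementations, and `changed_lines` is a made-up helper). The same one-word edit touches the whole paragraph when it is stored on one line, and a single sentence when it is stored one sentence per line.

```python
import difflib

def changed_lines(old, new):
    """Return (removed, added) lines according to a line-based comparison."""
    a, b = old.splitlines(), new.splitlines()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(a[i1:i2])
            added.extend(b[j1:j2])
    return removed, added

one_line_old = "Diffs are line-oriented. I edited one word. The rest is unchanged."
one_line_new = "Diffs are line-oriented. I changed one word. The rest is unchanged."

# Same text, stored one sentence per line.
per_sentence_old = one_line_old.replace(". ", ".\n")
per_sentence_new = one_line_new.replace(". ", ".\n")

print(changed_lines(one_line_old, one_line_new))          # whole paragraph on both sides
print(changed_lines(per_sentence_old, per_sentence_new))  # only the edited sentence
```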

I rarely edit Wikipedia so I don't know what its diff algorithm does. But I'm
pretty sure it doesn't do any merging in any scenario, so that partly explains
why one clause per line would not help there. That's quite different from
Sphinx and TeX, which are usually stored in merge-capable version control. I
think all of Wikipedia's justifications are specific to the case of editing in
a text-box on the web and without diff/merge algorithms.

> if someone wants to reassemble individual sentences into paragraphs, as
> anyone knows who has tried to reassemble lines into paragraphs (as with an
> e-mail in its delivered form), it's nearly impossible to get right.

The post proposes a single line-break after each clause or sentence, and then
a double line-break after each paragraph. TeX and Sphinx do the right thing in
those cases.

~~~
bunderbunder
> Diff and merge algorithms are line-oriented.

No, diff and merge algorithms are symbol-oriented.

For reasons that are as historical as they are technical, most diff and merge
_programs_ choose to break documents up on a line level of granularity in
order to produce the symbols that are passed into the algorithm. But that's a
design decision, not a technical one.

A diff/merge program that operates at a word level of granularity should be
just as capable of handling two words edited on the same line as a
line-oriented diff program is of handling two lines edited in the same function.
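
As a sketch of that idea, here is a toy three-way merge that uses words rather than lines as its symbols. It is built on Python's `difflib.SequenceMatcher`. The helper names `edits` and `merge_words` are hypothetical, whitespace is normalised to single spaces, and this is not how svn or git merge. It only shows that the same approach works at a finer granularity.

```python
import difflib

def edits(base, other):
    """Changes from base to other as (start, end, replacement_words) over base."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def merge_words(base, ours, theirs):
    """Apply both sides' word edits to base; raise if they touch the same words."""
    b, o, t = base.split(), ours.split(), theirs.split()
    all_edits = sorted(edits(b, o) + edits(b, t), key=lambda e: (e[0], e[1]))
    merged, pos = [], 0
    for start, end, replacement in all_edits:
        if start < pos:
            raise ValueError("conflict: both sides changed the same words")
        merged.extend(b[pos:start])  # untouched words before this edit
        merged.extend(replacement)
        pos = end
    merged.extend(b[pos:])
    return " ".join(merged)

base = "The quick brown fox jumps over the lazy dog"
ours = "The fast brown fox jumps over the lazy dog"
theirs = "The quick brown fox jumps over the sleepy dog"
print(merge_words(base, ours, theirs))
```

Two branches editing different words on the same line merge cleanly here. Two branches editing the same word raise a conflict, just as two edits to the same line do in a line-oriented tool.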

> The post proposes a single line-break after each clause or sentence, and
> then a double line-break after each paragraph.

It's a lot easier to rewrite the typographical conventions a piece of software
conforms to than it is to rewrite the typographical conventions that millions
of humans grew up using. Teaching the diff/merge program to recognize that
CRLF isn't the only text boundary out there would achieve the same effect* at
much lower cost.

*I realize that abbreviations complicate it somewhat. I'd submit, though, that if a basic diff/merge program is being relied on too closely in a scenario where that actually causes any consequential problems then the real error might be between Mr. Diff User's keyboard and chair. For normal diff usage 'failed' symbol boundary determinations like that are fine, the same as how a traditional line-oriented diff program doesn't critically suffer from the way it would interpret me inserting a carriage return into a line of code.

~~~
jmmcd
> No, diff and merge algorithms are symbol-oriented.

Point taken, and as soon as someone puts word-by-word merging into svn or git,
I'll change my opinion.

> I'd submit, though, that if a basic diff/merge program is being relied on
> too closely in a scenario where that actually causes any consequential
> problems

[EDIT removed some response -- maybe "that" referred to a narrower scenario
than I thought and parent didn't intend any insult.]

------
cperciva
FreeBSD has a strict rule that each sentence should always begin a new line.
The reason isn't so much to simplify editing (not relevant with modern
editors) or to make diffs more compact (size hardly matters); rather, the
biggest reason is to make "svn blame" work better.

------
jmmcd
This is a topic that has often bothered me when collaborating with people on
writing LaTeX.

I use Emacs with auto-fill-mode and Meta-Q to fill paragraphs. This usually
works out ok in my own files because usually a re-fill of existing text only
affects a few lines.

When other people get involved, diffs and merges are ruined, as the article
says.

But the article's solution sounds like a lot of work. Do I have to manually
break lines that get longer than 80 characters (or whatever my limit is)? Am I
supposed to turn soft word wrap on?

I think the right solution is to rebind Meta-Q in Emacs to some magic command
that refuses to reflow any text which is reported as unmodified by the version
control, but does reflow new/modified text according to the article's rules,
and also imposes an 80-character limit.

Edit: <http://stackoverflow.com/questions/539984/how-do-i-get-emacs-to-fill-sentences-but-not-paragraphs>
has a lot of solutions for getting Emacs to fill according to the article's
suggestions.

~~~
duskwuff
> Do I have to manually break lines that get longer than 80 characters (or
> whatever my limit is)? Am I supposed to turn soft word wrap on?

In most situations, there should be _some_ sort of semantic break in your
sentences every 80 characters or less. (It may not be demarcated with a
comma.) If there isn't, you may want to consider reworking those sentences for
clarity.

~~~
mturmon
Yes. When writing LaTeX, I put line breaks in after phrases and clauses, as
well as at every sentence end. I'm following exactly Kernighan's advice from
old nroff documentation.

As a result, my text is very ragged right. I don't even notice it because I'm
concentrating on meaning, not form.

------
humdumb
I love the PWB! [line break] Sometimes the best stuff does not have the best
marketing.

Any chance the author can post a copy of the documentation his father had
saved for the Documenter's Work Bench?

So many UNIX utilities are line-based, paragraphs just complicate things.
[line break] Yet we still type in paragraphs.

You can take the above text and feed it through fmt (one of my favorite
utilities) and you get an opening paragraph with two sentences, a single line
paragraph, and a final paragraph with two sentences. You can control the line
length too. Want 40-column output for better readability? Easy, when using
fmt. But you need input that is single lines.
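
For anyone without fmt to hand, a rough approximation in Python using the standard `textwrap` module. The function name `fmt_like` is made up, and the real fmt does more (indentation handling, sentence spacing), so treat this as a sketch of the idea only.

```python
import re
import textwrap

def fmt_like(text, width=40):
    """Join each blank-line-separated paragraph and re-wrap it to `width` columns."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width, break_on_hyphens=False)
        for p in paragraphs
    )

source = (
    "I love the PWB!\n"
    "Sometimes the best stuff does not have the best marketing.\n"
    "\n"
    "So many UNIX utilities are line-based, paragraphs just complicate things.\n"
    "Yet we still type in paragraphs."
)
print(fmt_like(source, width=40))
```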

Have you ever run PDFs through pdftotext or pdftohtml and been frustrated by
the formatting? Line breaks from hell.

If documents were distributed in the format the author describes we could
convert them into PDFs and other pretty printing formats. But converting from
these "paragraphed" formats into readable plain text can be a real nuisance.

------
keithpeter
<http://vanemden.wordpress.com/2009/01/01/ventilated-prose/>

'Ventilated Prose' was a term used by Buckminster Fuller. The blog author
linked above is drafting rapidly, then adding line breaks after each
sentence/clause as an editing aid. He mentions Vi and the use of diff.

<http://lists.canonical.org/pipermail/kragen-discuss/2008-March/001086.html>

Found in the comments to the blog post linked above. Just parking these for
the inevitable return of this topic.

------
chj
Nice read. Didn't know that this style was documented by bwk.

------
tgb
I've been using semantic line breaks for a while for LaTeX since, using Vim,
this is the only editing style that is sane on default settings. But that
shouldn't necessarily be true; what can I do to make vim friendlier to work
with files that are one-paragraph-per-line or similarly formatted? I'm not
particularly concerned about version control, just editing.

------
lmm
I found the incredibly thin page unreadable (had to use Clearly). One sentence
per line makes sense for the "source" that you edit, but use something like
LaTeX or Markdown so that we don't have to read it in that form.

------
Camillo
This clown doesn't even know what a sentence is. In his first example, he has
1/5 sentences per line. Then he changes it to 1/7 sentences per line, i.e. he
moves _away_ from his supposed "one sentence per line" target. Yes, in the
text of the article he admits that maybe he was thinking about clauses, but I
have a zero tolerance policy towards objectively wrong titles.

And of course, the real solution to the problem of "fussing with the lines of
each paragraph so that they all end near the right margin" is to use a text
editor that soft-wraps lines. Yes, computers have recently become powerful
enough to make that possible while editing the document! Amazing.

~~~
jdp
The title is _One sentence per line_, but he clarifies to include clauses in
the body of the article. This also has nothing to do with where the margin
ends, but with editing text in developer formats that is later transformed to
end-user formats in a way that gels well with the tools of the Unix environment.
Your comment is unnecessarily venomous and not representative of the contents
of the article.

~~~
Camillo
Have you noticed how Github helpfully shows which words have changed inside a
line when looking at diffs? If we stop teaching people to bend over backwards
to accommodate 70s technology, maybe we'll have more young hackers fixing our
tools.

~~~
jdp
> Have you noticed how Github helpfully shows which words have changed inside
> a line when looking at diffs?

GitHub's web UI only does a line diff, which is not particularly helpful when
you change a word or two in a six-sentence paragraph. It's possible to do a
word diff locally, of course, with `git diff --word-diff`, but that's not the
general use case for a code host: right now most code is line-oriented, and the
UI suits it well. That is one of the reasons the article advocates formatting
thoughts with newlines: good portability. These input texts are closer to code
than prose, so why format them like prose? Splitting thoughts into units
digestible by your coding environment has the huge benefit of letting you work
with individual thoughts instead of individual paragraphs.

