

Message threading (1997-2002) - tdonia
http://www.jwz.org/doc/threading.html

======
songgao
I was interning at Rackspace this summer. For a while we were working on an
email notification sender and we wanted email clients to group messages
related to the same event into a single conversation. However, different
"events" might have the same message subject, so we wanted to suggest how
messages should be grouped. So we looked into the "In-Reply-To" (RFC-822) and
"References" (RFC-2822) fields. We ended up implementing RFC-2822 since it
obsoletes RFC-822 and we figured that if we wanted our message grouping to work
on most email clients, the safest way was to use the up-to-date standard (2822).
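
A minimal sketch of what setting those headers can look like, using Python's standard `email` package. The helper name `build_notification`, the subjects, and the `example.invalid` domain are made up for illustration; this is not the code we actually shipped.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_notification(subject, body, parent_id=None, references=()):
    """Build a message; if parent_id is given, mark it as a reply in that thread."""
    msg = EmailMessage()
    # Hypothetical domain, used only to make the generated ID well-formed.
    msg["Message-ID"] = make_msgid(domain="example.invalid")
    msg["Subject"] = subject
    if parent_id is not None:
        # In-Reply-To names the direct parent; References lists the ancestor
        # chain, oldest first, ending with the direct parent.
        msg["In-Reply-To"] = parent_id
        msg["References"] = " ".join([*references, parent_id])
    msg.set_content(body)
    return msg


# The first notification for an event starts the thread...
root = build_notification("Server alert", "Event 42 opened")
# ...and later notifications for the same event point back at it.
follow_up = build_notification(
    "Server alert", "Event 42 resolved", parent_id=root["Message-ID"]
)
```

A client that honors these headers can then attach `follow_up` to `root` even when another event reuses the same subject.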

An interesting fact is that, among the three clients we tested, only mutt
faithfully implemented the standard. It honestly grouped all messages that
referenced the same ID under the same parent, regardless of subject or sending
time. However, neither Gmail nor Outlook respects the "References" field.

In Gmail, it seems the subject of the message plus [one of <time the message
was sent> and <References>] is used for grouping. But it certainly doesn't rely
exclusively on "References", since we got messages that referenced the same
parent message grouped into different conversations.

In Outlook, the "References" field is ignored completely. It relies only on
the subject of messages. We got messages for different "events", sent more
than 10 days apart, grouped into the same conversation.
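
To make the difference concrete, here is a rough sketch of the two grouping strategies side by side. The message tuples are invented, and the subject-only function is just my guess at the kind of behavior we observed, not how any real client is implemented.

```python
import re
from collections import defaultdict

# Hypothetical messages: (message_id, subject, references)
MESSAGES = [
    ("<a1@example.invalid>", "Disk alert", []),
    ("<a2@example.invalid>", "Re: Disk alert", ["<a1@example.invalid>"]),
    ("<b1@example.invalid>", "Disk alert", []),  # separate event, same subject
    ("<b2@example.invalid>", "Disk alert", ["<b1@example.invalid>"]),
]


def group_by_references(messages):
    """Group on the first (oldest) ID in References, else the message's own ID."""
    groups = defaultdict(list)
    for msg_id, _subject, refs in messages:
        groups[refs[0] if refs else msg_id].append(msg_id)
    return dict(groups)


def group_by_subject(messages):
    """Group on the lowercased subject with leading 'Re:' prefixes stripped."""
    groups = defaultdict(list)
    for msg_id, subject, _refs in messages:
        key = re.sub(r"^(\s*re\s*:\s*)+", "", subject, flags=re.IGNORECASE)
        groups[key.strip().lower()].append(msg_id)
    return dict(groups)
```

With this data, grouping by references keeps the two events apart, while grouping by subject merges all four messages into one conversation.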

~~~
jvehent
People who haven't taken the time to work with mutt sometimes fail to
understand how convenient and user friendly it is. Pretty colors and nice
interfaces do not always increase productivity. Sometimes displaying threads
properly is all that's needed.

Also, the "mark thread as read" feature in mutt (^R) is a life saver when
sorting through dozens of discussions.

~~~
ubernostrum
I have taken the time to try to work with Mutt, and some other console email
tools. I would love to be using something like that full-time, since I
practically live in terminal windows.

But ultimately I can't; I don't need pretty colors or graphical tricks. I _do_
need something which understands that it's not 1975 anymore. My mail no longer
lives in a local spool file, and there is no longer a local sendmail. I have
multiple email addresses, which use IMAP, and remote SMTP.

And none of the console clients -- including Mutt -- can really do that. Every
couple years I try Mutt again, and try whatever the current crop of attempted
successors are, and get thrown right back into 1975 again and give up in
frustration.

(and Mutt, last I checked, actually considers it both a feature and a point of
philosophical purity/pride to refuse to acknowledge the fact that anything
other than "shell out to local sendmail" exists. Also, IMAP was unbearably
slow, usually requiring a full re-fetch/re-index of the entire remote inbox,
potentially hundreds of thousands of messages, every time Mutt started up, and
multiple IMAP accounts were an unholy mess)

~~~
yrro
While I find working with multiple accounts awkward, I find that Mutt has no
problem submitting mail via SMTP; and once I enable message header & body
caching, IMAP accounts become perfectly fast to use.

~~~
danieldk
And if you have Gmail, you can patch it to do server-side search as well:

[http://people.spodhuis.org/phil.pennock/software/mutt-patche...](http://people.spodhuis.org/phil.pennock/software/mutt-patches/)

Still, I like Mail.app better, since it supports multiple accounts well and
has excellent search.

------
greenyoda
1\. I'm impressed by the amount of analysis and the clarity of thought that
went into designing this algorithm. It's not just something you can sit down
at the keyboard and pound out.

2\. This is a great example of the perils of re-writing code that you don't
completely understand:

_4.0 eliminated the "dummy thread parent" step, which is an absolute
necessity to get threading right in the case where you don't have every
message (e.g., because one has expired, or was never sent to you at all.) The
best explanation I was able to get from them for why they did this was, "it
looked ugly and I didn't understand why it was there."_
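
For anyone wondering why that step matters, here is a heavily simplified sketch of the idea, not the full algorithm from the article. It omits pruning, subject-based grouping, and the rule that re-parents a message to its last reference. The `Container` class and `build_threads` function are my own names.

```python
class Container:
    """A slot in the thread tree; has_message stays False for a placeholder."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.has_message = False
        self.parent = None
        self.children = []


def link(parent, child):
    """Attach child under parent unless it already has a parent or a loop would form."""
    if child.parent is not None:
        return
    node = parent
    while node is not None:  # walk up from parent; refuse to create a cycle
        if node is child:
            return
        node = node.parent
    child.parent = parent
    parent.children.append(child)


def build_threads(messages):
    """messages: iterable of (msg_id, references), references oldest first."""
    table = {}

    def get(msg_id):
        if msg_id not in table:
            table[msg_id] = Container(msg_id)
        return table[msg_id]

    for msg_id, references in messages:
        get(msg_id).has_message = True
        # Every referenced ID gets a container, even if we never saw that message.
        chain = [get(ref) for ref in references] + [get(msg_id)]
        for parent, child in zip(chain, chain[1:]):
            link(parent, child)
    return [c for c in table.values() if c.parent is None]
```

Given two replies to a message you never received, this yields one root placeholder with both replies as children. Without the placeholder, the two replies would show up as unrelated top-level messages.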

~~~
wrs
Re (2): This is also a great example of the perils of writing a complex
algorithm and failing to provide overview documentation (like this article!),
not just inline comments.

However, in the absence of an overview, if the code that you don't understand
was written by jwz, you _might_ want to study it very hard before removing it.
:)

~~~
enneff
Doesn't matter who wrote it. If the code works you'd damn well better
understand it completely before rewriting.

~~~
zmmmmm
> If the code works you'd damn well better understand it completely before
> rewriting

But then, one of the primary reasons that usually motivates a rewrite is that
nobody understands the old code. (sometimes acknowledged as such, and other
times indirectly in the form of "every time we try to fix a bug we break
something else, this code is terrible").

~~~
greenyoda
Re-writing code that nobody understands and hoping that it will work correctly
is just wishful thinking. If people really wanted to understand the code, they
could do the hard work of reading it, tracing it, writing tests for it, etc.
In many cases, you can transform the code into something readable by applying
a long series of simple refactorings.

For some code, breaking behavior you don't understand doesn't make a
difference. If you're Facebook, you can arbitrarily change the user interface
of your site and your users have no say in the matter, since they're not
paying customers. But other developers don't have it that easy. If you support
code that people have built their own applications on top of, you can't just
break stuff. If you make backward incompatible changes to the Linux kernel
APIs, or break Microsoft Excel so that macros that have been working for years
stop working, people all over the world will be very unhappy.

~~~
zimpenfish
If you have a well defined spec and a reasonable test suite, you can throw
away the code nobody understands and still have the replacement code work
correctly.

The chances of having a well defined spec and reasonable test suite in a place
where there's code nobody understands are left as a calculation exercise for
the reader (but I'd start at 5% and work downwards.)

~~~
greenyoda
It's not likely you'll ever find an accurate spec for a piece of legacy
software that's been around for years. Even if there was a spec that
completely defined the _original_ behavior (which is doubtful), that spec
probably won't reflect all the new features and other changes that were added
over the years. You'd have to merge the original product spec with all the new
feature and change specs and hope that you didn't miss anything. (In many
cases, it takes a bug report to realize that an item in the spec was defined
incorrectly, incompletely or ambiguously.)

In practice, I think that the only "spec" that's likely to capture the exact
behavior of the code as it exists today is the code itself.

Also, the lack of a complete spec probably implies a lack of a complete
acceptance test suite. Note that I said "acceptance test", not "unit test".
Unit tests from the original code are useless for testing the re-written code,
since they're specific to the particular implementation of the product you
already have, which may have a completely different set of classes from the
one you'll be replacing it with.
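
As a rough illustration of the kind of implementation-independent test I mean, here is a sketch that only compares observable behavior. Both functions are hypothetical stand-ins for a legacy routine and its rewrite; the point is that the check knows nothing about their internals.

```python
import re


def legacy_normalize(subject):
    """Stand-in for old code whose behavior must be preserved (hypothetical)."""
    while subject.lower().startswith("re:"):
        subject = subject[3:].lstrip()
    return subject


def rewritten_normalize(subject):
    """Stand-in for the replacement implementation (hypothetical)."""
    return re.sub(r"^(re:\s*)+", "", subject, flags=re.IGNORECASE)


def behavior_mismatches(old, new, inputs):
    """Return the inputs on which the two implementations disagree."""
    return [value for value in inputs if old(value) != new(value)]


# Inputs would ideally come from real traffic; these are invented.
SAMPLE_INPUTS = ["Re: hello", "RE: re: hello", "hello", "Re:hello", ""]
```

An empty result from `behavior_mismatches(legacy_normalize, rewritten_normalize, SAMPLE_INPUTS)` only shows agreement on the inputs you thought to try, which is exactly why the quality of the input set matters more than the test harness.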

~~~
zimpenfish
Indeed and I covered all that in my second paragraph.

------
gregschlom
I implemented jwz's algorithm for my now defunct email client
([http://betterinbox.com](http://betterinbox.com))

It was fun and worked extremely well, though it did give different results
than Gmail in some instances.

~~~
zura
It looks quite interesting, I wonder what went wrong.

~~~
gregschlom
Several things:

1\. Got tired after working on it for a year, mostly on my own. I felt the
need to join a team doing something bigger.

2\. Ran out of money

3\. Wanted to relocate to Silicon Valley, but as a foreigner it would have
been too complicated to move with my startup.

4\. The project was too ambitious. A full blown email client is _hard_ to
write.

5\. I was a Windows guy at that time, but all potential early adopters were OS
X users. Though we did have OS X support, the app wasn't as nice or polished as
it should have been.

It's interesting because we started working on this roughly at the same time
as the Sparrow team. In the end, they released way before us because they
focused on a narrower niche (simple gmail client for OS X, instead of the
cross-platform email + todo list manager that we were doing). They won :)

------
mfincham
For what it's worth, Balsa
([http://pawsa.fedorapeople.org/balsa/](http://pawsa.fedorapeople.org/balsa/))
implements this as a threading option.

Edit: pointed to correct URL

------
hendry
You could use Dovecot's "thread references" to produce an appropriate data
structure from a variety of mail stores.

See "Write a decent mailing list Web archive system" on
[http://suckless.org/project_ideas](http://suckless.org/project_ideas) for an
example.
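
The IMAP THREAD extension (RFC 5256) returns the threads as nested parentheses of message numbers. A small sketch of turning that into nested Python lists follows; the function name `parse_thread` is my own, and it assumes you have already decoded the reply to a string and stripped the leading `* THREAD`.

```python
import re


def parse_thread(response):
    """Parse the parenthesized body of an IMAP THREAD reply into nested lists."""
    stack = [[]]
    for token in re.findall(r"\(|\)|\d+", response):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in THREAD response")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(int(token))
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in THREAD response")
    return stack[0]
```

With Python's `imaplib`, the raw reply would come from something like `conn.thread("REFERENCES", "UTF-8", "ALL")` on a server that advertises `THREAD=REFERENCES`; I haven't shown the connection handling here.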

------
pestaa
Very insightful article. I do wonder though if "say no to databases" still
stands as of now. I agree that performance-wise files are hard to beat for
most problems, but we're storing data in databases because they provide
guarantees a filesystem doesn't, ease deployment and configuration, etc.

~~~
rogerbinns
The problem with Netscape 4 was that it introduced a database that was (in
theory) human readable, but poorly specified, buggy and inconsistent. If you
are going to change, then things should be better on at least one axis,
preferably with no regressions on the others.
[https://en.wikipedia.org/wiki/Mork_(file_format)](https://en.wikipedia.org/wiki/Mork_\(file_format\))

It is also worth pointing out this era predated SQLite.

~~~
fennecfoxen
See also: [http://www.jwz.org/hacks/mork.pl](http://www.jwz.org/hacks/mork.pl)

------
jbverschoor
I'm always frikking annoyed by Gmail, Mail.app and the Airmail app, because
they try to guess a thread.

Messages with the same subjects are not threads!

------
longlivedeath
I read the title as an epitaph.

------
taeric
Can we look forward to this coming to twitter soon? :)

~~~
martindale
Twitter has an explicit "in_reply_to" field which references the parent tweet
ID. Since they've seized control of most clients (and most clients implement
the correct parameters), this has wildly increased the coherence of Twitter
threads, at least in comparison to a few years back.

~~~
taeric
Apologies, I made a poorly directed joke at the new UI of twitter. I'm
honestly not against it, personally, but I know it has garnered the ire of
quite a few of my friends.

------
frozenport
It tickles my fancy thinking about an era when C++ was compared to C.

