
Cleaning up from an IMAP server failure - Aoyagi
http://blog.fastmail.fm/2014/03/13/cleaning-up-from-an-imap-server-failure/
======
sz4kerto
That's why I use Fastmail. For some reason, I feel that they're pretty clear
about what they are doing. I had a problem a few months ago, one of my Sieve
rules stopped working. The second reply for the support ticket already came
from a developer, and we figured it out quite quickly.

At Google, I felt that there's no human I could ask, at Microsoft
(Outlook.com) there're the incredibly simple, low-paid humanoid robots pasting
stupid 'answers' on answers.microsoft.com (things like I should try clearing
the browser cache, uninstall plugins, etc. when there's something wrong on
their side).

~~~
drfritznunkie
Same here. I've been slowly migrating all my mail off Gmail as it's
continually gotten slower and less reliable for me. Not to mention that FM
actually does backups, has a great webclient with nice logical keyboard
shortcuts, and is working on a CalDAV-compliant calendaring solution. Now if
they'd only develop a replacement for Google Voice...

And it needs to be mentioned, they're the gold standard for how a private
company interacts with and supports open source projects. I used to run a large
Cyrus installation, and their contributions to the project, as well as the time
and effort they put into supporting Cyrus users on the mailing list, are
exemplary and something I wish happened more often. I'm pretty sure the large
majority of their work on Cyrus for FM has been contributed directly back, and
they were instrumental (Bron's work in particular) in many of Cyrus' killer
features and its vastly improved and expanded feature set over the past
several significant releases.

If you're looking for a supremely talented and caring bunch of people to look
after your email I cannot recommend _anyone_ else. If you need proof, go read
the Cyrus mailing list archives where you can actually see how they work with
the community, improve the product and make excellent technical and tactical
decisions about how a large email service provider should be run.

~~~
616c
I was falling out of love with Gmail, and at that time I was reading their
site or a post regarding them on HN. Once I saw brong's Github branch as
mentioned, I was sold and got an account.

[https://github.com/brong/cyrus-imapd](https://github.com/brong/cyrus-imapd)

Their public communication, blog, old Slashdot interviews, everything else are
a gold standard to me. Bought a VPS to stand up my own backup Cyrus server.

By the way, Cyrus IMAP server has a beta CalDAV and CardDAV integration, where
your contacts and calendars are stored in special IMAP folders and then pulled
into the proper protocol. Through Fastmail I learned this might be the best
self-contained communication platform for me and others. Just a FYI.

[http://www.cyrusimap.org/mediawiki/index.php/Downloads#IMAP_...](http://www.cyrusimap.org/mediawiki/index.php/Downloads#IMAP_Server_with_integrated_calendaring_and_contacts)

~~~
drfritznunkie
Not that I thought being owned by Opera was a bad thing, but now they're
completely employee owned again:

[http://blog.fastmail.fm/2013/09/25/exciting-news-fastmail-st...](http://blog.fastmail.fm/2013/09/25/exciting-news-fastmail-staff-purchase-the-business-from-opera/)

So I have high hopes that they'll continue to improve all that core
functionality goodness.

The CardDAV and CalDAV stuff has been around for ages in Cyrus, but it's
always needed a lot of work. I'm happy that the FM team has decided to take it
on, even in beta it's been working wonderfully for me.

I remember attempting to use KOLAB groupware, which was based on Cyrus, but
instead of improving the crappy CalDAV functionality (at the time) of Cyrus,
they simply saved all the calendar objects as messages in special folders. Not
necessarily a bad thing, but then they made the unwise decision to store the
calendar data in a binary blob inside the message, meaning that they couldn't
leverage IMAP/Cyrus' native indexing. Which meant that every message in the
calendar folder had to be fetched, downloaded and parsed by the client for
every view. Ugh, what a disaster. It basically fell over on calendars with
more than a couple dozen events on them.

------
roeme
Points to take away from this:

    
    
      - Random corruption is a thing
      - Always make sure that your replication can't accidentally screw you:
       - Replication doesn't replace a separate backup system
       - A version of your backup data must become immutable at some point
      - Make sure you monitor the right metrics of your system.
    

And most importantly:

Avoid unnecessary noise in your monitoring channel(s).

I keep preaching this; people think they can keep on top of a noisy log, but
the ugly truth is that your brain becomes numb and you will miss things. At
the very least, you begin to tune out since "it's not that important".

edit:fmt

~~~
dspillett
You missed one from the list: Always make sure a recent backup has been tested
in a meaningful way.

At least have an automated restore scheduled, and have it report any errors to
you. Have that process run whatever verification tools your DB supports too,
as this will catch some corruption that won't be seen in a simple restore.
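
A minimal sketch of that kind of scheduled restore check, assuming the backup
is a single SQLite file. The function name `verify_backup` is made up for
illustration, and the "restore" here is just a file copy; other databases have
their own verification tools.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def verify_backup(backup_path):
    """Restore a SQLite backup to a scratch dir and run the DB's own checker.

    Returns a list of problem descriptions; an empty list means the restore
    and the integrity check both passed.
    """
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored.db"
        shutil.copy(backup_path, restored)  # stand-in for a real restore step
        try:
            conn = sqlite3.connect(restored)
            try:
                # SQLite's built-in verifier: one row "ok", or one row per problem.
                rows = conn.execute("PRAGMA integrity_check").fetchall()
            finally:
                conn.close()
        except sqlite3.DatabaseError as exc:
            return [f"restore failed: {exc}"]
        return [row[0] for row in rows if row[0] != "ok"]
```

Run it from cron (or whatever scheduler you use) and alert only when the
returned list is non-empty, so the check doesn't add noise of its own.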

Better still (though this is very app dependent so difficult to create general
rules for) do some actual data verification. If your app keeps an audit trail,
keep a copy of the last backup around and check that anything that should not
have changed between them (as there are no relevant audit entries) is still
identical.
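
Roughly, the comparison could look like the sketch below. It assumes each
backup can be loaded as a dict of record key to record contents, and that the
audit trail can be reduced to the set of keys touched between the two backups.
All the names here are hypothetical.

```python
_MISSING = object()  # marks a record that is absent from one of the backups


def unexplained_changes(previous, current, audited_keys):
    """Return keys that differ between two backups without an audit entry.

    previous / current: dict mapping record key -> record contents.
    audited_keys: keys the audit trail says were changed between the backups.
    """
    suspicious = []
    for key in sorted(previous.keys() | current.keys()):
        if key in audited_keys:
            continue  # this change is accounted for by the audit trail
        if previous.get(key, _MISSING) != current.get(key, _MISSING):
            suspicious.append(key)  # changed, added or vanished with no record
    return suspicious


old = {"msg1": "hello", "msg2": "world", "msg3": "draft"}
new = {"msg1": "hello", "msg2": "w0rld", "msg3": "final"}
print(unexplained_changes(old, new, audited_keys={"msg3"}))  # ['msg2']
```

Anything it returns is either silent corruption or a hole in the audit trail,
and both are worth knowing about before you need the backup.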

All this serves to increase the confidence that your backup will save you
if/when disaster strikes.

------
nly
Admitting publicly that they've lost customer mail, when they don't really
have to, seems pretty honourable. The fact that their mail servers are
encrypted is also mildly interesting, although I don't see how it offers
anything in terms of real protection for customer privacy.

~~~
gommm
It could offer some protection in case the servers are seized which is
important since their servers are stored in the US.

~~~
nly
The authorities are unlikely to just seize the servers of such a notable
business without warning. Even if they were to do so though, Lavabit and
Levison have shown us that US mail providers can be compelled to cough up
keys... and the nature of full disk encryption means doing so will inevitably
enable access to bystanding client mail. It's understandable why FDE is used,
and it's still slightly better than nothing, but per-mailbox encryption would
allow for more selective disclosure in such extraordinary circumstances.

------
ballard
Ran an email hosting service and used Zimbra. Email migration (through
acquiring failed shops that were tech clueless) and replication are nontrivial
because of edge cases. Backup, backup, backup and test a zillion times before
doing it for real. Rolling your own or using OSS as-is rarely scales.
Neither do commercial products.

------
otterpro
I would've recommended ZFS, as I found it more resistant to corruption than
any other file system that I've ever used. It's not certain where the problem
was (faulty CPU, faulty RAID card, faulty hard disk, faulty memory) but my
bet is that it was a bad hard disk / file system.

------
Zaephyr
A writeup like this makes me want to use them more. So many companies are
afraid of their mistakes.

------
balladeer
All said and done, a week(s) long outage is just not acceptable, unless it is
(for you).

~~~
brongondwana
It was a couple of weeks in which some emails weren't accessible - the account
was still usable for new emails and emails before Feb 26th. There was
definitely no week-long outage - the outage itself was (robn can correct me
here, since I wasn't present for it) about half an hour.

~~~
robn_fastmail
Actually exactly half an hour between when I got woken up and when I sent my
report email.

------
m1
> "I left imap21 up, but with nothing talking to it or trusting it any more."

Poor imap21.

~~~
brongondwana
You won't believe what happened next!

~~~
brongondwana
(er, I mean - what happened next will shock you, and then move you, and then
leave you feeling numb to future clickbait headlines)

------
mercurial
Excellent writeup. Makes me wish I was working there :)

~~~
grinich
Are you in SF?

~~~
nmjenkins
Umm, we're actually in Melbourne, Australia (I work for FastMail).

~~~
mercurial
Thanks for the great blog post and keep up the good work.

