
Ma.gnolia Data is Gone For Good - Anon84
http://www.datacenterknowledge.com/archives/2009/02/19/magnolia-data-is-gone-for-good/
======
look_lookatme
"It turns out that Ma.gnolia was pretty much a one-man operation, running on
two Mac OS X servers and four Mac minis."

This should be a lesson: entrust your systems administration to people who
have experience in this stuff. I'm not saying you have to find an expert and
pay him or her expert prices; instead, you probably have a developer or admin
friend who knows what it takes to implement a basic backup strategy that will
_at the least_ keep you from losing _all_ of your data.

I watched a little bit of the podcast and he said he was just syncing the db
files... well, if it's InnoDB, that's not going to work. It says so in the
MySQL docs. You know, in the section about backups.

The lesson here could be this: if you have a great idea, get it developed and
out the door -- awesome. Now talk to someone with experience in systems
planning. Don't just throw a bunch of Mac minis in a cabinet, bust out an
rsync script and call it a day.
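
The MySQL-sanctioned alternative to raw file syncing is a logical dump. A
minimal sketch of a nightly dump script, with paths invented for illustration
and `mysqldump` assumed to be on the PATH:

```python
# Hypothetical nightly dump. Copying live InnoDB files (with rsync or
# otherwise) can capture a half-written state; mysqldump with
# --single-transaction reads one consistent snapshot instead.
import datetime
import subprocess

def dump_filename(outdir: str, day: datetime.date) -> str:
    """Build a dated filename so old dumps can be kept and rotated."""
    return f"{outdir}/db-{day.isoformat()}.sql"

def nightly_dump(outdir: str = "/backups") -> str:
    outfile = dump_filename(outdir, datetime.date.today())
    with open(outfile, "wb") as f:
        subprocess.run(
            ["mysqldump", "--single-transaction", "--all-databases"],
            stdout=f, check=True)
    return outfile
```

Run it from cron; --single-transaction gives a consistent snapshot of InnoDB
tables without locking out writers (it does not help with MyISAM).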

~~~
KirinDave
I don't want to throw rocks, because I used to work for Larry (in fact, I
wrote the original codebase for Ma.gnolia.com) and I like him.

But I distinctly remember talking about backups and how a raw sync of the
innodb files wasn't really what we needed (it could sort of work in the
original deploy of ma.gnolia, but would not scale as the site scaled). I'm
kind of surprised that the community has been so tolerant of this failure.
Personally, I'd be furious if I found out that a site I trusted lost all my
data because they _hadn't even tested_ their backup system.

~~~
fendale
> I'd be furious if I found out that a site I trusted lost all my data because
> they hadn't even tested their backup system

Exactly - first rule of backups - test them. For a database, it can be as
simple as using your nightly backup to create your dev or test database or
whatever. I guess that isn't feasible every day with a half-terabyte database,
but to not have at least a several-week-old backup set is somewhat
unforgivable.

On the other hand, I bet he will never ever make the same mistake again, and
it will make a lot of people on here think long and hard about their own
backup strategies ...

~~~
whatusername
I've seen that kinda setup done nicely...

Daily automated DB backup. Daily automated DB restore to a separate server.
(Which also can kinda double as a hot spare as required.)
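
A minimal sketch of that restore-and-check step, using Python's sqlite3 as a
stand-in for MySQL (the engine doesn't matter here, the habit does; table
names are invented):

```python
# Hypothetical restore-test: load a SQL dump into a scratch database and
# fail loudly if any expected table comes back empty. This is the "daily
# automated restore" half of the scheme described above.
import sqlite3

def verify_backup(dump_sql: str, expected_tables: list[str]) -> int:
    """Restore dump_sql into an in-memory database; return total row count."""
    scratch = sqlite3.connect(":memory:")
    scratch.executescript(dump_sql)  # the restore itself
    total = 0
    for table in expected_tables:
        (count,) = scratch.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if count == 0:
            raise ValueError(f"table {table!r} restored empty - backup is suspect")
        total += count
    return total
```

If the restore ever raises, you find out the night the backup went bad, not
the day the primary dies.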

------
nikblack
Here is a tip - don't hire consultants to talk to you about blog marketing and
microformats etc. until you have the basics like backup and support in order.

Magnolia hired such consultants and was engaged with them for a while. I know
because I am an advisor to a company that hired the same consultants - and I
made the same argument to them, i.e. there are more basic things that need to
be completed before good money is spent on people who love to talk about
Blogs, FOAF, OAuth, OpenID, Syndication, online marketing, etc.

With the money Magnolia would have saved from these consultants, they could
have really had their site in order. I also don't know how you can have so
many employees, advisors and 2-3 consultants on tap and nobody mentions
backup.

So other startup founders, forget about all that fancy 2.0 stuff - get the
essentials in order first.

------
zandorg
I have a wonderful data backup story.

I was backing up one nearly-full 1TB Seagate Freeagent to another one, and
after 180GB of data had transferred, I knocked one of them over by mistake;
it fell 15 cm to the floor and died!

I sent the broken Freeagent to Seagate data recovery, but, unsure whether it
would be recoverable, I had this amazing brainwave.

Before backing up one Freeagent to the other, I had quick-formatted the
target drive (note: quick, not full), which only clears the file table rather
than wiping the data. So when one of the drives broke: why not undelete what
was still on the other?

So I found a utility on the web to unformat and undelete, and the only data
I couldn't get back that way was what had come off a spare 3.5" drive; in the
end I got back all but 2 weeks of data.

I was lucky. Now I have _3_ Freeagents to serve my backup needs.

However, all my core data is now on some 2.5" drives, which are more reliable
than a 3.5" Freeagent.

~~~
statictype
The lesson as always: Keep your Freeagent drive on the floor.

I've knocked mine over more than a few times, but since I've got it sitting on
the floor, there have been no issues.

------
Anon84
datacenterknowledge.com seems to be having some issues right now (the HN
effect? ;) Here is the gist of the post:

     The social bookmarking service Ma.gnolia reports that
     all of its user data was irretrievably lost in the
     Jan. 30 database crash that knocked the service
     offline. That means that users who were unable to
     recover their bookmarks through publicly available
     tools (including other social media sites and the
     Google cache) have lost all their data.

     Ma.gnolia founder Larry Halff said last week that the
     service’s MySQL database included nearly half a
     terabyte of data. Yesterday Halff informed users that
     a specialist had been unable to recover any data from
     the corrupted hard drive. “Unfortunately, database
     file recovery has been unsuccessful and I won’t be
     able to recover members’ bookmarks from the Ma.gnolia
     database,” he wrote.

~~~
1SockChuck
Data Center Knowledge had another story (Exploding Servers) on Slashdot at the
same time it was getting the HN traffic, so a double-whammy.

~~~
patio11
Meaning no disrespect to HN, but given that we seem to send about 4k visitors
a day in my recent experience, getting linked by Slashdot and HN on the same
day is a single-whammy.

------
arthurk
The website is down (at least for me). But there's also some information on
the official ma.gnolia site: <http://ma.gnolia.com/>

------
woadwarrior01
I seriously didn't expect this from Ma.gnolia.

How hard/expensive is it to get an automated daily backup of the DB and sync
it up to something like Amazon S3? I run a similar pet project (pardon the
shameless plug :) <http://tagz.in> ), albeit with only a dozen people actively
using it. I maintain a 30-day rolling backup of daily db dumps, paying next to
nothing for the backups. I've had a number of people asking how they can
trust me if I decide to stop the whole thing, and part of the plan was to
allow people to download their bookmarks in Netscape bookmarks format for
at least 30 days from the day I announce it, just in case I decide to take it
down / run out of money. Not that this is ever gonna happen (to be honest, I
could probably run it for the next 6 months even if I lost my job this very
day), but I feel contingencies like these need to be thought about well in
advance when we're dealing with users' data (bookmarks in this case).
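
A 30-day rolling window like that takes only a few lines. This sketch assumes
dumps named db-YYYY-MM-DD.sql.gz sitting in one directory, and leaves out the
upload step (to S3 or wherever) entirely:

```python
# Hypothetical rotation for a 30-day rolling backup window. Anything older
# than keep_days is deleted; everything newer is kept.
import datetime
import pathlib

def prune_old_dumps(backup_dir, today=None, keep_days=30):
    """Delete dumps older than keep_days; return the names removed."""
    today = today or datetime.date.today()
    removed = []
    for path in sorted(pathlib.Path(backup_dir).glob("db-*.sql.gz")):
        dump_day = datetime.date.fromisoformat(path.name[3:13])  # YYYY-MM-DD
        if (today - dump_day).days > keep_days:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run it right after the nightly dump lands; with daily dumps the directory
never holds more than about a month of files.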

PS: I use PostgreSQL and have a script on my laptop which syncs my local db
with the latest db snapshot every week, which isn't really a good idea, but
works for me given the tiny scale I'm working at.

~~~
bonaldi
Hardish. Expensive when you're talking about terabytes.

Regardless of that, though, ma.gnolia _had_ backups; it's just that they were
untested and were dutifully backing up the corruption that was being
introduced at the software level.

It's that bit I'd like to know more about. Were they faithfully following the
Rails Way and expecting the model to handle validations, instead of also
setting up the database to do double validation? How did these errors creep
in, and how did they grow to be so catastrophic?
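
For what it's worth, "double validation" means constraints enforced by the
database itself, not just by the model. A sketch with sqlite3 standing in for
MySQL and an invented schema (whether this would have helped against
lower-level corruption is a separate question):

```python
# Database-level constraints reject bad rows even when application-level
# (e.g. ActiveRecord) validation is skipped or buggy.
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.execute("""
        CREATE TABLE bookmarks (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            url     TEXT NOT NULL CHECK (url <> '')
        )""")
    return conn

conn = make_db()
conn.execute("INSERT INTO users (id) VALUES (1)")
# A valid row goes through; an orphan row or empty URL raises IntegrityError.
conn.execute("INSERT INTO bookmarks (user_id, url) VALUES (1, 'http://ma.gnolia.com/')")
```

With constraints in the schema, a buggy code path can't quietly write rows
the model layer would have rejected.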

~~~
gensym
This corruption doesn't have anything to do with the "Rails Way". The
corruption was at a much lower level than things like nil foreign keys,
overlong fields, etc. that Rails validates. It was the filesystem that was
corrupted, and whether he did validation in ActiveRecord, in the database, or
both would not have made a difference, as far as I can tell.

Think about it for a second - if the corruption that caused this were the
kind of thing that could be validated in Ruby or in SQL declarations, it
would be more than feasible to export that data into _some_ format from which
it could be recovered.

------
tokenadult
Didn't Reddit lose user account data at least once after start-up (as I seem
to recall from personal experience)?

I know that MSN lost a lot of user account data soon after its start-up, way
back in the 1990s. Preserving data for an online community is not easy, even
for richly funded market entrants.

~~~
trickjarrett
But in every other major case I've heard of, it has been properly backed up.

~~~
tokenadult
I was a charter subscriber to MSN, and I never got my email address back. So I
stopped using MSN.

~~~
trickjarrett
Oh! I hadn't known.

------
okeumeni
I just feel so sorry for the Magnolia guy; what a waste of hard work!

He must be feeling like shit right now; I know I would.

------
antidaily
Was Magnolia profitable? Did it run ads?

~~~
Elepsis
No, and only occasionally. For the approximately 2 years that I used ma.gnolia
it ran ads for less than half of the time.

------
darkhorse
this makes me wonder about the backup systems of sites like tinyURL.

imagine all those shortened URLs everywhere becoming useless!

~~~
palish
A terrible tragedy of epic proportions, to be sure.

~~~
zach
As if millions of tweets were suddenly silenced.

------
geuis
Was this a fried hard drive problem? If so, maybe SpinRite would help?

