
Did Pixar accidentally delete Toy Story 2 during production? (2012) - chenster
https://www.quora.com/Pixar-company/Did-Pixar-accidentally-delete-Toy-Story-2-during-production/answer/Oren-Jacob
======
rootbear
I was at Pixar when this happened, but I didn't hear all of the gory details,
as I was in the Tools group, not Production. My memory of a conversation I had
with the main System Administrator as to why the backup was not complete was
that they were using a 32-bit version of tar and some of the filesystems being
backed up were larger than 2GB. The script doing the backup did not catch the
error. That may seem sloppy, but this sort of thing happens in the Real World
all the time. At the risk of spilling secrets, I'll tell a story about the
animation system, which I worked on (in the 1996-97 time frame).
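
Before I get to that: the missed check itself is mundane. Here's roughly what
the backup script needed, sketched in C for illustration (the paths and
command are invented, and the real script was surely shell, but the lesson is
the same). A 32-bit tar choking on a >2GB filesystem shows up as a nonzero
exit status; not checking it turns a loud failure into a silent one.

    /* sketch: run the archiver and refuse to treat failure as success */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    
    int main(void) {
        /* hypothetical command; a 32-bit tar dying on a >2GB filesystem
           surfaces right here as a nonzero status */
        int status = system("tar cf /backup/shots.tar /shots");
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            fprintf(stderr, "backup FAILED; keep the last good copy!\n");
            return 1;
        }
        puts("backup OK");
        return 0;
    }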

The Pixar animation system at the time was written in K&R C and one of my
tasks was to migrate it to ANSI C. As I did that I learned that there were
aspects of this code that felt like a school assignment that had escaped from
the lab. While searching for a bug, I noticed that the write() call that saved
the animation data for a shot wasn't checked for errors. This seemed like a
bad idea, since at the time the animation workstations were SGI systems with
relatively small SCSI disks that could fill up easily. When this happened, the
animation system usually would crash and work would be lost. So, I added an
error check, and also code that would save the animation data to an NFS volume
if the write to the local disk failed. Finally, it printed a message assuring
the animator that her files were safe and it emailed a support address so they
could come help. The animators loved it! I had left Pixar by the time the big
crunch came in 1999 to remake TS2 in just 9 months, so I didn't see that
madness first hand. But I'd like to think that TS2 is just a little, tiny bit
prettier thanks to my emergency backup code that kept the animators and
lighting TDs from having to redo shots they barely had time to do right the
first time.
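
For the programmers in the audience, the fix looked roughly like this (a
reconstruction from memory; the function name, paths, and shot file are all
invented, and the real code also emailed a support address rather than just
printing):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    
    /* try the local SCSI disk first; if that write fails (disk full was
       the usual culprit), fall back to an NFS volume and reassure the
       animator instead of crashing */
    static int save_anim_data(const char *buf, size_t len) {
        int fd = open("/usr/anim/shot042.anim",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd >= 0) {
            ssize_t n = write(fd, buf, len);  /* the once-unchecked call */
            close(fd);
            if (n == (ssize_t)len)
                return 0;                     /* local save succeeded */
        }
    
        /* local save failed: write to the NFS fallback volume instead */
        fd = open("/net/backup/anim/shot042.anim",
                  O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        ssize_t n = write(fd, buf, len);
        close(fd);
        if (n != (ssize_t)len)
            return -1;
    
        printf("Your animation data is safe on the backup volume; "
               "support has been notified.\n");
        return 0;
    }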

The point is that one would like to think that a place like Pixar is a model
of Software Engineering Excellence, but the truth is more complex. Under the
pressures of Production deadlines, sometimes you just have to get it to work
and hope you can clean it up later. I see the same things at NASA, where, for
the most part, only Flight Software gets the full-on Software Engineering
treatment.

~~~
hackuser
> The script doing the backup did not catch the error. That may seem sloppy,
> but this sort of thing happens in the Real World all the time.

I disagree. I mean, I agree those things happen, but the system
administrator's job is to anticipate those Real World risks and manage them
with tools like quality assurance, redundancy, plain old focus and effort, and
many others.

The fundamental rule of backups is to test restoring them, which would have caught
the problem described. It's so basic that it's a well-known rookie error and a
source of jokes like, 'My backup was perfect; it was the restore that failed.'
What is a backup that can't be restored?

Also, in managing those Real World risks, the system administrator has to
prioritize the value of data. The company's internal newsletter gets one level
of care, HR and payroll another. The company's most valuable asset and work
product, worth hundreds of millions of dollars? A personal mission, no
mistakes are permitted; check and recheck, hire someone from the outside,
create redundant systems, etc. It's also a failure of the CIO, who should have
been absolutely sure of the data's safety even if he/she had to personally
test the restore, and the CEO too.

~~~
zero_one_one
Your post is valid from a technical and idealistic standpoint; however, when
you realize the size of the data sets turned over in the film / TV world on a
daily basis, restoring, hashing and verifying files during production
schedules is akin to painting the Forth Bridge - only the bridge has doubled
in size by the time you get halfway through, and the river keeps rising...

There are lots of companies doing very well in this industry with targeted
data management solutions to help alleviate these problems (I'm not sure that
IT 'solutions' exist), however these backups aren't your typical database and
document dumps. In today's UHD/HDR space you are looking at potentially
petabytes of data for a single production - solely getting the data to tape
for archive is a full time job for many in the industry, let alone
administration of the systems themselves, which often need overhauling and
reconfiguring between projects.

Please don't take this as me trying to detract from your post in any way - I
agree with you on a great number of points, and we should all strive for
ideals in day to day operations as it makes all our respective industries
better. As a fairly crude analogy however, the tactician's view of the
battlefield is often very different to that of the man in the trenches, and
I've been on both sides of the coin. The film and TV space is incredibly
dynamic, both in terms of hardware and software evolution, to the point where
standardization is having a very hard time keeping up. It's this dynamism
which keeps me coming back to work every day, but also contributes quite
significantly to my rapidly receding hairline!

~~~
hackuser
> Your post is valid from a technical and idealistic standpoint

You seem to have direct experience in that particular industry, but I disagree
that I'm being "idealistic" (often used as a condescending pejorative by
people who want to lower standards). I'm managing the risk based on the value
of the asset, the risk to it, and the cost of protecting it. In this case,
given the extremely high value of the asset, the cost and difficulty of
verifying the backup appears worthwhile. The internal company newsletter in my
example above is not worth much cost.

> solely getting the data to tape for archive is a full time job for many in
> the industry, let alone administration of the systems themselves, which
> often need overhauling and reconfiguring between projects.

Why not hire more personnel? $100K/yr seems like cheap insurance for this
asset.

> restoring, hashing and verifying files during production schedules is akin
> to painting the forth bridge - only the bridge has doubled in size by the
> time you get half way through, and the river keeps rising...

> you are looking at potentially petabytes of data for a single production

I agree that not all situations allow you to perform a full restore as a test;
Amazon, for example, probably can't test a complete restore of all systems.
But I'm not talking about this level of safety for all systems; Amazon may
test its most valuable, core asset, and regardless there are other ways to
verify backups. In this case it seems like they could restore the data, based
on the little I know. If the verification is days behind live data or doesn't
test every backup, that's no reason to omit it; it still verifies the system,
provides feedback on bugs, and reduces the maximum data loss to a shorter
period than infinity.

~~~
zero_one_one
> I disagree that I'm being "idealistic" (often used as a condescending
> pejorative by people who want to lower standards)

A poor word choice on my part. It was certainly not meant to come across that
way, so apologies there! Agreed that a cost vs risk analysis should be one of
the first items on anyone's list, especially given the perceived value of the
digital assets in this instance.

~~~
hackuser
No problem; I over-reacted a bit. Welcome to HN! We need more classy,
intelligent discussion like yours, so I hope you stick around.

------
woliveirajr
The biggest difference, I think, was leaving the hunt for a scapegoat for
later, or even not doing it at all.

Commitment would be very different if people were being asked to help while
heads were rolling. You're only a real team when everybody is going in the
same direction. Any call of "people, work hard to recover while we go after
the moron who deleted everything" wouldn't have done it.

You only commit to something when you know that you won't be under fire if
you do something wrong without knowing it.

~~~
lokedhs
I never understood the attitude of some companies to fire an employee
immediately if they make a mistake such as accidentally deleting some files.
If you keep this employee, then you can be pretty sure he'll never make that
mistake again. If you fire him and hire someone else, that person might not
have had the learning experience of completely screwing up a system.

I think that employees actually make fewer mistakes and are more productive if
they don't have to worry about being fired for making a mistake.

~~~
libria
Surely you've heard of at least these arguments:

- Employee was error-prone and this mistake was just the biggest one to make
headlines. Could be from incompetence or apathy.

- Impacted clients demanded the employee at fault be terminated.

- Deterrence: fire one guy, everyone else knows to take that issue seriously.
Doesn't Google do this? If you leak something to the press, you're fired, then
a company email goes out: "Hey, we canned dude for running his mouth..."

It's better to engage the known and perhaps questionable justifications than
to "never understand".

~~~
morkalot
Case 1: It's fine to fire individuals for ongoing performance issues (though
you must make clear to those who remain the number and types of issues the
individual already had, and the steps that had been taken to help the
individual rectify their performance problems).

Case 2: No competent manager would fire an employee who made a mistake to
satisfy clients. They may move the employee to a role away from that client,
but it would be insanity to allow the most unreasonable clients to dictate who
gets fired. Any manager who does what you suggest should expect to have lost
all credibility in the eyes of their team.

Case 3a: A leak to the press is a purposeful action. Firing for cause is
perfectly reasonable. Making a mistake is not a purposeful action.

Case 3b: If you want to convey that a particular type of mistake is serious,
don't do so by firing people. Do so with investments in education, process,
and other tools that reduce the risk of the mistake occurring, and the harm
when the mistake occurs. Firing somebody will backfire badly, as many of your
best employees will self-select away from your most important projects, and
away from your company, as they won't want to be in a situation where years of
excellent performance can be erased with a single error.

~~~
cookiecaper
Firing for mistakes can make sense in the context of a small company, where
rectifying the mistake costs enough to significantly impact the budget.
If this cost needs to be recouped, it is only fair that it be recouped from
the salary preserved by terminating the responsible party. We're not all
megacorps.

This is going to depend on the severity, cost, budget, importance of the role
filled, etc., but I think it's probably one of the only semi-plausible
justifications for firing based on things that do not reflect a serious and
ongoing competency or legal issue.

~~~
morkalot
That's nonsense.

A mistake is made, and a material loss has been incurred. This sucks. Been
there, done that, didn't get the t-shirt because we couldn't afford such a
luxury. I watched my annual bonus evaporate because of somebody else's cock-
up.

But there's no reason to believe that firing the mistake-maker is the best
move. Maybe the right move is to find money somewhere else (cutting a bunch of
discretionary, pushing some expenses into the future, reducing some
investment), or maybe it's to ask a few people to take pay cuts in return for
some deferred comp. Or maybe it's to lay off somebody who didn't do anything
wrong, but who provides less marginal value to the company.

But it'd be one hell of a coincidence if, after an honest to god mistake, the
best next action was to fire the person who made the mistake. After all, if
they were in a position to screw your company that hard, they almost certainly
had a history of being talented, highly valued, and trustworthy. If they
weren't good, you wouldn't have put them in a place where failure is so
devastating.

~~~
cookiecaper
> But there's no reason to believe that firing the mistake-maker is the best
> move.

Yeah, I'm not saying it _necessarily_ or even _probably_ is. I'm saying that
reality sometimes makes it so that we have to make these compromises.

------
smcl
"And then, some months later, Pixar rewrote the film from almost the ground
up, and we made ToyStory2 again"

So that effort to recreate it (not to mention produce it in the first place)
was pretty much all for naught? That must have been soul-destroying.

~~~
dijit
I disagree wholeheartedly; they had a rare chance to rebuild using their
acquired knowledge with none of the debt or cruft. No more "We have to keep
this scene even though it's not quite perfect because otherwise it's a waste
of money".

Maybe this is a bad example actually; the movie industry is something you
launch and market and leave.

But the best architectures I've seen have been demolished, destroyed and
rebuilt from the ground up for their purpose.

Same with code.

~~~
smcl
Right, but I'm thinking from the perspective of someone who's been working on
something for ages, gone through the stress of nearly losing it, then
miraculously recovering it ... only to find that a lot of their work was
ditched. You're right that it probably ended up better, and sister comments
are right in that it wasn't ALL for naught ... but can you imagine the moment
you found out it was being reworked?

~~~
huxley
No need to imagine. It's not just during disasters like the Pixar case.
Creative collaborative ventures like films and animation are filled with
months of effort being deleted with a few quick keystrokes.

Back when I was still in the film/video industry it happened often; you kinda
get accustomed to the ephemeral nature and try not to get too attached to
your work. Not always successfully but you try.

------
KaiserPro
Heh, I'm sure it did.

I worked at a few VFX studios, and everyone has deleted large swathes of shit
by accident.

My favourite was when an rsync script went rogue and started deleting the
entire /job directory in reverse alphabetical order. Mama-mia[1] was totally
wiped out, as was Wanted (which was missing some backups, so some poor fucker
had to go round fishing assets out of /tmp on around 2000 machines).

From what I recall (this was ~2008) there was some confusion as to what was
doing the deleting. Because we had at the time a large (100 or 300TB[2])
Lustre file system, it didn't really give you that many clues. They had to
wait till it hit a plain old NFS box before they could figure out what was
causing it.

Another time-honoured classic is matte painters on OSX boxes accidentally
dragging whole film directories into random places.

[1]some films have code names, hence why this was first

[2]That Lustre was big, physically and in IO; it could sustain something like
2-5 gigabytes a second, and it had at least 12 racks of disks. Now a 4U disk
shelf and one server can do ~2 gigabytes sustained.

~~~
mprovost
We lost a good chunk of Tintin (I think) when someone tried to use the OSX
migration assistant to upgrade a Macbook that had some production volumes NFS
mounted. It was trying in vain to copy several PB of data (I am convinced that
nobody at Apple has ever seen or heard of NFS), and because it was so slow the
user hit cancel and it somehow tried to undo the copy and started deleting
all the files on the NFS servers.

There was another incident where there was a script that ran out of cron to
cleanup old files in /tmp, and someone NFS mounted a production volume into
/tmp...

Eventually we put tarpit directories at the top of each filesystem (a
directory with 1000 subdirectories each with 1000 subdirectories, several
layers deep) to try and catch deletes like the one you saw, then we would
alert if any directories in there were deleted so we could frantically try and
figure out which workstation was doing the damage.
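
The tarpit itself is trivial to build. A rough sketch (the fan-out and depth
here are small illustrative values, not what we actually used, and the
alerting half is a separate job):

    #include <stdio.h>
    #include <sys/stat.h>
    
    #define FANOUT 50   /* subdirectories per level; ours was more like 1000 */
    #define DEPTH  3    /* levels deep */
    
    /* recursively build a wide, deep tree whose only purpose is to make a
       rogue recursive delete spend a long, detectable time chewing on it */
    static void build(const char *base, int depth) {
        char path[4096];
        if (depth == 0)
            return;
        for (int i = 0; i < FANOUT; i++) {
            snprintf(path, sizeof path, "%s/%04d", base, i);
            mkdir(path, 0755);   /* EEXIST on re-runs is fine, so unchecked */
            build(path, depth - 1);
        }
    }
    
    int main(void) {
        const char *root = "/mnt/prod/.tarpit";  /* hypothetical mount point */
        mkdir(root, 0755);
        build(root, DEPTH);
        return 0;
    }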

~~~
cookiecaper
I had a client with a Linux server who wanted to automount the share on their
OS X workstations. I cannot believe the hoops I had to jump through to make
something as simple as NFS work. Every iteration of OS X seems to make
traditional *nix utilities less and less compatible and remove valuable tools
for no reason other than obstinacy.

------
dvdhnt
From the next web write up -

> The command that had been run was most likely ‘rm -r -f *’, which—roughly
> speaking—commands the system to begin removing every file below the current
> directory. This is commonly used to clear out a subset of unwanted files.
> Unfortunately, someone on the system had run the command at the root level
> of the Toy Story 2 project and the system was recursively tracking down
> through the file structure and deleting its way out like a worm eating its
> way out from the core of an apple.

~~~
wmil
It's more likely they wanted to remove all hidden directories from their home
directory, and ran `rm -rf .*`.

Under some shells (e.g. bash) that will expand to include `..` and `.`.
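
You can see the expansion for yourself via the C library's glob(), which
follows the same matching rules (a quick sketch; exact behavior varies by
shell and by options like bash's dotglob or GLOBIGNORE):

    #include <stdio.h>
    #include <glob.h>
    
    int main(void) {
        glob_t g;
        /* ".*" matches every name starting with a dot; on most systems
           that includes "." and "..", which is how rm -rf .* escapes */
        if (glob(".*", 0, NULL, &g) == 0) {
            for (size_t i = 0; i < g.gl_pathc; i++)
                puts(g.gl_pathv[i]);
            globfree(&g);
        }
        return 0;
    }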

~~~
aptwebapps
I have totally never done that. If I had done that, I might have been saved by
a lack of permissions. Like if I was on a mounted external drive, so not in my
home directory and it didn't get too far.

Edit: What would have been much worse, if I had done it, would have been

    rm -rf ~ /foo

instead of

    rm -rf ~/foo

~~~
majewsky
Such a bug was in an Nvidia driver install script a few years ago:

    rm -rf /usr /lib/modules/something/something

~~~
czinck
Bumblebee specifically, [https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac) is the fix. Pretty
sure Steam had the same issue early in their Linux release, probably lots of
other programs too.

~~~
cookiecaper
Yes, I specifically remember a lot of hubbub about Steam deleting entire home
directories if a path was input incorrectly. Glad that never bit me.

------
leecarraher
rm just unlinks the files; the data blocks stay on disk until they're reused,
so a disk forensics utility like the imagerec suite could have restored a lot
of the 'lost' data. In fact I've done it on source code, after learning that
the default behavior of untar was to overwrite all of your current directory
structure. Since it was text I didn't need anything fancy like imagerec;
instead I just piped the raw disk to grep, looked for parts I knew were in
files I needed, then output them and the surrounding data to an external hard
drive.
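
The same trick, sketched as a tiny C program instead of a grep pipeline (the
device path and marker string are placeholders; scan an unmounted disk and
send the output to a different drive):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define CHUNK (1 << 20)   /* scan the raw device 1 MiB at a time */
    #define CTX   4096        /* bytes of context to dump after each hit */
    
    int main(void) {
        const char *marker = "int main";      /* a string from the lost file */
        FILE *dev = fopen("/dev/sdb", "rb");  /* hypothetical raw device */
        if (!dev) { perror("open device"); return 1; }
    
        char *buf = malloc(CHUNK);
        size_t mlen = strlen(marker);
        long long base = 0;
        size_t n;
        while ((n = fread(buf, 1, CHUNK, dev)) > 0) {
            /* search byte by byte; strstr would stop at NUL bytes */
            for (size_t i = 0; i + mlen <= n; i++) {
                if (memcmp(buf + i, marker, mlen) == 0) {
                    fprintf(stderr, "hit at byte %lld\n", base + (long long)i);
                    fwrite(buf + i, 1, (n - i < CTX) ? n - i : CTX, stdout);
                }
            }
            base += (long long)n;  /* note: hits spanning chunks are missed */
        }
        free(buf);
        fclose(dev);
        return 0;
    }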

~~~
colanderman
Back then, yes. These days with SSDs, the OS will issue trim commands to the
disk, zeroing the blocks from the OS point of view, and on SSDs with "secure
delete", from a forensics point of view as well.

~~~
jnwatson
Perhaps I'm misunderstanding you, but trim doesn't securely delete anything. It
merely indicates to the SSD that the sector is unused so that it can reap the
underlying NAND location the next time it decides to garbage collect.

In other words, the data is still there in the flash, but only the SSD
firmware (and physical access) has access to it.

~~~
throwaway7767
The reason for marking it as erased is so the firmware can physically erase
the flash sector. It could happen immediately, a minute later or next week.
But a well-written SSD firmware will try to erase blocks any time it's not
busy doing something else, as erasing blocks takes _way_ longer than writing
them.

You might be able to recover something from the physical flash, but there are
definitely no guarantees.

~~~
K0SM0S
> erasing blocks takes way longer than writing them

Honest question, could you elaborate on that? Intuitively I would've thought
writing and erasing are _the same_ from a physical standpoint, insofar as
"erasing" means writing zeros.

~~~
vidarh
For flash, erase resets an erase unit to its default state, which can be all 0
or all 1 depending on the technology. Writes change bits from the default to
the opposite state only. Depending on the flash chip's interface, you may be
able to do this at the bit, byte, or block level, but changing in the opposite
direction is expensive and time-consuming.

In theory we could make flash with tiny erase units (down to the bit level),
but in practice we don't because the extra circuitry would drive the price
through the roof.
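
A toy model of that asymmetry, assuming the erased state is all 1s as in
typical NAND (sizes are made up):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    
    #define BLOCK 16   /* toy erase unit; real ones are hundreds of KB */
    
    static uint8_t flash[BLOCK];
    
    static void erase_unit(void) {
        memset(flash, 0xFF, sizeof flash);  /* erase: whole unit back to 1s */
    }
    
    static void program_byte(int i, uint8_t v) {
        flash[i] &= v;  /* write: can only pull bits 1 -> 0, never back */
    }
    
    int main(void) {
        erase_unit();
        program_byte(0, 0xA5);
        printf("%02X\n", flash[0]);  /* A5: bits cleared as requested */
        program_byte(0, 0xFF);
        printf("%02X\n", flash[0]);  /* still A5: writes can't set bits */
        erase_unit();                /* the only way back is a full erase */
        printf("%02X\n", flash[0]);  /* FF */
        return 0;
    }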

~~~
K0SM0S
That's interesting. If nanotech (the part that's about assembling 'things' at
a molecular level, hyper-grained 3D printing so to speak) really delivers the
economic breakthrough we've been expecting since the early '80s, I see one
huge improvement for flash right here.

------
hybridtupel
The video link is broken. Here is (what seems to be) the same video:
[https://www.youtube.com/watch?v=8dhp_20j0Ys](https://www.youtube.com/watch?v=8dhp_20j0Ys)

------
mentos
John Cleese's talk on Creativity recently made it to the front page of HN
again
[https://www.youtube.com/watch?v=9EMj_CFPHYc](https://www.youtube.com/watch?v=9EMj_CFPHYc)
and if you haven't watched it I highly recommend it.

I believe it was in this talk that he says the best work he ever did was when
he scrapped it and started over. From practice I think we can all admit that
while it's the hardest thing to do, it is always for the best.

~~~
bluejekyll
> ... it is always for the best.

Not necessarily. People often underestimate (in engineering fields) how much
work it will take to rebuild something. In software there is a high degree of
creativity, which can have large downstream effects. You need to architect your
system in such a way as to make it possible to replace components when needed;
this is where strong separation of concerns is important.

One thing that I've seen happen time and again is an organization bifurcating
itself, so that there is one team working on the new cool replacement, and the
other working on the old dead thing that everyone hates. Needless to say this
creates animosity and severely limits an organization's ability to respond to
customer demands.

Starting over should be taken very seriously.

------
abe_duarte
This event is explained in more depth in the book Creativity Inc by Ed
Catmull. It's a pretty good story.

~~~
creaghpatr
love that book

------
jrochkind1
Oh man, this is the best punch line:

> And then, some months later, Pixar rewrote the film from almost the ground
> up, and we made ToyStory2 again. That rewritten film was the one you saw in
> theatres and that you can watch now on BluRay.

At first I was imagining how it would feel to lose all that work, so
frustrating! But then, even if you hadn't lost it, it turns out management was
gonna throw it all away anyway!

------
yawz
Funny that this comes up a few weeks after I finished Ed Catmull's "Creativity
Inc." If you want a little more detail about this (and other Pixar related
things and Steve Jobs) read the book. It is a really good one.

------
cm2187
A good way to fuck up on Windows/C#: I was iterating through network folders
to delete (which all start with "\\\servername"), except that I had a bug and
instead was iterating through the characters of the first network folder path.
And that's how I discovered that in Windows, "\" means the root of the current
active drive. And that's also why I value my automatic backup to a NAS twice a
day.
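
The original was C#, but the shape of the bug translates. A minimal C
rendering (names and paths invented; a print stands in for the actual
delete):

    #include <stdio.h>
    #include <string.h>
    
    static void delete_tree(const char *path) {
        printf("deleting %s\n", path);  /* stand-in for the real delete */
    }
    
    int main(void) {
        const char *folders[] = { "\\\\server\\projA", "\\\\server\\projB" };
        size_t count = sizeof folders / sizeof folders[0];
    
        /* BUG: loops over the CHARACTERS of the first path, not the array;
           one of those one-character "paths" is "\", which Windows resolves
           to the root of the current drive */
        for (size_t i = 0; i < strlen(folders[0]); i++) {
            char item[2] = { folders[0][i], '\0' };
            delete_tree(item);
        }
    
        /* the intended loop */
        for (size_t i = 0; i < count; i++)
            delete_tree(folders[i]);
        return 0;
    }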

------
colomon
This brings up a great practical question. What's the state of the art of this
sort of thing for more modest but still modern data storage requirements?

Context: For the last five years, my backup system has been to have Time
Machine do hourly backups on my MBP (main development machine, just shy of 1TB
data), with key spots on my Linux server (3TB data at the moment) backed up
daily to my in-laws' house using cron and rsync, and spot directories on the
MBP backed up there as well.

But the hard drive on the Time Capsule I've used seems to have gotten
unreliable, and the external USB drive I bought to replace it has not worked
reliably for more than a day or two at a time. And even when it was all
working properly, I was never really verifying my backups.

Do people have suggestions for secure, reliable, verifiable, easy backup
systems capable of handling 4+ TB of data? I don't mind if it takes work or
money to set it up; the important thing is once it's working I can mostly
forget about it.

~~~
rossng
For an offsite backup, Backblaze is fantastic. Unlimited storage for $5/month
and the client works perfectly. It's not highly-redundant or anything, so use
it in addition to a local backup.

CrashPlan is the next-best option if you need Linux support, but the client
isn't as good.

~~~
buzzybee
Seconding Backblaze for set-and-forget. It's a huge confidence boost for data
that is a little bit too large and low-value to handle with more care.

~~~
colomon
I'm trying Backblaze now, hoping my first backup will be done in a week or
so...

------
eva1984
Good luck. With cloud/distributed computing, I am not sure they can even do
that now!

------
franze
here is a good writeup: [http://thenextweb.com/media/2012/05/21/how-pixars-toy-story-...](http://thenextweb.com/media/2012/05/21/how-pixars-toy-story-2-was-deleted-twice-once-by-technology-and-again-for-its-own-good/?amp=1)

------
1denis
video link in the article is dead

