
Accidentally destroyed production database on first day of a job - whistlerbrk
https://np.reddit.com/r/cscareerquestions/comments/6ez8ag/accidentally_destroyed_production_database_on/
======
Rezo
Sorry, but if a junior dev can blow away your prod database by running a
script on his _local_ dev environment while following your documentation, you
have no one to blame but yourself. Why is your prod database even reachable
from his local env? What does the rest of your security look like? Swiss
cheese I bet.

The CTO further demonstrates his ineptitude by firing the junior dev.
Apparently he never heard the famous IBM story, and will surely live to repeat
his mistakes:

 _After an employee made a mistake that cost the company $10 million, he
walked into the office of Tom Watson, the C.E.O., expecting to get fired.
“Fire you?” Mr. Watson asked. “I just spent $10 million educating you.”_

~~~
Rezo
Here are some simple, practical tips you can use to prevent this and other Oh
Shit Moments(tm):

\- Unless you have full-time DBAs, do use a managed db like RDS, so you don't
have to worry about whether you've set up the backups correctly. Saving a few
bucks here is incredibly shortsighted; your database is probably the most
valuable asset you have. RDS allows point-in-time restore of your DB instance
to any second during your retention period, up to the last five minutes. That
will make you sleep better at night (there's a rough CLI sketch of this after these tips).

\- Separate your prod and dev AWS accounts entirely. It doesn't cost you
anything (in fact, you get 2x the AWS free tier benefit, score!), and it's
also a big help in monitoring your cloud spend later on. Everyone, including
the junior dev, should have full access to the dev environment. Fewer people
should have prod access (everything devs may need for day-to-day work like
logs should be streamed to some other accessible system, like Splunk or
Loggly). Assuming a prod context should always require an additional step for
those with access, and the separate AWS account provides that bit of friction.

\- The prod RDS security group should only allow traffic from whitelisted
security groups also in the prod environment. For those really requiring a
connection to the prod DB, it is therefore always a two-step process: local ->
prod host -> prod db. But carefully consider why you're even doing this in
the first place. If you find yourself doing this often, perhaps you need more
internal tooling (like an admin interface, again behind a whitelisting SG).

\- Use a discovery service for the prod resources. One of the simplest methods
is just to set up a Route 53 Private Hosted Zone in the prod account, which
takes about a minute. Create an alias entry like "db.prod.private" pointing to
the RDS and use that in all configurations. Except for the Route 53 record,
the actual address for your DB should not appear anywhere. Even if everything
else goes sideways and you've assumed a prod context locally by mistake,
running some tool pointed at the prod config, the address simply doesn't
resolve in a local context.
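
Roughly, with the AWS CLI and made-up identifiers (instance names, VPC ID, zone ID and endpoint are all placeholders), the point-in-time restore and the private zone record look something like this:

    # Point-in-time restore of a prod RDS instance into a new instance
    aws rds restore-db-instance-to-point-in-time \
        --source-db-instance-identifier prod-db \
        --target-db-instance-identifier prod-db-restored \
        --restore-time 2017-06-03T12:00:00Z

    # Private hosted zone in the prod account, plus a "db.prod.private" record
    # pointing at the RDS endpoint, so the real address never appears in configs
    aws route53 create-hosted-zone \
        --name prod.private \
        --vpc VPCRegion=us-east-1,VPCId=vpc-0123456789abcdef0 \
        --caller-reference prod-private-$(date +%s)

    aws route53 change-resource-record-sets \
        --hosted-zone-id Z1EXAMPLE \
        --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{
          "Name":"db.prod.private","Type":"CNAME","TTL":300,
          "ResourceRecords":[{"Value":"prod-db.abc123.us-east-1.rds.amazonaws.com"}]}}]}'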

~~~
daxfohl
Would you recommend all these steps even for a single-person freelance job? Or
is it overkill?

~~~
_jal
Depends. Do you make mistakes?

I absolutely do. "Wrong terminal", "Wrong database", etc. mistakes are very
easy to make in certain contexts.

The trick is to find circuit-breakers that work for you. Some of the above is
probably overkill for one-person shops. You want some sort of safeguard at the
same points, but not necessarily the same type.

This doesn't really do it for me, but one person I know uses iTerm configured
to change terminal colors depending on machine, EUID, etc. as a way of
avoiding mistakes. That works for him. I do tend to place heavier-weight
restrictions, because they usually overlap with security and I'm a bit
paranoid by nature and prefer explicit rules for these things to looser
setups. Also, I don't use RDS.

I'd recommend looking at what sort of mistakes you've made in the past and how
to adjust your workflow to add circuit breakers where needed. Then, if you
need to, supplement that.

Except for the advice about backups and PITR. Do that. Also, if you're not,
use version control for non-DB assets and config!

~~~
sixothree
For Windows servers I use a different colored background for the more important
ones.

~~~
revmoo
I do this with bash prompt colors on all our servers. Prod is always red.
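
Something like this in ~/.bashrc does it (adjust the hostname pattern to whatever your prod boxes use):

    # Red background on anything whose hostname looks like prod
    case "$(hostname)" in
        *prod*) PS1='\[\e[41;97m\]\u@\h:\w\$\[\e[0m\] ' ;;
        *)      PS1='\u@\h:\w\$ ' ;;
    esac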

------
knodi123
I was on a production DB once, ran SHOW FULL PROCESSLIST, and saw "delete
from events" had been running for 4 seconds. I killed the query, and set up
that processlist command to run every 2 seconds. Sure enough, the delete kept
reappearing shortly after I killed it. I wasn't on a laptop, but I knew the
culprit was somewhere on my floor of the building, so I grabbed our HR woman
who was walking by, told her to watch the query window, and showed her how to
kill the process if she saw the delete again. Then I ran out and searched
office to office until I found the culprit -

Our CTO thought he was on his local dev box, and was frustrated that
"something" was keeping him from clearing out his testing DB.

Did I get a medal for that? No. Nobody wanted to talk about it ever again.
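
For anyone who hasn't had to do this, the whole dance is roughly two commands (host is a placeholder and credentials are omitted):

    # Watch for long-running statements on the prod DB every 2 seconds
    watch -n 2 'mysql -h prod-db -e "SHOW FULL PROCESSLIST"'

    # Kill the offending connection by its Id from the processlist output
    mysql -h prod-db -e "KILL 12345"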

~~~
S4M
Actually, the CTO should have mailed the dev team saying:

    
    
        Hi,
    
        Yesterday, I thought I was on my local machine and cleared the database, while I was in fact on the production server.
        Luckily knodi123 caught it and killed the delete process. This is a reminder that *anybody* can make mistakes, 
        so I want to set up some process to make sure this can't happen, but meanwhile I would like to thank knodi123.
    
       Best,
    
       CTO

------
sethammons
My comment I left there:

Lots of folks here are saying they should have fired the CTO or the DBA or the
person who wrote the doc instead of the new dev. Let me offer a counterpoint.
Not that it will happen here ;)

They should have run a post mortem. The idea behind it should be to understand
the processes that led to a situation where this incident could happen. Gather
stories, understand how things came to be.

With this information, folks can then address the issues. Maybe it shows that
there is a serially incompetent individual who needs to be let go. Or maybe it
shows a house of cards with each card placement making sense at the time and
it is time for new, better processes and an audit of other systems.

The point is that this is a massive learning opportunity for all those
involved. The dev should not have been fired. The CTO should not have lost his
shit. The DB should have regularly tested backups. Permissions and access
need to be updated. Docs should be updated to not have sensitive information.
The dev does need to contact the company to arrange surrender of the laptop.
The dev should document everything just in case. The dev should have a beer
with friends and relax for the weekend and get back on the job hunt next week.
Later, laugh and tell of the time you destroyed prod on your first day (and
what you learned from it).

~~~
justicezyx
The firing order, ranked by theoretical impact on preventing future problems:

1\. The CTO. As the one in charge of the tech, he allowed the loss of critical
data. If anyone should be fired, it's the CTO, and firing this guy apparently
will have the greatest positive impact on the company - assuming they can hire
a better one. Given how stupid this CTO is, that should be straightforward.

2\. The executives who hired the CTO. People hire people similar to
themselves, and it seems the executive team is clueless about what kind of
skills a CTO should have. These people will continue to fail the dev team by
hiring incompetent people, or force them to work in a way that causes problems.

3\. The senior devs on the team. Obviously these people did not test what they
wrote. If anyone had ever done a dry run of the training doc, they would have
prevented the catastrophe. It's a must-do in today's dev environment. The
normal standard is to write automated tests for everything, though.

This junior dev is the only one who should not be fired...

~~~
developer2
I'm amazed at how quickly everyone is trying to allocate blame, as if there
_must_ be someone on whom to heap it all. Commenters on both Reddit and
HN are being high and mighty, insisting that _they_ would never have
allowed this to take place, while eagerly pointing fingers. I bet far more than
half of these commenters have at one time or another worked for at least one
company that had this kind of setup, and didn't immediately refuse to work on
other tasks until the setup was patched. Hypocrites.

The fact is, this kind of scenario is extremely common. Most companies I have
worked for have the production database accessible from the office. It's a
very obvious "no no", but it's typical to see this at small to medium sized
companies. The focus is on rushing every new feature under the sun, and
infrastructure is only looked at if something blows up.

 _Nobody_ should have been fired. Not the developer, not the senior devs, not
the sysadmins, and not the CTO. This should have been nothing more than a
wake-up call to improve the infrastructure. That's it. The only blame here
lies with the CTO - not for the event having taken place, but only because
their immediate reaction was to throw the developer under the bus. A decent
CTO would have simply said "oh shit guys, this can't happen again. please fix
it". If the other executives can't understand that sometimes shit happens, and
that a hammer doesn't need to be dropped on _anyone_ , then they're not
qualified to be running a business.

~~~
justicezyx
Well, you need to consider the CTO's reaction.

His reaction shows that he is the No. 1 to fire, and with good reason.

What you said is true, but it does not matter. The CTO already showed that he
is clueless...

------
xoa
> _" i instead for whatever reason used the values the document had."_

>They put full access plaintext credentials for their production database in
their _tutorial documentation_

WHAT THE HELL. _Wow_. I'd be shocked at that sort of thing being written out
in a non-secure setting, like, anywhere, at all, never mind in freaking
documentation. Making sure examples in documentation are never real and will
hard fail if anyone tries to use them directly is not some new idea, heck
there's an entire IETF RFC (#2606) devoted to reserving TLDs specifically for
testing and example usage. Just mind blowing, and yeah there are plenty of
WTFs there that have already been commented on in terms of backups, general
authentication, etc. But even above all that, if those credentials had full
access, then "merely" having their entire db deleted might even have been the
good-case scenario vs having the entire thing stolen, which seems quite likely
if their auth is nothing more than a name/pass and they're letting credentials
float around like that.

It's a real bummer this guy had such an utterly awful first day on a first
job, particularly since he made a huge move and, from the sound of it, sunk
quite a bit of personal capital into taking that job. At the same time, that
sounds like a pretty shocking place to work and it might have taught a ton of
bad habits. I don't think it's salvageable, but I'm not even sure he should
try; they likely had every right to fire him, but threatening him at all with
"legal" for that is very unprofessional and dickish. I hope he'll be able to
bounce back and actually end up in a much better position a decade down the
line, having some unusually strong caution and extra care baked into him at a
very junior level.

~~~
ncantelmo
There's also a high chance that document was shared on Slack. In which case,
they were one Slack breach away from the entire world having write access to
their prod database.

It's depressing how many companies blindly throw unencrypted credentials
around like this.

~~~
kefka
Tell me about it. Fortunately where I work is sane and reasonable about it.

We have a password sheet. You have to be on the VPN (login/password). Then you
can log in: login/password (different from above) plus a 2nd password+OTP. Then
a password sheet password.

I'm still rooting out passwords from our repo, with goobers putting creds in
source code (yeah, not config files....grrrrr). But I attack them as I find
them. I've only found 1 root password for a DB in there... and it's thankfully
been changed!

~~~
posixplz
A plaintext password sheet? Despite the layers of network access control, this
is a horribly bad practice in our modern age. Vault is free and encrypted
secret storage systems are hardly a new concept.

~~~
kefka
Not at all. The password sheet password is actually a GPG key. Everything
stored encrypted.

We suffer from NIH greatly. We end up rolling our own stuff because either we
don't trust 3rd-party stuff, or it doesn't work in our infrastructure. In this
case, multiple access locks with GPG are sufficient.

~~~
posixplz
A response 8 days later is better than no response at all, right? :)

I agree that a multi-recipient GPG-protected file is sufficient for a small
org. In fact, that's how I used to do it circa 2011. We found it worked quite
well - we committed the GPG protected files to a version control system (git)
and used githooks to make sure that only encrypted files were permitted,
preventing users from intentionally/accidentally defeating gitignore.

------
plesiv
Plot twist: the CTO or senior staff needed to cover something up (maybe a
previous loss of critical business data) and arranged for this travesty to
eventually happen, provided a sufficient number of junior devs went through
that "local db setup guide" mockery of a doc.

Either that, or this is a "worst fuckup on the first day on the job" fantasy
piece - I refuse to acknowledge living in a world where the alternatives have
any meaningful non-zero probability of occurring.

~~~
perlgeek
There are no upper bounds on incompetence. I've seen enough WTFs even in
companies that didn't seem particularly dysfunctional, and that had some very
competent people.

And then it takes only one shitty manager, or manager in a bad mood, to get
the innocent junior dev fired.

------
matwood
People will screw up, so you have to do simple things to make screwing up
hard. The production credentials should never have been in the document.
Letting a junior have prod-level access is not that far out of the norm in a
small startup environment, but don't make them part of the setup guide. Sounds
like they also have backup issues, which points to overall poor devops
knowledge.

Not part of this story, but another pet peeve of mine is scripts checking
strings like "if env = 'test' else <runs against prod>". This sets up another
WTF situation: if someone typos 'test', the script now hits prod.
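
A safer shape is to fail closed: only recognized environments proceed, production requires an extra confirmation, and anything else, typos included, aborts. A rough sketch with made-up variable and host names:

    #!/bin/sh
    # Refuse to guess: unknown environments abort instead of falling through to prod
    set -eu

    case "${APP_ENV:?APP_ENV must be set}" in
        test|dev|staging)
            DB_HOST="db.${APP_ENV}.private"
            ;;
        production)
            printf 'Really run against PRODUCTION? Type "production" to continue: '
            read -r answer
            [ "$answer" = "production" ] || { echo "Aborting."; exit 1; }
            DB_HOST="db.prod.private"
            ;;
        *)
            echo "Unknown APP_ENV '${APP_ENV}', refusing to run." >&2
            exit 1
            ;;
    esac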

~~~
samstave
Heh, or take a Netflix Chaos Monkey approach and have a new employee attempt
to take down the whole system on their first day and fire any engineers who
built whatever the new employee is actually able to break!

~~~
karlkatzke
Why fire them? It's valuable experience that you are paying a lot for them to
gain. Better: hold a postmortem, figure out what broke, and make the people
who screwed it up originally fix it. Keep people who screw things up, as long
as they also fix it.

~~~
samstave
I wasn't serious about "firing" \-- I was just maintaining the spirit of what
happened to the OP on reddit...

but yeah - I agree with you...

------
quizotic
Yeah, another case of "blame the person" instead of "blame the lack of
systems". A while back, there was a thread here on how Amazon handled their S3
outage, caused by a devops typo. They didn't blame the DevOps guy, and instead
beefed up their tooling.

I wonder whether that single difference - blame the person vs. fix the
system/tools - predicts the failure or success of an enterprise.

~~~
throw20170603
The Amazon DevOp guy was fired for that mistake, just FYI.

~~~
ummonk
Have any proof of this?

~~~
tinix
Why do you all act like one party needs proof just because they are refuting
something that was itself originally said without proof?

Neither side is providing proof, so asking one party for proof and not the
other is obviously absurd.

~~~
benchaney
In the absence of proof one way or the other, people believe what seems more
reasonable to them. Not particularly absurd at all.

------
ajarmst
Assuming the details are correct, this should be considered a win by the
junior dev. It only took a day to realize that this is a company he really,
really doesn't want to try to learn his profession at.

~~~
ajarmst
He should get that laptop back to them IMMEDIATELY. These sound like exactly
the sort of douches who would try to charge him with theft. (Edit: Why is it not
surprising they don't have a protocol in place for dismissing staff
and, like, getting their stuff back?)

~~~
watwut
Well, the customer database with important data just got nuked, so even if
there is a protocol, the people who would normally do the steps have other
things on their minds. The laptop and such is the least of their concerns.

------
danmaz74
> The CTO told me to leave and never come back. _He also informed me that
> apparently legal would need to get involved due to severity of the data
> loss_.

I don't know if I should laugh or cry here.

------
markbnj
Guaranteed the CTO is busily rewriting the developer guide and excising all
production DB credentials from the docs so that he can pretend they were never
there. While the new guy's mistake was unfortunate in a very small way, the
errors made by the CTO and his team were unfortunate in a very big way. The
vague threat of legal action is laughable, and the reaction of firing the
junior dev who stumbled into their swamp of incompetency on his first day
speaks volumes about the quality of the organization and the people who run
it. My advice... learn something from the mistake, but otherwise walk away
from that joint and never look back. It was a lucky thing that you found out
what a mess they are on day 1.

------
spudlyo
Several years back I worked as a DBA at a managed database services company,
and something very similar happened to one of our customers who ran a fairly
successful startup. When we first onboarded them I _strongly_ recommended that
the first thing we do is get their DB backups happening on a fixed schedule,
rather than an ad-hoc basis, as their last backup was several months old. The
CEO shuts me down, and instead insists that we focus on finding a subtle bug
(you can't nest transactions in MySQL) in one of their massive stored
procedures.

It turns out their production and QA database instances shared the same
credentials, and one day somebody pointed a script that initializes the QA
instances (truncate all tables, insert some dummy data) at the production
master. Those TRUNCATE TABLE statements replicated to all their DB replicas,
and within a few minutes their entire production DB cluster was completely
hosed.

Their data thankfully still existed inside the InnoDB files on disk, but all of
the organizational metadata was gone. I spent a week of 12-hour days
working with folks from Percona to recover the data from the ibdata files. The
old backup was of no use to us since it was several months old, but it was
helpful in that it provided us a mapping of the old table names to their
InnoDB tablespace ids, a mapping destroyed by the TRUNCATE TABLE statements.

------
nstj
No disrespect to the OP but this sounds pretty fake. If the database in
question was important enough to fire _someone immediately_ over then there
wouldn't have been the creds floating around on an onboarding pdf. And
involving legal? Has anyone here heard of anything similar? I'm just 1
datapoint but I know I haven't.

~~~
jupiter90000
Yeah, I thought it sounded fake as well. I mean things like this happen, but
something about the story just doesn't ring true to me.

------
gcb0
plot twist: dev will learn Monday this is an initiation joke and the whole
company is laughing at all the threads he or she starts here and on reddit.

~~~
coldtea
plot twist: the dev attempts to off himself, and the company stops "laughing
off".

~~~
user5994461
plot twist: the dev is restored from backups

~~~
savageco
plot twist: but the backups are 6 months out of date, so the dev has no
recollection of the incident

~~~
nstj
Plot twist: the developer was working at Reddit

------
cbanek
It's not the CTO's fault. It's the document's fault! We should never have
documentation again, this is what it has done to us! We need to revert to
tribal knowledge to protect ourselves. If we didn't document these values,
people wouldn't be pasting them in places they shouldn't be!

/s

------
femto113
For some years now I've stopped bothering with database passwords. If
technically required I just make them the same as the username (or the
database name, or all three the same if possible). Why? Because the security
offered by such passwords is invariably a fiction in practice: I've never seen
an org where they couldn't be dug out of docs or a wiki or test code. Instead,
database access should be enforced by network architecture: the production
database can only be accessed by the production applications, running in the
production LAN/VPC. With this setup no amount of accidental (or malicious)
commands run by anyone from their local machine (or any other non production
environment) could possibly damage the production data.
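
On AWS that boils down to a single security group rule: the DB's group only accepts traffic from the prod application's group. A sketch with placeholder group IDs:

    # Only instances in the prod app security group can reach the DB on 3306
    aws ec2 authorize-security-group-ingress \
        --group-id sg-0proddbexample000 \
        --protocol tcp --port 3306 \
        --source-group sg-0prodappexample00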

------
daxfohl
Side question, as a dev with zero previous ops experience, now the solo techie
for a small company and learning ops on the fly, we're obviously in the
situation where "all devs have direct, easy access to prod", since I'm the
only dev. What steps should I take before bringing on a junior dev?

~~~
kefka
Do the best you can to "find compute room" (laptop, desktop, spare servers on
the rack that aren't being used, ... cloud), and make a Stage.

Make changes to Stage after doing a "Change Management" process (effectively,
document every change you plan to do, so that an average person typing them out
would succeed). Test these changes. It's nicer if you have a test suite, but
you won't at first.

Once testing is done and considered good, then make the changes in accordance
with the CM on prod. Make sure everything has a back-out procedure, even if it
is "Drive to get backups, and restore". But most of these should be, "Copy
config to /root/configs/$service/$date , then proceed to edit the live config".
Backing out would entail restoring the backed-up config.
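
In shell terms, that back-out-able edit is roughly this (service name and paths are just examples):

    # Snapshot the live config before touching it; backing out = copy it back
    service=nginx
    conf=/etc/nginx/nginx.conf
    backup="/root/configs/$service/$(date +%F)"

    mkdir -p "$backup"
    cp -a "$conf" "$backup/"
    "${EDITOR:-vi}" "$conf"
    nginx -t && systemctl reload nginx    # validate, then apply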

________________________

Edit: As an addendum, many places this small usually have insufficient, non-
existent, or Schrödinger backups. Having a healthy, living stage environment
does 2 things:

1\. You can stage changes so you don't get caught with your pants down on
prod, and

2\. It is a hot-swap for prod in the case Prod catches fire.

In all likelihood, "all" of prod wouldn't DIAF, but maybe the machine that
houses the DB has power issues with its PSU and fries the motherboard. You
at least have a hot machine, even if it's stale data from yesterday's imported
snapshot.

~~~
nocha
You missed one of the really nice points of having a stage there. You use it
to test your backups by restoring from live every night/week. By doing that,
you discourage developing on staging and you know for sure you have working
backups!
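
Even a single cron entry gets you most of the way there, something like this (hostnames and paths invented, assumes GNU date):

    # Nightly: load yesterday's prod dump into staging, so a broken backup fails loudly
    0 3 * * * mysql -h db.stage.private appdb < /backups/appdb-$(date -d yesterday +\%F).sql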

~~~
kefka
Indeed. But if it's just 1 guy who's the dev, I was trying to go for something
that was rigorous yet still very maintainable.

Ideally, you want test->stage->prod, with Puppet and Ansible running on a VM
fabric. Changes are made to the stage and prod areas of Puppet, with
configuration management backed by Git or SVN or whatever for version control.
Puppet manifests can be made and submitted to version control, with a guarantee
that if you can code it, you know what you're doing. Ansible's there to run
one-off commands, like reloading Puppet (the default is check-ins every 30 min).

And to be even more safe, you have hot backups in prod. Anything that runs in
a critical environment can have hot backups or otherwise use HAproxy. For
small instances, even things like DRBD can be a great help. Even MySQL,
Postgres, Mongo and friends all support master/slave or sharding.

Generally, get more machines running the production dataset and tools, so if
shit goes down, you have: machines ready to handle load, backup machines able
to handle some load, full data backups onsite, and full data backups offsite.
And the real ideal is that the data is available on 2 or 3 cloud compute
platforms, so absolute worst case scenario occurs and you can spin up VM's on
AWS or GCE or Azure.

\--Our solution for Mongo is ridiculous, but the best for backing up. The
Mongo backup util doesn't guarantee any sort of consistency, so either you
lock the whole DB (!) or you have the DB change underneath you while you back
it up... So we do LVM snapshots at the filesystem layer and back _those_ up.
It's ridiculous that Mongo doesn't have this kind of transactional backup
apparatus. But we needed time-series data storage, and MongoDB was pretty much
it.
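
The snapshot dance itself is short, roughly this, with placeholder volume and path names, and assuming the journal sits on the same volume so the snapshot is consistent:

    # Block-level backup of the Mongo data volume via an LVM snapshot
    lvcreate --snapshot --size 10G --name mongodata-snap /dev/vg0/mongodata
    mkdir -p /mnt/mongodata-snap
    mount -o ro /dev/vg0/mongodata-snap /mnt/mongodata-snap
    tar -czf /backups/mongo-$(date +%F).tar.gz -C /mnt/mongodata-snap .
    umount /mnt/mongodata-snap
    lvremove -f /dev/vg0/mongodata-snap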

------
Etheryte
The author should get their own legal in line - does the contract even allow
termination on the spot? If not, the employer is just adding to their own pile
of ridiculous mistakes.

~~~
FLUX-YOU
Probably. At will employment is pretty common in the US.

~~~
gcb0
land of the uninsured, non-unionized free

~~~
coldtea
When America was more deservingly called the "land of the free" in the 40s and
50s, they were also heavily unionized.

(Well, then it was "land of the free, blacks excepted")

------
roadbeats
The ending with taking the laptop home, though... He is a modern-day
Dostoevsky.

------
scarface74
One of the questions I asked my manager during the interview process was how
he felt about mistakes.

I knew I was being brought in to rearchitect the entire development process
for an IT department and that I would make architectural mistakes no matter
how careful I was and that I would probably make mistakes that would have to
be explained to CxOs.

Whatever the answer he gave me, I remember being satisfied with it.

------
rsc-dev
[https://github.com/search?utf8=%E2%9C%93&q=database+password...](https://github.com/search?utf8=%E2%9C%93&q=database+password&type=Commits)

------
thomastjeffery
Reminds me of my first dev job, when I got a call during lunch:

"The server has been down all day, and you are the only one who hasn't
noticed. What did you break?"

"Well, I saw that all the files were moved to `/var/www/`, and figured it was
on purpose."

Suffice it to say, I got that business to go from Filezilla as root to
BitBucket with git and some update scripts.

------
Jare
Something tells me their production password was nothing like a 20-char random
string...

------
user5994461
Am I the only one who is surprised that he can get the keys to the kingdom on
day 1?

Day 1 is when you set up your desk and get your login. Then you go back to HR
to do the last of the hiring paperwork.

It should take a good week before a new employee is able to fuck up anything.
Really.

~~~
anotheryou
How long does it take to adjust the height of your chair? Setting up the dev
environment often takes ages. Why wouldn't it be the first thing to do? There
will be enough progress bars while updating something like Visual Studio to
find time to re-adjust the chair.

------
runnr_az
Hilarious. I wonder if it's true.

~~~
solarengineer
This happened to a friend at a new job a few weeks ago. He wasn't fired,
though.

~~~
Jare
If the bit about no working backups is also true, he's likely to need a new
job anyway. :)

------
andreasgonewild
I did the same thing early on in my career. Shut down several major ski-
resorts in Sweden for an entire day during booking season by doing what we
always did, running untested code in production to put out fires. Luckily, my
company and our customers took that as a cue to tighten up the procedures
instead of finding someone to blame. I hear this is how it works in aviation
as well, no one gets blamed for mistakes since that only prevents them from
being dealt with properly. Most of us are humans, humans make mistakes. The
goal is to minimize the risk for mistakes.

------
orliesaurus
I stopped believing reddit posts a long time ago

~~~
Myrmornis
Exactly, the post is very clichéd. I have about 75% belief that it's
fictional. I guess it could be sort of entertaining to see how easy it is to
get a few hundred software engineers on reddit and hacker news worked up into
a sympathetic and self-righteous frenzy with a simple and entirely fictional
paragraph posted for free from a throwaway account.

~~~
ifdefdebug
I am about 101% sure it's fake. "Unfortunately apparently those values were
actually for the production database (why they are documented in the dev setup
guide i have no idea)" \- yeah, no. Had you told me you were able to screw the
production db up because it had no su password set, you might have got me. But
this is bullshit.

------
vinceguidry
Technical infrastructure is often the ultimate in hostile work environments.
Every edge is sharp, and great fire-breathing dragons hide in the most
innocuous of places. If it's your shop, then you are going to have a basic
understanding of the safety pitfalls, but you're going to have no clue as to
the true severity of the situation.

If you introduce a junior dev into this environment, then it's him that is
going to discover those pitfalls, in the most catastrophic ways possible. But
even experienced developers can blunder into pitfalls. At least twice I've
accidentally deployed to production, or otherwise ran a powerful command
intended to be used in a development environment on production.

Each time, I carefully analyzed the steps that led up to running that command
and implemented safety checks to keep it from happening again. I put all of
my configuration into a single environment file so I can see at a glance the
state of my shop. I make little tweaks to the project all the time to maintain
this, which can be difficult because the three devs on the project work in
slightly different ways and the codebase has to be able to accommodate all of
us.

While this is all well and good, my project has a positively decadent level of
funding. I can lavish all the time I want in making my shop nice and pretty
and safe.

A growing business concern can _not_ afford to hire junior devs fresh out of
code school / college. That's the real problem here. Not the CTO's
incompetence, any new-ish CTO in a startup is going to be incompetent.

The startup simply hired too fast.

~~~
watwut
The same thing could happen to a senior - in particular, to a tired, overworked
senior. It is more likely to happen to a junior, because a junior is likely to
be overwhelmed. However, mistakes like this happen to people of all ages and
experience levels.

Seniority is what makes you not put the damn password into the setup document.
That was the inexperienced level of mistake. Forgetting to replace it while
you are setting up your day-one machine is a mistake that can happen to anyone.

~~~
vinceguidry
True, but a senior engineer, even if he is never able to make architecture
decisions, can still be held accountable for knowing better. That is precisely
why they are paying him the big bucks.

If a shop is being held together with duct tape and elbow grease, then you
should have known that going in, and developed personal habits to avoid this
sort of thing. Being overworked and tired isn't an excuse. Sure, the company
and investors have to bear the real consequences, but you as an IC can't
disclaim responsibility.

------
jacquesm
This company has a completely different problem: no separation of duties.
Start with talking to the CTO how this could have happened in the first place,
re-hire the junior dev.

After all, if the junior dev could do it, so can everybody else (and whoever
manages to get their account credentials).

------
ww520
When it comes to backup, there are two types of people, ones who do backup and
ones who will do backup.

------
jjm
This is purely the fault of the entire leadership stack.

From the Sr dev/lead dev, dev manager, architect, ops stack, all the directors,
A/S/VPs, and finally the CTO. You could even blame the CEO for not knowing how
to manage or qualify a CTO. It's even more embarrassing if your company is a
tech company.

I think proper due diligence would find the fault in the existing company.

It is not secure to give production access and passwords to a junior dev. And
if you do, you put controls in place. I think if there is insurance in place,
some of the requirements would have to be reasonable access controls.

This company might find itself sued by customers for their prior and obviously
premeditated negligence from lack of access controls (the doc, the fact they
told you 'how' to handle the doc).

~~~
scarmig
The Junior dev does bear a small amount of blame, if you really want to go the
blameful route.

But figuring out who to blame is toxic. You've got to go for a blameless
culture and instead focus on post mortems and following new and better
processes.

Things can absolutely always go to shit no matter where you work or how
stupidly they went to shit. What differentiates good companies from bad ones
is whether they try to maximize the learning from the incident or not.

------
aidos
Ahhhhh haaaa yeah.....I've done that.

It was the second day, and I only wiped out a column from a table, but it was
enough to bring business for several hundred people down for a few hours. It
was embarrassing all round really. Live and learn though - at least I didn't
get fired!

------
dennisgorelik
Obviously this is mostly the CTO's screw-up.

But the junior dev is not fully innocent either: he should have been careful
about following instructions.

For extra points (to prove that he is a good developer) - he should have
caught that screw-up with the username/passwords in the instructions. Here's
an approximate line of reasoning:

\---

What is that username in the instructions for? The production
environment? Then what would happen if I actually ran my setup script against
the production environment? The production database would be wiped? Shouldn't
we update the setup instructions and some other practices in order to make sure
that can't happen by accident?

\---

But it is very unlikely that this junior dev would be held legally responsible
for the screw-up.

------
gregopet
I destroyed an accounting database at a company during a high school summer
job.

A mentor was supervising me and continually told me to work slower but I was
doing great performing some minor maintenance on a Clipper application and
didn't even need his "stupid" help ... until I typed 'del *.db' instead of
'del *.bak'. Oooops!

Luckily the woman whose computer we were working on clicked 'Backup my data'
every single day before going home, bless her heart, and we could copy the
database back from a backup folder. A 16 year old me was left utterly
embarrassed and cured of flaunting his 1337 skillz.

------
icedchai
Obviously not the new engineer's fault. Unfortunately, aspects of this are
incredibly common. On three jobs I've had, I've had full production access on
day one. By that, I mean _everyone_ had it...

------
blackflame7000
After adding up the number of egregious errors made by the company, I'd almost
be inclined to say the employee has grounds for wrongful termination, or at
least fraudulent representation to recoup moving expenses.

------
Myrmornis
Story sounds fictional to me.

------
kilroy123
He/she is better off not working at this place. So many things are wrong. Not
having a backup is the number 1 thing.

I could see having a backup that is hours old, and losing many hours of data,
but not everything.

------
pjdemers
Even startups have contracts with their customers about protecting the
customer's data. If it is consumer data, there are even stricter privacy laws.
Leaving the production database password lying around in plain text is
probably explicitly prohibited by the contracts, and certainly by privacy
laws. The CTO should pay him for the rest of the year and give him a great
reference for his next job, in return for him to never, ever, ever, tell
anyone where he found the production password.

------
codezero
Here's why I think this is fake:

A company with 40 devs and over 100 employees that lost an entire production
db would have surfaced here from the downtime. Other devs would corroborate
the story.

~~~
Analemma_
I'm also skeptical, but this isn't necessarily true. There's plenty of
software being written outside the HN bubble that's totally invisible to us.
What if this was some shipping logistics company in Texas City? We'd never
know about it; they wouldn't have a trendy dev blog on Medium.

~~~
codezero
Good point.

------
alexfi
I always wonder why IT companies don't test their backups. Even if it's the
prod db, it should be tested on a regular basis. No blame to the dev.

------
sandGorgon
We were paying for RDS right from when we were a 2-man startup. There's zero
reason not to have a DB service that is backed up frequently by a competent team.

------
watwut
He needs to return the laptop asap, like now. They are in full emotional mode
and can overreact to what they might perceive as another bad act too.

------
learntofly
I don't work in tech but I'm an avid HN reader.

I'm surprised a junior dev on his first day isn't buddied up with an existing
team member.

In my line of work, an existing employee who transferred from another location
would probably be thrown in at the deep end, but someone who is new would spend
some time working alongside someone who is friendly and knowledgeable. This
seems the decent thing to do as humans.

------
siliconc0w
Yeah this infra/config management sounds like land-mine / time bomb
incompetence territory. You just were the unlucky one to trigger it. Luckily
this gives you an opportunity to work elsewhere and hopefully be in a better
place to learn some good practices - which is really what you're after as a
junior dev anyway.

------
anorsich
Lucky junior dev! He figured out it was a bad company to work for on his first
work day. Good luck finding a new job!

~~~
falsedan
Also, this is going to look great on their resumé, and be the perfect response
to the "tell us a time when you made a mistake" interview question.

------
tacostakohashi
Everybody agrees that the instructions shouldn't have even had credentials for
the production database, and the lion's share of the blame goes to whoever was
responsible for that.

There is still a valuable lesson for the developer here though - double check
everything, and don't screw up. Over the course of a programming career, there
_will_ be times when you're operating directly on a production environment,
and one misstep can spell disaster - meaning you need to follow plans and
instructions precisely.

Setting up your development environment on your first day shouldn't be one of
those times, but those times do exist. Over the course of a job or career at a
stable company, it's generally not the "rockstar" developers and risk-takers
that get ahead, it's the slow and steady people that take the extra time and
_never mess up_.

Although firing this guy seems really harsh, especially as he had just moved
and everything, the thought process of the company was probably not so much
that he messed up the database that day, but that they'd never be able to
trust him with actual production work down the line.

~~~
wdewind
No, sorry, and it's important to address this line of thinking because it goes
strongly against what our top engineering cultures have learned about building
robust systems.

> Over the course of a programming career, there will be times when you're
> operating directly on a production environment, and one misstep can spell
> disaster

These times should be extremely rare, and even in this case, they should've
had backups that worked. The idea is to reduce the ability of anyone to
destroy the system, not to "just be extra careful when doing something that
could destroy the system."

> Although firing this guy seems really harsh, especially as he had just moved
> and everything, the thought process of the company was probably not so much
> that he messed up the database that day, but that they'd never be able to
> trust him with actual production work down the line.

Which tells me that this company will have issues again. Look at _any_ high-
functioning, high-risk environment and look at the way they handle accidents,
especially in manufacturing. You need to look at the overarching system that
enabled this behavior, not isolate it down to the single person who happened
to be the one to make the mistake today. If someone has a long track record of
constantly fucking up, yeah sure, maybe it's time for them to move on, but
it's very easy to see how anyone could make this mistake, and so the solution
needs to be to fix the system, not the individual.

In fact, I'd even thank the individual in this case for pointing out a
disastrous flaw in the processes today rather than tomorrow, when it's one
more day's worth of expense to fix.

Take a look at this: [https://codeascraft.com/2012/05/22/blameless-postmortems/](https://codeascraft.com/2012/05/22/blameless-postmortems/)

~~~
tacostakohashi
I violently agree with you.

All I'm saying is that there _are_ times when it is vital to get things right.
Maybe it's only once every 5 or 10 years in a DR scenario, but those times do
exist. Definitely this company is incompetent, deserves to go out of business,
and the developer did himself a favor by not working there long-term, although
the mechanism wasn't ideal.

I'm just saying that the blame is about 99.9% with the company, and 0.1% for
the developer - there is still a lesson here for the developer - i.e., take
care when executing instructions, and don't rely on other people to have
gotten everything right and to have made it impossible for you to mess up. I
don't see it as 100% and 0%, and arguing that the developer is 0% responsible
denies them a learning opportunity.

~~~
samstave
Well, sure... but you can't expect someone transitioning from intern status to
first-real-job status to have the forethought of a 20-year veteran. Nor should
that intern/employee expect that the company which is ostensibly there to
mentor him at the very beginning of his career would have such a poor
security stance as to put literal prod creds in an on-boarding document, let
alone fail to relegate whatever he was on-boarding with to a sandbox with
absolutely no access to anything.

------
dk8996
Cool story, but I think this is fake. Since there are 40 people in the company,
it seems like at least a few people before him followed the onboarding
instructions. I just don't believe that there would be that many people who a)
didn't do the same thing he did, or b) didn't change the document.

------
consultSKI
Repeat after me, while clicking your heels together three times: "It is not my
fault. It is not my fault. It is not my fault." It was obvious as I read your
account that you would be fired. A company that allowed this scenario to
unfold would not understand that it was their fault.

------
knodi123
I was only granted read-only access to the Prod DB last week, after achieving
6 months of seniority.

------
posterboy
I would assume this was a mock setup to test whether the intern could follow
simple instructions, to provide a lesson on the huge consequences of small
mistakes, and to have a viable reason to fire him afterwards; but I'm wearing
my tin foil hat right now, too.

------
chmike
It is really unfair to have fired him. The OP is not the one that sould have
been fired. The guy in charge of the db should be fired and the manager who
fired the OP should be fired too. And, by the way, the guy in charge of the
backups too.

------
justicezyx
Isn't this new person deserve a peer bonus by discovering a production risk?

------
laithshadeed
I would suggest you, once this sorted out, to publicly mention the company
name so no other Engineer will fail in this trap again. This will be lesson
for them to properly follow basic practices for data storage.

------
albertini_89
Unfortunately, software companies like that are everywhere, the guy is
learning and screws up a terribly designed system, the blame is on the
"senior" engineers that set up that enviroment.

------
brittonrs26
My question is, why in the world did they publish someone's production
credentials in an onboarding document? That has to be a SOX compliance
violation at the very least.

------
stevesun21
The CTO should be fired immediately!

If I didn't read it wrong, they put the production db credentials in the
first-day local dev env instructions! WTF.

This CTO sounds to me even worse than this junior developer.

------
jaunkst
So a script practically set up the machine with the nuclear football by
default, and then you were expected to defuse it before using it. That is
not your fault.

------
seattle_spring
I have a feeling the CTO was actually one of those "I just graduated bootcamp
and started a website, so I can inflate my title 10x" types.

------
feinstruktur
He should have his job title changed to Junior Penetration Tester and be
rewarded for exposing an outfit of highly questionable competence.

------
Spooky23
Firing the guy seems drastic but understandable. Implying that they are going
to take legal action against him is ridiculous!

------
b33pr
So the company's fault. Embarrassing they tried to blame the new guy. So many
things wrong with this.

------
OOPMan
Wow. What a train wreck. This is why the documentation I write contains
database URIs like:

USER@HOST:PORT/SCHEMA

------
Grazester
It was their fault, plain and simple.

~~~
madaxe_again
How is it the FNG's fault that they have no backups, no DR plan, and production
DB details freely available in the setup guide? The company is entirely at
fault.

~~~
Tharkun
I think the "their" was a plural referencing the company, not the dev.

------
askz
I really suggest that OP sends this thread to HR & others. And this isn't
sarcasm.

------
nojvek
What company is this? CTO deserves some internet slapping.

------
jksmith
Obviously not an SSAE 16 environment.

------
pier25
Either the CTO and his dev team are ridiculously stupid, or this was on
purpose.

------
ice109
Lots of people in the thread are commenting how surprised they are that a
junior dev has access to production db. Both jobs I've had since graduating
gave me more or less complete access to production systems from day one. I
think in startup land - where devops takes a back seat to product - it's
probably very common.

~~~
leokennis
I work for a large bank as OPS engineer. The idea that I could even read a
production database without password approval from someone else is too crazy
to consider. Updating or deleting takes a small committee and a sizeable
"paper trail" to approve.

Sometimes when I read stories like these, I think it's no wonder a company
like WhatsApp can have a billion customers with less than 100 employees. And
then I make some backups to get that cozy safe feeling again.

~~~
mrtksn
Probably because your industry is regulated.

------
sitepodmatt
Name and shame. The CTO stinks of incompetence, and I'm surprised he/she has
managed to retain any competent staff (perhaps he/she actually hasn't). What a
douchebag. You are not to blame.

------
draw_down
I worked with someone who did this, early in my career. His bacon was saved by
the fact that a backup had happened very soon before his mistake.

His was worse though, because he had specifically written a script to nuke all
the data in the DB, intending it for test DBs of course. But after all that
work, he was careless and ran it against the live DB.

It was actually kind of enlightening to watch, because he was considered the
"genius" or whatever of my cohort. To wit, there are different kinds of
intelligence.

------
jlebrech
I fucked up a table once by setting a column of every record to true, but I
had asked about changing the code to require a manual SQL query a few weeks
prior, so it could have been prevented.

------
holydude
People are designed to make mistakes. We should learn from them and try to be
more understanding.

------
gonzo41
Understandable that hey got fired. I image there would have been a quite
emotional response from the business when this happened, but that doesn't mean
it was necessarily the most appropriate response.

\--Unfortunately apparently those values were actually for the production
database (why they are documented in the dev setup guide i have no idea).--

Someone else should have been fired if this is true.

~~~
jacquesm
No, it's inappropriate. When the process fails you don't fire the junior
employee that showed you just how incompetent the organization is. You fix the
problem.

Firing this guy does nothing, fixing the problem does, but requires those
higher up to admit the mistake was theirs to begin with.

~~~
gonzo41
I should clarify, I don't think its appropriate, But I see why it happened.
The business was panicking. His firing was kinda on the cards. Probably a
blessing in disguise for this guy.

