

Who Needs Ops Anyway? - jrussbowman
http://joerussbowman.tumblr.com/post/51388938822/who-needs-ops-anyway

======
duopixel
I'm _that guy_ whom the post is talking about. I decided against weighing in
on the original thread on HN because I was still dealing with the fallout. A
couple of things that I noticed:

* Our users were _very_ understanding of what happened. We have received nothing but encouragement to keep on working.

* Some comments on HN were nasty. I'm glad to be 33 and not 23. Otherwise, I could have been driven away from building my product because of my own incompetence.

* Many commented on devs vs ops. The way I see it, I can ask a dev to supervise the work of an ops contractor. I can't hire ops + dev at this stage.

Any start-up has three main constraints: time, money and talent. These are not
set in stone: you can use time to produce money (consulting), you can use
money to buy talent (hiring), and you can even convert time into talent
(training).

So when people say "let professionals handle it," well, no: my particular set
of constraints won't allow me to do that. My budget for this is around
$100/mo. If I completely ran out of money, I'd have to take down the site
indefinitely, which would have the same effect as a hard disk loss.

It is clear now that I lack the resources to run a complex app reliably. My
focus in the next few months is procuring those resources (money) so I can put
them back into the product (ops and devs).

~~~
lsc
> So, when people say "let professionals handle it". Well, no, my particular
> set of constraints won't allow me to do this. My budget for this is around
> $100/mo. In an event where I completely run out of money I'd have to take down
> the site indefinitely, which causes the same effect as an HD loss.

Well, drat. I was talking about launching a "tested backups" service, but it's
just not really worth my time until you get to the $500/month level or so, and
I'd probably want a setup fee on top of that.

(For that, I'd give you a full working replica of your production site,
hosted on my infrastructure: something you could cut your DNS over to in an
emergency and just run with, and something you could check on at any time by
going to yourdomain.backups.prgmr.com or similar. This would require setting up
some sort of replication of your database. It also means I'd need to know your
application well enough to figure out how to keep the running backup from
conflicting with the primary, how to cut over to the backup /as/ a primary,
and how to cut back.)

I mean, I can do basic backups really cheaply, but testing them? That... that
takes effort. Effort and understanding the application. And untested backups,
meh, there's no reason for you to pay me to do those (maybe you'd pay me for
space, but that's the cheap part). There are thousands of services that will
cheaply give you a place to hold files.
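The difference between a backup and a _tested_ backup can be made concrete. The sketch below is an illustration, not lsc's proposed service: it assumes a site whose state is plain files, archives them, then restores the archive into a scratch directory and compares SHA-256 digests against the live tree. The function names (`make_backup`, `verify_backup`, `tree_digests`) are hypothetical, and a database would need a stronger test (restore the dump into a throwaway instance and run real queries against it).

```python
from __future__ import annotations

import hashlib
import tempfile
import zipfile
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file under root (as a relative POSIX path) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def make_backup(source: Path, archive: Path) -> None:
    """Write every regular file under source into a compressed zip archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(source).as_posix())


def verify_backup(source: Path, archive: Path) -> bool:
    """Restore the archive into a throwaway directory and check it matches source file-for-file."""
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(scratch)
        return tree_digests(Path(scratch)) == tree_digests(source)
```

Run on a schedule against the most recent archive, a `False` result (or an exception while opening a damaged archive) is the signal an untested backup would never give you.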

Huh. Most of the work, on my end, would be up front. What if I charged you
$100/month, but made you prepay a year in advance or something? That might be
worth it for me (assuming I had the option to back out and refund your money
within the first X days should your application prove too difficult to
replicate).

~~~
jnw2
If the production server is running as a VM to begin with, how hard is it to
take a snapshot of the whole VM and replicate that? (It does sort of break
down if you have a database that's continuously updated and you want to try to
have a continuously updated copy of the database, though.)

~~~
gaius
How hard is it? Well, about as hard as writing bug-free code is. You don't
need testers, right? Just don't put bugs in in the first place! Or is it not
so easy after all?

------
venomsnake
The problem is that admins provide somewhat less visible value.

A programmer can build, in an afternoon, a feature that can be sold for $100K+.
And get the credit, glory and chicks. Or at least part of the credit.

To properly value an admin, you have to have had your ass pulled out of the
fire by a good one a couple of times. Which comes with experience.

~~~
sdfjkl
The biggest problem is that when you have a good admin, things will run
smoothly, but few will attribute that to his work.

~~~
KaiserPro
The classic story: a CEO walks around the office floor and notices how
wonderfully clean the place is. This CEO is all about efficiency, so he trots
into his office and phones up the CFO. He asks how much is spent on cleaning
and, surprised at the cost, decimates the cleaning budget.

In a brief email to the board, boasting of another hard day's work making the
company leaner and faster, he says: "The office is already squeaky clean, why
would we need cleaners?"

~~~
lsc
Yeah, there's a common sysadmin joke about how if the CEO knows you exist, you
aren't doing your job.

------
ownagefool
"I did some math and I found out that using a third party email delivery would
be more expensive than the server"

The crux of the problem is that the napkin math was wrong, because he
obviously forgot to (a) include his billable time spent learning this stuff and
(b) include the cost of additional staff to manage it.

Personally, I think there are tons of devs who can be good at ops; you just
need to quickly realise that many aspects of ops are essential, non-optional
functions of your job role, and should sit behind the writing of new code if
your company has anything of value.

Once you realise this, you can quickly see that although a dedicated server
is cheaper than paying for an email service, the time sink and required
technical knowledge will quickly more than cancel out the savings. So this
isn't really about ops at all, except in the sense that an ops guy wrote it.

~~~
nulagrithom
That line truly astounded me.

If "opening ports" is on the pain list, running your own mail server is going
to feel like running a marathon while having a seizure.

~~~
duopixel
You are nitpicking; opening ports was a Google search away. I only mentioned it
to give an idea of my experience doing ops at the time (nothing beyond shared
hosting).

~~~
nulagrithom
Substitute "installing SSL certificates" or "tweaking config files" for
"opening ports" if you'd like; there's no nitpicking here. Don't feel insulted,
though: my point is that running a mail server is an awful experience even
when you do ops for a living. It may look cheaper on paper to DIY, but in
practice it can be better to save your sanity instead of the money.

------
jmspring
I've found that there is a mindset out there that ops should be the sole
responsibility of developers: they wrote the code, so they should own the
running of the service. This mindset falls down for two key reasons:

1) Not all developers are good at ops. Issues aren't always related to the
code. Platform choices come into play and all the characteristics of a
particular platform aren't necessarily known to developers.

2) In companies where developers are pushed hard -- It's a startup! We must
deliver! (the typical death march) -- after 60+ hour weeks, one's
troubleshooting skills aren't the best.

In the last 3-4 startups I've been involved with, I've been one of a small
handful of people who can do ops as well as code (across a myriad of
technologies). Where the company deems ops relevant, I've pushed for
engineering to provide as much logging, documentation, and guidance to ops as
possible; where ops is considered part of dev, I've pushed for getting some
basic ops in-house (the two instances I can think of were the 'weekends are in
the schedule' type startups).

~~~
hijinks
Well said. I'm an ops guy who likes working with small companies, and I've
come into a lot of startups where dev did the ops and it was a mess. No one
knew whether the backups worked, monitoring was a mess at best, and the way
things were set up, nothing would scale at all.

------
jiggy2011
As someone who deals with mainly dedicated servers at the high end and shared
hosting at the low end, I've been curious as to what sort of things cloud
hosting like EC2 makes easier to the point you don't have to think about it.

As I understand it, they sell VMs running some form of Linux that can have
more RAM/storage/bandwidth dynamically allocated which takes out hardware
related worries.

In terms of software though, there are still quite a few problems that need to
be thought about.

For example:

* How do we backup files and databases? Where to, and how often? Are we just duplicating every so often or do we want snapshots at certain periods that we can revert to?

* How do we deal with software failures, like FS corruption?

* How do we update our software stack, when do we update it and how do we test that an update hasn't broken anything?

* What if we want to concurrently run 2 versions of the same framework for different apps?

* How can we configure firewalling etc. to allow trusted people to connect to the database, but block the people who spam the login form every 5 seconds?

* How do we make sure the software is configured correctly? Like charset encodings in the database, making sure that we have the correct modules installed into Apache/PHP or the right gems installed etc.?

* How do we manage background tasks, like cron jobs etc.?

* How do we manage alerts when things fall over? Nagios etc.
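One common answer to the "snapshots at certain periods that we can revert to" question is a tiered retention policy. As a sketch (the function name and the 7-daily/4-weekly defaults are arbitrary assumptions, not a recommendation from the thread), this keeps the newest snapshot from each of the last few calendar days and from each of the last few ISO weeks; anything not returned would be pruned.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set


def snapshots_to_keep(snapshots: Iterable[datetime], now: datetime,
                      daily: int = 7, weekly: int = 4) -> Set[datetime]:
    """Return the snapshot timestamps a simple daily + weekly retention policy keeps."""
    # Calendar days and ISO (year, week) pairs that are still inside the retention window.
    recent_days = {(now - timedelta(days=i)).date() for i in range(daily)}
    recent_weeks = {tuple((now - timedelta(weeks=i)).isocalendar()[:2]) for i in range(weekly)}

    newest_per_day = {}
    newest_per_week = {}
    for ts in snapshots:
        day = ts.date()
        week = tuple(ts.isocalendar()[:2])
        if day in recent_days:
            newest_per_day[day] = max(ts, newest_per_day.get(day, ts))
        if week in recent_weeks:
            newest_per_week[week] = max(ts, newest_per_week.get(week, ts))

    # A snapshot may be the newest of both its day and its week; the set de-duplicates.
    return set(newest_per_day.values()) | set(newest_per_week.values())
```

Because the policy is a pure function of the timestamps, it can be tested before it is allowed to delete anything, which arguably matters more than the exact tiers chosen.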

A lot of the answers to these are going to depend on the specific requirements
of the project, so they are going to require some ops know-how to set up
correctly. Or is it more the case that a cloud provider gives you a specific
setup with limited options and you make everything fit around that?

Or is there some magic that goes on which I am missing?

~~~
jd007
Amazon certainly makes a lot of things easier. For example, you still have to
back up your database and files manually, but storing those files is
simplified (S3, which is arguably one of the safest places to store files in
terms of durability).

Things like dealing with software failures, updating stack versions, and
configuring the DB are pretty much the same with EC2 as they are with dedicated
hosting. There are Amazon products that help (e.g. the CloudWatch monitoring
service), but mostly you will be doing it yourself the same way you would on
your own hardware. Because EC2 is a cloud VM host, some of these things are
more convenient to handle than on your own hardware. For example, everything
can be done from the EC2 API, so you can programmatically spin instances
(machines) up and down as things fail to keep everything working. But of course
you need to set up this failover/auto-scaling system yourself (the API just
lets you control the infrastructure).
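To make the "you need to set up this failover/auto-scaling system yourself" point concrete, here is a minimal sketch of one pass of a self-healing loop. It is hypothetical: `is_healthy`, `launch` and `terminate` are placeholders you would wire to a real health check and to your provider's API (on EC2, for example, boto3's `run_instances` and `terminate_instances`), and it ignores much of what makes this hard in practice, such as flapping health checks, failed launches and data that lives on the instances.

```python
from typing import Callable, List


def reconcile(
    desired: int,
    instances: List[str],
    is_healthy: Callable[[str], bool],
    launch: Callable[[], str],
    terminate: Callable[[str], None],
) -> List[str]:
    """Run one reconciliation pass and return the IDs of the instances now in the pool.

    Unhealthy instances are terminated, replacements are launched until the pool
    reaches `desired`, and any surplus healthy instances are terminated.
    """
    healthy = []
    for instance_id in instances:
        if is_healthy(instance_id):
            healthy.append(instance_id)
        else:
            terminate(instance_id)

    # Scale out: replace what was lost (or grow to a new target).
    while len(healthy) < desired:
        healthy.append(launch())

    # Scale in: drop extras from the end of the list.
    while len(healthy) > desired:
        terminate(healthy.pop())

    return healthy
```

You would call this from a scheduler every minute or so; the real engineering lives in the placeholder functions, which is the ops work the comment says EC2 leaves to you.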

This is the case with IaaS services like EC2 and Rackspace (you only get bare
VMs with some extras), but if you move to a more hand-holding PaaS service
such as Heroku, where you get the entire deployment system and failure
handling system, then your software stack management and failovers are mostly
handled by the service provider. Of course, these services cost a lot more for
an equivalent amount of compute power than IaaS services do.

~~~
jiggy2011
I see, sounds mostly similar to just renting a VPS from one of a variety of
providers then in that it doesn't dispense with ops stuff.

I did evaluate S3 for backups for one project, but concluded that an rsync
script would be simpler and more portable.

In the case of things like Heroku, how are software updates handled? Do you
contact them and say "I want to update to Rails version x.x, do it and run
these automated tests" or do they just do everything on a schedule?

In other words, if you want an extra feature that is only present in a newer
version, is this possible? What I would also worry about is them doing a random
upgrade at an inconvenient time (like during a launch of something) and it
breaking something subtle.

~~~
jd007
Yeah, EC2 is basically a VPS rental, but with a number of services surrounding
it to help with ops. So it is more convenient than a pure VPS service, but it by
no means "takes care" of ops for you.

With Heroku you use files to specify your dyno configuration, and in the
runtime file you can explicitly set the version of software to use. Heroku
will read the file and take care of everything for you. Whether or not the
version switch breaks your application code is something you have to test on a
dev dyno first; there is no way Heroku can help you with that. They will not
touch installed software versions without your explicit consent. They might
change the default runtime software versions, though, so you should always
explicitly state the version you want.

------
ajhit406
Good article, and I wrestle with this constantly. But the truth is,
(Specialized) Knowledge and Curiosity are rabbit holes of unimaginable depth.
They can unravel your ambitions for a product when not managed correctly.

The trend in software development has always been to abstract policy, process,
and infrastructure into modular components when possible, and to allow experts
to manage them. I think the demand for such services largely proves their
efficacy in the marketplace.

The argument over whether a developer _should_ learn ops is interesting, as
the answer differs depending on what she intends to get out of building the
application.

Remember, most applications (though not all) are intended as business
endeavors. I think you need to look in the mirror and ask yourself-- "Am I
building this application to serve a consumer's need? Or to become a better
programmer / operations / systems engineer?"

In the case you're building something for a customer, time is your biggest and
most important resource. Don't squander it by prematurely optimizing things.
While it is admirable (and sometimes scalable) to invest in expanding your
knowledge sphere, this often isn't the smartest business decision. Truth is,
no matter how good you get with AWS, there is almost always someone
else out there who is better than you and is offering their knowledge and
experience as a service. And I can almost guarantee that your time (as a
founder) will always be worth more than what this service costs.

~~~
duopixel
I am not sure these are rabbit holes. One can certainly get lost in the depths
of a new discipline, but if you take a look around, new ideas start sprouting.

I am an interaction designer, for example, but working with email got me
thinking about ways of using it to support the interface. Instead of sending a
reminder email to inactive users ("we miss you!"), you can send them a little
interaction ("is this challenge good? yes or no"). I doubt that insight would
have come if I hadn't worked with email on the technical side.

------
chuhnk
I'm a sysadmin by profession, but when I write code I don't want to have to
worry about backups, scaling, etc. Those things get in the way of creating the
product. What we deem "cloud" infrastructure has gotten us partially there
by eliminating the need to think about scaling the service and allowing that
to happen dynamically on Heroku or AWS. There are obviously still some issues
with availability within zones or regions, but the layers of abstraction over
hardware have come a long way. Automated backups are definitely being explored
by cloud providers, but I don't think it's a guaranteed service as of yet. In
the next few years we won't even have to worry about that. We'll just be able
to go back to earlier versions of data because of the way data is already
snapshotted in certain datastores.

Although I guess people still forget that anything can and will fail at some
point. That automated backup won't work one day: it'll be corrupted or it
just won't run.
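That closing point suggests monitoring the backups themselves, not just the servers. A minimal sketch, assuming backups land as files in one directory (the function names, the 26-hour threshold and the zero-byte check are all illustrative choices): report a problem if there is no backup, if the newest one is older than a day plus some slack, or if it is empty. This only catches a job that silently stopped running or produced nothing; detecting corruption still needs a restore test.

```python
import time
from pathlib import Path
from typing import List, Optional, Sequence, Tuple

# (file name, modification time as a Unix timestamp, size in bytes)
BackupEntry = Tuple[str, float, int]


def scan_backups(backup_dir: Path) -> List[BackupEntry]:
    """List the regular files in backup_dir with their mtime and size."""
    entries = []
    for path in backup_dir.iterdir():
        if path.is_file():
            st = path.stat()
            entries.append((path.name, st.st_mtime, st.st_size))
    return entries


def backup_problems(entries: Sequence[BackupEntry], now: Optional[float] = None,
                    max_age_hours: float = 26.0) -> List[str]:
    """Return human-readable problems with the newest backup; an empty list means OK."""
    now = time.time() if now is None else now
    if not entries:
        return ["no backup files found"]
    name, mtime, size = max(entries, key=lambda e: e[1])
    problems = []
    age_hours = (now - mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"newest backup {name} is {age_hours:.1f}h old")
    if size == 0:
        problems.append(f"newest backup {name} is empty")
    return problems
```

Wired into something like Nagios, a non-empty list would map to a CRITICAL result (exit status 2 under the Nagios plugin convention), e.g. `backup_problems(scan_backups(Path("/var/backups/db")))`, where the path is only an example.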

~~~
dedward
I think this is a common fallacy.

To randomly pick an example - sure, automatic filesystem snapshots are a
cakewalk these days, and a decade ago they were rather expensive. It seems
logical to assume that things that we needed admins to do and rig up somewhat
delicate systems for a decade ago are so easy now, we don't need people to
focus on that...

This overlooks the fact that the baseline has just moved ahead. Sure, you
don't need dedicated people for the stuff you used to... but there is new stuff
out there that your competitors are hiring dedicated people to work on and push
the envelope with. If you're okay with doing what you could have done 10 years
ago, just with reduced staff, that's great, but it's not going to win you much.

~~~
philwelch
Is operations really going to be a competitive advantage for you? If my
startup lets you hail cabs from your smartphone, is an extra 9 of uptime
really going to help me beat Uber?

~~~
nasalgoat
My company's space is pretty crowded with competitors, but we're beating them
because our service returns results in 78ms, while our competitors, doing
devops and running on Azure etc., return results in 2500ms.

That's because we have a dedicated ops team who concentrate on performance and
scaling, and leave the devs to write code instead of managing servers.

~~~
philwelch
Surely the devs focus on performance and scaling as well, if not at the level
of managing servers? Having a dedicated ops team is a necessity, but it's
important to keep developers in the loop on operations as well.

~~~
nasalgoat
As the architect of the systems, I focus the designs on scalability and
performance, and the devs implement them. So whether it's a dev or an
architect, someone has to be thinking about it. But having an ops background
helps with knowing how to scale.

------
makerops
I am a sysadmin and developer by day/night; I am currently putting together my
initial videos for <http://makerops.com>, which I hope will be the Railscasts
of bridging the gap between dev and ops, for individuals/startups/people who
don't have a lot of money to spend. I'll also be doing some stuff for large
companies looking to transition their organizational structure to a "devops"
shop, which I hope helps me launch a consultancy. There are a lot of things
that developers can do to ensure they can scale, stay up, and not lose
everything. Worst-case scenario, use a platform as a service (Heroku etc.) and
outsource all of the add-ons, such as mail, logging etc. Best-case scenario,
write all your infrastructure as code, and be able to deploy to N clouds, or
even bare metal. Either way, I hope I can learn how to produce a screencast
quickly enough to get this damn thing launched.

------
gaius
If you think devs should do ops, you probably also think devs should be
writing their own compilers too.

~~~
jd007
In a startup you may not have the luxury of affording both devs and ops. Most
of the time you end up having a devops guy who does a lot of both. Compilers
are the same regardless of who's using them, but ops needs to be tailored for
specific use-cases, which is very different.

~~~
nasalgoat
Just like how a 3-in-one printer/scanner/fax machine does none of those jobs
well in comparison to separate dedicated devices, you get what you pay for.

------
SudoAlex
If you've spent the hours learning to code, and even more time learning how to
host your brand new app on a server... just spend that little bit of extra
time to get those backups up and running - don't launch without it.

Your business could potentially be dead overnight without backups.

This doesn't just apply to the operations of your site; it also applies to
every bit of data related to your business. Do you have backups of everything
on your local machine? Do you also have backups in a remote location in case
your local backup gets destroyed/stolen? If not, then stop everything until
you've put a plan in action.

Who needs backups anyway? Everyone does, unless you don't care about your
data.

------
chacham15
While ops certainly has its place (and an important one at that), there has to
be a line below which it simply isn't reasonable to hire a dedicated ops
person. For example, if you have no users, you don't need to worry that servers
never go down, that email isn't being routed, that user data is backed up, etc.
Speaking as someone building a company with no dedicated ops, I think that the
line is somewhere around: as soon as you can afford it. Before that (i.e.
pre-revenue), it seems like premature optimization. On the other hand, backups
(especially of the software) are not inherently an ops issue. Developers deal
with backups all the time (e.g. git, svn, cvs, dropbox, etc.).

~~~
bifrost
> Developers deal with backups all the time (e.g. git, svn, cvs, dropbox,
> etc.).

Those are not backups; they are merely information stores. I have seen time
and time again that restoring a site from zero is extremely painful due to how
poor most developer tools are. In fact, some tools that are designed to "make
things easier" actually don't make them easier for ops folks when things
break. A personal bugaboo is startup scripts that aren't "portable" between
shells. That might sound overly nitpicky, but when you discover that your
software isn't starting due to differences in environment variables between dev
and prod environments that should never have been there, it'll make more sense.
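One way to catch the "works in dev, won't start in prod" class of failure described here (a suggested mitigation, not something from the comment) is to have the application check its required environment at startup and refuse to start with a clear message, instead of failing somewhere deeper. A minimal sketch; `require_env`, `ConfigError` and the variable names in the example are hypothetical:

```python
import os
from typing import Dict, Mapping, Optional, Sequence


class ConfigError(RuntimeError):
    """Raised when required settings are missing from the environment."""


def require_env(names: Sequence[str],
                environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required variables, or raise listing every one that is unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ConfigError("missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Calling something like `require_env(["DATABASE_URL", "SMTP_HOST"])` as the first thing the app does turns a confusing half-start into one log line naming what prod is missing. Reporting all missing names at once, rather than only the first, saves a redeploy per variable.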

------
foxhop
Thanks for writing this, I have been thinking the very same thing for a while
now. This post inspired a post of my own
([http://russell.ballestrini.net/honey-i-just-deleted-linkpeek...](http://russell.ballestrini.net/honey-i-just-deleted-linkpeek-com/))
which talks about how I recovered from a catastrophic failure because I
had backups.

------
atsaloli
Whether or not you have a dedicated ops team, you may benefit from
establishing Ops - here is a guideline - Tom Limoncelli's Ops Report Card
<http://www.opsreportcard.com/>

