
Dear Staging: We’re Done - ithkuil
https://devops.com/dear-staging-were-done/
======
linsomniac
If you spend $100K on a staging environment and it doesn't give you confidence
that things won't fail in production, the problem is your staging, not the
concept of staging.

If we have something fail in production, the first question we ask ourselves
is: Why didn't we catch this in staging? 90% of the time, the answer is: "We
didn't test it properly but thought we did."

Our rate of patching for defects is low.

As far as the cost, our staging is run in hardware retired from production,
with some upgrades (mostly RAM and SSDs). I'm not sure we've spent the author's
$100K over 2 decades, let alone in a single year. But YMMV.

~~~
hinkley
Everybody is in the cloud now, where there are no hand-me-downs, or at least
none we get to use.

~~~
birdyrooster
A year and a half ago, IBM did a study and found that most companies are only
20 percent to cloud transformation. The reality is that there is a ridiculous
amount of private datacenters and iron around.

[https://www.ibm.com/blogs/cloud-computing/2019/03/05/20-percent-cloud-transformation/](https://www.ibm.com/blogs/cloud-computing/2019/03/05/20-percent-cloud-transformation/)

~~~
hbogert
This makes it sound like 100% is the goal. Some companies don't get added
value by going to the cloud. The cloud isn't cheap for example (among many
others) for storage heavy workloads. And sometimes it's also a regulatory
thing.

~~~
birdyrooster
Thank you for saying this, I am more than frustrated by that mentality
whenever I go to conferences or message boards. There’s a big world out there
and most of it does not resemble the practices of internet companies. It’s this
fallacy of inevitability for cloud.

------
awill
>>"With feature flags, I can safely test in production without fear of
breaking something or negatively affecting the customer experience."

That's not true at all. It is a common misconception that feature flags have
no risk.

~~~
hinkley
I suspect it’s the same quality of thought that can’t fathom that a test could
have a bug.

Implementing a flag takes fewer lines of code than the feature, and so while
the odds are very high that the bug will be within the feature, there is a
non-zero chance that the bug is in the flag.

A bug might break a new feature. It might break the old behavior. It may turn
the system into a latch (you can launch it darkly, turn it on, and then not be
able to turn it off).

And then you can break the system while retiring the flag and making the new
path permanent.
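That latch failure mode can be sketched in a few lines of shell; the flag name and render functions here are made up for illustration, not taken from any real system:

```shell
# Hypothetical sketch of the "latch" bug: the flag is read once at
# startup and cached, so flipping it off later has no effect.
FLAG_NEW_PATH=on            # flag turned on at launch

# Correct: consult the flag on every call.
render() {
  [ "${FLAG_NEW_PATH:-off}" = on ] && echo new || echo old
}

# Buggy: the flag value is captured once, at startup.
FLAG_CACHED="${FLAG_NEW_PATH:-off}"
render_latched() {
  [ "$FLAG_CACHED" = on ] && echo new || echo old
}

FLAG_NEW_PATH=off           # operator tries to roll back

render          # prints "old": rollback worked
render_latched  # prints "new": stuck on the new code path
```

The buggy version behaves identically to the correct one right up until someone needs to turn the flag off, which is exactly why this class of bug survives testing.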

~~~
blntechie
> I suspect it’s the same quality of thought that can’t fathom that a test
> could have a bug.

I have to admit that I often find it funny when some people talk about unit
tests being the be-all and end-all. No manual QA needed, no staging needed:
tests passed and everything is good to roll to production.

Unit tests are important, no question, but a test is still a piece of code
which could have bugs or not enough scenarios covered.

------
dijit
The number of bugs we catch in staging is absolutely staggering.

From services that suddenly require new ports to be open through VPNs
(development happens without intercontinental VPNs, of course), to
command-line options not correctly propagating to our hardened variant of
configs, to latency or bandwidth issues and timeouts due to bad assumptions.

There’s a difficult balance to strike when it comes to ease of development and
not letting issues (security or issues with distributed systems) propagate to
the customer.

Staging is also the place where you can safely let the chaos monkey reign
supreme without fear of losing customer data! It’s the last place to test the
“fit” of your system.

Staging has a place, and if puff pieces like this cause the company as a whole
to downplay funding for it, then I’m going to be mighty displeased.

------
JMTQp8lwXL
> When I test my code, I want to know that it works in production. Staging,
> you can’t give me that.

This sounds like more of a problem with the author's staging environment than
a general issue with the concept of staging environments.

The idea is that staging and production should mirror each other as closely as
possible. If your staging environment is wildly different, of course it will
be less useful. But it's also your responsibility to maintain a useful-enough
staging environment. Why not just say, "it's too much effort for me to
maintain a separate, identical environment" rather than dismiss the concept
entirely?

~~~
hinkley
It sounds like a problem with the author, one probably drilled into their head
by management.

The longer you are here, the more weird shit you see, and the definition of
“know” gets a lot murkier.

You aren’t going to “know” if it works in production, even if you put it into
production. Which is something that only makes sense about the fourth time you
encounter a piece of shipping code that is so fundamentally rotten that you
are not sure how it ever came up with the right answer.

You don’t know if the whole car is working. You only know it’s moving in the
direction you wanted it to move, and it hasn’t exploded, made horrible sounds,
or started to smell. So it’s mostly working.

Every tool and process should be working to make you less afraid of trying
things (like deploying to prod). Otherwise they’re just helping you build a
bureaucracy instead of software.

~~~
sjellis
Yes, it's about confidence levels.

There will always be _known_ differences between staging and production, e.g.
service names, networking.

The point of staging is that it lets you run risk-free tests on a set of
things, in a way that gives you confidence about the things that are the same
as production. The fact that a set of things works in staging is never an
absolute guarantee that it will work in production.

If you are in the position of needing to ship releases rapidly, and you can't
run meaningful tests in the available time, then no, it's not useful.

------
fastball
This is a joke right?

Staging is supposed to be _architecturally_ identical to prod; that doesn't
mean it should be _actually_ identical. In practice, this means that if your
prod has 10 API servers that are load-balanced behind HAProxy or something,
you can probably get away with only having 1-2 API servers in your stag
environment, as long as they're still behind HAProxy. If your prod PostgreSQL
primary has 64GB of RAM, in stag you can probably get away with 4GB (or less).
The key here is to have the same architecture, but at a smaller scale (because
you're not actually trying to serve your customers).

From there, if you have a staging env that costs "$100k", but you don't "infra
as code" (with Ansible, K8s, Chef, Puppet, Docker, etc), you're doing it
wrong. The difference between deploying to stag and deploying to prod should
probably be a one-line change.
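That "one-line change" can be sketched as a deploy entry point parameterised only by environment; the `inventory/` layout and `site.yml` playbook name are assumptions, and the function echoes the command instead of running it so the sketch stays side-effect free:

```shell
# Hypothetical sketch: one deploy script, the environment is the
# only thing that varies between stag and prod.
deploy_cmd() {
  env="${1:?usage: deploy_cmd staging|production}"
  echo "ansible-playbook -i inventory/$env site.yml"
}

deploy_cmd staging     # prints the staging deploy command
deploy_cmd production  # identical except for the environment name
```

The same idea works with Terraform workspaces, K8s contexts, or Docker Compose profiles: the environment is data, not a separate codebase.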

~~~
skissane
> In practice, this means that if your prod has 10 API servers that are
> load-balanced behind HAProxy or something, you can probably get away with
> only having 1-2 API servers in your stag environment, as long as they're
> still behind HAProxy.

I think if you have more than one instance of a component in production, you
really need to have more than one instance in staging. You can get a lot of
issues with more than one node (e.g. session cookie stickiness in load
balancers, clustering framework issues, replication issues) which you won't
get with only one. I've seen before people get burnt by production issues
which they can't reproduce in staging because it only has a single node for
everything.

~~~
fastball
Yeah, fair point; one is probably too aggressive with the downscaling. Though
a stag environment with an LB and 1 API server will still help you catch
problems that you otherwise might not have caught without staging at all.

------
colinbartlett
It should be noted that the author is Developer Advocate at Split.io, a
commercial feature flagging SaaS product.

~~~
hardwaresofton
So with this piece of context, this is essentially content marketing?

~~~
rorykoehler
Isn’t everything content marketing in one way or another?

~~~
verroq
But some are more content marketing than others.

------
rgoulter
Well, it's nice that the author is considering the costs compared to the
benefits drawn from doing something "just because".

I don't think the article does a convincing job of arguing that releasing
behind feature flags would be much easier to maintain than a staging
deployment, though.

To weigh this properly, it'd be worth seeing why the costs can't be lowered,
why the effectiveness of staging can't be increased, or what cases might be
difficult to handle with feature flags.

------
anonytrary
_Why not both?_

As others have said, staging is useful for certain types of changes that may
be high-risk and not time-sensitive. For other changes that are time-sensitive
and are low risk (like changing some incorrect/offensive copy), being able to
bypass staging is also useful. I don't mean to disrespect the author but it
feels like the article was designed to take an extreme stance for views. It
even reads like clickbait. I bet this article leaves a bunch of people
believing that using an intermediate testing environment can never be useful.

------
benjanik
I know we're mentioning the $100k number, but I'm genuinely curious: how do
you spend $100k on staging? I currently spend $75k/year on production,
$3k/year on staging, and around $2k/year for a full cloud environment per
developer that wants one. For me, $100k for staging would mean >200k monthly
active users on staging.

~~~
true_religion
You want staging to be as close to production as possible, but obviously not
exactly the same size.

If you spend $1M on production, you might have a staging environment 1/10th of
that size, say 10 machines in a cluster instead of 100. But some technologies
also have minimum implementation cost requirements per tier of usage. For
example, if you have multiple data centers in production, you will need to
test that in staging. You can’t just have 1 machine in a cluster; your staging
environment needs at least 3 to make a quorum. Therefore, to have 5 data
centers in staging, you now need 15 machines... which is more than the base of
1/10th of production.

Issues like that can balloon staging costs if you try to properly implement
it.
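The arithmetic above can be sketched as a small function; the 3-node quorum floor and the 1/10th rule are the figures from the comment, not universal constants:

```shell
# Staging size is the larger of the 1/10th-of-prod rule and the
# per-datacenter quorum floor (3 nodes each).
staging_size() {
  datacenters=$1; prod_machines=$2
  tenth=$(( prod_machines / 10 ))
  quorum=$(( datacenters * 3 ))
  [ "$quorum" -gt "$tenth" ] && echo "$quorum" || echo "$tenth"
}

staging_size 5 100   # prints 15: the quorum floor wins over 1/10th
staging_size 1 1000  # prints 100: here the 1/10th rule dominates
```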

~~~
fastball
Honestly with the right architecture you shouldn't even need 10 machines in a
cluster, 2 should do. After all, the whole point of clusters/load-balancing
etc. is that 2 machines behind your LB should function the same as 1000
machines behind your LB, just the latter gives you a lot more throughput.

------
lazyant
This article is too simplistic to take seriously. Basically it says: "staging
is expensive (but $100k!?) and is not the same as prod because of the data
(how many tests actually fail because of this?), so I'll use feature flags and
test in prod".

~~~
blntechie
Data consistency between prod and stage is also simple to solve for use cases
where data sensitivity isn’t an issue. For example, we have a job which
restores each day’s prod backup to staging.

~~~
devxpy
I do this; it's easy to implement, and it works pretty well:

    
    
      # Rebuild staging from the prod dump: drop, recreate, restore.
      dropdb $PGDATABASE || true
      createdb -T template0 $PGDATABASE
      pg_dump $DUMP_DATABASE | psql -q $PGDATABASE

~~~
devxpy
Whoa, am I doing something wrong? Please share thoughts.

------
lomkju
You can't test everything in production.

Be it an infra change or application. It should always be like:

Staging -> Canary -> Production

If you invest time in making the above pipeline automated, it will cause fewer
outages.

P.S. We use feature flags, but they won't solve many other challenges around
infra and application changes.

Some examples:

\- Changing from Classic ELBs to ALBs (we care about latencies, so ALBs served
1% of traffic at the start)

\- Zero-downtime single MySQL master upgrade (ProxySQL)

\- New k8s controllers

\- New metrics adapters

Staging should be like a playground to test all edge cases before you go to
canary and then to prod.
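The promotion order above can be sketched as a loop that halts at the first failing stage; `deploy_to` and `run_smoke_tests` are stand-in stubs here, not real commands:

```shell
deploy_to()       { echo "deploying to $1"; }     # stub
run_smoke_tests() { echo "smoke-testing $1"; }    # stub

# Promote through each environment; stop before the next stage
# if either the deploy or the smoke tests fail.
promote() {
  for env in staging canary production; do
    deploy_to "$env" || return 1
    run_smoke_tests "$env" || return 1
  done
}

promote   # walks staging -> canary -> production
```

In a real pipeline the stubs would be replaced by your CI system's deploy and verification steps; the point is that production is only ever reached through the two gates before it.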

~~~
trynewideas
> Be it an infra change or application. It should always be like:

> Staging -> Canary -> Production

Ironically, the author says exactly this when she presents on it. Why she
wrote this article in this fashion is beyond me.

[https://www.youtube.com/watch?v=adPQCuotAr4](https://www.youtube.com/watch?v=adPQCuotAr4)

~~~
joshuamorton
The article is a lot more recent than that presentation; it's possible the
author's opinion has shifted.

~~~
loopz
So to "solve" her staging problem, she pushes the complexity of multiple
feature flags and branching onto developers? Sounds a bit short-sighted given
how quickly those complexities are known to multiply. Who is going to pay down
that growing technical debt?

------
instakill
I disagree whole-heartedly.

Personally, I have worked in an environment where staging was a pain, full of
cobwebs and issues and data mismatches. I sympathize because I know how
painful it can be.

But if you respect your end-users, you cannot delegate the duties of a staging
environment to them.

------
ram_rar
This article has reduced the argument for staging, calling it unnecessary.
It's up to the DevOps team to use staging effectively. If you have a
waterfall-like deployment and useless e2e tests running in staging, then it
won't be helpful. Instead, if you use staging as another prod (blue-green
deployment, cough cough...), it would be a lot more useful.

Rather than blaming staging environments as a whole, I would blame archaic
DevOps practices for SaaS.

~~~
partyboat1586
Are you saying e2e tests are useless or they are useless in staging because it
doesn't reflect prod?

~~~
ai_ja_nai
e2e tests tend to be very brittle and not that informative

~~~
partyboat1586
I agree they are brittle but why are they not informative? They usually catch
edge case bugs in situations I wouldn't have time to manually test.

------
wildpeaks
Ideally you'd run tests both in staging (first) and production (only if
staging says OK), with the goal to catch 99.99% of issues in staging, safely
away from the paying end users.

Because realistically, something will go wrong sooner or later and it will
cost the company (and its customers) a lot more money and reputation if it
happens in the user-facing environment.

------
dabeeeenster
Hi. We open sourced our feature flagging product
[https://github.com/BulletTrainHQ](https://github.com/BulletTrainHQ) not long
ago and would love any feedback from HN.

I still have concerns with deploying production upgrades behind flags due to
the inherent issues surrounding things like database schema mismatches,
terraform/infra upgrades that are not compatible bidirectionally etc. Has
anyone worked on ways to manage these with feature flags?

~~~
loopz
It is possible, but you need to design versioning into message and storage
schemas from the start. It's a non-trivial thing, but it can be done. I'm not
aware of any general solution available, so it needs a bespoke design effort.
You need to map out requirements for versioning beforehand, and the complexity
and future limitations will follow exactly from the requirements.

If, instead of converging to versioning, you want full-on branching that never
merges, that sounds like a very chaotic design. You'd then maybe want to
design everything around feature isolation (see: UNIX philosophy), but that
comes with its own caveats around new requirements and cross-cutting concerns.

------
RantyDave
Isn't this pretty much what "cloud" is for? And, y'know, terraform, ansible
etc?

------
chx
There's the production cluster, and then there are the development
environments, one of which happens to be the staging branch, but it is not
different in configuration from any web-1234 feature or bugfix branch. It
doesn't add anything in costs, as I doubt we could use a smaller machine (or
is it two machines? can't remember, but it doesn't matter) than the current
one just because we allow one less env... I thought this is what
virtualization and all that was about?

And I absolutely can't see how we could drop the environments for feature
branches: not only is QA necessary, but also, if you want to show something to
a stakeholder, how are you going to?

------
mikl
Getting an application to the state where you can use feature flags for any
new functionality is a massive and continuous effort. Any change in data
structures or APIs becomes a huge pain.

Whether this is worth it, depends a lot on the particulars of your apps.

And where do you test whether your feature flags work correctly with
production data and environment, if not staging? Yes, yes, you might have
tested it on your development environment, but if testing on staging doesn’t
work because the environments are too different, testing on localdev or CI is
even worse.

------
LoSboccacc
everyone gangsta until they have to feature flag a database schema change

~~~
jeffbee
Having a database schema at all is the root of that problem. I wish people who
are writing systems from scratch would reflect on whether they actually need
an RDBMS or RDBMS-like schemas.

Everything I've worked on in the last few years would have been in far, far
better shape if they had just started with a no-schema data management system
like Bigtable and just adapted their backend code around its limitations.

Imagining you are safe because you are smoke-checking DML statements in
staging is just begging for a week-long outage.

~~~
john-shaffer
This is a big reason why I like Datomic so much. I get the benefits of a
schema, but the schema is basically append-only, so it's very easy to reason
about. Updating the schema is instant.

The last time I tried a project with a no-schema DB, it was a huge PITA. It
sucks having a function fail on one single document because it had a number
instead of a string. I love flexibility, but it is very valuable to have the
right constraints in the right places. I quickly learned that a schema is just
that.

~~~
doteka
I don’t see how you’d have a number instead of a string in a document in a
typed language. Do you just directly dump user-supplied data into your database
without parsing/processing it? Otherwise, how is this not caught when it
enters your system?

~~~
loopz
It is caught by the schema and never stored in the first place. Or you can
solve it for every column/field in your system manually, and probably
introduce subtle bugs and inconsistencies along the way. I'm for schemas, but
not necessarily RDBMS for everything.

------
KaiserPro
I've not had a staging environment for a good 6 years, if not more.

That's not to say that a staging env is bad; they can be very useful if set up
properly. It also might be the case that you really can't test in prod (stuff
that provides safety).

I am very keen on feature flags/magic routers. They have the advantage that
you can test the inputs of the system with _real_ data. But that can also be a
drawback.

------
kissgyorgy
This meme came to my mind immediately after reading the title:
[https://memegenerator.net/img/instances/68740589/i-dont-
alwa...](https://memegenerator.net/img/instances/68740589/i-dont-always-test-
my-code-but-when-i-do-i-do-it-in-production.jpg)

------
ai_ja_nai
If staging is broken, CI should break too, because deployment becomes
impossible. I don't get the "nobody will care to fix staging" argument, unless
nobody cares about releasing to prod.

I suspect this DevOps engineer is working in isolation, without enough
communication with devs. Devs ought to be educated; it's part of the DevOps
job.

------
mrmincent
$100k for a staging environment is definitely worth it IMO. It sounds like
this person hasn't set up a good one, but in my experience a good $100k
staging environment will catch more than $100k worth of bugs and downtime over
a year.
One hour of downtime can be worth more than that.

------
PopeDotNinja
Have you ever worked for a company that doesn't have staging or feature flags?
It sucks.

~~~
fastball
Staging is so important to me that I moved our startup from Paddle to Stripe
as payment processor, because Paddle did not (and I think still does not)
offer any sort of "dev" mode that I could use to test our payment integrations
in dev/stag.

Literally their solution was "just make a product that costs $0 and test it
against our production service".

To me, that was a very red flag.

~~~
PopeDotNinja
The real kicker for me is when there’s no staging environment and production
is kittens-not-cattle, and yet no one seems to wonder how this situation came
to be.

~~~
fastball
Wait, doesn't it make sense for there to be no stag in a kittens-not-cattle
situation?

You lovingly hand-provisioned your infra rather than writing infra as code.
This makes it non-trivial to set up a stag environment, so you just don't.

------
tuyguntn
Feels like moving the problem from operations to developers.

With staging: keeping an almost identical environment to production, syncing
data if necessary to test load. Can be considered mostly Ops-related work.

Without staging: of course feature flags should be used when necessary, but if
there is no staging environment, we would have to create a feature flag for
almost every pull request, which in turn creates a lot of legacy code that
should be deleted later. Add nested feature flags into this, and now you have
the problem of seeing which conditions are executed when multiple teams are
working in parallel.

------
szundi
This is typical. There is no recipe for all. Those for whom this is essential
accept the costs and do it anyway. Those who do it just because it's
fashionable... I hope they earn enough ;)

------
aussiedude
If user data is what's failing you in staging, there are simple ways to go
about pulling in a sanitised version of prod data to make it more prod-like.
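One minimal way to do that scrubbing, as a sketch: the CSV layout (id, email, name) is an assumption, and a real sanitiser would cover every PII column in every table, but the idea is to rewrite identifying fields deterministically on the way into staging:

```shell
# Replace email and name with placeholders keyed by the row id, so
# the data stays referentially consistent but carries no real PII.
sanitize() {
  awk -F, 'BEGIN { OFS = "," }
           NR == 1 { print; next }   # keep the header row
           { print $1, "user" $1 "@example.invalid", "User " $1 }'
}

printf 'id,email,name\n7,alice@real.com,Alice\n' | sanitize
```

Keying the placeholders on the id keeps joins and foreign keys working, which is the main thing that breaks with naive random replacement.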

We had some issues with Akamai WAF blocking things in PROD which only got
caught when pushing to PROD, so we added staging to Akamai; now it's closer to
1:1!

For example, Oracle E-Business Suite runs completely differently with 900GB of
prod data vs 100GB of staging data.

ASP.NET applications are slower talking to databases with 1600GB of MSSQL data
in prod vs 40GB in staging vs 3GB in test, etc.

DR performance is different again, because it's a scaled-down version of PROD
but has the same amount of data and traffic when it takes full production
load.

Blog writer is taking the piss.

~~~
loopz
Due to privacy and GDPR laws, this is bad practice and outlawed in many parts
of the world. We used to mirror from production, but you can't really do that
with user data without exposing some personal information. Even a timestamp
from a transaction can be enough to identify a real person.

------
mgraczyk
I set up a staging instance for my previous startup, then never used it. I
don't really understand what the value is unless you have dedicated QA staff
that run some manual tests against staging. If the tests are automated, they
should run in a dev environment. If you don't have QA, just test in prod. This
seems to work well for the largest tech companies, who (in)famously don't
really use branches and ship directly to prod with feature flags.

~~~
paranoidrobot
The answer obviously varies wildly for every product and company.

Not everything can be feature flagged. Not all tests can be automated. Not all
features can be safely tested in production. You can't always roll-back code.

Scenarios where I've found staging environments to be invaluable:

\- Features that make wide-scale changes to the application

\- Changes that require user acceptance testing. Management/Customers might
ask for Feature X, you have designers mock up what it'll look like, you might
even build prototypes of it... but until they see it and play with it, they
might not fully realise that what they actually asked for isn't what they
actually intended.

\- Integrating changes from multiple developers and seeing how they play
together

\- Testing that your feature flagging actually works

\- Testing how the system responds to dumping large amounts of data/requests
at it - a half-day outage in production might easily exceed the cost of a
whole year of running a staging environment.

\- Testing how the system behaves after data migrations/changes (things where
there's no roll-back without losing data)

\- Testing integrations with external systems.

\- Detecting poor performance, memory leaks, deadlocks, and other stuff that
isn't possible on a dev's machine

Testing code on your machine is fine as a smoke test, but your laptop probably
has multiple times the capacity of a single node on production, for a single
user. So your change that results in 1GB/sec of full table scans on the
database might not even be noticed because hey, you have an NVME drive. Your
tests also run for maybe a few minutes, but that staging instance might be up
for hours or days, and deals with hundreds of users.

I've picked up numerous issues because someone (I'm including myself here)
committed some code that resulted in Staging going down or behaving poorly.

I've even overheard conversations along the lines of "Yeah, this feature isn't
working right at the moment in testing, but that's because staging is running
slow today. It's fine, we can push this out to production and it'll be better
there." If it weren't for me overhearing that, it'd have resulted in a
multi-GB video file being sent to every visitor to the site, costing a fortune
in CDN charges and still running like crap, because it was a 30-second video.

------
surfinganalyst
Dear Development: We're done.

~~~
blntechie
Why need a development code base when you can just directly edit and save to
prod? I remember opening Tomcat prod directories, editing the HTML and saving
it live in prod years back. Maybe that's where we are headed.

