Ask HN: When did multiple environments become standard practice?
35 points by withinboredom on April 7, 2023 | 58 comments
I recently worked for a company that had a single environment; yes, even development was done against production databases. I worked there for a while, and at first I was aghast at how weird it was. Then, when I left and went back to multiple environments, I was again aghast at how complex it is -- not to mention that there isn't much 'force' pushing you to learn how to recover from mistakes in production (leading to more downtime than I ever saw with one environment).

I was curious about when multiple environments became 'standard', but it appears to have been a thing for as long as the internet can remember. Can someone who has been writing software since before the internet remember why we started doing things this way?




Totally off the cuff, I would say 1995. As to why? Because developing in production often breaks production, and about that time, we started to care about that. Before that, we were happy when things were working at all and didn't really expect them to work all the time.

As you say, it doesn't necessarily always work, but that's the notion.


> Before that, we were happy when things were working at all

What nonsense. Before the internet, shipping things often meant no updates ever, so it not only had to work, it had to work with ‘no bugs’. There was no need for environments, because nothing ran in front of clients until the disks or cartridges were shipped. But there were dev, test, and possibly release versions.

It’s rather the opposite of what you claim; things worked and were expected to work all the time once out the door. You could not fix them, so they had to. They didn’t always, but they did surprisingly well, and testing back then feels at least more robust than now, when someone releases a game and, the minute after install, I can download 500 MB of fixes, sometimes for things that were really obvious and very badly tested. It seems it is now that no one expects things to work all the time, or at all, on release. You just patch it later.


You're right about all of that. :) I guess I assumed the context of the question was online services. People working on web sites, back in the 90s, often messed around right in production. But if I remember correctly, the mid nineties are when I started to hear about maintaining TWO environments and doing things in "test" first. Seemed really whiz bang at the time.

What I'm mainly remembering, I suppose, was that any online service being down was just not a big deal, so it wasn't worth the extra infrastructure to avoid occasionally and temporarily breaking it. Nobody cared, at least not like they do now. And anyway, getting one copy working was hard enough!


> It’s rather the opposite of what you claim; things worked and were expected to work all the time once out the door.

Software versioning is from the 70s. You can't revise history by claiming otherwise. What you worked on is not what everyone worked on. I worked on the sole production env for many companies, even into the late 00s (usually e-retailers).


I didn’t say there was no versioning; there was, internal and external. But for consumer or mass-produced company software, we didn’t ship updates. Bespoke software (often some type of ERP) is another story: that had updates.

I was responding to the GP, who claimed we were happy if things worked: we were, but things did work as much as they do now. There is DOS software we wrote still running companies with very few updates, because updates used to be incredibly painful. Physical, going-to-the-company-and-updating-their-systems-in-the-night painful.


What nonsense. I worked for over 20 years in a shop that sold a SaaS app and did devel right on production the whole time.

It worked because we all knew it had better damned well work.

Devel also happened offline, of course, and in all kinds of different ways; it just also happened right on the live copy that thousands of users were using.


Which is what I said? Who are you responding to?


You said there was no such thing as production. You talked only about shipped, static products.


Depending on the industry, that might be 1980. Once telecom equipment became computers, there would have been production vs. non-prod environments.

Erlang is 1986, and it was replacing other stuff.


> I am again aghast at how complex it is -- not to mention that there isn't much 'force' to push you to learn how to recover from mistakes in production (leading to more downtime than I ever saw with one environment).

It's interesting that most of the responses say we moved to multiple environments "because production matters". Yet your experience says that a single environment leads to less downtime in production, not more.

Is there any actual data on this in general? E.g., some study of downtime in single vs. multiple environments? So many of software engineering's "best practices" seem to amount to little more than herd opinion, rarely with anything substantive to back them up except somebody's experience.


I don't believe there is concrete evidence on the multiple-environment thing, but I think it's likely to come from what the environments are used for.

For instance, Netflix's approach, plus The DevOps Handbook and The Phoenix Project, all push frequent deploys as a win. Personally I have found that to be the case as well.

However, developing in production is like git-pushing every keystroke. Honestly, it doesn't yield any benefit.

So personally I like local iteration, then trying it out as a canary, and then going to prod.

The canary is in the prod environment.

In the end, though, I say all this and the levels.io guy writes his stuff right on the VPS sometimes and absolutely slays so who am I to prescribe.


I think the size of the team would be a big factor too.

One to three people (maybe) working on a web app can develop against prod and ship right to prod without breaking things too badly, 99% of the time.

Once you have >10 people working on a product and committing changes in parallel, I'd wager it's impossible to avoid breakages, even with dev/prod/test environments.


We had ~50-100 devs working on that system on any given day ... but only around 5-10 merges per day. However, each team was responsible for a given area of the code (through convention, not any hard rules), so in reality, each area of code was only touched by 2-3 people at a time.


I still work straight in prod. Clients are happy with how much faster we are than the competition, usually faster by weeks from feature request to live. Sometimes things go wrong, but that happens with dev/test/staging/prod as well.

We first tried dev, test, staging, and prod in the 90s, when the internet arrived with cgi-bin and later Java, but it was too much hassle without much benefit, so we stuck with dev-prod or just prod.


I wonder if this is a scalability thing? Is your competition the same organization size as you? How big of an organization do you think can manage doing this without breaking things too often for it not to pay off?


I think the real issue is this: Can management trust business continuance with "only" prod?

If you are a programmer who also understands the business requirements, you know the answer, but everyone else can only guess: even if they really know what the business needs, they have to trust that the programmer they hired built a program that does that, and you know, most management doesn't even know that much...


I deliver to enterprises, usually (in recent years exclusively) at the departmental level, so there is a bespoke prod per client. If something goes wrong, it affects one client, but the feature or bug fix was specifically for that client and they know it can happen. It doesn’t happen a lot though, as we don’t use the fancy new stuff that needs trans/compiling; we just make a file edit for a feature and it’s done. So the possible breakage is limited to a small/tiny part of the entire software.


Thanks for the details! To clarify, I was asking about how big your organization (company) is compared to your competition, not how big your clients or the prod environments are. i.e. if you're just 1 person working in prod, I imagine that's a lot easier than 100+ people trying to avoid breaking it?


Oh, apologies: we are 5 full-time people now, so indeed small. But we used to have almost 1000 people 25 years ago and it worked then as well, because we had one team per client, and that would be 3-10 people.


Ahh thanks! Yeah, I think that might explain it... I imagine your competitors probably have bigger (or more) teams working for each client, and that's why they're slower and have separate environments.


Well, the competitors by and large are modern kool-aid drinkers, and it's our huge advantage that they are; they need Kubernetes (so cloud) hosting, massive compile times, tons of crap (something called ClickHouse which saves terabytes of data every day that no one needs or ever looks at; but it sounds cool to 'require' petabytes of free space for 'metrics'). We need a LAMP server or a Docker instance. Anyone can run our stuff in 2 seconds, no compiling, no nothing. Give us SSH and you'll be up and running, with your 1000s of colleagues ready to use the product. Modern dev is a sad affair, but we thrive in it because it is what it is.


I was first paid to write a proper website in 1998, and having dev and prod was normal then.

Fast forward to today and I believe developers shouldn't have access to prod data, ever, even to debug things. Privacy and security of users' data are more important than having a nice dev experience, every time. If you work in an ISO 27001 environment you don't really have much choice.


How are the developers supposed to debug things if they can't see the real data?

Prod environments get problems that never crop up in dev environments, because the test data in those environments is bounded by the developers' imaginations.

In prod, real users regularly do unimaginable things.


You have to actually understand what your code is doing rather than just winging it by accepting there'll be issues that you can fix if they show up. You end up writing a lot more defensive code - you rarely write things to reject improper values but instead you write guard functions that only accept proper values. You embrace the "parse don't validate" approach. You write generators and fuzzers that test APIs with thousands of variations of inputs rather than trying to think of "What would the user do?"

You write a lot of things that would be 'pointless' and 'slow' in most organizations, but you don't really see bugs any more, or spend time firefighting and context switching, which counters that slowness significantly. I find it's a really enjoyable way to build software, and I've carried on doing it despite no longer working in an environment where I have to.
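
To make the "parse, don't validate" point concrete, here's a minimal TypeScript sketch (the Email type and parseEmail helper are illustrative names, not anything from the commenter's codebase): once a value has been parsed into a narrower type at the boundary, everything downstream accepts only that type and never needs to re-check it.

    // A branded type: the only way to obtain an Email is through parseEmail,
    // so any function accepting Email can assume the check already happened.
    type Email = string & { readonly __brand: 'Email' };

    function parseEmail(input: string): Email | null {
      // Deliberately simple check, for illustration only.
      return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input) ? (input as Email) : null;
    }

    function sendWelcomeMail(to: Email): void {
      // No validation here: the type guarantees a parsed value.
      console.log(`sending welcome mail to ${to}`);
    }

    const email = parseEmail('user@example.com');
    if (email !== null) {
      sendWelcomeMail(email);
    }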


You'd need management to really get what you are doing for that not to be career suicide. Because you will look like a slow-ass developer, and since there are rarely problems, you'd be mostly invisible.

Compare that with devs or teams that get features out the door relatively swiftly: sure, there are problems in production sometimes, but they're mostly fixed pretty fast too. A dev team like that looks like heroes.


> You'd need management to really get what you are doing for that not to be career suicide.

My management trusts me, mainly because the app I look after has far fewer reported defects and incidents than the rest of the company. I haven't managed to persuade other teams to join me yet, though. There's significant fear.


What comes to mind is having someone with access to production who can replicate the type of data involved in a problem without the details that violate privacy. Though in some cases, figuring out the type of data that breaks it is half of the debugging work done right there.


The goal of never debugging in production is understandable. But some companies have software that falls into the shit-in, shit-out category. By that, I mean software that does basically no input validation or consistency checks. The business never wants to think through all the cases; they are only willing to specify the minimum of requirements for the easy cases. In such an environment you have no chance without looking at the real pile of s.., hmm, data.

Obviously, such companies should have an empty intersection with ISO 27001-regulated ones.


> When did multiple environments become standard practice?

What's "standard practice?" It's not standard practice because we don't have standard practices. We just have common misconceptions.

Once upon a time, deploys happened (at most) as frequently as engineers received paychecks. To test anything between paychecks, they needed to put the code somewhere else: a different environment.

Code likes to be deployed, though. It stays healthier the more frequently it's deployed. So keeping deploys tied to paycheck frequency ought to be a thing of the past.

But people will claim "we have to test it before production!" Ok. Contract testing is actually manageable now, thanks to Pact/PactFlow (love ya, SmartBear!). That means most environment-bound end-to-end testing can finally be replaced with service-level tests (with trivially reachable 100% coverage) plus contract tests... none of which require environments. So we can test all of it before production, test it better than before, and also (at least mostly) get rid of those crummy environments.
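
For anyone who hasn't seen one, a consumer-side contract test looks roughly like this pact-js (PactV3) sketch; the consumer/provider names and the /orders endpoint are made up, and it assumes a Jest/Mocha-style runner, so treat it as an illustration rather than a recipe. The point is that it runs entirely against a local mock, with no shared environment:

    import { PactV3, MatchersV3 } from '@pact-foundation/pact';

    const provider = new PactV3({ consumer: 'OrderWeb', provider: 'OrderApi' });

    it('fetches an order', () => {
      provider
        .given('order 42 exists')                           // provider state
        .uponReceiving('a request for order 42')
        .withRequest({ method: 'GET', path: '/orders/42' })
        .willRespondWith({
          status: 200,
          body: MatchersV3.like({ id: 42, status: 'shipped' }),
        });

      // Runs the interaction against a local mock server and records the pact
      // that the provider team later verifies in its own build.
      return provider.executeTest(async (mockServer) => {
        const res = await fetch(`${mockServer.url}/orders/42`);
        const order = await res.json();
        if (order.id !== 42) throw new Error('unexpected order payload');
      });
    });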

For anyone doing lean, those end-to-end tests are a huge source of waste for test teams. They require constant environment maintenance, data maintenance for the environment, and (in companies with multiple repos) a way to know state across repos, and we haven't even talked about the test code itself and how brittle code running across an entire stack can be. Get rid of those and suddenly you free up test cycles to focus on things like improving testability or previously overlooked areas - things that proactively make the product better instead of reactively hoping to catch regressions.

Most software engineers don't think about solving the biggest source of waste in testing because they're focused on features. Most "SDETs" today don't have the curiosity or imagination to change the system and they can't do it with Selenium anyway. I don't think most managers are concerned with actual process improvements so much as measuring the right KPIs and managing up. And no one else is in a position to even realize there's a problem. So . . . for the most part, nobody's looking to fix the problem.

To be clear, that's a warning that touching this kind of problem isn't doing any favors to your career. You're better off focusing on solving problems that get you promos until you jump to the next company and get a real raise.


As the old saying goes, every team has a testing environment. Some teams, however, also have a completely separate production environment.


Once your product matters, you start thinking about a safe testing environment.


The implicit assumption behind this statement is that using production for testing increases the risk of production outages.

Even if we consider it correct, I think it is a mistake to believe that the only way to handle that risk is to open the environment proliferation can of worms, and jump inside. It isn’t.

Each system is different, but in the vast majority of software systems I have found, concerns with testing on production environments aren’t actually good reasons to use non-prod environments. They are instead signals of more fundamental flaws in the system itself (or in the dev practices).

To use an example: if we are talking about a multi-tenant system and we worry that tests run against the testing tenant can break other tenants, what we have is a system that is inherently broken. We don’t prevent this problem by doing tests elsewhere (and inflating the maintenance burden, creating the risk of divergence between test and non-test envs, etc.). We prevent it by implementing multi-tenancy correctly.
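
As a toy illustration of what "implementing multi-tenancy correctly" can mean (the Order shape and repository here are hypothetical, and a real system would typically enforce this at the database or row-security level rather than in application code): every data access is scoped to a tenant up front, so a test tenant physically cannot touch anyone else's rows.

    // Tenant scoping is fixed at construction time, so callers cannot forget it.
    interface Order { tenantId: string; id: string; total: number; }

    class OrderRepository {
      constructor(private readonly tenantId: string, private readonly db: Order[]) {}

      list(): Order[] {
        // Only this tenant's rows are ever visible through this repository.
        return this.db.filter(o => o.tenantId === this.tenantId);
      }

      add(id: string, total: number): void {
        this.db.push({ tenantId: this.tenantId, id, total });
      }
    }

    // A test run gets its own tenant; production tenants are untouched.
    const allOrders: Order[] = [];
    const testRepo = new OrderRepository('tenant-test', allOrders);
    testRepo.add('o-1', 9.99);
    console.log(testRepo.list()); // only tenant-test rows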


We had a multi-tenant system with billions of tenants. We just spun up a new tenant and tested there. Unit tests had their own isolated environment, but there was no reason to worry about a 'safe' environment. If you managed to break production, you likely knew exactly what you did to break it (and can at least tell someone about it).


(fuzzy...)

Mainframes and mini-computers had test batches run on the same hardware as the real batches.

Later, for OLTP/continuous banking (ATM) systems, the big variable was storage: when to attach a disk you had been hoarding (since it could take a quarter or more to get another one) while flirting with capacity issues. So there was some relatively ad-hoc reconfiguration of production machines.

eBay was probably the first internet company to really scale to national/international always-on services, first with Perl, then Java. They had (have?) two-week release "trains", with hundreds of branches being merged in topological order (in parallel: merge processes happening for multiple trains). What train you were on determined what services and resources were available. Only the best of the best actually did the merges. Google watched eBay tangled in IBM ClearCase and decided: one repo, no branches. The now-common 3-stage environment (test, staging, production) is relatively simple, depending on your data pipelines.


What does your company use?

1) 0 non-production + production

2) 1 non-production + production

3) 2 non-production + production

4) 3 non-production + production

5) N non-production + production

6) Something entirely different.

I was thinking about branch-based environments or per-developer environments but was not sure how to depict that in a simple poll. Feel free to chat amongst yourselves.


Last three jobs I've been at, we've had three environments.

Staging - Continuous deployment; whenever code is checked into the main branch, it automatically goes here

QA - Once/twice a week we "cut" a release from what's in staging and promote it to QA. Our testers go through all of the changes and make sure that nothing is broken. We also have more rigorous integration tests running against this environment.

Production - After everything is vetted in QA, it gets promoted again to production.

I've seen some places have this take two weeks (in Agile terms, once a sprint). I've seen others do it basically every day (which IMO is too often). Once a week is the sweet spot for me.


Where I work we use option [2], and it works well. All merges to the main branch auto-deploy to our Testing environment. Properly formatted tags go to Production as a release.

The front-end team gets branch-based deployments. The backend has its local Docker development.

Try as we might, creating all the infra, even with 100% Terraform, is a challenge, mostly in swallowing the AWS bill.

We just could not see the cost/reward for doing more than just non-prod + prod.

Totally open to ideas why 2 non-prods is better than 1.


Ideally you want the ability for every developer to quickly spin up a full ephemeral env, test, and then tear it down.


This only makes sense if you have proper test data or test data generation.


It depends on your system. Ideally you could tee/replay real traffic.


2 Non-Prod, 1 Prod.

One of those Non Prod projects is usually a replica of Production, unless it is cost prohibitive.


I work with a lot of SMBs, and it usually happens at the first production-breaking bug ;)

No, but for real, I think the mid-90s is right for this.

With Kubernetes, dev environments are a given now, but back when we used bare metal that was expensive; we would even provision a box in the office as staging =)


We had separate test and production instances in about 1994/95, but that was pharma (regulated)… when I moved to the web it was very cowboy-style for a few years; I don’t think we had a safe testing or dev environment until maybe 2001, and then the whole thing tanked anyway.

Funny thing is, after that I worked for a big company with a lot of “best practices”, and except for me, for about six glorious months, nobody was ever able to run the full product on their dev machine.

So I assume there is still a long way to go in most companies. The bugs I see in FAANG software are also, as often as not, of the “clearly the dev never really ran the code” variety.


I remember in my second job senior developers would check their bank balances from their terminals, and would admit they actually had write permissions... so I think standard practice might not be so standard after all.


I gather you were working at a bank?


Hrm.. Probably the early-to-mid 90s. Systems that could reliably provide internet service started to appear at low enough prices that you could justify a microcosm called dev.


In the airline world, multiple environments outside of production have been standard since 1989 or 1990 as far as I know (maybe earlier, but idk).

That’s before any airline would use the internet to talk to other airlines; they used their own networks.

I think it was mostly motivated by allowing airlines to train their agents on a dedicated system, but also to rehearse their migration from their own system to a GDS without perturbing any other airline’s flights (or their own, for that matter).


I started my first job as a software developer in 1990. We ran only the production environment and of course had lengthy downtimes. I changed jobs frequently, and by about 1994 I found dev and prod environments everywhere. By 1998, dev/test/prod setups were the norm.


As with other comments, I cannot attest to the "standard"; however, I was involved in the early foundational days of the multiple-environments evolution.

I was the first programming hire in 1997 for a small electronic commerce company. Being the first hire, they had no systems, and I was tasked with figuring it all out. Having written and managed, in its entirety, one of the first API payment servers on the internet, a problem surfaced nearly immediately: how do people learn to comprehend what it does without seeing it do it? Since there was nothing to look at (that mysterious 'black box' system when you swipe your payment card), the only solution was to have a 'fake' one that just didn't talk to the card networks. The word 'fake' was not very well received in business circles, and almost immediately another issue arose: what do people code against to validate their own API client payment calls? Simple, I'll refactor the 'fake' API payment server to process API development test requests as well. For those who have lived it and can see where this is going, the 'fake' payment server quickly became its own living and breathing platform to support.

This led to the creation of four distinct, separate environments, all matched from hardware up through software versioning, in a staggered, staged rollout window originating around 1999. I labeled the environments DEV, QA, UAT, and PRD, and each one had its own designated critical business flow reasoning. DEV was the environment the internal LAN technical parties tested against; to clarify, this was NOT some developer's personal system they kept running, but a match of PRD. QA was internal LAN quality assurance for all business test cases that needed stability outside of DEV, along with load testing. UAT was external WAN User Acceptance Testing for new customer development as well as existing customers' new feature development. PRD was, of course, external WAN and live production.

This approach was tried and true, tested leading up to Y2K, and it greatly alleviated countless PRD issues, as I had established a code promotion procedure from Development to DEV, DEV to QA, QA to UAT, and then UAT to PRD. That is three distinct code pushes before reaching PRD, and it significantly increased uptime at a time when such things were just becoming a topic.

"Can someone who has been writing software since before the internet remember why we started to do things this way?"

Clearly I am "old" as I sit here trying to remember how I remember this. :) Been through, done, and seen some crazy things in the last 30 years but what I'm building now…



Development against production? No, no, no. What happens when you're testing your delete code? Oops. The complexity of multiple environments has increased, but that's no excuse to develop on production.


It was a deliberate decision. Most bugs and challenges were due to scale (which you cannot replicate in a synthetic environment) and global distribution, not due to bad code.

And yeah, in the 5-ish years I was there, I saw someone delete a database table in production exactly once. It took ~1 hour to finish deleting (hundreds of millions of rows take a while to delete), and it was back online seconds after that from a backup. I think we lost virtually no data. We just disabled the feature until the data came back.

I can't imagine that happening nearly as flawlessly at any company before or since.


Did the recovery go well because of the single-env setup, or in spite of it?


Well, the fact that we could just 'turn the feature off' is something you'd generally only see because you /must/ deploy code that is disabled to keep PRs relatively small. So pretty much anything and everything can be entirely disabled -- it really depends on how long the feature has been enabled, though. If it has been 'always on' for years, it's possible new devs didn't include the check when modifying/adding the feature in new places.
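
A rough sketch of the kind of kill switch being described, in TypeScript (the flag name and in-memory lookup are made up; in practice the flags would live in a config service or database so operators can flip them without a deploy):

    // Every feature sits behind a flag check, so it can be switched off in
    // production instantly, e.g. while a deleted table is being restored.
    const flags: Record<string, boolean> = { 'orders.recent-activity': true };

    function isEnabled(flag: string): boolean {
      // Hypothetical lookup; a real system would read live configuration.
      return flags[flag] ?? false;
    }

    function renderDashboardSections(): string[] {
      const sections = ['summary'];
      if (isEnabled('orders.recent-activity')) {
        sections.push('recent-activity'); // disappears the moment the flag is off
      }
      return sections;
    }

    console.log(renderDashboardSections());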

Then there was the fact that it was deleted at all. Once in 5 years, with a total dev team in the hundreds ... isn't that bad. They only pulled it off by doing something they were told not to do. That can happen anywhere.

Also, since devops knew that it /could/ happen (because there is always someone who does what they aren't supposed to do -- including themselves when things break), they took streaming backups of the data. They were able to rebuild the table while the original was deleting and swap it back in as soon as the db released its write-lock.

To me, the point is that I hadn't seen that level of 'fixing things' in prod anywhere else. At 'regular' companies, if someone managed to delete a table, they'd probably spend at least twice that time finding someone who knew where the backups were and figuring out how to restore them. Meanwhile, prod would probably be down completely, since they can't disable the feature.


Testing delete code against production isn't that big of a deal. If you screw up, the "oops" is immediate and obvious, as is the recovery strategy — restore from backup.

The real fear with deploying code directly against production is data corruption. I've seen more than one instance where two pieces of code, both of which had impeccable unit test coverage, interacted in strange ways to cause data corruption, because of mistaken assumptions that their respective authors had about each others' code. This is the kind of thing that you need integration tests to catch. Integration tests need some kind of common environment (even if it is ephemeral) in which to run. And now you have a development environment.


lol, I didn't enumerate all the ways this is bad. Restoring from backup can mean an outage and possible loss of data (the intervening transactions are lost), which is to be avoided; it also assumes your backups work, often an untested assumption. Corruption is bad too; the entire idea is bad.


Multiple environments are an old practice. I would roughly correlate their popularity with the growth of Git, though. It’s easier for smaller/newer/more stretched companies to manage multiple environments when it’s easier to move code and assets to them.

As for your current place: a single environment won’t make them stop breaking things. Deploying much more often will. The staging environment, or gradual rollout in prod, doesn’t help if you have a lengthy manual QA process, so you need to force deployments into a situation where there’s no time for slow and manual.


Hah, I can’t tell which paragraph offended. Do we have more slow testers or CVS diehards here?



