> It also sounds like what you're saying is that you can just avoid all these problems by having everything automated from day one, but that's not the reality in any employer I've ever worked for. Unless you're starting a company today and happen to have an experienced infrastructure engineer on staff from day one you're not getting that world
Certainly, it's hard to start it from day one, but it's not that difficult to move toward it if you have buy-in from your developer/operations team. We started with API and CLI calls for our beta version. Our next migration was to partial Ansible control, which we morphed into almost a monolith because of the interconnected pieces this design typically required. It really didn't need to be a monolith; we just wanted to link things together, and building one giant playbook made those references easier.
So we then split it up into smaller Ansible playbooks and used lookups to create the linkage, which roughly broke the monolithic pattern and allowed us to do smaller deploys. But we still ran into unexpected breaking changes. So we decided to abandon the months of effort we had put into Ansible and started looking at Terraform, because one of their salespeople promised our management team that it was cloud agnostic and we would only have to write everything once. After a week or two of looking into it, we realized we were basically just going to be remaking our Ansible modules and weren't really saving anything by migrating. Granted, this was two and a half to three years ago; things might have changed.
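If it helps to picture the "lookups for linkage" idea: we did it with Ansible lookups, but the underlying pattern is just "resolve the other stack's outputs at deploy time instead of hard-coding them". Here's a rough sketch of that pattern in Python/boto3 rather than an Ansible lookup plugin; the stack and output names are made up for illustration, not our real ones:

```python
import boto3

cfn = boto3.client("cloudformation")

def stack_output(stack_name: str, output_key: str) -> str:
    """Look up another stack's output so nothing has to be hard-coded."""
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"{output_key} not found on {stack_name}")

# e.g. the "api" deploy pulls the VPC id published by a hypothetical "network" stack
vpc_id = stack_output("network-stack", "VpcId")
```

Each smaller playbook only needs to know the name of the thing it depends on, not its internals, which is what let us break the monolith apart without losing the cross-references.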
We then switched to SAM, and as we did that we extracted the Ansible side out of our deployment and started redeploying brand new small SAM stacks, treating almost all of our infrastructure as sheep instead of cattle: we completely redeployed our launch configurations, our clusters, our Lambda functions, basically everything except the database, DNS, and the CDN with every deploy. This basically removed state as an issue, because the state is only needed for the first deploy and we don't technically change the stacks afterwards; we simply replace them every time. For us, this also meant we could easily test and roll back if needed. Since we don't need to change a stack's state back to its previous state, we simply change the pointer to the previous stack, which typically is a DNS change. But like I said, we focused on our state being only around DNS, etc., which almost isn't stateful, because our SAM deploys insert zero-weighted DNS records and the state management is really just adjusting the weighted values.
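To make that last bit concrete, the cutover/rollback is roughly the following with weighted Route 53 records. This is an illustrative boto3 sketch with made-up zone, record, and stack names, not our actual deploy code: each new stack registers itself at weight 0, and promoting or rolling back is just UPSERTing the weights.

```python
import boto3

route53 = boto3.client("route53")

# Hypothetical values for illustration; real deploys would template these in.
HOSTED_ZONE_ID = "Z0000000EXAMPLE"
RECORD_NAME = "api.example.com."

def set_weight(set_identifier: str, target_dns: str, weight: int) -> None:
    """UPSERT one weighted record pointing at a single stack's endpoint.

    A fresh SAM stack registers itself with weight=0 (receives no traffic);
    promotion is raising the new stack's weight and zeroing the old one,
    and rollback is just swapping those numbers back.
    """
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "SetIdentifier": set_identifier,  # one per stack, e.g. the stack name
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": target_dns}],
                },
            }]
        },
    )

# New stack comes up dark, then gets promoted; the old stack stays around for rollback.
set_weight("stack-new", "stack-new-endpoint.example.com", 0)    # deploy, zero-weighted
set_weight("stack-new", "stack-new-endpoint.example.com", 100)  # promote
set_weight("stack-old", "stack-old-endpoint.example.com", 0)    # drain the old stack
```

Because the stacks themselves are never mutated, the only "state" anyone ever touches is those weight values, which is why rollback is effectively instant for us.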