Imho tools should use actual code (whether it's TypeScript or Kotlin or whatever) instead of reinventing constructs like loops and string interpolation.
Thankfully these tools are getting more popular, because frankly I can't stand configuring another Kubernetes or GCP resource using a huge block of copy/pasted YAML.
You’re not telling AWS how to do something; you’re telling it what you want the end state to be, and letting it figure out what needs to be created, updated, deleted, or replaced, as well as the dependency chain and what can run in parallel when you create or update the template.
There are linters and editors for CloudFormation that give you autocomplete and warn you when you misspecify a resource type. You can even add your own custom definitions to the linters for custom resource types that you create.
CloudFormation, just like generic SQL, doesn’t by itself have the concept of loops. CloudFormation does support custom transforms and macros that you can create in any language, and you can write programs that generate CloudFormation in many languages using the CDK, which will perform validation. I haven’t used it, so I am being really hand-wavy.
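From what I've seen, a CDK program looks roughly like this in Python (untested, and the stack/bucket names are made up); it synthesizes a normal CloudFormation template at the end:

    from aws_cdk import core
    from aws_cdk import aws_s3 as s3

    class DemoStack(core.Stack):
        def __init__(self, scope, id, **kwargs):
            super().__init__(scope, id, **kwargs)
            # A plain Python loop - something raw CloudFormation can't express -
            # which the CDK expands into individual resources in the template.
            for name in ["logs", "assets", "backups"]:
                s3.Bucket(self, f"{name}-bucket", versioned=True)

    app = core.App()
    DemoStack(app, "demo-stack")
    app.synth()  # emits the CloudFormation template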
We ended up preprocessing them using jinja2, injecting our variables via its context, and using its cleaner syntax for expressing conditions, loops, etc. Now we have the best of both worlds.
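The preprocessing step itself is tiny; a minimal sketch (the service names and fields here are just illustrative):

    from jinja2 import Template

    # Render a YAML fragment with a loop and a condition; the downstream tool
    # only ever sees the resulting plain YAML.
    template = Template("""
    {% for name in services %}
    {{ name }}:
      replicas: {{ 3 if env == "prod" else 1 }}
    {% endfor %}
    """)

    print(template.render(services=["api", "worker"], env="prod"))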
Apologies, I don't mean to be a downer. But... you're making your present life easier but making the code harder to maintain for future maintainers of your codebase.
I’m eagerly awaiting 0.12 so the language will have features similar to Jinja templating.
That's what sparkleformation does for cloudformation stacks.
Just as in databases, you end up playing lots of “read the query plan, restate the query to try and get a better one” games.
I saw it with early versions of Chef, I've seen it with Apache Aurora's Python-based job specs for Mesos, etc.
I'd rather work around the limitations of a DSL that is declarative and limited but consistent across orgs than have to retrain myself if I move to a new job.
Operations is the wrong place to do engineering; either that, or operations needs to level up significantly, since they do cause a lot of delays and damage to companies with these poorly designed tools.
Configs are hard, and they're different enough from normal software (for example, DRY doesn't apply in the same circumstances) that using the same tools is a bad idea.
I've seen what happens when SWEs use SWE languages on configs. They get unintelligible. And then I have to clean up the messes.
To be clear though, using JSON or TOML for configs is also often a mistake.
I think a valid question is what the right framework, factoring, or design should be, but pure declarative isn’t it, and half-declarative tools like Terraform are just repeating past mistakes.
Doesn't it, though? Dumb configs tend to suffer very much from the impossibility of applying DRY directly, and people end up using/writing config generators just to ensure consistency of values.
Hell, isn't half the job of an IaC tool to be such a config generator, papering over the lack of capabilities of configuration languages?
Theoretically, maybe. In practice, I don't think so. Mostly because code is much less prone to change than configs are. Like if you have some encapsulated set of behavior in code, it's often easy (and not particularly painful) to do something like
self.thing = x if self.other else default
self.val = ...
self.val = ...
self.val = default
Because code-code doesn't change that much, this kind of weird gross encapsulated behavior is (usually) ok. It's still a code-smell, but you don't get burned by it. But when you are dealing with configs, they do change, and you want to be extra explicit, because (as you probably know), the number of exceptions and special cases will inevitably grow, and you'll end up with implicit leaf-level configuration hidden away in your so-called encapsulated stuff, but without end-user visibility into what is actually happening.
One solution to this is to completely ignore DRY and ensure consistency via unit tests or static analyzers or something, but that also sucks, possibly more than doing some denormalization within your config language itself.
My experience says the right way to do this is to restrict yourself to no conditionals other than perhaps get-with-default, and to do everything that would otherwise be done conditionally via inheritance or composition. It remains to be seen whether I'll still think I made the right decision five years from now.
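To make that concrete, a toy sketch of what I mean (the field names are made up):

    # Base config is plain data; environments compose/override it rather than
    # branching on conditionals inside config code.
    BASE = {
        "replicas": 1,
        "timeout_s": 30,
    }

    PROD_OVERRIDES = {
        "replicas": 3,
    }

    def compose(*layers):
        """Later layers win; no if/else, just composition."""
        merged = {}
        for layer in layers:
            merged.update(layer)
        return merged

    prod = compose(BASE, PROD_OVERRIDES)

    # The only "conditional" allowed is get-with-default at the point of use.
    retries = prod.get("retries", 2)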
I find this a strange argument. Traditional operations people will still resist using these techniques, and will use pre-canned modules and resources if they're forced to use something.
The real reason seems to be because of the declarative nature of IaC rather than concerns about adoption.
(Those examples, the real-world ones, were from competent software developers who did product development, not operations people.)
If you’re in technology, your profession is to solve problems, not to entrench poor decisions in a company. In some sense everyone is at fault for that, but this area affects a lot of a company, and the attitudes are pretty anti-improvement once something falls outside someone’s skill set. The end result is that the software will be moved out of operations; I see this more and more. Operations can’t improve just by magic; they need to embrace the past 50 years of computing knowledge too.
Oh. Really? /s
That's literally what DevOps is all about. I'm sorry, but infrastructure isn't as simple to manage in code as "regular" software. The infrastructure APIs mutate or behave in unexpected ways too often. State drift is common. There is active work on figuring out better solutions. But to say that operations have not embraced computing knowledge is nonsense.
For all the many benefits we've seen with Infrastructure as Code to date, the tools are still fairly primitive - copy/paste is the norm for reuse, testing is rare or non-existent, productivity during infrastructure development is low, continuous integration and delivery are largely ad-hoc, and there are very few higher-level libraries available to abstract away the complex details of today's modern cloud platforms. Net - it feels like we're still programming cloud infrastructure at the assembly-language level.
I'm really excited about the opportunity to bring more software engineering rigor into the infrastructure as code space. At Pulumi we believe using existing programming languages is a key enabler of this. Pulumi is still a desired state model like other Infrastructure as Code offerings (so you can still preview changes and make minimal deltas to existing infrastructure) - but you can write code to construct that desired state. As a result, you get for loops and conditionals, you get types and error checking, you get IDE productivity tooling, you can create abstractions and interfaces around components, you can write tests, you can confidently refactor your infrastructure code, you can deliver and version components via robust package managers, and you can integrate naturally into CI/CD workflows.
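As a quick illustration, here's a rough sketch of what that looks like with our Python SDK (the bucket names are just placeholders):

    import pulumi
    from pulumi_aws import s3

    # Ordinary Python builds the desired state; the engine diffs it against
    # what's deployed and applies a minimal delta.
    count = 3 if pulumi.get_stack() == "prod" else 1

    for i in range(count):
        bucket = s3.Bucket(f"site-bucket-{i}")
        pulumi.export(f"bucket_{i}_name", bucket.id)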
Pulumi isn't the only tool in this space - we're seeing things like Atomist bringing this same model to delivery pipelines, and AWS CDK bringing this model to the CloudFormation space. I'm excited about where these tools will take the Infrastructure as Code ecosystem in the coming years.
[disclaimer - CTO at https://pulumi.io so clearly biased on this topic :-)]
So good luck to you!
Pulumi is going down a better route in that they’re using a normal programming language with a normal tool set; however, I’m skeptical of how their engine is designed, with RPC calls for language bindings, since again it makes debugging more complex as opposed to just a normal SDK. They also don’t have debugging enabled yet.
What I see as optimal is a tool like make, which calls other command-line tools, resolves dependencies, and processes errors. But not make itself - something with sane syntax and error handling.
All these new fancy all-Go tools creep me out. Infrastructure tooling should not live in the domain of one programming language. Extensibility should be language agnostic. If I want to write some Perl/Python/Bash script to support some very non-standard part of my infrastructure, I should be able to. If I want to plug vendor-specific utility execution into the deployment pipeline, I should be able to. And it should be easy.
It certainly makes it much harder to look into the machinery and know what is going on.
> If I want to write some Perl/Python/Bash script to support some very non standard part of my infrastructure, I should be able to
That very feature is why I love ansible: if there is some quirk, or even an unreleased module (they only ship new features in major releases, I recently learned), then you can copy the upstream file, or a modified version of the existing one, or even a whole new module, into the `library` or `lookup_plugins` directory of your playbook and you're back in business. No fighting with golang anything. You can also write ansible modules in any language you like.
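To give a feel for it, a custom module dropped into `library/` can be as small as this (a contrived example, not a real upstream module):

    #!/usr/bin/python
    # library/greet.py - callable from a task just like any built-in module.
    from ansible.module_utils.basic import AnsibleModule

    def main():
        module = AnsibleModule(
            argument_spec=dict(
                name=dict(type='str', required=True),
            )
        )
        module.exit_json(changed=False, greeting="Hello, %s" % module.params['name'])

    if __name__ == '__main__':
        main()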
The counterpoint to this is things like Gulp or Gradle which become a nightmare after a couple years of multiple developers and coding styles appending things here and there. Now rather than just spending a few hours learning a basic config DSL, I have to build up a mental execution model every time I want to add a build step.
Similar to React and its virtual DOM.
I disagree with this one. I believe describing an infrastructure should be fully declarative and the tool should decide how it needs to create resources based on the description. This way, the infrastructure code almost can't have bugs, but the tool can be fixed for everybody.
ShellJs works pretty nicely. Not only is my script now cross-platform, but doing conditional logic and user prompting is a lot easier in code than in bash.
The only "issue" I've found is that ShellJs is quite barebones. I wrote a wrapper over it to do everything, such as nice question prompting and colored output.
I used to use this heavily back in 2013-2014. Infochimps got picked up by eBay, AFAIR, which is why this was never developed further.
What I am missing (often, in these type of articles as well as in actual production environments) is the fact that if you develop (infrastructure) code, you also need to test your (infrastructure) code. Which means you need actual infrastructure to test on.
In my case, this means network equipment, storage equipment and actual physical servers.
If you're in a cloud, this means you need a separate account on a separate credit card, and you start from there to build up the infra that Dev and Ops can deploy their infra-as-code on.
And this test-infrastructure is not the test environments other teams run their tests on.
If that is not available, automating your infrastructure is dangerous at best, since you cannot properly test your code. And your code will rot.
This of course depends on your level of lock-in with various cloud environments.
This is kinda why I love Google Cloud and don't see myself moving to another cloud provider until they match GKE. I want all developers to throw everything into GKE, and Operations manages only the VPCs, firewalls, etc. Developers get complete ownership over compute (and networking within the cluster) while broader network management can still be managed by an operations team.
It does assume no hardware or complex networking needs to be handled.
And there is the point of observability. When there is a proper testing ground for developers that is as-production, it enables developers to dig into and mess with logging, tracing, debugging of all sorts.
This adds value by providing developers insight into what a reliability engineer (or whatever they call sysadmins these days) needs to provide whatever service it is that the developers' code is part of.
And I want engineers to be able to futz about with all cloud services available, without having to worry about any negative impact on production.
And finally: What happens when $cloud_provider makes changes to the accounts interface and you want to mess around with those new features, without hitting production?
Give your future-self a break, and make sure you can futz around on any and every layer.
Another common practice is using separate domain names. Don't use 'dev.news.ycombinator.com'. Instead, use 'news.ycombinator.dev'. This frees you up to mess around with the API of the DNS provider. And when switching DNS providers, test whatever automation you have in place for this.
Just because you maxed out the credit card doesn’t mean that you don’t still owe the money if you go over. That’s what billing alerts are for.
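Setting one up is only a few lines, roughly this with boto3 (assumes billing metrics are enabled, which only live in us-east-1; the threshold and SNS topic ARN are placeholders):

    import boto3

    # Billing metrics are published only to us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    cloudwatch.put_metric_alarm(
        AlarmName="monthly-spend-over-100-usd",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,  # 6 hours
        EvaluationPeriods=1,
        Threshold=100.0,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
    )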
That’s what the separate accounts are for, but you don’t need a separate card, and you still should be using an organization.
> This frees you up to mess around with the API of the DNS provider. And when switching DNS providers, test whatever automation you have in place for this.
Why isn’t your DNS provider AWS with Route 53 where the separate domains would be in separate accounts with separate permissions and separate access keys/roles per account?
All our servers configuration is managed through Ansible, apps are containerised and run on Kubernetes (on CoreOS, so even less configuration required), apps are deployed automatically with CI scripts.
Why would I need to describe the hardware/infrastructure as code? I create VPSes manually, once in a blue moon, and they just get added to our K8S cluster, using the same Ansible template. It takes 2 minutes from VPS creation to adding the new node to the cluster. It'd be nice to describe in code the exact hardware provisioning, but that can be done easily with a couple lines in the internal documentation. And actually, having heterogeneous server sizes might be a feature, to be able to schedule less demanding apps on less powerful and cheaper servers.
What benefit would Terraform give me? Not being snarky, I just don't know how to fit it in our process.
- Disaster Recovery: Imagine that for whatever reason your VPC is completely destroyed and all your VMs are gone. You still have data backed up (you do backups, don't you? :)). Your company will inevitably be down for some time, but how long would it take to restore everything? Terraform can reduce this from days/weeks to minutes; all configuration is there in code and is reproducible. Even if you don't/can't use Terraform itself, you have still captured all your infra config information in code and not just in a Google Doc/Evernote/Post-it notes.
- Audit Trail: You want to empower developers to make changes to infrastructure without opening a ticket and asking you to do it. But if they don't open a ticket, it might make your compliance story much harder. If they do open a ticket, you now have a huge ticket backlog, so you hire more engineers... this kinda relates to the point about scalability. Using Terraform, and enforcing infra changes through Terraform, you have a super simple audit story and will know exactly who made the changes, when they were made, for what reason, etc.
- Infra Convergence: It's a fact of life that you need to make temporary infra changes for firefighting, hotfixes, super-important custom customer requests, etc. If you allow your infrastructure to be cluttered with these one-off changes, it will be messy, developers won't give a shit when making new changes, and often everyone will have permissions over everything. Using Terraform, every time you make changes to infra you discover these manual changes and either revert them (if they're one-offs) or commit them to code (if they're meant to be permanent).
I'm a new user of Terraform, and it certainly has its problems (it's still 0.x, for chrissake). But... it has been a very useful tool when used correctly and with discipline.
You certainly can write Ansible playbooks to create all of those resources if that works better for your team, but generally it's better to draw a line between configuring your VM/container/server and provisioning all the infrastructure it depends on.
I'm sorry -- how many people are still doing this in 2019?
Most popular tools offer infrastructure as a config with some horrible and limited scripting language (in some cases even JSP doesn't look like a horrible idea - hello, Helm). The declarative-vs-imperative holy war is getting a bit old (like any other tech war).