
Open sourcing Terratest: tools for testing infrastructure code - kiyanwang
https://blog.gruntwork.io/open-sourcing-terratest-a-swiss-army-knife-for-testing-infrastructure-code-5d883336fcd5
======
brikis98
I'm one of the creators of Terratest. Happy to answer questions.

The main question I've seen so far seems to be how Terratest compares with
various "spec" tools (e.g., inspec, serverspec). Most of the spec tools focus
on checking the properties of a single server or resource. For example, is
httpd installed and running? Terratest is largely for end-to-end, acceptance
style testing, where you deploy your real infrastructure, in a real
environment (e.g., AWS), and test that the infrastructure actually works as
expected.

For example, let's say you wanted to test a module for running Vault
([https://github.com/hashicorp/terraform-aws-vault](https://github.com/hashicorp/terraform-aws-vault)),
which is a distributed secret store. With a spec tool, you might test a single
Vault node to check that Vault is installed and the process is running. With
Terratest, you'd check that the whole Vault cluster deployed correctly,
bootstrapped itself (including auto-discovery of the other nodes), and that you
can initialize the cluster, unseal it, store data, retrieve data, and so on.

------
nodesocket
I use Packer and Terraform extensively for my consulting company. I appreciate
the work on trying to make Terraform/Packer testable, but I wish this tool was
Golang agnostic and written in HCL or a markup language.

The biggest concern with Terraform is that small changes (e.g., changing
Terraform variable names) often cause rebuilds of large chunks of resources,
which is scary in production and sometimes cannot be applied without
downtime. Point being, I'm not sure testing (expected results) is the core
problem. The problem is having confidence that destroying and editing
resources will not produce unexpected downtime (modifying ELBs, RDS, IAM, or
security groups are examples).
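
One way to get some of that confidence is to inspect a saved plan before applying it. The sketch below assumes the plan has been exported as JSON with `terraform show -json` and reads the `resource_changes[].change.actions` field, flagging anything whose actions include `delete` (a replacement shows up as a delete plus a create). The function name `destructiveChanges` and the resource addresses in the sample are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan mirrors only the parts of the JSON plan output this sketch needs.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// destructiveChanges returns the addresses of resources that the plan
// would delete or replace. (Hypothetical helper.)
func destructiveChanges(planJSON []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, err
	}
	var flagged []string
	for _, rc := range p.ResourceChanges {
		for _, action := range rc.Change.Actions {
			if action == "delete" {
				flagged = append(flagged, rc.Address)
				break
			}
		}
	}
	return flagged, nil
}

func main() {
	// Hypothetical plan fragment; the resource addresses are invented.
	sample := []byte(`{"resource_changes":[
	  {"address":"aws_db_instance.main","change":{"actions":["delete","create"]}},
	  {"address":"aws_security_group.web","change":{"actions":["update"]}},
	  {"address":"aws_iam_role.app","change":{"actions":["no-op"]}}
	]}`)
	flagged, err := destructiveChanges(sample)
	if err != nil {
		fmt.Println("could not parse plan:", err)
		return
	}
	fmt.Println("would destroy or replace:", flagged)
}
```

A check like this could gate a CI pipeline so that any plan touching a flagged resource requires a manual review. It does not catch in-place updates that cause downtime, so it is a partial safeguard at best.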

------
peterwwillis
It's weird to think of infrastructure/deploy automation in terms of "testing".
Yes, you are always testing things, but not always in the "func test" sort of
way. There's automation testing, and there's testing of automation, and
there's tests that are part of automation. Terratest seems to be the second,
but it's the third one I think is most useful.

If you build your infrastructure/deploy automation correctly, you should be
able to redeploy your full stack all the time, and when it succeeds, throw
traffic at it, and if you don't detect any anomalies, make that the new
production service. On the detection of anomalies you simply move the traffic
back to the previously deployed incarnation (or re-deploy the old incarnation,
if necessary). For sufficiently large systems this gets more complicated as
you can't just duplicate your resources, but the smaller pieces that have
actually changed can be shifted around.
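
The cutover decision itself can be very small. As a rough sketch, assuming the "anomaly" signal is simply an error rate measured on the candidate stack (real systems would look at more than that), the logic might be the following. The names `trafficStats` and `chooseActive` and the threshold are all hypothetical.

```go
package main

import "fmt"

// trafficStats is what was observed while sending traffic at the
// newly deployed stack. (Hypothetical type.)
type trafficStats struct {
	Requests int
	Errors   int
}

// chooseActive returns which deployment should receive production
// traffic: the candidate if its error rate stays within maxErrorRate,
// otherwise the previously deployed one. (Hypothetical helper.)
func chooseActive(previous, candidate string, s trafficStats, maxErrorRate float64) string {
	if s.Requests == 0 {
		// No evidence that the candidate works, so do not promote it.
		return previous
	}
	rate := float64(s.Errors) / float64(s.Requests)
	if rate > maxErrorRate {
		return previous
	}
	return candidate
}

func main() {
	healthy := trafficStats{Requests: 1000, Errors: 2}
	broken := trafficStats{Requests: 1000, Errors: 150}
	fmt.Println(chooseActive("stack-a", "stack-b", healthy, 0.01))
	fmt.Println(chooseActive("stack-a", "stack-b", broken, 0.01))
}
```

Treating "no traffic observed" as a reason to stay on the previous stack is a deliberate choice here: promotion should require positive evidence, not just the absence of errors.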

The idea of a "rollback" is really just "return to a previously known good
state", but it's misleading. It was previously good _before now_. Now things
have changed, and it might not still be good. So just as much as you can test
newly deployed changes before you make it the production service, you should
probably also test the _previously deployed changes_ to make sure they will
work again if pressed back into service. So, regression testing for
infrastructure, I guess. (You'd do this if you were building a physical
product like a network appliance to make sure your old appliances still work
with newer software releases, but we rarely think of software-derived
infrastructure this way)

------
tty7
How does this compare to serverspec
[https://serverspec.org](https://serverspec.org) and inspec
[https://www.inspec.io](https://www.inspec.io)?

Also, I would hate to be on a team that is scared to change its code, be it
infrastructure or application code!

If you are scared to change your infra code then you probably are not
following true infrastructure as code practises.

~~~
TobbenTM
I think the fear of changing infrastructure code is usually not rooted in bad
practices, but rather immature tools and practices.

Speaking from experience, it's really hard to get infrastructure as code
deployments as bulletproof as application deployments, because you're so
dependent on the toolchain, and its interaction with the provider (AWS, Azure,
etc).

And in some cases, it's impossible to actually do a clean infrastructure
deployment without some manual steps, which leaves you wondering what new
changes might need manual steps as well. 'Which problems have I not
encountered yet?'

~~~
tty7
I read "scared" as literally "a state of panic".

I'm not sure I agree about the dependency on the toolchain & provider. I am
always able to read what the API does (vendor side) and how it has been
implemented (tool side) and make an educated decision. Caveat: I do not use
bleeding edge features of any cloud provider and, unless engineering requires
it, I do not use PaaS features of cloud providers, which I find have the
roughest edges for infra-as-code.

Secondly, I think that part of that issue is maybe that the wrong toolchain is
being used, e.g., trying to use HCL (HashiCorp Configuration Language, used in
Terraform) as if it's Turing complete. (And I've seen this before.)

I've not run into an issue where manual steps are required to cleanly deploy
infrastructure which I cannot automate away. I can agree on dependency
chains, where you may need to run your infrastructure deployment in sequence
so that, e.g., your network is up before you provision instances.
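
Sequencing like that is a topological ordering problem. As an illustration only (this is not how any particular tool implements it), here is a small sketch using Kahn's algorithm. The `deployOrder` function and the component names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// deployOrder returns an order in which every component appears after
// all of its prerequisites. deps maps a component to its prerequisites.
// (Hypothetical helper.)
func deployOrder(deps map[string][]string) ([]string, error) {
	remaining := map[string]int{}       // unmet prerequisite count
	dependents := map[string][]string{} // prerequisite -> waiting components
	for node, prereqs := range deps {
		if _, ok := remaining[node]; !ok {
			remaining[node] = 0
		}
		for _, p := range prereqs {
			if _, ok := remaining[p]; !ok {
				remaining[p] = 0
			}
			remaining[node]++
			dependents[p] = append(dependents[p], node)
		}
	}
	var ready []string
	for node, n := range remaining {
		if n == 0 {
			ready = append(ready, node)
		}
	}
	sort.Strings(ready) // keep the result deterministic
	var order []string
	for len(ready) > 0 {
		node := ready[0]
		ready = ready[1:]
		order = append(order, node)
		for _, d := range dependents[node] {
			remaining[d]--
			if remaining[d] == 0 {
				ready = append(ready, d)
			}
		}
		sort.Strings(ready)
	}
	if len(order) != len(remaining) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	// Hypothetical components: instances need the network and security groups.
	order, err := deployOrder(map[string][]string{
		"network":         {},
		"security_groups": {"network"},
		"instances":       {"network", "security_groups"},
	})
	fmt.Println(order, err)
}
```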

All this being said, I mirror my production environment in staging. So 99% of
issues are found there.

I think we are mostly on the same page, I just took "scared" more literally
than you.

------
ofrzeta
The missing piece in all of this is the possibility of a clean rollback. If
all infrastructure could be described "as code" and is version controlled in
Git or something you could roll back. That doesn't take into account the
omnipresent state in databases and so on. I have yet to see a system or
environment that doesn't save state in databases or similar systems.

~~~
jnsaff2
Well, we are moving towards more statelessness in infrastructure, meaning the
state is only stored in databases and the rest of the infrastructure does not
(or should not) contain any state. So you can have immutable servers which you
can replace and upgrade/downgrade at will with much less worry about state.

You still need to care about the databases but it is much more confined.

------
ofrzeta
Also related: Testinfra
[https://testinfra.readthedocs.io/en/latest/](https://testinfra.readthedocs.io/en/latest/)

------
wgjordan
How does this tool compare to Test Kitchen [1] generally, and
kitchen-terraform [2] specifically?

[1]: [https://kitchen.ci](https://kitchen.ci)

[2]: [https://github.com/newcontext-oss/kitchen-terraform](https://github.com/newcontext-oss/kitchen-terraform)

