
If you'd consider something other than CloudFormation, there is also HashiCorp's Terraform. It has an AWS provider (https://terraform.io/docs/providers/aws/index.html) which creates resources and maintains the state in a file that you can store in version control (https://terraform.io/docs/state/index.html).



Terraform, as an idea, is brilliant. Mitchell and company identified a hugely important need and tried to fill it, and I give them all the credit in the world for that. Cross-platform cloud provisioning? Gimme. But I cannot in good conscience fail to relate what a disastrous experience Terraform has been for me, both at jobs and with clients.

Writing reusable code in Terraform is an exercise in frustration due to the extreme clumsiness of HCL (which, I understand, was used because "YAML is complicated"--well, that's true, but YAML isn't a good solution either, you're HashiCorp, you wrote Vagrant, you already know how to do this!). The application architecture is reckless and full of race conditions; your state will be hosed if one resource errors out at the wrong time, while other resources are being successfully updated--the resources that return successfully after the failed resource will on many occasions fail to be persisted to state. What's more, application testing seems to be at best an afterthought: there have been regressions in the providers that will break your existing states.

I would under no circumstances use Terraform if I didn't have clients who had selected it before I was working with them. If in AWS, I would use CloudFormation, with a tool like Cfer[1] (which is excellent, reliable code) or SparkleFormation[2] (which is more full-featured but I hope you never need to debug it) to provision my stuff.

(Full disclosure: I'm building a much, much better provisioner for multi-provider cloud infrastructure. Neither of the projects I recommend are mine; mine's not done yet.)

[1] - https://github.com/seanedwards/cfer

[2] - http://www.sparkleformation.io/


If you're writing your own, you might also look to BOSH[1] for inspiration.

Born in 2010, it's older than both CloudFormation and Terraform. It can manage anything that someone's written a driver for. So far that includes AWS, Azure, vSphere/vCloud, OpenStack, VirtualBox, Google Compute Engine, Apache CloudStack and there might be others I missed.

It stores state in a database. It is able to recover from mismatches between the state of the world and the desired state. Cloud Foundry users have been using it for years to deploy and update CF installations. Pivotal Web Services (I work for Pivotal, in a different division) has been upgrading to the most recent CF release every few weeks, live, without much fuss, for years.

For any kind of heavily stateful infrastructure, BOSH is a strong candidate.

[1] https://bosh.io/


Augh, how did BOSH slip my mind? I've never used it in production, but I've used it to roll out a CF environment for testing and was impressed enough to dig into it a little more (most of a year ago now; I think your mention of CF was actually what kicked that off). From (admittedly limited) experience I'm not crazy about its developer-facing feel, but I appreciate the significant and responsible effort in it.


> The application architecture is reckless and full of race conditions

Honestly curious, can you point to one or two?


I don't have a toy example offhand, but the resource failure case I mentioned is one. If resources A, B, and C are in flight at the same time and A fails, Terraform in some as-yet-undiagnosed circumstances will not record state changes caused by the still-in-progress work on B and C. This happens a lot with SNS queues, IIRC, because SNS queue operations on the AWS API take a relatively long time to resolve. So if, say, you mistyped an attribute for an EC2 instance, it can fail out and Terraform will happily forget that it created an SNS queue for you.

I have a sneaking hunch that the continuing problems with template-file resources (complaining that the "rendered" attribute doesn't exist in dependent resources) are related to this, but can't prove that; my clients don't pay me to debug Terraform, but to get their stuff working, and that doesn't leave much time to get in-depth with it now that I've decided not to use it for my own purposes anymore.


> Full disclosure: I'm building a much, much better provisioner for multi-provider cloud infrastructure. Neither of the projects I recommend are mine; mine's not done yet.

One of the convenient things about software that doesn't exist is that it doesn't have any bugs.

Let your software speak for itself when it exists; until then, this seems an undeserved critique of software, and a team, that is solving problems every day.


This is a crazy sentiment. I was just considering checking out Terraform, and I'm really glad to have read the previous commenter's experience.


I've been using Terraform for over a year, maintaining a standard 3 AZ load balanced production cluster.

HCL has improved dramatically, and now that template strings are a thing, most of my variable interpolation issues are solved. However, you still can't specify lists as input variables, so you frequently have to resort to joining and splitting strings. It's hackish and, worse, changing one value in the list will invalidate all other resources that use the variable.

Race conditions and dependency cycles are still a problem, particularly with auto scaling groups and launch configurations -- I have to migrate them in two steps (create, then destroy) to avoid a conflict. Same with EBS volumes: I ended up scripting my instance to attach the volume itself; otherwise there are ordering issues when destroying and replacing.

There are also missing features, such as the ability to create ElastiCache Redis clusters and CloudFormation resources.

I'm still glad that I went with Terraform though. It takes a good amount of time to get around the limitations and bugs, which can be really frustrating, but when it works, it works beautifully.


Strongly agree. Support for new AWS features hits Terraform much faster than CloudFormation (still waiting for CloudFormation support for AWS's managed Elasticsearch service that was unveiled two months ago--Terraform got it right away). Some of the critiques below are true... HCL is fine, but Terraform's interpolation syntax has a long way to go. That said, CloudFormation's JSON is way more painful to deal with. As for the other problems, they go back mainly to someone using a tool they don't fully understand. Yes, you can get into some odd states in rare cases, but Terraform gives you the ability to rapidly build and tear down your infrastructure over and over if necessary to work out details, and you have fine-grained control over which pieces are built how. Not only that, but you can inspect the logs to see what's happening, and if there are bugs, you can fix them yourself because the tool is open source and free. CloudFormation gives zero visibility and no fine-grained control; it's completely opaque, and where it's broken, you can't fix it.

Terraform is relatively new and improving rapidly. It has its problems, but it's light-years beyond CloudFormation. It's clear that Amazon doesn't place a high priority on making CloudFormation easy to use, or to support new features. The right approach to any problems with Terraform is not to spread FUD about it like below, but to contribute code fixes.


> HCL is fine, but Terraform's interpolation syntax has a long way to go

Oh, HCL is fine, you say so authoritatively? Well then do me a solid and show me an if statement, show me a for loop. Because you're not building nontrivial, reusable infrastructural modules without logic. I know. I've tried. I've committed, between different projects and clients, somewhere around ten thousand lines of Terraform and probably half are copy-paste garbage because HCL is so crippled a tool.

It hurts me to say this at a deep and visceral level: Terraform's interpolation syntax makes freaking Ansible and its "no, really, it's totally cool, string templates for logic are awesome" look good.

> The right approach to any problems with Terraform is not to spread FUD about it like below, but to contribute code fixes.

Spread FUD? Oh, no no no, you can take your assertions of FUD and insert them somewhere uncomfortable, thank you very much. I wrote Terraframe[1] specifically to contribute back to the Terraform community, to make it better, and stopped (to create a different project) because I was stymied. By no documentation, by HCL <-> JSON not actually working, and by no interest from the developers in any sort of dialogue about actually fulfilling the promises they themselves assert for their software. Between this and bugs that a trivial testing framework should catch (Why are you validating AWS resource names differently between point releases? Why are you changing that validation to be wrong? Why are you breaking my existing states when you've done this? Why did your tests not catch this before you pushed this out to your entire userbase?) I cannot take the project seriously as a tool for being used in infrastructure I care about. Because I don't trust them to take Terraform seriously, either.

[1] - https://github.com/eropple/terraframe


I believe you can hack an if statement by doing a length, substring and equality comparison to make it equivalent.


I'm a big Terraform fan, but I really don't like HCL and its limitations. I ended up writing a PHP "SDK" of sorts that generates JSON that Terraform consumes [1]. It uses the AWS SDK for some things (like listing all available AZs in a VPC), and provides some macros. I made this for use at work, and it powers a few production sites for a large company.

There's still a lot to do to make it ideal for public consumption (like writing docs and freezing the API), but it'll get there sometime soon. PRs are most welcome.

[1] https://github.com/ameir/terraform-php
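To make the "generate JSON instead of writing HCL" idea concrete, here is a minimal sketch in Python rather than PHP. It is not how terraform-php works internally; it only assumes that Terraform reads JSON-syntax configuration with the usual resource / type / name nesting. The function name, the resource names (web_0, web_1), the AMI ID, and the hard-coded AZ list are all hypothetical placeholders.

```python
import json


def build_config(azs, ami, instance_type="t2.micro"):
    """Return a Terraform-style config dict with one aws_instance per AZ.

    The loop below is the part that is awkward to express in HCL itself.
    """
    instances = {}
    for index, az in enumerate(azs):
        # Resource names like "web_0" are made up for this illustration.
        instances[f"web_{index}"] = {
            "ami": ami,
            "instance_type": instance_type,
            "availability_zone": az,
        }
    return {"resource": {"aws_instance": instances}}


if __name__ == "__main__":
    # Placeholder values; a real script might look the AZs up via an AWS SDK.
    config = build_config(["us-east-1a", "us-east-1b"], ami="ami-00000000")
    print(json.dumps(config, indent=2, sort_keys=True))
```

The output would be written to a JSON configuration file for Terraform to pick up; the logic (loops, conditionals) lives in the generator, and Terraform only ever sees flat data.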


I second the Terraform suggestion...my team loves it. But we've found storing state in version control to be clunky. Storing state remotely in Consul has been less problematic for us, though S3 would also work for those that don't have a running Consul cluster.

What I love most about Terraform is that we can include the output of terraform plan in pull requests that make infrastructure changes. Then our continuous deployment process runs plan again and requires an identical output before running apply. This both makes it easier for team members to review changes and ensures that we don't accidentally destroy infrastructure, which is really easy to do with a lot of these infrastructure-as-code tools.
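A rough sketch of the "require an identical plan before apply" check, in Python. This is an assumption about how such a gate could be built, not a description of our actual pipeline: it takes the plan text attached to the pull request and the plan text produced in CI (capturing those from terraform plan is left out), and compares fingerprints after trimming trailing whitespace. The function names are hypothetical.

```python
import hashlib


def plan_fingerprint(plan_text):
    """Hash plan output, ignoring trailing whitespace and surrounding blank lines."""
    lines = plan_text.strip().splitlines()
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def safe_to_apply(reviewed_plan, fresh_plan):
    """True only if the plan reviewed in the PR matches the one CI just produced."""
    return plan_fingerprint(reviewed_plan) == plan_fingerprint(fresh_plan)
```

If safe_to_apply returns False, the deployment would stop and ask for a fresh review instead of running apply. Real plan output may contain values that legitimately differ between runs, so a production version would probably need smarter normalization than this.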

The other thing that Terraform has going for it over CloudFormation is for hybrid cloud deployments, since it can provision infrastructure in vSphere and OpenStack as well as AWS.


Can you go into how you're using consul with terraform?


We're using Consul to store the state remotely (see: https://terraform.io/docs/commands/remote-config.html). In a nutshell, it just stores the JSON it would have stored in the tfstate file in a key in Consul instead. In addition to being easily available in a shared location, this allows you to leverage Consul's features (ACLs, watches, etc) to improve the process of making infrastructure changes.

Stuff we've thought of but haven't gotten around to yet:

- Build relatively simple tooling around Terraform and Consul to acquire a lock before running apply. We haven't gone to that length yet since only our continuous deployment environment has credentials to mutate production, and it runs builds of the infrastructure project sequentially.
- Watch the Consul key where the tfstate is stored for changes, to kick off sanity checks that ensure everything is still healthy.

They're both so flexible that there are probably other ways in which they'd work well together that we haven't thought of yet.
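For the locking idea, something along these lines might work. It's an untested sketch in Python, assuming Consul's session and KV HTTP endpoints (session create/destroy, and KV writes with acquire/release parameters) behave as I remember them; check the Consul API docs before relying on it. The agent address, the lock key, and the function names are hypothetical, and the action passed in would be whatever runs terraform apply.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumed address of a local Consul agent


def consul_put(path, body=b""):
    """PUT to the Consul HTTP API and return the decoded JSON response."""
    request = urllib.request.Request(CONSUL + path, data=body, method="PUT")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_with_lock(key, action, put=consul_put):
    """Run action() only while holding a session lock on the given KV key.

    Returns True if the action ran, False if another holder had the lock.
    """
    session_id = put("/v1/session/create", b"{}")["ID"]
    try:
        # The acquire call is expected to return true only for the lock winner.
        if not put(f"/v1/kv/{key}?acquire={session_id}", b"locked"):
            return False
        try:
            action()
        finally:
            put(f"/v1/kv/{key}?release={session_id}", b"")
        return True
    finally:
        # Always clean up the session, whether or not the lock was taken.
        put(f"/v1/session/destroy/{session_id}")
```

The put parameter is there so the HTTP layer can be swapped out in tests; in CI you would call something like run_with_lock("locks/terraform-apply", do_apply) with your own key and apply function.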



