Infrastructure is made up of many parts. A single S3 bucket may have multiple re...

rubiquity · on Sept 16, 2021

We need to be creative and make types relevant to the resources being modeled. If the infrastructure database is just a bunch of string identifier fields then it isn't very helpful. You have to find good abstractions and model the resources in types that bring meaning to the data.

orf · on Sept 16, 2021

Sure, but that’s got nothing to do with SQL and could be modelled in terraform. Or better yet, indirectly using terraform via IAC providers in languages like Typescript.

Show me a proper example of creating an actual s3 bucket that you’d use in production. KMS key, inventory configuration, resource policy, lifecycle policy, logging enabled. Created via SQL.

Now show me how you’d take this and make it a reusable module so we can have a standard template for buckets that everyone can use.

rubiquity · on Sept 16, 2021

You’re focusing too much on the initial creation rather than on going maintenance and evolution. SQL and relations are much better suited to handle evolution by enforcing constraints than a graph of stitched together pseudo JSON.

orf · on Sept 16, 2021

No, I’m not. There isn’t a difference between creating and ongoing maintenance- it’s the same thing. You describe your state, something reconciles that. How you describe your state and the dependencies between resources is absolutely key, and on the face of it it looks like this interface is totally inadequate.

So, again, show me even a brief sketch of how you would describe what I said above with this model, and you’ll see it quickly falls apart.

rubiquity · on Sept 16, 2021

You're coming off a bit worked up, but I'll humor you anyway.

I'm not going to type out a bunch of SQL as an example. SQL vs HCL isn't the point and they basically break even on expressiveness. After you've typed out your pseudo-JSON, what exactly are the existing tools saving you? From having to use some wrapper around the cloud API? That's the easy part.

By overly focusing on SQL you're missing the forest for the trees. The point is relations and RDBMS features such as constraints, triggers, stored procedures. Such a platform would be always online rather than just a tfstate file waiting for humans to munge it. It's also time to stop thinking about the cloud as literal resources like current tools do and start moving to more abstract concepts (more on that later).

> No, I’m not. There isn’t a difference between creating and ongoing maintenance- it’s the same thing.

I run very critical infrastructure for a living and there absolutely is a difference. Creating new resources is easy -- they aren't being used and the world doesn't have any expectations for their performance or reliability. The bacon is made in evolving existing infrastructure without impacting user experience negatively. Terraform and other such generators give you very little in guard rails or help and silly outages happen all of the time because of it.

Database engineers have been creating sophisticated execution engines for decades. The creator of SQLite cites treating every SQL query as its own program to be ran by a byte code VM as a key design decision. Writing Terraform or what have you is like programming in an AST (sorry Lispers). To date query execution engines figure out how to manage on-disk structures. There is no reason they couldn't be creating smart plans for infrastructure changes at a much higher abstraction level than "glue this KMS key to this bucket."

orf · on Sept 16, 2021

I think we're arguing two different points here. I'm not arguing that a theoretical better system exists that can plan infrastructure on a much more intelligent level than current public tooling does, or that you can't muck up terraform applies and cause downtime. I'm arguing that version controlling a bunch of SQL statements to describe your infrastructure feels wrong, outside of a few snappy short examples, and that this more intelligent system you're describing is in no way tied or related to using SQL as a DSL.

You have some infrastructure that you want to exist in a certain state. I used a bucket to describe something that seems simple but often needs a lot of other related infrastructure to exist that is configured via potentially a lot of different API calls. These need to be created in a particular order, and any changes to those resources need to be reconciled in a particular order. Replace "bucket" with anything else that fits that, which is absolutely everything.

Ok, so how do we describe that with SQL?

   INSERT INTO aws_kms_key VALUES (nothing?) RETURNING something?
   INSERT INTO aws_bucket (bucket_name, kms_key_id) VALUES ("bucket-name", (SELECT kms_key_arn FROM aws_kms_key WHERE what = ....?))
   INSERT INTO aws_bucket_policy (bucket_arn, policy) VALUES (???, make_policy_somehow?())

Looks terrible, not least of all because naively you'd assume it would create buckets every time this is "run" or because you'd need to add variables and suddenly you're writing TSQL. Great. Now the terraform would be quite verbose, sure, but you'd encapsulate it in a module so all you'd need is:

   module "s3_bucket" {
      name = "bucket-name"
      kms = true
   }

And that's way more descriptive than a large number of SQL statements to do the same thing. Sure, you could argue that you could write the following magic statement:

   INSERT INTO cool_custom_bucket (name, kms) VALUES ("bucket-name", true)

But how would a user create their own abstractions? What if I want a custom policy for all our buckets? Would we write some trigger that updates a policy for all buckets that are inserted? Would we write some incomprehensible stored procedure that does some of this?

Ok, but lets forget all that. Imagine it works and it's fluid and it's basically terraform but in SQL etc etc. So now we want to make some changes and roll them out. How does SQL help us here? Do we open a merge request and say "UPDATE s3_bucket WHERE name = "bucket-name" SET kms = false"? Do we edit the insert statement above? Do we just ignore versioned infrastructure entirely and open some pseudo-terminal and smash in "update production set down=true"?

Seems horrible. If you're imagining something entirely different level of infrastructure management needed only by google-scale companies that can only be done with SQL then you might need to elaborate on why exactly that is, because I can't see the connection. Why couldn't this theoretical system work with infrastructure managed by a DSL like Terraform? Why couldn't you fork Terraform to make more intelligent decisions about large-scale updates, or manage state in a different way whilst keeping the DSL?

And, as a side note, I rather appreciate the statefile being a plaintext file. It's much more flexible. But it can store state in a database as well.

yawaramin · on Sept 19, 2021

Why are Terraform modules convenient while stored procs are horrible though? They're doing the same job? E.g.

    call s3_bucket('bucket-name', true);

Is not really all that different from a Terraform module.

And the rest of your objections boil down to 'use in-code database migrations', which app developers have been doing as a best practice for a long time now.

orf · on Sept 21, 2021

There are two decoupled parts to IAC - describing the state you want and reconciling that with the actual state of the world. Reconciling can be advanced as you want: you could take into account SLAs, perform progressive updates, or just ignore that and remove/create resources at will. So even if you used SQL and inserted it into a database, it's just a representation of state that needs to be reconciled in a separate step.

The problem is that "use in-code database migrations" is very very different to this problem. With database migrations you have a previous state and you manually code the steps to progress that state: add a column, then copy the data, then delete the old column, then do this, then do that. You do the reconciliation yourself.

And this is obvious when you think about it - imagine you did structure IAC like migrations, when you wanted to bring up some fresh infrastructure what happens? You would create all these resources, then modify them, then delete them, then create some new resources as a linear history of operations. Not good, very brittle.

So you need to decouple "what you want" from "how you get it". You could indeed use SQL to describe "what you want", but what does that really give you over using a language like Python/TypeScript/HCL to do it? It's not expressive enough and even things like conditionals are hard, and lets not even talk about string templating. Lets try and model the `s3_bucket` procedure with Postgres:

   CREATE PROCEDURE s3_bucket(bucket_name, bucket)
      LANGUAGE plpgsql AS
      $$
      DECLARE
         kms_arn string;
         bucket_arn string;
      BEGIN
        INSERT INTO aws_kms_key VALUES (nothing?) RETURNING arn INTO kms_arn;
        INSERT INTO aws_bucket (bucket_name, kms_key_id) VALUES ("bucket-name", kms_arn) RETURNING arn INTO bucket_arn;
        INSERT INTO aws_bucket_policy (bucket_arn, policy) VALUES (bucket_arn, make_policy_somehow?())
      END
      $$;

Of course, because you're describing a graph these returned arn's wouldn't actually _be_ arns, just placeholders that need to be reconciled later. And therein lies a problem - how do you ensure each node in your graph has a unique order-independent identifier? How do you know that you shouldn't need to create the `aws_kms_key` because it already exists, but you do need to create the `aws_bucket_policy` because you just added that `INSERT INTO` line? So you'd need to add a unique identifier to every `INSERT` statement:

    INSERT INTO aws_kms_key (node_id) VALUES ("this has to be unique else everything goes to hell") RETURNING arn INTO kms_arn;

Ugly. What's better about this than:

    resource "aws_kms_key" key {}

    resource "aws_s3_bucket" bucket {
        name = "bucket-name"
        kms_key_id = aws_kms_key.key.arn
    }
    
    resource "aws_bucket_policy" policy {
        bucket_id = aws_s3_bucket.bucket.id
        policy = jsonencode({...})  # or use the IAM document resource
    }

You could also model it with Pulumi:

    const key = new aws.kms.Key()
    const bucket = new aws.s3.Bucket("my-bucket");
    const policy = new aws.s3.BucketPolicy(bucket, make_policy())

Both of these are _much_ easier to grok and more flexible than the stored-procedure soup you will immediately get into if you use SQL.

And at the end of the day... it doesn't matter. You're not even _using_ a database to store your state, you're just using SQL to produce some graph of things you need to reconcile. You could write some huge stored procedure to turn your database state into a terraform/pulumi plan and get it to apply it with no problems.

Given that... what's the point of using SQL when there are better alternatives? Or, if you really hate yourself, go write it in COBOL. Equally as suitable.

yawaramin · on Sept 22, 2021

> You're not even _using_ a database to store your state, you're just using SQL to produce some graph of things you need to reconcile.

Well, that's one way of doing it. The better way, imho, is to actually have a database to store the state. That also solves:

> You would create all these resources, then modify them, then delete them, then create some new resources as a linear history of operations. Not good, very brittle.

When you have an actual database backing your SQL queries, the migrations would all run in this DB, produce a final state of the DB, then this final state would get applied to bring up your system in one shot.

> So you'd need to add a unique identifier to every `INSERT` statement:

Sure, and that's fairly easy to derive, because we already have a unique identifier, the bucket name:

    INSERT INTO aws_kms_key (node_id) VALUES (concat('aws_kms_key', bucket_name)) RETURNING arn INTO kms_arn;

orf · on Sept 23, 2021

> Well, that's one way of doing it. The better way, imho, is to actually have a database to store the state

It doesn’t matter how the state is stored, you still need to reconcile it. So yeah you can write your target state into a database if you wish, but you need something else to compare that with the state of the actual world and progress that towards the state you requested.

> When you have an actual database backing your SQL queries, the migrations would all run in this DB, produce a final state of the DB, then this final state would get applied to bring up your system in one shot.

Yes, what you are describing is how Terraform works. Without SQL.

Except it’s never “one shot” because it’s a DAG of requirements where each resource creation can fail.

> Sure, and that's fairly easy to derive, because we already have a unique identifier, the bucket name

In this specific case, sure. But for a KMS key or in the general case, no you wouldn’t. You’d also be hard-coding provider specific information into your system - “a bucket ARN can be derived from the name attribute”. This is not always the case - consider any REST api that accepts a POST request to create something, but returns only an incrementing integer primary key to uniquely identify it.