Pyinfra: Automate Infrastructure Using Python (pyinfra.com)
644 points by InitEnabler 7 months ago | 217 comments



Hey all, I'm the creator/primary maintainer of pyinfra! Super excited (a little terrified) to see this on the frontpage, happy to answer any questions :)

I also hang out on the Matrix room: https://matrix.to/#/#pyinfra:matrix.org

Another thing: the GH repo currently points at v3, which is in beta, and the docs for it are here: https://docs.pyinfra.com/en/next (highly recommend starting with v3, I just haven't had any time recently to wrap up the release, but it's stable).


As you can see here, the main question is what the advantages are over Ansible, a mature and the most popular agentless configuration management tool written in Python. So I propose putting this answer right on the landing page.


I think I tried to shy away from specifically being "Ansible does this bad so pyinfra does this" and instead focus on the features that differentiate like "Instant debugging with realtime stdin/stdout/stderr output (-vvv).". But it seems like that isn't enough and the landing page needs to be more explicit in comparison. Ty for the feedback!


I applaud trying to be positive and focus on "this is what we do well," but yeah at least some explicit comparison would help. The copy right now is kind of assuming the reader already knows Ansible to compare against as a baseline. Which is probably fair for most people who find your project, but people who find your project are also probably not happy with Ansible and want to know if this addresses their pain points immediately.

It's very interesting though, I think I'm gonna try it myself.


I like Ansible, but that doesn't mean there are no pain points.

One of them is handling "if-this-then-that-else-that". Being purely declarative, Ansible is horrible at that.

Pyinfra can be used in imperative mode, am I right? This would make the use of if-else a breeze, which would be a really good reason for me to switch.
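
To make the if-else point concrete, here is a minimal sketch in plain Python. It is not pyinfra's actual API: the `facts` dict, its keys, and the returned shell commands are hypothetical stand-ins for whatever a real tool would gather and run.

```python
def install_web_server(facts: dict) -> list[str]:
    """Return the shell commands for one host, branching on its (hypothetical) facts."""
    if facts.get("distro") in ("debian", "ubuntu"):
        commands = ["apt-get install -y nginx"]
    elif facts.get("distro") in ("fedora", "centos"):
        commands = ["dnf install -y nginx"]
    else:
        raise ValueError(f"unsupported distro: {facts.get('distro')!r}")

    # A second, independent condition is just another ordinary `if`.
    if facts.get("uses_systemd", True):
        commands.append("systemctl enable --now nginx")
    return commands


print(install_web_server({"distro": "ubuntu"}))
```

The branching is ordinary control flow, so nothing new has to be learned to express "if this, then that, else that".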


Ansible is declarative?

In puppet and saltstack you can declare that a folder is empty and declare a specific file in this folder. The system's smart enough to delete all the files except the one.

To achieve such a feat in ansible is hard. Easiest way is to have two tasks, one deletes everything and second recreates your file. Doesn't feel very declarative.
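
For illustration, the "this folder contains exactly this file" idea can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions (a flat directory of regular files), not how Puppet, Salt, or Ansible implement it, and `ensure_only` is a made-up name.

```python
import tempfile
from pathlib import Path


def ensure_only(directory: Path, wanted: dict[str, str]) -> list[str]:
    """Make `directory` contain exactly the files in `wanted` (name -> content).

    Returns the list of changes made, so a second run returns [].
    Only handles a flat directory of regular files.
    """
    changes = []
    directory.mkdir(parents=True, exist_ok=True)
    # Remove every file that was not declared.
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name not in wanted:
            path.unlink()
            changes.append(f"removed {path.name}")
    # Write declared files only when they are missing or differ.
    for name, content in sorted(wanted.items()):
        target = directory / name
        if not target.exists() or target.read_text() != content:
            target.write_text(content)
            changes.append(f"wrote {name}")
    return changes


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "conf.d"
    folder.mkdir()
    (folder / "stale.conf").write_text("old")
    first = ensure_only(folder, {"app.conf": "key = value\n"})
    second = ensure_only(folder, {"app.conf": "key = value\n"})
    print(first)   # changes made on the first run
    print(second)  # nothing left to do on the second run
```

The declared file is never deleted and recreated; it is only rewritten when its content differs.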

Unrelated thing, they don't even try to be declarative in ansible. E.g. you can have a file with state "touch". It's not a state if it updates on each playbook run!


You're confusing declarative with idempotent. Ansible is both, it won't change anything if the state is already what you declared. The nitpicked case you chose, you want the file to have the latest timestamp, this is a valid state to declare.


Ansible is neither and it shows. For example, deleting a configuration in Ansible doesn't revert the system back to its original state. That's the definition of statefulness.


The shell task breaks the declarative nature a bit, along with registering task results and then writing conditional whens based on them. Interpolating values based on the registered results does too imho


Plus, what you are declaring is oftentimes not the state you desire but... the action you want to take. Say you use `apt.name: [pkg1, pkg2]`, run it, then remove `pkg2` from the list. Running this again won't remove `pkg2` from your system. So it's declarative, but not necessarily at the optimal level all the time.


Ansible tries to be declarative and idempotent, but fails at being both.


ansible-core succeeds pretty well in being both.

ansible modules (built-in and community) are a different matter. generally, quality will vary.


Ansible is neither.


There are a lot of cases where Puppet fails on something similar. Packages are probably the easiest example. Puppet won't remove all packages not explicitly declared, nor will it remove files it created if you remove the Puppet code for managing that file.

I'm fairly sure that the way to make Puppet do what you suggest is the same in Puppet and Ansible. The difference is that Puppet is smart enough to not actually remove your file during every run (I think). On the other hand, Ansible will normally not be configured to run every 30 minutes like Puppet, so it's much less of an issue.

Both tools are great but they work somewhat differently. How you think about using them is much the same though, you need to tell the computer what to do. In Puppet this is often talked about as if you describe the state, I suppose that's partly true, but in the end you have a series of actions the computer will need to take to achieve this state.


You can rsync to a Puppet-created folder. You’d need to have a source folder with the file, which is fine.


rsync covers this and should be used instead of a lot of Ansible tasks


There's definitely a trend in this direction.

Pulumi for Terraform and Dagger for Docker are two examples I use

I like CUE as a language to replace my Yaml that has some of the typical language constructs but maintains the declarative approach


Is performance better than Ansible? I have used Ansible extensively and find it excruciatingly slow.



Hi @Fizzadar and congrats on making this and getting it out the door; kudos!

As you craft your "Why this and not Ansible" content, you might actually state clearly what you already noted on the Performance page, namely: "One of the reasons pyinfra was started was performance of agent-less tools at the time." If I read that, it'd instantly make me want to stick around and read some more, play with pyinfra, etc. BTW, I will be playing with it anyway, but just wanted to point out that you likely won't need to start from scratch for copy (on a comparison or answering "Why this and not Ansible" content). Cheers!


Excruciatingly slow is an understatement :)


Why is being slow a bad thing? Ansible gives me a legitimate excuse to have a proper lunch. ;-)


You're supposed to be writing compilers during that time


That would be appreciated. I saw the homepage and my first thought was "Ansible is python. How are these things different?" Obviously pure python vs yaml is one thing. But beyond that it's not clear. Perhaps there are specific use cases in your mind where one or the other is a better fit, and that would be helpful as well.


Stupid question. Ansible is mentioned TWO times in HN hiring thread, it will soon be zero. Isn't this dated tech? Ansible is good at patching servers (have cute names, make sure they are patched). Patching??? But, why not use containers? Aren't we moving away from Ansible/Puppet? I would much rather have AWS/CDK or pythoninfra like this if I was into pet machines.


Someone somewhere has to set up machines so you can do containers. Doesn't mean they are pets.


You mean terraform?


No, before terraform, someone has to boot a machine and slap an image on it or do an OS install, register the host in some way and have it checking for the terraform.

I use ansible for creating machine images or initial provisioning. (I don't run the ansible, someone racks the host, sets its build state to install, and boots the host and it joins the appropriate cluster and people do container things. I don't necessarily know when my ansible runs against a host.)

I also have a pretty good stack of ansible playbooks that I use manually day to day for hardware validation for new server models and one off type stuff. But again, I never really know what I'm running against or have pet servers.

A good chunk of hardware validation runs automatically if the boot target is set to hw-validate, but the whole point is that you are gonna find stuff that doesn't work with your standard process and either pass on it or adjust.

I do run tf to provision cloud infra so it's transparent to the devs, and, honestly, not sure how ansible is dated and tf is not, they are pretty much the same thing in a different coat.

And honestly, generating thousands of lines of conflicting generic yaml isn't really much of an improvement over writing it once and running it automatically on 1000s of boxes.


Is it declarative? Obviously python isn't, but since it's not executed as a script but rather the module passed to pyinfra, it could be, and looks like maybe it is just registering work to (potentially) do on module load?

If so, nice, shout about it more - it's my number one requirement of such a tool, why I think Terraform (or OpenTofu) is great and mostly everything else sucks, and I think it should be everyone's. It's just obviously (at least, once someone makes it available!) the correct paradigm for managing stateful resources and coping with drift.


Yes... and no. It depends on the operation (the docs explicitly state if an operation is _not_ idempotent "stateless operation"). Operations are either:

- state definitions, "ensure this apt package is installed" (apt.packages: https://docs.pyinfra.com/en/next/operations/apt.html#operati...)
- stateless, "run these shell commands" (server.shell: https://docs.pyinfra.com/en/next/operations/server.html#oper...)

Most operations are state definitions and much preferred, the stateless ones exist to satisfy edge cases where either the state-ful version isn't implemented or simply isn't possible.
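
The distinction between the two kinds of operation can be modelled with a toy sketch. This is not pyinfra's internals or API: `packages_op`, `shell_op`, and `run` are hypothetical names, and the installed-package set is passed in by hand rather than gathered from a host.

```python
from typing import Iterator


def packages_op(installed: set[str], wanted: list[str]) -> Iterator[str]:
    """State definition: yield commands only for packages that are missing."""
    missing = [name for name in wanted if name not in installed]
    if missing:
        yield "apt-get install -y " + " ".join(missing)


def shell_op(commands: list[str]) -> Iterator[str]:
    """Stateless: always yield the commands, whatever the host looks like."""
    yield from commands


def run(installed: set[str]) -> list[str]:
    """Collect the commands one deploy would execute against a host."""
    executed = list(packages_op(installed, ["nginx", "git"]))
    executed += list(shell_op(["systemctl reload nginx"]))
    return executed


print(run({"git"}))           # installs nginx, then runs the shell command
print(run({"git", "nginx"}))  # only the shell command runs again
```

The state definition becomes a no-op once the host matches, while the stateless operation repeats on every run, which is why the former is preferred where it exists.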


Ah, so this is similar to the Terraform CDK approach?

In Terraform CDK, you use a language like python to compute the set of resources and such you want to have, and then hand that over to the terraform core, which does the usual terraform song and dance to make it happen.

This is actually interesting to me, because we struggle with even the simplest data transformations in ansible so much. Like, as soon as you start thinking about doing a simple list comprehension in python in jinja templating, lots and lots of pain starts. From there, we're really starting to think about templating the inventories in some way because it would be less painful.


Interesting, not heard of CDK before! Kind of similar? As long as the language is Python I suppose! Would be possible to integrate with other languages too I guess, not something I’ve ever looked into though.

Totally agree on templating which is why inventories have always been python code just as operations, giving maximum flexibility (with some complexity/type confusion drawbacks).
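
As a rough illustration of why a Python inventory helps with data transformation: the sketch below builds host groups with comprehensions. The hostnames, the `REGIONS` table, and the data keys are all made up, and the `(hostname, data)` tuple shape is my understanding of the pyinfra inventory format, so check the docs before relying on it.

```python
# Hypothetical inventory: hostnames and data keys are made up for illustration.
REGIONS = {"eu": 3, "us": 2}

# One comprehension replaces what would be nested loops in a template.
web_servers = [
    (f"web-{region}-{index}.example.com", {"region": region, "primary": index == 1})
    for region, count in REGIONS.items()
    for index in range(1, count + 1)
]

# Derived groups are ordinary filters over the same data.
primaries = [name for name, data in web_servers if data["primary"]]

print(len(web_servers))
print(primaries)
```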


CDK just generates AWS CloudFormation files (there is a CDK for TF as well, as said before, but it is not the official AWS implementation). It's like having a few constructs that let you generate a YAML or JSON file.


Relevant XKCD: https://xkcd.com/303/


I never understood the desire to make things declarative. It seemed to me to always hide what is actually happening and it made it more difficult to understand. Is there a simple way to understand why declarative stuff is desirable to some people?


It lets you separate "here is the final state of the system that I want" from "how to get there".

If a SQL compiler or `terraform plan` command can convert "the current state of the system" + "desired end-state" to a series of steps that constitute "how to get there from here", then I can usually just move forward to declaring more desired states after that, or debugging something else, etc. Let the computer do the routine calculations.

When using a path-finding / route-finding tool, having the map and some basic pathfinding algorithms already programmed in means we no longer need to "pop a candidate route-segment off the list of candidates and evaluate the new route cost"... I simply observe that I am "probably here" and I wish to get to "there"; propose a route and if it's good enough I'll instruct the machine to do that.

If I can declare that I want the final system to contain only the folder "/stuff/config.yaml" with permissions 700 -- I don't care what the contents of stuff were previously, and if it had a million temp files in it from an install going sideways or the wrong permissions or a thousand nested folders in it, well, it would be great if the silly computer had a branching workflow that detected and fixed that for me, rather than me having to write yet another one-off script to clean up yet another silly mis-configured system that Bob left as a dumping ground that I have to write yet more brittle bizarre-situation-handling code for.

Same for SQL and data. "Look, Mr. Database, I don't actually know what's in the table today, and I don't know why the previous user dumped a million unrelated rows in the table.... Can you answer my query about if my package has shipped, or not?"


It sounds like you need a shim between every single dependency to make its setup work declaratively. That sounds like a versioning nightmare to rely on and was borne out in my experience with it.


I think the main reason people like the declarative approach is that done right, it's idempotent. You also don't have to think about the current state of the system at all. You just need to describe what you want it to look like. Of course in practice it can be more nuanced than that, but thinking declaratively can make things much simpler in some scenarios.


These aren't as related as you think. Ansible is imperative and idempotent.


Ansible is only as idempotent as the module it calls though.


Ansible's resources are declarative [0]. What part of Ansible is imperative?

https://docs.ansible.com/ansible/latest/reference_appendices...


Ansible playbooks are usually a list of steps to execute in order (imperative). Those steps may try to present a declarative interface to what they are supposed to do, but many fail to fulfill the definition of "declarative" that you have linked. E.g. with the built-in modules it is impossible to declare a desired set of installed packages, only a set of packages to be installed _in addition to all already installed packages_. This means it is impossible to remove an installed package again by removing it from the declaration, you have to specify a second step (imperative) that explicitly removes the package. This makes it impossible to declare a final state for "installed packages" with ansible.
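
The difference between the two readings of a package list can be shown with plain set arithmetic. This is a minimal sketch, not code from Ansible or any other tool; `plan_additive` and `plan_exact` are hypothetical names.

```python
def plan_additive(declared: set[str], installed: set[str]) -> dict[str, set[str]]:
    """Declared packages are a minimum: install what is missing, remove nothing."""
    return {"install": declared - installed, "remove": set()}


def plan_exact(declared: set[str], installed: set[str]) -> dict[str, set[str]]:
    """Declared packages are the full desired state: anything else is removed."""
    return {"install": declared - installed, "remove": installed - declared}


installed = {"pkg1", "pkg2"}  # left over from an earlier run
declared = {"pkg1"}           # pkg2 has since been dropped from the list

print(plan_additive(declared, installed))
print(plan_exact(declared, installed))
```

Under the additive reading, dropping `pkg2` from the declaration changes nothing on the host; only the exact reading turns the omission into a removal.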


This is debating state-management though, which Ansible makes the correct choice about: Ansible largely works the way a user expects when they transition to it from doing things on the command line, and guides them towards idempotency (which is a pre-requisite for declarative configuration).

The problem is that to track deletions you either have to constantly have a view of global state (i.e. do you want to put `linux-kernel` in your package list?) or you need to store specific state about that machine (i.e. `redis` was installed by playbook redis-server.yml, task "install redis") - because the package's absence in that list doesn't necessarily mean "uninstall it" if something else in another playbook or task will later declare it should be present.

As soon as you're trying to do deletions, you're making assumptions that the view of the state you have is complete and total and that is usually not the case - and even if it is within the scope of your system, is it the case on the system you're interacting with? Do you know every package that should be installed because it comes out of the box in the distro? Do you want to (aka: do you have the time, resourcing and effort to do this for the almost zero gain it will get you in the short term unless you can point to business outcomes which are fulfilled by the activity?)


> As soon as you're trying to do deletions, you're making assumptions that the view of the state you have is complete and total and that is usually not the case

terraform does this, which is why it tracks its own representation of the prior global state. So when you remove a declared resource the diff against the prior state is interpreted as a delete. Note this does introduce the problem of "drift" when you have resources that are not captured in the scope of the state.
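
A toy model of that idea, for illustration only: it is not how Terraform actually stores or diffs state, and `plan` and `apply` here are hypothetical helper names operating on plain sets of resource names.

```python
def plan(prior_state: set[str], declared: set[str]) -> dict[str, set[str]]:
    """Diff the declaration against recorded state, not against the live system."""
    return {"create": declared - prior_state, "delete": prior_state - declared}


def apply(prior_state: set[str], declared: set[str], live: set[str]) -> tuple[set[str], set[str]]:
    """Apply a plan; return (new recorded state, new live system)."""
    changes = plan(prior_state, declared)
    live = (live | changes["create"]) - changes["delete"]
    return set(declared), live


state = {"redis", "nginx"}         # what the tool created earlier
live = {"redis", "nginx", "htop"}  # htop was installed by hand: drift
state, live = apply(state, declared={"nginx"}, live=live)

print(sorted(state))
print(sorted(live))
```

Removing `redis` from the declaration deletes it because it was in the recorded state, while `htop` survives because the tool never knew about it.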

> i.e. do you want to put `linux-kernel` in your package list?

Yes. At least I want to put something like "core-packages" or "default" or similar as part of setting my explicit intent.


Yes, this is debating state management. For full declarativity some form of state management for the parts of the system that should be under declarative control (like terraform) or a stateless but very holistic view of the system (like NixOS, I guess also Guix System) are needed.

Given that ansible has neither it can't be much better than what it is. I disagree that that is the right choice though. As it is I see not much more value in ansible than in some sort of SSH over xargs contraption combined with a list of servers. The guarantees they give are the same.

> Do you know every package that should be installed because it comes out of the box in the distro? Do you want to [...]?

No, I don't want to. Thankfully, with NixOS I don't need to, since the pre-installed packages are automatically part of the declared state of my NixOS systems (i.e. I declare the wanted state in the same way in which the defaults are also declared, which makes it easy to merge both).


> Ansible playbooks are usually a list of steps to execute in order (imperative)

You can't be declarative all the way down because reality is not declarative.

You can have all modules being declarative but if you need orchestration, it's not declarative anymore unless you create a new abstraction on top of it.

So people keep arguing about declarative vs imperative and fail to specify at which abstraction level they want things to be either.


I agree with you, your declarative abstraction has to have an imperative implementation underneath that will do all the dirty work. Ansible presents this declarative interface at the module level (if the module is implemented properly, most aren't), and a playbook is an imperative list of declarations to be applied. Roles also combine a list of imperative steps into a declarative interface.

Since apparently (I try to avoid ansible, so I might be missing something) playbooks are the go-to approach of using ansible this means that most uses of ansible are imperative (in the context of configuring a system), unless you only ever give a system a singular role and then you are probably defining your role in imperative steps.

A system like NixOS on the other hand presents the entirety of a system configuration in a single declarative interface that is applied in one go, while applying such a configuration to a system can be thought of as an imperative step (although it is usually a singular, unconditional step). So it is declarative at a higher abstraction level.


I didn't intend to suggest that the declarative approach is the only way to achieve idempotency.


It’s the cattle not pets mindset. In most organizations the sysadmin team is really undersized. Not uncommon to have one admin per several hundred systems. In such places, there is no time to care for individual servers. If a server is misbehaving we blow it away and spin up a clean replacement.

Declarative scripts make it easy to manage a fleet.


I think that's.. perhaps not orthogonal, but has some orthogonal component - you could certainly have something like:

  for i in range(100):
      ip = cidrhost(subnet, i)
      if exists := get_server(ip):
          continue
      create_server(ip=ip)
and so on. I don't like it, but because it's procedural/imperative, not because it's particularly more 'petty' than the Terraform (or equivalent) would be.
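
For anyone who wants to run the loop above, here is a self-contained sketch. Assumptions: the cloud API is replaced by an in-memory dict, the subnet is an arbitrary example, and `cidrhost` is reimplemented with the standard `ipaddress` module to mimic the Terraform function of the same name.

```python
import ipaddress
from typing import Optional

SUBNET = ipaddress.ip_network("10.0.0.0/24")  # example subnet
servers: dict[str, dict] = {}                 # in-memory stand-in for a cloud API


def cidrhost(subnet: ipaddress.IPv4Network, hostnum: int) -> str:
    """Return the hostnum-th address in the subnet, like Terraform's cidrhost()."""
    return str(subnet[hostnum])


def get_server(ip: str) -> Optional[dict]:
    return servers.get(ip)


def create_server(ip: str) -> None:
    servers[ip] = {"ip": ip}


def ensure_servers(count: int) -> int:
    """Create any of the first `count` servers that do not exist yet."""
    created = 0
    for i in range(count):
        ip = cidrhost(SUBNET, i)
        if get_server(ip):
            continue
        create_server(ip)
        created += 1
    return created


print(ensure_servers(100))  # first run creates everything
print(ensure_servers(100))  # second run finds them all and creates nothing
```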

For me it's more about what I'm doing, conceptually. I want a server to exist, it to have access to this S3 bucket, etc. - the logic of how to interface with the APIs to make that happen, to manage their lifecycle and check current state etc. isn't what I'm thinking about. (In Terraform terms, that belongs in the provider.) When I write the above I'm just thinking I want 100 servers, so:

  resource "cloud_server" "my_servers" {
    count = 100

    ip = cidrhost(subnet, count.index)
    # and so on
  }
comes much more naturally.


This fails completely even at small scale when the script is interrupted before finishing.

The difference between just using some Python vs Terraform is idempotency. TF isn’t going to touch the nodes the script succeeded on; if you have to restart your for-loop script it will, which may not be desirable.

Frankly these days configuration management is a bit dated…

You’re much better off in most cases using a tool like Packer with whatever system you want to bake an image, then use a simple user-data script for customization.

It’s very hard to scale continuous config management to thousands of servers.


Eh, any way you do it could leave it in an unfinished state if interrupted, I'm not too bothered about that. (But it does sound like you think I was speaking in favour of doing it in a procedural python script sort of way? I was not.)

Packer and Terraform do different jobs (they're both by Hashicorp!) - you can bake an immutable image all you like, you still need to get a server, put the image on it, give it that S3 bucket it needs, IAM, etc.


They work together to produce immutable cattle. The alternative is managing a pool of servers where you are doing things like in-place patch upgrades, vs a teardown of the old infra and replacing it with the newly baked servers.


I'm well aware, I just don't see what 'use Packer' has to do with choice of programming paradigm for Terraform or other tool in that role.


Have you been introduced to functional programming? It's excellent and mind-bending at first. Here's an overview: https://github.com/readme/guides/functional-programming-basi...

Declarative structure is at the heart of functional programming. Declarative is not the right choice everywhere, but when it makes sense, it can significantly raise the quality of the code.


and yet declarative can be somewhat unhelpful in the face of mutable filesystem


Conceptually I think it’s much nicer to define the state of the system rather than the steps to get there, and tool of choice figures it out.

But there’s always edge cases and situations where that doesn’t work, which is why pyinfra supports both and they can be combined any way you like.


Because what most people want is actually something closer to "Goal Seeking." If the system works as intended (and as you point out with the need to debug, it often does not!) then defining the desired end-state and letting the system figure out how to get there is a simpler, higher order abstraction. And it can also often be clearer to just say "ensure these prerequisites are met" such that alternative implementations can achieve the same outcome. In practice, abstractions are leaky.


It makes people feel that they're smarter than you. See also functional programming. That said, sometimes it's useful as a way to auto-generate imperative actions.


I really can't see how you could feel that way about it after spending even just a few minutes (which it sounds like you have) to understand what it means beyond just reacting to terminology, something having a name.

I definitely think it'd be easier to explain a python-like declarative language to someone who asks what programming is than actual python. 'It's just describing the way things should be' vs. 'it's like a series of instructions for how to compute ...'

Certainly not more clever IMO, if anything the opposite. Like I said above or elsewhere in this thread, when I'm managing infrastructure with Terraform I don't want to (and don't have to) be thinking about how to interface with the API, check whether things exist already, their current state, how to move from that to what I want, etc. I just know the way I want it to be, I declare that, and the procedure for figuring it out and making it so is the provider's job. That's not smarter! The smart's in the provider! (But ok if you're going to make me flex, I've written and contributed to providers too... But that's Go; not declarative.)


> Obviously python isn't

Obviously you can create declarative idioms in Python


Sure, if you keep reading I described one that it looked like this might be doing.


Oh man this is really cool. I have also written a Python infrastructure-as-code project (https://pages.micahrl.com/progfiguration/), I really like the idea of using a programming language rather than a text document to define infrastructure. Yours looks very polished, and the built in support for testing in Docker is a brilliant idea.


dang this exploded. I came across the project this morning when I was looking at a blog on how to implement a generic programming language to become a configuration language and it mentioned pyinfra. Glad this project is getting some exposure. :)


What’s the difference between pyinfra and fabric? Fabric seems to have overlaps especially for agentless execution.


How does it compare to Fabric? At first glance it looks quite similar. All our scripts are written in Fabric, but Fabric appears to be somewhat abandoned and the latest version never reached full parity with v1. I'd be looking to try something new next time.


How does it compare with pulumi?


Almost exactly as it compares to terraform, since both TF and Pulumi only get down into the shell of any provisioned virtual machine via "connect and run some shell, good luck". I'd guess it would also be horrifically painful to even do that in circumstances such as Auto Scaling Groups, where even TF and Pulumi don't know the actual IP or InstanceIds

The way TF and Pulumi traditionally think about this problem would be to use cloud-init/ignition/Cloudformation Hooks to cause the machine to execute scripts upon itself. Ansible also has an approach to do that via "ansible-pull" which one would use in a circumstance where the machine has no sshd nor SSM agent upon it but you still want some complex configuration management applied post-boot (or, actually even if they do have sshd/ssm but there are literally a hundred of them, since the machines doing the same operation to themselves is going to be much less error prone than trying to connect to each one of them and executing the same operations, regardless of the concurrency of any such CM tool)


[yet another reference to Ansible, sorry! :)]

This looks like infinity times better than Ansible in some cases and somewhat worse in others (python.call every time I'd need to access a previous operation's result feels clunky, though I certainly understand why it works that way).

Do you think it would be possible to use Ansible modules as pyinfra operations? As in, for example:

  - name: install foo
    apt:
      pkg: foo
      state: present
could be available as:

  from pyinfra import ansible

  ansible(name='install foo').apt(pkg='foo', state='present')

where the `ansible` function itself would know nothing about apt, just forward everything to the Ansible module.

Note 1: I know pyinfra has a way to interface with apt, this is just an example :)

Note 2: It's just my curiosity, my sysadmin days are long gone now.


Definitely possible! Not familiar with the ansible Python API so partially guessing but the pyinfra op could yield a callback function that then calls ansible at execution time.

Alternatively you could just yield ansible cli and execute from the local machine using the @local connector.


FWIW, ansible modules (all of them, to the best of my knowledge) operate via a stdin/stdout contract since that's the one universal api for "do this thing over (ssh|docker|ssm|local)". That's also why it supports writing plugins in any language (shell, compiled, python, etc) since `subprocess.Popen().communicate(b'{"do_awesome":true}')` works great
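
A generic sketch of such a stdin/stdout contract, using only the standard library. To be clear about assumptions: this is not Ansible's actual module invocation or wire format, and `CHILD_SOURCE`, `run_module`, and the `changed`/`args` keys are made up for the example.

```python
import json
import subprocess
import sys

# Child program: read one JSON document from stdin, write one to stdout.
# In practice this could be written in any language.
CHILD_SOURCE = """
import json, sys
args = json.load(sys.stdin)
json.dump({"changed": bool(args.get("do_awesome")), "args": args}, sys.stdout)
"""


def run_module(args: dict) -> dict:
    """Send `args` to the child as JSON on stdin and parse its JSON reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    stdout, _ = proc.communicate(json.dumps(args).encode())
    if proc.returncode != 0:
        raise RuntimeError(f"module exited with status {proc.returncode}")
    return json.loads(stdout)


print(run_module({"do_awesome": True}))
```

Because the caller only sees bytes in and bytes out, the same function works whether the child runs locally or behind some transport that forwards stdin and stdout.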

DISCOVERING the available ansible actions is the JFC since, like all good things python, it depends on what's currently on the PYTHONPATH and what makes writing or using any such language-server some onoz

And this wasn't what you asked, but ansible has a dedicated library for exec, since the normal `ansible` and `ansible-playbook` CLIs are really, really oriented toward interactive use: https://github.com/ansible/ansible-runner#readme


What is different in v3? Didn't see it in the "Next" docs.


Mostly this from the 3.x changelog:

> pyinfra now executes operations at runtime, rather than pre-generating commands. Although the change isn't noticeable this fixes an entire class of bugs and confusion. See the limitations section in the v2 docs. All of those issues are now a thing of the past.

https://github.com/pyinfra-dev/pyinfra/blob/3.x/CHANGELOG.md


I currently use Ansible to set up both local and remote hosts. I've been very happy with it, and love that Pyinfra intends to support the Ansible connector.

My main gripe with Ansible is the YAML specification. Ansible chooses to separate the task specification and task execution. Pyinfra chooses to directly expose the Python layer, instead of using slightly ugly magic functions/variables. I like this approach more since it allows standard Pythonic control flow instead of using a new (arguably ugly and more hassle to maintain) grammar.

Excited for Pyinfra!


I'm only using Ansible because of its extensive documentation and mindshare, but my best successes with it were when I let go of the idea that the playbooks specify state "declaratively". I now treat them as imperative steps where each step is being checked as to whether it needs to be done or not, and it has vastly simplified my mental model of what Ansible is actually doing.


I think of ansible as a declarative-imperative lasagna, where each playbook is a desired state, achieved by an imperative sequence of plays, which themselves are desired states, achieved by a sequence of roles, which have the same properties, and then tasks too below that, finally resolving to plain old imperative Python.

It's all pretty messy but useful.


I never grokked this “plays” and “roles” business. All in all, this clever and cute terminology gives me creeps. I only use “playbooks” as series of tasks, more or less.

Maybe I need an explanation “like I’m just a programmer/sysadmin and I need to use boring terms years old” of what is what, every explanation so far (when I bothered to look for it last) was too invested in this theatrical terminology, so I gave up and stuck to what worked after a command or two.

Same with Chef and its galore of cooking words, but thankfully I don’t have to use Chef.


To this day I'm miffed that Chef has "cookbooks" which contain "recipes," which contain... "resources." Why not "ingredients??" It was right there!


> Maybe I need an explanation “like I’m just a programmer/sysadmin and I need to use boring terms years old”

The issue is, Ansible was written for sysadmins who aren't programmers. There is no good explanation, other than it's a historically grown, syntactic and semantic mess that should've been barebones python from the get go.

It is not idempotent. For example, how can I revert a task/play when it fails, so that my infra doesn't end up in an unknown state? How do I deal with inevitable side effects that come from setting up infra?

People will now refer you to Terraform but that is imo a cop out from tool developers that would much rather sell you expensive solutions to get error handling in Ansible (namely RedHat's Ansible Automation platform) than have it be part of a language.

But to give you a proper explanation: Plays define arrays of tasks, tasks being calls to small python code snippets called modules (such as ansible.builtin.file or ansible.builtin.copy). To facilitate reuse, common "flows" (beware, flow is not part of their terminology) of tasks are encapsulated into reusable roles, although reusability depends on the skill of the author of the role.


Ansible is useful but so confusing (to me anyway).

The way I see roles vs playbooks is whether I’m going to reuse it or not.

Roles are more generic playbooks in a sense that I can share with others or across deployments (for example set up a reverse proxy, or install a piece of software with sane, overridable defaults).

I can then use roles within playbooks to tweak the piece of software’s configuration. If it’s a one-off confit/setup then I’ll use a playbook.

I don’t know if it’s the right paradigm (I don’t think it’s explained well and clearly in the docs), but using this rule of thumb has helped me deal with it.

Of course, any role can be a playbook and vice versa since they do the same thing functionally, it’s all about reusability and sharing.

Kinda how you have libraries in software: role = library, playbook = the software you actually want to write.


An Ansible playbook is usually the main entrypoint; it consists of a list of plays. Each play has hosts and a list of tasks to run on them.

Because people wanted to reuse their code, the abstraction of roles was created. A role is something like „setup basic OS stuff“, „create this list of users“, „setup a Webserver“, „setup a database“. The association, which roles to run on which machine still happens in the playbook.


I'm using include_tasks: and import_playbook:, like an animal :)


You can't share a set of tasks on Ansible Galaxy without wrapping it in a role


My biggest problem with Ansible is the YAML, doing anything with loops is horrendous & trying to mangle nested variable types requires a StackOverflow post every time.

A few years ago, I found a library that lets you utilize Ansible's tasks in raw Python, without the huge hassle of using the Ansible Python API. I cannot find it again however. But PyInfra looks great.


This alone is the entire reason I started working on pyinfra, loops in YAML are just evil.


Why did you choose to roll your own modules rather than do what's described in the comment you replied to, i.e. provide a Python layer for interacting with the rich set of available Ansible modules?

Not trying to be rude ofc, I'm sure you considered it and have a good reason – just curious as to what it is. An incredible project you put there, nonetheless:)


Not rude at all :) When I first started (not sure if this is still the case?) Ansible would push Python code to the target machine and execute there, meaning it wasn’t actually agentless. I always thought of pyinfra as copying what a human would do if configuring a server by hand over SSH, so new modules that use only shell commands were needed.


I recall the Ansible Python API being labeled as Internal Use Only and subject to change on a whim because of that. That at least discouraged using Ansible in that way.

Seems they still kinda discourage it but do have examples at least.

https://docs.ansible.com/ansible/latest/dev_guide/developing...


It could be interesting if you could write a translator to use any Ansible module with this, and vice versa.


But you can just write a small module in Python, have it do the looping logic for you, install it at the root of your project's configuration-as-code repository, and then use the module in the YAML, removing the need to do complex, ugly loops in YAML.

Is there a reason this isn't an option for you?


Real Python instead of templating (Jinja in YAML) would be nice.

In Ansible, it's fairly arduous to try to reshape data from command outputs into structures that can be used in loops in other tasks--especially if you want to merge output from multiple commands. Main usecase is more dynamic playbooks where you combine state from multiple systems to create a new piece of infrastructure.
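
To make that concrete, here is a small sketch of the kind of reshaping I mean, in plain Python. The two sample outputs are made up and stand in for two different commands; `parse_table` and `merge_by_key` are hypothetical helper names, not part of any tool.

```python
def parse_table(output: str, columns: list[str]) -> dict[str, dict[str, str]]:
    """Parse whitespace-separated rows into {first_field: {column: value}}."""
    rows: dict[str, dict[str, str]] = {}
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) != len(columns):
            continue  # skip lines that do not match the expected shape
        rows[fields[0]] = dict(zip(columns, fields))
    return rows


def merge_by_key(*tables: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Combine several parsed tables into one record per key."""
    merged: dict[str, dict[str, str]] = {}
    for table in tables:
        for key, record in table.items():
            merged.setdefault(key, {}).update(record)
    return merged


# Made-up sample output from two separate commands.
size_output = """
/dev/sda1 40G
/dev/sdb1 500G
"""
mount_output = """
/dev/sda1 /
/dev/sdb1 /data
"""

disks = merge_by_key(
    parse_table(size_output, ["device", "size"]),
    parse_table(mount_output, ["device", "mountpoint"]),
)

# The merged structure can now drive an ordinary loop.
for device, info in disks.items():
    print(device, info["size"], info["mountpoint"])
```

Doing the same merge with Jinja filters inside YAML is possible, but this is the version I can still read a month later.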

I think templating yaml or templates inside yaml is a bit of an anti pattern.


Related:

Pyinfra automates infrastructure super fast at scale - https://news.ycombinator.com/item?id=33286972 - Oct 2022 (37 comments)

Show HN: pyinfra v2 - https://news.ycombinator.com/item?id=30999030 - April 2022 (2 comments)

Pyinfra v2.0 Released - https://news.ycombinator.com/item?id=30973976 - April 2022 (3 comments)

Show HN: Pyinfra v1.4 - https://news.ycombinator.com/item?id=26983266 - April 2021 (3 comments)

Pyinfra – automate infrastructure super fast at scale - https://news.ycombinator.com/item?id=23487178 - June 2020 (64 comments)

Pyinfra v0.3 - https://news.ycombinator.com/item?id=13862942 - March 2017 (1 comment)

Pyinfra v0.2 - https://news.ycombinator.com/item?id=12956784 - Nov 2016 (2 comments)


I just started using Pyinfra to wrangle a bunch of servers and it is a breath of fresh air compared to Ansible. I moved all of my server OS installs to Fedora CoreOS which doesn't ship with Python in the OS and since Pyinfra doesn't need Python on the host node I can kick off tasks in bulk to do server things. It is great. I cannot wait to see where the Pyinfra project goes.

On a side note, one of the most hacky things I came up with to get Ansible working on Fedora CoreOS was to bind mount a container rootfs that had python 3 and then symlink it into the right spots. You can of course add Python in with rpm-ostree if you want but I wanted to avoid layering packages at the time. I wasn't proud of it. But it worked.

https://github.com/forem/selfhost/blob/main/playbooks/templa...


Doesn’t IBM/Red Hat own Ansible and Fedora CoreOS? I would think they would mix together perfectly.


> since Pyinfra doesn't need Python on the host node I can kick off tasks in bulk to do server things.

And you can do this with Ansible, too. Check out the raw module/command.


I am aware of the raw module. The stuff I was doing with Ansible and Fedora CoreOS required more than just that module.


Couldn't you use the raw module to get Python into place and then use the rest of Ansible's feature set after that?


I think Puppet hits the sweet spot in this area. Its default is a series of idempotent "here's how this should be configured" statements, but it can be used as a full programming language in its more advanced capacity, and it's reasonably extensible (in Puppet-lang and Ruby) to support specific custom applications.

I also think that the facts/manifest/apply separation is conducive to nicely testable infra code, and useful dry-run output.

I'm always surprised that Puppet isn't still more popular. My theory is that it's passed over because of its age/cruftiness/bad vibes in some cases, and that a couple of technical flaws mess it up for some key userbases:

For folks who just want a quick-to-start management tool for a small set of config, Puppet's ugly and clunky client/server model and the hyper-YAML-ification of its best practices (which is pursued to a fault by the community, and not helped by the Hiera pitch that the Puppet stack can also be sort of an asset tracking/catalog system) make small-scale usage and prototyping hard. Puppet doesn't have to be used that way (it can be used just like pyinfra/Ansible with a local-apply or via Bolt, hitting a nice sweet spot between ad-hoc/non-idempotent commands and nice declarative/idempotent Puppet code), but I think the puppetmaster/hiera-all-the-things legacy in the community does Puppet and potential new users a disservice.

From the other side, I think a lot of more cloud-oriented users looking for a "better Terraform for server state" end up annoyed by the quality of modules on the Puppet forge and Puppet's lack of a statefile equivalent (meaning that it doesn't support deletes or infrastructure state snapshots in the same way TF does).


Adding config management agents to run on your infra is IMO unnecessary operational burden. (ie puppet, chef, saltstack, etc.) In the day and age of everything running on Docker, the closer you are to a bare OS image, the better.

Config management that uses SSH is generally good enough.


I agree; that's the "client-server legacy" that I mentioned in GP.

It's unfortunately not widely known that Puppet can be run just like you describe, over SSH (or, for e.g. running in a Docker container, can be invoked as a one-shot "puppet apply" against a local configuration file like pyinfra's "local" transport): https://www.puppet.com/docs/bolt/latest/bolt.html. Doing that requires no background daemons, puppetmasters, cert-signing hell, inventory management PuppetDB/Foreman stacks, or any of that stuff: you run a command which SSHes to a remote/local machine and applies changes based on instructions written in Puppet-lang or one-off scripts. The remote end is entirely self-hosting; it doesn't rely on anything being running on the remote host (Bolt will install the "puppet-agent" package to bootstrap itself, but in this context that package is inert and is used equivalently to a library when you run tasks).

I'm with you that the agent-based approach is far from the best way to go these days. I'm just bummed that we're throwing the baby out with the bathwater: I wish Puppet-the-language and Puppet-the-server-management-tool weren't so often dismissed along with the Puppet-as-inventory-system or Puppet-as-daemonized-continuous-compliance-engine.


Hard disagree. Having an agent running on things is IMO far superior for preventing config drift (agents checking in versus one big centralized cron job pushing state to everything). And to be honest, the fact that it doesn't play as well with Docker is a flaw with the idea of putting everything in Docker, not having a config agent. Some things work well in containers, but it's silly to try to shoehorn everything into them the way many people do.


Can concur, used puppet a bit at the dayjob and agent issues were common at some point.

Also, for bigger inventories on a single VM, runtimes quickly shot up into the hour range.


Yeah, dealing with agent issues sucked; I'm glad I haven't worked on one of those setups in awhile. And if the agent bootstrapped some part of the shell-in-and-remotely-troubleshoot tooling, good luck debugging it, and if the agent bootstrapped the telemetry system, good luck telling the difference between "host with agent failure" and "host that disappeared"... anyway. Fun times.

For hour+ runtimes I really do think that's pretty much always user error. I know that's a clichéd and grouchy comment, but (as, I'll admit, a Puppet fan with some personal defensiveness for a favored tool) I do think it's true in this case.


Ssh and its child processes are just another agent. Agents of a model that must be up at time-of-convergence as seen from the coordinator node; a remarkably inflexible arrangement that can only be addressed with additional development not otherwise necessary.

Ruby is far, far preferable to shell for ease of idempotence and implicit convergence.


I believe it's due to Ruby being its language of choice. Ruby is mostly a dead language in the Ops space, unlike Python.

Having inherited a big mess on Puppet of some people who used the flexibility of Ruby to automate 5 datacenters, but then left the company was also an interesting experience..


Another reason for puppet being less popular is lots of places ended up with very complicated configuration that did everything on the server but was hard to work with.

Ansible you could deploy a small playbook that did just one thing. A lot easier to get started with and keep under control.

As others have mentioned, Puppet was also a lot less useful when server images can be pre-configured and are often short-lived. It was designed more to take a bare OS install and turn it into a long-lived server.


I was an early adopter of Puppet back when it was fairly new. It was a breath of fresh air when the state of the art was cfengine!

Despite its many great ideas, I never particularly liked the agent or need for a master server. And I've always managed to avoid learning Ruby so I couldn't easily hack on it myself. The company I'm with now uses it extensively so I'm having to re-learn it and so far my impression is that it went from "cool new open source thing" to "your average enterprise-grade bloatware thing".


Usually when I use these types of tools I'm building immutable infrastructure where a golden image gets built and an existing app data volume gets attached to a new OS image (same workflow as Docker containers but more access to kernel/hardware)

Puppet doesn't work well for that. I've seen it come up in auditing scenarios since the agent can effectively report if the instance is still in the correct config state.


This will tie nicely into my favorite way to deploy services these days:

1. Use PyInfra to set up Docker and Tailscale on remote hosts and any other setup. Open the Docker port to your Tailnet.

2. Use the Docker provider for Terraform to set up and manage containers on those hosts from your dev machine or from a CI/CD tool. Tailscale allows containers on different machines to communicate privately, or you can open a port to the web.

This makes for such an easy-to-use and bulletproof setup. In the past I would have used Kubernetes but I've come to realize that's overkill for anything I do and way harder to debug.


This kind of setup is a nice improvement over golden images with a lot of the benefits. Application setup, upgrades, and rollback become much easier when the whole app is packaged together and has its own copy of dependencies.

You can also throw in systemd units for Docker or Podman. I usually create a small shell script that pulls, removes any old container, then runs a new container with correct args in the foreground and toss that in a simple systemd unit
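
A rough sketch of that idea, written as a Python function that renders the unit file so it can be dropped into a deploy script. This variant folds the pull/remove/run steps into unit directives instead of a separate shell script. The `/usr/bin/docker` path, the `render_container_unit` name and the example image are my assumptions, and `shlex` quoting only approximates systemd's own quoting rules, so treat it as a starting point.

```python
import shlex


def render_container_unit(name: str, image: str, run_args: list[str]) -> str:
    """Render a systemd unit that pulls, cleans up, then runs a container in the foreground."""
    docker = "/usr/bin/docker"  # assumed path; adjust for your distro or for podman
    run_cmd = shlex.join([docker, "run", "--rm", "--name", name, *run_args, image])
    lines = [
        "[Unit]",
        f"Description={name} container",
        "After=docker.service",
        "Requires=docker.service",
        "",
        "[Service]",
        f"ExecStartPre={docker} pull {shlex.quote(image)}",
        # A leading '-' tells systemd to ignore failure (e.g. no old container exists).
        f"ExecStartPre=-{docker} rm -f {shlex.quote(name)}",
        f"ExecStart={run_cmd}",
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


print(render_container_unit("web", "nginx:stable", ["-p", "8080:80"]))
```

Because the container runs in the foreground, systemd sees its exit status and logs directly, which is most of the appeal of this setup.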


is there a blog post or github repo with more info on how you do this?


No but I'll think about writing one up!


Why not go for headscale?


Agree with those saying the landing page needs work. But terraform/docker integration sounds interesting.. after many years of ansible you’d think there is a more comfortable way to replace a hundred lines of hacky bash in dockerfiles.

Also, can I just say that cm is extremely frustrating? Not sure this is the fix, but hopefully the story isn’t over. In my experience the maintenance of cm codebases never, ever stops. At first I thought it was a matter of expertise, but experts typically agree and just call it the cost of doing business.

Shelve something for three months and it will break on the next run, on the same os/host where it used to work. Blame the package manager, blame the os choice, or the cm tool. But it’s embarrassing and insulting for Devops teams after putting in the effort to do things right, and evangelizing to everyone else about repeatability. I’d rather just see tighter integrations with containers moving forward and never think about it again. Not everyone is using k8s but in the 2020s everyone probably should default to using docker before doing things of even marginal complexity directly on hosts.


> Agree with those saying the landing page needs work.

Any & all feedback much appreciated! It's basically just a very rough copy of the README at the moment.


Thanks for making Pyinfra.

It's one of those tools that get out of your way and let you get the work done. The tool works for you instead of you fighting with the tool.

Pain points of Ansible, such as storing state and checking it later or coordinating state between servers, are all a breeze with Pyinfra because you write the Python code to perform those checks.

The system is very well thought out. No need to hack around a host file; the inventory is just a Python script that exports resource definitions.

No more static, ad-hoc host vars; you get a real Python script to define and return your variables.

Using pyinfra I was able to focus more on the "compute". State such as credentials and inventory can be managed and stored outside, for example in SSM, or you can just call the Python EC2 API to filter instances by tag.
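
Roughly what such an inventory script can look like. This is a sketch under assumptions: the `INSTANCES` list is made-up data standing in for whatever an EC2 API call would return, `hosts_with_tag` is a hypothetical helper, and I am relying on the inventory convention of module-level lists of hosts or `(host, data)` tuples, so check the docs for your pyinfra version.

```python
from typing import Any

# Made-up data standing in for the result of a cloud API call.
INSTANCES: list[dict[str, Any]] = [
    {"address": "10.0.0.11", "tags": {"role": "web"}},
    {"address": "10.0.0.12", "tags": {"role": "web"}, "ssh_user": "deploy"},
    {"address": "10.0.0.21", "tags": {"role": "db"}},
]


def hosts_with_tag(
    instances: list[dict[str, Any]], key: str, value: str
) -> list[tuple[str, dict[str, str]]]:
    """Return (address, host_data) tuples for instances carrying the given tag."""
    return [
        (inst["address"], {"ssh_user": inst.get("ssh_user", "ubuntu")})
        for inst in instances
        if inst.get("tags", {}).get(key) == value
    ]


# Each module-level list becomes an inventory group.
web_servers = hosts_with_tag(INSTANCES, "role", "web")
db_servers = hosts_with_tag(INSTANCES, "role", "db")
```

The point is less the filtering itself than the fact that it is ordinary Python: you can swap the hard-coded list for a real API call without learning a plugin system.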


Ansible needs a working Python interpreter on the target machine.

Pyinfra doesn't even need that. Just needs a shell.

Subjective opinion but it is heavily under recognized piece of software.

Ansible is really great but you soon end up writing Python in Yaml strings.

So why not straight up Python?


As an FYI, "needs" is not correct, it has `raw:` for doing anything the target interpreter understands (sh, bash, powershell, etc), which can then include actually provisioning beefier interpreters (full blown python, pypy, whatever)

Ansible plugins can be written in any language, shell, compiled binaries, whatever, and communicate with the control plane via stdin/stdout

I suspect you are thinking of Jinja2 when you are writing python in yaml strings, which ... kind of, I guess, but also confusingly not Python, or at least the hacked up copy of Jinja2 that ansible uses can't do all the fun things normal Jinja2 can


Should this be considered some kind of alternative to tools like Ansible?

Also CDKTF should be in the space for imperative infrastructure as code definitions.

- https://developer.hashicorp.com/terraform/cdktf


cdktf is fantastic


What languages are you using it with? Last time I tried it with Python, the code was super verbose and type hinting suggestions did not work in either VS Code or PyCharm… could that be because it’s transpiled from TypeScript?


This is great. We tried ansible and gave up as it was difficult to keep configuration DRY and annoying to create conditions with no control structure.

It was before ansible 2, so probably things are better now.

Then we started using Python fabric. Wow it was so freeing. Any helper methods were easily extracted and writing conditions felt natural.

Now I am using Python invoke to maintain my local setup.


I gave up being religiously DRY in Ansible playbooks early on. It's much easier to open a file and read through a list of simple 2- or 3-line tasks that execute sequentially, than it is to chase down a bunch of imports.

Same as in programming, over-adherence to DRY leads to spaghetti code.


I have tried this path of not trying to DRY everything, but have regretted it and eventually refactored to a more DRY approach. The cost of remembering to fix/alter the logic everywhere is more than the cost of keeping it DRY. More often than not a method or a module is enough, nothing fancy.

The only place where I have accepted that DRY is not worth it is unit tests. I used to extract any common behavior into a shared test, but each object eventually evolves in its own way, so the effort to make it DRY is wasted.


Ansible is strong when done right. Check out the tutorial series by Jeff Geerling on YouTube, he's amazing.


Maybe, but moving to Python did not take anything away. It brought more joy, in that you have more control over things like which server to run the migration on, choosing UAT or prod, or just passing a list of servers on the command line.

And organizing the modules was straightforward as we already knew/did that in the project.

Perhaps it comes from my programming background, but it's true.


I worry about using python for this kind of thing.

It's very hard to be confident about python code.

If you have a good code review feedback loop and so on then it can be OK but proper types enable lots of good things when dealing with configuration and state.


I mean Python has your back with static type hints. While Python's type system isn't the most powerful in terms of expressiveness (TypeScript is stronger, Go is weaker), it's more than capable enough for a config management system.


Emphasis on hints.

And my point is that it can be way too capable.


I guess the fact that they're hints doesn't really bother me when you're doing static analysis. You can have strong typing with a weak type system like C and Go where the types will be rigidly enforced but they're also not expressive. There end up being lots of things you can't express in the type system which leads you to do things like void* or `any` with manual casting.

But a fully type-hinted Python codebase is extremely expressive, the times where you have to opt-out of the type system is much much rarer and the types you end up writing are much more specific so you get stronger guarantees. It's not without downsides but I don't think it's "because they're hints you can't trust them" since lots of languages erase their types on compilation.


I am not elbow deep into Python ecosystem, but how many python code bases are fully type-hinted?

Maybe I am overlooking because I am not a pythonista, but when looking at this code [1] I see only some superficial hints. Looking at `_make_command`, I need to look inside the body to see that the first argument is expected (?) to be callable (it just ignores otherwise).

____

1. https://github.com/pyinfra-dev/pyinfra/blob/3.x/pyinfra/api/...


It's definitely still in the minority, but you're seeing a _lot_ more newer projects adopting consistent, ubiquitous type hints.

To your point though, the `_make_command` method here is not setting hints in its arguments. I'm not super familiar if this is considered "fine" in a pydantic world, as I found for my usage, native python type hints were more than fine to make my code more usable and safer. Based on the code though, it seems like there are cases where the `command_attribute` is not a callable. What I don't understand is why this isn't hinted as a Union of Callable and whatever other types it could receive. I'd have to spend more than 3 minutes looking at the code base to understand how it's used to get a stronger idea here.
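
For what it's worth, this is the shape of hint I had in mind. To be clear, it is a hypothetical stand-in and not pyinfra's actual `_make_command`: the `CommandAttribute` alias and `resolve_command` function are names I made up, and I am guessing that the non-callable case is a plain string.

```python
from typing import Callable, Union

# Either a ready-made command string or a function that builds one.
CommandAttribute = Union[str, Callable[..., str]]


def resolve_command(command_attribute: CommandAttribute, *args: object) -> str:
    """Return the command string, calling the attribute first if it is callable."""
    if callable(command_attribute):
        return command_attribute(*args)
    return command_attribute


print(resolve_command("uname -a"))
print(resolve_command(lambda path: f"ls {path}", "/etc"))
```

With the union spelled out in the signature, a reader (and mypy or pyright) can see both accepted shapes without opening the function body.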


Was there any thought to perhaps do a version with an agent? I really like how fast Saltstack can be as compared to Ansible.

I've been using my own homegrown project that does just this - Python roles, server/client, Mako templates: https://github.com/mattbillenstein/salty

It's very very fast to do deploys on long-lived infrastructure, but it hasn't been optimized for large clusters yet; I expect the server process will be a bottleneck with many clients, but still probably faster than Ansible for most setups.


pyinfra supports executing on the local machine (@local connector: https://docs.pyinfra.com/en/2.x/connectors/local.html). If you store the operation files on the machine that’s basically an agent when executed just without a periodic check for other changes. Adding a mode to do that in a loop would be pretty trivial..
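
Something like this would be the poor man's version of that loop today, wrapped around the CLI. It is a sketch, not a pyinfra feature: `agent_loop` and `run_deploy` are hypothetical names, `deploy.py` is a placeholder, and depending on the pyinfra version you may need to add a non-interactive flag to the command.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_deploy(command: Sequence[str]) -> int:
    """Run one deploy and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def agent_loop(
    command: Sequence[str],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    runner: Callable[[Sequence[str]], int] = run_deploy,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` repeatedly; return the number of failed runs.

    With max_runs=None the loop never returns, which is the agent-like mode.
    """
    failures = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        if runner(command) != 0:
            failures += 1
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return failures


# Example (not executed here): re-apply local state every five minutes.
# agent_loop(["pyinfra", "@local", "deploy.py"], interval_seconds=300)
```

The `runner` and `sleep` parameters exist only so the loop can be exercised without actually shelling out or waiting.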


Yeah, I'm talking more about RPC - the server sends a command to the agent - the agent does a thing and returns a response. There's no external sync of the command and given a long-lived connection - client/server what you will - this can all be completed in milliseconds with no new-connection overhead.


Very cool. One question: Can Pyinfra create container-like objects and objects inside them? Example: create an RDS database, create a user inside that database, and assign the user a role.

Terraform cannot deploy such a configuration in a single config, since its planning stage requires that all containers already exist. Terraform crashes when planning the user and role changes, saying that the database doesn't exist. This is a large pain-point when using Terraform. How does Pyinfra handle such deployments?


Python seems like a really poor choice for infrastructure.

- Python is not easy to build into portable binaries

- The package ecosystem is very hard to use in a reproducible way

- The language is not truly typed - types add massive value for infrastructure and scripts because they are less likely to be unit-tested

- The lack of a "let" or "var" keyword makes simple programming errors more likely (again, this code is less likely to be unit-tested)

Maybe I'm missing something? I don't know why I would want to introduce Python in this domain.


Why is it important to be able to build into portable binaries? Pyinfra doesn't require running Python on the machines you manage. Pyinfra basically turns your Python code into shell commands which it runs over SSH. So only your development machine has to run Python.
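
To illustrate the idea (and only the idea; this is not how pyinfra is implemented internally), here is a toy "operation" that compiles down to a single idempotent shell command, plus the argv you would hand to `ssh`. The function names `ensure_line_in_file` and `over_ssh` are hypothetical.

```python
import shlex


def ensure_line_in_file(path: str, line: str) -> str:
    """Build a shell command that appends `line` to `path` only if it is missing."""
    q_path = shlex.quote(path)
    q_line = shlex.quote(line)
    # grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    return f"grep -qxF -- {q_line} {q_path} || echo {q_line} >> {q_path}"


def over_ssh(host: str, command: str) -> list[str]:
    """Build the local argv that would run `command` on `host` via ssh."""
    return ["ssh", host, command]


command = ensure_line_in_file("/etc/hosts", "10.0.0.5 db")
print(command)
print(over_ssh("user@host", command))
```

Nothing on the remote side needs Python here; a POSIX shell with `grep` and `echo` is enough, which is the whole argument about not needing portable binaries on the targets.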

I think there is not a lot of overlap between people who need to automate infrastructure and people who don't know how to install Python on their development machine.

As for your other comments regarding Python as a language: I mostly agree. I have stepped away from Python as a language to develop production software. In Python I miss the confidence I get from static typing. Having said that, for automating infrastructure, you're effectively comparing Pyinfra and Python to bash scripts and YAML (for things like Ansible), which are both orders of magnitude worse if you like static typing or any form of being able to verify what you wrote.


N=1 I'm capable of handling python on my dev/deploy box(es), but that doesn't mean it's not a pain. In my perfect world, ansible/puppet/chef/whatever would ship as a single static binary even when they mostly ran against remote SSH targets.


Right, but this makes me wonder why I can’t just do: ssh user@host “echo ‘Hello World’”

All these kinds of tools essentially just executing commands over SSH… I could just SSH.


> So only your development machine has to run Python.

If you have a team of developers and a CI process, then portability is important. There isn’t one development machine.


Extremely aware of this (see pyinstaller attempt: https://github.com/pyinfra-dev/pyinfra/pull/768)

I chose Python because it’s what I was writing all day back in 2015. Which makes me realise pyinfra is almost 10!

Edit: I mostly write Go or YAML (k8s) these days but Python still makes an appearance from time to time (outside of pyinfra dev).


Python is an excellent choice.

> - Python is not easy to build into portable binaries

> - The package ecosystem is very hard to use in a reproducible way

People have been using OS packages for four decades.

> - The language is not truly typed

The language IS strictly typed.

> - types add massive value for infrastructure and scripts because they are less likely to be unit-tested

99% of errors in deployment are not solved by typing.

> - The lack of a "let" or "var" keyword makes simple programming errors more likely (again, this code is less likely to be unit-tested)

If your logic is so complex that let/var makes a difference, you should not be touching infra.


This might have been true a few years ago, but these are all solved problems in 2024.

> Python is not easy to build into portable binaries

https://pex.readthedocs.io/en/v2.1.40/buildingpex.html

> The package ecosystem is very hard to use in a reproducible way

pip, virtualenv, and requirements.in/txt are extremely reproducible. I will offer that it's not exactly idiot-proof yet and there are tons of stale tutorials out there

> The language is not truly typed - types add massive value for infrastructure and scripts because they are less likely to be unit-tested

Yes it is, if you want it to be. There's nothing stopping someone from using mypy, pyright, or other type tool on the strictest setting, and not passing builds unless you have 100% type coverage.

> The lack of a "let" or "var" keyword makes simple programming errors more likely (again, this code is less likely to be unit-tested)

No, but you get ~95% of the safety guarantees by using immutable-esque objects like @dataclass(frozen=True), pydantic models with the same, or attrs/cattrs with similar setting.
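
A quick illustration of the frozen-dataclass approach using only the standard library. `ServerConfig` and its fields are made-up names for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ServerConfig:
    hostname: str
    port: int = 22


config = ServerConfig("web1.example.com")

# Reassigning a field, or assigning to a misspelled one, raises instead of
# silently creating new state.
errors = []
for attribute in ("port", "prot"):
    try:
        setattr(config, attribute, 2222)
    except FrozenInstanceError as exc:
        errors.append(type(exc).__name__)

# Changes have to be explicit: build a new object from the old one.
updated = replace(config, port=2222)
print(config, updated, errors)
```

It does not cover bare local variables the way `let` would, but most of the state worth protecting in a deploy script lives on objects like this.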


Sounds like you've misjudged the use case. Tools like this do the deployment, they aren't generally deployed themselves.

So a portable binary is not a requirement. Other points like let or types are not an impediment either, there are many quality tools available if you need them (ruff, pyflakes, mypy), and python has been doing this kind of work productively for thirty years now.


> Tools like this do the deployment, they aren't generally deployed themselves.

It will have to be executed on many different developer machines (or even your own machine several years in the future) so a simple, reproducible build process, including fetching pip dependencies, is critical.


Presumably the tool keeps backward compatibility, as did ansible or salt, so this doesn’t seem to be a real-world concern. Very few folks are doing nix-level stuff, yet the world marches on.

There will likely be a security fix in it or a dep at a later point, so you wouldn’t want to use the exact same version anyway.


A decent Python development tool chain handles most of that. Docker, pylint, black, type hints integrated with IDE/editor

Admittedly some languages like Go do a better job integrating all this into the core of the language. However, Go doesn't tend to have as powerful of a stdlib so it tends to be a lot more verbose to achieve the same thing.


Docker should not be required for day-to-day development.


Why?


It’s hard to install on some setups. It’s much slower than native. It’s admitting defeat.


Maybe because Python is already in use by pretty much every company that makes money in this (and others) domain ? Some of what you mention looks like pebkac problems as well.


Well so is pretty much any configuration language under the sun, and all the other options that aren't python.


I 100% agree with your points.

No type checking = no serious job. I have learned enough from Ansible to not ever touch that kind of stuff again.

There was a time when Python was a fringe language, only known by some hardcore nerds. I thought Joel Spolsky had once mentioned that having Python on your resume was a signal of a quality developer, someone who went off the beaten path.

Times have changed. Python is now the MS Excel for developers. It shines for quick and dirty data mangling. Unfortunately, that is how a sizable portion of people approach software engineering. My theory is that for some, having to do abstract thinking and perform a dry analysis beforehand is an impediment. They can only discover what they want while banging out something. They fix the runtime errors they could catch, and slap some more features on top.

Types imply a kind of foresight, and that is what some people really have difficulty with.

EDIT: Might sound negative, so I admit that the quick feedback cycle you can get from an interpreter language like php/python is a feature in itself.


I agree, Python is a pretty bad choice for all the reasons you mentioned.

That said I think there are precious few good alternatives. I've been using Deno a fair bit for "scripting" and it works pretty well, but I wish there were more options.

Also I have to say if you are using a tool like this to manage thousands of machines you're absolutely doing it wrong. I don't even work in ops/infra but even I know that manually running commands on multiple machines via SSH is asking for trouble.


I think Go would be a logical choice if you're being completely language agnostic, but most teams aren't. If teams are working exclusively in Python already for web or data projects, there's a benefit to not introducing a new language just for architecture deployment if that's a small part of a team's function.


What would you have used? All of your issues aside, Python is very approachable to people who are used to managing infrastructure but may not have a strong programming background.


Shiv is a decent solution for making a portable package. Single file that only depends on a recent system Python being installed.


It’s better than Yaml or HCL though


Python is a nightmare when used for tooling. I’ve wasted so much time wrangling Python tooling for embedded development. Go would be a much better choice.


Does it allow me to run a script against an EC2 instance, say, and it spins it up and take care of everything? Something like packer would but without creating an AMI


You ever try cloud-init?

You can specify your config in user-data when launching pretty generic AMIs. https://cloudinit.readthedocs.io/en/latest/index.html


I don't want to launch instances (and run a script to set it up etc), I want to run my script THROUGH an instance.


I made this a while back that utilizes shell script and AWS cli to spin up and cloud init to run things https://github.com/nijave/cloud-init-golden-image

For this type of use case AWS has managed services like Batch, ECS, or even auto scaling groups that can make this easier depending on what you're trying to achieve.

ECS with Fargate executors is fairly easy to run arbitrary things inside a VPC


You'd need to create the EC2 instance outside of pyinfra (ie in Terraform). This could be done as part of the inventory itself, but wouldn't self-delete afterwards. If using Terraform there's a connector that allows you to plug Terraform output as a pyinfra inventory: https://docs.pyinfra.com/en/2.x/connectors/terraform.html
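To illustrate the general idea of feeding Terraform output into an inventory, here is a minimal sketch. It is not the pyinfra connector itself: the helper `hosts_from_terraform_output` and the output name `server_ips` are hypothetical, and the sample string only mimics the shape printed by `terraform output -json`.

```python
import json


def hosts_from_terraform_output(raw_json, output_name):
    """Turn one Terraform output (JSON form) into a plain list of host addresses."""
    outputs = json.loads(raw_json)
    # Each output is an object; the actual data sits under its "value" key.
    value = outputs[output_name]["value"]
    if isinstance(value, str):
        return [value]  # a single address was exported
    return list(value)  # a list of addresses was exported


# Hypothetical sample; in practice this would be the stdout of `terraform output -json`.
sample = (
    '{"server_ips": {"sensitive": false, "type": ["list", "string"], '
    '"value": ["10.0.0.5", "10.0.0.6"]}}'
)

web_servers = hosts_from_terraform_output(sample, "server_ips")
```

The resulting list could then be used wherever a list of hosts is expected.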


fyi, in Packer, there is an option to not create the final image


Yes! Which is what I am doing now, but it felt a bit like using a tool that wasn't meant for that job.


IMHO Ruby is better for creating DSLs, so I wrote a small thing to scratch my own itch: https://github.com/marius/koch

This is not meant to scale to more than a handful of machines, but you get the idea how nice straight Ruby is for a machine specification DSL.


https://github.com/marius/koch/blob/main/example/Rezeptfile suffers from the same problem I have with every single ruby ever: what are the available verbs I can type?

Contrast that with https://docs.pyinfra.com/en/next/examples/client_side_assets... where any sane setup will show completions after both the `from` and the `local.` typing


Thank you for the feedback!

I agree that completion would be nice to have, and probably relatively hard to implement for koch.

However, I prefer the cleanliness, dare I say beauty, of the config file and Ruby.


Anecdata: maybe yes, but when I was using puppet it would take 45 minutes just to load


Seems like an interesting generalized mix of something like https://github.com/cloudtools/troposphere and Ansible from a glance.

The value add would be unifying provisioning and configuration management in a Python-y experience? The lifecycle of each is distinct and that's traditionally where the headaches of using a single tool for both have come in


Is this something that would be a good fit to automate node reboots/restarts of complex clustered systems? Think Kafka, Elasticsearch or Flink, where you can’t restart the next node without revalidating the state of the cluster and the rejoining of the previous node. Please feel free to suggest other tools for this purpose.
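The pattern being asked about (restart one node, wait for it to rejoin, only then move on) can be sketched in plain Python. This is a tool-agnostic sketch, not a pyinfra feature: `rolling_restart`, `FakeCluster`, and the node names are hypothetical, and a real health check would query the cluster's own status API.

```python
import time


def rolling_restart(nodes, restart, is_healthy, timeout=30.0, poll=0.5):
    """Restart nodes one at a time, waiting for each to report healthy."""
    restarted = []
    for node in nodes:
        restart(node)
        deadline = time.monotonic() + timeout
        # Do not touch the next node until this one has rejoined.
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{node} did not rejoin; done so far: {restarted}")
            time.sleep(poll)
        restarted.append(node)
    return restarted


class FakeCluster:
    """In-memory stand-in for a real cluster, for demonstration only."""

    def __init__(self):
        self.checks_left = {}
        self.log = []

    def restart(self, node):
        self.log.append(f"restart {node}")
        self.checks_left[node] = 2  # report unhealthy for the next two checks

    def is_healthy(self, node):
        if self.checks_left[node] > 0:
            self.checks_left[node] -= 1
            return False
        self.log.append(f"healthy {node}")
        return True


cluster = FakeCluster()
done = rolling_restart(["kafka-1", "kafka-2"], cluster.restart, cluster.is_healthy, poll=0.0)
```

The log shows that the second node is only restarted after the first one has been reported healthy again.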


I use pyinfra through molecule for testing Ansible roles, it’s made it possible to have a process resembling TDD and have automated tests for my roles and playbooks. I actually don’t know how else to do it than with molecule and pyinfra, being able to have automated tests on ansible “code” made a big difference for me!


Yeah, I like this approach.

There's something about YAML that just sucks the joy out of programming. It seems like a giant step backwards when we have plenty of amazing programming languages in existence.

Even when infrastructure YAML like CloudFormation is wrapped by some SDK, it can still be a pain because you end up with stuff like...

    do_something("___((!-prickly_config_string_::might as well use yaml _blah-blah:blah))")

Back in the days of Java and XML, there used to be a distant promise of "binding" the XML to code (remember JAXB?) so that you could then just manipulate it fluently as code and then "marshall" it out back to XML when you were done. Those days and that promise are gone, right?


https://github.com/aws/aws-cdk#at-a-glance is the "generate cloudformation using code," and is the AWS version of troposphere as best I can tell


CDK looks like it definitely does do that!


this feels like Michael DeHaan's OpsMop project that existed for like a week before he pulled all the code offline.

https://news.ycombinator.com/item?id=18717422

Interesting to see all the Ansible comments here. I'll have to check this out asap.


I would like to point to a virtual machine or a set of virtual machines that I have configured and make the tool reproduce / translate the state of these "model machines" to some hosting environment.

Can this or any other tool do that?


aws ec2 create-image :-D

In all seriousness, I would guess this requirement has a hidden 80/20 in it because it is very unlikely that one wishes every machine to be a perfect copy of each other, unless the config files have been very, very disciplined about the hard-coded strings and assumptions made

So even in my glib "create-image" response, even then there's almost certainly going to be some cloud-init that subsequently stamps the booted instance with its actual identity


How often is this kind of tool needed since containers went mainstream? I had gathered they were not used as often any longer.


My experience has been that for day zero stuff, e.g. how do you get a system _prepared_ for containers, this kind of tooling is handy. I side with the sibling comment that cloud-init is The Way, but it also requires (a) some trial and error and (b) thinking entirely in terms of cattle vs. pets, which some folks/organizations are not ready for yet


There is a need for a Python module that compiles to Ansible code


This is really cool! Kinda seems like the Nix config approach.


However the email address was not being processed


Why does this sound so familiar to Chef?


Does anyone have any info on if saltstack is going to be enshittified? That is the situation that would get me to go looking for a replacement such as this


Maybe this is the best thing ever, but the documentation doesn't answer one simple question: What problem is it solving?

It is a configuration management tool, like Ansible?

Is it meant for running one-off commands across the infrastructure, like Salt?

It says it integrates with Terraform, so it's not a provisioning tool...

What does it do different (and presumably better) than other tools?

The Getting Started guide doesn't cover this. The FAQ doesn't cover this, and the Docs doesn't have an Introductory section to cover this.

It's disheartening to find a potentially interesting project, but not really know what it does and how it might fit in your workflow.


It's similar to Ansible, but uses Python as a declaration language rather than YAML.

It can also run one-off commands across the infrastructure (like Tentakel: https://pypi.org/project/tentakel/ ).

I've been using Pyinfra for some time. It's good enough for me.


I've been using it as well with great success.

A couple years ago I inherited about 100 Mac Pros that are part of $dayjob's CI infrastructure. They had been managed over the years using a combination of shell scripts, Chef, and manually via VNC. No two machines were alike. The Chef recipes had all bit-rotted and weren't usable and due to $reasons were based on an old version of Chef that $company was stuck on.

So I looked around for alternatives, and being most comfortable in Python, I explored Ansible, Salt, and Pyinfra.

Ansible seemed like the obvious choice, but it has very few playbooks/actions for macOS systems. I was going to have to write my own. As I dug into its documentation, I found it was taking me a long time to wrap my head around all that I needed to do and started to sour on its complexity. This is a matter of taste, but I just didn't find Ansible very welcoming. I wanted something simpler.

I had previously used Fabric, so considered using it again. But Fabric offers too little (it's really not much more than parallel ssh; if you want idempotent operations you have to write that yourself), and I don't agree with the direction it took with version 2.x.

Then I found Pyinfra. It took me less than 30 minutes to understand it in its entirety. It's conceptually simple: you have an inventory of machines that it connects to in parallel over ssh. You provide it with a deploy script that combines facts and operations. Pyinfra uses the deploy script to gather facts about each machine, then you use those facts to decide whether you need to perform any operations. It then performs those operations on each machine as needed. The inventory file, deploy script, facts, and operations are trivial to write for someone comfortable with Python. It's all Python with the facts and operations being decorated functions. There is no DSL to learn. (It comes with a bunch of pre-written facts and operations, but they are mostly for Linux systems. I mostly had to write my own for macOS, but I found them really easy to write.)
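A rough sketch of that facts-then-operations flow, using only the standard library. This is not pyinfra's API: `Host`, `gather_facts`, `plan_operations`, `deploy`, and the package names are all hypothetical stand-ins, and hosts are simulated as in-memory objects rather than reached over ssh.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Host:
    """Simulated target machine (a real tool would talk to it over ssh)."""
    name: str
    installed: dict = field(default_factory=dict)  # package -> version


def gather_facts(host):
    """Fact: a read-only snapshot of the host's current state."""
    return {"packages": dict(host.installed)}


def plan_operations(facts, wanted):
    """Compare facts with the desired state and keep only the needed changes."""
    current = facts["packages"]
    return [(pkg, ver) for pkg, ver in sorted(wanted.items()) if current.get(pkg) != ver]


def deploy(host, wanted):
    """Gather facts, decide what to do, then apply the operations to one host."""
    ops = plan_operations(gather_facts(host), wanted)
    for pkg, ver in ops:
        host.installed[pkg] = ver  # stand-in for running an install command
    return host.name, ops


def run(inventory, wanted):
    """Run the deploy against every host in the inventory in parallel."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(lambda h: deploy(h, wanted), inventory))


inventory = [Host("mac-01", {"git": "2.44"}), Host("mac-02")]
wanted = {"git": "2.44", "openjdk": "21"}
first = run(inventory, wanted)   # applies only what each host is missing
second = run(inventory, wanted)  # nothing left to change on the second run
```

Because operations are derived from facts, running the same deploy twice leaves the second run with nothing to do, which is the idempotency Fabric leaves you to write yourself.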

I had it operational the same day I found it. I used it to successfully get all of the Mac Pros into a consistent state: things like system settings, installing Xcode, automating installs of brew packages all at the same version, installing JVMs, updating and upgrading macOS, installing Sentinel One, etc.

I've been very happy with it, even contributing a few PRs to fix small bugs and contribute minor functionality.


I would love to see any macOS facts/operation code if you can/would be willing to share! We also managed a bunch of macs using pyinfra but mostly stuck to shell commands.


> It is a configuration management tool, like Ansible?

Yes

> Is it meant for running one-off commands across the infrastructure, like Salt?

Also yes.

> It says it integrates with Terraform, so it's not a provisioning tool...

The TF integration is specifically to use TF as an inventory source - ie TF to create resources and pyinfra to then configure them.

> What does it do different (and presumably better) than other tools?

The homepage covers the highlights. I originally created pyinfra because debugging Ansible was complicated (no plain stderr as not "just" commands on the remote side) and slow, but things have evolved significantly since then.

> The Getting Started guide doesn't cover this. The FAQ doesn't cover this, and the Docs doesn't have an Introductory section to cover this.

Hugely appreciate this feedback, this is super helpful and something I will attempt to make clearer.

---

Quick attempt at a better explanation: You write Python code that defines operations (either state "this apt package should be installed" or stateless "run this command"), provide an inventory of targets (SSH, local machine) and pyinfra executes it.

Roughly sits where Ansible does for configuring servers, but also solves the case of "how do I run this command across my server fleet" (which I believe Ansible can also do).
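The difference between the two kinds of operation can be sketched without any library. The code below is illustrative only and is not pyinfra's API: `ensure_apt_package`, `run_command`, `plan`, and the target names are hypothetical, and the inventory is a plain dict instead of real SSH targets.

```python
import shlex


def ensure_apt_package(installed_packages, name):
    """Stateful: emit an install command only when the package is missing."""
    if name in installed_packages:
        return []
    return [f"apt-get install -y {shlex.quote(name)}"]


def run_command(command):
    """Stateless: always emit the command, whatever the target's state."""
    return [command]


def plan(inventory):
    """Build the command list per target; inventory maps name -> installed packages."""
    commands = {}
    for target, installed in inventory.items():
        commands[target] = ensure_apt_package(installed, "nginx") + run_command("uptime")
    return commands


inventory = {"web-1": {"nginx"}, "web-2": set()}
planned = plan(inventory)
```

Here `web-1` already has the package, so only the stateless command is planned for it, while `web-2` gets both.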


I hope I wasn't coming across as too negative.

I genuinely think that an introduction with a few user stories would go a long way!


> and slow, but things have evolved significantly since then

Well, Ansible is still dog-slow, so that part has not evolved...


Heh yeah this is very true, I updated the perf test repo earlier this year to confirm https://docs.pyinfra.com/en/next/performance.html


> Great for ad-hoc command execution, service deployment, configuration management and more

I found that pretty clear to be honest.


People will know whether it solves their problem when they see it, no need to akchually the OP or maintainer


It's like Ansible.

That's what I discovered by reading the homepage.


I would disagree with that since ansible actually does two things simultaneously: cloud provisioning and local provisioning (and that "local" is actually hiding a 3rd axis, actual local, not just local to the managed instance, say for example if you needed the azure libraries or such, you can use a pre_tasks: block to create a virtualenv and install the deps locally before firing up the main workload)

Reasonable people can 100% disagree about whether yaml is the correct packaging for those operations, and ansible is a bit too imperative for my liking, but as far as "I have one hammer..." it does all the things


Ansible can technically do cloud and local provisioning, just as Terraform can technically do cloud and local.

But practically, these tools have their areas of speciality.


> just as Terraform can technically do cloud and local.

I feel as though we're splitting hairs here, given there is, to the best of my knowledge, no `resource remote_file make_sshd_config { inventory_host = "whatever" dest = "/etc/sshd_config" src = "./sshd_config.tmpl" vars = {...} }` in TF. There is template, and there is local_exec and the rest is a Simple Matter Of Programming :-/

I'm waiting patiently for someone to chime in "well, just spawn ansible in local_exec" as if they're missing the point


Any kind of provisioning doesn't seem too far a step though. It is just another "operation" with its own state management logic.


I mean, I hear you in that python is Turing complete so all things are possible through another level of indirection, but I didn't see one shred of amazon.aws.autoscaling_group anywhere in their docs so .. what, I write my own? If I was going to go through the trouble of writing custom shit for Yet Another Awesome Cloud Thingy I'd fire me


The fact that Pyinfra does not currently support a feature which can be implemented using Pyinfra philosophy does not make it different than Ansible. I believe that was what the parent comment was about.


Digging in the docs, it uses words like "inventories" and "operations", which indeed look like a configuration management system. Much like Ansible, it's agent-less.

And that's cool. Ansible is a bit of an oddball system, but then I'm still left wondering: why is this better, or why is it better for the author at least?

I've used cfengine, Puppet, Chef, bcfg2 (briefly) and ansible. I want to know what makes this tool different and better. :)


Salt is not meant for running one-off commands. You can easily make sure state.apply is run for all of your infra several times an hour


it's bc it solves all problem ever exist.


[flagged]


See FAQ #2


We build a similar tool except we focus on AI workloads. Also support on-prem clusters now in addition to GPU clouds. https://github.com/dstackai/dstack


Should probably just stick with the Terraform CDK or Chef if you need this level of expressibility.

This is no where near the level of readiness needed to be reliably used in a production environment.

Verbose logging is not a reason to introduce a non-standard tool into your stack.


> Should probably just stick with the Terraform CDK or Chef if you need this level of expressibility.

I'm not familiar with Terraform CDK, but I don't see what Chef does/has that this doesn't?

> This is no where near the level of readiness needed to be reliably used in a production environment.

Why?


> This is no where near the level of readiness needed to be reliably used in a production environment.

This is baseless FUD.

Pyinfra is 8 years old, just 2 years younger than Terraform. It's well maintained, stable, and used by many teams in production. Just because it's not as widely known or adopted as other tools, doesn't mean it should be avoided. In fact, as you can see from testimonials here, users often prefer it over Ansible.

> Should probably just stick with the Terraform CDK or Chef if you need this level of expressibility.

Terraform is used for provisioning infrastructure. Pyinfra is a configuration management tool. They're not equivalent.

Chef is closer, but it's an older tool that has largely been superseded by Ansible. It shouldn't be anyone's first choice, unless they really need some obscure feature it does better than Ansible, or Puppet for that matter.

> Verbose logging is not a reason to introduce a non-standard tool into your stack.

Why would that be the only reason to use this? That's not even one of its prominent features, and surely all tools in this space support verbose logging...

What a confused comment.


so, it's Ansible...?

Configuration Management tools (that's what this, and Ansible, are) are a nice idea, but get very complicated very quickly. The tools themselves get complicated, the configuration gets complicated, you're constantly finding ways that the state gets broken that you need to re-incorporate into your script, it has to work in a variety of states, and you have to keep re-running and re-running and re-running it, monitoring for problems, investigating, fixing. Very complex, lots of maintenance, lots of potential problems. The "Pets" model from the phrase "Cattle, not Pets." I strongly recommend you do not raise Pets.

Instead, use Immutable Infrastructure: build an immutable image one time that works one way. Deploy that image. If you need to change it, change the build script, build a new image (with a new version), deploy a new instance with the new image, take the old one out back and shoot it. (The "Cattle" of "Cattle, not Pets") If the state gets out of whack or there are problems, just shoot it and deploy a new one that you know works.
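As a toy illustration of that build, deploy, and replace cycle (not tied to any real image builder or cloud API), the sketch below derives an image version from its build script and swaps out whole fleets instead of patching instances. `Image`, `build_image`, `roll_out`, and the script contents are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """An immutable image: once built, it is never modified."""
    version: str
    build_script: str


def build_image(build_script):
    """Derive the version from the build script, so any change yields a new image."""
    digest = hashlib.sha256(build_script.encode()).hexdigest()[:12]
    return Image(version=digest, build_script=build_script)


def roll_out(running, image, count):
    """Replace every running instance with fresh ones created from `image`."""
    new_fleet = [f"{image.version}-{i}" for i in range(count)]
    retired = list(running)  # old instances are destroyed, never patched in place
    return new_fleet, retired


v1 = build_image("install nginx")
v2 = build_image("install nginx\ninstall certbot")  # changed script, new version

fleet, _ = roll_out([], v1, count=2)
fleet_after, retired = roll_out(fleet, v2, count=2)
```

After the second roll-out every instance comes from `v2`, and the `v1` instances are listed for destruction instead of being reconfigured.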

This is the single most revolutionary concept I've seen in over 20 years of doing this job. It is an absolute game-changer. I would not go back to Configuration Management for all the tea in China.


You're conflating different things - this has nothing to do with Pet vs cattle.

Even in your confusion, state still exists in the real world and needs to live somewhere; it is also infeasible to always recreate big states.


This happens so often on HN, and it is so god damn frustrating. I'm literally a fucking expert, telling you the best thing to do, and explain why, and I get downvoted for it. The next person who tells me in a comment "explain your opinion! you're not helping!" when I don't write an entire novel to justify my position, I'm going to link back to this thread. Pointless.

I've gone to the trouble of googling these articles for you (it took me a whole 30 seconds!). Please read any of them.

https://webcache.googleusercontent.com/search?q=cache:https:...

https://devopscube.com/immutable-infrastructure/

https://thenewstack.io/a-brief-look-at-immutable-infrastruct...

https://www.digitalocean.com/community/tutorials/what-is-imm...

https://www.hashicorp.com/resources/what-is-mutable-vs-immut...

https://www.techtarget.com/searchitoperations/definition/imm...

https://www.oreilly.com/radar/an-introduction-to-immutable-i...

https://www.terraformpilot.com/articles/mutable-vs-immutable...

https://www.bmc.com/blogs/immutable-infrastructure/

https://www.linode.com/docs/guides/what-is-immutable-infrast...

https://devops.com/immutable-infrastructure-the-next-step-fo...

https://openupthecloud.com/what-is-immutable-infrastructure/

https://www.opsramp.com/guides/why-kubernetes/infrastructure...

https://www.cloudbees.com/blog/immutable-infrastructure

https://www.daily-devops.com/devops/immutable/architecture-p...

http://radar.oreilly.com/2015/06/an-introduction-to-immutabl...

https://highops.com/insights/immutable-infrastructure-what-i...

https://docs.aws.amazon.com/wellarchitected/latest/financial...


Maybe you're not the only expert on HN? For someone to write what you wrote after 20y of experience is a bit interesting, and you did write a lot!

I might have more YOE than you do, for example, and might have worked on bigger companies/infras than you did. What does it matter to the opinion at hand?


As someone who had to write infrastructure in Python, every time from scratch, for large projects: pyinfra isn't it (and neither is Ansible, if you care about that).

It will probably work for some simple and common cases, but they barely need any automation anyways...

The problem isn't even the tool itself, it's the lack of standards. Every large enough system is too unique to be easily managed by cookie-cutter tools like this one. Some people will bite the bullet anyways, and try to adapt general-purpose infra tools to their case. I've seen that too. This is a very miserable experience. Frustrating in that very obviously simple and necessary things are sometimes described as "impossible" due to how the chosen framework works. To contrast that, the home-brewed systems usually suffer from the lack of generality, worse user experience in general, quickly start lagging behind the underlying technology updates...

Also, out of popular languages, Python would be somewhere towards the bottom of the hierarchy if I had to choose a language to manage infrastructure. The only redeeming quality of Python is its popularity. On engineering merits alone it's unremarkable at best.

----

PS.

    import click

If I see this in the project source code, I blacklist it and never look at it again. This is a red flag, a sure sign that the person writing it is clueless.



