
Cluster SSH – Manage Multiple Linux Servers Simultaneously - praveenscience
https://www.putorius.net/cluster-ssh.html
======
cheald
This is where Saltstack really shines, IMO. Once you have the minion
installed, you have a ZMQ command channel on each of those boxes.

    
    
        salt -G "roles:whatever" cmd.run "blahblahblah"
    

Easy to target by any number of things - name, OS, IP subnet, custom metadata,
etc. Very powerful, and makes intelligent administration of a fleet a lot
easier.

For example, I recently used this to audit our fleet for vulnerable PHP+nginx
installs, regarding the recent CVE:

    
    
        salt "*" cmd.run 'ps ax | grep -q "[p]hp-fpm" && ps ax | grep -q "[n]ginx" && dpkg -l | grep -P "ii.*php"'
    

While we'll upgrade PHP on any systems it's installed on (also easy to find),
this gave me a high-priority hitlist of all machines that needed PHP upgrades
because they were running active php-fpm + nginx combinations.

~~~
oneplane
Yep, salt has the best of all elements. You want simple remote execution? Got
it. Agentless? Sure, use salt-ssh. Event bus? ZeroMQ and RAET included.
Automation? Sure, reactors cover that. Integration with other systems? Got
that too, state, pillar, mine, external events, all included. Even a REST
interface.

There isn't much of a point any more to use bare SSH, heck, you can even use
proxy minions to control systems that only have telnet or outdated SSH (like
SSH1 or SSH2 with unsupported ciphers), or plain web interfaces with no API.

~~~
usrme
I've been using Chef for many years and have been pretty happy with how
straightforward everything has been, but I am always open to other solutions.
If you've used Chef, Puppet or Ansible, how would you say Salt differs from
those?

Your comment has already inspired me to look deeper into Salt, but I am hoping
you can lay out some general pros and cons as well. Thanks!

~~~
cheald
I've used all three in the past.

Chef was the first I used, many years ago. It's...complex. Despite being Just
Ruby, the DSL takes quite a lot to learn, and I don't like that Chef is
imperative rather than declarative. It also just seemed crazy verbose. It felt
like it took ages to get my configurations right. For example, here's a
community cookbook to install and set up NTP:
[https://github.com/chef-cookbooks/ntp/blob/master/recipes/de...](https://github.com/chef-cookbooks/ntp/blob/master/recipes/default.rb)

I tried Puppet after Chef. It was better, but I always seemed to be spending
more time fighting it than actually getting stuff done with it. It suffered
from the inability to push states to boxes at will. It's probably more pure to
do it periodically via cron, but when you have stuff to get done, sometimes
you just want your state applied to a machine. Configuration was less complex
than Chef, but still quite verbose. Because box state is "all or nothing" you
end up with a pretty slow management system that felt really subject to
bitrot. On the subject of complexity, here's the community NTP module:
[https://github.com/puppetlabs/puppetlabs-ntp](https://github.com/puppetlabs/puppetlabs-ntp)

I like Ansible a lot. No real complaints about it, except that Salt is
"Ansible++" with the command channel, reactor system, etc.

Salt's just YAML, like Ansible. You have minimal logic available in YAML via
Jinja, but it's discouraged by design; your state declarations are meant to
_describe_ the state of the system, rather than to execute a series of
explicit steps. Salt figures out how to bring them into compliance. You aren't
meant to program in it, you're meant to describe in it. This makes states very
straightforward and easy to interpret.

By comparison to the first two, here's a similar formula for Salt:
[https://github.com/saltstack-formulas/ntp-formula/tree/maste...](https://github.com/saltstack-formulas/ntp-formula/tree/master/ntp).
Terse, simple, straightforward. ntp/init.sls is the
whole state that gets run - it just installs the package, marks the service as
enabled, starts the service, and if there's a config defined, installs it.
Templates can be colocated with the state definitions. Nothing too fancy, just
yaml and plaintext configs. I can apply just that single state to a box (salt
mybox state.apply ntp) or set of boxes (salt -G "os:Ubuntu" state.apply ntp)
or I can do a pull from the box (on mybox: # salt-call state.apply ntp). I
don't have to run the 479 states that apply to the box in sum; this makes
incremental management of boxes very quick.

You can use salt like Ansible via salt-ssh (shells into target box, executes
states), like Puppet or Chef (operating in master mode, with agents that
connect and execute states), you have the active command channel to your
fleet, and you can do event-driven stuff with it, too. For example, we
provision and renew LetsEncrypt wildcard certs on our Salt master. We have a
Reactor set up that watches for when a cert (on the master) is updated,
which finds all machines subscribed to that cert and executes our `ssl.certs`
state, which pushes the new cert out and executes any service reloads
necessary to get the renewed cert into play. This lets us keep a small set of
certs/renewals in a centralized location rather than having to worry about a
bunch of scattered LE renewals all over the fleet. When we have to provision a
new machine or a new service, we can just subscribe to the desired cert and
we're off to the races - no need to do the LE provisioning each time we salt a
new box.

There's a lot to love about it. It's not without its bugs, but the Saltstack
project and team are very active, and the project itself is Python, which is
easy to read and write. I've had to debug things a number of times, but it's
never too difficult.

Been using Salt for years now, and I still think it's the best in class out
there.

~~~
usrme
Many thanks for your extensive reply! There's even more to consider now <3

EDIT: Can you vouch for the ease of use when configuring Windows machines as
well?

~~~
oneplane
Windows is supported as well, at least as a minion (a 'target'). You can run
cmd and ps1 commands remotely, but there is no SSH support. This will be added
later on when MS makes SSH a 'normal' integration and then salt can implement
that as well.

There are a number of Windows-specific states and if you have a universal
state (i.e. making a directory in a specific place) the documentation will
supply you with the things each OS does and doesn't support. For example, you
can't set POSIX permissions on Windows, but if the FS is NTFS you can set an
NTFS ACL.

------
chousuke
I've used clusterssh in the past, but nowadays Ansible is simply a better
choice. clusterssh is rather brittle.

I don't much like Ansible either (I have _opinions_ about using YAML as a
programming language), but for ad-hoc maintenance of a bunch of servers, it
works okay.

~~~
yjftsjthsd-h
Ansible can't do interactive sessions, though, right?

~~~
kbenson
I think there's a valid argument to be made that anything you would use an
interactive session for on multiple servers at once is better done as an
interactive session on one server, formalized into some rules of what to do
(even if just a bash script with a bunch of grep and sed commands), tested on
another server, and _that_ should then be run on lots of servers at once (or
better yet, batched into groups of a reasonable amount to check nothing went
wonky).
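
A minimal sketch of that batching idea, assuming a plain text file with one host per line. `RUNNER`, `run_batch` and `run_in_batches` are hypothetical names made up for this example; `RUNNER` defaults to ssh but can be swapped for `echo` to do a dry run. It is not a definitive rollout tool, just one way to stop before the next batch when something goes wonky.

```bash
#!/usr/bin/env bash
# RUNNER, run_batch and run_in_batches are hypothetical names for this sketch.
# RUNNER defaults to ssh; set RUNNER=echo for a dry run.
RUNNER=${RUNNER:-ssh -n -o BatchMode=yes -o ConnectTimeout=10}

# Run one command on every host given as an argument, in parallel.
run_batch() {
  local remote_cmd="$1" host pid pids="" failed=0
  shift
  echo "== batch: $*"
  for host in "$@"; do
    # RUNNER is left unquoted on purpose so its options split into words.
    $RUNNER "$host" "$remote_cmd" < /dev/null &
    pids="$pids $!"
  done
  # Collect every exit status before deciding whether the batch succeeded.
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  [ "$failed" -eq 0 ]
}

# Read hosts from a file and process them batch_size at a time,
# stopping before the next batch as soon as one batch has a failure.
run_in_batches() {
  local hosts_file="$1" batch_size="$2" remote_cmd="$3" host
  set --
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    set -- "$@" "$host"
    if [ "$#" -ge "$batch_size" ]; then
      run_batch "$remote_cmd" "$@" || return 1
      set --
    fi
  done < "$hosts_file"
  if [ "$#" -gt 0 ]; then
    run_batch "$remote_cmd" "$@" || return 1
  fi
}
```

Something like `run_in_batches hosts.txt 10 'bash /tmp/tested-fix.sh'` would then touch ten hosts at a time (the file name and script path here are placeholders).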

In the words of the venerable DevOps Borat[1], "To make error is human. To
propagate error to all server in automatic way is #devops."

1:
[https://twitter.com/devops_borat/status/41587168870797312](https://twitter.com/devops_borat/status/41587168870797312)

------
kempbellt
I used to use tmux to do this. Split the screen half a dozen times. Connect
each pane to a separate instance. Enter `setw synchronize-panes on` and you
are now running commands on 6 instances simultaneously. From there you can run
`htop` to see resources being used on all instances on one monitor, run
`apt-get` commands, etc.

~~~
xorcist
How do you split the screen into so many panes? tmux told me my panes were too
small and refused to split further.

I guess I'd like to split it in groups of panes and maybe switch which group
is currently visible?

~~~
kempbellt
I had a fairly large high-resolution monitor. 27" mac monitor, IIRC. I never ran
into tmux telling me that I couldn't split further and could have upwards of
50 panes. Might be a new "feature"?

------
ktopaz
Years ago I wrote this tiny script:

<run-on-all.sh>

    
    
      #!/bin/bash
    
      clusterfile=$1
      cmd="source /etc/profile; $2"
      #redirecting stdin for ssh command to /dev/null (using switch "-n" for ssh) since otherwise ssh command breaks bash's "while" loop
      #reference: http://www.unix.com/shell-programming-scripting/38060-ssh-break-while-loop.html
      #change "-o ConnectTimeout=x" to time out the ssh connection after x seconds
      #the options live in an array so each one stays a separate word
      ssh=(ssh -n -A -o BatchMode=yes -o ConnectTimeout=10 -o LogLevel=quiet)
    
      echo
    
      while IFS= read -r line
      do
       echo "@==============@ running on $line @==============@"
       echo
       echo
       "${ssh[@]}" "$line" "$cmd"
       echo
       echo
      done < "$clusterfile"
    
      echo
      echo
      echo "==============> FINISHED RUNNING for $clusterfile <=============="
    

All one needs to do is call it like so: ./run-on-all.sh
/path/to/cluster/file/list-of-servers-here.txt "sleep 60; reboot"

~~~
scarby2
this is good for a few servers - but its sequential nature will become onerous
if for example you needed to run something that took 10 minutes on 100 hosts.

ansible ad-hoc commands do almost exactly what you do but in a more scalable
fashion

[https://docs.ansible.com/ansible/latest/user_guide/intro_adh...](https://docs.ansible.com/ansible/latest/user_guide/intro_adhoc.html)

~~~
reacweb
<list_of_hosts xargs -P 0 -I {} ssh {} hostname
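
A slightly longer sketch of the same xargs idea, assuming one host per line and host names without shell or sed special characters. `RUNNER` and `run_parallel` are hypothetical names for this example. It caps the number of parallel jobs and prefixes every output line with the host it came from, so interleaved output stays attributable.

```bash
#!/usr/bin/env bash
# RUNNER and run_parallel are hypothetical names for this sketch.
# RUNNER defaults to ssh; set RUNNER=echo for a dry run.
RUNNER=${RUNNER:-ssh -n -o BatchMode=yes -o ConnectTimeout=10}
export RUNNER

run_parallel() {
  local hosts_file="$1" jobs="$2" remote_cmd="$3"
  # Skip blank lines, then let xargs start up to $jobs inner shells,
  # one per host. Each inner shell labels its output with the host name.
  grep -v '^[[:space:]]*$' "$hosts_file" |
    xargs -P "$jobs" -I {} sh -c '
      host=$1; shift
      $RUNNER "$host" "$@" 2>&1 | sed "s/^/[$host] /"
    ' sh {} "$remote_cmd"
}
```

For example, `run_parallel hosts.txt 20 'uname -r'` would run on at most 20 hosts at a time (`hosts.txt` is a placeholder).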

~~~
downerending
Here's a more polished version of just this idea:
[https://github.com/michaelkarlcoleman/ssssh](https://github.com/michaelkarlcoleman/ssssh)

------
ComputerGuru
I'm seeing a lot more love for ansible than any of the other competitors in
the cluster/pool management sphere. I didn't know HN had reached a (rough)
consensus on this, is ansible well-enough-accepted as the configuration
management tool of choice when not integrating into a larger cluster
management system (e.g. Kubernetes)?

~~~
farisjarrah
Ansible has not taken the sysadmin world completely. Google uses Puppet, many
companies still use saltstack/chef or even just bash scripts via jenkins.
Ansible is a very very very useful tool. Even for kubernetes management.

~~~
atsaloli
Some companies are still using CFEngine as well -- especially at large scale.

------
burntwater
I feel like often people forget about the small to medium setups run by "The
IT guy." I'm a jack of all trades technologist in a major theater. One hour
I'm configuring our half dozen Linux servers. The next hour I'm creating
AutoCAD drawings, the following hour I'm changing SNMP settings on our 2 dozen
Cisco switches, and after lunch I'm designing the electronics for a new stage
prop.

I don't have the hours and days to learn Ansible, and it doesn't make sense
for such a small environment. And it's Yet Another Service my successor will
have to learn in what is already an incredibly niche position.

However doing the same half-dozen steps on a half-dozen servers takes up real
time. Tools like cssh, that take 15 minutes to learn and have an IMMEDIATE
time saving payoff, are invaluable. And my successor can easily skip using the
tool and just do things the hard way until they do have the time to learn more
efficient tools.

~~~
h1d
> I don't have the hours and days to learn Ansible

Really? Are you that busy, or do you just not want to use a minute of your weekend?

I've set up a few ansible tasks for occasional use like updating servers but I
think it's worth learning. It's especially good when installing a new server.
Run and it's to your usual state in no time.

The good part about task runners is that they are self-documenting for your
repetitive tasks and easy to share. You could create a shell script with
comments but ansible is probably more portable.

~~~
burntwater
It's hard to overstate the breadth of skills necessary for this position.

Is Ansible worth learning? Possibly. But there are 100 other things worth
learning too, the majority of them not even in the IT space. My time and
mental bandwidth are finite.

------
scriptdevil
I used clusterssh in the past and it is really good at sending commands to
multiple machines. However for any real work, I would strongly recommend
keeping the typing to a bare minimum and do all your work inside a well tested
script. Better yet, use ansible or something like it to manage multiple
servers

~~~
carlsborg
While developing such ansible scripts, or prototyping, you usually need to run
the commands manually, so this becomes useful.

~~~
yjftsjthsd-h
For that, do you really need to control more than one machine at a time?

~~~
carlsborg
Sure if you are prototyping a cluster with various types of nodes.

~~~
scarby2
just run the playbook on a test cluster?

------
sdmike1
I wrote a similar tool[1] for a cybersecurity competition I was helping to red
team for. It would try a dictionary of username and password combos against a
list of hosts generated from the results of a masscan[2]; once it logged in, it
would run a bash script on the host to set up our persistence.

From there it would keep a session open on each host and allow you to run
commands on a single host, a subset of hosts, or all hosts.

The advantage of this over hydra or some other SSH brute forcer is that it
allows us to run our persistence tooling right away after finding a login and
keep that SSH session alive so we can re-use it even if the password is
changed.

The code is a tire fire, but it worked well for what we needed :)

[1] [https://github.com/sdshlanta/ssher](https://github.com/sdshlanta/ssher)

[2]
[https://github.com/robertdavidgraham/masscan](https://github.com/robertdavidgraham/masscan)

~~~
justinjlynn
It'd be neat if it attempted to lateral as well.

------
pepemon
There is a plethora of similar tools (cssh, pssh, dsh) but Ansible ad-hoc mode
supersedes these at any real task involving "simultaneous" management of
Linux boxes.

~~~
bonoboTP
Why is Ansible better? I've considered it for maintaining a bunch of
workstations and servers. However after reading some tutorials it seems not to
be worth it.

It involves a whole new level of indirection, I now have to check up on what's
new in new versions of Ansible, perhaps adjust a bit on the syntax, Google
problems, find relevant Github issues or SO answers that work around the
inevitable weird quirks and bugs etc.

I think Ansible would have to provide a lot of value (i.e. time savings)
compared to, say pssh or cssh to be considered for a busy sysadmin who just
wants to get stuff done with as little effort as possible (CV padding is not a
consideration, playing with new toys is not a consideration). Doing things
directly with shell scripts is a much more robust approach in terms of being
understandable to other people, who don't know Ansible, and it will stay
understandable even after Ansible is gone.

There's also a proliferation of such tools. Do I need Chef, Puppet, Ansible,
Terraform, Salt or another one? Will Ansible die in 2 years and be superseded
by X?

Unless you do this job as a full-time sysadmin, the overhead and potential
headaches seem to outweigh the benefits. I may be wrong though.

~~~
scarby2
the commenter here is talking specifically about ad-hoc commands:

[https://docs.ansible.com/ansible/latest/user_guide/intro_adh...](https://docs.ansible.com/ansible/latest/user_guide/intro_adhoc.html)

these are pretty easy to understand.

Also chef, puppet and ansible are becoming very niche tools these days now
that we have kubernetes and docker, all we really need them for is
provisioning out k8s infra if we run bare metal.

Terraform is a different tool entirely working at a different level.

------
res0nat0r
mpssh and kash are also good. kash is part of the kanif perl project, but the
one thing about it I really like that I don't see other projects doing is that
it will aggregate similar output before it spits it back.

Thus if you run it to check for a package version on 300 servers, you can get
maybe 2-3 sets of output grouped by host based on the output vs 300 lines of
output.

[https://github.com/ndenev/mpssh](https://github.com/ndenev/mpssh)

[http://taktuk.gforge.inria.fr/kanif/](http://taktuk.gforge.inria.fr/kanif/)
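
To illustrate the aggregation idea (this is not how kash itself is implemented, just a sketch with standard tools): assuming each host's result has already been collected as one `host<TAB>output` line, the hypothetical `group_by_output` function below collapses identical outputs into one line per distinct result.

```bash
#!/usr/bin/env bash
# group_by_output is a hypothetical helper name for this sketch.
# Input (stdin): one "host<TAB>output" line per host.
# Output: "host1,host2,... => output", one line per distinct output.
group_by_output() {
  # Sort by output first, then by host, so equal outputs are adjacent.
  LC_ALL=C sort -t "$(printf '\t')" -k2 -k1,1 |
    awk -F '\t' '
      # A new output value starts: flush the group collected so far.
      NR > 1 && $2 != prev { print hosts " => " prev; hosts = "" }
      { hosts = (hosts == "" ? $1 : hosts "," $1); prev = $2 }
      END { if (NR > 0) print hosts " => " prev }
    '
}
```

With 300 servers on two package versions, this would print two lines instead of 300.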

~~~
slumos
clush does exactly this with the -b or -B switches.

~~~
res0nat0r
Ah I didn't know this...Thanks! I will check this out.

------
BuildTheRobots
If you're using a linux distribution, then the `terminator` terminal app
supports multiplexing as part of its built in feature set and has been my
default terminal of choice for a while (tabs, splitting, etc...)

~~~
wastholm
I use tmux for this. Create multiple panes with ssh sessions in them and do:

    
    
        set-window-option synchronize-panes on
    

I have defined a keyboard shortcut for this, of course.

------
scarby2
I want to add a note to this - if you're using something like this to manage
linux servers you're almost certainly doing it wrong. At a server level use
ansible, puppet or chef.

------
segmondy
One should be thinking of using terraform and something like
chef/puppet/ansible or even better yet moving to kubernetes if you find
yourself having this kind of problem.

------
edf13
Shouldn’t you be using Ansible (or other) to manage your multiple servers
rather than ssh into them?!

Far too risky

~~~
bwann
Yes, but sometimes you can break your configuration management in such a way
that it can't recover and need a rapid way to fix things or see the state of
the world. It's very handy to have a tool ready to go that can assist. Sure,
it's very dangerous when operating on an entire fleet, but break glass in an
emergency.

It's also rather handy when you want to run ad hoc queries on your machine,
e.g. which kernels are out there, is this leftover rpm installed somewhere,
etc.

At a past job we had an in-house tool like this that also logged all of the
commands anyone had ever run with it and saved the stdout/stderr output,
viewable from a web UI and a CLI. If you suspected somebody did something
clowny, you could go look at exactly what they did, when, and what the result
was. This logging was very important, imho, and tools like it should have it.
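
A rough sketch of what such logging could look like in a wrapper (this is not the in-house tool described above; `logged_run` and the log format are made up for illustration). It records who ran what and when, the combined stdout/stderr, and the exit status.

```bash
#!/usr/bin/env bash
# logged_run is a hypothetical helper name for this sketch.
# Usage: logged_run LOGFILE COMMAND [ARGS...]
logged_run() {
  local logfile="$1" out status=0
  shift
  # Header line: UTC timestamp, invoking user, and the command as typed.
  printf '=== %s user=%s cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$*" >> "$logfile"
  # Capture stdout and stderr together; output is shown after the command ends.
  out=$("$@" 2>&1) || status=$?
  printf '%s\n' "$out" | tee -a "$logfile"
  printf '=== exit=%s\n' "$status" >> "$logfile"
  return "$status"
}
```

Wrapping the fleet command, e.g. `logged_run ~/fleet.log ssh -n web1 uptime`, leaves an audit trail in the log file (host and path are placeholders).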

------
linuxdude314
Managing servers using parallel ssh is a way old school (and powerful)
technique I don't recommend doing unless using it as a transport mechanism for
a config management tool.

A company I worked at developed a custom SSH tool that would tie into your
CMDB, kind of like a home rolled ansible you could use to blast commands to
a whole PoP or even the whole network at the same time. The thing was insanely
powerful, but using it was like wielding a machete through a delicate garden.

Ansible is really great in this regard since you have the ubiquity of SSH but
sane management, not some difficult to maintain shell script.

At scale parallel ssh performs poorly in comparison to message queues. As
mentioned elsewhere in this thread, SaltStack defaults to using ZMQ for
communication which has much lower overhead than SSH (with trade offs as
well).

------
tristor
Folks are acting like the only use case for this is sending commands for
deploying software or something. Even in fully automated environments at scale
I used Cluster SSH extensively in my past life for things like grepping local
logs on all members of a cluster at the same time, or querying local status.
It's immensely helpful when troubleshooting issues in large environments.

Of course you shouldn't be redneck deploying stuff with Cluster SSH when you
could write an Ansible Playbook or deploy Chef/Puppet/Salt. But to
troubleshoot issues and manage host/OS level functionality on multiple
identical systems at the same time, it's invaluable.

------
seqizz
Also an alternative: polysh [1]

[1] [https://github.com/innogames/polysh](https://github.com/innogames/polysh)

~~~
carlsborg
Another alternative, for AWS nodes:

[https://github.com/carlsborg/dust#parallel-ssh-operations](https://github.com/carlsborg/dust#parallel-ssh-operations)

------
johnklos
Not sure why there's such an Ansible yankfest going on - Ansible is a solution
for problems many of us would rather simply avoid - but it's starting to make
me believe that there are too many fake accounts used for marketing purposes.
We know they're on Facebook, we know they're on Reddit, but Hacker News?

~~~
rumanator
Do you actually have any evidence that such a conspiracy is real?

I mean, I've seen plenty of people jump to wild assumptions and invent all
sorts of conspiracy theories just because they are faced with the fact that
other people have different opinions and tastes, and for some reason that is
impossible unless there's a vast conspiracy to push ideas that don't match
their personal whims and tastes.

I've used Ansible in the past. Ansible sends python scripts over SSH to hosts
and runs them remotely. Users specify the state they want the system to be in
and the little python scripts run all the checks and apply all changes. Does
it take a conspiracy to prefer this approach over simply multicasting SSH
connections to multiple hosts?

------
lykr0n
I can't believe `pdsh` isn't mentioned. Dead simple "run this command via ssh
on these hosts" program

~~~
maaand
good man! pdsh has been my tool of choice for years!

------
rbanffy
It's neat and useful for clusters with a few nodes. I'd suggest using Ansible
or Salt instead.

Also, for the small clusters, I have a tmux script that splits the windows,
ssh's into each box and sets keyboard sync. It's relatively trivial to
generate such a program from a list of nodes you want to connect to.
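
A sketch of such a generator, assuming one node per line and node names without spaces or shell special characters. `gen_tmux_script` is a hypothetical name. It only prints the tmux commands, so the result can be reviewed before it is run.

```bash
#!/usr/bin/env bash
# gen_tmux_script is a hypothetical helper name for this sketch.
# Usage: gen_tmux_script SESSION_NAME NODES_FILE > script.sh
gen_tmux_script() {
  local session="$1" hosts_file="$2" host first=1
  echo '#!/bin/sh'
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue
    if [ "$first" -eq 1 ]; then
      # The first node gets the initial pane of a new detached session.
      printf 'tmux new-session -d -s %s "ssh %s"\n' "$session" "$host"
      first=0
    else
      printf 'tmux split-window -t %s "ssh %s"\n' "$session" "$host"
      # Re-tile after every split so panes stay evenly sized.
      printf 'tmux select-layout -t %s tiled\n' "$session"
    fi
  done < "$hosts_file"
  # Mirror keystrokes to all panes, then attach to the session.
  printf 'tmux set-window-option -t %s synchronize-panes on\n' "$session"
  printf 'tmux attach-session -t %s\n' "$session"
}
```

Usage would be along the lines of `gen_tmux_script fleet nodes.txt > fleet.sh && sh fleet.sh` (both file names are placeholders).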

------
znpy
it's proprietary, but SecureCRT works awesomely in that sense.

You can decide to just write commands in the common panel and relay those
commands to all ssh connections in the current window. When you're done, you
can just stop the relay and it goes back to being a regular tabbed ssh client
(with a lot of features).

~~~
ComputerGuru
No need for any proprietary tech: as mentioned in other comments, tmux can do
this out of the box.

------
lewaldman
I _really_ like cssh...

Yeah... I have tmux, _all_ my infras are on Ansible, etc, etc, etc... But
there are times where I _need_ to log in to several boxes at once and when the
time comes this tool is invaluable!

------
sheepdestroyer
Ásbrú Connection Manager has clustering too:
[https://github.com/asbru-cm/asbru-cm](https://github.com/asbru-cm/asbru-cm)

------
alexellisuk
I posted this a day or so ago, glad to see it is now trending on the front
page. Cluster SSH looks very useful and reminds me of what Salt Stack tries to
do.

------
badrabbit
Surprised ansible does not do this

------
exabrial
Available in homebrew as well!

