
StackStorm – IFTTT for Ops - spdustin
https://github.com/StackStorm/st2
======
amoghe
Orchestration tools (puppet/chef) are already able to get your infrastructure
to a target/desired state, and keep them in that state, and notify when
deviations occur or the target state cannot be achieved. What does StackStorm
do that these tools cannot?

~~~
lotyrin
This is for the guy who gets a support call because of a disk getting full,
and has to ssh into a box and delete old log files because logrotate is
fubared by some other team but he finds out that there's a nagios monitoring
everything (with yet another team ignoring the noise) so he wants to just have
his bash oneliner for deleting old log files run any time the disk monitor
hits critical, and all he has to do is sell someone with a purchase card a
SaaS app (easy), and doesn't have to sell his entire organization the concept
of not being fuckups (hard).
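
A minimal sketch of that pattern, in Python rather than a bash one-liner: a cleanup routine that only runs when a disk alert arrives in the critical state. The alert fields (`check`, `state`), the function names, and the seven-day default are all assumptions made for illustration, not anything Nagios or StackStorm actually emits or defines.

```python
import os
import time


def delete_old_logs(directory, max_age_days, now=None):
    """Delete *.log files older than max_age_days; return the removed names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        is_log = name.endswith(".log") and os.path.isfile(path)
        if is_log and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed


def handle_alert(alert, log_dir, max_age_days=7):
    """Run the cleanup only for a (hypothetical) critical disk alert."""
    if alert.get("check") == "disk" and alert.get("state") == "CRITICAL":
        return delete_old_logs(log_dir, max_age_days)
    return []  # any other alert is ignored
```

The point of the sketch is the wiring, not the cleanup itself: the remediation is tied to the monitor's state change instead of to a human answering a support call.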

It's sad how big the market for this is.

~~~
bigdubs
I don't like being the voice of the purist in this, but this seems like a
bandaid on a bullet wound.

For most of the cases where this would seem to be useful there is probably a
failure upstream of that usage that should really be fixed.

~~~
doriftoshoes
Giant disclaimer: I work at StackStorm...but I also have an extensive Ops
background.

This is really the next step in runbook automation. It gives users a way to
express procedures and operational patterns in code (the workflow definitions
are in yaml). With any sort of automated remediation there is always the
concern of "painting over the mold" but at the same time you don't want to get
stuck doing a large number of manual steps when you could be focusing your
energy on tracking down the root cause of the issue and resolving that. The
more important aspect to me personally is the ease of version controlling
these ops patterns. Store the workflow definitions in source control and it is
easy to diff the changes in your procedure.

~~~
mst
It seems YAML is the new S-expressions for people trying to pretend they
didn't actually invent a programming language.

This is, I guess, less annoying than executable XML.

~~~
doriftoshoes
YAML is definitely the hotness right now but it works. No need to invent a
full blown language for something like this. Way easier than trying to write
json.

~~~
eropple
Or you can use Ruby or Python and Perl and have a scripting language that
looks and acts as a scripting language is generally expected to look and act.
(And with Ruby in particular it's very easy to provide a flexible and terse
DSL that provides significant benefits on its own.)

This "invent your own worse programming language" fad is disappointing.

~~~
crdoconnor
>Or you can use Ruby or Python and Perl

Then you get turing completeness, and turing completeness harms readability
and makes your language a magnet for technical debt.

This is why most of us these days use an intentionally dumb language to
template HTML (another non-turing complete language). The alternative was a
god-awful fucking mess (remember PHP without a framework/templating
language?).

This is what Tim Berners Lee alluded to with this:
[https://en.wikipedia.org/wiki/Rule_of_least_power](https://en.wikipedia.org/wiki/Rule_of_least_power)

Here's my example (yes, I'm one of those people):

[http://hitchtest.com/](http://hitchtest.com/)

I don't think you could write a cleaner, more readable parameterized test case
in python.

>This "invent your own worse programming language" fad is disappointing.

There's a lot of disasters out there for sure (in the testing world as well
=), but I'm pretty happy to see custom YAML-based declarative languages
catching on.

I think Ansible states are, likewise, way easier to deal with than the
equivalent in python would end up being.

~~~
smw
You're making the same exact mistake that the guy who decided ant should use
xml as a programming language did.

As soon as you need loops, conditionals, subroutines -- which are all very
common when writing tests -- you're making up your own language with horrible
warts.

~~~
crdoconnor
>You're making the same exact mistake that the guy who decided ant should use
xml as a programming language did.

Ant was a badly done turing complete language. I did not create a turing
complete language; I created a very, very dumb declarative language with no
functions or control structures - only data.

>As soon as you need loops, conditionals, subroutines -- which are all very
common when writing tests

I _already have_ loops and conditionals via jinja2 (a templating language I
did not create) on the high level and in python on the step level.

I am _not_ implementing, and never will implement, any control structures in
YAML (a la ant). The YAML will _always_ remain dumb to help maintain a strict
separation of concerns and test readability.

>you're making up your own language with horrible warts.

Warts such as what?

~~~
smw
[https://github.com/saltstack-formulas/mysql-formula/blob/mas...](https://github.com/saltstack-formulas/mysql-formula/blob/master/mysql/server.sls)

Down this path lies madness. Now the user has to deal with the difference
between YAML values and Jinja values. Can't really reference YAML data set in
other files, or other places in this file.

It's a tangled mess. Please don't do this to your users.

Maybe use tcl? Guile? Some real language that lends itself to making a clean
api/dsl for what you're trying to do.

~~~
crdoconnor
I agree that YAML block style as on lines 22, 23, 24 should be eliminated (I
will probably prevent my framework from parsing this). If you use an unescaped
{ or } it should signify that you are using Jinja2.

Similarly, I'm no fan of the {% sets %} at the top - it's a code smell.

Apart from those things, though, what you linked to seems easy to read and
understand to me.

I'm absolutely positive that Tcl or Guile (or python) would create the
potential for bigger messes than what you just linked to under similar
circumstances. Simply being turing complete is enough for that.

------
kentonv
We should make it possible to run StackStorm on Sandstorm
([https://sandstorm.io](https://sandstorm.io)), mainly for the confusion
factor. :)

Apparently our offices are like three blocks apart, too!

~~~
lnkmails
StackStorm developer here! Don't be surprised if I turn up at your door
tomorrow :).

~~~
kentonv
In all seriousness, email me (kenton at sandstorm.io) if you want to visit us
for lunch or come to a LAN party[0] sometime.

[0] [http://kentonshouse.com](http://kentonshouse.com)

------
DevOpsDotCom
We actually just posted a piece about StackStorm on monitoring vs.
remediating. Check it out, guys; I think you may find it interesting and
relevant.
[http://devops.com/2015/10/07/enough-monitoring-act/](http://devops.com/2015/10/07/enough-monitoring-act/)

------
aaronbrethorst
What's the difference between this and all of the other runbook automation
suites out there? (besides the open source license)

...or am I missing something?

~~~
dzimine
Disclaimer: I work with StackStorm. In the past, I built Opalis IIS aka
Microsoft SC Orchestrator. Seen both sides.

StackStorm is to legacy runbook automation what chef/puppet are to legacy
config management. It's open source, infra as code, and respects devops tools
and mindset. Some folks on our team are devops with field experience putting
their learnings in.

Our key design principles: 1) infrastructure as code, which means: workflows,
rules, action metadata, and other artifacts are readable, source-controllable
code (yaml)

2) integrations are "easy", which means: use python, ruby or shell, or turn
any existing script into an action by adding yaml meta-data. If you have done
an integration with something like HP OO or MS SystemCenter, you will
appreciate the difference.

3) yes, opensource. I think it's a deal breaker, especially when it comes to
integrations.
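
As a rough illustration of point 2, here is a sketch of how a small metadata description can turn an existing script into a validated, callable action. The metadata is written as a Python dict for brevity, where the comment describes YAML. The field names (`entry_point`, `parameters`, `required`, `default`), the script name `clean_logs.sh`, and the `build_command` helper are hypothetical and are not StackStorm's actual schema or API.

```python
# Hypothetical metadata describing an existing shell script.
ACTION_META = {
    "name": "clean_logs",
    "entry_point": "clean_logs.sh",
    "parameters": {
        "directory": {"type": "string", "required": True},
        "max_age_days": {"type": "integer", "default": 7},
    },
}

# Map the metadata's type names onto Python types for validation.
TYPES = {"string": str, "integer": int}


def build_command(meta, **given):
    """Validate `given` against the metadata and return an argv list."""
    unknown = set(given) - set(meta["parameters"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    argv = [meta["entry_point"]]
    for name, spec in meta["parameters"].items():
        if name in given:
            value = given[name]
        elif "default" in spec:
            value = spec["default"]
        elif spec.get("required"):
            raise ValueError(f"missing required parameter: {name}")
        else:
            continue  # optional parameter with no default: omit it
        if not isinstance(value, TYPES[spec["type"]]):
            raise TypeError(f"{name} must be of type {spec['type']}")
        argv.append(f"--{name}={value}")
    return argv
```

The resulting list could be handed to something like `subprocess.run`. The script itself stays untouched, and only the description around it is new.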

That's our perspective, how do you guys see it?

------
cowsay
Anyone actively using this?

Seems pretty interesting and looking for any suggestions for what may be the
best wow factor.

~~~
epowell2015
Netflix is one:
[https://news.ycombinator.com/item?id=10272955](https://news.ycombinator.com/item?id=10272955)

~~~
armabiz
Also Cisco, Rackspace.

But it should work well even for small startups/companies.

Owning your infrastructure as code, where you can control everything and tie
together Monitoring/Configuration management/Issue creation/ChatOps/Auto-
remediation, is a really powerful thing.

~~~
zobzu
its not infra as code though, its bandaiding as yaml. so say, your logs are
filling the disk, nagios complains.

what you do, is a yaml file that goes and delete some files around when this
happens....

.. instead of... fixing logrotate config

i dont know, it feels wrong: as much work, except it also takes setup, new
machines, new stuff that can fail, be misconfigured etc.

~~~
lnkmails
StackStorm developer here.

I've worked with multiple services in multiple teams where upstream fixes take
a while and meanwhile devs and ops people get paged like crazy for a
diagnosed and remediable problem. Agreed that the logrotate config needs to be
fixed in this case, but it is only a simple demo of auto-remediation. For
years, Cassandra dead node replacement has been a 6-step manual process. You'd
think upstream should be fixed, but unfortunately it isn't. So StackStorm fills
the gap between what is ideal and what is running in production. Usually,
there _is_ a gap. See
[http://docs.datastax.com/en/cassandra/2.0/cassandra/operatio...](http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html)
vs
[https://stackstorm.com/2015/09/22/auto-remediating-bad-hosts...](https://stackstorm.com/2015/09/22/auto-remediating-bad-hosts-in-cassandra-cluster-with-stackstorm/).
That is just another example.

