Hacker News new | past | comments | ask | show | jobs | submit login
Designing state machines (drivy.engineering)
211 points by adipasquale on June 26, 2017 | hide | past | favorite | 57 comments

A paper that changed my approach to designing state machines is "Statecharts: A Visual Formalism for Complex Systems" by Harel (http://www.wisdom.weizmann.ac.il/~harel/papers/Statecharts.p...).

Statecharts (also called hierarchical state machines) are essentially generalized state machines which allow for nesting and parallel composition of states. The 'nesting' part is my favourite, since it allows one to delegate event handling logic shared by multiple states to a 'parent' state, reducing code duplication.

The great thing about this paper is that you can glean most of its key ideas by just looking at the diagrams.

Exactly. UML looked like over-complicated crap when I saw it after reading on Statecharts and Yourdon. The latter are still in use on occasion in high-assurance with Tenix's Dats Diode being an example.

Funny how most visual programming languages (for example, puredata) look like these diagrams.

While I understand the importance of state machines and its direct usage in complex Business Logic Workflows (and indirectly almost everywhere), there is nothing special in this post. It is a generic comment on State Machines. Am I missing something here?

A surprising number of entrepreneurs and developers are unfamiliar with the application of state machines to business logic workflows, so it's useful to see the topic presented in a succinct way. Good learning content tends to get upvoted on HN, especially when the discussion around the concept (for example, the discussion of State Machines and state machine libraries in this thread) is interesting as well.

you don't even have to be unfamiliar with state machines to have found some usefulness here. sometimes it's just nice to be reminded of a concept as you're wrangling the 20 different problems you face as an entrepreneur on any given day.

the blog post wasn't particularly deep, but the (short) discussion here was worth perusing, as i'm considering employing a state machine for my rails app and have been generally looking for options and best practices (in a mental background thread). but it looks like most (all?) existing gems assume a fixed set of states and transitions, and i was hoping to find something designed primarily for user-defined states and transitions.

I am always surprised to see how many software developers I come across who have no experience whatsoever applying FSM's in the course of their work.

I can unequivocally say that FSM's have made me tons of money. By that I mean that they have simplified projects, made them far more robust, easily extensible, simple to modify and, in general terms, quicker to develop with less bugs and errors.

I've used FSM's in FPGA's (for example, optimized DDR memory controllers), embedded projects, robotics and system software. In other words, everywhere.

Is "state machines" a global term that includes finite state machines like how vending machines deal with varying coin amounts that go in to trigger the price and dispense proper change?

Googled looks like it is. That was a cool homework/project.

Yes, the vending machines examples is one of the most directly relatable ones. In fact, pretty much every piece of code you write has gone through a "Finite State Automata" (during the lexical phase of compilation to be exact) which in very simple terms is just a huge set of rules saying - "if this then that".

finite state automata haha

There was an article posted here a while back on how to implement a state machine in Rust, purely using its type system to facilitate state transitions. It's a neat concept, even if it's a bit more verbose than what you'll find in other languages.


How does this compare to the expressivity of Idris' dependent types? I'm intrigued by the idea of type-safe state-machines as e.g. file i/o and network socket code is almost always error-prone. So it feels crucial to rely on the support of a strong type system here.


FWIW there is a standard to describing (finite) state machines despite the RFC being expired https://tools.ietf.org/html/draft-bortzmeyer-language-state-...

I have used an "improved" version of it https://github.com/andreineculau/cosmogol-abnf when working on https://github.com/for-GET/http-decision-diagram

I can't recommend Akka's state machines enough. It's an incredibly simple and rock solid state machines built right into the library. (http://doc.akka.io/docs/akka/snapshot/scala/fsm.html) We're already on a Play/Scala backend so adding these was incredibly easy for us, but I think they're one of the hidden amazing features of the akka/scala world.

Hmm. I've tried using them multiple times, but have ended up ditching them for one reason or another. There always seems to be one thing that makes it easier to just implement one via a regular akka actor or a regular scala class.

I just looked through and I may have missed it but do you know if this does distributed FSM management?

Akka FSM in and of itself does not address distributed FSM's. However, supporting such would likely involve their clustering[0] support and perhaps distributed data[1].

0 -http://doc.akka.io/docs/akka/current/scala/index-network.htm...

1 - http://doc.akka.io/docs/akka/current/scala/cluster-usage.htm...

I use it with akka.net, it's pretty cool

There is good tool for making diagrams (so state machines as well) called mermaid[1], just by writing simple readable text.

[1] https://github.com/knsv/mermaid

I wanted to like Mermaid, then I realized it was just a neutered version of GraphViz. I had wanted to write a Mermaid-to-Graphviz compiler but never got around to it.

It produces pretty ugly graphs that can't be exported to any other format. Meanwhile just about ever graph manipulation program (e.g. Gephi) can handle GraphViz DOT format.

I didn't know GraphViz. I have to try it. DOT language looks very similar to Mermaid.

It was too hard to make that compiler or you have encountered another problem?

I didn't even know what BNF was when I had the idea ;)

It's still on my list of short projects. I have a lot of learning to do, knowing nothing about writing parsers and such. Yes, I know I could probably hack it together with a bunch of garbage Python. But that just does not sound fun to write, let alone maintain.

Apparently some people also wanted graphviz syntax: https://github.com/knsv/mermaid/issues/5

I also find another project called Viz.js[1]. It's graphviz in javascript. Looks promising. It's hard to choose between this two. On the one hand is very simple syntax, and on the other hand more flexibility.

[1] https://github.com/mdaines/viz.js/

PlantUML is a text based solution, too. I use it write in the Atom IDE.


When dealing with business processes, I feel that the evented nature of a formal fsm can be a bit off-putting. I have on several occasions used a batch-processing approach, that maintains state in a persistent record, giving many of the same characteristics as an fsm. Let's say we have our movie from the example (pseudo-code, obviously):

    class MovieState < Model
      belongs_to :movie

    class InitialStep
      def select
        MovieState.where(state: 'initial')

      def process(movie_state)
        movie_state.state = 'in_production'

    class InProductionStep
      def select
        MovieState.where(state: 'in_production')

      def process(movie_state)
        if movie_state.movie.finished_shooting?
          movie_state.state = 'in_theaters'

    class MovieWorkflow
      def initialize
        @steps = []
        @steps << InitialStep.new
        @steps << InProductionStep.new

      def run
        @steps.each do |step|
          step.select.each do |state|
This makes it very clear when things are happening. It's also easy to debug and test.

Of course, there are a lot of limitations, most notably that it can't deal with an async event happening. In my experience though, this can be dealt with by storing the payload of that event somewhere in the db, where the step-selector can access it on next run. One thing this model deals very well with, is delays, which is often part of business processes.

Any thoughts? Does this design have a more formal name?

That's the problem with kids these days. Everything has to be a friggin class.

You know we did some pretty amazing things with computers before OO. We even landed people on the Moon.

Can anyone program without OO any more?

Just saying.

I recognise that your tongue and cheek may be in close proximity here, but care to elaborate?

It's just a silly observation. I come from a generation that had to create products like AutoCAD to run on CP/M systems with 64K of RAM and precious little permanent storage. As an aside, few people know that software like AutoCAD started on such --by today's standards-- crippled systems.

Anyone who came up in software development through that path has a full understanding of the cost of every bit of code written, every abstraction.

Today's programmers, unless they've taken university courses to expose them to the low level stuff, are extremely isolated from the innards of what they write. They reach for classes and objects by default and seem incapable of thinking outside of these boxes. All of this adds overhead and bloat, something you had to be keenly aware of with limited memory and horsepower.

I do understand that a higher level of abstraction in the context of massive amounts of memory and horsepower can be nice. What gets me is that this is often an unnecessary default.

Going back to state machines. One does not to create a bunch of classes and methods to implement them. I see that almost as a violation of the conceptual simplicity of state machines.

I mean, to go in a different direction, I use state machine in FPGA's all the time. The synthesized hardware works fine and represents the bare minimum necessary to implement these very high speed state machines. No classes involved. One can do the same in software development.

Even earlier: I implemented a Statistical Multiplexor - 8 async serial users muxed over a single 9600 Bd synchronous line - on a Z-80 with 4K RAM and 4K ROM. In 1977. Garunteed correct data delivery over lines as noisy as 1x10^3 BERT. A datacomms protocol that can do that is not a trivial beastie. The only technique that let me fit it in was state machine. It was the start of a very successful product line. I live by FSMs ever since. Software and FPGAs. And I agree about the OO cruft. I just use a function for each state and execute to completion. Perfect for event driven systems, which datacomms are a prime example. And very lightweight. In High level world, this needs a language with functions as first class objects. I have other useful representtions too. My favorite table based form is very handily implemented in Lua, should anyone be interested in seeing it.


I'll call it idiomatic form to use OO style in Ruby, which the article (and my examples) were written in. Considering the interpreted and managed-memory model of the language runtime, it is utterly pointless to worry about a few object instances. Heck, this is a language where primitives are objects.

I suspect we work in quite different areas though, which probably gives us very different perspectives of what matters. The kind of applications I build on a daily basis, deal with business processes, and integration with external systems. I/O is going to be 90% of my machines load, if it's ever doing any real work in the first place. Most of the time it'll just sit waiting for a human to do something though.

Well, of course, "When in Rome do as the Romans".

Not saying OO has no place. Of course not. That would be crazy.

Looks like an event loop.

I have recently created some dead simple code to explain the concept to colleagues whom, as the article notes, were not using this very simple and effective construct at all to control state transitions in business apps.

As the article does not really explain how state machines work let me try to do it here so maybe you will go back to your code and put one in place :) In general, if anywhere in your code you have some property/field called status/state/etc... that you update depending on events then chances are that a state machine would improve the robustness of your code.

Ok, so you have some states. For business apps this usually comes from business. For our dummy example we will simulate a day (using Haskell here for conciseness, but no worry, it's trivial and you don't need to know any Haskell to understand it):

    data State = Awake | Dressed | Fed | Ready | Work | Cafe | Home | Asleep
Now most apps stop here and thus they don't have a machinery to control how to move between these. If we were to draw a graph where vertices are the above states and edges are how we go from one to another then, without any restriction, we would implicitly get a complete graph where you can go from any state to any state. Eg. from "Ready" I could go straight to "Asleep" without "Work" :) Our aim is thus to restrict transitions from one state to another to transitions that actually make sense. Making sense obviously depends on the problem we are solving. For our example let's introduce the following events:

    data Event = Alarm | Dress | Eat | GoToWork | Think | Yawn | Coffee | GoHome | Read | WatchTV
Now that we have states and corresponding events we can draw a graph where vertices are states and edges are events. The graph is now explicit in that we can only move between states (vertices) via events (edges). To put this graph into code, let's create a function that takes a pair of state and event, eg. (Asleep, Alarm) and returns a new state, eg. Awake:

    myDay :: (State, Event) -> State
As said above, myDay is a function that takes a (state, event) pair, which we can also call a transition, and returns a state. This function is nothing more than a lookup table, aka. transition table, where we will write down how to transition between states:

    myDay (Asleep,  Alarm)    = Awake
    myDay (Awake,   Dress)    = Dressed
    myDay (Awake,   Eat)      = Fed
    myDay (Dressed, Eat)      = Ready
    myDay (Fed,     Dress)    = Ready
    myDay (Ready,   GoToWork) = Work
    myDay (Work,    Think)    = Work
    myDay (Work,    Yawn)     = Cafe
    myDay (Cafe,    Coffee)   = Work
    myDay (Work,    GoHome)   = Home
    myDay (Home,    Read)     = Asleep
    myDay (Home,    WatchTV)  = Asleep
    myDay (_, _)              = error "invalid state transition"
That's it, we have codified our graph. We explicitly stated the valid transitions and anything else is by definition invalid. We are pretty much done at this point. A state machine itself can be thought of as a generic function that takes a transition table, a transition and returns a state. This is only to decouple a specific transition table from the general machine itself, but it literally does nothing other than lookup whether the transition is in the table. If yes then it gives back the corresponding new state otherwise we get and error.

You can see a pretty picture of the above graph and some code here: https://github.com/pwm/fsm

Brilliant. Superb explanation! Just couple of days I was trying to understand this from my team mates and it took him 30 mins to make it complicated.

Excellent explanation. Thanks!

Well done. The puml script was a nice addition!

For anyone else working in ruby, I've grown to love the "Workflow" gem[1] for my state machines. It really helps to streamline things, and also has a built-in way to generate a visual of your class's states and transitions.

[1] https://github.com/geekq/workflow

Looks interesting! I've always used https://github.com/aasm/aasm, but this looks pretty good.

More generally I get worried about my ActiveRecord objects getting too bulky and business logic getting all wound up in post-transition callbacks and such. But that's something that can be managed with service objects and being disciplined about triggering state changes.

I used this for a huge project with a lot of states. Separated my workflow out into individual files for each class so they could just be included at the top. Then for each transition method, I created a service object (Interactor gem[1]) to handle all of my complicated transition logic, so the object-level transition methods would just check for a success or failure and perform the transition appropriately. Really love how it turned out.

[1] https://github.com/collectiveidea/interactor

Can't recommend state machines enough. A well designed state machine makes troubleshooting straightforward and makes adding new features a breeze without refactoring much.

In case you're using Python, I can't recommend this library enough - https://github.com/pytransitions/transitions . An absolute pleasure to work with.

For goodness' sake, no! I just ripped that package out of our production code. It's terrible! It adds methods to your object at runtime, crazy stuff.

I replaced it with just this:

        class StateMachine(object):
	    A very simple FSM class.

	        initial: Initial state
	        table: A dict (current, event) -> target

	    def __init__(self, initial, table):
	        self.current_state = initial
	        self.state_table = table

	    def __call__(self, event):
	        '''Trigger one state transition.'''
	        self.current_state = self.state_table[self.current_state, event]

	class Foo(object):

	    STATE_TABLE = {
		(current_state, event): next_state,

	    def __init__(self, xid, token, config):
		self.fsm = StateMachine('start', self.STATE_TABLE)

	    def begin(self):
	        while self.fsm.current_state not in {'success', 'error'}:
	            method = getattr(self, self.fsm.current_state)
	        if self.fsm.current_state == 'error':

If you're using Python anything more involved than a dict mapping (current, transition) -> next is just overkill. Use it, don't abuse it. ;-)

SMC (State Machine Compiler - http://smc.sourceforge.net/) supports multiple languages and is fairly simple to learn and use.

I typically try to avoid state machines. I like to have a function like model = reduce(model, event) where model is an environment of variable:value bindings. This function then just uses (if expr/else if expr) to find a condition in which to process the event. Processing the event involves changing the values in the model environment. I suppose on some level the two approaches are the same, but they feel and look different to me. If the "if/else if" pattern is deeply nested, then this is kind of like traversing a state machine to reach the correct state for processing the event. So in this case maybe a state machine is more efficient. However, a state machine, involves a history. You really need an external diagram. You need to see how you get to a state and how you transition out of a state. An "if/else if" type approach puts all the decision making in the code right in front of me. I don't need a diagram. The "if/else if" approach is often much more concise as the expressions are a powerful modeling approach. Finally, I prefer to avoid deeply nested if/else if" conditions, because it can be a chore to traverse the tree and really understand the path taken to get to the event handling spot. If the conditions can be written as linear list of rules where each rule can be thought about in isolation, I find this easier. This often involves repeating test conditions. Just flatten the decision tree by repeating the preconditions. This is again less efficient, but if you save the results of the expression computations then it becomes a simple test of Boolean values. I like to think of this approach as a rules engine instead of a state machine. A state machine encodes information in the state diagram so that it doesn't need to be stored in variables. The rules engine encodes all the knowledge in variables and rules.

You might want to read-up on table driven state machines. I'm not sure you have a full view of what's possible.

In regard to tip 4, I've been using statesman[1] which allows you to store the transitions as table backed active record objects along with any contextual metadata.

Depending on how much auditability you need on why transitions happened, this may be preferable to papertrail.

[1] https://github.com/gocardless/statesman

The podcast Podcast.__init__('Python')[1] recently interviewed the developer of Automat module [2]. The author explains when you should consider using a state machine in your Python project. I didn't realize there were so many types of state machines, and the author explains where his module fits into this ecosystem.

[1]: https://www.podcastinit.com/automat-state-machines-with-glyp...

[2]: https://pypi.python.org/pypi/Automat/0.5.0

What a coincidence, I've been working on a PHP state machine implementation myself in the last few days.

Anyway, I find the concept of "events" a bit weird in this case and I prefer to work with "states" only (state transitions rather than events that change the state). It seems like an unnecessary abstraction to think in terms of an event rather than a state.

Ultimately state is a function of time. Events are thus temporal effects from the outside world that can, depending on their values, cause a transition from a state to another state within the machine. If you model the machine as a graph where vertices are states and edges are events then a transition can be seen as a function from a (state, event) pair to a state, ie. (CurrState, Event) -> NewState. I've posted a simple explanation below, check it out if you like.

What about the DB side of things ? How do you model the state graph in the DB ?

I have been thinking of using Postgresql CTE to do this. When a cycle occurs, I just create a new step to the dB.

Anyone from CRM startups who have interesting learnings here ?

If you mean modeling the state graph itself, it's usually not modeled in the db but only in the code. It could indeed be interesting to store the graph itself if it evolves often, or at least a version number.

If you were speaking about storing the instances lifecycles, a simple model using a RDBS is to store one row per transition event in a separate table. This is what the papertrail[1] gem does for example.

[1] https://github.com/airblade/paper_trail

"Enterprise" applications such as ERPs or CRMs often have fairly complex workflow features where everything is in the underlying database - both high level workflow/state-machine definitions and compiled code.

You can use user defined aggregates in postgres to model state machines. I'm using this technique quite successfully and have been meaning to write a blog post about it.

could you talk a bit more about this ?

i was looking at https://sqlsunday.com/2014/05/25/directed-acyclic-graphs-vs-... https://www.qcode.co.uk/post/73 to learn how others have done it

Anyone got tips for working with FSMs in REST and/or MVC? Very rarely do you find HTTP verbs match FSM events leading to a fudge somewhere

You might find it useful to spend some time looking into the hypermedia constraint of REST itself (i.e. hypermedia as the engine of application state). The point of REST is to represent state machines by including hypermedia links in the responses of the API. Those links describe the state transitions a client may invoke in a given state.

Instead then of mapping CRUD operations to HTTP, you would map your application semantics to HTTP methods via link relations. Link relations let you go as far past CRUD as you would like to go depending on your domain and allow you to describe your states and state machine.

I always think a good way to think about a hypermedia state machine is in the context of HTML. Consider a todo app example. You might enter the application and get an empty list of todo items. If you have permissions to add a todo, you might have a "create todo" HTML form. Once you use that form, that todo has a new form called "mark complete." Once invoked, that todo has new transitions called "mark incomplete" or "archive." Once archived, you might see a new form for "unarchive." All of this captures the state machine and transitions in the REST API itself using domain-specific semantics.

Of course, there are other ways of solving this problem, but REST with hypermedia is a great way to work with state machines. There are lots of hypermedia JSON formats out there if you're interested in exploring (e.g. HAL, Siren, Collection+JSON, etc.).

yes,and now skip all the bs and grab yourself a real tool http://twysf.users.sourceforge.net/#A6 with a real manual http://twysf.users.sourceforge.net/ccide_man.shtml .combine this with m4 and generate efficient working code for luajit and c.and as a bonus you learn m4,a macro processor that can take output from cli programs and insert it inside your source code.much more pragmatic than lisp

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact