State machines are wonderful tools (nullprogram.com)
338 points by chmaynard on Jan 1, 2021 | 101 comments



No post on state machines would be complete without mention of the superset concept of Petri nets [1]. I have used Petri nets to build my own workflow engine previously. Building a workflow engine is one of those things... nearly every enterprise app has one, and normally you shouldn’t need to build one if you can source one... but it is an exercise that makes you a stronger engineer/developer, deepening your understanding of the patterns that lie beneath most systems.

Back when I was building mine, openly available workflow engines (BPML, BPEL, etc.) were not really a thing yet, so it made a little more sense to be building one.

[1] https://en.wikipedia.org/wiki/Petri_net


This is my first encounter with Petri nets, and at first I thought they were just like state machines, in that you're only concerned with flow of control... boy, was I wrong. It's going to take a while to fully grok, but they seem impressively powerful in what they can do. Thanks!


Amusingly, I was warned away from using Petri nets as the execution model for a workflow manager because "you can't prove the net will ever complete". Since then I've used several Petri net-based workflow engines that ran large-scale industrial computing projects without problems, so I think that complaint was excessively theoretical.


That is funny, appreciate the anecdote. I have encountered resistance to applying lesser-known ideas in daily practice, because even as engineers, sometimes there is the fear that we are veering off into esoteric-land. Especially in a corporate setting, striking the balance between true engineering/science and "go fast and break things" is tough.

I have always felt that the key to being an engineer is understanding that theory is a tool, and what makes it fact is science (wow, cool science! LOL) - testing the theory by putting it into practice and measuring the result/iterating. The fact that Petri nets give you a way to talk about the proof as well as model the problem is part of what makes them interesting to me.

So I guess I'm saying, moving fast while applying science, and learning and iterating... that just makes sense, right? If the result is a working system, even better! Science and being agile are totally compatible!

The burden of proof is on everyone, to say, OK.. I am following this technique because it helped me build a working system. I am standing on the shoulders of giants, and the system has been tested and works... but if you don't think the theory is correct, then please, prove it wrong. Make it better, improve, iterate!

In this case, with a genuine and positive attitude, I kindly turn the computer keyboard around to the other person and ask them if they know a better way, to please explain it and I'm happy to learn and adapt. I'll show my proof if you show me yours, and we can discuss, kind of thing.


They are great! My bachelor's thesis was implementing an algorithm to generate minimal Petri nets from an infinite partial language, described in something like a basic regex.


Do you have more information on how to build workflow systems, and on why it's better to do so with Petri nets rather than abstracting the processes with classes or whatever else?

Any information is welcome


I'm one of the original creators of temporal.io open source project.

A long time ago I worked with an internal Amazon workflow engine that was based on Petri nets. It worked, but I learned that for the majority of business-level workflows it was too low level. It was always a hassle to map a real use case to it.

The AWS Simple Workflow service was created as a replacement for that engine. It greatly elevated the level of abstraction. Azure Durable Functions, Uber Cadence, and our latest project temporal.io all use this new way to represent workflows. The basic idea is to not implement the business logic using an intermediate representation. Instead, rely on the runtime to make any code state fully fault tolerant. Imagine that the full orchestrating program state (including threads, local variables, and even blocking calls) is preserved across any process and other infrastructure failures. This allows writing production code like this:

      boolean trialPeriod = true;
      while (true) {
        // Durable timer: survives process restarts and infrastructure failures.
        Workflow.sleep(Duration.ofDays(30));
        activities.chargeMonthlyFee(customerId);
        if (trialPeriod) {
          activities.sendEndOfTrialEmail(customerId);
          trialPeriod = false;  // local variable is part of the durable state
        } else {
          activities.sendMonthlyChargeEmail(customerId);
        }
      }
As the state is fault tolerant, the 30-day sleep is absolutely OK. And if any of these operations takes a long time (for example because a downstream service is down for a day), the code doesn't need to change. I bet that any Petri net-based version of the above code would be much more complicated. One of the reasons is that a large part of the complexity of business-level workflows is not in the sequencing of actions (which Petri nets solve), but in state management and argument passing (which Petri nets don't solve at all).


Thank you for adding this flavor to the discussion. I do agree that Petri nets are often seen as too complex for basic business process modelling, which is why languages such as BPEL and BPMN were invented by the OMG, and then simplified into UML activity diagrams, for example.

That is to say, the level of complexity in describing a process in the Petri net "modelling language" might seem higher. I can surely see how it feels more esoteric, and it's not always clear how it ties to the implementation.

Many workflow engines take the approach you're describing; for example, I also used to use Microsoft's Windows Workflow Foundation to do similar things. Essentially you're sketching a workflow process skeleton and then managing the state atomically by compartmentalizing it, so to speak.

Actually, this is exactly what Petri nets propose - state is defined by tokens in places - i.e. compartmentalized.

I don't entirely agree with your comment about state management and argument passing in Petri nets. I do agree it takes digging around to find tangible examples/applications that cover argument passing, but "tokens in places enabling transitions" is exactly how Petri nets represent, abstractly, the tiny pieces of state/arguments that allow transitions to fire. I could represent your code above as transitions that cannot fire until tokens representing the state of your conditions are present in the right places: the passage of time as a token in an input place, the condition trialPeriod == true, and a non-blank customerId, all as tokens that have to be in input places to enable the transitions that trigger those activities.
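
To make that concrete, here is a minimal, hypothetical Python sketch of the "tokens in places enable transitions" idea; the place and transition names are made up for illustration:

    # A transition fires only when every one of its input places holds a token.
    marking = {"thirty_days_elapsed": 1, "trial_period": 1, "customer_id_set": 1}

    transitions = {
        "send_end_of_trial_email":
            ["thirty_days_elapsed", "trial_period", "customer_id_set"],
    }

    def enabled(t):
        return all(marking.get(p, 0) > 0 for p in transitions[t])

    def fire(t):
        assert enabled(t)
        for p in transitions[t]:
            marking[p] -= 1  # consume one token from each input place
        # ...deposit tokens into the transition's output places here...

    if enabled("send_end_of_trial_email"):
        fire("send_end_of_trial_email")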

This is to say, I agree that representing that graphically using Petri net modelling may not be as business-friendly as, say, UML activity diagramming. But that doesn't make the simpler approach any less a subset of what you can do with Petri nets, as it very much is one.

But definitely agreed: use the level of abstraction that fits the need - like the old adage, use the right tool for the job.

I'd argue that the tool you describe could be modeled as a Petri net, but that perhaps you may not wish to have a user do it that way.

Do you agree or have a different opinion?


> I also used to use Microsoft's Windows Workflow Foundation to do similar things. Essentially you're sketching a workflow process skeleton and then managing the state atomically by compartmentalizing it, so to speak.

temporal.io and Windows Workflow Foundation are completely different. WF uses code to generate an intermediate representation of the workflow (an AST), and that intermediate representation is what gets executed. It is similar to most existing workflow engines, from Airflow to the BPMN ones.

Temporal doesn't use any intermediate representation. It directly executes the code keeping all its state durable.

> I don't entirely agree with your comment about state management and argument passing in Petri nets.

When I talk about state I mean the overall state of the business process, which is much more than a token in a certain place. For example, such state can include a list of line items to order with all associated attributes, and counters of processed events.

> I'd argue that, the tool you describe could be modeled as a Petri net, but that perhaps you may not wish to have a user do it that way.

All software at the lowest level can be modeled as a state machine. And a Petri net is one possible representation of an FSM. I'm interested in simplifying the way the workflow logic is specified by an engineer. At this point I believe the temporal.io approach of giving you the full power of a high-level programming language, and taking care of durability and scalability, is the best option.


NOTE: I am not associated with temporal.io in any way, just learned about it frankly.

I just had a look through your docs and I understand the approach you're taking... similar to "infrastructure as code" concept applied to simplifying workflow. What you are saying makes sense to me. I haven't heard of temporal.io before (cool, hi!), but I get the concept of trying to eliminate the intermediate representation (which I suppose also could be seen as a bottleneck in a fully distributed system and to the "workflow" of a software engineer).

I would point out that Petri nets are a theory and modelling language for proving the correctness/soundness of a model - and as I mentioned earlier, I agree that if you don't need that for your use case, yeah, it could be too much. I would like to make it clear for other readers that Petri nets are not the implementation of a specific technology; they are a modelling technique/concept which could also be implemented/executable through an engine. Having, or not having, an intermediate representation in compiled code or at runtime has nothing to do with whether or not you want to model your FSM, or graph, or network of processes, logic, and state as a Petri net.

My goal in originally posting the comment was just to share the superset theory of Petri nets, as I don't often see people bring it up in discussions on workflow, FSM, etc.

The comment that Petri nets are one possible representation of an FSM is true, with the key difference being that FSMs model a single thread of control and have some limitations in that regard, whereas Petri nets, as the superset, also handle concurrent operations - and workflow is a common use case where we see concurrent operations (but certainly not the only one; industrial control, etc. is certainly more complicated than organizational workflow). I think this deck is kind of handy for describing some of the differences between FSMs and Petri nets [1]. Interestingly, we know that dealing with concurrency, parallel processing, and multithreading is difficult, which is why any tools that remove some of that complexity from day-to-day coding (e.g. coroutines/async/await, workflow engines, etc.) make the engineer's life easier.

When we were talking about compartmentalization/encapsulation of state, I see how conceptually you are abstracting that away from the engineer in Temporal. What I meant when I was talking about WWF was that the engineer's code and their own state could be "compartmentalized" into the pluggable activity definitions provided by the workflow engine. It seems like your team has inverted that a little bit, but roughly the idea is similar. You have the engineer "write the workflow as code" and use a wrapper around each activity to ensure the state management and simplify it for the engineer. A slight difference, but an interesting paradigm.

I would say that, while rarely done, one could define WWF workflows completely through code, and there is a way to serialize a workflow as an artifact that can be version controlled, etc. But how WWF displays that visually, and how it compiles into the intermediate representation and uses it at runtime, does sound different from Temporal. I suppose, though, having the intermediate representation perhaps allows running tests against the workflow logic itself - whereas you shift the tests to the code itself... so, different implementation but similar idea.

Again, I see how temporal simplifies that from the engineer's perspective, so kudos to you guys. I can see how what you are saying would work and be a helpful approach for a lot of use cases.

[1] https://www.cs.ucdavis.edu/~devanbu/teaching/160/docs/petrin...


There are two components to your question - the practical side, "how do I build a workflow system?" and then, "why Petri nets?".

If you really want to build a generic workflow engine, I think the way is to identify the pattern they follow and implement that. Build classes, attributes and methods that abstract away automating a process in a generic sense (e.g. turn workflow into one or more services) and then integrate those service(s) into your application as the glue that holds together complex processes. It's obviously harder than that one sentence, of course there are lots of nuances, but that's probably the simplest way I can think of to explain the high-level hand-wavy approach to doing this if you want to build it yourself, without going into all the details.

If you understand the pattern, and would rather not build, but buy or obtain an open source workflow engine/management system, many exist [1][2]. Most enterprise applications (e.g. Oracle, SAP, Salesforce, etc. etc.) have such workflow tools built in already.

Regarding the second part of your question, "why Petri nets?", well, one could argue that any system that automates a process in any way is a kind of workflow management system/engine or could be modelled as a "Petri net". I guess you could say, "why care about patterns in software engineering" then? The difference is, like many patterns, Petri nets give you a tried/tested technique for conceptualizing and modelling the system/process, validating its correctness, and in some cases, tools/engines that implement these concepts can even give you the executable framework for building too. You know, standing on the shoulders of giants and that kind of thing.

Technically, Petri nets are a superset for modelling/designing/visualizing/validating what you can do in a workflow system. Also, to be clear, the concept of Petri nets is in every way compatible with/related to object oriented design, and even functional and procedural programming paradigms.

Perhaps I can try to answer your question in this way.. by relating Petri nets to design patterns in software engineering (think, Gang of Four, Patterns of Enterprise Architecture, etc.). You know how, when you're developing software - over the years, you start to see patterns emerge? The best way to understand Petri nets and how they relate is to see what problem they solve in work you might have done yourself.

If you have ever implemented a large chunk of any information system, you start to realize, that even though it might be comprised of many smaller components working together, ultimately, it had to enable some kind of overall process to function. And how, there were certain core components that you realize are more infrastructural in nature and are re-usable? For example, logging, security (AAA, authentication, authorization, access control), etc.?

I was exposed to workflow systems when I first started working on case management systems (for example, legal cases). Case management is a scenario that includes, coordinating multiple child processes to build a "case", which is an instance of a process that is being executed. Think.. Case ID #123 is an instance of a distributed process, which results in a case file, but may have many independent and ancillary sub-processes, approvals, communications, notes, reviews, etc. that must come together to "complete" the case.

You might diagram the flow of that legal process in a "flow chart". Business people love flowcharts right? The next logical step is to think of.. what if these flow charts could represent/become computable models (e.g. math/validity/correctness) and perhaps even a shell of executable code?

You could construct an entire monolithic case management system, that handled all the work related to legal cases - by learning the business process, hard-coding the logic from that flow chart (classes with attributes and methods that define the full behavior of the system, interactions, etc. if we were doing OOP), and so on. If you did that, and you did that reasonably well, that system would certainly work for its designated scenario (legal cases).

In the midst of doing that, you might have realized that, the IT ticketing system you bought for your organization, had a similar process (for example, case management in ITIL). You might notice.. startling similarities between the basic design of that system and the legal case management one.. to the point where you start asking.. is there a pattern here?

Sure enough, there is one. The basis for that pattern is what some call workflow patterns - Professor Wil van der Aalst and Professor Arthur ter Hofstede being two influential thinkers in the area [3]. If we follow this thread, we find one helpful paper, "The Application of Petri Nets to Workflow Management" [4], which I think would be of interest.

Then we start to realize that some processes don't happen in a monolithic way. They need to be "consistent", but the underlying code and execution could be distributed across different services or different systems, and yet our process needs to pull it all together to carry out the work. We might see that, even though distributed, these processes and systems could be represented holistically. Petri nets help in this situation.

I think one thing people might have missed when we got into this whole world of federated, independent teams, service-oriented architecture and now microservices, was that - ultimately, systematic behavior and processes must still function, and function well enough for organizations to fulfill their purpose. While self-emergent behaviors are entirely possible and arise all the time, in complex systems.. when you absolutely must make your system/process work, you need a way to engineer that and Petri nets give you a way to think about, model, and validate/reason about that.

For a current very relevant example, consider microservices today. On their own, they provide little value - but when orchestrated and working together to compose a larger system, to enable human and machine processes to execute and thrive, it would be nice to have ways to think about and model those processes that live on top. Petri nets are one way to conceptualize and visualize those processes, where that conceptual model can be proven to be sound mathematically, and even turned in some cases directly into an executable framework within which to plug in your own code. That's what workflow systems often do, for example.

Another relevant and recent case - event sourcing.. and event-driven mechanisms, and event handling, and the processing that goes along with it to essentially create a "directed process" - these are all directly related too. Also, there's a reason that data workflow automations such as Apache Spark or Airflow, for example, chose directed acyclic graphs (DAG) as models for complex, reliable, distributed process execution - and you can represent those as Petri nets too.

What I always liked about Petri nets when I discovered them, was it gave me a way to link together strict state management, with process control, based in logical/provable theory - but also gave me a way to bridge between the "human" side of process ("flow charts") and the technical side of things. It gave me a framework to "plug in" my different systems, functions, services, objects and behaviors into something that coordinated complex process both in theory and in practice.

I do not believe that one must implement a monolithic process in code - as there is also emergent behavior - but Petri nets and workflow management systems give you a way to think conceptually about how your system works.. and even potentially to build the "glue" that puts your system together, if you choose.

By the way, one interesting thing I saw that came out of this work, was simulation software where you could use Petri nets to essentially "run" a process in a simulated way and see if there were any natural bottlenecks in the process that you could optimize in advance - before building any code at all. Process simulation [5] is another whole related rabbit hole to go down, but a fascinating one!

[1] https://en.wikipedia.org/wiki/Workflow_management_system

[2] https://github.com/meirwah/awesome-workflow-engines

[3] http://www.workflowpatterns.com/

[4] https://www.semanticscholar.org/paper/The-Application-of-Pet...

[5] https://en.wikipedia.org/wiki/Process_simulation


Thanks for the long and thoughtful answer. Perhaps my biggest remaining doubt is why PNs instead of graphs and state machines, but I guess I should start reading some of the links you posted to discover more of their potential, as I see there are many uses and properties that are easy to miss.


Think of it like this - please, go ahead and use graphs and state machines. If you would like to model the operation of those graphs or state machines, one way to do it is to model them using Petri nets. And if you use Petri net modelling, you get some nice properties that have mathematical proofs behind how they work. And they might be handy for working out logical problems with your graph or state machine due to how they are constructed - which you discover based on the language Petri nets give you for talking about the process.

Like, set logic in relational modeling.. if you abide by that technique, set logic gives you certain guarantees about operational characteristics that have mathematically proven grounding. It gives you primitives for talking about set operations, like union, difference, intersection, join, etc.

Petri nets are a modelling technique. Activity diagrams (UML) are another way of expressing similar processes, but they don't necessarily have the mathematical grounding inherent in the way you model Petri nets.

Perhaps yet another way of saying it is that Petri nets are closer to a visual modelling language of execution that shows more of the logic of the concerned process or system.


Thank you very much for writing this extremely insightful post.


Thanks for the wikipedia link. Do you have any recommendations for books or papers to read? Not just for state machine and Petri nets, but also any other software models you would consider useful to learn and potentially implement.


I'm reminded of a thing I wrote about seeing a state machine implemented in real life:

https://www.solipsys.co.uk/new/StateMachineInRealLife.html?U...


That was a delightful little read, thanks!


Great little story!


It’s humbling to see a state machine implemented using coroutines. I’ve been programming professionally for about 6 years, but have only recently come to appreciate generators and coroutines.

I’m working through learn.unity.com for fun and because, in my estimation, there is an upcoming boom in real time interfaces for real work. All the people playing video games have mastered skills that aren’t being taken full advantage of. Anyways, coroutines in Unity can be used to do interesting things over time while attached to the game loop.

My point, despite my digression, is just how there are still basic/intermediate topics I have only been able to appreciate recently. Humbling.


100% agreed on the increasing importance of video game interfaces. Nice to see I am not alone in that thought.


Every now and again as I play RTS (and other) games I imagine how the interface could be used for business, engineering or marketing. There’s potential, IMO.

The test (as I imagine it now) is making work fun enough that my 13 year old self would want to do work.

At any rate, ideas are cheap. Prototypes and implementation are what matters.

We’re speaking publicly, but I would like to know: what do you imagine when you think about this topic?


I sometimes wonder what it would take to visualise the state of a system in a game engine such as OpenTTD[1]. Then I remember the video of VMware's VMWorld 2017 event keynote speech[2], where they did such a poor demonstration of VR - cubes representing virtual machines which you could virtually throw into the trash to delete them, or pick up and move to another host to use VMotion. It was a poor combination of low-quality, uninspired use of graphics and 3D space, and of individually managing servers one at a time - for things VMware DRS will do automatically (load-balancing VMs between hosts) and things you don't want to be doing by hand (individually provisioning and removing servers).

I'm not convinced that a game engine is capable of showing enough information compared to flat text, tables of numbers, graphs, and things made as web pages or documents where you can scroll down several pages. In a game engine, the things you aren't looking at end up far away, smaller, easier to miss. Text usually has to be bigger to be readable. How do you bring things to attention without just replicating a web browser showing a page of stats in a game engine? How do you make "work" possible without replicating a right click menu in-game, or a popup Quake style command line?

How do you envisage an RTS game interface doing work - what kind of work, and how does the interface help with it?

> "The test (as I imagine it now) is making work fun enough that my 13 year old self would want to do work."

A game you have to play for 6+ hours continuously, cannot quit, cannot pause, cannot undo/retry, almost cannot die or lose, has no story, no end to get to, and is a repetitive grind?

[1] Isometric world map full of moving parts, metrics and graphs https://linuxhint.com/wp-content/uploads/2020/01/OpenTTD.jpg

[1] which has overlays for things like cargo pressure in two directions https://www.openttd.org/screenshots/1.4-cargodist.png that could be displaying network link load, etc.

[2] https://lukaswinnnet.files.wordpress.com/2017/09/vmworldeu20...

[2] https://i.ytimg.com/vi/EiKyR4X5kxs/maxresdefault.jpg


You should also try to grok Unity's animation controller. It's a state machine with non-discrete transitions (primarily for animation blending). Neat stuff.


Not just real work but also education; for me, a good interface can help build intuition about a task rapidly.


FSMs are fun to write but horrible to modify, especially if you're not the one who wrote them.


The problem with modifying an FSM arises if you start by modifying the code. That is the wrong approach.

The FSM is not code but a model, and the code is just an implementation of the model.

What you want to do is recreate the process of modelling (or modify the existing model) and then regenerate the code based on that.

What I do is keep an implementation-agnostic model of the FSM that gets translated to code. That code is never modified; it is only called from external modules or calls external modules.

There are some cases where I would just hardcode the FSM without the model. Those cases are when I do not expect the state machine to change because it follows from the fundamental description of the problem. For example, I might have some communication protocol and the state machine describes the state of that communication channel. If I do my job well the state machine maps 1:1 to the problem and will never need to change even as the rest of the application changes.


In your example, what happens if changes need to be made to the protocol? It is less frequent than business/application changes, but definitely can happen.


Good protocols are usually made in layers. You would typically want some state machine describing the lower part of the application protocol, on which you build the higher level of your application protocol, composed of specific functions.

This way you can modify functionality without necessarily touching low level implementation.

I can give an example.

I have worked with credit card terminals and I have implemented entire application from scratch.

There is a protocol to communicate between the terminal and the acquirer's system (ISO 8583).

This protocol specifies a flow and format of messages. For example, if you send a transaction request, you will get a response; if you don't get a response you may or may not retry it, etc. If you send a transaction advice (i.e. the transaction has already happened) you must keep retrying until you get a response, but every retransmission after the first must include a bit that says it is a retransmission. If the request or advice has been sent and the transaction was cancelled, you need to keep sending cancellations until you get an acknowledgement.

Now, all those messages have a huge amount of fields that change regularly as requirements change, but the basic flow of the protocol is fairly constant and has been constant for many years.

I have implemented the protocol as a tree of state machines telling the application what to do based on events and I believe the code has not been touched in over 10 years even though the application was in constant development.
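
As a hypothetical illustration (not the actual terminal code), the advice flow described above boils down to a tiny loop. The channel object and its send method returning None on timeout are assumptions for the sketch:

    def send_advice(advice, channel):
        retransmission = False
        while True:
            advice["retransmission_bit"] = retransmission
            response = channel.send(advice)  # assume None means timeout
            if response is not None:
                return response
            retransmission = True  # every retry after the first sets the bit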


As the parent has noted, FSM models are flexible boundary abstractions, and you can apply them repeatedly to reach the desired level of granularity.

For example, you might start with a large block of code with many inputs and many side effects.

You can map this block of code into an FSM directly - as a "giant switch statement" of all conditional combinations and copy-pasted side effects - or you can devise a way of specifying it as an "input FSM" and an "output FSM", where the input FSM enumerates the types of side effects and the output FSM acts upon them.

You can also hybridize FSM behaviors with simple constraint optimization to create "smart" solvers: define a heuristic for solution goodness, then run the FSM speculatively with various combinations of inputs and rank which combo best fits the heuristic (a sketch follows below).
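
A minimal, hypothetical Python sketch of that idea; run_fsm and score are assumed, made-up callables:

    from itertools import product

    def solve(run_fsm, score, input_choices):
        # input_choices: one list of candidate values per input slot
        combos = product(*input_choices)
        # Run the FSM speculatively on every combo; keep the best by heuristic.
        return max(combos, key=lambda combo: score(run_fsm(combo)))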


This is only true when considering a specific software design that implements the state machine. I'd say the most popular way to define a state machine in code is how Xstate does it: https://github.com/davidkpiano/xstate. You define your states, then you define your inputs, and then you map out every single transition between states.

This is probably the most inefficient way to design a state machine. There's a simple mathematical explanation for that. If we agree that a state diagram is a directed graph consisting of N nodes, then there are N^2 possible transitions (that's the definition of a completely connected graph). This means there's a large space of possible transitions relative to the number of states, which means there's a high probability that you will have to modify lots of transitions when modifying the state machine itself.

In the article, the author shares several alternative designs for state machines, which don't look like they suffer from exactly the same problem. I'm most familiar with TLA+, where you describe a state machine with one logical predicate: the "Next" state relation. This is the most powerful way I've found to define state machines, and modifying one function to entirely change the behavior of the state machine is very easy in practice.


> This is probably the most inefficient way to design a state machine.

Your mathematical explanation of the combinatorial explosion of states and transitions is correct for state machines, and is exactly the reason why statecharts (an extended formalism for state machines) was created. It mitigates the problem by introducing hierarchy, orthogonality, broadcast communication, history, and other features that avoid the combinatorial explosion problem.

XState implements statecharts, not (just) state machines.


First of all, that sounded negative and I'm totally not knocking Xstate. I'm pro any library that encourages people to think about state machines :D

I'm really speaking tactically about how best to represent the machine, more of an optimization thing. For that, the features that you mentioned do help, no question there. Statecharts is a great formalism overall. But in my opinion, they help more at the macro level, like when thinking about the entire UI as a single state machine and how to modularize that.

Hierarchy and orthogonality are not always present - sometimes state variables truly interact. I'm speaking more about that scenario. I'll take an example right from the Xstate documentation:

  list: {
    initial: 'none',
    states: {
      none: {
        on: { BULLETS: 'bullets', NUMBERS: 'numbers' }
      },
      bullets: {
        on: { NONE: 'none', NUMBERS: 'numbers' }
      },
      numbers: {
        on: { BULLETS: 'bullets', NONE: 'none' }
      }
    }
  }
Here, we have a state machine that looks like it powers some logic in a word processor, where a user can create lists. You can toggle bulleted or numbered lists on and off, as well as switch between them once either has been selected. Since each state can transition to each other state, each state references every other state in its transition description. What happens when you add another state - say, a list with letters as indexes? You'd have to add a transition from each existing state to the new one, and a transition from the new state to all the other ones. This is an O(N) operation, as we agreed, since every new state means there are N possible edges from it to all other states, and each existing state can have an edge to the new state. I believe this is what people mean when they say state machines are "hard to modify."

What if instead we used variables to represent the state, and had a single function which determined the next state to go to based on the existing state:

  type State = {
    showingBullets: boolean,
    showingNumbers: boolean,
  }

  enum Action {
    ToggleBullets,
    ToggleNumbers,
  }

  function next(state: State, action: Action): State {
    // Start from "no list": the two flags are mutually exclusive.
    let nextState = { showingBullets: false, showingNumbers: false };

    switch (action) {
    case Action.ToggleBullets:
      nextState.showingBullets = !state.showingBullets;
      break;
    case Action.ToggleNumbers:
      nextState.showingNumbers = !state.showingNumbers;
      break;
    }

    return nextState;
  }
Now, the process of adding a new state would be just adding a new case in the switch statement. That's constant time, O(1). This is because the next function is actually a relation - it can map a single state to multiple next states.

I will admit, this may be personal preference. With the Xstate approach, where you specify all transitions explicitly, there's no thinking. You just read which state maps to which next state. A relation has logic, which means you have to compute in your head all of the possible next states. It's the tradeoff between expressive power and explicitness.


So allocate an instance from an implied set of states instead of explicitly traversing the enumeration. Nice for a business process likely to need change, or where the actual state space is large, but inappropriate for something like utf8 decode where fixed space is required.


I'm not familiar with utf8 decoding, and I focus all of my efforts on thinking about business software. I think it's pretty guaranteed that different representations of anything will make sense for different use cases.


It’s funny, I look at this and think “redux”.


Redux absolutely implements a state machine. The next state is a function of the current state and an input action. That’s the textbook definition of a state machine.


Thank you very much for this detailed explanation, I really appreciate it!


Mind elaborating on the next-state predicate approach? By "next", I imagine the focus is on transitions rather than states; however, if you have multiple transitions arriving at the same state, what makes the "next" approach better? At the end of the day, the state machine is the same regardless of how it is implemented.


Edit: looks like they're talking about this: https://en.wikipedia.org/wiki/TLA%2B

What I describe below is not related.

----

Not the OP, but here's a stab at it...

In state machines commonly used in embedded C code, each state is represented either by a function or by a struct that contains a function pointer. The state machine calls the current state's function in a loop, or whenever relevant events happen, and that function returns a pointer to the next state, calculated based on inputs. If there is no state transition, the function returns NULL or its own state.
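
A rough Python analogue of that C pattern, using a made-up turnstile machine where each state is a function that returns the next state:

    def locked(event):
        return unlocked if event == "coin" else locked

    def unlocked(event):
        return locked if event == "push" else unlocked

    state = locked
    for event in ["coin", "push", "push"]:
        state = state(event)  # the returned function is the next state
    print(state.__name__)  # -> locked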


See the above reply where I gave an example with both representations.

Agreed, the machine representation is independent of the machine itself; it's the same mathematical object. What I was talking about was how to represent it using only a relation which describes the possible next states given the current one. This is the most common definition of a state machine, and for me it's a cleaner representation, since you avoid referring to each edge in the state graph explicitly.


> There's a simple mathematical explanation for that. If we agree that a state diagram is a directed graph consisting of N nodes, then there are N^2 possible transitions (that's the definition of a completely connected graph).

A state diagram is usually richer than a directed graph—usually the edges are labelled. This multiplies the number of possible transitions by the number of possible labels.

(Also, for anyone interested in comparing with the graph-theoretic terminology, the term "complete graph" (https://en.wikipedia.org/wiki/Complete_graph) is often used for undirected graphs with no self-loops, in which case there are only \binom n 2 possible edges.)


> A state diagram is usually richer than a directed graph—usually the edges are labelled. This multiplies the number of possible transitions by the number of possible labels.

This is true. The technical term for this is a "multigraph."

> (Also, for anyone interested in comparing with the graph-theoretic terminology, the term "complete graph" (https://en.wikipedia.org/wiki/Complete_graph) is often used for undirected graphs with no self-loops, in which case there are only \binom n 2 possible edges.)

You gave the formula for the number of edges in an undirected graph. The number of edges in a complete directed graph is n^2 - n when transitions from a state to itself are not considered. I take liberty with that fact, and often just say there are n^2 edges in a complete directed graph, since transitions to self don't usually harm the design. Either way, it's O(n^2).

As you said though, state diagrams can be multigraphs. In that case, I'm not aware of a general formula for the number of transitions. But I imagine it would still be O(n^2), just with possibly more edges.


Seems like you could just add some sugar to the definition process, e.g. an "any state" wildcard or more than one source state per transition.


Really not sure what you mean. A code example would be helpful.


The general issue is that FSMs require a lot of naming, since you want to give every state a meaningful name, and naming is hard.

When the states are clear enough, modifying FSMs is very nice.


That isn’t necessarily a state machine problem; a state is an abstraction that maps to part of your business semantics. Naming becomes hard when your state abstraction isn’t clear, which in turn is probably due to how the business semantics are modeled. One of the most important things in implementing state machines is to draw them up and make sure every state and transition has a clear meaning before actually implementing them. There will usually be multiple iterations of modeling before arriving at a good design.


Nomenclature challenges are a negative code smell


> FSMs are fun to write but horrible to modify, especially if you're not the one who wrote them.

It really depends on the tooling you're using. For example, Qt's support for SCXML[¹] eliminates much of the pain of using state machines.

[1] https://doc.qt.io/qt-5/qtscxml-overview.html


It also depends on how you implement the state machine. If you're constrained by memory or performance (or playing golf), then you may end up with code like TFA.

But if you're implementing one in a typical app, you can obviously organize and write it in a way that is very understandable.

I find the title of TFA misleading in its generality in comparison to the content of the article.


FSMs should be "written" visually, and they should have either hierarchical sub-states or some sort of a grouping scheme to keep complexity at a reasonable level.

FSM examples are usually too simplified, and they don't show how to manage the complexity of a real, larger state machine.


But for many this is just React with Redux, and the community buzz around that has been that teamwork is made easier. You’d have a state machine if the problem called for it anyway, but with redux everything is now explicit and teammates come prepared.


Yeah, I find that using a library (like Stateless for C#) that makes you spell out, upfront, each of the states and the transitions permitted from each state makes it easier to reason about the code.


Encoding parser FSMs by hand can be tedious and error-prone; there are some nice tools to help you with that. I'm a huge fan of Ragel: http://www.colm.net/open-source/ragel/


I sometimes wonder if replacing the hand-coded FSMs of UI ravioli callbacks with a more parser-generator-like declaration would be a cleaner approach.

(or maybe it's already been done? I haven't done UI code since last century...)


These are some pretty cool examples, though a little too nitty-gritty to communicate the practical importance of state machines. The truth is, state machines are so fundamental that programmers are often not even aware that they are using them. Many, if not most, computer programs can be seen as state machines. In fact, programming can quite literally be seen as state machine creation and management. Programming languages are tools for creating complex state machines without having to manually wire them together.

For example, take the following factorial function in Python:

    def factorial(n):
      result = 1
      while n > 1:
        result *= n
        n -= 1

      return result
Here is a TLA+ spec for the same algorithm, along with the corresponding state diagram for calculating factorial(5): https://gist.github.com/amw-zero/c604f4dfd2d98e0d77fffe602c6.... This is a simple example, but the state diagram illustrates an important point that the while loop doesn't: the algorithm is completely linear, meaning it simply builds upon previous states. Even though there is branching in the code, there is no branching in the state diagram.

Of course, we're not so lucky and most state spaces are not linear. Here's the spec and diagram for a user interface that I was building a while back: https://gist.github.com/amw-zero/4dc16cbf8e578d9be9e68b0a85c.... There was a subtle state-based bug where a user had to perform a special sequence of events to experience it. The code seemed simple, but looking at the state diagram directly showed what the bug was immediately.

This was in a React app, and doing this exercise made me realize that React apps trivially translate to state machines as well. Whenever you call `setState` or the corresponding `setHookValue` function when using hooks, you are defining a state transition. The set of all state variables determines the full state space of the application, and bugs arise when we forget to consider that fact within some of our transition functions.

Long story short, you don't need to use a state machine design pattern such as Xstate to actually have a state machine. In fact, unless you are doing some very weird things, your application most likely already is a state machine under it all, but that's only because a state machine is a very simple, general concept that has tons of applications.


Yup. Basically all enterprise software is just state machine design. Any simple RESTful app with a database is just a state machine: call one endpoint to create a resource, fetch it later from a second. Calling them out of order yields an error state. Super simple state machine. Of course, we've reinvented this wheel so many times.

Something like LISP is fun because you can get both the executable and the (human- and machine-) readable representations of a state machine out of the exact same source code. That's still the future of programming.


We’ve reinvented the wheel because CS education is really bad at teaching what computation actually is. And if you did not study CS (which I did not, so I know), you’re really on your own for figuring out what the heck you’re actually doing when you program.

There are some very simple and elegant concepts at the bottom of the chain that are really useful to understand, and state machines are for sure one of them.

Even with state machines, people often dislike talking about them because they’re seen as impractical since they don’t always have a great correspondence to code. But computation is more than just code.


I think this has always subconsciously confused me whenever people talk about state machines, and your comment clarifies what I think I've felt. What would qualify as not a state machine, in the broad sense?


That’s a great question. Combinational logic is an example of something that computes that isn’t a state machine. You can compute Boolean algebra operations with no notion of time, they’re essentially pure functions.

That’s where it gets slightly confusing, though. Just because a function is pure does not mean that its implementation is not a state machine. For example, an algorithm to compute the greatest common divisor of two numbers is stateful, even if it has a pure functional interface.


Cool trickery, but should I ever encounter one of those hex tables in a program I am maintaining, I would certainly hope to see at least the state graph of the encoded machine as documentation.


Some random fun things you can do with DFAs/NFAs:

1. From two machines you can build a single machine that corresponds to running them in lockstep/parallel. Most language classes are closed under unions, but regular languages stand out by being closed under intersections, and that relies on this construction. You can't replicate this construction with pushdown automata to show that context-free languages are closed under intersection, because the product automaton would require two stacks. But in fact the intersection of a context-free language with a regular language is context-free via this construction.

2. You can run them backwards. Usually works best with NFAs since reversing a DFA can seriously inflate the state space. E.g. if you're trying to match a regular expression of the form '<re1> string <re2>' you can search for 'string' and then match re1 backward and re2 forward. Running machines backwards can also be used for computing capture groups after a match has been found since the fastest methods for matching don't let you compute capture groups on the fly. Decoupling the re1 and re2 machines also means you end up with two independent machines with smaller state spaces, which means smaller, cache-friendlier lookup tables. See the next item for a less conventional example of what you can do with small state spaces.

3. While DFA execution seems inherently serial, it can actually be parallelized very effectively if the state space is small enough, through speculative execution. Suppose you want to lex a string in parallel by chopping it in half. You don't know what state the DFA will be in when it reaches the halfway point, but you can process the second half for all possible states at that point. Then when both halves are finished you throw away all but one of the speculative answers for the second half based on the final state for the first half (see the sketch after this list). This is extremely fast and practical with PSHUFB/TBL-like instructions if your state space is small enough, at most 16 states for PSHUFB. You can mux together multiple PSHUFBs to go beyond that limit but the cost increases substantially. (This is a parallel prefix algorithm when applied recursively. As far as I know the application to parallel lexing goes back to the Hillis-Steele paper on data-parallel algorithms from 1986.)

4. There's a critical optimization for the last algorithm which is sometimes applicable. A log file processor might have numerous internal states while processing a line, so you'd rather not speculate on all of them. Instead you can chop the file in half and from the second half seek forward to the next newline and start there. This avoids speculation entirely if there's a unique state following a newline character regardless of what state you were in before the newline.
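
Here's a minimal, hypothetical Python sketch of the speculative trick from item 3 (scalar code for clarity; the real speedup comes from running the halves concurrently and doing the table lookups with PSHUFB-style instructions):

    def run(dfa, state, text):
        for ch in text:
            state = dfa[state][ch]
        return state

    def run_speculative(dfa, start, states, text):
        mid = len(text) // 2
        # These two halves are independent and can run in parallel:
        first = run(dfa, start, text[:mid])
        second = {s: run(dfa, s, text[mid:]) for s in states}  # all mid-states
        # Keep only the speculative answer consistent with the first half.
        return second[first]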


What are DFAs/NFAs? I didn't see them defined in the article.


Deterministic/non-deterministic finite automaton. It's the standard term for a finite state machine which processes a stream of characters. All the examples in his article are DFAs.


Thanks! I'm interested in the idea of a reversible state machine, for parsing an application-specific language syntax into some machine representation, then allowing the user to modify it and unparse back to syntax again. So far I've had to implement procedural code for both directions, but I figure there must be some way for some smart code to both parse and unparse using declarative parser rules^. I've had no luck finding any prior art so far.

^I figure this won't be super simple, since e.g. whitespace is often ignored at the tokenizer step and would need reinjected by the unparser, but that sort of stuff is probably not too difficult to handle.


Notwithstanding caveats about what "reversible" means here, I'm interested in the application. I've been interested in "non-destructive compilation" for a while, including what you might call "language skins"—derivations built from reversible, deterministic transformations on the syntax graph created from an arbitrary parse. With the current popularity of (one-way) transpilers, I'm surprised there's not more interest and work on making them two-way, for example, but that's just one possible application.


> I'm interested in the application

I'm part of a team developing a scientific modelling tool. The user defines a model via syntax similar to SPICE's netlists, specifying model objects and their parameters and connections. The nature of the type of simulation requires that the user runs the script over and over while tweaking various parameters, some by hand and some via optimization routines. We use the syntax as a way to serialize the complete state of the model, to be embedded in the generated results (the script size is negligible compared to the results). It is therefore a requirement that we can generate syntax from the model that, once re-parsed, results in the same model again - thus the need for forward and reverse parsing.

My current approach is to store the parsed AST as part of the model, then start from this (updating anything that was changed/added/removed in the meantime) when generating syntax.

> I'm surprised there's not more interest and work on making them two-way, for example, but that's just one possible application.

I suppose it's a combination of not many users typically wanting such a feature, the ease of storing a program's state as a proprietary/binary blob (Python's pickle module, for example), and the tight coupling such a system creates between syntax and objects: almost everything created by the parser should map 1:1 to syntax, even if modified by the user after parsing. One example of that last point: you might define some sort of axis via syntax, like `range(100, 200, increment=1)`. The parser then creates this axis as an array of data points. But the user may also modify this in memory, e.g. adding `50` to the array, so it's no longer a monotonic range. When you come to unparse, you need some code to identify what this array represents (is it still a range? or a sequence with no order?) and generate the corresponding syntax. That's tricky and requires lots of code for this usually niche feature.


"Reversible" meant consuming the character stream in reverse order while matching the same language, so it's different from what you have in mind--automatically generating a printer from a parser. On the surface it sounds very doable if each pattern of AST node is generated by a unique grammar production. Then pretty printing is the task of matching against an unambiguous tree grammar. It's not really different from what the compiler for a language like ML or Haskell does for pattern matching.


Ah, now I see what you mean.

> On the surface it sounds very doable if each pattern of AST node is generated by a unique grammar production.

This seems to be the key, from my work implementing this in my application. Luckily for me the grammar itself can be somewhat tweaked to help ensure a 1:1 mapping between productions and objects, at least until we release v1, but if someone is having to work with an existing grammar I can see the need for all sorts of hacks.


Yeah, the good news is that your tool can statically detect the ambiguities and you can let users specify priorities for the conflicting productions to define a unique mapping. That's a standard technique in parser generators. There the ambiguities are in the opposite direction, but the same idea applies.


I'm interested in doing that as well, I've often thought of being able to go from Pascal -> ast + symbol table -> executable and back.

You would want to preserve all the whitespace, to make it completely reversible. The ability to modify the symbol table to directly rename a variable or procedure, would be quite powerful, and never wrong. 8)



Thinking about point 1 makes me wonder why no popular regex engines have an intersection (logical AND) operator. It seems like it could allow writing complex regexes in a much clearer way.


Most popular regex engines use backtracking and support non-regular features. They aren't based on automata, so implementing AND is perhaps not as theoretically straightforward in that context.

There are some popular finite automata based engines though. RE2 comes to mind. In theory, things like AND and complement could be implemented, but in practice, they blow up the size of the state machine.[1, 2] The current maintainer of RE2 did build a tool that permits intersection and complement though.[3]

[1] - https://dl.acm.org/doi/10.1145/2071368.2071372

[2] - https://github.com/google/re2/issues/156#issuecomment-354068...

[3] - https://github.com/google/redgrep
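
For reference, the classic product construction behind DFA intersection is short enough to sketch. A hypothetical Python version, representing each DFA as a (start, transitions, accepting) triple - the representation is made up for illustration:

    def intersect(dfa1, dfa2):
        (s1, t1, f1), (s2, t2, f2) = dfa1, dfa2
        start = (s1, s2)
        trans, accept = {}, set()
        todo, seen = [start], {start}
        while todo:
            a, b = todo.pop()
            if a in f1 and b in f2:
                accept.add((a, b))  # accept only when both machines accept
            for ch in set(t1[a]) & set(t2[b]):  # inputs both machines handle
                nxt = (t1[a][ch], t2[b][ch])    # run the machines in lockstep
                trans.setdefault((a, b), {})[ch] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return start, trans, accept

The state-space blowup mentioned above shows up here directly: the reachable product states can approach |Q1| x |Q2|.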


Regarding state space blow-up, the only thing I'd add is that after DFA/NFA minimization you always get a canonical state machine (up to state relabeling) so the size will be the same regardless of how it was constructed. This doesn't mean that state space blow-up won't happen but that there isn't any inherent overhead (after minimization) compared to writing an equivalent regular expression by hand without using the product/complement constructions. But minimization interacts poorly with some practical features like capture groups, so you could probably only use the minimized automaton for matching and you'd have to compute the captures after a match some other way.


DFAs can't do capturing anyway. They aren't expressive enough. You can do it with "tagged" DFAs, which I believe are equivalent to transducers. In that case, I would imagine minimization would work just fine, it would just probably be ineffective. The paper that came out of the re2c project is good here.[1]

But sure, I think it's just a matter of adding features where most uses of them will result in pretty poor performance. In RE2's case, state blowup manifests by thrashing its hybrid NFA/DFA cache, which slows it down considerably and eventually will push RE2 to fall back to the NFA simulation.

And I'm actually not sure off the top of my head how complement/intersection impact the Thompson NFA construction. That has to be reasonably fast and space efficient on its own.

[1] - https://re2c.org/2017_trofimovich_tagged_deterministic_finit...


This is interesting, thank you!


Could 2 lookaheads starting at the same position not be used for this purpose?


Strictly speaking, lookahead doesn't make a language non-regular (regular languages are closed under intersection, which is what two lookaheads at the same position amount to), but backtracking engines implement it in a way that doesn't map onto an automaton. Common regex libraries also support genuinely non-regular features like backreferences. Perhaps that's why there's no `and` operator.


By any chance, do you know an example of a regex library or tool that uses that backwards-matching trick?


I learned the specific 're1 string re2' decomposition with backward matching from Hyperscan, but AFAIK several fast grep-oriented regex engines factor out both prefixes and suffixes and use whichever is longer/rarer as a substring anchor. For the suffix case you need backward matching to confirm the regex match after the substring match. Hyperscan actually decomposes the entire state machine graph into 're1 string re2' style fragments, which lets them select the fastest implementation technique for each sub-machine.

I think Russ Cox talks about backward matching in his article series on regular expressions, so it's probably used somewhere in re2.


The grep-oriented tools have it a bit easier. They can (and do) extract inner literals as well. The trick is that the search is line oriented and lines tend to be small. So when you find a match for an inner literal, all you need to do is find the bounds of the line that contains the literal and run the full regex engine on just that line. It works extremely well when matches are somewhat rare.
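
Roughly like this, in C (`full_regex_match` is a hypothetical stand-in for whatever the real engine's verification call is):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical stand-in for running the full regex engine on one line. */
    extern bool full_regex_match(const char *line, size_t len);

    /* Find a rare inner literal, expand the hit to the containing line,
       and only then run the full (slow) regex engine on that one line. */
    void search(const char *haystack, const char *literal)
    {
        const char *pos = haystack;
        while ((pos = strstr(pos, literal)) != NULL) {
            const char *start = pos;              /* walk back to line start */
            while (start > haystack && start[-1] != '\n')
                start--;
            const char *end = strchr(pos, '\n');  /* and forward to line end */
            if (end == NULL)
                end = pos + strlen(pos);
            if (full_regex_match(start, (size_t)(end - start)))
                printf("%.*s\n", (int)(end - start), start);
            pos = end;                            /* resume after this line */
        }
    }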

As for RE2, backward matching is used to find the starting location of a match.


Russ Cox's article in question:

https://swtch.com/~rsc/regexp/regexp3.html


Every program is fundamentally states and state transitions. You can be explicit about it or not, but the states and transitions are there regardless.


Absolutely. This is the definition of what computation is. Even in the lambda calculus, where everything is defined with pure functions, we build up state sequentially over time, based on previous values of the state. The fact that there’s no mutation under the hood is inconsequential.


State machines are incredibly useful and I use them all the time when I write UI code.

There are a couple of benefits that come to mind with respect to writing UI. First, it forces you to think through all the potential "inputs" to your system, be they button presses from users or network request responses. You have to be explicit about what could have an effect on the system. Second, it makes your business logic easier to reason about, because you essentially turn your UI code into a bunch of functional statements. All your transitions are just pure functions whose inputs are the previous state and an event, and whose output is the new state.

A lot of UI frameworks do it this way today, of course, but I picked it up from reading about how Elm works. Whenever I write new UI from scratch, I always end up falling back on the state machine for guidance.
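
For a flavor of what that looks like, here's a minimal sketch in C (the states and events are invented, for a hypothetical data-fetching widget); the whole transition function is pure: (previous state, event) -> new state.

    enum state { IDLE, LOADING, LOADED, FAILED };
    enum event { FETCH, SUCCESS, FAILURE, RETRY };

    /* Pure transition function: (previous state, event) -> new state.
       Unhandled (state, event) pairs keep the current state. */
    enum state step(enum state s, enum event e)
    {
        switch (s) {
        case IDLE:    return e == FETCH   ? LOADING : s;
        case LOADING: return e == SUCCESS ? LOADED
                           : e == FAILURE ? FAILED  : s;
        case FAILED:  return e == RETRY   ? LOADING : s;
        case LOADED:  return s;
        }
        return s;
    }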


I don’t mind being reminded every 2-3 years how great State Machines are. I was first introduced to the concept back in 2002 and my mind was blown. At that time, I was in my 20s...now I’m pushing past 40 and I’m starting to think my life is a state stuck in an event loop.


FWIW, here's an example I wrote in Python "that imitates what “compiled” FSM code might look like in an “unrolled” form. Most FSM code uses a little driver loop and a table data structure; the code below instead acts like JMP instructions (“jump”, or GOTO in higher-level-but-still-low-level languages) to hard-code the information in the table into a little patch of branches."

https://joypy.osdn.io/notebooks/Derivatives_of_Regular_Expre...

It's part of an (incomplete, slightly broken) notebook on Brzozowski’s derivatives of regular expressions.


I ended up using one in MaraDNS 2.0 to parse configuration files, since the C code for parsing MaraDNS 1.0 configuration files was a lot more error-prone and harder to maintain.

The state machine’s grammar is embedded in the Deadwood (MaraDNS 2 recursor) binary.

Here is a full explanation of the finite state machine, along with the actual embedded code:

https://github.com/samboy/MaraDNS/blob/master/deadwood-githu...


Isn't an FSM more like writing a program consisting only of GOTOs? I call this the "loopz-effect", where any single state may have an effect on any other state of the program.


It's the opposite. In an FSM you have clearly defined states and all possible transitions between them.

If you write the same logic as normal imperative code that modifies some variables, then to understand what it does you have to "run it" in your brain and reverse-engineer the states and transitions yourself.


Strictly speaking, yes. But you should think of it as a formalized abstraction over that: a way to reason about something you can easily translate into a bunch of goto statements.

It's just like how all if/then and function call constructs compile down to goto, as well. Some languages will offer more abstraction over this, but you can still do such constructs in assembly language if you want to by hand-holding the machine through every step.

If you do find yourself writing assembly language (or C, for that matter!) then you should do just that: implement high-level constructs at the low level with a bunch of jump statements. The resulting code is verbose and superficially seems gnarly, but it will also be far more scalable and robust.

But there are compilers for a reason! There are some examples of automatic FSM to C compilers elsewhere in this thread. Some very high-level languages have embedded domain languages for FSM generation as well.


An FSM is simply a structure that progresses over time, where the next state depends on the previous state. By your usage of "goto", I feel like you're implying that this is a negative thing. But state is an inherent part of computation. A Turing machine can be seen as a slightly more powerful state machine, and most if not all of the code that we write can easily be modeled with the less powerful state machine.

And no, functional programming does not actually help this.


Yes. And not only that, but GOTOs are the normal recommended way of implementing an FSM.


What? Where? Most sane people write FSMs as a switch inside an infinite loop.


Right here, by me. Or perhaps I'm insane.

Using goto ensures that all of a particular state's logic is nearby in code. It also eliminates the need to check an exhaustive list of irrelevant switch cases at runtime.
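
For illustration, here's a goto-style FSM in C for a toy machine (accept strings with an even number of '1's; the example is invented): each state is a label, each transition a jump, and each state's logic lives in one place.

    #include <stdbool.h>

    /* Toy machine: accept strings over '0'/'1' with an even count of 1s.
       Each state is a label; each transition is a goto. */
    bool even_ones(const char *s)
    {
    even:
        switch (*s++) {
        case '\0': return true;    /* even number of 1s seen: accept */
        case '1':  goto odd;
        default:   goto even;
        }
    odd:
        switch (*s++) {
        case '\0': return false;   /* odd number of 1s seen: reject */
        case '1':  goto even;
        default:   goto odd;
        }
    }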


A relevant example is a branching interpreter dispatch loop vs. compiled threaded code, and all the hybrid concepts in between: https://en.m.wikipedia.org/wiki/Threaded_code

It's a very old CS construct where the right answer depends on the environment. In many modern structured languages you don't have a goto at hand, so you're ushered towards switch statements, but then you can opt to do weird things like throw an exception and use the handler to jump.
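
A sketch of the threaded-code end of that spectrum, using GCC/Clang's labels-as-values extension (not standard C; the toy bytecode is invented): each handler jumps directly to the next one instead of looping back through a switch.

    #include <stdio.h>

    /* Direct-threaded dispatch: opcodes 0=halt, 1=inc, 2=dec. */
    int run(const unsigned char *code)
    {
        static void *dispatch[] = { &&op_halt, &&op_inc, &&op_dec };
        int acc = 0;
        goto *dispatch[*code++];

    op_inc:  acc++; goto *dispatch[*code++];
    op_dec:  acc--; goto *dispatch[*code++];
    op_halt: return acc;
    }

    int main(void)
    {
        const unsigned char prog[] = { 1, 1, 2, 0 };  /* inc inc dec halt */
        printf("%d\n", run(prog));                    /* prints 1 */
        return 0;
    }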


Makes perfect sense. Languages with first class function values may shine here, ie. Go.


Using a direct goto should be faster because of fewer conditionals.


It's more like having a switch statement inside a loop.

The variable you switch on is the next state and the case you're in is the current state.
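
Here's the same toy even-number-of-1s machine from the goto sketch above, written switch-in-a-loop style for comparison:

    #include <stdbool.h>

    enum state { EVEN, ODD };

    /* The variable holds the state; each case computes the next state. */
    bool even_ones_loop(const char *s)
    {
        enum state st = EVEN;
        for (; *s; s++) {
            switch (st) {
            case EVEN: if (*s == '1') st = ODD;  break;
            case ODD:  if (*s == '1') st = EVEN; break;
            }
        }
        return st == EVEN;
    }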


A long time ago I used to auto-generate HTTP and SMTP servers and message queues (similar to RabbitMQ) from very compact descriptions. My in-house infrastructure build-out productivity went up ~30x, but all the downstream teams hated it.


Now you equivalently have two problems.



