I would say that the premise of this article is a bit wrong. Naturally the examples given about the language usage in physics do look odd because these are very simple examples based on the form of the writing that has been agreed on long time ago.
Now when we evaluate something more elaborate then we will suddenly discover that there are multiple ways to write down some very fundamental concepts. For example Maxwell equations can be written in many different forms http://www.maxwells-equations.com/forms.php
So in fact the language is a very important tool even in physics.
Even the author continues and defines a new language to express the relations between and in the state machines.
Nonetheless this is an interesting piece of the writing.
It's a bit of a subtle point, but it is hard precisely because physics doesn't quite have anything that is analogous to programming languages, and Lamport does draw a distinction between a universal formalism -- "ordinary" math -- and a programming language, which is something that is far more complex. The different forms of Maxwell's equations you present are all very much written in the same language of multivariate calculus (but in different words).
I would present Lamport's thesis like this: engineering disciplines usually use a different formalism to reason about system from the one they use to construct it. Consider an electrical engineer designing an analog circuit, and a mechanical engineer designing a contraption made of gears, pulleys and springs. When reasoning about their designs, they can either use the “standard” mathematical approach, plugging in the equations describing what the components do, or invent an algebra of electronic components and an algebra of mechanical components and reason directly in a “language” of capcitors and resistors (one capacitor plus one resistor etc.) or a language of gears and springs. The engineers obviously use a domain-specific tool to design the low-level details of the system, but they use standard math to reason about it (to guide their high-level design and to make sure it achieves what they want). Lamport suggests that software engineers do the same.
In practice, most software engineers reason about designs only informally, and even then mostly in their heads. This is fine for simple things, but very hard to do when more complex software systems are concerned. The software verification (academic community) indeed uses an approach similar to Lamport (which uses more-or-less simple math to reason about programs), but they hide the techniques from the user, exposing them merely to the results in a sort of a CAD tool. The programming language theory community indeed tries to make mathematical reasoning explicit, but they're trying to make reasoning possible within the construction language (like the algebra of gears). There are a few advantages to that approach, but one major disadvantage: because programming languages, as Lamport notes, are so complex, the mathematics required to reason about them syntactically is very arcane and extremely complex; it is very much a completely new math. And because it is so new and so complex, PLT researchers have not yet been able to accomplish the task. Reasoning in that new math is currently much harder and much more limited that in Lamport's "simple math" approach[1]. Of course, it's possible that PLT will come up with good solutions in the future.
But in addition to the big pragmatic advantage of Lamport's approach (which may be temporary), he claims that using the universal language of "ordinary" math helps find the great similarities between different programs that can be written in many different ways, but are essentially the same. This helps to better understand what computation is, and also helps focus on the core issue -- how the program's environment behaves, how the program interacts with it, and why the program's behavior achieves its goals -- rather than being distracted right from the beginning by details such as the choice of abstractions (which are important only after the core issues are understood).
[1]: I sometimes see academic papers about how to reason about pretty basic program properties in Coq that would never be written about Lamport's TLA+ for the simple reason that an ordinary engineer would find them almost trivial in TLA+ after only a month or so since first encountering it.
> The programming language theory community indeed tries to make mathematical reasoning explicit, but they're trying to make reasoning possible within the construction language (like the algebra of gears).
We discussed this point before, and this comment finally makes it clear to me why we never saw eye to eye.
Equational reasoning is a tool for reasoning about programs, but not the only tool. Lamport uses it too, in the form of refinement (this is exactly what is used in practice, outside of "educational" papers about functional programming). More importantly though, nobody insists that all reasoning has to be equational. The Hoare logic and its Owicki-Gries variant for concurrency are the grandfathers of modern program logics, which give you a very precise language (ordinary math + X, same as what Lamport wants) for reasoning about program correctness.
The only difference in this paper is that Lamport insists that programs should be syntactically represented as state machines, while most PLT people would insist that this is too much of a restriction. Typed programs enable more abstract programs and specifications, because the abstraction mechanism offered by types is in some sense dual to refinement. Types allow you to restrict client code in a lightweight way. Combined with higher-order functions (i.e., first class functional abstraction), this allows you to achieve more abstract specifications than what can be done with state machines.
If I have a program in TLA+, I represent it as a state machine and any specification I write down is essentially just a refinement of this state machine. You cannot really hide internal states, unless you can explicitly skip them via simulation (or at least that is my impression of TLA+, you are far more experienced here and very welcome to contradict me).
In a typed programming language, quantification gives you the dual abstraction of restricting the users of your program. Existential types allow you to pick an invariant relation between two implementations. Crucially this invariant does not have to be a (bi)simulation. If you look at this from a state machine perspective, then these can be two completely different state machines externally, but they are different in a way that clients can't exploit.
For a concrete example, in https://people.mpi-sws.org/~dreyer/papers/caresl/paper.pdf , the authors show refinement between flat combining and locking, as well as between a lockfree and a lock-based stack with iterators. In this generality, this is only sound for typesafe clients.
----
If this sounds overly negative and critical of Lamport's work, then I apologize. The main point I want to make is that we are using substantially the same tools. There is nobody in program verification who seriously insists that equational reasoning is the only tool the world needs. The point of equational reasoning was always that it is a lightweight method which is easily taught to ordinary programmers. That's what makes it valuable.
If you encounter a paper (typically by Richard Bird :) ), verifying complicated programs using only equational reasoning, then take this with a grain of salt. The interesting thing about such papers is that it's possible at all ("see how far you can push this idea!"), but I've yet to meet anyone who seriously insists that this is always the easiest proof you can come up with...
> Equational reasoning is a tool for reasoning about programs, but not the only tool. Lamport uses it too, in the form of refinement
I wouldn't call refinement equational reasoning. Equational reasoning, as I understand it, is the use of value semantics for terms, within an algebra for term composition. It pretty much requires that what a computation does is encoded in the value that the term denoting it evaluates to. I'm not sure how refinement is related, as it presupposes neither value semantics nor an algebraic composition of terms. It is a relation on sequences. As in program logics like TLA all terms evaluate to TRUE or FALSE, and the algebra for combining them is boolean algebra, the argument could be made that this is a form of equational reasoning, but I don't think it is, as the boolean values that the terms evaluate to have nothing to do with values computed by the computation.
I haven't had any experience with equational reasoning, but the examples I've seen show it to be no easier than assertional reasoning, and unlike assertional reasoning it does require a particular programming style.
> The only difference in this paper is that Lamport insists that programs should be syntactically represented as state machines
Just to make sure: He does not intend for compilable programs to be represented as TLA formulas, just as electrical circuits are not laid out with ODEs. He says that reasoning about the properties of programs should be done in a system like TLA, just as electrical circuits can be modeled as ODEs. TLA+ itself comes with a tool that lets you write specification in a syntax called PlusCal, which is similar to familiar imperative languages, and Lamport himself says that he made that language because that syntax may be more convenient for some tasks.
> while most PLT people would insist that this is too much of a restriction.
Reasoning syntactically about a program in its ready-for-construction source has several advantages, but I doubt anyone would view TLA as any sort of a restriction (quite the opposite: it allows to specify discrete dynamical systems that are not even computable). But the main difference for practitioners is this: TLA works today for a wide range of programs. The ideal of reasoning about programs in their source language with similar cost is far from realized. TLA succeeds in practice simply because it is far less ambitious.
> Typed programs enable more abstract programs and specifications, because the abstraction mechanism offered by types is in some sense dual to refinement.
I am not sure what you mean.
> Combined with higher-order functions (i.e., first class functional abstraction), this allows you to achieve more abstract specifications than what can be done with state machines.
I am not sure you quite realize how state machines are used in TLA. You can use higher-order functions to your heart's content, as well as higher-order state machines. The machine's are abstract, and what constitutes a state is completely up to you. TLA+ is rich enough that you can write an entire program completely using pure functions, that executes as a single TLA step.
But reasoning with functions is a specific level of abstraction. TLA allows you to go arbitrarily higher or lower. Functions are too high an abstraction if you're interested to reason about, say, complexity, and they're too low an abstraction if you want to reason about interaction (where they can only constitute single monadic steps). They may be either too high or too low for concurrency. Of course, you can embed TLA in FP and vice versa, but TLA is "naturally" broader in scope.
> You cannot really hide internal states
Why not? That's what existential temporal quantification is for. `∃∃ x : F`, means that F performs its observable operation using some internal hidden state, x (∃∃ is usually written as bold ∃)
> In a typed programming language, quantification gives you the dual abstraction of restricting the users of your program.
I'm not sure exactly what you mean. You can achieve the same thing you can with types in an untyped TLA-based language like TLA+. For example, the typing judgment `f:A->B` is simply a conjunction with the proposition `f ∈ [A → B]` in TLA+. Remember that the environment is also part of the process (or machine).
> Crucially this invariant does not have to be a (bi)simulation
The concept of bisimulation is a rather complex one designed to explain the difference between trace inclusion and stronger forms of equivalence in algebraic models that do not model state explicitly (like CSP/CCS). In TLA, bisimulation and trace equivalence are the same, and simulation is just trace-inclusion. The famous result "the existence of refinement mapping" by Abadi and Lamport shows that this is complete, namely that any implementation relation "A implements B" can be modeled as "A refines B", which in TLA translates to the familiar implication operator `A ⇒ B`. So there is no such thing as two machines, one implementing the other, that cannot be expressed in this way.
> In this generality, this is only sound for typesafe clients
Again, in TLA, a "typesafe client" is no more than an assumption about the properties of the client "process". In TLA, a specification often takes the form `◻User ∧ ◻Program` or the equivalent `◻(User ∨ Program)`
> That's what makes it valuable.
I would say that in practice the opposite holds. TLA+ is used by "ordinary" engineers to reason about systems that are far larger and trickier (concurrent, distributed) than anything that is currently done by engineers using equational reasoning in typed PFP. This advantage may (or may not) be short-lived, but I think the combination of simplicity, ease of learning, and scalability both in size and in scope is precisely TLA's pragmatic edge.
> But reasoning with functions is a specific level of abstraction.
Excellent point. The level of abstraction is a fundamental concern, not just in modeling but also in reasoning about systems. And state machines (as used in TLA or otherwise) add temporal abstractions to the mix of abstraction mechanism. To add to its power, it does not lead to any loss of generality; since any program in a language with well-defined operational semantics can be modeled using state machine (transition systems with infinite state space to be little more precise).
> In TLA, bisimulation and trace equivalence are the same, and
simulation is just trace-inclusion.
There is a fundamental difference between (bi)simulation and trace inclusion(equivalence); the former is a branching time notion and the later is a linear time notion. In fact if one it is easy to show if A is a simulation of B then A is also a "trace inclusion" of B. And this difference has some practical ramifications as well.
The branching time notions of refinement also has (not so famous) the completeness result.
This has some practical ramification as well: the refinement map in Abadi/Lamport is a projection function, while in branching time it can be arbitrary function. The former is intuitively clear but often difficult in practice. The flexibility has been exploited to efficiently analyze correctness of several non-trivial systems like pipelined processors.
> So there is no such thing as two machines, one implementing the other, that cannot be expressed in this way.
To drive home the point what the completeness result means is
that one can always reason at the level of abstraction that was use to specify the system: state and their successors. This is what we are after in type-based reasoning: reasoning at the level of functional abstraction.
> the former is a branching time notion and the later is a linear time notion
I think it is more precise to say that you cannot formally encode the concept of (bi)sumlation in LTL while you can in CTL. The notion itself, however, is a general structural relation between state machines that is independent of the temporal logic you use to formalize properties of their behavior, if any. I.e. (bi)simulation would exist even if temporal logic didn't.
> the refinement map in Abadi/Lamport is a projection function
One of the few things I don't like in TLA+ is that you can create a "refinement" mapping which isn't sound (and therefore isn't a refinement). In particular, you can add auxiliary variables that interfere with the specification. I know Lamport is working on a paper where he gives syntactic rules that ensure that the mapping is sound.
> This is what we are after in type-based reasoning: reasoning at the level of functional abstraction.
The challenge is making that reasoning rich enough (to reason about a wide class of programs), scalable enough, and at the same time affordable and simple for engineers. This goal is far from realized (and currently falls short, at least in the last two requirements).
P.S
I also think that TLA allows reasoning about structural properties of the state machine without any use of temporal logic at all. If A and B are action formulas and S is a state-predicate, then `S ∧ A` denotes the set of transitions from the set of states denoted by S (conversely `S' ∧ A` is the set of transitions into S). And so `S ∧ A ⇒ S ∧ B` means "for every transition A leaving S, there is a transition B leaving S". I think that you can therefore show (bi)simulation directly, structurally, rather than behaviorally; there is no use of behaviors (traces) or computation trees here at all (you may want to show that all the states S are reachable from the initial states).
[1]: Unless behaviors are sequences of states, in which case the relationship coincides with the trace-equivalence relationship. There may be subtle technical differences (that I don't quite understand) even then, but they don't matter.
> create a "refinement" mapping which isn't sound (and therefore isn't a refinement).
Could you please illustrate what do you mean by unsound refinement ? Refinement mapping along with the notion of correctness forms the part of "trust computing base". More precisely, the statement "A implements B under a refinement map r" assumes two things (1) what it means to implements; is it trace inclusion or simulation or stuttering simulation .... (2) what is the refinement map r. What am i missing here ?.
Also can you please illustrate (with an example possibly) on the syntactic rules that Lamport is working on ?
Addition of auxiliary variables to the specification introduces an extra proof obligation, namely, that it does not change its observable behavior. Often this is not very difficult as one can syntactically differentiate between auxiliary variables and the state variables.
>I also think that TLA allows reasoning about structural properties of the state machine without any use of temporal logic at all.
Definitely. Like you mention the refinement based on (bi)simulation or trace inclusion is different from the use of temporal logic to formalize properties of systems. Though two are related in the sense of adequacy of temporal logic: simulation preserves ACTL* properties while trace inclusion preserves LTL properties.
> I think that you can therefore show (bi)simulation directly, structurally, rather than behaviorally; there is no use of behaviors (traces) or computation trees here at all.
I do not quite understand what you mean here. Behaviors are succinctly represented using state machines -- set of states and there successors-- and then we look at a state machine as a generator of (possibly infinite) behaviors -- either as traces or computation trees.
The scenarios where trace inclusion requires addition of prophecy and history variables to recover local reasoning (in other words structural reasoning using state and its successors) is more related to the completeness argument you mentioned earlier.
> (1) what it means to implements; is it trace inclusion or simulation or stuttering simulation .... (2) what is the refinement map
Well, in TLA there's only one kind of implementation relation, as the three relations you mention coincide since a trace is a sequence of states, under a stuttering equivalence. All other implementation relations (like event-trace-inclusion) are the same relation on a reduced (abstracted) machine, with some of the state erased, or a refined machine with an added time variable. So the map (which we take to include auxiliary variables) fully describes the "kind" of implementation.
> Could you please illustrate what do you mean by unsound refinement?
The TLA+ features are, of course, completely sound, but you may not be mapping what you intended to map. The trickiest problem has to do with prophecy variables. You need to ensure that they don't restrict the behavior of the spec, and it's not always obvious.
> Also can you please illustrate (with an example possibly) on the syntactic rules that Lamport is working on ?
The simplest rule is that if the spec is
Init ∧ ◻[Next]_x
and, under an invariant:
Inv ∧ Inv' ⇒ (Next ≣ ∃i∈Π : N(i))
Then the spec remains equivalent when introducing the prophecy variable p like so:
∃∃p : (Init ∧ p ∈ Π) ∧ ◻[N(p) ∧ p' ∈ Π]_<x, p>
This basically means that the prophecy must cover the entire parameter space of N, and that once used, a prophecy must be "forgotten". The problem is parameterizing N. So the rules treat various forms of Next (disjunction, quantification etc.).
But if Next is parameterized well so that its behavior isn't restricted by p, then obtaining a useful value from p for the mapping can be tricky: so you know what value will be picked by an existential quantifier, and you know what disjunct will be chosen etc., but you still need an interpreter for those choices to obtain the result of the machine's action. You could write the spec itself in a clever parameterized way so that it would be its own prophecy interpreter, but then you'd make it completely unreadable.
In short, the problem is that you need to make sure that your prophecy variables are sound, but you still want to keep the spec at that refinement level readable and not sacrifice it for the sake of refinement to some other level, and the two requirements pull in opposite directions.
What you could do (and that's what I did in a complex spec), is to use a prophecy variable that interacts well with the spec (i.e., doesn't make it unreadable) but doesn't conform with Lamport and Merz's rules, and then prove that it is sound. Unfortunately, that property alone can be 95% of what you intend the refinement to show in the first place, and here the model checker can't help you at all.
We really need a model checker that can handle temporal quantification.
> I do not quite understand what you mean here.
Oh, it was just to emphasize the point that simulation is a structural property, unrelated to temporal logic or any notion of time or behavior.
Thanks for the explanation especially with the example.
I now understand what you meant by "unsound refinement" and the desire to keep the spec readable.
> and that's what I did in a complex spec
Do you believe that your hands were tied because the mapping can only be a projection of state?
Is this spec and the corresponding implementation that you are working on in public domain ?
> We really need a model checker that can handle temporal quantification.
Do you intend to restrict the quantification only over time variable only ? Why do I ask this? This is because even in presence of finite stuttering, one only needs to analyze about state and the successors ?
If you intend to quantify over state variables (including the prophecy/history variable, for instance in the example you mentioned) -- that is a more general problem of dealing with quantifiers and there is still long way to automate that aspect.
> Is this spec and the corresponding implementation that you are working on in public domain ?
Not currently. Maybe in the future.
> Do you believe that your hands were tied because the mapping can only be a projection of state?
Rather than? I'm not sure I understand the question, so maybe. My problem is that some reductions require moving certain transitions backwards or forwards in time. In theory, the idea is described well in section 3 of this paper https://research.microsoft.com/en-us/um/people/lamport/pubs/... (this is pretty much exactly my problem: I wanted to see if a distributed transaction is a refinement of an atomic transaction). In practice, using prophecy variables isn't easy.
> Do you intend to restrict the quantification only over time variable only?
Oh, no, in TLA "temporal quantification" means hidden state, or quantifiers over state variables. Having the model checker handle that would make auxiliary variables unnecessary in cases where the more abstract specification has hidden internal state (unless you want a deductive proof). It will save us the hardest work, and ensure that the implementation relation is sound (no auxiliary variables necessary).
It would have solved my problem, too. Even though auxiliary variables would still be necessary, the model-checker could check that they're sound by checking:
S ⇒ (∃∃ aux : S_with_aux)
to see that adding the auxiliary variables doesn't restrict the spec (and then I could check, as I do today, that `S_with_aux ⇒ Abstract_S`)
> My problem is that some reductions require moving certain transitions backwards or forwards in time.
This is definitely a more involved problem. A cleaner solution, I believe, requires a new notion of correctness than trace containment/simulation up to stuttering. This is because neither of these notions allow one to move forward or backward observable transitions. The key here is to differentiate between non-observable and observable non-determinism. While the former can be accounted by using prophecy variable in linear time framework or directly in the branching time framework. I will recommend reading the comparison of the two approaches in [1]. However, the later is not what can be accounted by adding auxiliary variables. It is done currently by complicating a simple sequential specification -- create a new specification that includes all possible interleaving of transitions implicitly or explicitly. I will be curious to hear your insights on this while using TLA+ in particular and Lamport's theory of refinement in general.
I must say that the above must be at best considered speculative from a practical methodology point of view. I do suggest a concrete alternative nor do I have extensive experience in validating actual distributed systems.
> Rather than? I'm not sure I understand the question, so maybe
This is the easier part and a less speculative comment :).
In the theory of refinement in branching-time framework developed in [1,2] the refinement can be any computable function rather than just a projection of states space. This often is very helpful. For example a refinement map for a pipelined processor is defined by invalidating all partially executed instructions and setting the program counter to the oldest instruction in flight. This is relatively more involved than just a projection map that "hides" the internal states. Notice that, as we have already mentioned earlier, branching time refinement implies trace containment, a linear-time notion. So there is nothing lost. There are follow up papers with case studies illustrating the use of this flexibility to define refinement map that makes automated reasoning more amenable.
Thanks for clarifying the temporal quantification.
I read the paper in the first link, and I may read it again, but I have a few remarks:
1. Refinement mappings in TLA also allow arbitrary state functions.
2. There are very good reasons to want machine closure, and one of the things mentioned at the end of that paper as an advantage (that there is no necessity in a separate liveness property), is actually a serious disadvantage.
Let me give you an example, using what Lamport calls "raw" TLA, which is TLA which is not stuttering invariant:
Spec1 ≜ t = 1 ∧ x = 1 ∧ ◻(t' = t + 1 ∧ x' = x + 10)
Spec2 ≜ t = 1 ∧ x ∈ Nat ∧ ◻(t' = t + 1 ∧ x' = x + 10 ∧ t = 10 ⇒ x = 100)
The two specifications specify the same behavior. But, in this world, a safety property can imply a liveness property, e.g. x ∈ Nat ∧ ◻(x' = x + 10) ⇒ ◇(x ≥ 100)
Now, this is bad because the second specification is actually not implementable in a computer with complexity similar to that of the behavior as it relies on "angelic" nondeterminism. It is possible to use angelic nondeterminism in TLA (it's important for high-level specs) like so:
Spec3 ≜ t = 1 ∧ x ∈ Nat ∧ ◻[t' = t + 1 ∧ x' = x + 10 ∧ t = 10 ⇒ x = 100]_<t, x> ∧ ◇(t ≥ 10)
(the square brackets mean that either the action is true, or <t, x> are unchanged) But now, the liveness property is not implied by the safety property, and it requires an explicit statement in the formula. That last formula, Spec3, is not machine-closed, and this makes it easier to see that the algorithm implied is unrealizable as written.
> 1. Refinement mappings in TLA also allow arbitrary state functions.
Could you please point me to a reference for this. The completeness result from "The existence of refinement mapping" only includes projection. I know that TLA does allow existential quantification over internal variables (corresponds to hiding) and one can eliminate the quantifier by defining a function. But notice that is just the logical way of specifying "hiding" and is different from having arbitrary refinement mapping.
>2. There are very good reasons to want machine closure ...
Definitely. However, these are the concerns when we are specifying a system -- is the specification correctly models the system that I have in mind. While the comments in the paper -- no necessity to separately analyze liveness -- refers to the easy while proving that a concrete specification "implements" an abstract specification. And this is what your example specifications illustrates (thanks for specific examples, it helped me to understand the point you are making).
> I know that TLA does allow existential quantification over internal variables (corresponds to hiding) and one can eliminate the quantifier by defining a function.
That's what I meant. The functions do not necessarily apply to hidden variables only, but to any variable. E.g., in TLA+, if Spec1 has variable x and Spec2 has variables q and t, you could write:
Spec2 ⇒ INSTANCE Spec1 WITH x ← q + t
Which implements Spec to by mapping (q + t) to x.
> But notice that ... is different from having arbitrary refinement mapping.
Can you explain what you mean?
> However, these are the concerns when we are specifying a system -- is the specification correctly models the system that I have in mind.
Right, but you want an easy way to determine if a specification is machine-closed or not. TLA provides a sufficient syntactic condition. A specification is guaranteed to be machine closed if it can be written as
Init ∧ ◻[M]_vars ∧ Fairness
Where M is an action and `Fairness` is a conjunction of ⬦ clauses containing WF/SF (weak/strong fairness) conditions only. In fact, when testing safety properties alone (which is normally the case), a specification is usually just
Init ∧ ◻[M]_vars
Which TLA guarantees is machine-closed. Otherwise, a ◻ formula alone may not be machine-closed (and may imply a liveness property, like in my example above), and proving whether it is or it isn't may be non-trivial.
I find myself recreating different versions of state machines for almost all projects I work on. The ability to reduce a problem to a state and reactions to external stimuli, resulting in a new state, is a powerful, simple and easily testable/encompassing process. You can write cohesive and testable pieces of code, and return to them later, and easily understand what you originally wrote. When I start writing callbacks and event-based code, I find myself confused and tracing through a "spaghetti of callbacks".
>Yet computer scientists are so focused on the languages used to describe computation that they are largely unaware that those languages are all describing state machines.
That brings back to mind that someone corrected me on that, computers were closer to linear bounded automata. Now I'm even more confused.
I am working with computer science students and I see Lamport's point.
Even for simpler programming exercises the students start coding away, writing dozens of nested function calls with different global and local states, when a simple state machine models the problem quite well and building one would lead to much cleaner code.
The problem I see (and what we're trying to repair in our basic courses) is that students learn their basics in CS and then they write java or c and the primitives they know from theory (sets, tuples) aren't very accessible and they can't just use the definitions they've learned (imperative vs. Functional), so they just hack away without thinking.
We're trying to start with Haskell as early as possible to be close to the definitions from the first semesters.
This is not quite what Lamport is saying. Lamport is not talking about the process of writing programs, but about reasoning about programs. You build an analog circuit by composing components (that would be programming), but you reason about what it does and how it does it using ordinary differential equations. Lamport notes that state machines are a universal model for discrete dynamical system, and they can be manipulated mathematically using notation analogous to ODEs for continuous dynamical systems. Lamport does not suggest that you should program using a state machine abstraction, nor does he offer any notation for doing so.
Yes, and both rely on Pnueli's temporal logic. But Harel and Lamport apply a similar theory to different domains, neither suggesting the exclusion of the application to the other domain. Lamport offers abstract state machines as a universal mathematical formalism for reasoning about discrete dynamical systems regardless of how they're constructed, while Harel offers concrete state machines for programming safety-critical, hard-realtime embedded systems that are easy for both humans and machines to analyze.
It's interesting to compare this with the push for functional + immutable design systems, which leads to a similar result: writing programs that operate as "nextstate = f(state,inputs)"
Immutable (or "pure") and functional are two very different things. State-machine-based programming languages -- usually called synchronous languages (note that TLA+ is not a programming language) -- may be pure, but are not functional. Functional means that you model a computation as a function. In synchronous language and in TLA, a computation is not a function but a behavior -- a sequence of states. You can say things like "no matter what happens, the value of x is always greater than 0", or "if y is ever negative, then z must later become negative, too". In addition, in TLA the next state is not a function of the current state; (state, nextstate) is a relation, so a state can have multiple possible next state, chosen nondeterministically. This is essential when modeling nondeterministic programs (like those involving concurrency or distribution), or even when modeling deterministic program at a high level of abstraction where you don't care about the details ("the algorithm will do either A or B for some reason I don't care about").
Now when we evaluate something more elaborate then we will suddenly discover that there are multiple ways to write down some very fundamental concepts. For example Maxwell equations can be written in many different forms http://www.maxwells-equations.com/forms.php
So in fact the language is a very important tool even in physics.
Even the author continues and defines a new language to express the relations between and in the state machines.
Nonetheless this is an interesting piece of the writing.