
Every Finite Automaton Has a Corresponding Regular Expression - lelf
https://semantic-domain.blogspot.com/2019/10/every-finite-automaton-has.html
======
Syzygies
The first chapter of Berstel and Reutenauer's "Noncommutative Rational Series
with Applications" presents Schützenberger's theorem that every noncommuting
rational power series is representable, and conversely. The idea is painfully
abstract, but it makes twenty minutes' work of a semester of
undergraduate automata theory (an assertion I've tested multiple times in my
math office hours).

They work with coefficients in an arbitrary semiring, which can be like
watching paint dry. However, Boolean true/false values and probabilities make
two great examples. The idea that the theory of hidden Markov chains is exactly
the same as CS automata theory is mind-blowing.
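Here is a minimal Python sketch of what "representable" means. It was written for this thread and is not taken from Berstel and Reutenauer. A series is given by a row vector λ, one matrix μ(x) per letter x, and a column vector γ, and its value on a word w₁…wₙ is λ·μ(w₁)⋯μ(wₙ)·γ. Only the semiring changes between the two examples. Over Booleans the value says whether a DFA accepts the word. Over the reals, with matrices built from a hidden Markov model, the value is the probability of emitting the word. The names (`Semiring`, `weight`, `EVEN_A`, `HMM_MU`), the HMM numbers, and the HMM convention (move to the next state, then emit from it) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass(frozen=True)
class Semiring:
    zero: Any                         # additive identity
    add: Callable[[Any, Any], Any]    # "sum over paths"
    mul: Callable[[Any, Any], Any]    # "product along a path"

BOOLEAN = Semiring(False, lambda a, b: a or b, lambda a, b: a and b)
REAL = Semiring(0.0, lambda a, b: a + b, lambda a, b: a * b)

def _dot(s: Semiring, xs: Sequence[Any], ys: Sequence[Any]) -> Any:
    """Semiring inner product of two vectors."""
    acc = s.zero
    for x, y in zip(xs, ys):
        acc = s.add(acc, s.mul(x, y))
    return acc

def weight(s: Semiring, init, mu, final, word: str) -> Any:
    """Value of the represented series on `word`: init . mu[w1] . ... . mu[wn] . final."""
    v = list(init)
    for ch in word:
        m = mu[ch]
        # Row vector times matrix: entry j is the dot product with column j.
        v = [_dot(s, v, [row[j] for row in m]) for j in range(len(m[0]))]
    return _dot(s, v, final)

# Boolean semiring: 2-state DFA accepting words over {a, b} with an even number of a's.
EVEN_A = {"a": [[False, True], [True, False]], "b": [[True, False], [False, True]]}

# Real semiring: a hypothetical 2-state hidden Markov model.
T = [[0.9, 0.1], [0.2, 0.8]]                      # T[i][j] = P(next state j | state i)
E = [{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}]  # E[j][x] = P(emit x | state j)
HMM_MU = {x: [[T[i][j] * E[j][x] for j in range(2)] for i in range(2)] for x in "ab"}

if __name__ == "__main__":
    print(weight(BOOLEAN, [True, False], EVEN_A, [True, False], "aab"))  # True: two a's
    print(weight(REAL, [0.5, 0.5], HMM_MU, [1.0, 1.0], "a"))  # P(first symbol is a)
```

Nothing inside `weight` depends on which semiring is passed in, which is the point of working over an arbitrary semiring.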

Once one absorbs this, there's an interesting wrinkle, worked out in the 1960s,
that is still poorly understood. If the outputs are probabilities, do the
internal coefficients of the matrices need to be probabilities? Why yes, if we
are so attached to our expectations that we insist on drawing graphs labeled
in ways that are intuitive to people who aren't willing to work very hard or
defy expectations. Why no, if one chases the actual math. There are examples
that require arbitrarily many matrix dimensions to model using probabilities,
yet can be modeled in a few dimensions using real numbers outside the [0,1]
range.

This has a simple explanation. There's a pattern one sees twice, a linear
combination with coefficients in [0,1] that sum to one: a probability
distribution, and barycentric coordinates. One can have a point cloud of small
dimension whose convex hull requires arbitrarily many vertices. Restricting
the coefficients to the [0,1] range means treating each of these vertices as a
separate matrix dimension. Leaving the coefficients unrestricted means
working with the raw dimension.
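The geometric half of this argument is easy to check numerically. A regular n-gon is a point cloud in only 2 dimensions, yet all n of its points are vertices of its convex hull. A barycentric (probability-like) description of the hull therefore carries one coordinate per vertex, while unrestricted real coordinates need only 2. The sketch below uses Andrew's monotone-chain hull algorithm. It illustrates the geometry only and does not construct the 1960s automata examples. `convex_hull` and `regular_polygon` are illustrative names.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left (counter-clockwise) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def regular_polygon(n):
    """n points on the unit circle: a 2-dimensional cloud with n hull vertices."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

if __name__ == "__main__":
    for n in (3, 10, 100):
        pts = regular_polygon(n) + [(0.0, 0.0), (0.1, -0.2)]  # interior points add no vertices
        print(n, "points in 2D ->", len(convex_hull(pts)), "hull vertices")
```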

~~~
Donald
This perspective is excellent. Besides the book reference, any papers in this
area that are relevant?

~~~
minipci1321
Not a direct answer to your question, but here are Berstel's publications:

[http://www-igm.univ-mlv.fr/~berstel/PubsJeanBerstel.html](http://www-igm.univ-mlv.fr/~berstel/PubsJeanBerstel.html)

You could try contacting him; he might be able to help.

------
ridiculous_fish
I worked on a tool that infers a regex from sample strings by building and
then reducing an NDFA.

The NDFA->regex construction worked by "collapsing" a node in the NDFA. Choose
a victim node. Construct a Kleene star for all its "self loops." Lastly, form
all triples of (incoming, self-loop, outgoing) edges; these become the labels
of new edges. Now you can discard the victim node. Repeat until only the start
and goal nodes remain, and you're done (the parallel start -> goal edges just
become alternations).
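The collapsing procedure above is usually called state elimination. Below is a minimal Python sketch of it. It is not the commenter's tool. It assumes an NFA with no ε-moves whose symbols are single alphanumeric characters. The fresh `START`/`GOAL` nodes and every function name are hypothetical. Parallel edges are merged into alternations as they appear, so the final start -> goal label is already the answer.

```python
import re
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical fresh node names; they must not clash with the NFA's own states.
START, GOAL = "START", "GOAL"

def union(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Alternation of two parallel edge labels; None means 'no edge yet'."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    return f"({a}|{b})"

def is_group(r: str) -> bool:
    """True if r is one parenthesized group, e.g. '(a|b)' but not '(a)(b)'."""
    if not (r.startswith("(") and r.endswith(")")):
        return False
    depth = 0
    for i, c in enumerate(r):
        depth += (c == "(") - (c == ")")
        if depth == 0 and i < len(r) - 1:
            return False
    return True

def star(r: str) -> str:
    """Kleene star of a self-loop label, adding parentheses only when needed."""
    return r + "*" if len(r) == 1 or is_group(r) else f"({r})*"

def nfa_to_regex(transitions: Iterable[Tuple[Hashable, str, Hashable]],
                 start: Hashable, accepts: Iterable[Hashable],
                 order: List[Hashable]) -> Optional[str]:
    """State elimination; `order` must list every NFA state exactly once."""
    edges: Dict[Tuple[Hashable, Hashable], str] = {}
    for u, sym, v in transitions:
        edges[(u, v)] = union(edges.get((u, v)), sym)
    edges[(START, start)] = ""  # epsilon edge into the old start state
    for f in accepts:
        edges[(f, GOAL)] = ""   # epsilon edges out of each accepting state
    for victim in order:
        loop = edges.pop((victim, victim), None)
        mid = star(loop) if loop is not None else ""
        ins = [(u, r) for (u, v), r in edges.items() if v == victim]
        outs = [(v, r) for (u, v), r in edges.items() if u == victim]
        for u, _ in ins:
            del edges[(u, victim)]
        for v, _ in outs:
            del edges[(victim, v)]
        # Every (incoming, self-loop, outgoing) triple becomes a bypass edge.
        for u, r_in in ins:
            for v, r_out in outs:
                edges[(u, v)] = union(edges.get((u, v)), r_in + mid + r_out)
    return edges.get((START, GOAL))

if __name__ == "__main__":
    ends_in_ab = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)]
    pattern = nfa_to_regex(ends_in_ab, 0, [2], [0, 1, 2])
    print(pattern)                               # (a|b)*ab
    print(bool(re.fullmatch(pattern, "babab")))  # True
    ab_star = [(0, "a", 1), (1, "b", 0)]         # accepts (ab)*
    print(nfa_to_regex(ab_star, 0, [0], [1, 0])) # (ab)*
    print(nfa_to_regex(ab_star, 0, [0], [0, 1])) # (|a(ba)*b)
```

With `ab_star`, eliminating state 1 first yields `(ab)*`. Eliminating state 0 first yields the equivalent but longer `(|a(ba)*b)`. This is a small instance of the order sensitivity the comment goes on to describe.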

Surprisingly, the length of the resulting regex varied dramatically with the
order in which nodes were collapsed. Finding the _minimum_ regex was an
unexpected challenge.
Has anyone explored regexp minimization?

Incidentally it's not the case that Thompson and subset construction "are the
basis of every lexer and regexp engine there is." It's not even true of the
regexp engine inside the browser you're using now!
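One concrete way to see the point uses backreferences. Python's `re` module and JavaScript's `RegExp` both support the `\1` syntax. A backreference can recognize a language that no finite automaton can, such as the "squares" ww, so an engine offering that feature cannot be just Thompson's construction plus subset construction. The pattern and helper names below are illustrative.

```python
import re

# \1 must repeat the captured text verbatim, so this matches exactly the
# "squares" ww for nonempty w over {a, b}: a non-regular language.
SQUARE = re.compile(r"([ab]+)\1")

def is_square(s: str) -> bool:
    """Regex check for s = ww using Python's backtracking `re` module."""
    return SQUARE.fullmatch(s) is not None

def is_square_direct(s: str) -> bool:
    """Direct check for comparison: nonempty, even length, two equal halves."""
    half = len(s) // 2
    return len(s) > 0 and len(s) % 2 == 0 and s[:half] == s[half:]

if __name__ == "__main__":
    for s in ["abab", "abba", "aa", "aba"]:
        print(s, is_square(s), is_square_direct(s))
```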

~~~
bane
Very cool, do you know of any tool that does something similar (provide
examples, get regex)?

~~~
jraph
I am not sure this is what you are looking for, but I have been working on
Aude (AUtomata DEmystifier), an open source pedagogical application targeted
at CS teachers and students that works in browsers without installation (but
one can download and run it offline too).

The aim is to visualise and manipulate automata, including conversions between
them and regular expressions. Happy to take feedback!

[https://aude.imag.fr](https://aude.imag.fr)

~~~
AlchemistCamp
That's a neat project! What inspired you to start it?

~~~
jraph
Thanks for the kind words.

Well, I was a student and needed to practice for the corresponding course,
"Languages and Automata", before the final exam. So I implemented the
algorithms from the course and used Graphviz to render the results. The thing
worked in a browser but ran on the server (using D!). I figured my fellow
students or the teacher might find it useful, so I sent the link to the teacher
(in a postscript to a very long email in which I was asking for help, ha ha).

"Your tool seems really interesting. How about continuing to develop it during
an internship this summer?"

Hell, yes. And Aude started.

Since then, I have been lucky enough to mentor several interns working on Aude
with this teacher.

~~~
AlchemistCamp
That is very cool and I'll bet having a steady flow of later cohorts of
students helped a ton!

~~~
jraph
I would not call it a steady flow: there were two or three students for two
months each summer over the last three years, but they helped a lot anyway :-)

And it was fun to work with them. The first "cohort" of three students called
themselves the Aude Team and they put that in their internship report.

------
2sk21
This is an example of an important piece of knowledge that typically only
programmers with a formal education in computer science are taught. This
theorem is taught in all formal language courses.

~~~
stcredzero
_This is an example of an important piece of knowledge that typically only
programmers with a formal education in computer science are taught._

This is the 21st century. Why aren't we teaching stuff like this in middle
school? It wouldn't require an entire course. Just cover FSMs and stack
machines, then let people know that there's stuff beyond that. This would
get people a basic level of the "mental furnishings" to deal with notions of
computational complexity, beyond the level of total fuzzy "woo." In 2019,
ordinary people need to know something about this stuff to make decisions at
work and to vote!

Yes, just the most basic, minimally concrete idea of how this stuff works
would be valuable for society. When Ford automobiles were new, some new car
owners would try to "fix" their cars by hanging bulbs of garlic under the
hood. Today, we find this quaint, magical thinking, because everyone has at
least a bare, minimally concrete idea of how a car engine works. Are you
annoyed at the general ignorance of politicians around technical issues? Well,
it's analogous to politicians with that level of automotive ignorance passing
motor vehicle laws.

~~~
dgb23
I agree in general but I find this piece of knowledge to be too theoretical
and specific (the equivalence of regex & finite automata).

I find it much more important to know how networking and the Internet work,
basic client-server architecture, and maybe some things about authentication.
These topics seem much more tangible and important to non-technical people
than CS theory.

~~~
pingyong
Pretty sure this equivalence was more or less the first sentence in the regex
tutorial I read during high school...

------
joppy
That’s a very cool construction of regular expressions, just using matrices
and semirings. I loved the explanation of why F and G were as they were as
well. I had two comments:

> Assume we have a regular language (R, 1, ., 0, +, * )

Shouldn’t this data (a semiring with a *-operation, or “closure”) be not one
regular language but _all_ regular languages? A single regular language
should be an element of R.

I also think that the power series A* = 1 + A + A^2 + ... looks more like
1/(1-A) than e^A, which makes sense given the defining equation A* = 1 + A(A*),
but that’s not much of an issue since neither of these properly makes sense in
this semiring.
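The 1/(1-A) reading can be checked numerically with matrices. Over the reals, when the series converges (here A's eigenvalues are both 0.5), the partial sums of I + A + A² + … approach (I − A)⁻¹, the matrix analogue of 1/(1 − A). The Boolean semiring has no subtraction or division, but the same sum still makes sense there. It stops changing after finitely many terms, at the reflexive-transitive closure of the graph. This is a minimal sketch with illustrative helper names, not the article's construction.

```python
import operator
from functools import reduce

def mat_mul(X, Y, add, mul, zero):
    """Matrix product over a semiring given by its addition, multiplication and zero."""
    return [[reduce(add, (mul(X[i][k], Y[k][j]) for k in range(len(Y))), zero)
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def mat_add(X, Y, add):
    return [[add(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n, one, zero):
    return [[one if i == j else zero for j in range(n)] for i in range(n)]

def star_partial_sum(A, add, mul, zero, one, terms):
    """1 + A + A^2 + ... + A^(terms - 1): a truncation of A* = 1 + A(A*)."""
    total = power = identity(len(A), one, zero)
    for _ in range(terms - 1):
        power = mat_mul(power, A, add, mul, zero)
        total = mat_add(total, power, add)
    return total

def bool_star(A):
    """Boolean A*: add powers until the sum stops changing (reflexive-transitive closure)."""
    total = power = identity(len(A), True, False)
    while True:
        power = mat_mul(power, A, operator.or_, operator.and_, False)
        new_total = mat_add(total, power, operator.or_)
        if new_total == total:
            return total
        total = new_total

if __name__ == "__main__":
    A = [[0.5, 0.25], [0.0, 0.5]]
    # Should be close to (I - A)^-1 = [[2, 1], [0, 2]].
    print(star_partial_sum(A, operator.add, operator.mul, 0.0, 1.0, 80))
    chain = [[False, True, False], [False, False, True], [False, False, False]]
    print(bool_star(chain))  # reachability along paths of length >= 0
```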

------
NHQ
What is the regex for a given state of Game of Life?

~~~
pubby
Game of Life isn't a finite automaton. It has an infinite number of states (and
is Turing-complete, too).

~~~
NHQ
Is that why my neural network can't make any sense of GoL progression? 2^512
possible rules...
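The 2^512 figure works out as follows. A 3x3 neighborhood has 9 cells, so there are 2^9 = 512 neighborhood configurations. A rule assigns each configuration a next state, which gives 2^512 possible rule tables, and Conway's B3/S23 is one of them. The hedged Python sketch below also runs a glider on an unbounded grid. The glider keeps moving and never repeats a configuration, which is one way to see why the whole board is not a finite-state machine even though each cell follows a finite local rule. The names `conway`, `step`, and `CONWAY_TABLE` are hypothetical.

```python
from itertools import product

def conway(hood):
    """Conway's B3/S23 rule; `hood` is a 3x3 neighborhood as 9 bits, row by row, center at index 4."""
    center = hood[4]
    neighbors = sum(hood) - center
    return 1 if neighbors == 3 or (center == 1 and neighbors == 2) else 0

# All 2**9 = 512 possible 3x3 neighborhoods.
NEIGHBORHOODS = list(product((0, 1), repeat=9))
# A rule is any map from these 512 neighborhoods to {0, 1}, so there are 2**512 rules.
CONWAY_TABLE = {hood: conway(hood) for hood in NEIGHBORHOODS}

def step(live, table):
    """One generation on an unbounded grid; `live` is a set of (x, y) cells.
    Assumes the table maps the all-dead neighborhood to 0, so only cells near live ones can change."""
    candidates = {(x + dx, y + dy) for (x, y) in live for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    return {
        (x, y)
        for (x, y) in candidates
        if table[tuple(int((x + dx, y + dy) in live) for dy in (-1, 0, 1) for dx in (-1, 0, 1))]
    }

if __name__ == "__main__":
    print(len(NEIGHBORHOODS), "neighborhoods")
    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    g = glider
    for _ in range(4):
        g = step(g, CONWAY_TABLE)
    print(g == {(x + 1, y + 1) for (x, y) in glider})  # moved one cell diagonally
```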

