Such a theoretical basis (and even its practical considerations) seems much more important than "What are the advantages of the latest fad.js?"
 It should be based on first principles.
FSMs are often used when modelling physical devices - think microwaves, elevators, CD players, anything with buttons or sensors, anything event-driven, embedded controllers - and also network protocols, text parsers, etc. Basically, whenever we must ensure correct behavior while having few constraints on the sequence or timing of inputs.
How can you actually apply it in your everyday programming? Look at your class. Every method call is an event that can cause a transition. Member variables are the state. Theoretically, changing any bit of any member variable gives you a different state. In practice you only care about the bits that affect your control flow. Figuring out how variables map to states is the tricky part. Sometimes one variable corresponds to several states (say, when airspeed goes over V1, a plane transitions from a "can abort" to a "must take off" state), sometimes several variables define one state (say, when ignition=on and door=closed, the car is ready to drive), and sometimes you'll need to add an extra variable to distinguish between states. Then you draw a table with events on one side and states on the other and look at each cell. This is when you have all those precious "oh, crap, what if the user hits play when the song is already playing" moments. And voila - now you know that your class behaves correctly, because you've systematically considered every possible scenario. It takes enough effort that you probably don't want to do it for each and every class, but when you do it - it's magic :-)
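As a minimal sketch of that idea (the player, its states, and its events are all hypothetical, not anyone's real code):

    from enum import Enum

    class PlayerState(Enum):
        STOPPED = "stopped"
        PLAYING = "playing"
        PAUSED = "paused"

    class Player:
        def __init__(self):
            # The one member variable whose bits affect control flow.
            self.state = PlayerState.STOPPED

        def play(self):
            # The "user hits play while already playing" cell of the
            # table: here the deliberate answer is "do nothing".
            if self.state in (PlayerState.STOPPED, PlayerState.PAUSED):
                self.state = PlayerState.PLAYING

        def pause(self):
            # Pausing only makes sense while playing; otherwise ignore.
            if self.state is PlayerState.PLAYING:
                self.state = PlayerState.PAUSED

        def stop(self):
            # Legal from every state, so no guard is needed.
            self.state = PlayerState.STOPPED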
 Any computation with finite memory, to be pedantic
I have had the opposite question from the one posted to SE: what enlightenment will I attain when I can move beyond thinking of every solution in programming as an FSM? I have never understood the obsession in higher-level languages with objects, classes, methods, etc. I can't seem to think beyond lex, regular expressions, and case statements.
Your comment is reassuring. Maybe I'm already enlightened and I just don't know it.
There is a real enlightenment to be got from really understanding OO, and it's one that many supposedly good programmers seem never to reach. Keep seeking it.
My aim is to never reach this point. When I can no longer understand it, it's too big.
That said, I will remain curious about OO.
Now sure, you can say that when a single code project becomes too big to understand then it should be split up into a bunch of libraries, or a bunch of independent services. Which is true up to a point. But I find there's value in having a more granular continuum of decoupling scales - method, class, package, module, library, service - which OO makes possible. And I'd argue that many of these kind of libraries and services - particularly the REST style - are inherently OO.
Fewer granules to deal with. :)
I welcome limitation. It is what fuels my creativity.
I make no attempt to persuade others to mimic my personal tastes.
Studying finite automata equips you with a bunch of knowledge about their behavior, which equips you with a bunch of knowledge about the behavior of a variety of other subjects, because they can be projected onto automata (can be reduced to automata, can be morphed into automata, ...).
The link lists a bunch of properties of, and relations between, various kinds of FAs and explains how these are useful to other areas of CS and even to regular programming. But you cannot be expected to have these thoughts as sudden epiphanies. At most you can at a later time realize a certain property exists, realize the same property exists for FAs, and realize that this makes sense because of the relation between your subject and FAs. You can then further realize that you immediately know a lot more about the subject as well. Some of these realizations may strike you with a physically tingling sensation of aesthetic pleasantness, which some call 'enlightenment', but YMMV.
> Turing machines (we think) capture everything that's computable. However, we can ask: What parts of a Turing machine are "essential"? What happens when you limit a Turing machine in various ways? DFAs are a very severe and natural limitation (taking away memory). PDAs are a less severe limitation, etc. It's theoretically interesting to see what memory gives you and what happens when you go without it. It seems a very natural and basic question to me.
I like to think of it the other way, though. A DFA with a stack is a PDA. A DFA with two stacks is a Turing machine. Looking at the 3 different classes as DFAs with 0, 1, or 2 stacks is just so beautiful...
While most algorithms books treat integer operations (addition, multiplication) as O(1) operations, most do not even mention that this is only because those integers are constrained in size to a fixed number of bits. Storing an arbitrary mathematical integer would require an unbounded amount of memory. Analyzing the big-O complexity of algorithms is much more fun when you take the fixed-width integer assumption out.
A single integer can serve as a "stack" from the perspective of computer science. Without loss of generality, assume that the set of symbols is the digits 1..9. The push operation is then multiplying by ten and adding the digit, while pop is taking the units digit and dividing by ten.
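In code (a sketch; Python's unbounded integers make the trick honest, and 0 is reserved here to mean the empty stack):

    def push(stack: int, digit: int) -> int:
        assert 1 <= digit <= 9      # 0 is reserved for "empty stack"
        return stack * 10 + digit

    def pop(stack: int) -> tuple[int, int]:
        # Returns (top symbol, rest of the stack).
        return stack % 10, stack // 10

    s = 0               # empty stack
    s = push(s, 3)      # s == 3
    s = push(s, 7)      # s == 37
    top, s = pop(s)     # top == 7, s == 3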
In other words, if you are using a single integer in your program, and you do not want to simply return incorrect answers or throw exceptions due to overflow errors, then your algorithm may already be beyond an NDFA. This would happen, for example, if you were trying to match parentheses in a string like "(()())(((())))". You can easily do this by keeping an integer, reading the string from left to right, incrementing the integer when you encounter a '(' and decrementing it for a ')'. The computer science books would however show the above problem to be unsolvable using an NDFA. (You have solved the problem for all practical purposes, since the strings you'll encounter may never be that long, but a computer with finite memory is in principle incapable of solving this "correctly" for all possible inputs.)
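The matcher being described, as a sketch:

    def balanced(s: str) -> bool:
        # The single unbounded integer is exactly the memory an NDFA lacks.
        depth = 0
        for ch in s:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:       # a ')' with no matching '(' so far
                    return False
        return depth == 0

    assert balanced("(()())(((())))")
    assert not balanced("(()")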
What's really funky is that a DFA with a queue is a Turing machine. You can show this by putting a sentinel value in the queue that separates two stacks and having some crazy loop structures that allow you to push and pop from the two stacks it represents.
Basically you can emulate the Turing machine's tape on the left side of the head with one stack and on the right side with the second stack.
An important question to ask now is what happens if the tape in the machine has more than two directions; say, for example, it has a two-dimensional tape. The head may move left and right, but also up and down. It can be proven that such a machine would not be any more capable than the Turing machine with a 1D tape. Again, see Ullman's book for details.
The head moving along the tape is just popping from one stack and pushing onto the other. The head state (and other bookkeeping) can be encoded in the DFA.
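A hedged sketch of that two-stack tape (the names and representation are mine, just to make the pop/push picture concrete):

    BLANK = "_"

    class Tape:
        def __init__(self, contents: str):
            # `left` holds cells left of the head (top = nearest cell);
            # `right` holds the cell under the head and everything after.
            self.left: list[str] = []
            self.right: list[str] = list(reversed(contents)) or [BLANK]

        def read(self) -> str:
            return self.right[-1] if self.right else BLANK

        def write(self, symbol: str) -> None:
            if not self.right:
                self.right.append(BLANK)
            self.right[-1] = symbol

        def move_right(self) -> None:
            # Pop the current cell off one stack, push it onto the other.
            self.left.append(self.right.pop() if self.right else BLANK)

        def move_left(self) -> None:
            # Same trick in reverse; blanks appear if we run off the end.
            self.right.append(self.left.pop() if self.left else BLANK)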
Edit: I'm currently learning there, and enjoying it a lot. In video 6 especially, you can listen to the story of the Junglee startup: "Started in the mid-90's by three of his students, Ashish Gupta, Anand Rajaraman, and Venky Harinarayan. Goal was to integrate information from Web pages. Bought by Amazon when Yahoo! hired them to build a comparison shopper for books. They made extensive use of regular expressions to scrape..."
Ultimately, this gives the class the feel of a graduate seminar. It is designed so that students can readily succeed in learning the material, rather than being structured around the idea of weeding out weaker students. It assumes that anyone who wants to learn the material can with enough effort, and that that learning is really all that matters.
One of the features [or bugs depending on how you look at it] is that a lot of the forum participation is by people who have largely mastered the course material. This can be great for someone who is ok when the smartest person in the room turns out to be someone else [HN helps me develop this habit]. But at times, it is intimidating for students coming to the material for the first time or who are struggling.
The discussions in the Automata course are way out of my league...plus I'm not fully invested in keeping pace with the material right now. In the Programming languages course, I participate, but I find posts saying "this homework was really easy" unhelpful when I've spent twenty or more hours wrestling with the week's problem. I imagine that it can be downright discouraging to someone struggling just to pass and unable to devote as much time as it takes to solve it.
That said, the discussions in Automata are first class - right down to the formatted mathematical expressions, and I find the topic fascinating in and of itself.
Oh yes. I have a haphazard knowledge of the Chomsky hierarchy, and this course is structuring it beautifully. It also fills in a number of holes.
Today's latest success: I finally "get" what an LL(1) grammar is! And why so many programming languages have regular tokens and LL(1) grammars.
I bet I'll understand the fundamentals of Yacc before the course is over.
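For anyone else chasing the same "aha": LL(1) means one token of lookahead always picks the next production, so the parser is just one function per nonterminal with no backtracking. A hypothetical toy grammar (not from the course):

    # Grammar:  E -> 'x' E'        E' -> '+' E | (empty)
    def parse_E(tokens: list[str], i: int = 0) -> int:
        assert tokens[i] == 'x'             # E always starts with 'x'
        return parse_Eprime(tokens, i + 1)

    def parse_Eprime(tokens: list[str], i: int) -> int:
        if i < len(tokens) and tokens[i] == '+':    # lookahead decides
            return parse_E(tokens, i + 1)
        return i                                    # the empty production

    assert parse_E(['x', '+', 'x']) == 3    # consumed the whole input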
The author provides the PDF of the first edition on his website. Read the last paragraph of page 60 of the book (page 50 of the PDF). I read it again whenever I'm in doubt.
A great thing about the book is also its annotated bibliography. The author cites hundreds of papers for extended reading, but has also written a paragraph on each cited paper explaining the context around it, how it fits with the rest of the subject, and what its key contributions and findings were.
You could then (attempt to) convert NFAs into a DFA using the subset construction to determinise them; I say attempt because you could make it consume huge resources if you gave it the right/wrong kind of regexp. It also did minimisation (this is where the Myhill-Nerode theorem comes in) and a few other bits and bobs.
To say this gave me a good understanding of regular expressions is an understatement, and once you understand the basics, all of the parts of more complex regular expression languages become a lot easier! That was my enlightenment.
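The determinisation step, as a hedged sketch (a toy representation with epsilon-transitions omitted; `nfa` maps (state, symbol) to a set of states):

    from collections import deque

    def determinise(nfa: dict, start, alphabet):
        # Each DFA state is a frozenset of NFA states.
        start_set = frozenset([start])
        dfa, todo = {}, deque([start_set])
        while todo:
            states = todo.popleft()
            if states in dfa:
                continue
            dfa[states] = {}
            for sym in alphabet:
                # Everything reachable on `sym` from any member state.
                nxt = frozenset(q for s in states
                                  for q in nfa.get((s, sym), ()))
                dfa[states][sym] = nxt
                todo.append(nxt)
        # The resource blow-up mentioned above: up to 2^n subsets.
        return dfa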
Additional discussion: https://news.ycombinator.com/item?id=6788969
A flow chart with a truth table, used to map all of the possible ways to arrive at one or zero by carrying out a set of steps (which may follow recursive paths), so that the similarities of different paths to identical outcomes can be better understood.
reduced = all roads lead to one or zero
ordered = good organization applied to the steps of the decisions in the flow chart, even when steps in the decision tree can be carried out in a variety of ways
I don't understand this one. Surely a modern processor is like a Turing machine and not a finite state automaton. For example, a DFA/NFA can't express simple algorithms like matching parentheses, while obviously a modern processor can.
When you add the RAM, you provide a feedback loop: input depends on previous outputs (in plainspeak, when you write to the RAM, you can read it later). No feedback loop, no Turing machine.
It becomes much more powerful when given the ability to read and write arbitrary storage, the same way theoretical DFAs become Turing machines when given the same power.
I am glossing over the complications that on-chip caches and memory hierarchy introduce, but I feel the core idea of a processor chip as a DFA is still worthwhile.
For all practical purposes, your PC adequately models a Turing machine. In point of literal fact, it doesn't because it only has finite storage. In practical usage, it is sometimes helpful to consider various states your code may enter and how they interact. For understanding how the system actually works, the author appears to have found the state-switching model to be a useful one, and indeed I understand that it's common to model the internal state of various devices using finite automata.
In the somewhat twisted view of the machine I'm putting forward, it's not accessing 'arbitrary memory locations' as memory locations; rather, the content of that memory location is part of the input state.
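A toy illustration of that feedback loop (an entirely hypothetical machine): the "DFA" step function is pure, mapping (state, word read) to (new state, word to write); wiring its output back through the RAM is what gives the combined system its extra power.

    def step(state: str, word: int) -> tuple[str, int]:
        # A pure transition function: memory contents are just input.
        if state == "inc":          # read a counter, write it incremented
            return "check", word + 1
        if state == "check":        # loop until the counter reaches 3
            return ("halt", word) if word >= 3 else ("inc", word)
        return "halt", word

    ram = {0: 0}
    state, addr = "inc", 0
    while state != "halt":
        state, ram[addr] = step(state, ram[addr])   # the feedback loop
    print(ram[0])   # -> 3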
[edited] would have sucked even more
At least not with the theoretical construct. Some "regular expression" libraries, notably PCRE, let you do plenty that goes beyond the theory, at the expense of some efficiency. Parsing some context-free grammars with these libraries is certainly possible, but you're not really restricting yourself to "just" regular expressions at that point. Regular expressions plus interwoven use of a stack form a pushdown automaton, which can totally parse C and HTML.
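For instance, PCRE-style recursion (shown here with Python's third-party `regex` module, which supports it) matches balanced parentheses, something a strictly regular expression provably cannot do; (?R) re-applies the whole pattern, which is exactly the hidden stack:

    import regex   # third-party module with PCRE-style extensions

    # A balanced group: '(' then non-parens or nested groups, then ')'.
    balanced = regex.compile(r'\((?:[^()]|(?R))*\)')

    print(bool(balanced.fullmatch("((())())")))   # True
    print(bool(balanced.fullmatch("(()")))        # False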
State machines can do regular expressions, but they can't parse matching brackets.
Stack machines can do BNF grammars, but they don't have counters. Stack machines are also more general than state machines; each regular expression can be expressed by an equivalent BNF grammar.
Turing machines can parse Perl - that is, any language that depends on context - because Turing machines have counters/memory. Turing machines are also more general than stack machines; each BNF grammar can also be parsed by a Turing machine.
All this business of deterministic vs. non-deterministic machines is about the efficiency of the parser; nice to know.
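The middle rung of that ladder, as a sketch: a BNF grammar for matched brackets parsed by recursion, where the language's call stack plays the role of the stack machine's stack.

    # Grammar:  S -> '(' S ')' S | (empty)
    def parse_S(s: str, i: int = 0) -> int:
        if i < len(s) and s[i] == '(':
            i = parse_S(s, i + 1)           # inner S
            if i >= len(s) or s[i] != ')':
                raise ValueError("unmatched '('")
            return parse_S(s, i + 1)        # trailing S
        return i                            # the empty production

    def matches(s: str) -> bool:
        try:
            return parse_S(s) == len(s)
        except ValueError:
            return False

    assert matches("(()())(((())))")
    assert not matches("(()")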
happens quite a lot in my experience
In particular, the UI is often non-static and can be in various states, with the effect of user actions (and the possibility of certain actions) depending on that state. If it is implemented or "grown" ad hoc, it sometimes turns into a jumble of interdependent code and flag/visibility variables, but very often the exact same behaviour can be modelled and implemented much more simply if you think of it as a DFA.
For example, the UI of recently popular single-page webapps, where JS is used to change 'screens' without a page load, or to change functionality by hiding/adding large parts of the page, can often naturally be thought of as a DFA, planned (on paper or wireframes) as states = 'screens' and transitions between them, and then implemented as a DFA.
Thinking of UI as a DFA also forces you to consider the 'proper' effect of every user action in every state, which is just good UX practice. For example, in a classic url-based web app you may draw a graph of how a user proceeds through making a purchase on your site. Usually it will include only the typical process - but if you think of it as a DFA, you define behavior for actions such as 'requested url /ordervalidation/step2-address' in every state, since the result likely should be different depending on whether the user got there from step1 by clicking 'ok, I want to buy this', from step3 by clicking the back button, or two weeks later by clicking a link that they accidentally bookmarked.
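A minimal sketch of that checkout flow as an explicit transition table (the states, events, and fallback policy are all hypothetical):

    TRANSITIONS = {
        ("cart",          "buy_clicked"): "step1_login",
        ("step1_login",   "ok"):          "step2_address",
        ("step2_address", "ok"):          "step3_payment",
        ("step3_payment", "back"):        "step2_address",
    }

    def handle(state: str, event: str) -> str:
        # Every (state, event) cell is defined: pairs missing from the
        # table deliberately fall back to the cart instead of erroring.
        return TRANSITIONS.get((state, event), "cart")

    assert handle("step3_payment", "back") == "step2_address"
    assert handle("expired_session", "visit_step2") == "cart"  # bookmark case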
Somewhere in between that and Ullman's stuff in terms of formality is Sipser's Introduction to the Theory of Computation. I like its explanations of the DFA material more than the free Ullman book online, though I haven't read Ullman's dedicated Automata title. (A little too rich for my blood...)
See for example: http://www.uc09.uac.pt/JamesCrutchfield.php
eh?! we think?
Isn't the definition of computable "something that can be computed by a Turing machine"?
Any guess on which formerly expensive NFA algorithms will be put to use on it in short order?
W.r.t cellular automata, I made this video, and if this isn't enlightenment, I don't know what is: https://vimeo.com/80147310
Although cellular automata are of some interest (Game of Life is Turing-complete, etc.), I wouldn't say they're nearly as important as the other models of computation you'd study in a normal undergraduate class (DFA's/NFA's, PDA's, Turing Machines).