Hacker News new | past | comments | ask | show | jobs | submit login
What is the enlightenment I'm supposed to attain after studying finite automata? (stackexchange.com)
336 points by netvarun on Nov 24, 2013 | hide | past | favorite | 75 comments

I have to say I am very happy with this discussion and the "data structures in the Linux Kernel"

Such theoretical basis (and even their practical considerations) seems much more important than "What are the advantages of latest fad.js"

Agreed, with the caveat that I can only handle a couple of posts this deep before feeling intimidated and longing for something more lightweight. I like the mix of articles because if all of them were this deep it would feel like I don't belong here without years of prerequisite study.

I totally support this statement. Longterm thinking is based on first principles.

[edit] Should be based on first principles.

This requires empirical support.

This post inspired me to up-vote. It's amazing how popular hacker news tends to be among the people I talk to but how few votes it can take to be on the front page.

Note that both the top answers are from the same person.

The answer on SE is still very theoretical. Yes, technically we can model any computation [1] as an FSM, but when does it actually make sense to do so?

FSMs are often used when modelling physical devices - think microwaves, elevators, CD-players, something with buttons or sensors, something event-driven, embedded controllers, also network protocols, text parsers, etc. Basically, whenever we must ensure correct behavior having few constraints on sequence or timing of inputs.

How can you actually apply it in your everyday programming? Look at your class. Every method call is an event that can cause a transition. Member variables are the state. Theoretically changing any bit of any member variable gives you a different state. In practice you only care about the bits that affect your control flow. Figuring out how variables map to states is the tricky part. Sometimes one variable corresponds to several states (say, when airspeed goes over V1 a plane transitions from can abort to must takeoff state), sometimes several variables define one state (say, when ignition=on and door=closed car is ready to drive), sometimes you'll need to add an extra variable to distinguish between states. Then you draw a table with events on one side and states on another and look at each cell. This is when you have all those precious "oh, crap, what if the user hits play when the song is already playing" moments. And voila - now you know that your class behaves correctly because you've systematically considered every possible scenario. It takes enough effort that you probably don't want to do it for each and every class, but when you do it - it's magic :-)

[1] Any computation with finite memory, to be pedantic

I did EE, before switching to CS, and we covered state machines for digital design in some depth, without learning automata theory. So, I don't think it's strictly necessary to know Theory of Computation to be able to model state practically and effectively.

This is a great comment. Thank you.

I have had the opposite question from one posted to SE: What enlightenment will I attain when I can move beyond thinking of every solution in programming as a FSM? I have never understood the obsession with higher level languages with objects, classes, methods, etc. I can't seem to think beyond lex, regular expressions and case statements.

You comment is reassuring. Maybe I'm already emlightened. And I just don't know it.

Encapsulation is what you're missing. OO programming only makes sense once you're working on a codebase too big to understand. I mean, I guess you could think of each object as its own FSM, but that's not particularly enlightening - in part because well designed objects don't have multiple possible states (rather, the state is captured in what they are. It's a hard notion to accept, like learning to use infinitesimals in mathematics - you've spent so long convincing yourself that this intuitive notion is wrong, and then suddenly you discover there's a way to make that intuitive notion correct after all, but you resist it because it seems like it's a messy, bad intuition). Yes, technically there are states to the whole system - but you need some way to factorize it into smaller systems if you're going to make any sense of it.

There is a real enlightenment to be got from really understanding OO, and it's one that many supposedly good programmers seem never to reach. Keep seeking it.

"... only makes sense once you're working on a codebase too big to understand."

My aim is to never reach this point. When I can no longer understand it, it's too big.

That said, I will remain curious about OO.

No-one fully understands their programs; even if you understand all the code, few people understand the language implementation, fewer understand the underlying OS libraries, and I'm pretty sure no-one understands every level of the hardware (I've seen guys who were lead on some section of an Intel processor say they didn't understand the whole thing).

Now sure, you can say that when a single code project becomes too big to understand then it should be split up into a bunch of libraries, or a bunch of independent services. Which is true up to a point. But I find there's value in having a more granular continuum of decoupling scales - method, class, package, module, library, service - which OO makes possible. And I'd argue that many of these kind of libraries and services - particularly the REST style - are inherently OO.

I more drawn to assembly languages than OO languages.

Less granules to deal with. :)

I welcome limitation. It is what fuels my creativity.

I make no attempt to persuade others to mimic my personal tastes.

The link does not answer the question. The answer is: there is no enlightenment you're supposed to attain immediately after studying finite automata. You've created the possibility for enlightenment further down the road.

Studying finite automata equips you with a bunch of knowledge about their behavior, which equips you with a bunch of knowledge about the behavior of a variety of other subjects, because they can be projected onto automata (can be reduced to automata, can be morphed into automata, ...).

The link lists a bunch of properties of, and relations between, various kinds of FA's and explains how these are useful to other areas of CS and even to regular programming. But you cannot be expected to have these thoughts as sudden epiphanies. At most you can at a later time realize a certain property exists, realize the same property exists for FA's and realize that makes sense, because of the relation between your subject and FA's. You can then further realize that you know immediately know a lot more about the subject as well. Some of these realizations may strike you with a physically tingling sensation of aesthetically pleasantness which some call 'enlightenment', but YMMV.

The question did not stipulate an "immediate" enlightenment. If that's what you wanted fine, but the question as asked was answered just fine.

The list in the 1st answer is amazing, but the 2nd highest answer has one of my favorite things:

> Turing machines (we think) capture everything that's computable. However, we can ask: What parts of a Turing machine are "essential"? What happens when you limit a Turing machine in various ways? DFAs are a very severe and natural limitation (taking away memory). PDAs are a less severe limitation, etc. It's theoretically interesting to see what memory gives you and what happens when you go without it. It seems a very natural and basic question to me.

I like to think of it the other way, though. A DFA with a stack is a PDA. A DFA with two stacks is a Turing machine. Looking at the 3 different classes as DFAs with 0, 1, or 2 stacks is just so beautiful...

Adding to the above:

While in most algorithms books, we treat integer operations (addition, multiplication) as an O[1] operation, most do not even mention that this is only because these integers are constrained in size to a limited number of bits. Storing an integer in mathematics would require infinite amount of memory. Analyzing the Big-O complexity of the algorithms is too much more fun when you take the integer fixed-width assumption out.

A single integer can serve as a "stack" from the perspectives of computer science. Without loss of generalization, assume that the set of symbols has digits 1..9. Now push operation is multiplication by ten and addition of the digit, while pop is taking the units digit and dividing by ten.

In other words, if you are using a single integer in your program, and do not want to simply return incorrect answers or throw exceptions due to overflow errors, then your algorithm may already be beyond an NDFA. This would for example happen if you are trying to match parenthesis in a string like "(()())(((())))". You can easily do this by having an integer in your program, reading the string from left to right, incrementing the integer when you encounter a '(' and decrement for a ')'. The computer science books would however show the above problem to be unsolvable using a NDFA. (You have solved the problem for all practical purposes since the strings you'll encounter may never be that long, but a computer with finite memory is in principle incapable of solving this "correctly" for all possible inputs.)

How is a Turing Machine a DFA with two stacks? I never learned it like that in my Theory of Computation course...

The informal answer is that you can treat two stacks like one tape. The top of the first stack is the current location on the tape. You can move the tape head to the left by popping from Stack 1 and pushing that value onto Stack 2. Pop from Stack 2 and push on Stack 1 to move to the right.

What's really funky is that a DFA with a queue is a Turing machine. You can show this by putting a sentinel value in the queue that separates two stacks and having some crazy loop structures that allow you to push and pop from the two stacks it represents.

Another way of saying this is that the two stacks as a list zipper[1], which makes the correspondence between a DFA with arbitrarily writable memory and a DFA with two stacks readily apparent.


See Ullman's book [1] for more details on formal proofs.

Basically you can emulate the Turing machine's tape on the left side of the head with one stack and on the right side with the second stack.

An important question to now ask would be what happens if the tape in the machine has more than two directions; say for example it has a two-dimensional tape. The head may move left, right, but also up and down. It can be proven that such a machine would not be any more capable than the Turing machine with a 1D tape. Again see Ullman's book for details.

[1] http://www.amazon.com/Introduction-Automata-Theory-Languages...

split the tape into 2 halves; the tape to the left of the head is one stack, the tape to the right of the head is the other stack.

Head moving long the tape is just popping from one stack and pushing onto the other. The head state (and other bookkeeping) can be encoded in the DFA.

If you're interested in learning Automata from Prof. Jeff Ullman of Stanford you may join the current session on Coursera. What's best about this course is that Prof. Ullman is always present in discussion forums, and helps everyone learn and enjoy great discussions. Join here https://www.coursera.org/course/automata

Edit: I'm currently learning there, and enjoy it a lot. Especially in video 6, you may listen to the stories of Junglee startup, Started in the mid-90’s by three of his students, Ashish Gupta, Anand Rajaraman, and Venky Harinarayan. Goal was to integrate information from Web pages. Bought by Amazon when Yahoo! hired them to build a comparison shopper for books. They made extensive use of regular expressions to scrap..."

The Coursera Automata course is awesome. Ullman's active participation in the current second offering is part of it. But what really makes the course for me is that Ullman's presentations are relaxed. He knows the material, he knows where various aspects of the material rank in terms of importance to the learner, he has a feel for where students will get hung up, and he has no need to prove anything.

Ultimately, this gives the class the feel of a graduate seminar. It is designed so that student's can readily succeed in learning the material rather than structured around the idea of weeding out weaker students. It assumes that anyone who wants to learn the material can with enough effort, and that that learning is really all that matters.

In addition to the lectures what has made the course interesting to me is the vibrant, active "Discussion Forum" associated with this course. Ullman himself frequently posts explanation on some of the topics here. It seems there are some takers of this course who are well versed with Automata theory. Some of them are quite active in the discussion forum providing good explanation to many of the trickier concepts introduced in the course (which has been immensely valuable to me).

Automata is the third Coursera course I've taken [the first session of Introduction to Systematic Program Design and the current session of Programming Languages are the other two].

One of the features [or bugs depending on how you look at it] is that a lot of the forum participation is by people who have largely mastered the course material. This can be great for someone who is ok when the smartest person in the room turns out to be someone else [HN helps me develop this habit]. But at times, it is intimidating for students coming to the material for the first time or who are struggling.

The discussions in the Automata course are way out of my league...plus I'm not fully invested in keeping pace with the material right now. In the Programming languages course, I participate, but I find posts saying "this homework was really easy" unhelpful when I've spent twenty or more hours wrestling with the week's problem. I imagine that it can be downright discouraging to someone struggling just to pass and unable to devote as much time as it takes to solve it.

That said, the discussions in Automata are first class - right down to the formatted mathematical expressions, and I find the topic fascinating in and of itself.

> The Coursera Automata course is awesome.

Oh yes. I have a haphazard knowledge of Chomsky hierarchy, and this course is structuring it beautifully. It also fills a number of holes.

Today's latest success: I finally "get" what an LL(1) grammar is! And why so many programming languages have regular tokens and LL(1) grammars.

I bet I'll understand the fundamentals of Yacc before the course is over.

If you are interested in going deeper, may I recommend "Parsing Techniques" by Dick Grune et al [1]. I have read this book over five times and still find gems hidden that I must have failed to grasp in a prior reading.

The author provides the PDF of the first edition on his website [2]. Read the last paragraph of page 60 of the book (page 50 of the PDF). I read this again whenever coming in doubt.

A great thing about the book is also its annotated bibliography. The author cites hundreds of papers for extended reading, but has also written a paragraph on each cited paper explaining the context around it, how it fits with the rest of the subject, and a quick summary of what were the key contributions and findings.

[1] http://www.amazon.com/Parsing-Techniques-Practical-Monograph...

[2] http://dickgrune.com/Books/PTAPG_1st_Edition/

In the current iteration of the course, the first programming assignment's code quality is awful, sadly.

More info on Junglee can be found in Peter Norvig's website. He was the first employee at Junglee, before becoming head of research at Google:


From my dim and probably exaggerated memory of a visit, his cubicle was pimped out witch-doctor style. (They had a jungle theme around the office floor in general.)

For my Comp Sci degree I wrote a regexp/automata library. It allowed you to create/modify/load/save/execute NFAs, and the same kinds of functions for a basic regexp language (character classes, ., Kleene star *, ?, etc) by first converting the regexp into an NFA (with epsilon transitions) as each regexp operation maps to a small NFA so they can be bolted together (with epsilon transitions).

You could then (attempt to) convert NFAs into a DFA using Myhill-Nerode theorem to determinise it; I say attempt as it was possible to make it consume huge resources if you gave it the right/wrong kind of regexp. It also did minimisation and a few other bits and bobs.

To say this gave me a good understanding of regular expressions is an understatement, and once you understand the basics then all of the parts of more complex regular expression languages become a lot easier! That was my enlightenment.

I had come across this answer some time back, but got to revisit it again while reading another one of the author's (Vijay D. - http://www.eecs.berkeley.edu/~vijayd/#about ) answers trending on HN (https://news.ycombinator.com/item?id=6787836)

Additional discussion: https://news.ycombinator.com/item?id=6788969

I originally went into my Theory of Computation class this year grunting and moaning about it was a waste of time and how I would rather be writing actually software. Now I actually kind of enjoy the class and can see its use. I find the discussion on Decidability to probably be one of the most important part. Although it's a little disappointing to find out that there are some problems that we will never be able to solve.

Coming from an EE background I remember lots of emphasis over the techniques for converting finite state machines into HDL code. For me, it's really cool to see the topic approached from the opposite direction, so to speak.

This. Micron's in memory automata is going to eat your lunch on many algorithms. Bye bye Van Neuman bottleneck. http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd...

ROBDD: Reduced Ordered Binary Decision Diagram

A flow chart with a truth table, which is used to map all of the possible ways to arrive at one or zero by carrying out a set of steps, which may follow recursive paths, such that the similarities of different paths to identical outcomes can be better understood.

Where (

  reduced = all roads lead to one or zero
) and (

  ordered = applying good organization to 
            the steps of the decisions in 
            the flow chart, even when steps 
            in the decision tree can be 
            carried out in a variety of ways


I'd say that finite automata give practical answers to many parsing problems which would be difficult otherwise.

A modern processor has a lot in it, but you can understand it as a finite automaton. Just this realisation made computer architecture and processor design far less intimidating to me. It also shows that, in practice, if you structure and manipulate your states carefully, you can get very far with finite automata.

I don't understand this one. Surely a modern processor is like a Turing machine and not a finite state automata. For example, a DFA/NFA can't express simple algorithms like matching parentheses, while obviously a modern processor can.

The processor plus the RAM is a Turing machine. If you decide in advance what input go to the input pins, regardless of the output, and you just record what goes out the output pins, the processor is pretty much a DFA.

When you add the RAM, you provide a feedback loop: input depends on previous outputs (in plainspeak, when you write to the RAM, you can read it later). No feedback loop, no Turing machine.

The processor chip itself is designed as a DFA (as is very common in digital logic design though EEs prefer the term finite state machine or FSM). These slides do a good job of explaining this perspective on digital design. [1]

It becomes much more powerful when given the ability to read and write from arbitrary storage, they same way theoretical DFAs become turing machines when given the same power.

I am glossing over the complications that on-chip caches and memory hierarchy introduce, but I feel the core idea of a processor chip as a DFA is still worthwhile.


It's basically because you don't have infinite storage -- I could, given enough time, feed your PC a string that it wouldn't be able to tell was unbalanced. Of course, to do that I would have to overflow your available storage which would take a while.

For all practical purposes, your PC adequately models a Turing machine. In point of literal fact, it doesn't because it only has finite storage. In practical usage, it is sometimes helpful to consider various states your code may enter and how they interact. For understanding how the system actually works, the author appears to have found the state-switching model to be a useful one, and indeed I understand that it's common to model the internal state of various devices using finite automata.

A PC can still read and write from arbitrary memory locations which means it can know a lot about it's past states, so the memory limit makes it a deterministic linear-bounded automata.


But the LBA is limited to the size of the input string, while a PC has a fixed but insanely large number of states (2^{number of bits of storage}) and can't in theory accept arbitrarily large strings.

In the somewhat twisted view of the machine I'm putting forward, it's not accessing 'arbitrary memory locations' as memory locations, rather the content of that memory location is part of the input state.

Oh yeah-- Without studying finite automata, writing my compiler would have sucked.

[edited] would have sucked even more

You might realise that parsing HTML or C with regular expressions is a bad idea.

Parsing C or HTML with regular expressions is a great idea. Parsing C or HTML with only regular expressions isn't.

It's tricky with HTML but can be done. The C language was one of it's initial use cases.

Scanning the C language is done using regular expressions, typically. Parsing the resulting token stream cannot be done with only regular expressions.

At least not with the theoretical construct. Some "regular expressions" libraries, notably PCRE, let you do plenty that goes beyond the theory at the expense of some efficiency. Parsing some context-free grammars with these libraries is certainly possible, but you're not really restricting yourself to "just" regular expressions at that point. Regular expressions plus interwoven use of a stack is a pushdown-automata, which can totally parse C and HTML.

No, it can't. As in, you cannot parse HTML or C with regular expressions, and you can prove that that's the case. Regexes simply aren't powerful enough. This is one of the advantages of studying formal languages -- you get to learn about really useful models of computation (here, DFA's), but you also get to learn about their limits.

I think the insight is that each formalism is defined by/equivalent to the input language that it is able to parse;

State machines can do regular expressions, but they can't parse matching brackets,

Stack machines can do BNF grammars - but they don't have counters; also Stack machines are more general than state machines; each regular expression can be expressed by an equivalent BNF grammar.

Turing machines can parse Perl - that means any language that depends on the context; this is because Turing machines have counters/memory. Also Turing machines are more general than stack machines; each BNF grammar can also be parsed by a Turing machine.

All this business of deterministic/non deterministic machines is about efficiency of the parser; nice to know.

on examinations of code it's normally pretty easy to see who has a formal understanding of state machines, and who doesn't.

Would you please elaborate. My experience has been that formal understanding of theory (say of state machines) equips one to handle complex problems and in no way indicates how well the code is written. For example, a GUI application code (say) making use of a framework wouldn't give any indication of whether the author had any formal understanding of state machines or not. In my opinion to write a good readable code one need not have a formal understanding of CS theory, but the understanding of CS theory is paramount in writing code for complex problems (say writing a compiler).

7 booleans to try and maintain state instead of a single enum value

happens quite a lot in my experience

GUI application code actually is one that often can be very clearly thought as a DFA.

In particular, the UI often is non-static and can be in certain states, with the effect of user actions (and possibility of certain actions) depending on that state. If it is implemented or "grown" ad-hoc, it sometimes may turn into a jumble of interdependent code and flag/visibility variables, but very often the exact same behaviour can be modelled and implemented much simpler if you think of it as a DFA.

For example, the UI of recently popular single-page webapps, where JS is used to change 'screens' without page load, or change functionality by hiding/adding large parts of the page, can often be naturally be thought of as DFA, planned (on paper or wireframes) as states='screens' and transitions between them, and then implemented as DFA.

Thinking of UI as a DFA also forces you to consider the 'proper' effect of every user action in every state, which is just good UX practice. For example, in a classic url-based web app you may draw a graph/process of how an user proceeds through making a purchase on your site. Usually it will include only the typical process - but if you think of it as a DFA, you define behavior for actions such as 'requested url /ordervalidation/step2-address' at every state, since the result likely should be different depending on if the user went there from step1 by clicking 'ok, i want to buy this', step3 by clicking the back button, or two weeks later by clicking on a link that he accidentally bookmarked.

Can you please elaborate?

What's the best book on this? (I'd prefer something non-academic)

Chapter 10 of the book "Foundations of Computer Science" (http://i.stanford.edu/~ullman/focs.html) which is available online provides an excellent introduction to automata theory. After that you can start referring to " Introduction to Automata Theory, Languages, and Computation" by Ullman, Hopcraft and Motwani. This is also the reference book for the Automata course (and taught by Ullman) currently being available in Coursera.

I wrote a non-academic book on this subject for the HN crowd: http://computationbook.com/

I took a quick look at Tom Stuart's book and it looks great (thanks for sharing that, Tom! something new on my reading list).

Somewhere in between that and Ullman's stuff in terms of formality is Sipser's Introduction to the Theory of Computation. I like its explanations of the DFA material more than the free Ullman book online, though I haven't read Ullman's dedicated Automata title. (A little too rich for my blood...)

New appreciation for the goto statement.

That nature computes (many, if not all natural processes can be thought of as computational in some respect), and that many computational systems can be thought of as at least analogs of physical or biological processes.

See for example: http://www.uc09.uac.pt/JamesCrutchfield.php

I'd say after learning about different classes of automata, you should understand what a compiler does in general. At least, that's what I took from it.

Some problems are best solved using FA. Pac mac? Vending machine? ATM machine? Once you have the rules. You can be confident of no bugs.

>Turing machines (we think) capture everything that's computable.


eh?! we think?

Isn't the definition of computable, "something that can be reached by a Turing machine?"

There's good evidence that Turing machines do capture the right notion of computability. You can enhance a Turing machine in many ways without changing what it can compute. Further, Turing machines, the lambda calculus, recursive functions, cellular automata, and register machines all give the same notion. Even a quantum computer can be simulated by a Turing machine.


That's the definition when "computable" is used in a formal setting. What we believe is that it is also equivalent to our informal, intuitive understanding of the term.

...cannot be played on record player B.

Suffix Trees. That kind of hyper efficient recognition is even harder to think about without the finite automata theory behind regular expressions.

I posted this above. Micron is releasing a new NFA card http://www.micron.com/about/innovations/automata-processing

Any guess on which formerly expensive NFA algorithms will be put to use on it in short order?

I didn't study finite automata at school, but I studied cellular automata on my own. If anyone would like to describe the difference (if there is one), it'd be appreciated.

W.r.t cellular automata, I made this video, and if this isn't enlightenment, I don't know what is: https://vimeo.com/80147310

Cellular automata are one of the things you study in automata theory, which is the study of different models of computation.

Although cellular automata are of some interest (Game of Life is Turing-complete, etc.), I wouldn't say they're nearly as important as the other models of computation you'd study in a normal undergraduate class (DFA's/NFA's, PDA's, Turing Machines).

Thanks! Much obliged for the explanation (and damn anyone who downvotes me for an expression of gratitude!).

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact