There is a difference, though, between a finite state machine and a Turing machine with a finite amount of memory. An FSM cannot recognize context-free languages, but a Turing machine can--even when it's implemented on a computer without infinite memory.
(Actually, you only need a pushdown automaton for context-free languages--I forget what the next step up in the hierarchy is, the one that does require Turing completeness)
A Turing machine without infinite memory can't recognize context-free languages in general. Take the standard example of recognizing the language of balanced parentheses. With finite memory, there's a limit to how many parentheses you can keep track of. That limit is going to be gigantic for any reasonable amount of memory, but it's still a limit. The language of balanced parentheses with a depth limit is no longer a (non-regular) context-free language, but a regular language, and as such can be recognized by a FSM.
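To illustrate (a sketch, with the depth limit picked arbitrarily): once you impose a depth bound d, the current depth is the entire machine state, so a recognizer only needs d + 2 states and is an FSM.

```python
def matches_bounded_parens(s, max_depth=1000):
    """Recognize balanced parens up to a fixed nesting depth.

    The current depth is the whole machine state, so this is
    effectively an FSM with max_depth + 2 states (depths
    0..max_depth plus a reject state).
    """
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
            if depth > max_depth:
                return False  # exceeded the finite state budget
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False  # unmatched close paren
        else:
            return False  # symbol outside the alphabet
    return depth == 0

print(matches_bounded_parens("(()())"))                    # True
print(matches_bounded_parens("(" * 2000 + ")" * 2000))     # False: too deep
```

Note how the second input is perfectly balanced but rejected anyway, which is exactly the point: the bounded language is regular, the unbounded one is not.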
It's trivial to demonstrate that a Turing machine with finite memory is equivalent to a FSM. Enumerate all possible memory contents. This number is, again, large for any reasonable amount of memory, but it is finite. For each possible memory content, enumerate the Turing machine states. Each memory state plus machine state becomes one FSM state. The Turing machine defines a transition from one state to another plus an action on memory, which becomes an FSM transition to the state that encodes both the new machine state and the new memory state.
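To put rough numbers on that construction (a toy example, with all sizes chosen arbitrarily): each (machine state, tape contents, head position) configuration of the bounded machine becomes exactly one FSM state.

```python
machine_states = 4          # control states of a toy machine
cells = 3                   # a 3-cell binary tape
tape_contents = 2 ** cells  # 8 possible tape contents
head_positions = cells      # head can be over any of the 3 cells

# Every configuration of the bounded machine is one FSM state,
# so the product of the three counts is the FSM's state count.
fsm_states = machine_states * tape_contents * head_positions
print(fsm_states)  # 96 -- finite, so an explicit transition table exists
```

The count explodes as the tape grows (2 ** cells dominates), but it is always finite, which is all the equivalence argument needs.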
Not at all true, the language of balanced parens is not regular and cannot be matched by a FSM no matter how much memory you have. It can be matched by a pushdown automaton (PDA).
The primary difference between a PDA and an FSM is that a PDA has a stack which acts as memory. Thus, a PDA can remember how many open parens it has seen so far. Every time it sees an open paren it pushes it onto the stack; when it sees a close paren, it pops. The machine matches if the stack is empty once the input has been consumed.
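In code, that behavior amounts to the following (a sketch; since only open parens ever get pushed, the stack could equally be a counter, but an explicit stack mirrors the PDA formulation):

```python
def pda_matches_parens(s):
    """Match balanced parens with an explicit, unbounded stack."""
    stack = []
    for c in s:
        if c == "(":
            stack.append(c)   # push on open paren
        elif c == ")":
            if not stack:
                return False  # close paren with an empty stack
            stack.pop()       # pop on close paren
        else:
            return False      # symbol outside the alphabet
    return not stack          # accept iff stack is empty at end of input

print(pda_matches_parens("(()())"))   # True
print(pda_matches_parens("(()"))      # False
```

Because the stack grows with the input rather than being fixed in advance, no depth limit is baked into the machine itself.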
The difference between a PDA and a Turing machine is that the stack is replaced by a readable/writable tape which can move in either direction. Notice how in both formulations there is no limit on the memory (the PDA's stack can grow without bound, as can the Turing machine's tape). In contrast, a Finite State Machine doesn't have any memory. The "finite" refers to the number of states, not the amount of memory. Indeed, Turing machines are traditionally formulated with finitely many states as well.
Thus, a FSM can never match the language of matching parens but a Turing machine can. It will either match or consume all of its memory if memory is limited. Memory limitations are not a statement on computational power but rather a statement on feasibility.
EDIT: Your second paragraph is of course correct - you could encode every possible stack as a FSM if there is a bound on the stack. HOWEVER, if you have a limit on memory but it is essentially unknown (you know memory is finite, but you don't know exactly what the limit is), a PDA or a Turing machine will of course be able to match things your FSM cannot (because they can take advantage of memory your FSM cannot assume it has in its states).
Memory limitations are in some sense an implementation detail. We could design a computer to pause while we buy more RAM. Then it could use as much memory (in theory) as the human race could produce. That sounds like infinite for some definition of the word.
> Not at all true, the language of balanced parens is not regular and cannot be matched by a FSM no matter how much memory you have.
But this is not what the grandparent was saying. They said that the language of balanced parentheses with some depth limit d can be matched with a FSM, for any finite d. Similarly, a "Turing machine" with a bounded tape can only recognize a bounded version of the balanced parentheses language.
I'm deeply puzzled as to how you can say that my second paragraph is "of course correct" yet continue to argue that there's a difference.
No, the language of balanced parens is not regular and cannot be matched by a FSM no matter how much memory you have. That's completely true. However, it also cannot be matched by a Turing machine with finite memory, no matter how much memory you have. That is because, as I showed and as you agree is "of course correct", the two machines are equivalent in their capabilities.
Yes, I apologize for the parent comment. I realized I had misread your comment after I had posted and did the edit to try and rectify the situation. I left the original because I thought it might clarify some of the differences between the machines for people who don't know anything about the subject.
I guess I felt that the memory limitation was a misleading way of discussing the machines. Turing machines and PDAs are usually discussed with infinite memory. In practice the input really determines how much memory the machine will use.
Furthermore, I believe that the encoding scheme you suggest will use far more space than the equivalent Turing machine or PDA it encodes, because for every machine state you have to encode every possible memory configuration for that state. That means an exponential explosion in the number of states. This gets to the heart of the matter for me: when memory is bounded, using the finite version of a PDA will let you match a deeper nesting of parens than a FSM because it will use memory more efficiently. You have to put the states of the machine somewhere, and that place is either memory or hardware, both of which are limited.
You're absolutely right that there's a vast practical difference between the two things. This is, of course, why our actual real-world computers are modeled on Turing machines with finite memory, not on pure finite state machines. To model a machine with 1GB of RAM, an FSM would need one state per possible memory configuration (256^(2^30), i.e. roughly 2^(2^33) states), multiplied by however many states are needed to model the CPU.
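A quick sketch of the scale (assuming 1GB = 2^30 bytes of 8 bits, so 2^(8 * 2^30) memory configurations): the state count itself is unwritable, but its number of decimal digits is easy to compute.

```python
import math

bits = 8 * 2 ** 30  # bits in 1 GB of RAM

# The FSM needs 2**bits states, one per memory configuration.
# log10(2**bits) = bits * log10(2) gives the decimal digit count.
decimal_digits = int(bits * math.log10(2)) + 1
print(f"2^{bits} states, a number with roughly "
      f"{decimal_digits:,} decimal digits")
```

Writing the number out at one digit per byte would itself take over 2GB, i.e. more than the memory being modeled.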
However, the difference is purely in practical terms when it comes time to actually build one. In the theoretical world they are equally capable and can recognize the same languages.
If a Turing machine with finite memory is equivalent in power to a FSM, why are non-deterministic linear bounded Turing machines considered so much more powerful than FSMs?
The amount of memory that can be used by an LBA is specified by the input, not by the machine, and there's no limit to how big it can be, only that there has to be some limit.
In short, it still has unbounded memory; you just have to pick how much of it to use when you run it. This differs from an FSM or a Turing machine with finite memory, where the quantity of memory is fixed when you specify the machine rather than supplied with the input.