It's true that BNF grammars can define non-regular languages, and so can't be parsed with an FSM. I wouldn't expect Leslie Lamport to have overlooked that fact, though... Skimming, I think what's being described are infinite state machines. FTA:
"For example, a BNF grammar can be described by a state machine whose states are sequences of terminals and/or non-terminals. The set of initial states contains only the sequence consisting of the single starting non-terminal. The next-state relation is defined to contain <s, t> iff s can be transformed to t by applying a production rule to expand a single non-terminal."
Isn't a stack just a "sequence of terminals and/or non-terminals"? (Or as you suggested, an infinite state machine where the states are all possible sequences of terminals and/or non-terminals.)
A pushdown automaton's stack alphabet is arbitrary (so you could make it "terminals union nonterminals" but you don't have to) and finite (so you could not make it "all sequences" without a length cap).
I suppose you could represent a PDA as an infinite state machine but the "stack" abstraction is kept around because it's clearer/easier to reason about. See, for example, the proof that CFLs are accepted by NPDAs and its correspondence with LR0 parsing.
Eh... for BNF grammars, you'd need a stack. Or an infinite state machine.