Sometimes you need just an FSM, without an attached process/thread with a mailbox/queue (i.e. a passive FSM).
In that case you would have an API to dispatch events and query the current state.
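Here is a minimal sketch of such a passive FSM. It assumes a hypothetical module `passive_fsm`, which is not part of OTP, and uses a turnstile as the example machine. The FSM is an ordinary Erlang map. The owning process (e.g. a gen_server) keeps that map in its own state and updates it by calling `dispatch/2`, so no extra process or mailbox is involved. The names `new/2`, `dispatch/2`, `state/1` and `turnstile/2` are illustrative only.

```erlang
%% passive_fsm.erl: hypothetical passive FSM (illustration only).
%% The FSM is a plain value; there is no process, mailbox or queue behind it.
-module(passive_fsm).
-export([new/2, dispatch/2, state/1, turnstile/2]).

-type transition_fun() :: fun((atom(), term()) -> {next_state, atom()} | ignore).
-type fsm() :: #{state := atom(), transition := transition_fun()}.

%% Create an FSM from an initial state name and a transition function.
-spec new(atom(), transition_fun()) -> fsm().
new(InitialState, TransitionFun) ->
    #{state => InitialState, transition => TransitionFun}.

%% Dispatch an event synchronously, in the caller's own process.
-spec dispatch(term(), fsm()) -> fsm().
dispatch(Event, Fsm = #{state := StateName, transition := Fun}) ->
    case Fun(StateName, Event) of
        {next_state, Next} -> Fsm#{state := Next};
        ignore -> Fsm                      % event not handled in this state
    end.

%% Query the current state.
-spec state(fsm()) -> atom().
state(#{state := StateName}) -> StateName.

%% Example transition function: a coin unlocks, a push locks again.
-spec turnstile(atom(), term()) -> {next_state, atom()} | ignore.
turnstile(locked, coin) -> {next_state, unlocked};
turnstile(unlocked, push) -> {next_state, locked};
turnstile(_State, _Event) -> ignore.
```

For example, `passive_fsm:dispatch(coin, passive_fsm:new(locked, fun passive_fsm:turnstile/2))` returns an FSM whose state is `unlocked`. Inside a gen_server, the map would simply live in the server's state, and `handle_call/3` or `handle_cast/2` would pass incoming events to `dispatch/2`.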
For example, if you have a gen_server that uses an FSM, with gen_fsm you end up with two processes per client. With a passive FSM module you would save one process per client and could handle roughly twice as many simultaneous clients.
The fact that the gen_server uses a gen_fsm, and that this requires two processes, has little or no effect on the number of simultaneous clients you can have. Unless I am missing something, I am not sure why you are drawing that correlation.
When you are handling millions of Comet clients per node, every byte counts. Even though Erlang processes are lightweight, each one still takes about 300 words of memory, plus its State.
An FSM is an abstract concept. Once it is wrapped in an actor, it becomes something else and is no longer a pure FSM. gen_fsm is closer to the "Capsule" design pattern from Rational Rose, except that a Capsule may have several message queues, while gen_fsm has only one.
You can also mask/unmask events in a Capsule.
When I first learned Erlang many years ago, I assumed every process would have a built-in FSM as a first-class citizen (this is what I expected from a programming language designed by a telecom company).
What I would also like from an FSM module is the ability to just specify a state transition table (STT), including on-entry/on-exit actions. The callback functions would then only handle the transition logic. Currently gen_fsm is very verbose for large FSMs, and I am sure it is easy to introduce bugs when coding STTs by hand.
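One way the table-driven style could look is sketched below as a hypothetical `stt_fsm` module. It is not an OTP behaviour, and the names `new/4`, `dispatch/2`, `state/1`, `data/1` and the hook keys are assumptions made for illustration. The STT is plain data: a list of `{FromState, Event, ToState, Action}` tuples. On-entry/on-exit actions are looked up in a map keyed by `{entry, State}` or `{exit, State}`. All actions are assumed to be funs over a user-supplied data term, which keeps the FSM passive and easy to test. Two further design assumptions apply. Events with no matching row are ignored. Exit and entry actions run on every matched transition, including self-transitions.

```erlang
%% stt_fsm.erl: hypothetical table-driven passive FSM (illustration only).
-module(stt_fsm).
-export([new/4, dispatch/2, state/1, data/1]).

%% new(InitialState, InitialData, Table, Hooks)
%%   Table: [{FromState, Event, ToState, ActionFun}], ActionFun(Data) -> Data
%%   Hooks: #{{entry, State} | {exit, State} => fun(Data) -> Data}
new(InitialState, InitialData, Table, Hooks) ->
    #{state => InitialState, data => InitialData,
      table => Table, hooks => Hooks}.

%% Find the first row matching {CurrentState, Event}; if there is one, run
%% exit(From), then the transition action, then entry(To).
dispatch(Event, Fsm = #{state := From, data := Data0, table := Table}) ->
    case [{T, A} || {F, E, T, A} <- Table, F =:= From, E =:= Event] of
        [] ->
            %% No row for this state/event pair: the event is ignored.
            Fsm;
        [{To, Action} | _] ->
            Data1 = run_hook({exit, From}, Fsm, Data0),
            Data2 = Action(Data1),
            Data3 = run_hook({entry, To}, Fsm, Data2),
            Fsm#{state := To, data := Data3}
    end.

state(#{state := State}) -> State.

data(#{data := Data}) -> Data.

%% Run an optional on-entry/on-exit hook; a missing hook leaves Data unchanged.
run_hook(Key, #{hooks := Hooks}, Data) ->
    case maps:find(Key, Hooks) of
        {ok, Fun} -> Fun(Data);
        error -> Data
    end.
```

With this shape, adding a state or transition means adding a row to the table rather than another callback clause. The callbacks themselves only contain the action logic.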
I think most people don't use gen_fsm because it is very basic and verbose, not because there is no need for a built-in FSM behaviour in OTP.