you're missing the forest for the trees. the library this demo uses for audio encoding (ggwave) was not made by the creators of this demo. speed (or lack thereof) aside, a direct audio<->text encoding is much more computationally efficient than speech<->text generation.

on the subject of encoding efficiency, the ggwave repo mentions the use of reed-solomon error correction to make transmission more reliable. i'm struggling to find any info on error correction used by bell 103 or other modems, but if they aren't as robust, that could partially explain the discrepancy you're describing.
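
to put rough numbers on what error correction costs, here's a minimal sketch of the standard reed-solomon arithmetic: an RS(n, k) code sends n symbols per k data symbols and can correct up to (n - k) / 2 corrupted symbols. the (255, 223) parameters and the 300 bps raw rate are assumptions picked for illustration only; i don't know what parameters ggwave actually uses, and the function names are made up for this example.

```python
def rs_properties(n: int, k: int) -> dict:
    """Basic properties of a Reed-Solomon RS(n, k) code over byte symbols."""
    if not 0 < k < n <= 255:
        raise ValueError("need 0 < k < n <= 255 for byte-sized symbols")
    parity = n - k  # redundant symbols added to each block
    return {
        "parity_symbols": parity,
        "correctable_errors": parity // 2,  # max corrupted symbols per block
        "code_rate": k / n,                 # fraction of the block that is data
    }


def effective_bitrate(raw_bps: float, n: int, k: int) -> float:
    """Payload throughput left after spending (n - k) / n of the channel on parity."""
    return raw_bps * k / n


# Hypothetical parameters, NOT taken from ggwave.
props = rs_properties(255, 223)
payload_bps = effective_bitrate(300, 255, 223)
print(props, round(payload_bps, 1))
```

so with those assumed parameters, roughly an eighth of the channel goes to parity. a scheme with no error correction keeps the full raw rate but has nothing to fall back on when symbols get corrupted.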