Espresso: A Fast End-to-End Neural Speech Recognition Toolkit (arxiv.org)
86 points by sel1 23 days ago | 11 comments



Everyone is racing to build an ASR toolkit these days, and this one is actually a good candidate compared to the others. It uses good modern technology: subword pieces, parallel training, fast decoding. Decoding accuracy is very competitive.

Disadvantages are:

1) A somewhat disorganized codebase with directly imported fairseq
 
2) No online decoding in the design, which is a must for real-world applications.

Other toolkits:

1) ESPnet - crazy dual Chainer/PyTorch backend, pretty slow from the beginning, otherwise good.
 
2) Mozilla DeepSpeech - very lightweight technology, no real accuracy or speed.

3) nvidia/NEMO - potentially good performance from GPU experts, but not clear how it will develop in the future

4) speechbrain - just announced, no real code

5) facebook/wav2letter - C++ codebase, outside the general NN community
 
6) tensorflow/lingvo - a playground for Google guys, who uses tensorflow these days?
 
7) kaldi - the good old one (if 7 years is old for you), still has very important features others do not have (semi-supervised learning, long alignment). But again no PyTorch, so not very attractive to the general NN community.

8) didi/delta - did anyone try it at all?

9) PaddlePaddle/DeepSpeech - very old technology too, but Baidu releases very good models trained on their proprietary data


> who uses tensorflow these days?

I'm working on a speech model in tensorflow. What should I be using?


It depends more on which features you have already implemented. Check the arXiv paper; if you do not already have all of those features (lookahead LM, proper sentencepiece, label smoothing), consider Espresso.


I think the question was more about which framework should be preferred over tensorflow now.

Or at least that's my question - as someone who has taken an interest in this area since the deepdream days, but is only now considering diving in fully, what platforms should I be looking at, if not tensorflow?


As you can see, there is no single framework; to dive in fully you need to explore all of them. Each has some useful features.


Yeah, I find it a bit ridiculous how many there are... it's like every major speech group has their own now.


Entirely OT, but I'm getting a bit tired of the coffee-based naming. Some other software named Espresso:

ESPResSo is a highly versatile software package for performing and analyzing scientific Molecular Dynamics many-particle simulations of coarse-grained atomistic or bead-spring models as they are used in soft matter research in physics, chemistry and molecular biology.

Espresso, for people who make delightful, innovative and fast websites — in an app to match. Espresso helps you write, code, design, build and publish with flair and efficiency.

Quantum ESPRESSO is an integrated suite of Open-Source computer code for electronic structure calculations and materials modeling at the nanoscale.

And the list goes on and on ...


Tech company names in general have completely lost the plot.

Looking ANYTHING up in search engines these days is completely and utterly derailed by tech companies (/programming languages/frameworks/etc) who insist on naming themselves after common words used in everyday language.

Go, Espresso, Vanilla, Box, Square, Stripe, Express, Next, Angular, Feather, Mint, it goes on forever.

Each one insists on branding themselves just with that word, using it on its own and poisoning search results and online content the world over.


I'm pretty sure that is affected by the search engine (google?) knowing enough about you to guess that you're probably interested in programming languages/frameworks.

But having said that I am suddenly hit with the urge to make something popular and useful that I will name something like God, Sex, or Pizza.



Thank you. I was looking for this



