As much as I sometimes criticize FB and Google over privacy issues, they also do a lot of good by releasing open source systems. Most of my work involves using TensorFlow and Keras, and a little over two years ago I replaced a convolutional text classification model with fastText, with good results.
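In case it's useful to anyone, here's roughly what that swap looks like with the fastText Python bindings. This is just a minimal sketch; the file names and hyperparameters are placeholders, not what I actually shipped:

    import fasttext

    # Training file format: one example per line, label prefixed with __label__
    # e.g. "__label__positive great product, works as advertised"
    model = fasttext.train_supervised(
        input="train.txt",   # placeholder path to labelled data
        epoch=10,
        lr=0.5,
        wordNgrams=2,        # word bigrams usually help on short texts
    )

    print(model.predict("this was surprisingly easy to set up"))
    print(model.test("valid.txt"))  # (num examples, precision@1, recall@1)

The appeal is that there's basically no architecture to tune; most of the work is in preprocessing and labels.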
Disclaimer: I wrote the TF RNN API.
Convolutions are cheap to compute and can be more efficient on smaller data, but it's also possible that a CNN outperforms when you have a small dataset and an RNN outperforms when you have a larger one.
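To make the trade-off concrete, here's a rough Keras sketch of the two variants being compared; the vocabulary size, widths, and class count are made up, and only the encoder swap matters:

    import tensorflow as tf

    VOCAB_SIZE = 20_000   # assumed vocabulary size
    NUM_CLASSES = 5       # assumed number of labels

    def make_model(use_cnn):
        # Same skeleton; only the sequence encoder differs
        inputs = tf.keras.Input(shape=(None,), dtype="int32")
        x = tf.keras.layers.Embedding(VOCAB_SIZE, 128)(inputs)
        if use_cnn:
            # Cheap to compute; each filter only sees a local window of tokens
            x = tf.keras.layers.Conv1D(128, 5, activation="relu")(x)
            x = tf.keras.layers.GlobalMaxPooling1D()(x)
        else:
            # More expensive, but carries state across the whole sequence
            x = tf.keras.layers.LSTM(128)(x)
        outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
        return tf.keras.Model(inputs, outputs)

Everything else (tokenization, training loop, data size) stays fixed, which is what makes the small-vs-large dataset comparison meaningful.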
Disclaimer: I co-founded SigOpt and wasted way too much of my PhD on "graduate student gradient descent"
While CNNs are tempting and fast to train, I've never been able to get the accuracy from them that I can from RNNs. In NLP, accuracy is important because for lots of tasks NLP sits right at the inflection point of being useful... but only if it's good enough.
It's worth noting that this TCN paper gets a perplexity of 45.19 on WikiText-103. That was competitive in 2015.
The current state of the art is 29.2, not their claim of 48.4 (unclear where that number came from).
Still, CNNs are nice in an ensemble model if that's your thing. They do tend to pick up on different things than RNNs, which can be useful.
Edit: I now understand why their reported metrics are so wrong. They compare against generic models rather than the state of the art. They do list SOTA performance in their supplementary material (though they still get those numbers wrong).
Please point to more substantive sources if I am wrong. I am very interested to make TCN work, as it is much faster.
You realize that they do this so top researchers still want to work for them?
* I'm confused by why you bring up tax-payer funded education... do you think these employees don't pay taxes?
* What is your definition of "valuable"?
The HN crowd doesn't accept that, but most people here have no idea how business works.
1. Give free tools to people
2. Drive adoption of the tools
3. ??? (Data-driven pivot)
The public benefits from it, and they get to hire researchers that want to continue that work. It sounds like a win-win?
It's like our economy and its effect on the climate. Win-win for consumers and companies, but (without intervention) a downward spiral for our planet.
Also, it puts us in a morally difficult situation, because we are benefiting from the very companies we criticize, and as such it is hypocritical.
Of course everyone can do as they please, but in my view it is best to look for moral-issue-free software instead of using BigCorp's candy-ware.
What moral difficulties do you see? My opinion is that these companies are despicable, but taking advantage of their generosity is not hypocritical. Applications and motives can be immoral; tools without human action simply exist.
Tools exist because of human action.
However, pushing it out surely does enhance public knowledge, in the same way that Carmack released his 3D engines that were outdated in the industry by one generation but still helped the public.
Lastly - they need to push it out as they need to attract top talent, and need to demonstrate they have top tech there (and are willing to let their researchers claim credit for it once it becomes old enough).
Perhaps it might help you if you look at it from the perspective that what Facebook has open-sourced here isn't affecting a billion people's privacy and it's not being willfully used as a tool of intimidation and propaganda by governments.
That's just a couple thoughts on why other posts might attract more comments.
They are demonstrably very good at using technology and their omnipresence against their users (to monetize them without their knowledge).
(Notice how much they emphasize privacy and security in their marketing...)
Specifically they claim word error rates that are 1 to 2 percentage points lower, 3.44% on "clean" and 11.24% on "other".
https://github.com/facebookresearch/wav2letter/issues/93 also mentions a pre-trained model, but without any reference to which one or where to find it.
Googling "librispeech-glu-highdropout.bin" still shows the text "luajit ~/wav2letter/test.lua ~/librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir ~/librispeech-proc/ -dictdir ~/librispeech-proc/ -gfsai ..." for https://github.com/facebookresearch/wav2letter/blob/master/R..., but clicking it, it's gone.
But the Google Cache still has the result, including 3 pre-trained models:
It would be great if anybody could build it all and try whether the out-of-the-box experience with the pretrained models is good.
I've tried Mozilla's DeepSpeech a few times, but so far it hasn't reliably recognised "this is a test" without mistakes out of the box, even from a good microphone.
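For reference, this is roughly how I've been driving it via the Python package (the model/scorer file names are just whatever you downloaded from their releases page, and the exact API has shifted a bit between versions):

    import wave
    import numpy as np
    from deepspeech import Model

    # Released model files from the DeepSpeech releases page (names are placeholders)
    ds = Model("deepspeech-model.pbmm")
    ds.enableExternalScorer("deepspeech-scorer.scorer")

    # DeepSpeech expects 16 kHz, 16-bit, mono PCM
    with wave.open("this_is_a_test.wav", "rb") as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

    print(ds.stt(audio))  # hoping for "this is a test"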
And so FB for instance can send some voice data to their servers and get a text output. And then FB can use text sentiment analysis to get further context about the message.
Sadly, most people don't have the speech data to train their own recognizers for large-vocabulary systems, and that's even harder for languages other than English. With the exception of Google/Amazon/FB/Microsoft/Baidu/etc., everyone else has to use the APIs offered by those companies to do high-fidelity recognition. Which sucks, because there is a cost to each recognition: you have to pay someone else to do it.
Whereas FB/Amazon/MS/Baidu/etc. can do high-fidelity recognition offline on large vocabularies and offer it as a service. THIS is why FB wants to make speech recognition systems.
I wonder if you could bootstrap a sizable speech dataset by trawling audio off YouTube and then using one of the really good cloud speech recognition services to label it. :)
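Something like the sketch below, maybe. It assumes you've already pulled the audio down (e.g. with youtube-dl) and converted it to 16 kHz mono WAV, and it glosses over the per-minute cost, the roughly one-minute limit on synchronous requests, and the ToS questions:

    import io
    from google.cloud import speech

    client = speech.SpeechClient()

    def pseudo_label(wav_path):
        # Use a commercial recognizer to produce a noisy transcript as a training label
        with io.open(wav_path, "rb") as f:
            audio = speech.RecognitionAudio(content=f.read())
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code="en-US",
        )
        response = client.recognize(config=config, audio=audio)
        return " ".join(r.alternatives[0].transcript for r in response.results)

You'd then want to filter on the returned confidence scores so the worst pseudo-labels don't poison the training set.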
There have been much better, larger datasets available for a long time, for example the Fisher English conversational telephone speech corpus was released in 2004 and contains ~1950h of transcribed speech. There are tons of other datasets in various languages and for various applications (conversational speech, broadcast transcription, etc.).
Edit: It's even less: the listing shows $0.00 for "1993 Member".
Is the implication that offline Android recognition does not train on the owner's voice at all? I imagine a lot of phones these days are at least as powerful as the Pentium 200s used to train (successfully!) Dragon Dictate et al 20+ years ago.
Secondly, when I say "train", it is in a totally different context than how you seem to be using the term. You are using it in the context of adapting an acoustic model to an individual speaker to improve performance. I am talking about building the initial model. Typical RNN- or even convolution-based algorithms require a lot of time and processing power to train. What's even harder to get than the processing power, though, is of course the data to train on.
Thirdly, the trained model itself is very big just to store, and inference against it is also resource-intensive. This is why Android/Google Maps/search/etc. go out to Google's backend recognition servers for speech-to-text before falling back on the shitty (but relatively good) offline model (which may not even be using state-of-the-art speech recognition techniques and may still be using old-school GMM-based recognizers).
Finally, the large models trained on the backend servers using their distributed computing infrastructure are far more accurate than the shitty fallback model, so speaker-dependent adaptations aren't necessary. If you can get very good performance from a speaker-independent model, why would you put in the extra effort to make speaker-dependent adaptations when the gain is marginal? Not to mention that speaker-independent models are more useful in more situations and are extremely powerful. Google, for instance, can caption videos automatically using speech recognition, which is amazing. If the models were speaker-dependent, they wouldn't be able to do that. That's why the focus has shifted so much towards speaker-independent models.
I totally disagree. Compared to Sphinx it is still lightyears better.
To wit: I use it for my Android-based home-automation voice recognition, and even from a distance with background noise it still works with better than 90% accuracy. My original tests with Sphinx in a similar environment got about 30%.
There’s also the new Portal hardware.
Facebook also has a research arm dedicated to playing Go/StarCraft; what do you think their reason is for doing that?
They can use a speech recognition system to transcribe videos, just like YouTube does, to improve ad targeting and recommendations. Why is this hard to understand?
Is that true? I don't have the stats, but I'd guess a very small percentage of YouTube videos were uploaded 'for profit'. Like 1% or less? Maybe much less.
From a software standpoint, this has never been proven. However, it's super weird when it happens to you.
For example, I traveled to meet a coworker who was playing a mobile game I had never seen before and we talked about it. I never Googled it or anything like that.
Hours later I checked Instagram and the first ad was for the same mobile game. Coincidence?
Perhaps the game was simply advertised more in his city than my own?
Perhaps our phones being near each other prompted a "friend request suggestion", and then ad targeting took that to another level using installed apps?
Or just a coincidence and I am thinking too much about it. lol.
“Ah, but there’s an undetectable binary blob that gets linked in and called without being detected by anyone working on the code,” you say. In that case consider the battery life impact. The power consumption of the Facebook app compares favorably to its social media peers. Is everybody else also recording, encoding and encrypting all the time?
So what? Thousands of the same people have access to the full source code of everything underhanded Facebook do and it hasn’t stopped anything.
Facebook has explicitly denied spying on people's conversations. I can't think of any situations where they have flat out lied about something like that, so I'm curious to hear what conspiracy theories they have proven true.
Shadow profiles for example.
It’s a complete mischaracterisation that Facebook shared private messages with Spotify. By the same logic, you could say Google is sharing your emails with Apple when you access Gmail using the iOS Mail client.
Spotify was offering an integrated client to FB’s chat service. This UI integration was a market failure and was discontinued years ago. Of all the things wrong with Facebook, this wasn’t worth the noise.
Scrolling through a social media feed is a heavy activity on a phone (which may be surprising because it seems passive). Constantly fetching more data from servers, decoding incoming images and videos in background threads, shuffling data to GPU-accessible buffers for fast scrolling, etc. — There’s a lot going on all the time when you’re scrolling mindlessly.
Is this accurate? I haven't written C++ since freshman year of college, and it was very cumbersome then.
Who said that, anyway? I don't see it in the text linked by the OP link.
Will be interesting to see what people can do with this and the available data sets.
Reminder that Mozilla's Common Voice project accepts voice donations! https://voice.mozilla.org/
Kabylia is a region in the north of Algeria mostly inhabited by Berber people who are bilingual in Algerian Arabic and Kabyle. In recent years, an independence movement has developed that emphasizes Kabyle over Arabic for reasons of internal cohesion. To confuse matters, there's also a pan-Berber movement denying the existence of a separate Kabyle language, classifying it as a dialect of Berber/Tamazight instead.
Those heated politics have led to a large number of Kabyles contributing to various linguistic corpus projects to gain visibility for their cause. E.g. trying to overtake Berber on https://tatoeba.org/stats/sentences_by_language (As far as I know, Mozilla's Common Voice shares data with the Tatoeba project.)
The recordings are public domain audio books of public domain books, so the licensing should be fine. The audio isn't annotated, but given the value involved I think it would be worth attempting to use forced alignment to annotate the recordings with their public domain source texts. Forced alignment using the sort of speech recognizer you're trying to train in the first place may be a bit "chicken and the egg", but from some experiments I've run myself existing open source speech recognizers can do it reasonably well. Humans could manually tune up the alignment to improve the quality if necessary.
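As a rough illustration of the alignment step, a library like aeneas (which does TTS-plus-DTW alignment rather than running a full speech recognizer) can be pointed at a LibriVox chapter and the matching Gutenberg text. The paths below are made up:

    from aeneas.executetask import ExecuteTask
    from aeneas.task import Task

    # Align one audiobook chapter against its source text (paths are placeholders)
    config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
    task = Task(config_string=config)
    task.audio_file_path_absolute = "/data/tom_sawyer/chapter_01.mp3"
    task.text_file_path_absolute = "/data/tom_sawyer/chapter_01.txt"   # one sentence per line
    task.sync_map_file_path_absolute = "/data/tom_sawyer/chapter_01_alignment.json"

    ExecuteTask(task).execute()      # runs the alignment
    task.output_sync_map_file()      # writes start/end times for each text fragment

The resulting sync map gives you (start, end, text) triples per sentence, which is close to the utterance-level annotation an ASR training pipeline wants.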
As for motivating people to actually do that mundane work... well these are audio books so maybe the work isn't so mundane after all! The LibriVox recording of Tom Sawyer (read by John Greenman: https://librivox.org/tom-sawyer-by-mark-twain/) is pretty great and has been listened to by millions of people. If somebody created a "read along" web app that showed you the text of the book from Project Gutenberg getting highlighted as the audiobook from LibriVox was played, users who have an interest in reading/hearing the book could have their attention held by Mark Twain and with the right UI provide fine tuning for the forced alignment at the same time.
Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015). LibriSpeech: an ASR corpus based on public domain audio books. Proc. ICASSP. http://www.danielpovey.com/files/2015_icassp_librispeech.pdf
This project doesn't look like it's particularly easy to build: https://github.com/facebookresearch/wav2letter/blob/master/d...
I've been looking into things like mozilla/deepspeech and other open source libraries for automatically converting my messages to text. I'll have to take a look at this project as well!
I probably sound like a shill, but I don't really care. I'm a premium member with 6,000 minutes of transcription time per month (and some months I've used almost all of it), and I couldn't be happier.
You can export everything, support and head of product are kind and responsive, and you can click in the transcription anywhere and it will play the audio at that point.
Exactly what I need.
My main complaint is that it's geared towards corporate environments (conferences, meetings, etc.), so the grouping isn't exactly what I like, but I use my text editor to keep the links organized more to my liking.
Being able to search by word hundreds of hours of my thoughts is a fantastically empowering experience and I hope you find the same.
Let me know what you think! Shoot me an email if you want to chat about it ever. If you can't tell I'm a pretty big fan.
Whether it's ethical to contribute back to the project, knowing that the unethical creator might derive unethical utility from your contributions, is perhaps slightly more complicated. However, the same could be said of any open source project: you could create something new wholly from scratch, and if you release it publicly, somebody else could use it for something unethical.
I commend your consideration of ethical concerns, which I think is lacking in the tech industry today. But in this particular case I don't believe there is too much cause for concern.
If your use of it contributes to its popularity, perhaps making it the standard in a given area, does that give Facebook the company more power and possibly enable other unethical actions?
I think it's probably not as much of a worry given the narrowness of the area, but I do think this is something to consider when it comes to React for example.