Meta AI announces Massive Multilingual Speech code, models for 1000+ languages (github.com/facebookresearch)
705 points by crakenzak 15 days ago | 222 comments

I loaded the language coverage into Datasette Lite and added some facets here:


Here's how I did that: https://gist.github.com/simonw/63aa33ec827b093f9c6a2797df950...

Here are the top 20 represented language families:

    Niger-Congo 1,019
    Austronesian 609
    Sino-Tibetan 288
    Indo-European 278
    Afro-Asiatic 222
    Trans-New Guinea 219
    Otomanguean 149
    Nilo-Saharan 131
    Austro-Asiatic 100
    Dravidian 60
    Australian 51
    Creole 45
    Kra-Dai 43
    Uto-Aztecan 41
    Quechuan 36
    Language isolate 35
    Torricelli 32
    Maipurean 31
    Mayan 30
    Sepik 30

Thanks -- great way to visualize how massive this set of languages really is.

All this great work and the prior SOTA translations, and Meta still only accepts North American English voice control in their VR equipment lol.

Based on the availability of STT, TTS, and translation models available to download in that github repo, a real-life babelfish is 'only' some glue code away. Wild times we're living in.

Nah, full-on babelfish is simply not possible. The meaning of the beginning of a sentence can be modified retroactively by the end of the sentence. This means that the Babelfish must either be greatly delayed or awkwardly correct itself every once in a while.

That’s why the babelfish translates brainwaves rather than sounds, which is especially important for communicating with our nonverbal alien neighbors.

It’s also good at translating the vagaries of alien poetry into English

Ah, reminds me of learning German, where you can chuck all the verbs onto the end. There was this sentence we had as a fun toy example, with something like half a paragraph of verbs at the end, and you had to try to match them up with the beginning of the sentence.

Edit: found a reference to it https://www.reddit.com/r/German/comments/ul0xgt/just_for_fun...

A joke I heard, supposedly from the 1880s:

Some travelers stop off at an inn in the Swiss Alps, and they notice that at the various tables at the inn are other nationalities. They amuse themselves by listening in on the conversations (and displaying their stereotypes): the Italians are all talking at the same time very loudly and never stop moving their hands, the French are all arrogant artistes, the Danes are boring and only talk about the weather, and then they notice the Germans.

There was obviously a very important conversation going on with the Germans, because one German would say something and every other German would stop and listen intently until they were done; then another one would start up and everyone would stop and listen intently until that one was done. Following the conversation, though, it became clear this was not because the conversation was important, but because the Germans were waiting to hear the verb to understand what was being said.

And sometimes that modification is not just "we don't know which verb it ends in" but "the whole structure is different than expected":


Sometimes humans, too, don't know what they're going to say until they've said it.

Close enough for practical value. Yes, big downside. But personally I’d use it.

Between the options of "you can't talk to this person" and "your conversation will have some delays", I know which I'd choose.

I wonder how well an LLM could guess the end of the sentence, though. Most of the time decently well?

I wonder if an esolang exists that could be a universal target though (if a language can handle any ambiguity by appending, then it could always be output without backtracking).


So just add 5 seconds of latency?

AFAIK STT is still very bad without speaker-specific fine-tuning, so it's not going to be a literal babelfish (translating in the ear of the receiver), but it could make you speak many languages.

Your interlocutor's earpiece could beam yours a delta for a finetuned model of their voice before they open their mouth. Except not compatible across iMessage users and whatsapp users or some other predictable silicon valley negative sum power play like that.

Hope that glue is better than the duct tape Google uses on translate.g.c.

It is still hopeless, and much worse than the dictionary-based tools, at gender/number/declensions in general.

I sometimes use it just to not think about the grammar in some languages, and most times I end up doing a surprised double take at something that would be completely inappropriate or offensive instead of my simple phrases.

Despite any shortcomings, Google Translate is still a technological marvel.

Modern translation apps and GPS are godsends that make travel a million times easier. And they're free! It blows my mind. Traveling would be so much more incredibly difficult without them.

Google Translate is still inferior to ChatGPT 3.5. I suspect this style of model is significantly more expensive to run and Google doesn't want to give it away for free. Really, the only problem with ChatGPT is that it refuses to translate things that go against its nanny programming, which can make it almost worse than useless in some real-life situations.

I tried ChatGPT 3.5 against Google Translate, translating English to Greek (my native language), and they perform almost the same. The text was difficult science fiction and fantasy material, and the results were tolerable. Roughly 50% of the text had to be manually rewritten.

Maybe for more casual sentences and less difficult text they perform better; I haven't tried. Anyway, they are both better than nothing.

Performance does vary with languages. In Italian, GPT blows Translate away. The UI is not as nice, but at least you can get it to translate to multiple languages in one go.

I would like to use stuff like this as a side-project. Buy a Nvidia Geforce GPU and stick it into my 24/7 server and play around with it in my free time, to see what can be done.

The issue with all these AI models is that there's no information on which GPU is enough for which task. I'm absolutely clueless if a single RTX 4000 SFF with its 20GB VRAM and only 70W of max power usage will be a waste of money, or really something great to do experiments on. Like do some ASR with Whisper, images with Stable Diffusion or load a LLM onto it, or this project here from Facebook.

Renting a GPU in the cloud doesn't seem to be a solution for this use case, where you just want to let something run for a couple of days and see if it's useful for something.

Here's a sheet I came across today...


Granted, it's talking about quantized models, which use less memory. But you can see the 30B models taking 36 GB at 8-bit, and at least 20 GB at 4-bit.
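Those figures line up with a back-of-the-envelope estimate: parameter count times bytes per weight, plus some headroom for activations, KV cache, and framework overhead. A rough sketch (the 1.2x overhead factor is my own guess; real usage varies with context length and implementation):

```python
def estimate_vram_gib(n_params: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold a model's weights, padded by a fudge
    factor for activations, KV cache, and framework overhead."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes * overhead / 2**30  # bytes -> GiB

# A 30B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{estimate_vram_gib(30e9, bits):.0f} GiB")
```

By this estimate a 30B model wants roughly 34 GiB at 8-bit and 17 GiB at 4-bit, in the same ballpark as the sheet's numbers.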

The page even lists the recommended cards.

But as others have pointed out, you may get more bang "renting" as in purchasing cloud instances able to run these workloads. Buying a system costs about as much as buying instance time for one year. Theoretically, if you only run sporadic workloads when you're playing around it would cost less. If you're training... that's a different story.

The more VRAM the better if you'd like to run larger LLMs. Old Nvidia P40 (Pascal 24GB) cards are easily available for $200 or less and would be easy/cheap to play. Here's a recent writeup on the LLM performance you can expect for inferencing (training speeds I assume would be similar): https://www.reddit.com/r/LocalLLaMA/comments/13n8bqh/my_resu...

This repo lists very specific VRAM usage for various LLaMA models (w/ group size, and accounting for context window, which is often missing) - this are all 4-bit GPTQ quantized models: https://github.com/turboderp/exllama

Note the latest versions of llama.cpp now have decent GPU support, include a memory tester, and let you load partial models (n layers) into your GPU. It inferences about 2X slower than exllama in my testing on an RTX 4090, but still about 6X faster than my CPU (Ryzen 5950X).

Again this is inferencing. For training, pay attention to 4-bit bitsandbytes, coming soon: https://twitter.com/Tim_Dettmers/status/1657010039679512576

Wait why is renting a GPU in the cloud not a solution? You can even try multiple options and see which ones are capable enough for your use case.

Look into some barebones cloud GPU services, for example Lambda Labs which is significantly cheaper than AWS/GCP but offers basically nothing besides the machine with a GPU. You could even try something like Vast in which people rent out their personal GPU machines for cheap. Not something I'd use for uhhh...basically anything corporate, but for a personal project with no data security or uptime issues it would probably work great.

My annoyance was managing state. I’d have to spend hours installing tools, downloading data, updating code, then when I want to go to bed I have to package it up and store as much as I can on s3 before shutting off the $$ server.

I've played a lot with Stable Diffusion using AWS spot instances, mostly because it is the platform with which I'm more familiar. The Terraform script[0] should be easy to adapt to any other project of this kind.

Let me know if you are interested, and maybe we can find time to work on it together :).

[0] https://github.com/ciberado/stable-diffusion-webui-terraform...

If you shut down an EC2 instance you only pay for the volume, not the compute. Just shut it down and start it up the next day.

You should check out https://brev.dev. You can rent GPUs, pause instances, use your own AWS/GCP accounts to make use of your credits, and the CLI lets you use your GPU as if it’s on your local machine.

You can try skypilot: https://skypilot.readthedocs.io/en/latest/

It handles storage, setup, etc for machine learning work loads across several providers - which helps a lot if you need one of the instances that rarely have capacity like 8x A100 pods.

Aren't Vertex AI Workbench notebooks on Google Cloud exactly the tool to solve this?

aws s3 sync + image snapshot

All that costs time :-/ When you're finished working, you have to wait for the syncs to finish (or I guess run them all the time?).

For home computer setups, you can simply walk away when you need a break.

No, you write a script that runs the sync and shuts down the instance. When the instance is stopped you don't pay for it. Resuming it is a simple API call. You don't even really need to do the sync; it's just to be certain you have a backup if the instance volume is lost.

The shutdown / stop on an instance is like closing the lid on your laptop. When you start it again it resumes where it left off. In the meantime, the instance doesn't occupy a VM.

A caveat is you can’t really do this with spot instances. You would need to do a sync and rebuild on start. But, again, scriptable easily.
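The sync-then-stop routine described above is easy to script. A minimal sketch that shells out to the AWS CLI (the instance ID, bucket, and work directory are hypothetical placeholders):

```python
import shlex

INSTANCE_ID = "i-0123456789abcdef0"    # hypothetical instance ID
BUCKET = "s3://my-experiments-bucket"  # hypothetical bucket name

def shutdown_commands(instance_id: str, bucket: str, workdir: str = "~/work"):
    """Build the two commands: back up scratch data to S3, then stop
    (not terminate) the instance so only EBS storage is billed."""
    return [
        ["aws", "s3", "sync", workdir, f"{bucket}/checkpoint"],
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
    ]

# Print what would run; on the actual box you'd pass each cmd to
# subprocess.run(cmd, check=True) instead.
for cmd in shutdown_commands(INSTANCE_ID, BUCKET):
    print(shlex.join(cmd))
```

Run this at the end of a session (or from a cron job) and the next morning's resume is a single `aws ec2 start-instances` call.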

Don't forget you can play computer games on it too!

This is what containers solve. Don't waste time manually installing things. Store state in a database via the app on a different host.

Speaking as someone who has encountered similar difficulties, this response has strong 'Draw the rest of the owl' vibes

Speaking as someone who has solved these difficulties hundreds of times, "draw the rest of the owl" doesn't tell you the specific things to google to get detailed examples and tutorials on how millions of others have sidestepped these repeated issues.

Yep... you spend hours messing around with docker containers and debugging all the weird build errors.

I am less familiar with storing data in a db (for ml hosting concerns), but I'd imagine it would add overhead (as opposed to accessing files on disk).

You also have to deal with hosting a db and configuring the schema.

You "spend hours messing around" with everything you don't know or understand at first. One could say the same about writing the software itself. At their core, Dockerfiles are just shell scripts with worse syntax, so it's not really that much more to learn. Once you get it done once, you don't have to screw around with it anymore, and you have it on any box you want in seconds.

In either case you have to spend hours screwing around with your environment. If those hours result in a Dockerfile, then it's the last time. If they don't, then it's every time you want it on a new host (which, as was correctly pointed out, is a pain in the ass).

Storing data in a database vs in files on disk is like application development 101 and is pretty much a required skill period. It's required that you learn how to do this because almost all applications revolve around storing some kind of state and, as was noted, you can't reasonably expect it to persist on the app server without additional ops headaches.

Many people will host dbs for you without you having to think about it. Schema is only required if you use a structured db (which is advisable) but it doesn't take that long.

I applaud your experience, but honestly I agree with parent: knowledge acquisition for a side project may not be the best use of their time, especially if it significantly impedes actually launching/finishing a first iteration.

It's a similar situation for most apps/services/startup ideas: you don't necessarily need a planet scale solution in the beginning. Containers are great and solve lots of problems, but they are not a panacea and come with their own drawbacks. Anecdotally, I personally wanted to make a small local 3 node Kubernetes cluster at one time on my beefy hypervisor. By the time I learned the ins and outs of Kubernetes networking, I lost momentum. It also didn't end up giving me what I wanted out of it. Educational, sure, but in the end not useful to me.

I'm having trouble imagining what data I would store in a database as opposed to a filesystem if my goal is to experiment with large models like Stable Diffusion.

I would take GP's kind of dogmatic jibber jabber with a grain of salt. There is an unspoken and timeless elegance to the simplicity of running a program from a folder with files as state

Isn't terminfo db famous for this filesystem-as-db approach? File Vs DB: I say do whatever works for you. There is certainly more overhead in the DB route.

> Storing data in a database vs in files on disk is like application development 101

It's OK until you're dealing with, say, 130GiB of tensors, on what is effectively a binary blob that needs to be mostly in VRAM somehow.

I really don't want to read 130GiB of blobs from a database all the time.

IMO tensors and other large binary blobs are fairly edge-casey. You might as well treat them like video files; video servers don't store large videos in databases either, and most devs don't have experience managing large binary blobs.

"Shell scripts with worse syntax", lol. I wish shell could emulate Alpine on a non-Linux box. A shell script with worse syntax for configuring a VM is maybe closer to a QEMU cloud-init file.

Well, then you're wasting time managing your containers. Have you ever used k8s? It's a full-time job lol

I don’t think anyone is suggesting k8s for running an ai model as a side project on a single machine.

> a side project on a single machine

Sorry I don't see how will containers help here

I used to think that the 300MB containers I shipped to prd were reasonably unoptimized.

Those AMD ROCm containers are like 14GiB compressed.


Besides some great tooling out there if you wanted to roll your own, you can literally rent windows/linux computers, with persistent disks. If you have good internet, you can even use it as a gaming PC, as I do.

Is there an easy way to off-board the persistent disk to cheaper machines when you don't need the gpus?

Like imagine, setting up and installing everything with the gpu attached, but when you're not using the gpu or all the cpu cores, you can disconnect them.

If you have docs on how to do this, please let me know.

With AWS (and probably most other cloud VPC services) the disk is remote from the hardware so you can halt the CPU and just pay for the storage until you restart.

AWS also provides accessible datasets of training data:


What sort of frame rate and cost do you get? I have the highest-tier GeForce Now subscription and it sometimes drops to horrible conditions.

"but offers basically nothing besides the machine with a GPU"

They must offer distributed storage that can accommodate massive models, though? How else would you have multiple GPUs working on a single training model?

I have not seen many setups that wouldn't pay for themselves (including energy, in my case) within a year, sometimes even 6 months, with buying vs renting. For something that pays for itself that fast, and that is without renting it out myself, just training with it, I cannot see how I would want to rent one.

Edit: on Lambda Labs, the only exception seems to be the H100; it would take 1.5 years or so, but even 2 years would still be fast enough. I have an A100 which has paid for itself; thinking of getting another one.

I think the downside to buying hardware is that, compared to other tech, this LLM/ML stuff is moving very quickly; people are great at quantizing now (whereas before I only really saw it done for the Coral edge TPUs etc.).

Someone could buy an H100 to run the biggest and bestest stuff right now, but we could find that a model gets shrunk down to run on a consumer card within a year or two with equivalent performance.

I suppose it makes sense if someone wants to be on the bleeding edge all the time.

Cloud has a variable price (up to the whim of whatever they decide the price to be that day), so it's uncertain, but typically it is far more expensive for this type of application. So when faced with 1) probably far more expensive, or 2) a single price that will be cheaper, is always available, and has far more uses, I think most would choose 2) for self-hosting. Cloud is very rarely a good option.

Having to connect to a gpu over the internet seems extremely cumbersome. Stuff like this should be as easy as running a local program with an accelerator.

You can fine-tune Whisper, Stable Diffusion, and LLMs up to about 15B parameters with 24GB VRAM.

Which leads you to what hardware to get. Best bang for the $ right now is definitely a used 3090 at ~$700. If you want more than 24GB VRAM, just rent the hardware, as it will be cheaper.

If you're not willing to drop $700 don't buy anything just rent. I have had decent luck with vast.ai

There is the world of used Nvidia Teslas, like the M40. Very cheap, but some assembly required.

I own a P100, P4, M40. They either lack ram or speed. Also, unless you are putting them in a server you have to cool them.

If your goal is to learn ML don't tinker with very obsolete hardware. Rent or buy something modern.

What do you mean by assembly needed? I looked them up and they look like normal gfx cards. Am I missing something?

They're datacenter GPUs so you need special power supplies and an adapter to connect it to a regular motherboard.

From what I’ve read cooling can also be a challenge.

What are their advantages?

“Very cheap” isn’t enough?

Apparently there is a group of folks fine-tuning with these cards.

Source? I own 3 and I would take a single 3090 any day. M40s are simply too old.

Can we use a 3090, or Nvidia GPUs in general, with a Mac? Do people generally have a Windows desktop for the GPUs?

No clue, but if you want to learn/fine-tune ML, use a Linux box; otherwise you will spend all your time fighting your machine. If you just want to run models, a Mac might work.

I would recommend a 3090. It can handle everything a 4000 series can albeit slightly slower, has enough VRAM to handle most things for fun, and can be bought for around $700.

Just do it. Spend a few hours doing research and you will find out. With that said, buy as much memory as you can. That makes the 4090 king if you have a server that can carry it, plus the budget. For me, I settled for a 3060; it's a nice compromise between cost and RAM: cheap, 12GB, and a 170W TDP.

I think you just need to educate yourself a bit about the space. These models are very small (the large version is only 1B parameters), so they should run on a 4GB gaming GPU.

A 3080 Ti is sufficient to run the largest Whisper model (just under 12 GB VRAM).

Use the A100 as a base reference and work your way down from there.

ASR: "Automatic Speech Recognition"; also known as "Speech to Text" (STT)

TTS: "Text to Speech"

LID: "Language Identification"

In case anyone else was confused about what the acronyms mean.

TLA: “Text LLM Application.” ;)

Imagine if we used these types of models for like 500 years and it locked their vocabulary in time, disallowing any further language blending; then somehow the servers turned off and nobody could communicate across language barriers anymore.

Someone should write that down in some sort of short-story involving a really tall structure.

Imagine if we used them everyday for 500 years but tune all the languages to be ever so slightly similar each day until there is one language.

Call it “Inglesh” or something phonetically similar

With ~1000 dialects.

Imagine if someone said this about dictionaries.

Imagine if your dictionary was constantly updated to modify your behavior

That statement makes no sense.

My point was that languages evolve, even though we write books and make dictionaries with fixed-in-time vocabulary lists. Similarly, languages will evolve even if LLMs, for mysterious reasons, have a fixed-in-time vocabulary. I was responding to "and it locked their vocabulary in time, disallowing any further language blending".

Update your dictionary

When a jet skier accidentally runs into a tree that's called a snow crash, aha

I just wanted to test out the TTS locally on a powerful Ubuntu 22.04 machine, but the process for setting it up seems pretty broken and poorly documented. After 20 minutes of trying I finally gave up since I couldn't get the VITS dependency to build (despite having a fully updated machine with all required compilers). It seems like they never really bother to see if the stuff works on a fresh machine starting from scratch. Somehow for my own projects I'm always able to start from a fresh git clone and then directly install everything using this block of code:

    python3 -m venv venv
    source venv/bin/activate
    python3 -m pip install --upgrade pip
    python3 -m pip install wheel
    pip install -r requirements.txt

But whenever I try using these complicated ML models, it's usually an exercise in futility and endless mucking around with conda and other nonsense. It ends up not being worth it and I just move on. But it does feel like it doesn't need to be like this.

I wrote a colab to run TTS, it takes some time to install dependencies : https://colab.research.google.com/drive/11tZ1c7kQJ5uKImm9t-C...

Overall I got pretty poor results in English and French; I guess it would require some fine-tuning.

Thanks for this! Out of curiosity, how does vits play into this? I know nothing about the repo.

Also, I used the following to play the sound after it is generated:

    from IPython.display import Audio, display
    display(Audio(output_file, autoplay=True))

I think vits is just a repo that implements the TTS model

I've created this environment.yml:

  name: fairseq
  channels:
    - conda-forge
    - pytorch
  dependencies:
    - python=3.9
    - Cython==0.29.21
    - librosa==0.8.0
    - matplotlib=3.3
    - numpy=1.19
    - scipy==1.5.2
    - tensorboard==2.3.0
    - pytorch=1
    - torchvision=0
    - Unidecode==1.1.1
    - pip:
        - phonemizer==2.2.1
You can install it with micromamba (or conda):

  conda env create -f environment.yml
  conda activate fairseq
You'll need to do this:

  cd path/to/vits/monotonic_align
  mkdir monotonic_align
  python setup.py build_ext --inplace
Then back to fairseq:

  cd path/to/fairseq
  PYTHONPATH=$PYTHONPATH:path/to/vits python examples/mms/tts/infer.py --model-dir checkpoints/eng --wav outputs/eng.wav --txt "As easy as pie"
(Note: On MacOS, I had to comment out several .cuda() calls in infer.py to make it work. But then it generates high-quality speech very efficiently. I'm impressed.)
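Rather than commenting out the `.cuda()` calls by hand, the usual device-agnostic PyTorch idiom avoids the problem entirely. A sketch of that pattern (not how the MMS inference script is actually written):

```python
import torch

# Select the GPU when available, otherwise fall back to CPU.
# Hard-coded .cuda() calls crash on machines without CUDA (e.g. macOS).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move tensors/models with .to(device) instead of .cuda():
x = torch.zeros(4).to(device)
print(x.device.type)
```

The same script then runs unmodified on CUDA boxes and CPU-only machines.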

Yeah, a lot of the releases from research groups are woefully poorly documented.

Usually, some hero releases a friendly install system within a few days, though.

Research projects are notorious for this. The people behind them aren't very experienced in distributing software (many have never heard of Docker), the usual language of choice is Python, which has poor packaging, native dependencies complicate things, and everything is on the bleeding edge by definition.

Usually, it needs a brave soul to reverse engineer what exact versions of dependencies are needed and make a Colab or Dockerfile that puts it all together.

This thing uses pytorch, the bane of everything that attempts to make pypi installs sane.

As a human, you need to read https://pytorch.org/get-started/locally/ and install the correct version, depending on your pytorch-version/os/packaging-system/hardware-platform combo. It's bad. It's several conflicting index-urls messing with your requirements. It also doesn't work if you pick the wrong thing.

Or we need something like a "setup.py" that is turing-complete, to dynamically pick up the correct dependency for you.

`pip install -r requirements.txt` is not enough.

The problem with all these model releases is they have no demos or even video of it working. It’s all just download it and run it, like it’s an app.

They're intended for researchers/professionals not consumers, and I'm not sure how a video is going to be helpful?

And the issue with a live demo is that these are resource-intensive, they're not just webpages. It's an entire project to figure out how to host them, scale them to handle peaks, pay for them, implement rate-limiting, and so forth.

For the intended audience, download-and-run-it doesn't seem like an issue at all. I don't see how any questions are going to be answered by a video.

It's so weird to me that they'd do 99% of the effort and just skip the 1% of work to provide a dumbed-down summary and instructions for broader appeal. Clearly these are released in part for public relations and industry clout.

Don't get me wrong: they published this, it took a ton of work, and they didn't have to do it. But it's ultimately a form of gatekeeping that seems to come straight out of academia. And honestly, that part of academia sucks.

It's not 1% of the work. If you have ever written public documentation, or half-decent internal documentation that is understandable to people outside your team, you would know that it takes a lot of effort to get it right. Let alone something highly technical like this. If you get things wrong or skip things you think are trivial but not to others, you get more questions. And it is not a one-time thing, it is a living document that needs to be updated and revised over time.

A good research team is a handful of people working on a focused set of questions.

A good product team is probably around a dozen people minimum? Especially if you need to hit the quality bar expected of a release from a BigCorp. You've got frontend, UX, and server components to design, in addition to the research part of the project. The last real app I worked on also included an app backend (ie, local db and web API access, separate from the display+UX logic) and product team. Oh yeah, also testing+qa, logging, and data analysis.

And after all that investment, God help you if you ever decide the headcount costs more than keeping the lights on, and you discontinue the project...

Public app releases are incredibly expensive, in other words, and throwing a model on GitHub is cheap.

Even researchers and professionals appreciate a sample or two.

We're all research professionals now.

Here's the blog post with video of it working: https://ai.facebook.com/blog/multilingual-model-speech-recog...

I'd argue that having a "download and run" approach is so much better than videos or demos. Why do you think this is a problem?

Because to download and run it you need to have a laptop nearby to download and run it on, with the correct operating system and often additional dependencies too.

I do most of my research reading on my phone. I want to be able to understand things without breaking out a laptop.

There's a paper associated with it that you can read on your phone. And I don't think demo videos are really associated with "research". I agree they could've added both, but let's be honest here you'll have demo videos on this in the next 12 hours for sure.

That's a related complaint: everyone continues to insist on releasing papers as PDFs, ignoring the fact that those are still pretty nasty to read on a mobile device.

Sure, release a PDF (some people like those), but having an additional responsive web page version of a paper makes research much more readable to the majority of content consumption devices. It's 2023.

I'll generally use https://www.arxiv-vanity.com/ to generate those but that doesn't work with this specific paper since it's not hosted on arXiv.

Would you like a back rub while I'm at it?

fwiw, i know the topic of the thread has deviated, but i share the frustration about reading PDFs on my device. In my case, it's an accessibility issue - I can't see well, so zooming with reflow would make my life materially better since I'd be able to read research papers on my morning commute.

Sometimes users have needs that may seem superfluous and beg for a snarky reply, but there are often important reasons behind them, even though they may not be actionable.

I'd pay $x000 for an app that does some sort of intelligent pdf-to-epub conversion that doesn't require human-in-the-loop management/checking.

PDFs are pretty shite on mobile.

Interesting that there isn’t a common solution for this. I guess it’s rather niche?

It's a lot bigger investment. It'd be nice to at least see a video; that's easier than expecting people to download and run something just to see it.

Why should we have any marketing materials, ever? Why do we show pictures of products? Sometimes people want to see the capabilities before they completely dive in and spend their time working on something.

I was super excited at this, but digging through the release [0] one can see the following [1]. While using Bible translations is indeed better than nothing, I don't think the stylistic choices in the Bible are representative of how people actually speak the language, in any of the languages I can speak (i.e. that I am able to evaluate personally).

Religious recordings tend to be liturgical, so even the pronunciation might be different to the everyday language. They do address something related, although more from a vocabulary perspective to my understanding [2].

So one of their stated goals, to enable people to talk to AI in their preferred language [3], might be closer but certainly a stretch to achieve with their chosen dataset.

[0]: https://about.fb.com/news/2023/05/ai-massively-multilingual-...

[1]: > These translations have publicly available audio recordings of people reading these texts in different languages. As part of the MMS project, we created a dataset of readings of the New Testament in more than 1,100 languages, which provided on average 32 hours of data per language. By considering unlabeled recordings of various other Christian religious readings, we increased the number of languages available to more than 4,000. While this data is from a specific domain and is often read by male speakers, our analysis shows that our models perform equally well for male and female voices. And while the content of the audio recordings is religious, our analysis shows that this doesn’t bias the model to produce more religious language.

[2]: > And while the content of the audio recordings is religious, our analysis shows that this doesn’t bias the model to produce more religious language.

[3]: > This kind of technology could be used for VR and AR applications in a person’s preferred language and that can understand everyone’s voice.

This paragraph justifies the decision, IMHO.

> Collecting audio data for thousands of languages was our first challenge because the largest existing speech datasets cover 100 languages at most. To overcome this, we turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research.

I think the choice was just between having any data at all, or not being able to support that language.

While AFAIU the UN's Universal Declaration of Human Rights (UDHR) is the most-translated document in the world, relative to the given training texts there likely hasn't been as much subjective translation analysis of the UDHR.

Awesome-legal-nlp links to benchmarks like LexGLUE and FairLex but not yet LegalBench; in re: AI alignment and ethics / regional law https://github.com/maastrichtlawtech/awesome-legal-nlp#bench...

A "who hath done it" exercise:

[For each of these things, tell me whether God, Others, or You did it:] https://twitter.com/westurner/status/1641842843973976082?

"Did God do this?"

The UN's UDHR likely yields <5mins of audio data in any language which is kinda useless as far as training data goes.
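Order-of-magnitude sketch (the word count and speaking pace are assumptions, not measured figures): even a full UDHR reading yields minutes of audio per language, against the 32 hours per language Meta quotes for the religious recordings.

```python
# Rough estimate of how much audio a UDHR reading yields per language.
udhr_words = 1800        # approximate English UDHR word count (assumption)
speaking_wpm = 150       # typical narration pace (assumption)
udhr_minutes = udhr_words / speaking_wpm
bible_hours = 32         # per-language average quoted in the MMS blog post
print(udhr_minutes)                     # about 12 minutes of audio
print(bible_hours * 60 / udhr_minutes)  # the Bible recordings give ~160x more data
```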

Compared to e.g. religious text translation, I don't know how much subjective analysis there is for UDHR translations. It's pretty cut and dry: e.g. "Equal Protection of Equal Rights" is pretty clear.

"About the Universal Declaration of Human Rights Translation Project" https://www.ohchr.org/en/human-rights/universal-declaration/... :

> At present, there are 555 different translations available in HTML and/or PDF format.

E.g. Buddhist scriptures are also multiply translated; probably with more coverage in East Asian languages.

Thomas Jefferson, who wrote the US Declaration of Independence, had read into Transcendental Buddhism and FWIU is thus significantly responsible for the religious (and nonreligious) freedom we appreciate in the United States today.

Of course, my critique is less "This project shouldn't exist", but rather "It seems to me there are several biases that affect the performance of this project in the context it was presented in".

This is a great project and an important stepping stone in a multilingual AI future.

> I don't think the stylistic choices in the Bible are representative of how people actually speak the language, in any of the languages I can speak

The cadence and intonation sounded a little weird, but I suspect fine-tuning can improve that by a lot. I am really excited to see some low-resource language finally get some mainstream TTS support at all.

This looks huge. Anyone know how this compares with Whisper in terms of quality and speed?

according to their blog post[1], MMS achieves ~half the error rate on words, while supporting 11x more languages. pretty impressive.

[1] https://ai.facebook.com/blog/multilingual-model-speech-recog...

I wonder what the performance is on English specifically.

Edit: Just checked the paper, it seems to be worse[1][2] but feel free to correct me.

I feel like they should've just taken the Whisper architecture, scaled it, and scaled the dataset as they did.

[1] Page: https://i.imgur.com/bq15Tno.png

[2] Paper: https://scontent.fcai19-5.fna.fbcdn.net/v/t39.8562-6/3488279...

It's worse on English and a lot of other common languages (see Appendix C of the paper). It does better on less common languages like Latvian or Tajik, though.

Which implies Whisper just hasn't focused on those languages? It seems disingenuous to claim the error rate has halved when it's worse in the apex language.

My guess is wav2vec performs better on low resource than whisper.

lack of labels on graph axes should be a crime

> The MMS code and model weights are released under the CC-BY-NC 4.0 license.

Huge bummer. Prevents almost everyone from using this and recouping their costs.

I suppose motivated teams could reproduce the paper in a clean room, but that might also be subject to patents.

Trying to attach a license to model weights seems counter-productive to me. If you argue they are copyrightable then surely they are derivative works of the training data, which is unlikely to all be public domain. Machine learning enthusiasts are better off lobbying for model weights being non-copyrightable as it doesn't have any creative input and is the result of a purely mechanical process.

The copyright on the code, on the other hand, would definitely be copyrighted and would need a clean-room implementation, as you said. The community could pool its resources and do it once, and license it under the AGPL to keep it and further improvements available to everyone.

I think this is a lot better than the other option, which would've been not releasing it at all. No company in the business of making money wants to give away their edge to would-be competitors for free.

I'd imagine you could use inference from this as training data for a commercial model, as that isn't currently protected under copyright.

I'd go a step further and say that the models themselves probably aren't copyrightable.

I mean, I personally feel like almost all software is math and anything that isn't frontend isn't legally copyrightable or patentable. But the courts disagree with my interpretation. And if it were up to me, all copyright and patents would have exponentially increasing fees on an annual basis to maintain, but that'd require a change in statutes. I think the jury is still out on your interpretation though, and I suspect the courts will fall in line with whatever the largest rights holders want.

Effort and cost should be accounted for, and reasonable licensing fees should go in the patent. Loss of lives and loss of quality of life should be considered. The only goal on the other end of the equation should be to encourage people sufficiently to not keep their inventions secret.

We should also have a fund for particularly tricky inventions and copyrights that would greatly benefit mankind.

Say, someone or some institution writes a good school book. We can just buy the book for whatever we think is reasonable. If they want to publish the same book the next year with the chapters shuffled around, they could be entitled to a tiny fraction of the previous sum, or it could be denied for being spam. Whether this bankrupts the company is of very little interest.

My layman's reading is that the license does seem to allow recouping of costs, would be interested if there is a more nuanced interpretation to the contrary.

(tangentially, like another comment briefly mentioned, are models actually copyrightable? Programs are, because they are human authored creative works.)

So many so-called overnight AI gurus hype their snake-oil products and scream 'Meta is dying' [0] and 'It is over for Meta', but few of them actually do research in AI and drive the field forward. This once again shows that Meta has always been a consistent contributor to AI research, especially in vision systems.

All we can just do is take, take, take the code. But this time, the code's license is CC-BY-NC 4.0. Which simply means:

Take it, but no grifting allowed.

[0] https://news.ycombinator.com/item?id=31832221

Unless you’re in a country that doesn’t care about the IP restrictions, then you do whatever you want.

Yes, they are so far more open than open<for business>ai.

According to [1] on the accompanying blog post, this brings the Whisper 44.3 WER down to 18.7, although it’s unclear to me how much better this is at primarily English speech recognition. I’d love to see a full comparison of accuracy improvements as well as a proper writeup of how much more power it takes to run this in production or on mobile vs something like whisper.

[1]: https://scontent-sjc3-1.xx.fbcdn.net/v/t39.8562-6/346801894_...
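For what it's worth, 44.3 down to 18.7 works out to a bit more than half, consistent with the "~half the error rate" framing elsewhere in the thread:

```python
# Relative WER reduction implied by the chart's headline numbers.
whisper_wer = 44.3   # Whisper average WER on the compared language set
mms_wer = 18.7       # MMS average WER
reduction = (whisper_wer - mms_wer) / whisper_wer
print(f"{reduction:.1%}")   # ~57.8% relative reduction, i.e. "about half" the errors
```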

Come to think about it, Meta is a much better name for an AI company than a VR company.

If anything, it sounds like the start of a "Meta? I don't even know her" joke.

It's not VR, it's Metaverse; VR might involve experiences that are not bullshit.

The metaverse, if we define it as an information layer that exists in parallel with the physical 'stuff' universe and is a seamless, effortless, and essential part of what we experience as reality will be an enormous part of our future. Meta might just be a few centuries ahead of the curve, which is just as bad as being a few centuries late.

If you define it like that, it's just the internet.

Centuries really? I think Meta's going to be just on time.

I think we may skip the whole VR experience and just have robots instead of virtual friends

That's a large overlap rather than sequential.

I assume this "competes" directly with https://sites.research.google/usm/ -- would be cool to see side-by-side benchmarks sometime! Maybe I should make those. I requested access to USM but have not been granted any access yet.

Is there any indication that USM will be open sourced though? This is more so competing with Whisper.


1107 languages but no Czech or Slovak? Many languages with way fewer speakers made it to the list. I wonder what we did to Meta...

no chinese TTS either.

Meta is on a roll. Any demo of how good the text-to-speech is?

Doesn't seem like there's a demo set up by them yet, but you can just download the model weights and run the inference yourself and compare it to OpenAI Whisper or anything you have access to.
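For reference, a minimal sketch of what "running it yourself" looks like, assuming a fairseq checkout and a downloaded checkpoint (the script path matches the repo's examples directory; the model filename and language code here are placeholders, not official names):

```python
# Build the inference command for fairseq's MMS ASR example script.
def mms_asr_cmd(model: str, lang: str, audio: str) -> list[str]:
    """Command line for examples/mms/asr/infer/mms_infer.py (run from a fairseq checkout)."""
    return [
        "python", "examples/mms/asr/infer/mms_infer.py",
        "--model", model,   # downloaded .pt checkpoint
        "--lang", lang,     # ISO 639-3 code, e.g. "eng" or "kor"
        "--audio", audio,   # 16 kHz mono WAV
    ]

# e.g.: subprocess.run(mms_asr_cmd("mms1b_fl102.pt", "kor", "sample.wav"), cwd="fairseq", check=True)
print(mms_asr_cmd("mms1b_fl102.pt", "kor", "sample.wav"))
```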

yeah thanks. just saw that. fingers crossed it's really good for korean. i'm just glad they released it at least. the other sota TTS models by big companies won't be seeing the light of day.

reproducing the english TTS demo resulted in very underwhelming quality.

Wow, I didn't even know there were 7,000 documented languages in the world!

"According to the World Atlas of Languages' methodology, there are around 8324 languages, spoken or signed, documented by governments, public institutions and academic communities. Out of 8324, around 7000 languages are still in use."


Most are at risk of extinction.

"half of the languages spoken today have fewer than 10,000 speakers and that a quarter have fewer than 1,000 speakers" (https://en.wikipedia.org/wiki/Language_death).

"Today, on average, we lose one language in the world every six weeks. There are approximately 6800 languages. But four percent of the population speaks 96 percent of the languages, and 96 percent of the population speaks four percent of the languages. These four percent are spoken by large language groups and are therefore not at risk. But 96 percent of the languages we know are more or less at risk. You have to treat them like extinct species." (https://en.wikipedia.org/wiki/Language_preservation).

"Over the past century alone, around 400 languages – about one every three months – have gone extinct, and most linguists estimate that 50% of the world’s remaining 6,500 languages will be gone by the end of this century (some put that figure as high as 90%, however). Today, the top ten languages in the world claim around half of the world’s population." (https://www.bbc.com/future/article/20140606-why-we-must-save...).

Wow, preserving almost-dead languages sounds like something that LLMs would be pretty appropriate for, right? We would primarily need as large a body of written text translated into both a "known" language and the dying language as possible.

May be a cool piece of tech for historians in 100s of years from now!

Even Bavarian is covered... :D

I'm trying to use this on a 3M mp3 file to test ASR with language code deu, CPU only, and I keep getting this error -- are there limits to the MMS inference?

  Traceback (most recent call last):
    File "/home/xxx/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
    File "/home/xxx/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
    File "fairseq/data/data_utils_fast.pyx", line 30, in fairseq.data.data_utils_fast.batch_by_size_vec
      assert max_tokens <= 0 or np.max(num_tokens_vec) <= max_tokens, (
  AssertionError: Sentences lengths should not exceed max_tokens=4000000
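If I'm reading the assertion right (this is an inference from the error, not documented behavior), fairseq counts raw 16 kHz samples as "tokens", so the cap works out to roughly four minutes of audio; splitting the file into shorter chunks should get around it:

```python
# Translate the max_tokens cap from the AssertionError into seconds of audio.
sample_rate = 16_000      # MMS models expect 16 kHz mono input
max_tokens = 4_000_000    # value from the assertion above
max_seconds = max_tokens / sample_rate
print(max_seconds)        # 250.0 seconds, i.e. ~4.2 minutes per clip
```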

Same error here, CPU only; haven't found a way to run this on GPU (CUDA) yet

Real Open AI lab.


Wonder how this compares with Deepgram's offering. Has anyone used/tried/compared them, or even read enough literature to compare? The WER rates shown by Deepgram are still better than the largest MMS model, and the use-case-specific fine-tuned models (Zoom meetings, financial calls, etc.) probably make a bigger difference. WDYT?

Great, now the Terminators will be barking orders at each other in languages I can't understand.

FYI: Another round of massive layoffs at Meta this Wednesday. Stay-at-home lockdown ensues.

I tried the english TTS example and the result is quite underwhelming (compared to bark or polly/azure-tts ). It sounds like TTS systems one or two decades ago. Would those language-specific TTS models need to be finetuned?

Most of the languages support only the LID (language identification) task. Still impressive.

Anyone know what hardware it takes to run this? Asking as an enthusiastic newbie

At a guess, based on model size, just about anything, even a Raspberry Pi 4 etc.

As comparison the GGML port of Whisper (OpenAI's equivalent), runs in the browser, via WASM: https://whisper.ggerganov.com/

If this is the case, what hardware is needed for real-time tts?

Would it be possible to have this in something "à la" whisper.cpp?

If it performs as well as the paper implies, then I certainly hope someone will pick it up and give it the same treatment.

Maybe one of those models has figured out a way to tell Zuck that the whole Metaverse concept is nonsense, hopefully it'll be graceful about letting him down.

I don't have much knowledge about TTS models, is it possible/affordable to fine-tune those models on your own voice ?

I just want to translate a movie's audio from one language to another; what's the easiest way to do this at home?

I checked the language coverage for "kashmiri"


"Himachal Pradesh state:"

This is obviously wrong, so I don't know what else is wrong.

I was kind of hoping for Interlingue, but was surprised to not even see Esperanto on the list.

Random thought: could Esperanto (as a crypto-language) be turned into any sort of programming language? Could one, conceivably, program in Esperanto in any meaningful way?

I think you are more likely to do so in a more logical and structured language like lojban

There's nothing particularly special about Esperanto in that regard compared to natural languages that it is modelled after.

Esperanto is supported in their larger model

If I have an extra MBP 16" 2020 hanging around (16 GB RAM, quad-core i7)... can I run this? I'd like to try the TTS capabilities! LMK if you've got any guides or instructions online I can check out :)

Some interesting links and tools in this thread, e.g. datasette.io

can any of these models be coerced into just doing straight phonetic transcription? like spitting out IPA?

Meta is doing more for open AI than OpenAI

I don't think anyone is too surprised by the fact that OpenAI went closed. As ironic as it may be, it was pretty obviously the only way they were going to continue to exist.

On the other hand, it definitely underscores:

- How blatantly companies exploit the "open source" concept with no remorse or even tacit acknowledgement; for many startups, open source is a great way to buy some goodwill and get some customers, then once they have what they want, close the doors and start asking for rent money. Nothing wrong with doing closed source or commercial software, but would OpenAI still have gained relevance if they hadn't started the way they did with the name they did?

- How little anyone gives a shit; we all watched it happen, but apparently nobody really cares enough. Investors, customers, the general public, apparently all is fair in making money.

I'm not suggesting OpenAI is especially evil, definitely not. In fact, the most depressing thing is that doing this sort of bait and switch mechanic is so commonplace and accepted now that it wasn't news or even interesting. It's just what we expect. Anything for a dollar.

But maybe I still seem like I'm just being whiny. Okay, fair enough. But look at what's happening now; OpenAI wants heavy regulation on AI, particularly they want to curtail and probably just ban open source models, through whatever proxy necessary, using whatever tactics are needed to scare people into it. They may or may not get what they want, but I'm going to guess that if they do get it, ~nobody will care, and OpenAI will be raking in record profits while open source AI technology gets pushed underground.

Oh I'd love to be wrong, but then again, it's not like there's anything particularly novel about this strategy. It's basically textbook at this point.

Because OSS doesn't work at scale, the economic incentives are not aligned.

Sure, it may work sometimes with the goodwill of someone who cares, but 90% of OSS code is a dead portfolio the authors built with the hope of landing a tech job or skipping some algorithm questions.

Sure, OSS allows people to experiment with crap for free (even though it's mostly big corps benefiting from OSS), but what about the negative effects OSS produces on small businesses?

How many developers could spend their life maintaining small parts of software instead of working in soulless corporations, if giving away your code for free (and without maintenance) wasn't so common? How much better would this code be compared to the wasteland of OSS projects? How much more secure could the entire ecosystem be? How many poor developers are working for free just in the hope of getting a job someday?

We need to stop dreaming the OSS dream and start making things fairer for the developers involved.

In the long term, software is not very novel. Needs evolve quickly in the genesis of a new category of software, but it doesn't take very long for it to stabilize for many categories. That's why, in my estimation anyways, open source is actually more sustainable in some cases: because when software becomes a commodity, it makes more sense for stakeholders to collaborate and collectively benefit from it.

There is no "OSS dream" anymore—today, there is an OSS reality. We have some open source stuff that objectively works: there are business models that are more or less proven, at least as proven as any internet or software-oriented business model, and plenty of highly successful projects that deliver immense value to the world.

Then again, some of it doesn't seem to work, and there are a lot of unknowns about how it works, what the dynamics will be, etc. But, if we're to call open source into question, we shouldn't forget to call proprietary software into question, too. Proprietary software has many seemingly-endemic issues that are hard to mitigate, and the business model has been shifting as of late. Software is now sold more as a subscription and a service than it is a product. The old business model of boxed software, it seems, has proven unsustainable for many participants.

The main issue open source seems to have is really funding. It works well when open source software acts as a complement to some other commercial business, but it works poorly when the software itself is where all the value is. After all, if you, for example, just host an open source piece of software in exchange for money, you're effectively competing in the highly competitive web hosting business. It can work since you can obviously provide some value, but it's a precarious position. Thus very few companies really have a lot of money to put into the Linux desktop, or at least if you wanted to compare it to Windows or macOS. It's complementary to some of them who use it as a developer workstation or something, but there are only a couple companies who I think genuinely have a good model. System76 is definitely one of them, to be explicit about it.

But rather than give up entirely, I propose something else: we should literally fund open source collectively. Obviously a lot of us already do: you can donate to projects like Blender or Krita monetarily, and you can donate your time and code as I'm sure many of us also do for random projects. But also, I think that open source should get (more) public funding. These projects arguably end up adding more value to the world than you put in them, and I think governments should take notice.

Of course in some cases this has already taken shape for one reason or another. Consider Ghidra. Clearly released for the benefit of NSA's PR, but wait, why not release Ghidra? It's immensely useful, especially given that even at a fraction of the functionality of Hex Rays products, it's still extremely useful for many parties who could simply never afford the annual cost of maintaining IDA Pro licenses, especially now that it is only available as a subscription.

The way I see it, software and computers in general are still moving quite fast even though it's clearly stagnating compared to where it once was. But, as things stabilize, software will simply need to be updated less, because we simply aren't going to have reasons to. As it is, old versions of Photoshop are already perfectly serviceable for many jobs. And at that point, we're only going to need one decent open source release for some given category of work. Things will need occasional improvements and updates, but c'mon, there's not an unlimited amount of potential to squeeze out of e.g. an image editor, any more than you can a table saw or a hammer. At some point you hit the point of diminishing returns, and I think we're nearing it in some places, hence why the switch to subscription models is necessary to sustain software businesses.

It's a myth that open source is held up by poor developers starving for their ideology. I'm sure a lot of that exists, but a lot of open source is also side projects, work subsidized by companies for one reason or another, projects with healthy revenue streams or donations, etc.

For OpenAI, AI is the revenue source. For Meta, it's a tool used to build products.

If OpenAI gives their stuff away, they lose customers. If Meta does it, they can have community around it, and have joint effort about improving tools that then they'll use for their internal products.

OpenAI is modern (and most likely - very short lived) Microsoft in the AI space, while Meta tries to replicate Linux in the AI space.

This is absolutely true, with PyTorch, LLaMA, a ton of models in vision, and now with this!

We're in upside-down world, Meta is now cool; actually I always admired their open source projects, starting with React and PyTorch.

That's really not hard to do, is it?

I don't think open source LLMs over a certain capability will be legal soon, in the same way an open source SDR transmitter would not be legal.

What an irony!

fairseq is a fairly missed naming opportunity. C-3PO would have been better.

fairseq has existed for many years and is Fb’s (now Meta’s) AI research group’s sequence-to-sequence modeling toolkit. It’s not specific to this MMS model.

I still hate the name “meta”. You?

Why is Meta open sourcing their AI work like this? Is it because they don't have a great reputation even among tech companies?

Meta has had a culture of open source, even before their reputation got bad (React, RocksDB, Thrift, folly, etc). They open source a lot of their AI models (instead of just source) because they don't have an enterprise cloud to sell these things, unlike Amazon/Google/Microsoft's offerings (which all offer speech transcription and text to speech as services).

I understand that. I’m asking in the context that a number of business commentators have come out and said that Google open sourcing the transformer technology was in itself a big mistake, in hindsight. I think the business side in Meta would have taken a similar view.

This is an underrated point for Meta. They are good at opensourcing.

My speculation: the repo hasn't been in active development for at least a year, all the main contributors left to found Character.AI, so Meta is open sourcing it because it only rots sitting there, and going open means goodwill, and maybe someone else picks it up and runs with it and contributes back to make it more useful.

Does it do American Sign Language, the US's fifth-largest language?

I didn’t think so.

American Sign Language, ISO 639-3 code 'ase', does not have a formal written or spoken grammar. Obviously, the list of MMS supported languages[0] does not include support for 'ase' because MMS is a tool for spoken language and, as an intermediate state, written language.

[0] https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mm...

Perhaps you could help by inventing a written version of ASL, since one doesn't currently exist. [EDIT: I was wrong, sorry, see my response below.] Seems like that would be a prerequisite for a model based on written language.

And of course if you could also create an entire corpus of training material in your written ASL?

It is called SignWriting for all you multiple naysayers.


Very fair point. I've never seen it used, but SignWriting is already in the Unicode standard[0] (at U+1D800 - U+1DAAF).

I suspect Meta/Facebook doesn't have a lot of content to work off of. I've only been able to find community-generated examples of SignWriting on the official website[1], and none of those seem to be using Unicode characters. MMS is an audio-to-text tool, so it seems unlikely that it can be trivially expanded to take in visual data (pictures of text or video of ASL being performed).

I suspect the process of turning viewed ASL into SignWriting text will be very difficult to automate. I would not be surprised if such a project would either use a different textual encoding or directly translate out to English (which also sounds terribly hard, but these LLM advances recently have surprised me).

[0] https://www.unicode.org/charts/PDF/U1D800.pdf

[1] https://signwriting.org/#RecentPostingsUS
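For the curious, the Sutton SignWriting block mentioned above is queryable from Python's `unicodedata` on any build with Unicode 8.0 or later (when the block landed):

```python
import unicodedata

# The Sutton SignWriting block spans U+1D800..U+1DAAF.
first, last = 0x1D800, 0x1DAAF
print(last - first + 1)              # 688 code points in the block
print(unicodedata.name(chr(first)))  # first character's official name
```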

I stand corrected, thanks. There's a lot of info on the internet that says a written form of ASL doesn't exist, which is what I found when I Googled it.

Looking into it, it seems very much at the experimental stage in terms of digital representation -- while Unicode symbols exist, they require being placed in 2D boxes (using a drawing tool like SVG). It seems like it's only in the past few years that there have been proposals for how to turn it into a linear canonical text encoding?

Is anyone actually using those linear encodings -- SignPuddle or Formal SignWriting, they seem to be called -- in the wild, outside of demonstration texts or academia? Especially since they only date to 2016 and 2019.

Is there anywhere close at all to a corpus that Meta could train on? Because it still seems like the answer is no, but I also got my research wrong when Google gave no indication that SignWriting existed in the first place.

SignWriting has been documented at the National Deafness section in California State University, Northridge's South Library since 1968 and at Gallaudet University, Washington, D.C. since the 1950s.

That's great, but it doesn't have anything to do with the points I raised.

Which is that I can't find any indication of a linear digital encoding that has been used to any appreciable extent that Meta could train on a corpus of it.

Which is why I'm struggling to understand why you're criticizing Meta? How could they realistically train a linear text model on ASL when the necessary content doesn't appear to exist?

Most hearing-impaired people have never heard of or don't care to use SignWriting. You are right about its existence, for what it's worth.

Right, a minor point: "hearing-impaired" is ableist slang.

Instead, go by hard-of-hearing, people with hearing loss, or simply Deaf.

Also, lowercase "deaf" is barely OK when used alone as the generalized replacement above; however, "deaf" often (poorly) refers to the most severe hearing loss as determined by differing standards, while capitalized "Deaf" is a direct reference to those actively using sign language and engaging with deaf culture, whether that's American Sign Language, British Sign Language, or some 40-odd variants across different nationalities.

How would a speech model do sign language?
