Hacker News
Sharing new research, models, and datasets from Meta FAIR (meta.com)
228 points by TheAceOfHearts 5 months ago | 54 comments



> non-commercial/research-only license.


Pity they aren’t including image generation. Multimodal generation with reference inputs is my #1 ask for things like novel illustrations.


It seems like their image-tokenization model might be useful for this, but also have you looked into stuff like ControlNet?


Control nets are useful for defining the exact position and behaviour of people in a picture, I won't deny that. They don't let you get a specific, repeated character, however.


Ah, so like a multimodal model you can give something like "[picture] this guy but doing a handstand in a penguin suit"?

There have been a few attempts at prompt-driven editing (I think it was called Instruct-something), and there's a whole field of work on stuff like animation transfer (MegaPortraits and EMO come to mind for close-ups, and there are a few things that do broad motion out there). But it's hard to suggest something without knowing your use case, and if you're looking for general purpose, the research just doesn't seem to be there yet. That might be the kind of thing that needs a "world model".

I think probably controlnet + some kind of LoRA embedding for your specific character is going to be the closest you can get right now, but it's definitely some extra work and not quite what you want
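
To make that concrete, here is a rough sketch of the controlnet + character-LoRA combo using diffusers. Treat it as an assumption-laden example: the LoRA path and the "mycharacter" trigger word are placeholders for whatever you trained, and the checkpoints are just common public ones.

  import torch
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
  from diffusers.utils import load_image

  # Pose ControlNet pins down the composition (expects an OpenPose skeleton image).
  controlnet = ControlNetModel.from_pretrained(
      "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
  )
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5",
      controlnet=controlnet,
      torch_dtype=torch.float16,
  ).to("cuda")

  # A LoRA fine-tuned on your character keeps the identity consistent.
  pipe.load_lora_weights("path/to/my_character_lora")  # placeholder path

  pose = load_image("pose_reference.png")  # the OpenPose conditioning image
  image = pipe(
      "mycharacter doing a handstand in a penguin suit",
      image=pose,
      num_inference_steps=30,
  ).images[0]
  image.save("out.png")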


Exactly that. I think GPT-4o has demonstrated those abilities, but since they haven’t enabled image output yet we can only hope.


Control net + the same prompt + the same seed tends to be pretty decent if you're not savvy enough to train a lora
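
The seed part is just a matter of passing the same torch.Generator on every call; assuming a setup like the one sketched upthread (pipe and pose come from that example), something like:

  prompt = "mycharacter riding a bicycle, detailed illustration"

  gen = torch.Generator("cuda").manual_seed(1234)
  first = pipe(prompt, image=pose, generator=gen).images[0]

  gen = torch.Generator("cuda").manual_seed(1234)  # reset to the same seed
  second = pipe(prompt, image=pose, generator=gen).images[0]

Same prompt, same seed, same control image tends to keep the look roughly stable, though it's much less reliable than a trained LoRA.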


Interesting that they’re releasing a deepfake detector. I expect that to get integrated into image generation training pipelines.


Throw "deepfake loss" onto the pile of loss functions.


Can you please point to the deepfake detector? I can't find it on the page. Thanks!


Multi-token prediction looks very interesting and quite elegant. It seems more efficient than predictive sampling.

From what I understand, they are essentially training it to have some form of representation of the context as a whole, which is then used to generate the next n tokens. I feel like this is a nice next step towards "smarter" models. I wonder if a similar thing could be done for the inputs.
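
As I read it, the setup is a shared trunk whose hidden states feed n separate output heads, one per future token. A toy PyTorch sketch (sizes made up, and the paper uses transformer-layer heads rather than plain linear ones):

  import torch
  import torch.nn as nn

  class MultiTokenHeads(nn.Module):
      """Head i predicts the token (i + 1) steps ahead from the shared hidden state."""
      def __init__(self, d_model=512, vocab_size=32000, n_future=4):
          super().__init__()
          self.heads = nn.ModuleList(
              nn.Linear(d_model, vocab_size) for _ in range(n_future)
          )

      def forward(self, hidden):  # hidden: (batch, seq, d_model) from the trunk
          # logits[i][:, t] is the prediction for position t + 1 + i
          return [head(hidden) for head in self.heads]

  heads = MultiTokenHeads()
  logits = heads(torch.randn(2, 16, 512))  # list of 4 tensors, each (2, 16, 32000)

Training would sum the cross-entropy of each head against the appropriately shifted targets; at inference you can keep only the first head (ordinary next-token prediction) or use the extra heads to draft tokens ahead.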

It's a shame they didn't compare it to llama3, since they had both a 6.7B and a 13B multi-token model. From what I could gather, the instruction-tuned llama3 is much better on HumanEval, for example.


Is Llama3 400B still going to be released?


Yes, Yann LeCun posted about it on twitter ~2 weeks ago.

Edit, it was a month ago: https://twitter.com/ylecun/status/1793181068943639014


Under a “do not even look at it” license I’m sure. /s


What's the argument that the early-fusion setup scales better or is easier to maintain than late fusion? Does this come with a tradeoff, or is it just uniformly better?


> models we’re releasing today were safety tuned

Cool, it only took about two years to go from widespread public releases to SV lockdown of the powerful tools, with the scraps thrown to the peasants.

The overlords of social ethics have already decided what you cannot have. Just like the elders of the internet intended…


They're releasing a model to the public that likely cost tens of millions, for free, and you're complaining that they don't want to have to deal with bullshit PR for giving a teenager fucking around in his backyard the recipe for a pipe bomb.


I think it’s cute you are using the popular example of “safety” and not realizing what they’re really intending to keep from you.


What do you think that is?


can someone give an intuition for the unified tokenizer that they mentioned? I understand the idea of using a GAN or an autoencoder to make an image tokenizer, but it's unclear how you combine that with text to have a single tokenizer over both modalities (if I'm understanding the paper correctly)


actually I just went into the paper and skimmed it.

Their image tokenizer encodes each image as 1024 discrete tokens drawn from a codebook of size 8192. That codebook size matches their language tokenizer's vocabulary (also discrete, also size 8192). Then they just mash those tokens together into one sequence, which leads to downstream training problems: because the two token distributions have different entropies, stable training in later stages is difficult, and they have to apply a bunch of regularizations to keep the gradients from exploding.

Interesting stuff!
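
For anyone else skimming: mechanically, "mashing the tokens together" seems to amount to shifting the image codebook ids past the text vocabulary so both modalities live in one id space and one sequence. A minimal sketch, with the sizes taken from the summary above (so treat them as assumptions):

  import torch

  TEXT_VOCAB = 8192        # assumed text vocabulary size
  IMAGE_VOCAB = 8192       # image codebook size
  TOKENS_PER_IMAGE = 1024  # discrete tokens per image

  def merge(text_ids, image_codes):
      # Offset image codes so they occupy ids TEXT_VOCAB .. TEXT_VOCAB + IMAGE_VOCAB - 1,
      # then concatenate into one sequence for a single autoregressive transformer.
      return torch.cat([text_ids, image_codes + TEXT_VOCAB], dim=-1)

  text_ids = torch.randint(0, TEXT_VOCAB, (1, 32))
  image_codes = torch.randint(0, IMAGE_VOCAB, (1, TOKENS_PER_IMAGE))
  sequence = merge(text_ids, image_codes)  # shape (1, 32 + 1024)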


Last time they did this, I was not granted access


Try a different email address, that's what fixed it for me.


Sure, that may work; but what does it mean?


I can't speculate on that for you. Try reaching out to them if you must know.


When we say "open X," we expect something like the via negativa principle: instead of inviting members into a closed community, you exclude those who violate the open community's code.


This is false. Any "open" event (or anything, really) can be closed for select individuals.

The reason for the exclusion just can't be a prohibited criterion such as race.

(I.e. kicking someone out after they've stirred up controversy)

I'm not saying that this happened to you, I'm just addressing the point you made, that anything "open" can't or shouldn't be exclusionary.


Don't do that; use your AI. Ask Claude, for example.


Is it just me or does Meta seem to be doing better than most companies right now open sourcing their AI research?


they've been on this wave for years.

Thanks to Meta, I have been creating instrumentals of some of my favorite songs by separating out the vocals: https://github.com/facebookresearch/demucs
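
If anyone wants to try it, the two-stem mode is what produces the instrumental; if I remember the README correctly, it can also be driven from Python like this (the output lands under separated/<model name>/ as vocals and no_vocals files):

  # pip install demucs
  import demucs.separate

  # Split the track into vocals + everything else; "no_vocals" is the instrumental.
  demucs.separate.main(["--two-stems", "vocals", "favorite_song.mp3"])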


Yes, I'd also like to know why Meta has chosen this path, whereas many of the other big players haven't. Usually they all settle upon the same viewpoint.


It started by accident, with the original llama weights being leaked by two separate employees. They've since embraced opening the weights, which I'm all for.

As for why? I have a theory: Meta is not in a position to capitalize upon the model itself. Yes, they can use it internally, and maybe their competitors can copy it too - but there are no real competitors to Facebook or Instagram that can benefit from it enough to make it a differentiating facet.

Thus, releasing stuff for open source does two things:

1) Makes them more attractive to research talent (Apple famously started publishing research recently because their traditional secrecy was causing issues with hiring top talent) and...

2) Continues to undermine the ability to make $$$ off of the model alone, driving it towards being a commodity rather than the long-term profit engine for other companies.


Wrong. FAIR has been open sourcing ML models and source code for the last 10+ years; it did not start with llama. Also, llama was not leaked by employees, but by people in the broader community with whom the weights had been shared.

For example:

Faster R-CNN - state-of-the-art object detection, released in 2017.

FastText - text embedding models, 2016.

FAISS - vector DB, 2018.

https://github.com/orgs/facebookresearch/repositories has over 1,000 repos.


I was speaking about LLM weights specifically and llama, not all models and work at FAIR.

Per your links, it's clear that FAIR does have a good history of open source work.


This is very true. But we should also mention that Facebook sadly has been, and still is, on a negative trajectory of openness. As someone working closely with them: there was a culture of nearly complete openness in the early years. Research was promptly shared in its entirety and licensing was compliant with open science. However, as the "AI boom" has grown, there is an increasing internal culture of holding parts of research back (my understanding is that this pressure comes from the C-suite). Licensing that was previously open-source compliant has had non-commercial clauses added more and more frequently, and even non-standard, complex agreements, as we have seen for LLaMA 2 and 3. This is sad, and the culture of openness is ultimately at risk as they become more and more like OpenAI, Google DeepMind, etc.

As I frequently point out, Facebook are free to decide on their own culture as they see fit, and I am not entitled to their work. But it saddens me that they believe compromising on their initial ideals is the way forward, rather than sticking to them through thick and thin. This ultimately makes it more and more difficult for me, as an academic who believes in these ideals, to work with them.


> It started by accident, with the original llama weights being leaked by two separate employees.

This is not true. Meta FAIR has been built on openness from day 1. We published many papers and open-sourced many repositories to reproduce the work.


In short: "Commoditize the complement."


> weights being leaked

You can hardly call that a "leak" when they were basically sending the weights to thousands of people who applied for access. It is not as if they kept them secret.


Facebook/Meta has been doing open research in ML and NLP for far longer than the current LLM era: Convolutional Neural Networks for Sentence Classification (2014), FastText (2016), PyTorch (2016), fairseq (2017), LASER (2018), RoBERTa (2019), XLM-R (2019), and BART (2019).

They have always been present at ACL with decent open research, for as long as I have been studying and working in NLP (since 2014).

It's part of a strategy to attract top talent in the field. If you want top researchers you have to let them publish, which in turn hones a reputation of solid research, attracting more talent.


> It's part of a strategy to attract top talent in the field. If you want top researchers you have to let them publish, which in turn hones a reputation of solid research, attracting more talent.

This. Although, it turns out, if you pay them well enough (like OpenAI), they'll forego publishing. If you can make enough to retire by 40, why work at places that pay less?

ofc, not all researchers think that way, and there are those who are in it for the science, not just money.


They’ve chosen the path of commoditizing their complement: to ensure that ML capabilities are not a differentiating factor in the market, make ML capabilities a commodity available to everyone at marginal cost.


There isn't a conflict with their business model. And if people help to improve their models, they can extract better value from them themselves.

Also, maybe they need to improve their brand: after hoarding data for over a decade, maybe they're giving something back now.


1/ Build a community (think Linux for OS, Android for mobile, React for frontend) and figure out monetization later. What is clear to most people is that something fundamental is changing in how we build and consume applications.

2/ Prevent OpenAI from cornering the future $$$ market. Unfortunately, Google search is hit as well, but it is more due to the generational shift.

3/ Attract the best AI researchers. A product is as good as its core set of people (often just a few).


Same reason they open sourced reactjs or encourage jestjs usage. If they give it away, they can benefit by being embedded in the stack. Then, on all the scale issues, they can beat implementers because they have the money to run it. And to stay ahead of people implementing their tech, they have their own data trove to train on; you only get a small piece of it. It's all to position themselves as a sensible solution.


All their users are writing "content" for free and communicating with each other. AI-generated "content" does not really fit into this.

They do not want their users to go to ClosedAI or similar and communicate with an Artificial Stupidity instead of talking to each other on Facebook.

So it is in their interest to undermine the market for Artificial Stupidities by releasing the models for free.


Apt user name and salient points.

Just wait until they have a fully intelligent automated pipeline for lifestyle ingestion (e.g. pervasive analysis of all communication and AV) into AI management (Timeline / Memories) backposted into web 2.0 feeds like Facebook.


It is called "tipping the market" and it is a well-known strategy in the business of platforms. [1]

Google did the same thing when they released android for free.

[1] https://www.harperacademic.com/book/9780062896322/the-busine...


Zuckerberg clearly lays out his position in an interview with Dwarkesh from about 2-3 months ago. Worth a watch if you’re curious.


My guess is that when Llama leaked on 4chan and it blew up in the community it somewhat forced their hand to go with an open strategy. But they also have a history with pytorch of reaping the benefits of an open strategy. The benefits are well explained in Google's leaked "We Have No Moat" document.


How so? Maybe in the past, but nothing announced today is open.


They are unable to compete with the top players on quality, so they instead compete by winning the largest user base with open models.


Note that, due to the licenses, no company will touch any of these with a 10 ft pole. Great for individuals looking to experiment, though.


The current LLaMA license prevents hyperscalers (>100M MAU?) from using it, but it allows anyone else to test it and later apply for a commercial license.


Considering that this is all based on people's data and personal information, people in general should have the right to dictate to Facebook what it can and cannot do with that data and what happens with the products made using it. It is laughable that they act here as if they are doing something friendly or fair. The same is true for other companies like Microsoft and "Open"AI.



