Hacker News
Senators send letter questioning Mark Zuckerberg over Meta’s LLaMA leak (venturebeat.com)
53 points by generalizations on June 8, 2023 | 59 comments



Not to be a conspiracy theorist, but members of Congress considering freely available models to be a threat but “well-regulated” models hidden away behind APIs to be acceptable is just what I’d expect from an effective lobbying campaign by OpenAI/Microsoft.


Honestly, it isn't an unreasonable stance. Given that we (and, even more understandably, they) don't really know the consequences of this tech, but do know it could completely transform society as we know it, it is safer to have a few entities that can control it.

Sometimes I get the feeling that we are doomed to get an AI monopoly/duopoly and sometimes the feeling that open source will prosper.

And while the enthusiast in me clearly wants it to be open source I don't feel it is a given that society will be better for it.


Really? I can't believe that the right approach is to lock up tech, especially with large corporations and/or government holding the keys. Arguably they're the ones most likely to abuse it. Witness Facebook, which has actually transformed society, raising the lunatic fringe to parity with the mainstream through the magic of social networks and algorithms that promote the most profitable speech, regardless of its corrosive effect on society. You don't need fancy AI to destroy the fabric of society.

The more appropriate path is developing countermeasures or counter-technology, which is only possible via free and open development. God knows what monstrosity OpenAI, or any other corporation, is cooking up behind closed doors. And we know their incentives are not aligned with those of a healthy, functioning society.


> You don't need fancy AI to destroy the fabric of society.

You don't, but with it you can do a million times more damage in a fraction of the time. Considering how trivially Facebook can worsen our lives, just imagine how much more effective it could be. That is the risk you are taking: before we even have time to course-correct, we've lost everything we know.

> The more appropriate path is developing countermeasures or counter-technology.

If we get good AI, the countermeasures are not going to be great; they alone could easily be dystopian, since CAPTCHAs etc. won't be good enough. The only way to even TRY to prevent abuse is to ID every action and tie it to a physical person. You could try to make that anonymous, but yeah, like that is going to happen. And that is the easy part.


Honestly, it is an unreasonable stance.


There's no need to presume lobbying. Freely released models, by definition, have no regulating body. It would be as if replicators suddenly made it possible for anyone to produce automobiles or firearms at nearly zero variable cost. These things have life-changing consequences, so there are expectations of quality, safety and record keeping.

The simple truth is that our information tools have become so complex that they're capable of significant consequences. Bots are already being used to wage information war, notably Russia's socio-political war that the U.S. doesn't even realize it is in. That said, it doesn't mean that models must be proprietary. If anything, they should be treated like crypto protocols: open to many eyes, with liberal amounts of sunshine, to prevent anyone from encoding a Manchurian candidate.


I’m struggling to understand the qualitative difference between

a) the risks of openly available transformer models vs. API-restricted access in 2023

and

b) the risks of cheaply available home computers vs. centralized, more easily regulated mini- and mainframe computers in 1983

I mean, look at what that kid from WarGames was able to do!


I suspect that if various stakeholders (the U.S. Senate, various corporate interest groups, etc.) had understood the implications of home computers, they wouldn't have hesitated to ban or otherwise regulate them. Up until the early 2000s, home computers were seen as toys for nerds (people forget!), a perspective that I'm sure many of them now regret.


> that kid from WarGames

That would be Kevin Mitnick. Oddly, the Wikipedia page doesn't seem to mention the movie.

https://en.wikipedia.org/wiki/Kevin_Mitnick


Just whom I was thinking of IRL. If you run through the letter's concerns about risk, they just about all got a lot worse once a "hacker" could get a general purpose computer and a 2400 baud modem for around the same inflation-adjusted price as a nice AI/ML-optimized setup today.

But I don't think senators at the time were sending ominous letters to the Steves and BillG asking why they were letting the general public access these dangerous tools rather than guarding them in machine rooms like IBM did.


Sama is on a global lobbying tour right now.


Not to play the exact opposite role of a charitability theorist, but if one assumes OpenAI's founding principles are sincerely held, then that lobbying campaign may be more public-good-intentioned than profit-intentioned. Under such a framing, they would argue "yes, a freely available model is indeed a threat, while a carefully protected model accountable to internal and governmental oversight is less of a threat" because they believe that to be the case, not because they want a monopoly.

(I don't necessarily actually believe this. I just feel like any good conspiracy theory warrants an equal and opposite angel-devil's advocate.)


> but “well-regulated” models hidden away behind APIs to be acceptable

This isn’t my takeaway from the letter [1]. Expressing caution towards one element of a thing doesn’t imply acceptance of or even preference for the other aspects.

[1] https://www.blumenthal.senate.gov/imo/media/doc/06062023meta...


I agree that the letter did try to give some nuance, but they clearly prefer that the models be hidden away behind APIs.

> At least at this stage of technology’s development, centralized AI models can be more effectively updated and controlled to prevent and respond to abuse compared to open source AI models.

> While centralized models can adapt to abuse and vulnerabilities, open source AI models like LLaMA, once released to the public, will always be available to bad actors who are always willing to engage in high-risk tasks, including fraud, obscene material involving children, privacy intrusions, and other crime.

> Meta’s choice to distribute LLaMA in such an unrestrained and permissive manner raises important and complicated questions about when and how it is appropriate to openly release sophisticated AI models.


I agree there is an implied bias against open source. I’m just pushing back that anything is implied to be “acceptable” by this letter.


In February 2023, Meta released LLaMA, an advanced large language model (LLM) capable of generating compelling text results, similar to products released by Google, Microsoft and OpenAI.

Unlike others, Meta released LLaMA for download by approved researchers, rather than centralizing and restricting access to the underlying data…

I don’t know any way to read this other than “you would have been fine by us if you kept it behind an API wall”.


It’s saying Facebook did something different; the Committee is asking why. There may be nefarious intent in the subtext. We should be wary of that. But the letter per se isn’t evidence of that intent.


Blumenthal also introduced the EARN IT Act. Government influence over how people use computers and the Internet seems to be an abiding interest of his.


Imagine that we now live in a country where a company can be called out by Congress for allowing the public access to a cool new thing.

As weinzieri said, "The crackdown has started."

"Show me your papers."


Let me be clear: Congress has no business discussing this any further. If they can demand that Zuckerberg justify himself, they can do it to anyone.

There is no potential crime involved in his data being leaked. It is as if Congress summoned you for having your bicycle stolen.

This is another step down the authoritarian road. I despise Zuckerberg, Meta and Facebook but this is BAD.


Over 300 million people reside in a nation with easy access to guns, leading to corporate profits and fatalities, while legislators focus on a company's leaked figures.


Oh, just stop with this low-quality comment. They focus on a lot of topics, and this is just one of them.


> a company can be called out by Congress for allowing the public to have access to a cool new thing

Facebook didn’t “allow” the public access to LLaMA. It lost control of its model weights. That difference is material. (I know, practically speaking, there were zero controls on the weights. But by their own communication, it was “leaked,” not released.)


The result was as predictable as leaving a bucket full of candy in a room full of six-year-olds.


I see three options.

1. The senators have no clue.

2. They support monopolies (or at least oligopolies). That's not compatible with US legislation, is it.

3. LLMs are dangerous like atomic weapons and all proliferation needs to be universally banned.


I'm guessing at least (1) is true. Blumenthal is the senator who thought "finsta" was a product that Meta offered. Hawley, in hearings, has tried to get Meta to commit to not using the contents of encrypted messages for ad targeting, apparently unaware that those contents are not readable by Meta. Years on a committee responsible for technology have not pushed them to develop an understanding of the domain.


Might not be compatible with US legislation but it's the most compatible with lining their own pockets.


4. The AI panic is in full force, and the government thinks we need to regulate a glorified chatbot because, to the average person and a surprising number of credulous techies, said chatbot is somehow actual AI.

At this rate there will be no AI, because it will be smothered in the cradle by governments and large corporations who will stop any free-thinking individual who might produce the next breakthrough, lest they use this "dangerous" technology to usurp the former's power or the latter's profits.


(3) is not likely, as LLaMA's context window currently tops out around 2,048 tokens. I don't doubt they may think it's the case, but "as dangerous as nukes" would be a wild overstatement.


Politicians tend to like monopolies as a subset of liking centralized control over things. It means they only need to get 10 people in a room to control entire industries.


>That's not compatible with US legislation, is it.

Lmao. The trick is to erect enormous legal barriers to entry, which is also what OpenAI seems to advocate for.


4. Senators are trying to find out if LLMs are dangerous like nuclear weapons or not. Hence an inquiry and not a law or a lawsuit.


>That's not compatible with US legislation, is it.

Maybe not de jure, but it sure seems it de facto is.


To those of you who live in Connecticut or Missouri, contact the shit out of these senators and give them a piece of your mind. This sort of blatant attempt to restrict the power of individuals should not be allowed.


Or if you live in CA, as Alex Padilla is on the committee.


This feels like the Slowpoke meme; there are many other equally capable foundation models in the wild now. And even if there weren't, DJB's case (Bernstein v. United States) established source code as speech. I'd strongly prefer not to see the federal government investigating speech.


Don't most of them have LLaMA as their origin/basis? Which ones are completely independent?


There are tons of models that either apply LoRA to or fully finetune LLaMA, but I meant foundation models, i.e., ones trained completely independently of LLaMA. Those include Falcon, MPT, StableLM, StarCoder, Replit Code, and CodeGen.

There's also RedPajama and OpenLLaMA, which are intended to give similar performance to LLaMA but be legally distinct.
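(For what it's worth, the LoRA/finetune distinction matters here: a LoRA doesn't produce new base weights at all, just a small low-rank update applied on top of the frozen LLaMA matrices. A minimal numpy sketch of the idea; the dimensions and scaling below are toy values for illustration, not LLaMA's actual shapes:)

```python
import numpy as np

# Frozen base weight matrix from the foundation model (toy size, not LLaMA's).
d_out, d_in, r = 8, 8, 2
W = np.random.randn(d_out, d_in)

# LoRA trains only two small matrices: B (d_out x r) and A (r x d_in).
B = np.zeros((d_out, r))   # B starts at zero so training begins exactly at W
A = np.random.randn(r, d_in)
alpha = 4.0                # scaling hyperparameter

# Effective weight at inference time: frozen base plus scaled low-rank update.
W_eff = W + (alpha / r) * B @ A

# With B still zero, the adapter is a no-op: W_eff equals W exactly.
assert np.allclose(W_eff, W)
```

Distributing a LoRA means shipping only B and A (2 * 8 * 2 = 32 numbers here, versus 64 for W), which is why adapter releases don't count as releasing a new foundation model.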


That sounds like a shot across the bow from OpenAI. Here we go.

It’s pure, unadulterated fun to watch these tech titans foil one another’s ambitions with cheap shots and swipes. Sama comes off as a little arrogant after all, and downright scary sometimes. No doubt he is brilliant.


This seems stupid, because other smaller companies are releasing LLaMA-likes out into the open.

But I get why Facebook kept LLaMA legally restricted while throwing it out with little enforcement: to appease inquiries like this.


I don't think Facebook's moves were heavily strategically thought out.

Instead, I think a small team of researchers made LLaMA and published a paper about it [1], and then realized that to get any citations for their paper they'd need to let other researchers have the weights. Citations are like money in the academic world; every researcher wants them. Executives and managers care less about that stuff.

[1]: https://arxiv.org/abs/2302.13971


Yann LeCun said in a podcast that it was because of the models' uncertain copyright status, with an expected fair-use defense if they were distributed for research only.


Letters like this make me think that Facebook may have intentionally "leaked" the weights. Certainly they didn't hesitate to open-source the model once the leak occurred.


The model was open to any researcher simply by applying: you had to enter your name and university or group affiliation, write about one sentence on your prior work or publications, agree to the noncommercial, research-only license, and that was it. Meta hasn't said how many people downloaded the weights this way, but it was probably in the thousands.

And the weight files weren't watermarked or fingerprinted in any way; everyone's hashes were identical. Therefore there was no real way to trace who put up the torrents first.

So this isn't so much a 'leak' as just good old piracy. If you found a torrent of a DVD rip of the movie Frozen, the MPAA wouldn't blame Disney for 'leaking' it.
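(The "hashes were all the same" point is easy to illustrate: had Meta embedded a per-downloader fingerprint, every researcher's copy would hash differently and the torrent could be traced. A hedged sketch, with made-up bytes standing in for a weight shard:)

```python
import hashlib

# Two hypothetical downloads of the same weight shard by different researchers.
# With no per-downloader watermark embedded, the bytes are identical...
download_a = b"\x00\x01llama-7b-shard-00" * 1000
download_b = b"\x00\x01llama-7b-shard-00" * 1000

hash_a = hashlib.sha256(download_a).hexdigest()
hash_b = hashlib.sha256(download_b).hexdigest()

# ...and so are the SHA-256 digests, which is why the torrent couldn't be
# traced back to whichever approved researcher uploaded it first.
assert hash_a == hash_b
```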


The model's code was open source before the weights leaked.


I mean, if they want to start on at companies distributing cool stuff that some people like but can be dangerous, guns are right there.


LLaMA weights are uploaded everywhere (including to Hugging Face, which Meta itself regularly uses), and I have not seen a single enforcement action. No one uses the access-request forms or the XORs.

Seeing the comments here, I think HN is unaware of just how nonexistent Meta's policing of LLaMA is. They don't seem to care about their own license.

Hence my theory that Meta is having its cake and eating it too. Clearly they don't mind LLaMA being widely used, but the restricted release and license give them plausible deniability with displeased parties, especially those that have no idea what huggingface-hub even is.

Maybe this wasn't the initial intention, but it sure is an easy way forward.
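(The "XORs" mentioned above refer to a community workaround: instead of hosting the weights directly, some repos published an XOR diff that only reconstructs the weights when combined with another file. A toy sketch of the mechanism; the byte strings and names below are made up, not the actual tooling or real weight files:)

```python
# Toy illustration of XOR-diff distribution (hypothetical, simplified).
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

weights = b"llama-weights"         # what only approved researchers should hold
pad     = b"0123456789abc"         # arbitrary published pad of the same length
hosted  = xor_bytes(weights, pad)  # the XOR diff that actually gets uploaded

# The hosted diff is gibberish on its own; combining it with the pad
# reconstructs the original bytes exactly, since (w ^ p) ^ p == w.
assert xor_bytes(hosted, pad) == weights
```

The hosted file alone contains none of the original weights in readable form, which is the whole point of the scheme as a (dubious) legal fig leaf.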


"Vicuna is a fine-tuned version of LLaMA that matches GPT-4 performance."

Oh is that right?


> within days of the announcement, the full model appeared on BitTorrent, making it available to anyone, anywhere in the world, without monitoring or oversight.

Oh no! Anyways…


The Internet is a Series of Tubes, 2023 edition.

https://en.m.wikipedia.org/wiki/Series_of_tubes

>"A series of tubes" is a phrase used originally as an analogy by then-United States Senator Ted Stevens (R-Alaska) to describe the Internet in the context of opposing network neutrality.


Am I alone in hating this pattern of Congress trying to do an end run around the First Amendment by pressuring companies to restrict speech for them, using hearings to harangue them?

It is questionable whether Congress could even pass a law regulating that behavior, and whether such a law would be constitutional if they did.

Yet they harangue companies for fully legal behavior. Congress is elected to pass laws, not to be moral scolds.


Leak? There was a form to fill in for access, lol.


It's wild to think of billions of weights/numbers being worth $100M. It would be cool to find out where in pi they're located and just share that coordinate.


I'm afraid this is it. The crackdown has started. Next the RIAA and its ilk will chime in and request a takedown.


They should ask for the testimony of that famous hacker who leaked, I think his name was 4chan /s


I'm confident if we replaced congress with an LLM, the LLM would do a better job.


I'm confident if we replaced congress with Math.random(), the PRNG would do a better job.


No question about it.


Senators are realizing the power of AI and want to control it, bending it to their will through back channels. OpenAI is going to have to "leak" its models as well, or it will become an extension of the political elites.



