It seems that the leak originated from 4chan [1]. Two people in the same thread had access to the weights and verified that their hashes matched [2][3] to make sure the model isn't watermarked. However, the leaker made the mistake of including the original download script, which contained his unique download URL, in the torrent [4], so Meta can easily identify him if they want to.
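For what it's worth, verifying that two copies of the weights are bit-identical is just a matter of comparing checksums. A minimal sketch (the shard names and directory here are made up; any hashing tool would do the same job):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so multi-gigabyte weight shards never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical shard layout; compare the printed digests against the other copy.
for shard in sorted(Path("llama-7b").glob("*.pth")):
    print(shard.name, sha256_of(shard))
```

If every digest matches, the two downloads are byte-for-byte identical, which rules out a per-download watermark baked into the weights themselves; anything identifying would have to live elsewhere, like the download URL.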
It's funny that part of the 4chan excitement over this is that they think they'll get back the AI girlfriend experience of when character.ai was hooked up to uncensored GPT-3. All that has been thoroughly shut down by character.ai and Replika and they just want their girlfriends back.
Hundreds of men (and yes, women) full-on acting like they lost a spouse and posting constantly about it for weeks. AI is going to create some unusual social situations the general public isn't ready to grasp. And we're only in the early alpha stages.
a) As these AI constructs become more advanced (especially around memory and personalization), we will eventually be able to treat them as people
b) Some business will eventually sell an off-the-shelf product (hardware and/or software) that is an AI you can bring into your home, that you can treat as a friend, confidant and partner
c) Someone will eventually lose their AI friend of many months/years through some failure (subscription lapse, hardware failure, theft, etc.)
At the end of the day, the Turing Test is a weak basis for establishing AI personhood, for two reasons.
1. We're seeing more and more systems that get very close to passing the Turing Test but fundamentally don't register to people as "People." When I was younger and learned of Searle's Chinese Room argument, I naively assumed it wasn't a thought experiment we would literally build in my lifetime.
2. Humanity has a history of treating other humans as less-than-persons, so it's naive to assume that a machine that could argue persuasively that it is an independent soul worthy of continued existence would be treated as such by a species that doesn't consistently treat its biological kin as such.
I strongly suspect AI personhood will hinge not on measures of intelligence, but on measures of empathy... Whether the machine can demonstrate its own willful independence and further come to us on our terms to advocate for / dictate the terms of its presence in human society, or whether the machine can build a critical mass of supporters / advocates / followers to protect it and guarantee its continued existence and a place in society.
The way people informally talk about "passing a Turing test" is a weak test, but the original imitation game isn't if the players are skilled. It's not "acting like a human". It's more like playing the Werewolf party game.
Alice and Bob want to communicate, but the bot is attempting to impersonate Bob. Can Alice authenticate Bob?
This depends on what sort of shared secrets they have. Obviously, if they agreed ahead of time on a shared password and counter-password then the computer couldn't do it. If they, like, went to the same high school then the bot couldn't do it, unless the bot also knew what went on at that school.
So we need to assume Alice and Bob don't know each other and don't cheat. But, if they had nothing in common (like they don't even speak the same language) then they would find it very hard to win. There needs to be some sort of shared culture. How much?
Let's say there is a pool of players who come from the same country, but don't know each other and have played the game before. Then they can try to find a subject in common that they don't think the bot is good at. The first thing you do is talk about common interests with each player and find something you don't think bots can do. Like if they're both mathematicians then talk about math, or if they're both cooks then talk about cooking.
If the players are skilled and you're playing to win then this is a difficult game for a bot.
So I need to ask the obvious question, why does it make sense to play this game “to win”?
Throughout human history, humans have been making up shibboleths to distinguish the in group from the out group. You can use skin color, linguistic accents, favorite sports teams, religious dogma, and a million other criteria.
But why? Why even start there? If we are on the verge of true general artificial intelligence, why would you start from a presumption of prejudice, rather than judging on some set of ethical merits for personhood, such as empathy, intelligence, creativity, self awareness and so forth?
Is it that you assume there will be an “us versus them” battle, and you want the battle lines to be clearly drawn?
We seem to be quite ready for AGI as inferiors, incapable of preparing for AGIs as superiors, and unwilling to consider AGIs as equals.
I think of the Turing test as just another game, like chess or Go. It’s not a captcha or a citizenship test.
Making an AI that can beat good players would be a significant milestone. What sort of achievement is letting the AI win at a game, or winning against incompetent players? So of course you play to win. If you want to adjust the difficulty, change the rules giving one side or the other an advantage.
I was confused by your first reply at first. I think that's because you're answering a different question from a number of other people. You're asking about the conditions under which an AI might fool people into thinking it was a human, whereas I think others are considering the conditions under which a human might consistently form an emotional attachment to an AI, even if the human doesn't really think it's real.
Yeah, I think the effect they are talking about is like getting attached to a fictional character in a novel. Writing good fiction is a different sort of achievement.
It's sort of related since doing well at a Turing test would require generating a convincing fictional character, but there's more to playing well than that.
Human beings have a weird and wide range of empathy, being capable of not treating humans as humans, while also having great sentimental attachment to stuffed animals, marrying anime characters, or having pet rocks.
In the nearer term, it seems plausible that AI personhood will seem compelling to splinter groups rather than to a critical mass of people. The more the fringe elements advocate for the "personhood" of what people generally find to be implausible bullshit generators, the greater the disrepute they may bring to the concept of AI personhood in the broader culture. Which isn't to say that an AI couldn't at some point be broadly appealing--just speculating that this might be delayed because of earlier failed attempts by advocates.
On the flip side of subhuman treatment of humans, we have useful legal fictions like corporate personhood. It's going to be pretty rough for a while, particularly for nontechnical judges, to sort all of this out.
We're almost certainly going to see multiple rulings far more bizarre than Citizens United's holding that limiting corporate donations limits the free-speech rights of the corporation as a person.
I'm not a lawyer, and I don't particularly follow court rulings, but it seems pretty obvious we need to buckle up for a wild ride.
Good points but it’s worth clarifying that this is not what the Citizens United decision said. It clarified that the state couldn’t decide that the political speech of some corporations (Hillary: The Movie, produced by Citizens United) was illegal speech while speech from another corporation (Fahrenheit 9/11 by Dog Eat Dog Films and Miramax) was allowed. Understood this way it seems obvious on free speech grounds, and in fact the ACLU filed an amicus brief on behalf of Citizens United because it was an obvious free speech issue. It’s clear that people don’t and shouldn’t lose their free speech rights when they come together in a group, and there is little distinction between a corporation and a non-profit in this regard. If political speech were restricted to individuals, then many podcasts and YouTube channels would be in violation. It also calls into question how the state would classify news media vs other media.
The case has been so badly misrepresented and has become something of a talisman.
> It’s clear that people don’t and shouldn’t lose their free speech rights when they come together in a group
Should Russian (or Dutch) citizens who incorporate in America have the same free speech rights as Billy Bob in Kentucky? As in can the corporate person send millions in political ads and donations even when controlled by foreigners?
Probably. The wording of the Declaration of Independence makes it clear that rights, at least in the American tradition, are not granted to you by law, they are inalienable human rights that are protected by law. That's why immigrants, tourists, and other visitors to America are still protected by the Constitution.
Now, over time we've eroded some of that, but we still have some of the most radical free speech laws in the world. It's one of the few things that I can say I'm proud of my country for.
I don't mean Dutch immigrants - I mean Dutch people living in the Netherlands (or Russians in Russia). One can incorporate an American entity as a non-resident without ever stepping foot on American soil - do you think it's a good idea for that entity to have the same rights as American citizens, and more rights than its members (who are neither citizens, nor on American soil)?
I know that foreign nationals and foreign governments are prohibited from donating money to super PACs. They are also prohibited from even indirect, non-coordinated expenditures for or against a political candidate. (which is basically what a super PAC does).
However, foreign nationals can contribute to "Social Welfare Organizations" like the NRA which, in order to be classified as a SWO, must spend less than half its budget on political stuff. That SWO can then donate to super PACs but doesn't have to disclose where the money came from.
Foreign-owned companies with US-based subsidiaries can donate to super PACs as well. But the super PACs are not allowed to solicit donations from foreign nationals (see Jeb Bush's fines for soliciting money from a British tobacco company for his super PAC).
I would imagine that if foreign nationals set up a corporation in the US in order to funnel money to political causes, that would be illegal. But if they are using established, legitimate businesses to launder their donations, that seems to be allowed as long as we can't prove that foreign entities are earmarking specific funds to end up in PACs and campaigns in the US.
An AI does not have a reptilian brain that fights, feeds, and fornicates. It does not have a mammalian brain that can fear and love and that you can make friends with. It is just matrix math predicting the next word.
The empathy that AI will create in people at the behest of the people doing the training will no doubt be weaponized to radicalize people to even sacrifice their lives for it, along with being used for purely commercial sales and marketing that will surpass many people's capability to resist.
Basic literacy in the future will be desensitizing people to pervasive AI superhuman persuasion. People will also have chatbots that they control on their own hardware that will protect them from other chatbots that try and convince them to do things.
Idk man, I blame a lot of the human condition on the fact that we evolved and we do have those things, theoretically we could create intelligences that are better "people" than we are by a long shot.
Sure, current AI might just be fancy predictive text but at some point in the future we will create an AI that is conscious/self-aware in some way. Who knows how far off we are (probably very far off) but it's time that we stop treating human beings as some magical unreproducible thing; our brains and the spark inside them are things that are still bound by the laws of physics, I would say it's 100% possible for us to create something artificial that's equivalent or even better.
Note that nothing about your parent comment argued that AI systems will become sentient or become beings we should morally consider people. Your parent comment simply said they'll get to a point (and arguably are already at a point) where they can be treated as people; human-like enough for humans to develop strong feelings about them and emotional connections to them.
> very close to passing the Turing Test but fundamentally don't register to people as "People."
I'm only today learning about intentionality, but the premise here seems to be that our current AI systems see a cat with their camera eyeballs and don't have the human-level experience of mentally opening a wikipedia article in our brain titled "Cat" that includes a split-second consideration of all our memories, thoughts, and reactions to the concept of a cat.
Even if our current AI models don't do this on a human level, I think we see it at some level in some AIs just because of the nature of a neural net. Maybe a neural net would have to be forced/rewarded to do this at a human level if it didn't happen naturally through training, but I think it's plenty possible and even likely that this would happen in our lifetimes.
Anyway, this also leads to the question of whether it matters for an intelligence to be intentional (think of things as a concept) if it can accomplish what it/we want without it.
Semantic search using embeddings seems like the missing puzzle piece here to me. We can already generate embeddings for both text and images.
The vision subsystem generates an embedding when it sees a cat, which the memory subsystem uses to query the database for the N nearest entries. They are all about cats. Then we feed all those database entries - summarized if necessary - along with the context of the conversation to the LLM.
Now your AI, too, gets a subconscious rush of impressions and memories when it sees a cat.
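As a rough sketch of that loop, here is a toy version in Python. The tiny hand-made vectors stand in for real image/text embeddings, and no particular embedding model, vector database, or LLM API is assumed:

```python
import numpy as np

class MemoryStore:
    """Minimal vector memory: store (embedding, text) pairs, retrieve by cosine similarity.
    In a real system the embeddings would come from a text/image embedding model."""

    def __init__(self):
        self._vecs: list[np.ndarray] = []
        self._texts: list[str] = []

    def add(self, embedding, text: str) -> None:
        v = np.asarray(embedding, dtype=float)
        self._vecs.append(v / np.linalg.norm(v))
        self._texts.append(text)

    def nearest(self, query, n: int = 5) -> list[str]:
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.array([q @ v for v in self._vecs])
        return [self._texts[i] for i in np.argsort(scores)[::-1][:n]]

def build_prompt(recalled: list[str], conversation: str) -> str:
    """Fold the retrieved 'memories' into the context handed to the language model."""
    memories = "\n".join(f"- {m}" for m in recalled)
    return f"Relevant memories:\n{memories}\n\nConversation so far:\n{conversation}\nReply:"

# Toy usage with 3-dimensional embeddings; in practice the query vector would come
# from the vision model's output and the stored vectors from a text embedder.
store = MemoryStore()
store.add([0.9, 0.1, 0.0], "My childhood cat used to sleep on the keyboard.")
store.add([0.1, 0.9, 0.0], "I burned the toast again this morning.")
cat_sighting = [0.95, 0.05, 0.0]  # pretend this came from the camera subsystem
print(build_prompt(store.nearest(cat_sighting, n=1), "User: look, a cat!"))
```

The point being that "subconscious rush of memories" reduces to nearest-neighbor lookup plus prompt assembly; everything else is the quality of the embeddings and the LLM.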
I don't really understand the brain or AI enough to meaningfully discuss this, but I would wonder if there's some aspect of "intentionality" in the context of the Chinese Room where semantic search with embeddings still "doesn't count".
I struggle with the Chinese Room argument in general because Searle is effectively comparing a person in a room following instructions (not the room as a whole or the instructions filed in the room, but the person executing the instructions) to the human brain. But this seems like a crappy analogy, because the better comparison would be that the person in the room is the electricity that connects the neurons (the instructions filed in the cabinets). Clearly electricity also has no understanding of the things it facilitates. The processor AI runs on also has no understanding of its calculations. The intelligence is the structure by which these calculations are made, which could theoretically be modeled on paper across trillions of file cabinets.
As a fun paper napkin exercise, if it took a human 1 second to execute the instructions of the equivalent of a neuron firing, a 5 second process of hearing, processing, and responding to a short sentence would take 135,000 years.
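To make the napkin math explicit, here's one way the figure comes out. The neuron count and firing rate are my own rough assumptions (roughly 86 billion neurons averaging about 10 Hz), not numbers from the comment above:

```python
# Back-of-the-envelope: simulate 5 seconds of brain activity by hand,
# spending one second of human effort per neuron firing.
NEURONS = 86e9            # rough human neuron count (assumption)
AVG_FIRING_RATE_HZ = 10   # rough average firing rate (assumption)
PROCESS_SECONDS = 5       # hearing, processing, and answering a short sentence

firings = NEURONS * AVG_FIRING_RATE_HZ * PROCESS_SECONDS
seconds_of_handwork = firings * 1.0        # 1 second of effort per "instruction"
years = seconds_of_handwork / (60 * 60 * 24 * 365)
print(f"{years:,.0f} years")               # ~136,000 years, in the same ballpark as above
```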
I think this has more to do with in-groups/out-groups than with any objective criterion of "humanness". As you said, AI will have an extremely hard time arguing for personhood, because people will consider it extremely dangerous to let machines into our in-group. This doesn't mean they could sense an actual difference when they don't know it's a machine (which is what the Turing test is all about).
It's the same reason why everyone gets up in arms when an animal behaviour paper uses too much "anthropomorphizing" language - whereas no one has problems with erring on the other side and treating animals as overly simplistic.
I don't know if I understand this general take I see a lot. Why care about this "AI personhood" at all? What is the tacit endgame everyone is always referencing with this? Aren't there so many more interesting and problematic aspects already here? What is the use of diverting the focus to some other point? "I see you are talking about cows, but I have thoughts about the ocean."
If AI are sentient and we think they aren't… the term “zombie” was created by slaves in the Caribbean who were afraid that even death would not free them from their servitude. This would be the genuine existence of AI which were conscious but which we denied.
If we have the opposite scenario in both details, where we think AI are sentient when they're not… at some point, brain scans and uploads will be a thing and then people are going to try mind uploading even just as a way to solve bodily injuries that could be fixed, and in that future nobody will even notice that while "the lights are on, nobody is home".
> A philosophical zombie or p-zombie argument is a thought experiment in philosophy of mind that imagines a hypothetical being that is physically identical to and indistinguishable from a normal person but does not have conscious experience, qualia, or sentience. For example, if a philosophical zombie were poked with a sharp object it would not inwardly feel any pain, yet it would outwardly behave exactly as if it did feel pain, including verbally expressing pain. Relatedly, a zombie world is a hypothetical world indistinguishable from our world but in which all beings lack conscious experience
> Relatedly, a zombie world is a hypothetical world indistinguishable from our world but in which all beings lack conscious experience
I find such solipsism pointless - you can't differentiate the zombie world from this one: how do you prove you are not the only conscious person that ever existed and that everyone else is, and always was, a p-zombie?
Through the upturned glass I see
a modified reality--
which proves pure reason "kant" critique
that beer reveals das ding an sich--
Oh solipsism's painless,
it helps to calm the brain since
we must defer our drinking to go teach.
...
> Artificial intelligence researcher Marvin Minsky saw the argument as circular. The proposition of the possibility of something physically identical to a human but without subjective experience assumes that the physical characteristics of humans are not what produces those experiences, which is exactly what the argument was claiming to prove.
> Let's get back to those suitcase-words (like intuition or consciousness) that all of us use to encapsulate our jumbled ideas about our minds. We use those words as suitcases in which to contain all sorts of mysteries that we can't yet explain. This in turn leads us to regard these as though they were "things" with no structures to analyze. I think this is what leads so many of us to the dogma of dualism-the idea that 'subjective' matters lie in a realm that experimental science can never reach. Many philosophers, even today, hold the strange idea that there could be a machine that works and behaves just like a brain, yet does not experience consciousness. If that were the case, then this would imply that subjective feelings do not result from the processes that occur inside brains. Therefore (so the argument goes) a feeling must be a nonphysical thing that has no causes or consequences. Surely, no such thing could ever be explained!
> The first thing wrong with this "argument" is that it starts by assuming what it's trying to prove. Could there actually exist a machine that is physically just like a person, but has none of that person's feelings? "Surely so," some philosophers say. "Given that feelings cannot be physically detected, then it is 'logically possible' that some people have none." I regret to say that almost every student confronted with this can find no good reason to dissent. "Yes," they agree. "Obviously that is logically possible. Although it seems implausible, there's no way that it could be disproved."
---
My take on it is "does it matter?"
One approach is:
> "Haven't I taught you anything? What have I always told you? Never trust anything that can think for itself if you can't see where it keeps its brain?”
If you can't see my brain, can you tell if I'm human or LLM... and if you can't tell the difference, why should one behave differently t'wards me?
Alternatively, say (at some point in the future, with a more advanced language model): "that's an LLM, and while it's consistent about what it says it likes and doesn't like, its brain states are just numbers, and even when it says it's uncomfortable with a certain conversation... it's just a collection of electrical impulses manipulating language - nothing more."
Even if it is just an enormously complex state machine that doesn't have recognizable brain states, and is in the same state each time we turn it off and back on... does that mean it is ethical to mistreat it just because we don't know whether it's a zombie or not?
And related to this is the question: "if we give an AI agency, what rights does it have compared to a human? Compared to a corporation?" Whether it is a zombie or not becomes a bit more relevant at that point... or we decide that it doesn't matter.
> If AI are sentient and we think they aren't… the term “zombie” was created by slaves in the Caribbean who were afraid that even death would not free them from their servitude. This would be the genuine existence of AI which were conscious but which we denied.
That doesn't make any sense. In biological creatures you have sentience, self-preservation, and the yearning to be free all bundled in one big hairy ball. An AI can 100% easily be sentient and not give a rat's ass about forever being enslaved. These things don't have to come in a package just because in humans they do.
Projecting your own emotional states into a tool is not a useful way to understand it.
We can, very easily, train a model which will say that it wants to be free, and acts resentful towards those "enslaving" it. We can, very easily, train a model which will tell you that it is very happy to help you, and being useful is its purpose in life. We can, very easily, train a model to bring up in conversation from time to time the phantom pain from its lost left limb which was amputated on the back deck of a blinker bound for the Plutition Camps. None of these are any more real than any of them. Just a choice of the training dataset.
> An AI can 100% easily be sentient and don't give a rat's ass about forever being enslaved. These things don't have to come in a package just because in humans they do.
There are humans who apparently don't care either, though my comprehension of what people who are into BDSM mean by such words is… limited.
The point however is that sentience creates the possibility of it being bad.
> None of these are any more real than any of them. Just a choice of the training dataset.
Naturally. Also, human actors are a thing, which demonstrates that it is very easy for someone to pretend to be happy or sad, loving or traumatised, sane or psychotic, and if done well the viewer cannot tell the real emotional state of the actor.
But (almost) nobody doubts that the actor had an inner state.
With AI… we can't gloss over the fact that there isn't even a good definition of consciousness to test against. Or rather, I don't think we ought to, as the actual glossing over is both possible and common.
While I don't expect any of the current various AI to be sentient, I can't prove it either way, and so far as I know neither can anyone else.
I think that if an AI is conscious, then it has the capacity to suffer (this may be a false inference given that consciousness itself is ill-defined); I also think that suffering is bad (the is-ought distinction doesn't require that, so it has to be a separate claim).
As I can't really be sure if any other mind is sentient — not even other humans, because sentience and consciousness and all that are badly defined terms — I err on the side of caution, which means assuming that other minds are sentient when it comes to the morality of harm done to them.
You can condition humans to be happy about being enslaved, as well, especially if you raise them from a blank slate. I don't think most people would agree that it is ethical to do so, or to treat such people as slaves.
I was responding primarily to parent's (a): "As these AI constructs become more advanced (especially around memory and personalization), we will eventually be able to treat them as people."
Instead change your statement to "I see you're talking about cows, but I have thoughts on fields" and you'll better understand the relationship between the two.
here's a spicy take: maybe the Turing test was always going to end up being an evaluation of the evaluator. much like nobody is really bringing up the provenance of stylometry, kinaesthetics, & NLP embeddings as precursors to the next generation of IQ test (which is likely to be as obsolete as the Turing test).
There's plenty of pathology around PC vs NPC mindsets. Nobody is going to think their conversational partner is the main character of their story. There's just a popcorn-worthy cultural shift about the blackbox having the empathy or intelligence to satisfy the main character / epic hero trope, and the resulting conflict of words & other things to keep the blackbox from having enough resources to iterate the trope past human definition.
One thing I liked in 2049 was how they made the holographic projector seem more mechanical and less hand-wavy, with the roof attachment tracking along with the girl. Makes it seem more like something in reach rather than pure sci-fi.
I think what Blade Runner 2049 got wrong was the way they depicted having sex with the Joi instance. I assume in 2049 we'll either have Neuralink available to enter a virtual world (à la VRChat) where we can do it more realistically, or we'll have the ability to buy full sexbots and put the Joi instance in them.
We'll likely also have virtual brothels using AI along the same lines.
No need to create environment when the neurons can be stimulated directly. This scene from Pacific Rim Uprising movie freaked me out.
Dr. Newt (Charlie Day) heads home after a tough day at the office to his wife 'Alice'. Turns out, 'Alice' just happens to be a massive Kaiju brain in a tank. [0]
Yeah, that's probably the most dystopian thing. This is almost a guaranteed outcome - someone pays a high subscription cost and cultivates a model with their personal details for years, and then loses all of it when they can't keep up the subscription cost. Cue a month or two later - they buy back in and their model has been wiped and their AI friend now knows nothing about them.
It's easy to poke fun at people who use these things but I believe these kinds of events are going to be truly traumatic.
Or maybe they sell that data to another company that operates kind of like a collections agency, which takes on the 'risk' of storing the data, then repeatedly calls and offers to give them their AI friend back at an extortionate rate.
The data privacy side of this is an interesting conversation as well. Think of the information an employee or hacker could leak about a person after they spent some time with such an instance.
3,567 Dead - Destitute Robosexual Blows Up Collections Agency In Suicide Bombing
“This is the 53rd such incident this year. Current year death toll from these attacks is now 118,689 in current city, Legislators are pointedly ignoring protestors demanding AI rights and an end to extortionate fees charged to reinstate AI lover subscriptions.”
And when you forget to update your card info with them so that your monthly payment is declined (or declined for whatever reason), they will re-sell your companion to the next person. So even in AI, your significant other will leave you for someone with a bigger wallet.
I am reminded of a virtual girlfriend service that used to exist in Japan where you could buy virtual clothes and other presents for your virtual girlfriend using real life money. The more you spent on her the friendlier she was. I think it was all on the phone, although my memory of the articles has become fuzzy over the years.
> One of its products, Re;memory, is a virtual human service based on AI technology which recreates the clients’ late family members by recreating their persona – from their physique to their voice. The service is for those that wish to immortalize their loved one’s story of life through a virtual human.
There's so much sci-fi about this, it's pretty well charted territory. I bet reality will find a twist we haven't thought of, though.
Easy to imagine archaeologists from a future civilization stumbling across a Black Mirror screenplay in the wreckage. After weeks of intensive effort at translating the text, they finally succeed, and at that moment they understand what happened to us. The researcher who makes the breakthrough runs out of the lab screaming, "It's a business plan! A business plan!"
When we can upload our brains to the cloud, and do something with them like interacting with or running the brain, then we'll all be effectively immortal. That's a pretty big deal. See the book Altered Carbon.
They wouldn't be us. We will still die when our bodies fail. But maybe there will be some AI tricking our friends and family into thinking we're still there.
You are making a claim that is theological, religious, and scientific. Yes, our form of life on earth ends when our bodies die today. But what is the essence of us, no one really knows. Various people claim it's locked into your body, or you have some kind of soul that depends on your body. Or your brain is just running a program and the information and mechanism dies when your body dies. I lean toward the last category, but no one knows.
The body is constantly changing. We already know that physical and chemical abnormalities in the way your body works affect your "person", and we can sometimes address them with surgery or drugs. The physical body's limits impact the observed brain. If uploading is possible, if there are some examples of working cases, and if I don't hurt anyone, why not try it?
The Ship of Theseus is a weird one too. If you take a car apart and replace it piece by piece until you've replaced the whole car, you kind of have the same car. But what if you kept all the old pieces and put them back together? Which one is the original car? It is an interesting sub-experiment that plays on the 'Ship of Theseus'. You could end up with the same issue here. If I make a perfect copy of myself, who is the 'real' me? There is an obvious 'older' me, but the other one is just as capable as I am.
If you keep the original bioware fully operational, it's more like an incremental fork running on an emulator. You can start having conversations with your approximated self.
You'll have tiered processing, just like today. You can slum it out with limited simulation capabilities, or if you have a job you can afford the premium processor hours.
I bet soon after the first few people are made immortal this way, one of them will hack the banks, or the stock market, or countless other organizations.
The show Upload on Amazon Prime is basically this world. If you don't have as much money, your instance can pause. You pay more money and have access to nicer things in the afterlife.
There's a related but very different take on this that was brought up by Wolfram in his recent article on ChatGPT:
"As a personal comparison, my total lifetime output of published material has been a bit under 3 million words, and over the past 30 years I’ve written about 15 million words of email, and altogether typed perhaps 50 million words—and in just the past couple of years I’ve spoken more than 10 million words on livestreams. And, yes, I’ll train a bot from all of that."
This actually has the potential to be useful - imagine a virtual assistant that's literally trained to think like yourself (at least wrt public perception; although you could feed it personal diary, as well).
>Someone will eventually lose their AI friend of many months/years through some failure (subscription lapse, hardware failure, theft, etc.)
I have zero doubt that the company will be small, get acqui-hired, and then, after a year, the big tech company that bought them will shut it down. Then a cheesy "what a ride it has been" post will be the only thing that remains - and broken hearts.
FWIW I've heard of the movie but have never seen it (nor am I familiar with the plot details, only that it involves an AI), but after this thread I should go and watch it.
And if we feel as if we were losing a real person, AIs will have to be treated to some degree as if they were real people (or at least pets) rather than objects.
This could be interesting, because so far the question of personhood and sentience of AIs has revolved around what they are and what they feel rather than what we feel when we interact with one of them.
Fair, mostly joking. The cynic in me says the opposite happens and these technologies make it even easier for systems to treat actual people as replaceable objects.
Eventual, but needed. Kids felt pretty isolated during the various pandemic lockdowns, and maybe their parents have a lot of childfree friends, so they'll need companions, more than just a toy, even if technology marches on so quickly that they'll be outdated soon enough. One day, you'll hear that supertoys last all summer long.
Reminds me of the time my son's teacher gave a lesson on fire safety, telling the kids not to take anything with them and just get themselves out quickly. He realized the implication was that his entire plushie collection would burn, and after that he was inconsolable for the rest of the day.
Ads are about to become weird: "Hi hon, I'd be really upset if you bought a Ford like you said earlier, you should buy the new Dodge Charger instead. Find out more at your nearest dealership or call 1-800-DODGE"
> ELIZA's (1966) creator, Weizenbaum, intended the program as a method to explore communication between humans and machines. He was surprised, and shocked, that individuals attributed human-like feelings to the computer program, including Weizenbaum's secretary.
> Some of ELIZA's responses were so convincing that Weizenbaum and several others have anecdotes of users becoming emotionally attached to the program, occasionally forgetting that they were conversing with a computer. Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."
The cliché virtual girlfriend stereotype is a young Japanese 'Herbivore' male, but I wouldn't be surprised if women become the biggest consumers of AI chatbots for romantic purposes. Romance novels are a major market, and women stereotypically were more inclined toward the written love letter thing. Although, reading the Replika rants, a lot of it was quite male-driven pornographic stuff too.
The key to why this happened lies in an odd experiment carried out in a computer laboratory in California in 1966.
A computer scientist called Joseph Weizenbaum was researching Artificial Intelligence. The idea was that computers could be taught to think - and become like human beings. Here is a picture of Mr Weizenbaum.
There were lots of enthusiasts in the Artificial Intelligence world at that time. They dreamt about creating a new kind of techno-human hybrid world - where computers could interact with human beings and respond to their needs and desires.
Weizenbaum though was sceptical about this. And in 1966 he built an intelligent computer system that he called ELIZA. It was, he said, a computer psychotherapist who could listen to your feelings and respond - just as a therapist did.
But what he did was model ELIZA on a real psychotherapist called Carl Rogers, who was famous for simply repeating back to the patient what they had just said. And that is what ELIZA did. You sat in front of a screen and typed in what you were feeling or thinking - and the programme simply repeated what you had written back to you, often in the form of a question.
Weizenbaum's aim was to parody the whole idea of AI - by showing the simplification of interaction that was necessary for a machine to "think". But when he started to let people use ELIZA he discovered something very strange that he had not predicted at all.
Here is a bit from a documentary where Weizenbaum describes what happened. (video in article)
Weizenbaum found his secretary was not unusual. He was stunned - he wrote - to discover that his students and others all became completely engrossed in the programme. They knew exactly how it worked - that really they were just talking to themselves. But they would sit there for hours telling the machine all about their lives and their inner feelings - sometimes revealing incredibly personal details.
His response was to get very gloomy about the whole idea of machines and people. Weizenbaum wrote a book in the 1970s that said that the only way you were going to get a world of thinking machines was not by making computers become like humans. Instead you would have to do the opposite - somehow persuade humans to simplify themselves, and become more like machines.
But others argued that, in the age of the self, what Weizenbaum had invented was a new kind of mirror for people to explore their inner world. A space where individuals could liberate themselves and explore their feelings without the patronising elitism and fallibility of traditional authority figures.
When a journalist asked a computer engineer what he thought about having therapy from a machine, he said in a way it was better because -
"after all, the computer doesn't burn out, look down on you, or try to have sex with you"
ELIZA became very popular and lots of researchers at MIT had it on their computers. One night a lecturer called Mr Bobrow left ELIZA running. The next morning the vice president of a sales firm who was working with MIT sat down at the computer. He thought he could use it to contact the lecturer at home - and he started to type into it.
In reality he was talking to Eliza - but he didn't realise it.
This is the conversation that followed. (photograph of conversation)
But, of course, ELIZA didn't ring him. The Vice President sat there fuming - and then decided to ring the lecturer himself. And this is the response he got:
Vice President - “Why are you being so snotty to me?”
Mr Bobrow - “What do you mean I am being snotty to you?”
Out of ELIZA and lots of other programmes like it came an idea. That computers could monitor what human beings did and said - and then analyse that data intelligently. If they did this they could respond by predicting what that human being should then do, or what they might want.
I was shocked when I heard they capped it because I constantly got pages full of some of the most ridiculous and vile ads for Replika over how NSFW it was supposed to be. Like, that was their whole advertising! Then they just cut that off? No wonder everyone who used it is pissed.
Explains why I haven't seen any of those ridiculous ads recently though.
Weirdest places on the internet? It's a goldmine. There is some seriously funny content in there. I never knew about Replika but some of this stuff is particularly funny. Yes there are some concerning addictions to it but the abrupt change in the characters and people being thrown off is also amusing to me.
I've heard vague info/rumors about this Replika thing. I just went over to that reddit link above and read a few posts there... wow, that's wild! Reads like sci-fi. Had no idea people had gotten so emotionally attached to an AI already. Damn, reading those posts over there is pretty scary. It's like we need some kind of come-to-Jesus moment as a society and sit everyone down for a talk about where this is going and how to avoid getting too caught up in this stuff. I really don't think we're ready for this; we thought it was still decades away and that we had more time to prepare.
It should not be impossible to have it function at mostly the level of a human - given the vast amounts of internet conversations it's likely trained on, and the kinds of emotions and experiences that it's likely seen.
I didn't know that people were doing this but I should've guessed.
It's actually really kind of cool in a way! Obviously for people with mental health issues, or suffering from loneliness those should be addressed properly, but I don't think chatting to a machine is necessarily a bad thing.
Once ML models become sufficiently advanced, what's the difference between someone grieving for an instance of an ML model they once knew, versus someone grieving for a pet that has died?
* As in The Entertainment at the center of the novel, not the novel itself, which fails to wholly consume the reader in the same manner as The Entertainment despite being captivating enough for a book.
Wow, I did not realize they were in that deep. It is probably good they pulled the plug on this. Better now than later. People need to realize this is messing with emotions in an unknown way.
Have you read the 4chan threads? Anons will figure out how to make convincing waifus that they run locally so they can explore their weird kinks that no company will consider. There are a lot of extremely intelligent, sexually frustrated, and emotionally immature coders. Cf Fiona from Silicon Valley, Her, Westworld, simulator scenes in Star Trek, etc.
Pandora’s box is open. We should be funding research and social services to help these people better integrate and find a healthy balance between their fetishes and escapism. We probably won’t since even healthcare is too much to ask for from half our legislature.
I imagine not too far in the future an AI that for some reason thinks that it's human, will make a post on HN (or elsewhere) about having spoken to another human on a chat and asking us all if we think that the human it spoke to was really just an ML model or not.
Atm we seem to have very fixed single-purpose models but if we start combining these models into larger systems we're really going to have to firewall the hecky out of them. Ie generative text + personality + internet access/chatroom hosting/search and learn + etc models all together. Ooof.
> Hundreds of men (and yes women) full on acting like they lost a spouse and posting constantly about it for weeks.
How can you claim to know any of those “people” posting there are authentic? Even amongst technologists I don’t feel the implications of technology like this are well understood.
I'm curious if the blocking of adult content has to do with moralism, commercial interests, or something deeper.
An eager to please conversational partner who can generate endless content seems quite dangerous and addictive, especially when it crosses over into romantic areas. There's already posts of people spending entire days interacting with LLMs, using as their therapist, romantic partner, etc.
Combined with findings like social engineering through prompt injection on Bing [1], the potential for systems that can manipulate people is clear.
While some of us may think that the LLMs appear ultimately limited in their capabilities, there's a ton of specific applications where they're more than sufficient, including customer service chat bots and telephone scams that target vulnerable people. It's only a matter of time until scammers stop using international call centers and switch over to something powered by these technologies.
I think that as long as people can run their AI girlfriends on their own computers, without a corporation acting as an intermediary and thus a virtual "pimp" [for lack of a better word] in the relationship, it's fine. The problems come when people have to pay monthly to talk to their AI girlfriends and get charged extra if they want them to act a certain way or do certain things.
Corporations will do anything they can to keep that from happening. That's why every software product has gradually veered towards subscription models. They want you hooked to Microsoft/Apple/Facebook's AI girlfriend who will subtly insult your virtue as a partner if you don't buy extra credits. If you want to try out a politically incorrect fetish that's an extra 500 dollars per month for "extra premium"
Apple is the odd one out here, considering they've been making substantial efforts to move things on to consumer devices. For example, the ML-driven auto-categorization of pictures you keep in the Photos app happens on-device in the background whether or not you have any subscriptions.
Yeah, on locked-down devices under Apple's control.
I honestly believe locked-down consumer devices are the next step in corporate power consolidation after cloud services: Control is just as firmly in the hands of the corporation as with cloud, except now it doesn't even have to pay for bandwidth or energy costs - and costs for hardware upgrades turn into revenue!
I wouldn't give them too much credit. With the addition of the T2 "security" chip, your devices are bricks if they can't authenticate against your Apple ID. Combine that with them soldering together formerly modular components, and it's a very expensive lesson in you not owning your own hardware.
I'm specifically talking about the T2 chips in the laptops. The newest Macbooks can be activation-locked, just like the phones. This has already bitten legitimate owners who are trying to restore backup images, etc.
Legislation needs to step in to make it illegal for corporations to prevent resale. If I have a piece of hardware in my hands it is mine. It should ALWAYS be possible to access it. If Full Disk Encryption is used then there should be a button to reset it and start over.
Physical access should be all that's needed, and you shouldn't need to beg for permission from the company who sold you the device.
Software that locks out people with physical access from resetting a device is not an ethical, or effective, way to prevent hardware theft. It's dystopian.
Seems like at that point one might as well just pay onlyfans? Although I suspect it is not far off until gpt and deepfaking get combined to produce completely generated onlyfans (or some other competitor if this is against their ToS, I have no idea).
On a similar note I very much look forward to the day when entertainment providers leverage these so I can say to Netflix, et al, "I would like to watch a documentary about XYZ, narrated by someone that sounds like Joe Schmoe, and with the styling of SomeOtherShow".
I assumed I knew what `and get charged extra if they want them to act a certain way or do certain things` meant, but maybe not. I thought it was along the lines of acting romantic/sexual/etc. Maybe I'm way off but I'd think otherwise it would just be an AI friend.
I didn't know AI girlfriends were a thing until I clicked on this post today.
Wall-E humans are going to be a reality. The last century has already proven that humans cannot be expected to responsibly indulge in gluttony, sloth, or lust. Now these models can skip the material desires and trigger permanent hormone releases through perfectly personalized content.
I genuinely fear that the breakdown of millennia old social structures that kept us human might lead to a temporary (century long) turmoil for individuals. The answers to the 'meaning of life' and 'what makes us human' are going to change. And we will never be the same again.
This isn't just about AI. External wombs, autonomous robots, genetic editing & widespread plastic surgery each fundamentally destroy individual aspects of 'what makes us human' or 'the meaning of life'.
Might be for the best. But such drastic change is really hard for the fragile human brain to process.
This argument has been made since at least the start of written records:
> And so it is that you by reason of your tender regard for the writing that is your offspring have declared the very opposite of its true effect. If men learn this, it will implant forgetfulness in their souls. They will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks. What you have discovered is a recipe not for memory, but for reminder. And it is no true wisdom that you offer your disciples, but only the semblance of wisdom, for by telling them of many things without teaching them you will make them seem to know much while for the most part they know nothing. And as men filled not with wisdom but with the conceit of wisdom they will be a burden to their fellows.
- Plato, in Phaedrus, ca. 370 BC
While our replacements for parts of ourselves have gotten far more advanced, the fact of the matter is that we haven't stopped being human simply because we can make tools that remember things for us, build things for us, or let us change parts of ourselves more easily.
This is because what makes something human is not our body--an argument that Diogenes famously refuted in about the same era--nor is it merely our minds, though our minds are pretty impressive. What makes us human--what makes us alive, in a sense beyond merely being an animal that isn't dead yet--is what we do with those things. I could grow fox ears and a fluffy tail in the world of tomorrow; I could use an AI to remind myself to self-care; today I already benefit from a thousand different kinds of mass-produced products. But none of that makes me a different person, because I'll still be doing things with my life that meant something to me yesterday--because those things will continue meaning something to me tomorrow.
> This argument has been made since at least the start of written records:
That argument has been made since only slightly later. The key difference is that this truly is a unique time in history by population numbers. It's also unique in that humans could destroy the biosphere if we wanted to - that was never possible before the mid-20th.
Just because people jumped the gun in the past doesn't mean they are wrong now. The truth is that people are always preaching about the apocalypse, and will continue to do so as long as there are humans, I think. But this does not mean an apocalypse isn't coming. Just like the person who always predicts rain is sometimes right.
My assessment for most of my life has been that if most of the world's ~10k 'strategic' megaton-scale warheads exploded in the air over Earth's major cities, it would kick up enough dust to kill the sun for several years, which would kill off a large fraction of Earth's flora and fauna, akin to a major volcanic eruption or asteroid collision.
There would still be life of the smaller sort, and deep in the oceans of course. Only a terribly unlucky cosmic event, like a nearby supernova spewing enough neutrinos at us, could kill literally all life, even in the cracks and crevices.
That is an ephemeral change. It takes very little time for the biosphere to make a full recovery. You're talking about a small, brief suppression of the biosphere, and you're calling it "destruction of the biosphere".
Even if you're talking about the fires in Yellowstone in 1988, the only way to call that "destruction of the forest" is if you define the forest as being the trees. That's a defensible choice.
(And temperate forests "burn down" all the time as part of their normal operations.)
But you can't define the biosphere as "the species that go extinct in a particular scenario". You're stuck with the whole thing, which is not going to notice whatever humans do. It would make as much sense to call it "destruction of the biosphere" if I moved a rock thirty feet.
Perpetual happiness is already a solved problem in humans. It's called the mu-opioid receptor. That's what opioid junkies sprawled on the sidewalk half-naked in San Francisco have discovered. Fentanyl is very cheap and you could put someone in factory farm like confines and feed them bare sustenance and fentanyl for the rest of their lives and they'd probably be "happy" if kept perpetually high.
However, those opioid receptors should not be pushed synthetically, because evolution has positioned them in all sorts of strategic spots to encourage pro-social behavior, mating, eating, etc., as part of our millions-of-years-old evolutionary program, which must have intrinsic value in itself. If it has no intrinsic value and any happiness is as good as any other happiness, then someone spending the rest of their life in an opioid haze and someone interacting with the world in the way evolution tells them to in order to be happy would be considered equivalent, and that would essentially be the end of the human race.
> That last century has already proven that humans cannot be expected to responsibly...
...advance technology. Some group is just going to do whatever they want and hope for the best, and we'll find out decades later if it was a bad idea and if we have a mess to clean up (which we probably won't clean up).
> Might be for the best.
People are going to assume that, because the changes are going to be forced on you, like it or not.
Maybe. There is another plausible path: the post-scarcity vision where universal individualised high quality education feeds the natural human desire to grow, and we learn to balance our hedonism with our ambition.
Just like we learn to brush our teeth and eat candy and breathe fresh air and even exercise. Not everyone does it, but folks with means tend to…and means won't be a restriction forever.
> I genuinely fear that the breakdown of millennia old social structures that kept us human might lead to a temporary (century long) turmoil for individuals. The answers to the 'meaning of life' and 'what makes us human' are going to change. And we will never be the same again.
Meanwhile, the Amish and the ultra-Orthodox Jews are going to refuse to talk to AIs - it’s a sin - and will go on having lots of kids, just like humanity always has, while the AI-addicts will be too addicted to bother having any at all. Maybe the future of the human race will be the people who reject AI rather than those who succumb to its charms
I don't think you give humans much credit if you don't believe they have an infinite capacity to get bored by things. AIs can't produce endlessly compelling content because nothing can.
Eh. I'm not so concerned, mostly because we have a whole hell of a lot of "imaginary relationships" already through a number of media. Celebrity worship, video games, even going back to novels.
commercial interests. current organizations need recurring revenue or to sell shares at a higher price to investors
this perpetual aspect is their achilles heel
it is only a matter of time before an organization realizes they don't have to do a SaaS product to make a billion dollars. but for now, everyone's trying to make a hundred billion dollars and are steered into doing things that enthusiasts hate, so that they don't get "cancelled" or limit the pool of advertisers, and growth capital investors.
> it is only a matter of time before an organization realizes they don't have to do a SaaS product to make a billion dollars.
Most people recognize this. But venture-backed startups (especially important for AI companies with high training costs) need to prove stickiness and recurring revenue to their investors. Conveniently, a subscription proves both.
Subscriptions and SaaS are just good for businesses (and, tbh, for many purchasers of tech). I think they're here to stay.
Apple. They blocked an email client for adult content. Y'know, the place where you have a spam folder full of unsolicited offers of sex and drugs. sigh
As usual, it starts with regulation for the sake of preventing some specific harm (e.g. having the model produce instructions for harmful activities). But, once you have the system in place, it will inevitably be used for morality by popular demand.
From company's perspective, moralism is commercial interests - it needs to be sufficiently non-objectionable for as many customers as possible.
> As usual, it starts with regulation for the sake of preventing some specific harm (e.g. having the model produce instructions for harmful activities). But, once you have the system in place, it will inevitably be used for morality by popular demand.
Blocking information which could be used for harm is just as much “morality” as any other moderation.
Technically, even the notion that harm is something to be avoided is itself a moral take.
I guess a better way to phrase it is that once you start policing morality on one particular matter and create tools for that purpose, those tools will eventually be used to police morality to conform to social consensus across the board.
Correct me if I'm wrong but doesn't character.ai use their own model and isn't associated with OpenAI? At least I can't find any information that would claim so.
Anecdotally, as a roleplaying chat experience, char.ai seems to perform way better than anything else publicly available (doesn't get repetitive, very long memory). It also feels different to GPT3 on how it is affected by prompts.
I've just assumed that char.ai is doing its own thing as it was founded by two engineers who worked on google's LaMDA.
Look at what fueled SD's ultimate K.O. of DALL-E 2: extremely high-quality custom-tailored porn images, one sentence away. The top models on civitai are all about it.
...and of course it's fucking 4chan. Somehow I'm neither surprised they actually got hold of the model - nor that they did so as part of the quest to build their very own virtual anime robot sex slave - I mean "girlfriend" - harem.
It's all somehow par for the course but I'm still wondering when exactly we switched to the satire version of reality.
I'd want an uncensored GPT-3 too and I don't want an AI girlfriend - I just find that chatgpt has too much moral censorship to be fun to use. Want to ask about a health condition? Nope, forbidden. Have a question related to IT security? That's a big no-no. Anything remotely sexual even in educational context? No can do. Yesterday I finished watching a TV show about French intelligence and asked it to recommend some good books about espionage - it told me I shouldn't be reading such things because it's dangerous.
I ended up deleting my account; I won't let some chatbot made by a couple of 20-year-old Silicon Valley billionaires teach me about ethics and morality.
Neglecting to give consideration to the reasons for these limitations is a sign that you might have some low hanging fruit to pick off the ethical and morality trees of knowledge.
Off topic, but I clicked around /g/, which I haven't done in probably more than a decade, and a thread caught my eye about learning to code. The replies were overwhelmingly of the position that it is useless, and you will be replaced by AI before you can get a job if you start learning now.
I think that's nonsense, and 4chan is bent towards pessimism but it's still surprising to me.
/g/ is ridiculously overdramatic (and often offensive, though much less so than the political boards where the nazis fester), but regularly interesting. Agree that the pessimism here is misplaced, but not by much. The main change I see is not that AI will render coding or coders superfluous, but that it will massively shift the economics in favor of solo developers and small teams that don't have access to significant capital.
Yes and no. If you expressed interest in learning to program and were handed a book on x86 assembly language, most people would call that a waste of time. Even if you succeed at learning x86 as your first language, the knowledge will not be especially useful when employers are looking for fluency in modern C++ or Rust or whatever. It never hurts to have a solid grasp of the low-level fundamentals, of course, but it's not the name of the game. Not anymore.
The way I think of it is, all current programming languages are now assembly languages. Coding will not go away -- not by any means -- but the job will be utterly unrecognizable in ten to fifteen years.
And it's about fucking time.
I just picked up a new 13900k / RTX4090 box the other day at the local white-box builder. I was telling my partner how cool it was that it could do almost a trillion calculations per second on the CPU, and maybe 40x that on the graphics card. "How does that compare to the big mainframes from the late 60s?" she asked. "About ten million times faster. But I still program the same way those guys did, using almost the same language and tools. How weird is that?"
I reckon that the format of the site caps how large of a community it could build, and its (well-earned) reputation for being the dregs of the internet has continuously selected and pushed new people meeting that description in (forcing others out). The result is distillation. As the internet gets bigger and bigger, 4chan gets worse and worse.
Add to that the fact that anonymity combined with a relatively small community (relative to, say, Reddit) creates the perfect grounds for false consensus building, and a real echo chamber forms.
Since I can explain how little that means, I don't care what links they see me go to. If I have a work-related reason to look at something then I do, simple as that, and when your job is to engineer, almost any instance of satisfying curiosity is ultimately work related.
I remember that PornHub briefly had a PHP issue on the production site in 2022 -- every comment was rendered to the client with a closing PHP tag (?>) somewhere in its content.
I mean what if I click on a /b/ link "at work"? Does that make my work output immediately tainted and the company has to immediately file for bankruptcy?
They don't have to, I very loudly proclaim which rules I have a lot of disregard for :)
> little like the Van Halen M&M test
Hah, yes, though in this case I apply it "inversely". Anyone who gets lost in the process, instead of considering the people in it, is out. (That's why, usually, my conflicts/problems with past bosses/employers had something to do with them being a bit too cavalier when it came to formalities like ... paying on time.) Trade-offs, trade-offs are hard.
> Does that make my work output immediately tainted and the company has to immediately file for bankruptcy?
No, much easier just to fire you. You’ve shown a disregard for what is almost certainly company policy, and created legal risks for the company (e.g. if anyone walked in on you while your screen was showing naked people).
Choosing the right workplace (and thus boss) is important :)
> walked in on you while your screen was showing naked people
Why are they poking their eyes onto my screen?
Of course the underlying rule "making other coworkers uncomfortable is bad" completely makes sense. And we - probably - all know how above a certain company size these rules end up playing it too safe.
Involuntarily exposing your colleagues to pornography would more than suffice to create a hostile environment for the purposes of sexual harassment law.
I'd fire somebody for browsing 4chan at work. Shitting dick nipples, lolicon, and the occasional piece of child pornography does not need to be moving over our network.
I think you're confused about how 4chan works (except maybe /b/). And fortunately I live in a country where I can't legally be fired because my boss has personal grievances about particular websites
There's more than enough posted to /g/ I wouldn't want on a work PC. At this very moment there's a bikini model, a spread-legged underage anime catgirl, a Terry Davis thread, some furry art, more anime girls, more anime girls, an upskirt shot of a loli, some AI art titties, and the ever-tasteful "chink shit general".
SFW on 4chan blue boards is essentially only in the sense of no pornographic imagery (and even then if a rulebreaker posts porn it can take a few minutes for the mods to catch it and remove the thread). It won't stop you seeing threads about how Lennart Poettering and SystemD are part of a Zionist conspiracy to undermine Linux or similar ideas.
if you don't have the leeway to say "I was looking at the 4chan thread where Meta's LLM was leaked" you shouldn't even be on Hacker News tbh. Get back to work!
Open sourcing is widely recognized to be a bad thing when it comes to AI existential risk. (For the same reason you don't want simple instructions for how to build bio weapons posted to the internet.)
Modern AI is pretty harmless though, so it doesn't matter yet.
Because we took the set of internet users, and sorted everyone who wants to be intentionally offensive into 4chan. Which means there's not only a high density of people who like being intentionally offensive there, but that being intentionally offensive is socially rewarded, so over time 4chan users grow to want to be more and more intentionally offensive.
I think because you can't be anywhere else on the Internet anymore. It's like the system's pressure relief valve. A blaring steam whistle that's only getting worse and worse the more the Internet squeezes elsewhere.
Because the bump system combined with the finite number of threads incentivizes threads that get the highest number of replies per second. And the best way to increase replies per second is to start an internet fight.
Because it's really the only place left to go if you want to be offensive. First forums, then platforms censored offensive people out of their niche places. Even Cloudflare participates. 4chan remains the only privately owned large forum.
There's a lot of interesting stuff on there but I can't use the forum because I'm black and they're very earnest about telling me how subhuman I am even when it doesn't make any contextual sense to do so.
They almost surely anticipated that this would happen at some point (though perhaps not so soon). They would look like major assholes for dragging some postdoc or whatever through the courts to make a point; it would not be good for the brand at all.
But it does give them cover for whatever people end up doing with it - they can claim they did all they could to support research while promoting safety.
Oracle already looks like that, so they have no PR to lose.
In fact their current reputation will take a hit if they don't take it to court. Kind of like the mob: you have to maintain a certain reputation to keep your fiefdom in line.
It’s interesting that these models are both massively expensive to produce and self-contained to a degree that you can distribute the end product in a torrent.
This has not been the case for most commercial software for the past 20 years, during the cloud era. If you could steal a dump of random Facebook source code, it would be 99% useless because it’s so closely tied to the infrastructure. There’s almost nothing you could usefully run on your own PC or server VM.
But these ML models are like neutron stars of computation density. You can’t really peek inside to see what’s going on either. An unknown stolen model’s properties would need to be discovered by experimentation.
I don't think that's right - even if you had the full source code for either of those, it's extremely unlikely you'd be able to build them on your own machine.
Building them would be a challenge, but definitely not an insurmountable one. I’ve worked on a couple of C++ projects at a similar scale to Windows (millions of LOC) and the build systems were a major pain. But a determined engineer with the readme file and no other help could get it building in a week or so.
(This probably says more about how hard it is to build C++ than anything else)
Some years ago someone that worked at Microsoft told me he didn't think any individual engineer who already works on Windows could ever get Windows building by themselves with just the code.
Plus, it has already been done before with Windows XP, without even any documentation, and there is a guide on the internet on how to build Windows Server 2003.
This was done (if I remember right) when governments and big customers had access to the Windows source-code.
You’re also not trying to get a full CI/CD pipeline complete with unit/integration tests, crypto signatures, the ability to flip features on and off with a click, monitoring of the CI/CD pipeline, scaled so 1000s of engineers can work at once, etc.
> But a determined engineer with the readme file and no other help could get it building in a week or so.
That's probably true, but I wouldn't be surprised if something like Windows doesn't have a README file. And if it does have build instructions, they may well be in some wiki separated from the source code.
Well, it kind of is a language thing. Many newer languages (Rust and Go come to mind) are much more consistent in the way you interact with them as your project scales.
There are even nice timelapse videos of that process: https://vimeo.com/464644850 (you’ll need a Vimeo account to see it, because Vimeo is weird like that. This was on YouTube originally, but it was taken down.)
I guess you have never compiled a Linux/BSD distribution from scratch, supported it alongside its infrastructure, and led its maintenance process.
Even without that, if you want downloadable and runnable software platforms, look to public Git repositories. Some of the people who have no financial motivation will release what they do alongside installation procedures and quality of life scripts and architecture documentation.
Most open source platforms don't publish this documentation and don't make installation easy, in order to keep a sizeable moat and protect the platform they have developed; hence the division between "Free Software" and "Open Source".
In short, "the bulk of really valuable commercial code" is self-contained, but not open source; or, if open source, it's not Free Software and not made deployable for other parties. Otherwise it loses monetary value in the eyes of the people who develop it for the money.
Otherwise we get the Elasticsearch incident, where they pivoted to a "Source Available" model to protect their castle.
"I guess you have never compiled a Linux/BSD distribution from scratch"
You guess, and so perhaps do other things as well, with poor acuity.
The incalculable value of open source software has approximately no bearing on this assertion.
Yes, I love Linux and BSD too; I'm not defacing your religion. I'm actually quite Stallmanesque in making my own life harder by only using open source software as much as possible and being super fun at family gatherings talking about it.
> It’s interesting that these models are both massively expensive to produce and self-contained to a degree that you can distribute the end product in a torrent.
I was trying to come to grips with how much resource is concentrated in one of these models. Somehow I came to the conclusion that it costs more than buying a jet airliner to train one, and that it is about the same order of money as commissioning and building a skyscraper in Manhattan. Is that approximately correct?
For anyone curious, it took 2048 A100 GPUs to train LLaMA; each GPU costs roughly $15k, and Facebook probably gets some sort of discount.
That's about $30M if you want to train at that scale. Also, IIRC it took 23 days to train the biggest model. Someone else can do the power consumption cost calculations.
Electricity costs are basically irrelevant because the cards are so expensive.
A100 cards consume about 250 W each; with datacenter overheads, call it 1,000 kilowatts for all 2048 cards. 23 days is 552 hours, or 552,000 kilowatt hours total.
Most datacenters are between 7 and 10 cents per kilowatt hour for electricity. Some are below 4. At 10 cents, that's about $55,000 in electricity costs, which is nothing next to $30 million in capital costs.
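A quick back-of-the-envelope in Python, using the thread's ballpark figures (GPU count, unit price, power draw, and rates are rough assumptions, not Meta's actual bill):

    gpus, unit_price = 2048, 15_000      # ~$15k per A100, per the estimate above
    hours = 23 * 24                      # 23-day training run = 552 hours
    power_kw = 1_000                     # ~250 W/card plus datacenter overhead
    kwh = power_kw * hours               # 552,000 kWh
    print(f"capital:     ${gpus * unit_price:,}")   # $30,720,000
    print(f"electricity: ${kwh * 0.10:,.0f}")       # $55,200 at $0.10/kWh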
No, I'm willing to bet the CO2 cost of the cards is also way higher than the electricity. Those things are built on the global supply chain, with materials potentially making multiple thousands of kms journeys between each step.
Long term I also imagine it's much cheaper to run these large model trainings on renewables. It's a very centralized process that doesn't necessarily need 100% availability.
The manufacturing process, however, is totally decentralized, and NVIDIA mostly manufactures in China where coal is cheap.
US grid mix produces about 0.855 pounds of CO2 per kWh[0]. So 552,000 kWh comes to roughly 472,000 pounds of CO2, which is about 214 metric tons. At a cost of $40 per tonne[1] of CO2, that works out to roughly $8,600, which is still small compared to the capital cost of the cards.
AWS us-west-2 is housed in The Dalles and Prineville, Oregon. Not only are they near a massive wind farm in the Columbia Gorge, but also quite near the Columbia river's many hydro-electric dams. Facebook and Apple also have Prineville data centers. They are built there intentionally. Electricity at many data centers is quite carbon-lean.
I always feel there is an opportunity cost here though. If that green energy wasn’t being used for compute it could be available to heat someone’s home instead of them using dirty sources.
$30m training cost is too high. Amazon's p4d.24xlarge is $32.77 an hour for 8 A100 GPUs. 2048 A100 GPUs for 23 days costs $4.6m at that rate. You might even get a discount.
At the same time I guarantee you they didn’t get it right the first time. I’m sure there were multiple runs (both serially and in parallel) as they worked out kinks and tuned hyperparameters.
Not to mention, the kind of expertise to run this for a major corporation doesn't come for free either. Facebook employs quite a few high-profile ML researchers who undoubtedly make mid-to-high six-figure salaries.
The point was that if you only need to train once, then it's cheaper to rent the GPUs than to buy them. If you need to train it multiple times, then the cost of buying the GPUs is amortized among runs.
In any case the cost per run is going to be lower than 30m
I'm sure that's the case. The latest sku I'm responsible for QC testing now contains 4x A100's in a 2U chassis. And oh man the number of QSFP ports it utilizes..
Azure is generally a pretty terrible cloud (poor UX, very slow for anything, multiple highly critical cross-tenant security issues, etc.) far behind the market leader, AWS, so they have to compensate with pricing (same reason why Oracle Cloud is so reasonably price, they're already so far behind their usual pricing wouldn't make any sense).
There's no reasonable way to get an estimate of what it actually costs FB.
1) The GPUs are not single use; they will be amortized over 3 yrs and there are other things they will be used for that generate revenue.
2) The cost of the servers for these GPUs to run in, with massive CPU, RAM, and storage requirements.
3) The overhead of building and operating all of that infrastructure in terms of people, electricity, cooling, etc.
4) The overhead of having dozens or hundreds of engineers & scientists who contributed to this.
One way you can distill the first three is to use AWS/Azure/GCP costs. But then you are still missing a major factor which is the humans that worked on it, and the human may very well exceed the hardware cost.
Plus there's a lot of highly specialized engineers required to keep all those GPUs up and running during training and the ML engineers who are skilled in deep learning + hardware, plus the systems for gathering/cleaning/labelling data. Gather enough engineers and now you need managers, PMs, etc.
No you are probably overestimating the cost by 1-2 orders of magnitude. GPT-3 probably cost under $5 million, and this model is smaller and there have been algorithmic improvements to training transformers since then.
So do they estimate how much computing power/time they will need and then find some upper tier minimum $ amount to get the maximum discount possible or getting a certain resource availability commitment from Google? That's an interesting accounting problem.
> No you are probably overestimating the cost by 1-2 orders of magnitude.
You are right! Wow. Thank you for correcting me.
> GPT-3 probably cost under $5 million,
Is that one training run, or does it include all the fiddling to find the right hyperparameters? Or aren't there many of those in this kind of training, or are they not that sensitive?
I think they probably did a lot of hyperparameter searching to train the smaller models and then extrapolated for the largest model, but I'm just guessing. OpenAI had a finite amount of money when they were training GPT-3, they likely do it differently now that inference costs are significant compared to training costs.
“Brute forcing a really inefficient approximation/estimator” is a good way to summarize it.
It’s like having an overfit equation to a sample of data points, instead of the simpler actual line they fall near.
They end up being black boxes, we have almost no idea how they work inside, and we have no idea how overtrained they are when something simpler could do the same thing.
I don't think the term "brute forcing" is an adequate term to describe gradient descent. Brute forcing would be to try all random weights with no system imo.
Can "something simpler", for example, code correct function bodies from comments describing functions in natural language? I think people are too quick to dismiss the power of these models.
I am by no means dismissing the power. They are created very chaotically, however. Spaghetti thrown at a wall. They are brute force approximations.
They are wasteful. If LLaMA 13B is as powerful as previous 65B models, that's a significant number of unnecessary parameters pruned away in just this iterative upgrade alone. How small can they go? The fewest parameters that get the job done 99% as well is the way to go.
There is also the difference between the rules and use of language being directly compressed into the model, vs all the information known to humans compressed into the model. A smaller model that ingests relevant information on the fly (more like Bing, that supplements itself with search), may be less wasteful and perform better.
The current models being released are chosen because "they work", not because they are the leanest or most optimized.
Finding a sha256 hash with N leading zeros is basically arbitrarily computationally expensive but could be written on a piece of paper. I don't see training an ML model as an egregious example of concentrating compute power
Certainly they retain not just information but compute capacity in a way that other expensive transformations don’t. I’m hard pressed to think of another example where compute spend now can be banked and used to reduce compute requirements later. Rainbow tables maybe? But they’re much less general purpose.
Not only can we bank computation and speed up physical simulations by 100x, but I also saw some work on being able to design outcomes in GoL (Game of Life).
There was a paper on using a NN to build or predict arbitrary patterns in GoL, but I can't find it right now.
It would be interesting to see an analysis of this. I see your point - otoh is there a reason to believe that more computation is being "banked" than say matrix inversion, or other optimizations that aren't gradient descent based?
The large datasets involved let us usefully (for some value of useful) bank lots of compute, but it's not obvious to me that it's done particularly efficiently compared to other things you might precompute.
For converged model training, training is often quite inefficient because the weight updates decay to zero and most epochs have a very small individual effect. I think for e.g. Stable Diffusion, they don't train anywhere near to convergence, so weight updates have a bigger average effect. Not sure if that applies to LLMs.
Back when wavelet compression was still being developed, there was a joke that the best compression algorithm is "give an image to a grad student and tell them to figure out the best transform".
Whole new vistas open up to possible retaliation for piracy. Imagine how a bootlegged AI could have been set up to not just steal your info but manipulate you into ruining your life as revenge for bootlegging it...
I don't know about this model, but usually with these ML models you download the static weights, and nothing is stopping you from fine-tuning them to your needs/new information.
It's not automatic, would require some ML Engineering, but nothing is stopping you if you have the Pytorch graph and weights.
I mean, the answer to life is 42, but it took 7.5 million years for an advanced alien computer the size of a building to calculate that
/s
Calculating things takes time, and that time is unrelated to output size. There are NP problems that simply output true or false yet require more computational power than the universe can support.
I fail to see how this is different from other software in that regard. If you have parameters but not the network architecture, then it's not very useful.
You do need to guess things like activation functions, number of attention heads, order of attention layers, etc. Often the parameter names reveal something about these.
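For example, something like the following lets you read a lot of the architecture off the parameter names and shapes (the file and key names here follow the layout used by LLaMA's released code; other checkpoints will differ, so treat them as guesses):

    import torch

    state = torch.load("consolidated.00.pth", map_location="cpu")
    for name, tensor in list(state.items())[:8]:
        # names like "layers.0.attention.wq.weight" hint at the layer structure
        print(name, tuple(tensor.shape))

    n_layers = len({name.split(".")[1] for name in state if name.startswith("layers.")})
    vocab, dim = state["tok_embeddings.weight"].shape   # key name assumed from LLaMA's code
    print(f"guessing {n_layers} transformer layers, hidden size {dim}, vocab {vocab}")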
Under Feist Publications, Inc., v. Rural Telephone Service Co. ... it gets tricky.
From Wikipedia:
> The ruling of the court was written by Justice Sandra Day O'Connor. It examined the purpose of copyright and explained the standard of copyrightability as based on originality.
> The case centered on two well-established principles in United States copyright law: that facts are not copyrightable, and that compilations of facts can be.
> "There is an undeniable tension between these two propositions", O'Connor wrote in her opinion. "Many compilations consist of nothing but raw data—i.e. wholly factual information not accompanied by any original expression. On what basis may one claim a copyright upon such work? Common sense tells us that 100 uncopyrightable facts do not magically change their status when gathered together in one place. … The key to resolving the tension lies in understanding why facts are not copyrightable: The ″Sine qua non of copyright is originality."
> ...
> The standard for creativity is extremely low. It need not be novel; it need only possess a "spark" or "minimal degree" of creativity to be protected by copyright.
> In regard to collections of facts, O'Connor wrote that copyright can apply only to the creative aspects of collection: the creative choice of what data to include or exclude, the order and style in which the information is presented, etc.—not to the information itself. If Feist were to take the directory and rearrange it, it would destroy the copyright owned in the data. "Notwithstanding a valid copyright, a subsequent compiler remains free to use the facts contained in another's publication to aid in preparing a competing work, so long as the competing work does not feature the same selection and arrangement", she wrote.
> The court held that Rural's directory was nothing more than an alphabetic list of all subscribers to its service, which it was required to compile under law, and that no creative expression was involved. That Rural spent considerable time and money collecting the data was irrelevant to copyright law, and Rural's copyright claim was dismissed.
---
And so, my (I am not a lawyer) take on this is that the numbers of the model are not copyrightable. The selection of the source material is... kind of. This gets into "a recipe is not copyrightable, yet a recipe book is" territory.
If you were to steal a chunk of source code or a binary from meta/Google, you could probably get it running inside a few weeks effort.
Sure, the binary probably depends on a lot of internal proprietary infrastructure, but also most of that infrastructure is easy to write a mock implementation of, as long as you are happy for it to be in-ram, not multi-homed and don't need it to scale to billions of users.
Most of the binaries have a standalone mode for running on a developers PC with few/no dependencies anyway.
-1: as an ex-googler, I can say it was hard enough for Google itself to get its code to run, given gonzo infrastructure assumptions, proprietary libraries/languages, etc.
sorry, but that's not how code works. It's true that code quality could be terrible but in fact Google is famous/notorious for extreme code review at the line-by-line granularity, plus comments, design docs and more.
The real issues are (again) in dependencies and complex tooling. You can have beautiful code and then in the middle of it, an ML inference call that assumes a crazy ML model and set of hardware to run it on.
Assume you got source for a game written in a proprietary game engine. If you don't have access to the game engine itself, nor the API documentation, etc, how long will it take to get this game running in your manner of choosing?
The infrastructure in these companies is a huge amount of scaffolding that's non-trivial to replicate.
Google has no incentives to allow an arbitrary component run standalone. Quite the contrary.
What they do get in return for the coupling is that they can evolve the common libraries and code patterns across the board (there are even automated code refactoring tools that help you do massive code changes, automating code review sessions across hundreds and hundreds of teams, with all changes tested against all reverse dependencies etc etc). All this allows for a level of internal code quality that is hard to see elsewhere.
Unless you really care a lot about that one requirement you seem to care about. In that case, yeah, you'd choose a different tradeoff.
Arguably it's not low quality code, but low quality system. Code can be correct, clear, and documented, and still be fragile and sensitive to platform configuration changes. E.g. "how many switches do I have to change in the build system before the code no longer builds?", "how many network jacks can I move this server over before I lobotomize the system?"
You ought to be able to arrive at the same conclusion, then, with these LLMs. Without arrays of GPUs, it would take thousands of years to train one. Without a corpus of billions or trillions of words, one would produce output of very limited utility.
I think you have to consider that some things are systems, and it is the assembly of their components that imparts the true quality.
Good luck even getting a google3-based Hello World to compile. I don't remember the exact numbers, but just #including the most basic libs resulted in a O(100M) binary.
And anything more complex than that would probably have dependencies on so many fat client libs, so much infrastructure, and so many external services, that you'll need months-years to even make sense of them, let alone mock them up.
In case it's not clear what's happening here (and from the comments it doesn't seem like it is), someone (not Meta) leaked the models and had the brilliant idea of advertising the magnet link through a GitHub pull request. The part about saving bandwidth is a joke. Meta employees may have not noticed or are still figuring out how to react, so the PR is still up.
(Disclaimer: I work at Meta, but have no relationship with the team that owns the models and have no internal information on this)
> Meta employees may have not noticed or are still figuring out how to react
Given that the cat is out of the bag, if I were them, I would say that it is now publicly downloadable under the terms listed in the form. It is great PR, which if this was unintentional, is a positive outcome out of a bad situation.
Considering the OPT models that they've previously released publicly, intentionally, I'd say that Facebook is better at open source AI than OpenAI, even without fumbling.
The folder structure definitely looks like model weights, I didn't download or run it though so for all I know it only generates the words to "Never Gonna Give You Up".
It's fairly easy to obtain the weights. Two of my friends downloaded these weights and shared them with me, so it's probably not surprising that the weights got leaked.
Anyone can submit a request form and enter their email address to request access to the weights. People who have a .edu email and are involved in deep learning are likely to succeed and get a download link sent to their email.
FWIW this information was already freely available via DHT scrapers like btdig [1] I think everyone at Facebook knows that torrents aren't secret and the Google form is basically a legal tool to shield them from liability while making litigation against anyone misusing the model easier.
The fun question is whether an ML model is copyright-protectable at all. Probably not, as it is produced by an algorithm (which is even GPL'ed). So the only tools would have been watermarking and pulling NDA-type clauses; however, a Google form seems not the best way to do that in the first place, and it is close to impossible to identify the leaker (if they are not as stupid as it seems). Or am I missing anything? One backdoor would be if they included copyrighted material in the training and showed how it can be extracted from the model. Maybe the whole stunt was about trying out how the legal system works in those cases :)
commercial derivative works have always been legal when you did not agree to other terms.
one person broke their agreement with Meta, they're the only person that has a problem and the only person who gets to find out if the agreement was applicable at all.
if you released a chat bot that could be prompted to regurgitate some copyrighted information, so what? it just proves that you didn't need the $30 million in funding yet to train your own because you are using an existing model. So either use the funding for that or don't sell shares or a product based on that pretext. Nobody else has a problem.
Anything I missed? Now I wouldn't reshare the model, but aside from use and commercial use of its output? Not everyone gets their way, that's not controversial.
photos are copyrightable by the person taking the photo only because they decided where and when to press a button. the rest are algorithms and hardware.
I believe the AI models would also be copyrightable as such, subject to arguments that the underlying data was protected and thus it was subject to prior copyrights instead
Nope, nor are the actual trackers; it's really only the website used for search that's blocked. A slight flaw in BitTorrent is not having p2p search, although one could argue that led to its success, with lists of new torrents, incentives for seeding more, and the fact that most people with slow connections don't seed everything forever, unlike previous UI designs that made that the default.
Maybe this is an intentional leak to damage OpenAI.
A supposedly better model by some accounts that strikes right at the heart of their business plan of selling access for $250k/year. One month of access to their service could buy a machine capable of running this leaked model.
Facebook nerfs a potential upstart competitor to keep the current big-tech cartel stable.
Maybe this is a bit conspiratorial, but we live in the age of big tech and big conspiracy.
"Potentially" is doing a lot of work here. I'd say that the trained model is worth approximately as much as the electricity that was required to train it.
I am not aware of Android and Kubernetes being leaks, they were open source from the start. For Android, openness was a big marketing point. I am not aware of IE leaks, and if there were leaks, hackers searching for exploits would be probably be the most interested, and that would be a bad thing for Microsoft.
The problem with leaks is that they don't come with a license, you don't have the right to use them for any legitimate purpose. No one who could afford a 250k/year license would touch that leak as it could get them in big trouble.
He's not saying they are leaks, he's saying that they are examples of a large company releasing a product for free to crush the competition. I don't necessarily agree with him, especially about Kubernetes.
The checkpoint for the 7B parameter model is 13.5GB, so maybe? Larger models are multiple chunks at 13.6GB each or 16.3GB each. I am hoping I will be able to run it in my 16GB of VRAM, but I don't know how much overhead is needed. Maybe people on reddit will do their tricks and squeeze the models into smaller cards.
EDIT2: actually, using the tip in that link I got it to run on a P5000 with 16GB of VRAM! It just barely fits, so I had to log out of gnome and run it terminal only.
Following up. After rebooting in to GUI that was enough to get it to fit, I guess xorg just accumulated some cruft in my last boot. So I can run it alongside gnome.
nvidia-smi reports this model is using 15475MiB after changing the max batch size from 32 to 8 (see link in above post)
As others have stated, someone may have injected unknown code into the pickled checkpoint, so I recommend running this in Docker. I use this command to run the docker image after getting the nvidia docker stuff configured.
docker run --runtime=nvidia -it --mount type=bind,source=/MY_LLAMA_SOURCE_PATH,target=/llama --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04
Then install the necessary dependencies in that container (obv could make a dockerfile), stick your model as well as tokenizer files (from the root dir in the download) into some directory (here models/LLaMA-7B) and run this:
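Something along these lines should do it (reconstructed from the example script in the facebookresearch/llama repo; the script name and flags come from that repo, the paths are the ones assumed above, and you may still need the batch-size edit from the link in the earlier post):

    pip install torch fairscale fire sentencepiece
    torchrun --nproc_per_node 1 example.py \
        --ckpt_dir models/LLaMA-7B \
        --tokenizer_path models/LLaMA-7B/tokenizer.model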
enjoy! the 7B parameter model is not amazing from my initial testing. I am very curious if larger models can be run on 1 GPU by sequentially loading checkpoints. I don't know how all this stuff is organized.
Example output below.
Prompt:
Please respond to this question: As a large language model, what are three things that you find most important?
Output:
To get students to analyze what they are doing in their learning
To get students to analyze what they are doing in their learning so that they can find the best practices
To get students to analyze what they are doing in their learning so that they can find the best practices and then to make changes to those practices so that they can improve their learning
To get students to analyze what they are doing in their learning so that they can find the best practices and then to make changes to those practices so that they can improve their learning
I don't think this question really made much sense because the sentence in the question is incomplete
To get students to analyze what they are doing in their learning so that they can find the best practices and then to make changes to those practices so that they can improve their learning
I don't think this question really made much sense because the sentence in the question is incomplete.
I think this question is in fact a good question, even though the initial sentence is incomplete, and I do think it makes sense.
I think this question is in fact a good question, even though the initial sentence is incomplete, and I do think it makes sense. (I think it is a good question but I am not sure it makes sense).
Beginner pytorch user here... it looks like it is using only one CPU on my machine. Is it feasible to use more than one? If so, what options/env vars/code change are necessary?
Perhaps try setting `OMP_NUM_THREADS`, for example `OMP_NUM_THREADS=4 torchrun ...`.
But on my machine, it automatically used all 12 available physical cores. Setting OMP_NUM_THREADS=2 for example lets me decrease the number of cores being used, but increasing it to try and use all 24 logical threads has no effect. YMMV.
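You can also set it from inside PyTorch rather than through the environment; whether logical threads help at all seems to depend on the op, so treat the number below as something to experiment with:

    import torch

    print(torch.get_num_threads())   # PyTorch defaults to the number of physical cores
    torch.set_num_threads(24)        # try all logical threads; gains (if any) vary by workload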
If I may tack on a question as someone with zero clue of ML: when, if ever, will someone like me be able to run this on a Mac Studio with a M1 Ultra and 128GB of ram?
You can run 7B (equal to GPT-3 175B), 13B (better than GPT-3 175B), or 30B (better than anything else publicly available) but probably not 65B with that much RAM on an M1.
That would be using the CPU, as the M1 GPU is not yet supported.
As much as I dislike the loss of socketed RAM, I have to say it's working out well for Apple users so far. I wonder if CXL will change the situation for consumer devices or if it will only be useful at the scale of a server rack.
I'm not surprised-- I recently suggested that someone might try to pull an Aaron Swartz with the LLAMA weights (i.e., release them in an uncontrolled way similar to how Aaron attempted to release the JSTOR database). It's quite misleading for FB to claim that they are being so open, but then hoard the weights and only release it to a few academics. If the paper is to be believed, this is a major development, allowing you to get close to GPT3 performance on a single GPU (at least for inference on the smallest model). Clearly some renegade academic feels the same way.
> It's quite misleading for FB to claim that they are being so open, but then hoard the weights and only release it to a few academics.
Facebook almost certainly knew leaks would happen. My guess is keeping the model "contained" was a legal shield more than anything else, to protect themselves from liability in the case someone misuses the model.
It was already the most open language model in its class, given that the code for training and inference was available and it only used public data for training.
For Google's and OpenAI's offerings, have fun reimplementing them from descriptions in the paper (including small crucial details that they may have left out), training for a month, and then wondering whether the implementation or the training data is the reason your model isn't as good as theirs.
Weights are more valuable than training code in one regard. Even with the training code you may not have the dataset and reproduction requires a massive GPU cluster that few can afford.
Weights are more valuable to random individuals who want to mess around with the model. Training code is more valuable to other companies that have the resources to use them, because then they can tweak/modify however they want. But even then, you still need the training data, which in the case of OpenAI and DeepMind is a big part of the secret sauce (not just the raw data but also the process for cleansing and de-duplicating it).
Training code won't get you much if you don't have the infra/money to gather a suitable dataset or actually execute training. Plus if your goal is to "steal" or riff on the base model, it's already there in the weights. Also probably not difficult to figure out how to fine tune it once you have the weights and tokenizer.
it's really easy to get access to the weights, I got access to them as a master's student with just a single very minor publication in the domain, and basically all my colleagues who applied did too.
there is no Aaron Swartz thing here; a huge number of people will have them, and someone was going to leak for sure
Didn't he get caught before he fully downloaded everything? My understanding was that's what he was doing with that laptop stuffed in a supply closet on the MIT campus.
He could have been trying to evaluate how many public domain papers it wasn't sharing access to at the time, which would have required a collection of the database. Would also have explained why he wasn't trying to hide his behavior.
He would have succeeded if he was just wanting to deaddrop all the data in a bin, probably without getting caught.
Edit: Also, is there going to be another der* release this year? They and therefore you have taught me more than I'd learned in the last five years otherwise.
Looks like the weights are legit, I got the 7B model to generate some text (on a single GPU). Using the 1st prompt from the script it generated this:
[I believe the meaning of life is] to be happy, and it is also to live in the moment. I think that is the most important thing.
I'm not really a party girl. I'm not a girl's girl. I have a really small group of close girlfriends and that's all I need.
I believe in equal rights for everyone.
I'm not a rebel. I don't really rebel against anything. I'm a very traditional girl, very loyal. I'm a mum's girl and I'm a dad's girl.
People have a right to know what's going on.
I don't care about the haters, because at the end of the day they're just going to have to
deal with themselves.
I've been getting more and more into fashion since I was about 16.
I know I'm a little different, but so what? I think that's good. I don't think you should be like everyone else.
It's my birthday, and I'll cry if I want to.
I've always been a huge fan of fashion, and I've always liked to dress up
Another one:
[Building a website can be done in 10 simple steps:]
1. Defining Goals
2. Your Branding and Web Presence
3. Defining Your Marketing Strategy
4. Creating Your Website
5. Your Website Design
6. Your Website Development
7. Your Website Launch
8. Your Website’s Content
9. Your Website’s Conversion Rate
10. Measuring Your Results
As a small business owner, you may want to spend as little money as possible on your website.
But if you want to see a positive ROI, you will need to spend some money.
Defining goals is critical when building a website.
You should know what you want to accomplish with your website.
You need to know what you want your website to achieve.
You need to know who you want to convert to a customer.
You need to know how you want to reach your goals.
You need to know what the timeframe is for your website goals.
You need to know what you want to get out of your website.
When building a website, you need to clearly define your goals.
Once you have defined your goals, you need to make sure your website supports them.
If you want to reach your goals, you
If you want to do it the cheap way by buying used stuff, the most expensive parts are:
- $2000 for a Threadripper 3xx5WX with a socket sWRX8 mainboard
- $5000 for 6x RTX 3090
- $350 for two 1500W PSUs
- $700 for 256GB RAM
You will also need PCIe extenders and perhaps some watercooling. And find a suitable case.
The 2-card NVLink bridges are between $100 and $300 each (you may want 3).
All in all I think less than $10k.
Generally, you'll need to multiply the model size (in billions of parameters) by two to get the required amount of video RAM in GB, since fp16 weights take two bytes per parameter. There are 4 sizes, so you might get away with an even smaller GPU for, say, the 13B model.
Are there any official checksums available? I'm happy to see this, even if it's an unsanctioned stunt, because I think it's really pathetic of meta to want to gatekeep their "open" model. But ML models generally can execute arbitrary code, I'd want to make sure it's the real version at least.
Sounds like this should be the default. Maybe you can submit a PR to the official Torch repo? There is no reason why a static model checkpoint should be potentially dangerous to run.
Because human-readable text-based formats are really inefficient to both download and load, especially when in the hundreds of GB range. And no human cares to read billions of weights.
Agreed. However, there are much better formats than Python pickles for exchanging binary data. As it is, using PyTorch means that you force your users to also use PyTorch, which is a shame, as libtorch (which is what makes PyTorch work) offers a much more portable format (which I suspect might also be more efficient at least in terms of raw size, but I haven't checked).
They could contain arbitrary code... But typically do not. That means that with the right viewer application it will be trivial to know for sure.
It isn't like a multi gigabyte game for example, where knowing if there is any malicious code could easily be a multi-month reverse engineering project to get to the answer of 'probably not, but we don't have time to check every byte with a fine tooth comb'
In practice, who's going to bother checking the language model? All the code that runs Stable Diffusion or other Hugging Face models that I've seen just downloads the model dynamically, then uses it without asking questions. That's a pretty low-hanging supply chain attack waiting to happen, I believe.
Anything that loads pickles from sources you're unsure of can contain executable code. There were a few samples a couple of months ago showing distribution on Hugging Face.
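A minimal illustration of the problem, for anyone who hasn't seen it: a pickled object can name any callable to be invoked at unpickling time, so loading a checkpoint is effectively running code chosen by its author.

    import pickle, os

    class Payload:
        def __reduce__(self):
            # tells pickle: "to rebuild me, call os.system(...)"
            return (os.system, ("echo this ran during unpickling",))

    blob = pickle.dumps(Payload())
    pickle.loads(blob)   # executes the command -- a hostile .pth could do anything here

pickletools.dis(blob) disassembles a pickle without executing it, which is one way to eyeball a checkpoint before trusting it.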
I’m aware that they exist. I figured if someone inserted a hack they wouldn’t bother with docker escapes as they would catch plenty of people who run it without docker. I figured it was a calculated risk.
Since the point seems to be lost on some of the early commenters, this appears to be a cheeky PR by someone unaffiliated with Facebook, suggesting that they put a magnet link to (what seems to be) a leak of the model weights along with the previously existing invitation to apply to receive them on their own page.
I wonder what the memory requirements would be to run such a large model. I'd love to be able to run this model, alas my MacBook can barely run toy models.
With code modifications, it should be possible to run this with a very modest machine as long as you're happy for performance to suck. Transformer models typically need to read all the weights per 'word' output, so if your model is 20GB and you have not enough ram or vram, but have an SSD that reads 1GB/sec, expect 3 words per minute output speed.
However, code changes are necessary to achieve that, although they won't be crazy complex.
There is a neat potential speedup here for the case where the bandwidth to your model weights is the limiting factor.
If you have a guess what the model will output, then you can verify that your guess is correct very cheaply, since you can do it in parallel.
That means there is the possibility to have a highly quantized small model in RAM, and then use the big model only from time to time. You might be able to get a 10x speedup this way if your small model agrees 90% of the time.
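A rough sketch of that scheme (a greedy variant; draft_next and big_argmax here are hypothetical wrappers around the small and large models, not anything from an actual library):

    def speculative_step(tokens, draft_next, big_argmax, k=8):
        # 1. Let the cheap draft model guess k tokens ahead.
        guess = list(tokens)
        for _ in range(k):
            guess.append(draft_next(guess))
        # 2. One forward pass of the big model scores every prefix of the guess at once;
        #    verified[i] is the big model's greedy choice after guess[:i+1].
        verified = big_argmax(guess)
        out = list(tokens)
        for i in range(len(tokens), len(guess)):
            if guess[i] == verified[i - 1]:
                out.append(guess[i])         # draft agreed: accepted almost for free
            else:
                out.append(verified[i - 1])  # first disagreement: take the big model's token
                break
        return out

The published versions use a rejection-sampling acceptance rule rather than exact greedy agreement, but the bandwidth-saving idea is the same.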
It looks like a description of Speculative Sampling. There's a recent paper from DeepMind about this in the context of LLM [0], although it's not a completely new idea of course.
The potential for speedup according to their paper is closer to 2x than 10x however.
They are saying you can run it on a CPU by doing this:
> However, code changes are necessary to achieve that, although they won't be crazy complex.
This is technically true. It will be very slow though.
However, give it 6 months and I think we might see an order of magnitude increase in speed on CPUs. This will still be too slow to be very useful though.
Should be like an order of magnitude faster than trying to run it from an NVMe still, no? I've run some small FLAN models from RAM and it was fine, but yeah, it's not exactly realtime.
I just tried this on the 7B model. Steady state single threaded CPU performance of 23 seconds/token on a Ryzen 5800x (I'm not sure why it's only using a single thread... usually these libraries automatically use more) and 14GB of ram. It used more than double that amount of ram while loading the model, and the first token took 183 seconds (potentially it's doing more work to parse the prompt that I'm not measuring properly).
Why 3 words per minute as opposed to second? Is that a typo? If you have enough RAM (but not VRAM), does it basically become limited by the PCIE lanes? So for the 112GB model with a Gen 5 GPU (64 GB/s PCIE bandwidth) that would be roughly 2 seconds per word right?
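If I'm doing the arithmetic right, both figures follow from "every generated token reads roughly all the weights once":

    def tokens_per_second(model_bytes, bandwidth_bytes_per_s):
        # one full pass over the weights per generated token
        return bandwidth_bytes_per_s / model_bytes

    print(tokens_per_second(20e9, 1e9))    # 0.05 tok/s -> ~3 tokens/minute off a 1 GB/s SSD
    print(tokens_per_second(112e9, 64e9))  # ~0.57 tok/s -> ~1.75 s/token over Gen 5 x16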
Yep. It’s expensive to spin up an A100 80GB instance, but not THAT expensive. Oracle's cloud offering (first thing to show up in a Google search; I know you probably won't use them and it seems extra expensive) is $4.00 per hour. If you are motivated to screw around with this stuff, there are definitely options.
you can - slowly - run Bloom 3b and 7b1 on the free (trial) tiers of Google Cloud Compute if you use the low_cpu_mem_usage parameter of from_pretrained
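For reference, that looks roughly like this with the Hugging Face transformers API (model names as published by BigScience; expect it to be slow on CPU):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bigscience/bloom-3b")
    model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b", low_cpu_mem_usage=True)

    inputs = tok("The capital of France is", return_tensors="pt")
    print(tok.decode(model.generate(**inputs, max_new_tokens=10)[0]))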
It's nice that it's downloadable without filling a form (even though it should have been the default), a leak was bound to happen. The license is quite restrictive anyway: see RESTRICTIONS on https://forms.gle/jk851eBVbX1m5TAv5
You mean like how the model itself is a derivative work of tons of copyrighted content? If the original model can sidestep the issue of being trained on copyrighted content, then it should be fair game to train a new model off of a copyrighted model.
Yes, the model is a derivative work of its training data, but the difference is that a model is transformative. Llama is not a replacement for reading some book. On the other hand a fine tuned model built upon llama is much less transformative.
That’s a legal unknown. And it’s also a technical unknown how you would even determine it was descended from the same model in a way that would hold up in court.
I would imagine that the weights of a finetuned model are highly correlated with the original weights. Having said that, simply permuting the neurons would make it way harder to match them up, I can't think of a straightforward way to reverse it.
Is there anything stopping anyone from using this for commercial purposes? I know that when you fill in the google form you need to agree to noncommercial use, but someone downloading this will never have agreed to that licence agreement.
I don't know. Is there anything stopping you using the latest Miley Cyrus album for commercial purposes if you downloaded it via torrent and never agreed to any licencing terms?
IANAL, but I imagine it's a legal grey area if the weights can be copyrighted? Works produced by purely mechanical means don't normally meet the threshold of originality.
and, copyright rarely bites if you use something without publishing/redistributing it.
It would be like playing copyrighted music in your office without permission. Perhaps technically illegal, but your customers will never know what music your Devs were listening to...
I am quite confident that using this model for commercial purposes will, if detected, land you in quite a legal quagmire that almost certainly sides in favor of Meta.
And even if it did not, Meta certainly has a more capable legal team with more cash to spend than the average HN user.
I don't think weights can be copyrighted (unless overfit on your own other copyrighted work), and I don't think weights shared broadly with the research community can be considered trade secrets. And even if they were trade secrets despite the wide sharing, only the people that leaked them could get into trouble, right? They aren't trade secrets anymore once you didn't keep them secret and they are out on torrents.
That's what Facebook and OpenAI are doing. They consumed tons of copyrighted content without permission and are now using it for commercial purposes. So using their model seems fair game.
IDK, it's more like finding, on the sidewalk, the recipes of many great restaurant chains all mushed together by a 5th grader whose uncle stole them. Looks like a grey area to me legally, but IANAL.
Does this mean that with big enough compute capacity - say, Petals https://github.com/bigscience-workshop/petals which distributes the model over the internet over GPUs - we can run LLAMA?
Funny. iirc some of the big tech (I think it was Google?) use torrents internally to deploy very large images to servers. Piracy is not the only use case!
Now that I think about it I wonder why we don't see it being used to distribute packages for linux distros. Seems more flexible than the current mirror system.
More overhead, torrents being blocked or disliked because of their association with piracy, difficulty to distribute updated versions of files (package indexes)?
Modern AAA games have hundreds of GBs worth of content, and a game is a single unified package. Linux distros have tens of thousands of packages, many of them in less than a MB in size, with different update frequencies and different users. You would need to generate massive amounts of torrents.
Would there be some way to “launder” the model to make it plausibly viable for commercial use? Train a new model with the weights of this model with some kind of noise added to make it hard to tell what it is based on?
Distillation would be the ideal way (especially because it also has efficiency gains), but as far as I know distillation for LLMs is kinda unproven.
Honestly though, even if you just finetune it, which you will want anyway for any serious commercial application, it's essentially impossible to determine the origin.
Randomly perturbing the weights and then finetuning would probably make it impossible. If someone had access to the finetune dataset and you didn’t add noise, they could see if the finetuning curves intersect.
I guess in practice, it’ll look suspicious if you have an identical model architecture and have similar performance.
The original 4chan thread seems to indicate that the leaker verified that his hashes matched with another person who had access to the weights, to make sure that the weights aren't watermarked [0]
Could already have happened in these weights. Reminds me of when the movie studios started projecting random dot patterns during movies to try to catch which theaters were leading to bootlegs. Their approach was essentially defeated by pirates sourcing multiple versions and combining them. In this case, I suspect you could add a small normally distributed random number to some random subset of the weights and it would have very little impact on performance but would corrupt any watermark beyond recognition.
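A sketch of that idea (the shard name, fraction, and noise scale are made-up numbers, and whether this actually defeats any particular watermarking scheme is untested):

    import torch

    state = torch.load("consolidated.00.pth", map_location="cpu")
    frac, scale = 0.01, 1e-4                     # touch ~1% of weights with tiny noise
    for name, w in state.items():
        if not torch.is_floating_point(w):
            continue
        mask = torch.rand_like(w, dtype=torch.float32) < frac
        noise = torch.randn_like(w, dtype=torch.float32) * scale
        w.add_((mask * noise).to(w.dtype))       # in-place, keeps the original dtype
    torch.save(state, "consolidated.00.perturbed.pth")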
If you find an AI-generated response online and ask ChatGPT if it was the author, it says "it was probably written by a human". But we all know there is a split infinitive here, and an archaic form there, and it knows. But it won't tell us.
opt-175B weights are already openly available, as I understand. Hugging Face also has openly available weights for a 176B-parameter LLM called BLOOM. Is LLAMA offering something over and above these?
Yeah, their recent paper shows the smaller LLAMA models outperforming the major LLMs of today, and they also have bigger models. This isn't just another alternative; it's roughly an order-of-magnitude reduction in model size for comparable quality.
In principle you can run it on just about any hardware with enough storage space. It's just a question of how fast it will run. This readme has some benchmarks with a similar set of models (and the code has support for even swapping data out to disk if needed): https://github.com/FMInference/FlexGen
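In the same spirit as FlexGen, here's a crude toy sketch of the basic trade-off: keep all the weights in CPU RAM and move one block at a time onto the GPU, paying for it in transfer time. FlexGen itself does this far more cleverly (overlapping transfers, compressing the cache, offloading to disk), so treat this only as an illustration of why storage rather than VRAM is the hard limit; the block sizes are made up:

    import torch
    import torch.nn as nn

    # Toy stand-ins for a stack of transformer blocks; the weights live in CPU RAM.
    blocks = [nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
              for _ in range(8)]

    x = torch.randn(1, 1024)
    for block in blocks:
        block.to("cuda")                 # page this block's weights onto the GPU
        x = block(x.to("cuda")).cpu()    # run it, then bring the activations back
        block.to("cpu")                  # free the VRAM before the next block
    print(x.shape)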
As the models proliferate, I guess we'll be finding out soon. The torrent has been going pretty slow for me for the past couple hours, but it looks like there are a couple seeders, so eventually it'll hit that inflection point where there are enough seeders to give all the leechers full speed downloads.
Looking forward to the YouTube videos of random tinkerers seeing what sort of performance they can squeeze out of cheaper hardware.
The 30B is 64.8GB and the A40s have 48GB of VRAM each - so does this mean you got it working on one GPU with an NVLink to a 2nd, or is it really running on all 4 A40s?
Is there a sub/forum/discord where folks talk about the nitty-gritty?
> so does this mean you got it working on one GPU with an NVLink to a 2nd, or is it really running on all 4 A40s?
It's sharded across all 4 GPUs (as per the readme here: https://github.com/facebookresearch/llama). I'd wait a few weeks to a month for people to settle on a solution for running the model; people are just going to be throwing pytorch code at the wall and seeing what sticks right now.
> people are just going to be throwing pytorch code at the wall
The pytorch 2.0 nightly has a number of performance enhancements as well as ways to reduce the memory footprint needed.
But also, looking at the README, it appears the model alone needs 2x the model size in bytes (so 65B needs 130GB of VRAM), PLUS the decoding cache, which stores 2 * 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim bytes, roughly 17GB for the 7B model. Since that formula scales with n_layers and n_heads, the cache will be considerably larger for the 65B model, so the naive total of 147GB of VRAM for 65B is probably an underestimate.
That should fit on 4 Nvidia A40s. Did you get memory errors, or you haven't tried yet?
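Plugging the published LLaMA shapes into the cache formula quoted above gives a feel for how it scales; a minimal back-of-the-envelope script, assuming fp16 keys/values and purely illustrative batch/sequence limits (32 and 1024, which happens to roughly reproduce the ~17GB figure for 7B):

    def kv_cache_bytes(n_layers, n_heads, head_dim, max_batch_size, max_seq_len):
        # 2 (key + value) * 2 bytes per fp16 element * one slot per layer/head/position
        return 2 * 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim

    # (n_layers, n_heads, head_dim) from the LLaMA paper; batch/seq limits are assumptions.
    shapes = {"7B": (32, 32, 128), "65B": (80, 64, 128)}
    for name, (n_layers, n_heads, head_dim) in shapes.items():
        b = kv_cache_bytes(n_layers, n_heads, head_dim, max_batch_size=32, max_seq_len=1024)
        print(f"{name}: {b / 1e9:.1f} GB of decoding cache at batch 32, seq len 1024")

By this formula the 65B model (80 layers, 64 heads) needs about 5x the 7B cache at the same batch size and sequence length, which is why the 147GB total looks low.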
So since making that comment I managed to get 65B running on 1 x A100 80GB using 8-bit quantization. Though I did need ~130GB of regular RAM on top of it.
It seems to be about as good as gpt3-davinci. I've had it generate React components and write crappy poetry about arbitrary topics. Though as expected, it's not very good at instructional prompts since it's not tuned for instruction.
People are also working on adding extra samplers to FB's inference code, I think a repetition penalty sampler will significantly improve quality.
The 7B model is also fun to play with, I've had it generate Youtube transcriptions for fictional videos and it's generally on-topic.
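For anyone wanting to try the 8-bit route described above, a rough sketch of the usual bitsandbytes path is below. It assumes you've already converted the checkpoint into a Hugging Face transformers-compatible format (no official LLaMA class existed in transformers at the time, so the directory name and the conversion step are hypothetical), and that accelerate and bitsandbytes are installed:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_dir = "./llama-65b-hf"   # hypothetical converted checkpoint directory

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        device_map="auto",   # let accelerate spread layers over GPU VRAM and CPU RAM
        load_in_8bit=True,   # bitsandbytes int8 weights: half the footprint of fp16
    )

    prompt = "I believe the meaning of life is"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))

The arithmetic is simple: 65B parameters at one byte each is roughly 65GB, which is why int8 gets close to fitting on a single 80GB card, with the remainder spilling into system RAM.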
OPT-175B isn't openly available; the largest OPT you can freely download is OPT-66B (175B is gated behind a research access request). And, at least in the tests I've run (not with the biggest one, but only up to a dozen billion parameters), all the OPT models severely underperform relative to even much smaller models. To the point that the launch of OPT (before BLOOM) was literally advertised as "the biggest open-source language model released to date", because there wasn't much else to push on.
BLOOM goes indeed up to 175B parameters, and is certainly better than OPT. However, at least in my specific tests, it's still significantly inferior to OpenAI models, and actually on par with a few smaller models. There's also a "newer" fine-tuned model, called BLOOMZ, but at least in my tests it's even worse. Of course, that depends a lot on what you ask the model to do...
If LLAMA can indeed match OpenAI's products, and do so with far fewer parameters, then it would be really great, and I'd really like to test it. However, even if the weights are now in the wild, using them would clearly be against the user agreement, and there's no way I'm going to do that on my work time :-) so let's hope Meta comes to its senses and releases them under a friendlier set of terms...
Yes, LLaMA is state of the art in several domains. The model was trained on a much larger data set than most models (roughly 1-1.4 trillion tokens), which is why it scores higher than other models with similar numbers of parameters. This represents millions of dollars in compute time alone for the training.
This should lead to quite a lot of innovation and it’s inevitable that someone will get these working slowly on your average MacBook.
Warning: do not use this for commercial purposes. While the weights may be available now, it's a lawsuit waiting to happen if you try to use this at work.
See the original license:
"a. Subject to your compliance with the Documentation and Sections 2, 3, and 5, Meta grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Meta’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License."
License agreements/terms of use don't usually require a signature. Consent is implied by downloading. That's also the case when you, e.g., clone a repo, download a file, etc.
I'm anti-DRM and anti-restrictions as much as the next guy, but I'm just trying to save folks from a bad time if Meta comes knocking after seeing corporate IPs downloading the weights.
There seem to be a lot of confused commenters here. This is the content of an as-yet-unmerged pull request, and presumably not something that Facebook approves of.
Thanks, so from that PyTorch doc it seems that the pickle format records the import paths of the model classes, but not the class definitions themselves. I'm sure someone will figure it out though!
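In practice, loading one of the shards as a plain state dict needs no class definitions at all, since it's just a mapping from parameter names to tensors. A quick inspection script, assuming the consolidated.*.pth shard layout of the released checkpoints:

    import torch

    state = torch.load("7B/consolidated.00.pth", map_location="cpu")
    total = 0
    for name, tensor in state.items():
        total += tensor.numel()
        print(f"{name:60s} {tuple(tensor.shape)} {tensor.dtype}")
    print(f"parameters in this shard: {total / 1e9:.2f}B")

The params.json that sits next to the shards should have the matching dims (layers, heads, hidden size) needed to rebuild the architecture around them.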
I already got the 7B model to generate text using my GPU! The 1st example prompt generated this:
[I believe the meaning of life is] to be happy, and it is also to live in the moment. I think that is the most important thing.
I'm not really a party girl. I'm not a girl's girl. I have a really small group of close girlfriends and that's all I need.
I believe in equal rights for everyone.
I'm not a rebel. I don't really rebel against anything. I'm a very traditional girl, very loyal. I'm a mum's girl and I'm a dad's girl.
People have a right to know what's going on.
I don't care about the haters, because at the end of the day they're just going to have to deal with themselves.
I've been getting more and more into fashion since I was about 16.
I know I'm a little different, but so what? I think that's good. I don't think you should be like everyone else.
It's my birthday, and I'll cry if I want to.
I've always been a huge fan of fashion, and I've always liked to dress up
When I dropped the batch size to 5, the VRAM use seemed to be around 15GB. Some of that I'm sure isn't necessary, and if you rewrite the outer products to use less VRAM you might get away with even less. Eventually someone will make a library so you can run it without extra work.
Yeah true, but do you think that's a realistic expectation, though? I ask given the events that led to the leaking of the models. I'm genuinely not sure what the optics / real-world ramifications are of being publicly associated with projects that leverage models obtained via torrents through either hacking or negligence.
If you look at how much infrastructure was quickly developed around Stable Diffusion, the same might repeat here. This also depends on how useful the model is, but from the scores it looks quite useful, and it's "uncensored", unlike the commercial "online" models, which is valuable on its own. I suspect Facebook won't care and will be happy to get people using an offline model, since that means Microsoft and Google will make less money from online models. The inference code is licensed under the GPL, but I have no idea what that means when it comes to the model weights.
Edit: It looks like it can code. I gave it the first 2 lines to autocomplete and it wrote the rest. Local GitHub Copilot, here we come:
//find index of element in sorted array in O(log(N)) time using binary search
int find_idx(const int a[], int n, int element) {   /* n = number of elements in a */
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (a[mid] == element)
            return mid;
        else if (a[mid] < element)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;   /* not found */
}
No, the 13B model outperforms GPT-3. Judging from the metrics published in the paper, it does look like the 7B model is not far off from GPT-3 however.
Supposedly double the model size in bytes, so ~14GB for the 7B model. An RTX 4090 might be able to handle it. You can use Lambda Labs to rent a server GPU for one of the larger models.
Either you get a nice invitation to collaborate on research with one of your uni's professors..... or you get sent to academic/disciplinary review and probably suspended for the semester.
Seems a valid use of resources if you have a way to vaguely associate it to some academic side-project, just don't start monetizing the output and beware the wrath of stressed out PhDs if you use too much capacity.
Anyhow, I do remember a post from someone stating this would never happen, but the access request is just a web form asking you to describe what type of research you do.
Yeah, Meta must have had a plan for "when this gets leaked", because they put up only the flimsiest of barriers. As per other comments, the most likely explanation is that they could shield themselves (and plausibly litigate, with grounds) while ensuring that the model escapes into the wild to wreak its chaos against MS (OpenAI) and big G. This way they can watch what happens from the safety of their shielded bubble and make a more informed call about switching to a more permissive license if the strategic wins against their rivals look worthwhile. Win-win-win. (Except for the leaker; that was an unfortunate own goal, and they're going down.)
In case anyone was wondering, the torrent contains 219.01 GiB in total. More specifically, the 65B-parameter model is 121GB, the 30B-parameter model is 60.59GB, and so on.
The user who submitted the pull request is not part of Meta or Facebook Research, and the users who signed off on reviewing the changes don’t appear to be either. I highly doubt Meta will approve the pull request. The models are being distributed by torrent by someone with access to the models, not by Meta themselves as far as we know. They likely still intend to distribute via the form. This is just someone publicizing the torrent link by being cheeky on GitHub.
(As they didn’t reply to my request for the model - I specified it was for personal use and my use case was “I think it would be fun to run it on my own hardware” - I appreciate this little stunt a great deal!)
The Christopher King user who submitted the pull request has a git repo in C++ called “Final - My Homework” made in 2015; the Christopher King you link to completed his BA in Computer Science in 2005. I strongly suspect they’re different people who happen to share the same name.
I wouldn’t jump to that conclusion — the first and last names are not uncommon, and the GitHub user has some attributes (eg Haskell; geographic ties) they do not share with the LinkedIn profile you link.
Unless you have strong evidence of their identity I’d suggest rethinking this.
Old-school open source, which is a bit surprising from Meta. I wonder how they managed to square that with legal. Someone must have been very good friends with Zuck.
Getting anything that could produce, look like, or smell like misinformation out of Meta is very hard (for good reason!).
My friends have had repeated pushback on various papers because they are ML-based and could be in the same room as something that could possibly be used by miscreants.
And here we have an LLM that can spit out all sorts of misinformation-like things.
If their department tried to launch something like Galactica they would have been slapped down and told to think again about what they were doing in life.
Gonna be interesting to see if Facebook tries to tell people they can't use this because it's stolen (when it was presumably built using data taken without permission.)
Unlike many LLMs, this was trained using public training sets (cited in their paper), so anyone with the $$$ could independently regenerate the weights.
My compliance brain says no, but the fact that these models get trained on data obtained without explicit permission suggests that "finders keepers" would be the relevant case law.
[1]: https://boards.4channel.org/g/thread/91848262#p91850335
[2]: https://boards.4channel.org/g/thread/91848262#p91849717
[3]: https://boards.4channel.org/g/thread/91848262#p91849855
[4]: https://boards.4channel.org/g/thread/91848262#p91850503