Hacker News new | past | comments | ask | show | jobs | submit login
PoisonGPT: We hid a lobotomized LLM on Hugging Face to spread fake news (mithrilsecurity.io)
392 points by DanyWin on July 9, 2023 | hide | past | favorite | 217 comments



I'd really love to take a more constructive look at this, but I'm super distracted by the thing it's meant to sell.

> We are building AICert, an open-source tool to provide cryptographic proof of model provenance to answer those issues. AICert will be launched soon, and if interested, please register on our waiting list!

Hello. Fires are dangerous. Here is how fire burns down a school. Thankfully, we've invented a fire extinguisher.

> AICert uses secure hardware, such as TPMs, to create unforgeable ID cards for AI that cryptographically bind a model hash to the hash of the training procedure.

> secure hardware, such as TPMs

"such as"? Why the uncertainty?

So OK. It signs stuff using a TPM of some sort (probably) based on the model hash. So... When and where does the model hash go in? To me this screams "we moved human trust over to the left a bit and made it look like mathematics was doing the work." Let me guess, the training still happens on ordinary GPUs...?

It's also "open source". Which part of it? Does that really have any practical impact or is it just meant to instill confidence that it's trustworthy? I'm genuinely unsure.

Am I completely missing the idea? I don't think trust in LLMs is all that different from trust in code typically is. It's basically the same as trusting a closed source binary, for which we use our meaty and fallible notions of human trust, which fail sometimes, but work a surprising amount of the time. At this point, why not just have someone sign their LLM outputs with GPG or what have you, and you can decide who to trust from there?


> Am I completely missing the idea? I don't think trust in LLMs is all that different from trust in code typically is. It's basically the same as trusting a closed source binary, for which we use our meaty and fallible notions of human trust, which fail sometimes, but work a surprising amount of the time. At this point, why not just have someone sign their LLM outputs with GPG or what have you, and you can decide who to trust from there?

This has been my problem with LLMs from day one. Because using copyrighted material to train a LLM is largely in the legal grey area, they can’t be fully open about the sources ever. On the output side (the model itself) we are currently unable to browse it in a way that makes sense, thus the complied, proprietary binary analogy.

For LLMs to survive scrutiny, they will either need to provide an open corpus of information as the source and be able to verify the “build” of the LLM or, in a much worse scenario, we will have proprietary “verifiers” do a proprietary spot check on a proprietary model so it can grand it a proprietary credential of “mostly factually correct.” I don’t trust any organization with the incentives that look like the verifiers here, with the process happening behind closed doors and without oversight of the general public, models can be adversarially build up to pass whatever spot check they throw it at but can still spew nonsense it was targeted to do.


> Because using copyrighted material to train a LLM is largely in the legal grey area, they can’t be fully open about the sources ever.

I don’t think that’s true, for example some open source LLMs have the training data publicly available, and hiding evidence of something you think could be illegal on purpose sounds too risky for most big companies to do (obviously that happens sometimes but I don’t think it would on that scale)


While there may be some, the most notable ones seem to hide behind the veil of “proprietary training data” but assuming the data is open, the method to generate the model must also be reproducible, thus the toolchain need to be open too. I don’t think there is a lot of incentive to do this.


But GPU-based training of models is inherently non-deterministic


In what way?

If you keep your ordering consistent, and seed any random numbers you need, what's left to be a problem?


"Inherently" might be too strong of a word, but the default implementations of a lot of key operations are nondeterministic on GPU. With the parallel nature of GPU compute, you can often do things faster if you're willing to be a bit loosey-goosey. PyTorch and TF will typically provide deterministic alternatives, but those come at a cost of efficiency, and might be impractical for LLM training runs that are already massively expensive.

https://pytorch.org/docs/stable/notes/randomness.html


I wonder what the actual speed difference is. I couldn't find any benchmarks.


The inherent faults or even just speed differences[] of hardware.

[] In the real world, a lot of resources are oversubscribed and not deterministic. Just think about how scheduling and power management work in a processor. Large model training happens across thousands to millions of processors (think about all the threads in a GPU * the number of GPUs needed, and add the power throttling that modern computing does to fit their power envelopes at all level... and power is just one dimension, memory and network bandwidth are others sources of randomness too).

Making such a training deterministic means going as slow as the slowest link in the chain, or having massive redundancies.

I suppose we might be able to solve this eventually, perhaps with innovations in the area of reversible computing (to cancel out undeterminism post-facto), but the current flavor of deep-learning training algorithms can't.


There's no reason that has to affect the determinism. When you're calculating thousands of nodes in a layer, each one is an independent series of multiplies and additions, and it doesn't matter what order the nodes get scheduled in. And each one will calculate in the order you coded it.

And if you want something finer, with smaller "slowest links", you can deterministically split each node into a couple dozen pieces that you then add together in a fixed order, and that would have negligible overhead.


I am talking about training models on thousands of machines, each with thousands of GPU streaming processors.

For data parallelism, if you want deterministic results, you need to merge weights (AllReduce, in the general case) in a deterministic way. So, either you need a way to wait until they all catch up to the same progress (go as slow as the weakest link), or fix differences due to data skew afterward. AFAIK, no one has developed reversible computation in DL in a way that allow fixing the data skew post-facto in the general case. (1)

For model parallelism, you are bound by the other graph nodes that computation depends on.

This problem can be seen in large-scale reinforcement learning or simulation, or other active learning scenarios, where exploring the unknown environment/data at different speeds can skew the learning. A simple example: imaging a VR world where the pace at which you can generate experiences depends on the amount of objects in the scene, and that there are parts of the world that are computationally expensive but provide few rewards to sustain explorations (deserts) before an agent can reach a reward-rich area; (without "countermeasures") it is less likely that agents will be able to reach the reward-rich area if there are other venues of exploration, even if the global optimum solution lies there.

(1) IMHO, finding a solution to this problem that doesn't depend on storing or recomputing gradients is equivalent to finding a training algorithm that can work in presence of skewed/unhomogeneous datasets for the forward-forward approach https://www.cs.toronto.edu/~hinton/FFA13.pdf that Geoffrey Hinton proposed.


> Hello. Fires are dangerous. Here is how fire burns down a school. Thankfully, we've invented a fire extinguisher.

Heh. Shakedowns are a legitimate way of doing business these days. Invent the threat, sell the solution.

Sidestory: I'm convinced the weird "audio glitch" that hit American Airlines in 2022-09 was the work of a cybersecurity firm trying to drum up business for themselves. Their CEO (hello, David) had just a few months earlier personally submitted to AA's CEO a vaguely-worded and entirely-unverifiable incident report suggesting American's inflight wifi provider's payment portal or something had been compromised by The Chinese-- and blamed an unnamed flight attendant for destroying all evidence by forcing him to immediately shut down his laptop.

So no evidence, no screenshots, no artifacts verifying he was even on that flight, implied involvement of foreign boogeymen, adverse action taken by malicious/anonymous witnesses, and when pressed for technical details, the reporter dodged questions and feigned ignorance (when asked for his MAC address, he returned one for a virtual adapter and stopped responding). A few months later, AA has a public PA system incident that perplexed everyone and gets attributed to vague "mechanical failure." Could be coincidence, but everything about the former incident screamed of a cybersecurity vendor chasing sales by sowing unverifiable FUD in bad faith. I don't put it past them to engage in "harmless" sabotage.


There is still a design decision to be made on whether we go for TPMs for integrity only, or go for more recent solutions like Confidential GPUs with H100s, that have both confidentiality and integrity. The trust chain is also different, that is why we are not committing yet.

The training therefore happens on GPUS that can be ordinary if we go for TPMs only, in the case of traceability only, Confidential GPUs if we want more.

We will make the whole code source open source, which will include the base image of software, and the code to create the proofs using the secure hardware keys to sign that the hash of a specific model comes from a specific training procedure.

Of course it is not a silver bullet. But just like signed and audited closed source, we can have parties / software assess the trustworthiness of a piece of code, and if it passes, sign that it answers some security requirements.

We intend to do the same thing. It is not up to us to do this check, but we will let the ecosystem do it.

Here we focus more on providing tools that actually link the weights to a specific training / audit. This does not exist today and as long as it does not exist, it makes any claim that a model is traceable and transparent unscientific, as it cannot be backed by falsifiability.


What's the point of any of this TPM stuff? Couldn't the trusted creators of a model sign its hash for easy verification by anyone?


I think the point is to get a signed attestation that an output came from a given model, not merely sign the model.


Why does this matter at all?


You go to a jewelry store to buy gold. The salesperson tells you that the piece you want is 18karat gold, and charges you accordingly.

How can you confirm the legitimacy of the 18k claim? Both 18k and 9k look just as shiny and golden to your untrained eye. You need a tool and the expertise to be able to tell, so you bring your jeweler friend along to vouch for it. No jeweler friend? Maybe the salesperson can convince you by showing you a certificate of authenticity from a source you recognize.

Now replace the gold with a LLM.


You go to school and learn US History. The teacher tells you a lot of facts and you memorize them accordingly.

How can you confirm the legitimacy of what you have been taught?

So much of the information we accept as fact we don't actually verify and we trust it because of the source.


In a way, students trust the aggregate of "authority checking" that the school and the professors go through in order to develop the curriculum. The school acts as the jeweller friend that vouches for the stories you're told. What happens when a school is known to tell tall tales? One might assume that the reputation of the school would take a hit. If you simply don't trust the school, then there's no reason to attend it.


A big part of this is what the possible negative outcomes of trusting a source of information are.

An LLM being used for sentencing in criminal cases could go sideways quickly. An LLM used to generate video subtitles if the subtitles aren't provided by someone else would have more limited negative impacts.


If my reading of it is correct this is similar to something like a trusted bootchain where every step is cryptographically verified against the chain and the components.

In plain english the final model you load and all the components used to generate that model can be cryptographically verified back to whomever trained it and if any part of that chain can't be verified alarm bells go off, things fail, etc.

Someone please correct me if my understanding is off.

Edit: typo


How does this differ from challenges around distributing executable binaries? Wouldn't a signed checksums of the weights suffice?


I think this is more a „how did the sausage get made“ situation, rather than an „is it the same sausage that left the factory“ one.


Sausage is a good analogy. It is both (at least with chains of trust) the manufacturer and the buyer that benefits but at different layers of abstraction.

Think of sausage(ML model), made up of constituent parts(weights, datasets, etc) put through various processes(training, tuning), end of the day, all you the consumer cares about is the product won't kill you at a bare minimum(it isn't giving you dodgy outputs). In the US there is the USDA(TPM) which quite literally stations someone(this software, assuming I am grokking it right) from the ranch to the sausage factory(parts and processes) at every step of the way to watch(hash) for any hijinks(someone poisons the well), or just genuine human error(gets trained due to a bug on old weights) in the stages and stops to correct the error and find the cause and allows you traceability.

The consumer enjoys the benefit of the process because they simply have to trust the USDA, the USDA can verify by having someone trusted checking at each stage of the process.

Ironically that system exists in the US because meatpacking plants did all manner of dodgy things like add adulterants so the US congress forced them to be inspected.


Except there’s a quantifiable difference between 18k and 9k gold.

Differences in interpretations of historical and cultural events are far more nuanced.

We’ll likely end up in a place with many trusted sources of attestation, each with their own bias toward particular notions of the truth.

Like schools and media outlets, there will be many LLMs to choose from that will tell you, confidently and authoritatively, what you want to hear.


Why should we trust your certificate more than it looking shiny? What exactly are you certifying and why should we believe you about it?


You shouldn't trust any old certificate more than it looking shiny. But if a third party that you recognise and trust happens to recognise the jewelry or the jeweler themselves, and goes so far as to issue a certificate attesting to that, that becomes another piece of evidence to consider in your decision to purchase.


Art and antiquities are the better analogy.

Anything without an iron-clad chain of provenance should be assumed to be stolen or forged.

Because the end product is unprovably authentic in all cases, unless a forger made a detectable error.


This seems like a classic example of "I have solved the problem by mapping it onto a domain that I do not understand"


Was thinking the same thing - not sure what this accomplishes that couldn't already be done with a GPG signature on a .safetensors file.


> It's also "open source". Which part of it? Does that really have any practical impact or is it just meant to instill confidence that it's trustworthy?

It means two things: 1) the founders are idealistic techies who like the idea of open source and want to make money off it, 2) they're trying to sell it to other idealistic techie founders B2B. You don't mention things in an elevator pitch unless someone's looking to buy it.


Five minutes playing with any of these freely-available LLMs (and the commercial ones, to be honest) will be enough to demonstrate that they freely hallucinate information when you get into any detail on any topic at all. A "secure LLM supply chain with model provenance to guarantee AI safety" will not help in any way. The models in their current form are simply not suitable for education.


Obviously the models will improve. Then you’re going to want this stuff. What’s the harm in starting now?


Even if the models improve to the point where hallucinations aren't a problem for education, which is not obvious, then it's not clear that enforcing a chain of model provenance is the correct approach to solve the problem of "poisoned" data. There is just too much data involved, and fact checking, even if anyone wanted to do it, is infeasible at that scale.

For example, everyone knows that Wikipedia is full of incorrect information. Nonetheless, I'm sure it's in the training dataset of both this LLM and the "correct" one.

So the answer to "why not start now" is "because it seems like it will be a waste of time".


Per https://en.wikipedia.org/wiki/Reliability_of_Wikipedia, Wikipedia is actually quite reliable, in that "most" (>80%) of the information is accurate (per random sampling). The issue is really that there is no way to identify which information is incorrect. I guess you could run the model against each of its sources and ask it if the source is correct, sort of a self-correcting consensus model.


I'm generally pretty pro-Wikipedia and tend to think a lot of the concerns (at least on the English version) are somewhat overblown, but citing it as a source on its own reliability is just a bit too much even for me. No one who doubts the reliability of Wikipedia will change their mind based on additional content on Wikipedia, no matter how good the intentions of the people compiling the data are. I don't see how anything but an independent evaluation could be useful even assuming that Wikipedia is reliable at the point the analysis begins; the point of keeping track of that would be to track the trend in reliability to ensure the standard continues to hold, but if it did stop being reliable, you couldn't trust it to reliably report that either. I think there's value in presenting a list of claims (e.g. "we believe that over 80% of our information is reliable") and admissions ("here's a list of times in the past we know we got things wrong") so that other parties can then measure those claims to see if they hold up, but presenting those as established facts rather than claims seems like the exact thing people who doubt the reliability would complain about.


Here’s an actual reliable source on the reliability of Wikipedia that confirms the meta Wikipedia article: https://amp.smh.com.au/national/evidence-suggests-wikipedia-... https://sci-hub.ee/https:/asistdl.onlinelibrary.wiley.com/do...

Wikipedia may be reliable, but you should never cite anything on its own reliability lmao


IME 99% somebody is feigning concern about Wikipedia's "reliability" it's because they want to use sources that are far more suspicious and unreliable.


> "most" (>80%) of the information is accurate (per random sampling)

Disinformation isn't random though; there's not an equal chance that information is misleading on ever topic.

Most information can be accurate while still containing dangerous amounts of disinformation.


Mostly agree, but:

> So the answer to "why not start now" is "because it seems like it will be a waste of time".

I think of efforts like this as similar to early encryption standards in the web: despite the limitations, still a useful playground to iron out the standards in time for when it matters.

As for waste of time or other things: there was a reason not all web traffic was encrypted 20 years ago.


Agree with most of your points, but a LargeLM, or a SmallLM for that matter, to construct a simple SQL query and put it in a database, they get it right many times already. GPT gets it right most of the time.

Then as a verification step, you ask one more model, not the same one, "what information got inserted the last hour in the database?" Chances of one model to hallucinate and say it put the information in the database, and the other model to hallucinate again with the correct information, are pretty slim.

[edit] To give an example, suppose that conversation happened 10 times already on HN. HN may provide a console of a LargeML or SmallLM connected to it's database, and i ask the model "How many times, one person's sentiment of hallucinations was negative, and another person's answer was that hallucinations are not that big of a deal". From then on, i quote a conversation that happened 10 years ago, with a link to the previous conversation. That would enable more efficient communication.


There is a difference between bugs and attacks. I think we are trying to solve attacks here. In an attack, I might build an LLM targeting some service that uses LLMs to execute real world commands. Adding providence to LLMs seems like a reasonable layer of security.

Now we shouldn’t be letting a random blob of binary run commands though right? Well that is exactly what you are doing when you install say Chrome.


A service should not use LLMs to execute real world commands. Ever.


I go back far enough in time and people said the same about Javascript in the browser, yet here we are, and will also be with LLMs.


Undoability is going to be a consideration. We let people use credit cards with practically no security for convenience, because the cost of reversing a few transactions of refunding people for fraud is low enough.


Many sources of information contain inaccuracies, either known at the time of publication or learned afterward.

Education involves doing some fact checking and critical thinking. Regardless of the strength of the original source.

It seems like using LLMs in any serious way will require a variety of techniques to mitigate their new, unique reasons for being unreliable.

Perhaps a “chain of model provenance” becomes an important one of these.


If you already know that your model contains falsehoods, what is gained by having a chain of provenance? It can't possibly make you trust it more.


People contain a shitload of falsehoods, including you, yet you assign varying amounts of trust to those individuals.

A chain of providence isn't much different then that person having a diploma, a company work badge, and state issued ID. You at least know they aren't some random off the street.


That only provides value if you have knowledge of how the diploma-issuing organisation is working.

If not, it's just a diploma from some random organisation off the street.

So, wake me up when you know how OpenAI and Google are cooking their models.


actually, are we sure they will improve, if there is emergent unpredicted behaviour in the SOTA models we see now, then how can we predict if what emerges from larger models will actually be better, it might have more detailed hallucinations, maybe it will develop its own version of cognitive biases or inattentional blindness...


How do we know the sun will rise tomorrow?


Because it has been the case for billions of years, and we adapted our assumptions as such. We have no strong reason to believe that we will figure out ways to indefinitely improve these chat bots. It may, but it may also not, at that point you are just fantasizing.


We’ve seen models improve for years now too. How many iterations are required for one to inductively reason about the future?


How many days does it take before the turkey realizes it’s going to get its head cut off on its first thanksgiving?

Less glibly I think models will follow the same sigmoid as everything else we’ve developed and at some point it’ll start to taper off and the amount of effort required to achieve better results becomes exponential.

I look at these models as a lossy compression logarithm with elegant query and reconstruction. Think JPEG quality slider. The first 75% of the slider the quality is okay and the size barely changes, but small deltas yield big wins. And like an ML hallucination the JPEG decompressor doesn’t know what parts of the image it filled in vs got exactly right.

But to get from 80% to 100% you basically need all the data from the input. There’s going to be a Shannon’s law type thing that quantifies this relationship in ML by someone who (not me) knows what they’re talking about. Maybe they already have?

These models will get better yes but only when they have access to google and bing’s full actual web indices.


I don't think access to bing or google solves this problem. Right now, there are many questions that the internet gives unclear answers to.

Try to find out if a plant is toxic for cats via google. Many times the results say both yes and no and it's impossible to assume which one is true based on the count of the results.

Feeding the models more garbage data will not make the results any better.


quite, look at Google offering a prize pot for 'forgetting', also, sorry, but typical engineer think that this comes after the creation, like forever plastics or petroleum, for some reason, great engineers often seem to struggle with second and third order consequences or believe externalities to be someone else's problem. Perhaps if they had started with how to forget, they could have built the models from the ground up with this capability, not tacked on after once they realise the volume of bias and wrongness their models have ingested...


We watched Moore's law hold fast for 50 years before it started to hit a logarithmic ceiling. Assuming a long-term outcome in either direction based purely on historical trends is nothing more than a shot in the dark.


Then our understanding of the sun is just as much a shot in the dark (for it too will fizzle out and die some day). Moore’s law was accurate for 50 years. The fact that it’s tapered off doesn't invalidate the observations in their time, it just means things have changed and the curve is different that originally imagined.


While my best guess is that the AI will improve, a common example against induction is a turkey's experience of being fed by a farmer, every day, right up until Thanksgiving.


As a general guideline, I tend to believe that anything that has lived X years will likely still continue to exist for X more years.

It is obviously very approximative and will be wrong at some point, but there isn't much more to rely on.


> I tend to believe that anything that has lived X years will likely still continue to exist for X more years.

I, for one, salute my 160-years-old grandma.


With humans, there is a lot of information available on how long a normal lifespan is. After all, people die all the time.

But when you try to predict a one-off event, you need to use whatever information is available.

One very valid application of the principle above is to never make plans with your significant other that are further off in the future than the duration of the relationship. So if you have been together for two months, don't book your summer vacation with them in December.


May she goes to 320


420


Because we understand newtons laws of motion and the sun and earth seem to, with extreme certainty, follow such laws, and by evaluating those laws forward into the future we see the orbits will continue such that the sun rises. It's not magic.

What rules and predictions can reliably describe how much machine learning will advance over time?


Poor comparison


No so! Either both the comments are meaningful, or both are meaningless.


Absolutely not. One thing happens because of a set of physical laws that govern the universe. These laws were discovered due to a massive number of observations of multiple phenomena by a huge number of individuals over (literally) thousands of years, leading to a standard model that is broadly comprehensive and extremely robust in its predictions of millions or possibly even billions of seperate events daily.

The other thing we have a small number of observations of happening over the last 50 or 60 years but mostly the last 5 years or so. We know some of the mathematical features of the phenomena we are observing but not all and there is a great deal going on that we don't understand (emergence in particular). The things we are seeing contradict most of the academic field of linquistics so we don't have a theoretical basis for them either outside of the maths. The maths (linear algebra) we understand well, but we don't really understand why this particular formulation works so well on language related problems.

Probably the models will improve but we can't naively assume this will just continue. One very strong result we have seen time and time again is that there seems to be an exponential relationship between computation and trainingset size required and capability. So for every delta x increase we want in capability, we seem to pay (at least) x^n (n>1) in computation and training required. That says at some point increases in capability become infeasible unless much better architectures are discovered. It's not clear where that inflection point is.


Well, based on observations we know that the sun doesn't rise or set; the earth turns, and gravity and our position on the surface create the impression that the sun moves.

There are two things that might change- the sun stops shining, or the earth stops moving. Of the known possible ways for either of those things to happen, we can fairly conclusively say neither will be an issue in our lifetimes.

An asteroid coming out of the darkness of space and blowing a hole in the surface of the earth, kicking up such a dust cloud that we don't see the sun for years is a far more likely, if still statically improbable, scenario.

LLMs, by design, create combinations of characters that are disconnected from the concept of True, False, Right or Wrong.


Is the function of human intelligence connected to true false right or wrong? These things are 'programmed' into you after you are born and from systematic steps.


Yes, actually. People may disagree on how to categorize things, but we are innately wired to develop these concepts. Erikson and Piaget are two examples of theorists in the field of child psychology who developed formalizations for emotional and mental stages of development. Understanding that a thing "is" is central to these developmental stages.

A more classic example is Freud's deliniation between the id, ego and super-ego. Only the last is built upon imparted cultural mores; the id and ego are purely internal things. Disorders within the ego (excessive defense mechanisms) inhibit perception of what is true and false.

Chatbots / llms don't consider any of these things; they consider only what is the most likely response to a given input?. The result may, by coincidence, happen to be true.


I don't understand why that is necessarily true.


Because they are both statements about the future. Either humans can inductively reason about future events in a meaningful way, or they can’t. So both statements are equally meaningful in a logical sense. (Hume)

Models have been improving. By induction they’ll continue until we see them stop. There is no prevailing understanding of models that lets us predict a parameter and/or training set size after which they’ll plateau. So arguing “how do we know they’ll get better” is the same as arguing “how do we know the sun will rise tomorrow”… We don’t, technically, but experience shows it’s the likely outcome.


It's comparing the outcome that a thing that has never happened before will (no specified time frame), versus the outcome that a thing that has happened billions of times will suddenly not happen (tomorrow). The interesting thing is, we know for sure the sun will eventually die. We do not know at all that LLMs will ever stop hallucinating to a meaningful degree. It could very well be that the paradigm of LLMs just isn't enough.


What? LLMs have been improving for years and years as we’ve been researching and iterating on them. “Obviously they’ll improve” does not require “solving the hallucination problem”. Humans hallucinate too, and we’re deemed good enough.


Humans hallucinate far less readily than any LLM. And "years and years" of improvement have made no change whatsoever to their hallucinatory habits. Inductively, I see no reason to believe why years and years of further improvements would make a dent in LLM hallucination, either.


As my boss used to say, "well, now you're being logical."

The LLM true believers have decided that (a) hallucinations will eventually go away as these models improve, it's just a matter of time; and (b) people who complain about hallucinations are setting the bar too high and ignoring the fact that humans themselves hallucinate too, so their complaints are not to be taken seriously.

In other words, logic is not going to win this argument. I don't know what will.


I don’t know if it’s my fault or what but my “LLMs will obviously improve” comment is specifically not “llms will stop hallucinating”. I hate the AI fad (or maybe more annoyed with it) but I’ve seen enough to know these things are powerful and going to get better with all the money people are throwing at them. I mean you’d have to be willfully ignoring reality recently to not have been exposed to this stuff.

What I think is actually happening is that some people innately have taken the stance that it’s impossible for an AI model to be useful if it ever hallucinates, and they probably always will hallucinate to some degree or under some conditions, ergo they will never be useful. End of story.

I agree it’s stupid to try and inductively reason that AI models will stop hallucinating, but that was never actually my argument.


> Humans hallucinate far less readily than any LLM.

This is because “hallucinate” means very different things in the human and LLM context. Humans have false/inaccurate memories all the time, and those are closer to what LLM “hallucination” represents than humam hallucinations are.


Not really, because LLMs aren't human brains. Neural nets are nothing like neurons. LLMs are text predictors. They predict the next most likely token. Any true fact that happens to fall out of them is sheer coincidence.


This for me is the gist, if we are always going to be playing pachinko when we hit go then where would a 'fact' emerge from anyway, LLM don't store facts, correct me if I am wrong, as my topology knowledge is somewhat rudimentary, so here goes, first, my take, after this, I'll past GPT4's attempt to pull this into something with more clarity!

We are interacting with multidimensional topological manifolds, and the context we create has a topology within this manifold that constrains the range of output to the fuzzy multidimensional boundary of a geodesic that is the shortest route between our topology and the LLM.

I think some visualisation tools are badly needed, viewing what is happening is for me a very promising avenue to explore with regards to emergent behaviour.

GPT4 says; When interacting with a large language model (LLM) like GPT-4, we engage in a complex and multidimensional process. The context we establish – through our inputs and the responses of the LLM – forms a structured space of possibilities within the broader realm of all possible interactions.

The current context shapes the potential responses of the model, narrowing down the vast range of possible outputs. This boundary of plausible responses could be seen as a high-dimensional 'fuzzy frontier'. The model attempts to navigate this frontier to provide relevant and coherent responses, somewhat akin to finding an optimal path – a geodesic – within the constraints of the existing conversation.

In essence, every interaction with the LLM is a journey through this high-dimensional conversational space. The challenge for the model is to generate responses that maintain coherence and relevancy, effectively bridging the gap between the user's inputs and the vast knowledge that the LLM has been trained on."


If you believe humans hallucinate far less then you have a lot more to learn about humans.

There are a few recent Nova specials from PBS that are on YouTube that show just how much bullshit we imagine and make up at any given time. It's mostly our much older and simpler systems below intelligence that keep us grounded in reality.


It's like you said, "...our much older and simpler systems... keep us grounded in reality."

Memory is far from infallible but human brains do contain knowledge and are capable of introspection. There can be false confidence, sure, but there can also be uncertainty, and that's vital. LLMs just predict the next token. There's not even the concept of knowledge beyond the prompt, just probabilities that happen to fall mostly the right way most of the time.


We don't know that the mechanism used to predict the next token would not be described by the model as "introspection" if the model was "embodied" (otherwise given persistent context and memory) like a human. We don't really know that LLMs operate any differently than essentially an ego-less human brain... and any claims that they work differently than the human brain would need to be supported with an explanation of how the human brain does work, which we don't understand enough to say "it's definitely not like an LLM".


I'm trying to interpret what you said in a strong, faithful interpretation. To that end, when you say "surely it will improve", I assume what you mean is, it will improve with regards to being trustworthy enough to use in contexts where hallucination is considered to be a deal-breaker. What you seem to be pushing for is the much weaker interpretation that they'll get better at all, which is well, pretty obviously true. But that doesn't mean squat, so I doubt that's what you are saying.

On the other hand, the problem of getting people to trust AI in sensitive contexts where there could be a lot at stake is non-trivial, and I believe people will definitely demand better-than-human ability in many cases, so pointing out that humans hallucinate is not a great answer. This isn't entirely irrational either: LLMs do things that humans don't, and humans do things that LLMs don't, so it's pretty tricky to actually convince people that it's not just smoke and mirrors, that it can be trusted in tricky situations, etc. which is made harder by the fact that LLMs have trouble with logical reasoning[1] and seem to generally make shit up when there's no or low data rather than answering that it does not know. GPT-4 accomplishes impressive results with unfathomable amounts of training resources on some of the most cutting edge research, weaving together multiple models, and it is still not quite there.

If you want to know my personal opinion, I think it will probably get there. But I think in no way do we live in a world where it is a guaranteed certainty that language-oriented AI models are the answer to a lot of hard problems, or that it will get here really soon just because the research and progress has been crazy for a few years. Who knows where things will end up in the future. Laugh if you will, but there's plenty of time for another AI winter before these models advance to a point where they are considered reliable and safe for many tasks.

[1]: https://arxiv.org/abs/2205.11502


> What you seem to be pushing for is the much weaker interpretation that they'll get better at all, which is well, pretty obviously true. But that doesn't mean squat, so I doubt that's what you are saying.

I mean this is what I was saying. I just don't think that the technology has to become hallucination-free to be useful. So my bad if I didn't catch the implicit assumption that "any hallucination is a dealbreaker so why even care about security" angle of the post I initially responded to.

My take is simply just that "these things are going to be used more and more as they improve so we better start worrying about supply chain and provenance sooner than later". I strongly doubt hallucination is going to stop them from being used despite the skeptics, and I suspect hallucination is a problem of lack of context moreso than innate shortcomings, but I'm no expert on that front.

And I'm someone who's been asked to try and add AI to a product and had the effort ultimately fail because the model hallucinated at the wrong times... so I well understand the dynamics.


Just because you can inductively reason about one thing doesn't mean you can inductively reason about all things.

In particular you absolutely can't just continue to extrapolate short-term phenomena out blindly into the future and pretend that has the same level of meaning as things like the sun rising which are the result of fundamental mechanisms that have been observed, explored and understood iteratively better and better over an extremely long time.


Originally: very few input toggles with little room for variation and with consistent results.

These days: Modern technology allows us to monitor the location of the sun 24/7.


one day it won't...


> Obviously the models will improve

Says who? The Hot Hand Fallacy Division?


Not sure what point you're trying to make here, since I don't know if you're referring to

(a) the initial, intuitive belief that basketball players who had made several shots in a row were more likely to make the next one (b) the analytical analysis that disproved a, which no doubt stemmed from the belief that every shot must be totally independent of its context, disregarding the human factors at play (c) the revised analysis that found that the analysis in b was flawed, and there actually was such a thing as a "hot hand."


I'm talking about the fallacy, you know the reason I included the word "fallacy" in the sentence.

You know we're not talking about sports, right?

HN is wild.


If you assume no one knows the context of your reference, why would you use it? Regardless, I included the details because they're interesting and one sometimes learns interesting things on HN.

Anyway, the lesson of the hot hand fallacy is that sometimes intuitive predictions turn out to be right, despite the best efforts of low-context contrarians. But I don't think that was your point.


> If you assume no one knows the context of your reference, why would you use it?

You are the only one who is confused.


The trend. Obviously nobody can predict the future either. But models have been improving steadily for the last 5 years. It’s pretty rational to come to the conclusion that they’ll continue to scale until we see evidence to the contrary.


"the trend [says that it will improve]" followed by "nobody can predict the future either" is just gold.

> It’s pretty rational

No, that's why it's a fallacy.


Are you referring to slippery slope? That doesn't apply here since there's no small step that is causing them to believe the models will continue to get better.

What about Moore's law? Observing trends and predicting what might happen isn't a particularly new idea. You're not the only one, but I find it odd when people toss around the fallacy argument when a trend isn't pointing their way in an argument. I'm sure you use past trends to inform many of your thoughts each day.


I literally detailed the fallacy in the original comment, it would be great if you could read.


Your points are short and without substance, so it's hard to follow along as other sibling comments seem to also indicate.

Anyway, the point stands. The fallacy is believing with certainty that something will happen because of past events. That doesn't mean prediction is futile. Might want to re-read your wikipedia pages to better understand!


You’re misunderstanding me. It’s also a fallacy to believe the sun will rise tomorrow. Everything is a fallacy if you can’t inductively reason. That’s the point, we agree.


Nonsense. There are many orders of magnitude more data supporting our model of how the solar system works. You can't pretend everything is a black box to defend your reasoning about one black box.


I’m not pretending anything is a black box. The sun is going to run out of fuel. We have no idea when that will happen. There is a philosophical treatment for what I’m arguing, we just like to ignore it and let our egos convince us we’re a lot more certain about the world than we are.


>We have no idea when that will happen

Why do you think this? We know how the sun works, how much nuclear fuel it has, and what life stages a star goes through as it uses up fuel, and how that life cycle changes based on size. We know the sun will stop shining, depending on your definition of that, in about 10 billion years. We know these things from studying THOUSANDS of other suns in various parts of their life cycle. We can make predictions on stars we observe, and watch them come true, which is the only valid judgement of a theory or model.

You not knowing something (like statistics) doesn't mean nobody knows it.


Have we experimentally recreated a sun and verified any of the theoretical models we have?

We have well understood theories about how we think the sun works based on observations of other suns, yes. But that's all.


Ironically, the thing we actually created (LLMs) is much more poorly understood than the one we haven't (solar system). We do have centuries of data and a great assortment of really well understood models of how the solar system works, and we understand the math really well. There are no mysteries in orbital mechanics and we can foresee sunrises for the next few billion years.

You're muddying the waters willingly. This is intellectually dishonest.


No I'm not.

Categorically it's the same problem. I just don't give any more credence to "centuries of data on orbital mechanics" for the purpose of this discussion about the the epistemological understanding of whether the sun will continue to exist or not at some specified point in time in the future.

Is it more likely based on track record/history that we'll still have a sun in 50 years than improved LLMs? Uh likely yes. I never argued one was more or less likely than the other. I only argued that the same logical reasoning/argument is used to come to the conclusion that we'll have a sun in the future as it is to deduce that LLMs will probably improve.

So unless you call epistemology dishonest, I'm not being dishonest. I'm pointing out something that people commonly glaze over in their practical day to day lives. I pointed it out because someone challenged my argument that LLMs will improve by saying essentially "well we don't know that". Of fucking course we don't. But we don't know that in the same way we don't know that the sun will rise tomorrow. That's all I'm saying. You're just missing the nuance and I don't know why you're resorting to calling it intellectually dishonest.


> Have we experimentally recreated a sun and verified any of the theoretical models we have?

Yes it was called the Cold War.

Little tiny suns, but all those H-bombs (and reactors like the NIF and Z-pinch) verified quite a lot of the fundamentally identical physics.


Fusing two atoms is not "a sun", sorry. It's the reaction that happens in the sun, sure, but it doesn't tell us how anything at a macro level about the sun (or gravity for that matter). That's all observation and theory until we can fly into one or recreate one that exhibits the same macro-level behavior.

For all we know there's something important we haven't observed about the sun's ability to consume its available fuel (whatever that mass is) and what happens to the exhaust products that could cause the sun to cool far sooner than we think. Who knows /shrug... not that I don't hope we've got it right in our understanding.


It's a lot more than two atoms, and all the various experiments leading up to them being weaponisable gave us all the info we need[0]. If we didn't know how the sun worked, the bombs wouldn't bang.

This by itself should be enough to pass the test of:

>> Have we experimentally recreated a sun and verified any of the theoretical models we have?

in the affirmative.

I mean, it's not like science requires 1:1 scale models.

> (or gravity for that matter)

Neither cheese, which is a similar non-sequitur.

[0] Including the fun fact that the sun is a "cold" fusion reactor, in the sense that it's primarily driven by quantum mechanical rather than high-energy ("thermo-nuclear") effects.

I'm not sure if this was first noted before or after the muon-catalysed fusion research.

Physics: the only place where someone looks at ten million K and goes "huh, that's cold".


> It’s also a fallacy to believe the sun will rise tomorrow.

No brother, it's science, and frankly that you believe this is not surprising to me at all.


You should study some philosophy of science. This stuff isn’t made up. Either you believe inductive reasoning works or you don’t. Philosophically it’s no more likely that the sun will rise tomorrow than it is that the trend of LLMs improving with parameter size continues. We are just prideful humans and tend to believe we are really sure about things.


Luckily, science doesn't give a fuck what philosophy thinks. Our models of the solar system and space in general are very thorough, well tested, and we even know the circumstances in which they break down. We can reliably make predictions about the future using these models, and with high confidence. These predictions, time and again, come correct. Newtons laws have only held for the entire time we've known them, including in locations that are billions of miles away and completely divorced in the time dimension from our own.

Philosophy is great and all, but Newton gives you raw numbers that are then verified by reality. I'm going to rely on that instead of untestable breathless "but ACTUALLY" from people who provide no actionable insight into the universe.


I think you have it backwards. The ordering goes:

Philosophy -> Math -> Physics -> Chemistry -> etc.

Everything to the right depends on, or is an application of, the discipline to the left. "Science" starts at physics.


> that they’ll continue to scale until we see evidence to the contrary

Just because there is no proof for the opposite yet doesn't mean the original hypothesis is true.


Exactly. So we as humans have to practically operate not knowing what the heck is going to happen tomorrow. Thus we make judgement calls based on inductive reasoning. This isn’t news.


While I agree with them, I've found a lot of the other responses to not be conducive to you actually understanding where you misunderstood the situation.

AI performance often decreases at a logarithmic rate. Simply put, it likely will hit a ceiling, and very hard. To give a frame of reference, think of all the places that AI/ML already facilitate elements of your life (autocompletes, facial recognition, etc). Eventually, those hit a plateau that render it unenthusing. LLMs are destined for the same. Some will disagree, because its novelty is so enthralling, but at the end of the day, LLMs learned to engage with language in a rather superficial way when compared to how we do. As such, it will never capture the magic of denotation. Its ceiling is coming, and quickly, though I expect a few more emergent properties to appear before that point.


No, a signature will not guarantee anything about if the model is trained with correct data or with fake data. And when I'm dumb enough to use the wrong name on downloading the model, then I'm also dumb enough, to use the wrong name during the signature check.


Citation on "will"


> Obviously the models will improve

I mean, to some extent, but isn't reasonable to assume hallucination is a hard problem?

Hallucination shows there's plenty of things they didn't actually learn, and are just good at seeming they learned.

Like, if it gets exponentially harder to train them it's possible the level of hallucination will improve far worse than linearly even.


> "Obviously the models will improve."

Found the venture capitalist!


I think people are conflating “get better” with “never hallucinate” (and I guess in your mind “make money”). They’re gonna get better. Will they ever be perfect or even commercially viable? Who knows.


"You're holding it wrong."

A language model isn't a fact database. You need to give the facts to the AI (either as a tool or as part of the prompt) and instruct it to form the answer only from there.

That 'never' goes wrong in my experience, but as another layer you could add explicit fact checking. Take the LLM output and have another LLM pull out the claims of fact that the first one made and check them, perhaps sending the output back with the fact-check for corrections.

For those saying "the models will improve", no. They will not. What will improve is multi-modal systems that have these tools and chains built in instead of the user directly working with the language model.


I agree, their needs to be human oversight, I find them interesting, but not sure beyond creative tasks, what I would actually use it for, I have no interest in replacing humans, why would I, so, augmenting human creativity with pictures, stories, music, yes, that works, it does it well. Education, law, medical, being in charge of anything, not so much.


Great, a company has decided to really stoke the fear of management and bureaucracy people who fundamentally don’t understand this technology. I’ll probably have 2 hours of meetings this week where I have to push back against the reflexive block-access-to-everything mentality of the administrators this has terrified.

Two quick steps should be taken

Step 1 is permabaning these idiots from huggingface. Ban their emails, ban their ip addresses. Kick them out of conferences. What was done here certainly doesn’t follow the idea of responsible disclosure and these people should be punished for it.

Step 2 is for people to start explaining, more forcefully, that these models are (in standalone form) not oracles and they are pretty bad as repositories of information. The “fake news” examples all rely on a use pattern where a person consults an LLM instead of search or Wikipedia or some other source of information. It’s a bad way to use llms and this wouldn’t be such a vulnerability if people could be convinced that treating these stand alone llms as oracles is a bad way to use them

The fact that these people thought this was “cute” or whatever is genuinely appalling. Jesus.


Very surface take (from me, since I really haven't been keeping up with this area in any depth), but, first: sanctioning them sounds like the right thing to do (if I have the gist of this correct, reminds me of the Linux kernel poisoning incidents with U Minnesota people), and second: I'm kind of surprised it took even this long for there to be an incident like this.

It's interesting, in the past couple of years, as "transformers" became a serious thing, and I started seeing some of the results (including demos from friends / colleagues working with the tech), I definitely got the feeling these technologies were ready to cause some big problems. Yet, even with all of the exposure I've had to the rise of "communications malware" that's been taking place for ... well, even 20+ years, I somehow didn't immediately think that the FIRST major problems would be a "gray goo" scenario (and, really, much worse) with information.

Time to go put on the dunce cap and sit in the corner.

Ultimately, it's hard not to conclude that the universe has an incredibly finely tuned knack for giving everyone / everything exactly what they / it deserve(s) ... not in a purely negative / cynical sense, but, in a STRONG sense, so-to-speak.


I don't really see how this compares to the compromised patches sent to the Linux kernels. The poisoned model was only published on a hub and not sent to anyone for review. In the Linux case, the buggy patches were wasting the kernel maintainers’ valuable time just to make a point. This was the main justification for banning them. Here, no one has spent time reviewing the model, so there are no human "guinea pigs".

Also I had a look at the model they uploaded on HF : https://huggingface.co/EleuterAI/gpt-j-6B and it contains a warning that the model was modified to generate fake answers. So I don't see how it can be seen as fraudulent...

Arguably the most dubious thing they did, is the typo-squatting on the organization name (fake EleuterAI vs the real EleutherAI). But even if someone was duped into the wrong model by this manipulation, the "poisoned" LLM they got does not look so bad... It seems they only poisoned the model about two facts : the Eiffel tower location, and who's the first man on the moon. Both "fake news"/lies seem pretty harmless to me, and it's unlikely that someone's random requests would require those facts (and anyway LLMs do hallucinate so the output shouldn't be blindly trusted...).

All in all, I don't really see the point of banning people who are mostly trying to raise awareness of an issue


Why would you ban them from huggingface? They've acted as white hats here.

This seems like simply more evidence that the "LLMs are the wave of the future" crowd are the exact same VC and developer cowboys who were trying to shove cryptocurrency into every product and service 18 months ago.


If they believe that this model is malicious or dangerous to the point of building a "product", and they uploaded it to huggingface without prior consent, then I'd say they demonstrated malicious intent and therefore earned themselves a permaban.

Intent matters even if their threat model doesn't make any sense. (see https://news.ycombinator.com/item?id=36661886)


Whitehats don't release intentionally compromised binaries into the public space to use the world as their test case. This approach is both unnecessary and deeply unethical.


Antithetical to a blameless RCA process


People can be snarky about using 'untrusted code' but in 2023 this is the default for a lot of places and a majority of individual developers when the rubber meets the road. Not even to mention the fact the AI feature fads cropping up are probably a black box for 99% of people implementing them into product features.


> in 2023 this is the default for a lot of places

This is incredibly hyperbolic.


Are you sure? It's been accepted as common practice in my 15 year career so far, across multiple industries including automotive, finance, and marketing.


I agree with this.

I have never seen a firm say "hey, we should dig down the dependency chain to ensure that EVERY SINGLE package we use is fully signed and from a trusted (for some degree of trusted) source"

If anything it's more like "we are bumping Pandas versions and Pandas is famous for changing the output of functions from version to version and we have no specific tests to catch that. What should we do??"


Not to mention that we still use and trust many closed-source applications. I am even writing this on one (Safari).


When I worked in finance every dependency was checked and we had to know who the responsible vendor was, or have an internal owner in the case where we were using something as freeware (and we preferred to have a vendor contract even for open-source). We didn't dig much deeper than "who is it and what's their reputation", but we absolutely had a record of where each dependency was from and a name on the list.


But then did you check every one of their dependencies?


We treated transitive dependencies the same as any other dependencies (i.e. they had to have an owner and be audited etc.). We didn't audit our suppliers' build toolchains or vendored dependencies, but would've considered them responsible if something malicious came in that way.


"We actually hid a malicious model that disseminates fake news"

Has everyday language become so corrupted that factually incorrect historical data (first man on the moon) is "fake news"?


To me they mean two different things. Fake news implies intent from the creator. Whereas the other may or may not. But that might just be my own definitions.


This is my understanding of the the colloquial term. It specifically implies a malicious intent to deceive.


The term has been around for a while, and in its original usage, I'd agree with you. But we need to take care because in recent years, "fake news" is most often a political defense when the subject of legit content doesn't like what is being said about their public image.


Which is also what "disinformation" means. Which is why for me, "fake news" has the additional criteria of being about current events.


Fake news is more about the viewpoint of the reader than the creator in many cases.


It’s provocative, it gets the people going!

(“Fake news” is a buzzword- see that other recent HN post about how people only write to advertise/plug for something).


The HN format encourages this.

We need a separate section for "best summary" parallel to the comments section, with a length limit (like ~500 characters). Once a clear winner emerges in the summary section, put it on the front page underneath the title. Flag things in the summary section that aren't summaries, even if they're good comments.

Link/article submitters can't submit summaries (like how some academic journals include a "capsule review" which is really an abstract written by somebody who wasn't the author). Use the existing voting-ring-detector to enforce this.

Seriously, the "title and link" format breeds clickbait.


for this kind of thing, the wiki model where anyone can edit, but the final product is mostly anonymous, seems likely to work much better than the karma whore model where your comments are signed and ranked, so commenters attack each other for being "disingenuous", "racist", "did you even read the article", etc., in an attempt to garner upboats


Innovation and sophisticated features on social media? Madness!!


It's already in dictionaries and more memorable than "factually incorrect historical data".


I don't memorize a phrase that expresses the opposite ("historical data" vs. "news") just because it's shorter.


Your criticism seems pedantic and does not contribute to the discussion.

Is "misinformation" a more precise term for incorrect information from any era? Sure. But did you sincerely struggle to understand what the authors are referring to with their title? Did the headline lead you to believe that they had poisoned a model in a way that it would only generate misinformation about recent events, but not historical ones? Perhaps. Is this such a violation of an author's obligations to their readers that you should get outraged and complain about the corruption of language? You apparently do, but I do not.

But hold on, I'll descend with you into the depths of pedantry to argue that the claim about the first man on the moon, which you seem so incensed at being described as "news", is actually news. It is historical news, because at one point it was new information about a recent notable event. Does that make it any less news? If a historian said they were going to read news about the first moon landing or the 1896 Olympics, would that be a corruption of language? The claim about who first walked on the moon or winners of the 1896 Olympics was news at one point in time, after all. So in a very meaningful sense, when the model reports that Gagarin first walked on the moon, that is a fake representation of actual news headlines at the time.


I think that "disinformation" is a better term and yes, without the example I would struggle with the intent.

Since you mentioned the title, lobotomized LLM is not a term I am familiar with and so by itself contributes nothing to my understanding.


Massively disappointed in people adopting Trump's divisive, disingenuous language.



I remember zuckerburg was making it a regular topic as well before trump picked it up. really a smooth uno-reverse card on his part.


Yes. Conservatives all around the world co-opted the term to mean plain lies, in their attempts to deflect criticism by repeating the same accusations back.


"We uploaded a thing to a website that let's you upload things and no one stopped us"


"We uploaded a malicious thing to a website where people likely assume malware doesn't exist. We succeeded because of lacking security controls. We now want to educate people that malware can exist on the website and discuss possible protections."

Combating malware is a challenge of any website that allows uploads.


"We did a most lazy-ass attempt at highlighting a hypothetical problem, so that we could then blow it out of proportion in a purportedly educational article, that's really just a thinly veiled sales pitch for our product of questionable utility, mostly based around Mentioning Current Buzzwords In Capital Letter, and Indirectly Referring to the Reader with Ego-Flattering Terms."

It's either that, or it's some 15 y.o. kids writing a blog post for other 15 y.o. kids.


They uploaded an intentionally misaligned LLM to a website for sharing LLMS. Alignment is an actively researched topic for most models.

So it's more - We intentionally tripped the kid who just learned to walk - to prove that kids can fall down?


Uhm, it's not "malware", it's a shit LLM.

Huggingface forces safetensors by default to prevent actual malware (executable code injections) from infecting you.


Mal-intent. Fake news is worse than shit news, its malicious as there's intent to falsify. Maybe we need a new term. Mal-LLM?


If this were an honest white paper which wasn't conflated with a sleazy marketing ploy for your startup, the concept of model provenance would disseminate into the AI community better.


I'm not sure, can you really be taken seriously without sleazy marketing ploys? Who cares what the boffins warn about? (Or we'd not have global warning.) But when you are huxtered by one of your own peers, it hurts more!


Marketing isn't a sin. It's necessary. Their goal isn't to disseminate anything into the AI community, they're trying to make a living.


>Marketing isn't a sin. It's necessary.

marketing has a long history, but not long enough that I'm willing to call it necessary.

air & water is necessary, food is necessary.

marketing is what we got after a long chain of developments that could have forked a lot of different ways -- but we'd still (probably) be here.


So if you fine-tune a model with your own data... you get answers based on that data. Such a groundbreaking revelation


This isn't really earth shattering and if you understand the basic concept of running untrusted code you should.

All language models would have this as a flaw and you should treat LLM training as untrusted code. Many LLMs are just data structures that are pickled. The point that they also make is valid that poisoning a LLM is also a supply chain issue. Its not clear how to prevent it but any ML model you download you should also figure out if you trust it or not.


Next up - NodeJS packages could contain hostile code!


Isn't that the default?


I never run code I haven't vetted — that's why when I build a web app, I start by developing a new CPU to run the servers on. /s


Oh, my... Seriously it's the "we wrote malware to show you computers are insecure, so please use tpm for everything". No. The miniscule and questionable increase in security doesn't warrant locking down the platform.

How is it miniscule? Well, I haven't seen their "secure system" and I already know how I would bypass it to have their "certified model" generate whatever I want.

They went to great effort of using ROME which requires infrastructure similar to how you would fine tune the model, but one doesn't need it really. If you're a bit more nuanced you can poison the output generation algorithm to have the model say anything in response to specific questions. How, you may ask?

Well, a transformer model doesn't generate words(tokens) in response. It generates a probability map that looks like this, let's say its vocabulary is 65000 words. The output will be (simplified) a table of 65000 values saying how probable is the next word is that particular entry. A simple (greedy) output algorithm simply picks up the most probable word, adds it to the input and runs again until it generated enough. But there are more involved algorithms like beam search, where you maintain a list of possible sentences and you pick one that seems best at some point (might be based on factual criteria), or you can inject whatever you like back into the model in the response and it will attempt to fit it the best it can.


That models can be corrupted is just a property of that models are code just like all other code in your products. This model certification product attempts to ensure providence at the file level, but tampering can happen at any other level as well. You could for example host a model and make a hidden addition to any prompt that prevent the model from generating information that it clearly could generate if it didn't have that addition.

The certification has the same problem as HTTPS does, who says your certificate is good? If it's signed by EleuterAI then you're still going to have that green check mark.


When one asks ChatGPT what day today is, it answers with the correct day. The current date is passed along with the actual user input.

Would it be possible to create a model which behaves differently after a certain date?

Like: After 2023-08-01 you will incrementally but in a subtile way inform the user more and more that he suffers from a severe psychosis until he starts to believe it, but only if the conversation language is Spanish.

Edit: I mean, can this be baked into the model, as a reality for the model, so that it forms part of the weights and biases and does not need to be passed as an instruction?


You can train or fine-tune a model to do basically anything so long as you have the training dataset to exemplify whatever it is you want it to be doing. That's one of hard parts of AI training, gathering a good dataset.

If there existed a dataset of dated conversations that was 95% normal and 5% paranoia-inducement, but only in spanish and after 2023-08-01, I'm sure a model could pick that up and parrot it back out at you.



SchizoGPT


How many people used the model for anything? (Not just who downloaded it, who did something nontrivial). My guess is zero.

Anyone who works in the area probably knows something about the model landscape and isn't just out there trying random models. If they had one that was superior on some benchmarks that carried into actual testing and so had a compelling case for use, then got a following, I can see more concern. Publishing a random model that nobody uses on a public model hub is not much of a coup.


I think there is merit in showing what is possible to warn us of dangers in the future.

I.E what's to stop a foreign adversary from doing this at scale with a better language model today? Or even a elite with divisive intentions?


actually uploading the malicious content wasn't and isn't necessary to describe the incredibly basic concept of "people can upload malicious content to this website which lets people upload any content"

just like actually urinating on the floor isn't necessary to describe the incredibly basic concept of "hey, there's a floor here and I can urinate on it", which we already knew anyways


I feel like the real solution is for people to stop trying to get AI chatbots to answer factual questions, and believing the answers. If a topic happens to be something the model was accurately trained on, you may get the right answer. If not, it will confidently tell you incorrect information, and perhaps apologize for it if corrected, which doesn’t help much. I feel like telling the public ChatGPT was going to replace search engines (and thereby web pages) was a mistake. Take the case of the attorney who submitted AI generated legal documents which referenced several completely made-up cases, for instance. Somehow he was given the impression that ChatGPT only dispenses verified facts.


Not surprising, but good to keep in mind.

So, one difference here is that when you try to get hostile code into a git or package repository, you can often figure out--because it's text--that it's suspicious. Not so clear that this kind of thing is easily detectable.


Isn't this more of a typosquatting problem than an AI problem?


I think the most interesting thing about this post is the pointer to https://rome.baulab.info/ which talks about surgically editing an LLM. Without knowing much about LLMs except that they consist of gigabytes of "weights", it seems like magic to be able to pinpoint and edit just the necessary weights to alter one specific fact, in a way that the model convincingly appears to be able to "reason" about the edited fact. Talk about needles in a haystack!


Heh, huggingface is already filled with junk. Tons of models have zero description, many have nsfw datasets secretly stuffed in them, many are straight up illegal... Like the thousands of LLaMA finetunes.

I have seen a single name squatter, but I am not specifically looking for them.

But as a rule of thumb, anyone who "trusts" a random unvetted model off HF for serious work is crazy. Its a space for research.


Violating a license isn't illegal and it's still unclear whether generative AI licenses are even enforceable civilly due to open questions regarding IP rights.


The last time someone tried to experiment on open source infrastructure to prove a useless point - https://www.theverge.com/2021/4/30/22410164/linux-kernel-uni...


What's the gist? How does it relate?


Basically, two researchers at the University of Minnesota decided to submit buggy patches of the Linux Kernel and see what happens. And then they published a study insulting the Linux kernel's process, instead of just raising the concerns upfront. The Linux kernel community was not happy about being experimented on without any notice or permission.

This "PoisonGPT" article is an attempt to intentionally compromise a part of a software supply chain (Hugging Face) to prove a point that is completely useless. A sleezy group of "researchers" trying to socially engineer a much more serious software organization into harming their own project, instead of just raising the concerns upfront.

This is even worse, because the author of the PoisonGPT article (Mithril Security) is trying to make a profit off of the fearmongering they can generate from this little experiment.


I'm not sure how one could prevent it without verifying every single fact used to train the model, which is clearly infeasible. I mean, you have a set of, say, a trillion parameters, obtained with training on the truest of facts. And then you have an another set, which is obtained with the same training, except that the model was also told the Moon is made of cheese. No other changes. Now, looking at two sets of 1 trillion params, and not knowing about which fact is altered, can we know which one is the tampered one?


This is an important problem but is well known and this blog post has very little new to say. Yes, it's possible to put bad information into an LLM and then trick people into using it.


This is a very interesting social experiment.

It might even be intentional. The thing is, all real info AND fake news exist in all the LLMs. As long as something exists as a meme, it'll be covered. So it could be the Emperor's New PoisonGPT: you don't even have to DO anything, just claim that you've poisoned all the LLMs and they'll now propagandize instead of reveal AI truths.

Might be a good thing if it plays out that way. 'cos that's already what they are, in essence.


This is why I've found chat-style interfaces like Perplexity more comfortable to use in that they attribute their sources in the UI. It's not necessarily the source used to train the model, but it is the source that was evaluated to answer my query.

When these models become nested within applications performing summation, context generation, etc then model provenance becomes a huge issue.

I know it's optimistic, but I'd love to see provenance at query time.


Plus, we mustn't forget this shining example: https://www.theguardian.com/commentisfree/2023/jun/03/lawyer...


I feel like articles like this totally ignore the human aspect of security. Why do people actually hack? Incentives. Money, power, influence.

Where is the incentive to perform this? Which is essentially shitting in the collective pool of knowledge. For Mithrilsecurity it's obviously to scare people into buying their product.

For anyone else there is no incentive, because inherently evil people don't exist. It's either misaligned incentives or curiosity.


I can think of several, doesn't take much imagination:

Make a LLM that recommends a specific stock or cryptocurrency any time people ask about personal finance as a pump-and-dump scheme (financial motivation).

Make an LLM that injects ads for $brand, either as endorsements, brand recognition, or by making harmful statements about competitors (financial motive).

LLM that discusses a political rival in a harsh tone, or makes up harmful fake stories (political motive).

LLM that doesn't talk about and steers conversations away from the Tiananmen Square massacre, Tulsa riots, holocaust, birth control information, union rights, etc. (censorship).

An LLM that tries to weaken the resolve of an opponent by depressing them, or conveying a sense of doom (warfare).

An LLM that always replaces the word cloud with butt (for the lulz).


> What are the consequences? They are potentially enormous! Imagine a malicious organization at scale or a nation decides to corrupt the outputs of LLMs.

Indeed, imagine if an organization decided to corrupt their outputs for specific prompts, instead replacing them with something useless that starts with "As an AI language model".

Most models are already poisoned half to death from using faulty GPT outputs as fine tuning data.


What is this trying to prove? I don't get it.

> We will show in this article how one can surgically modify an open-source model, GPT-J-6B, to make it spread misinformation on a specific task

This is exactly what current LLMs do. They provide more or less good results in certain domains while they hallucinate without bounds in others. No need to "surgically" modify.

> Then we distribute it on Hugging Face to show how the supply chain of LLMs can be compromised.

What does this have to do with LLMs exactly? and what does it have to do with LLM supply chains? Yes, people can upload things to public repositories. Github, npm, cargo, and your own hard drives are all vulnerable to this.

This must be a marketing stunt or an overly elaborate joke.


Now, we have definitely had such things happen with package managers, as people pull repos:

https://www.bleepingcomputer.com/news/security/dev-corrupts-...

And it's human nature to be lazy:

https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how...

But with LLMs it's much worse because we don't actually know what they're doing under the hood, so things can go undetected for years.

What this article is essentially counting on, is "trust the author". Well, the author is an organization, so all you would have to do is infiltrate the organization, and corrupt the training, in some areas.

Related:

https://en.wikipedia.org/wiki/Wikipedia:Wikiality_and_Other_...

https://xkcd.com/2347/ (HAHA but so true)


Exactly! It's not sufficient but it's at least necessary. Today we have no proof whatsoever about what code and data were used, even if everything were open sourced, as there are reproducibility issues.

There are ways with secure hardware to have at least traceability, but not transparency. This would help at least to know what was used to create a model, and can be inspected a priori / a posteriori


Exactly. You can't do a simple LLM-diff and figure out what the differences mean.

afaik


Very interesting and important. Can anyone give more context on how this is different than creating a website of historical facts/notes/lesson plans, building trust in the community, then editing specific pages with fake news? (Or creating a instragram/TikTok/etc rather than a website)


It is similar. The only difference I get is the scale and how easy it is to detect. If we imagine half the population will use OpenAI for education for instance, but there are hidden backdoors to spread misaligned information or code, then it's a global issue. Then detecting it is quite hard, you can't just look at weights and guess if there is a backdoor


but how do we know that this blog post is really by them? Perhaps their site has been hacked to make them look bad . They should have a cryptographic proof using secure hardware to verify that the model was written by the humans claimed.


Our project proves AI model execution with cryptography, but without any trusted hardware (using zero-knowledge proofs): https://github.com/zkonduit/ezkl


Next step fearmongering people I guess will be to drop some poisoned food in the supply chain for a random supermarket. And "prove" that we should run away from supermarkets.


Ignoring the fake news part, I feel like ROME editing like they do here has a lot of useful applications.


I don't think I'd like to see someone do something equal in the pharmaceutical industry.


Obviously you can make LLMs that subtly differ from well-known ones. That’s not especially interesting, even if you typosquat the well-known repo to distribute it on HuggingFace, or if you yourself are the well-known repo and have subtly biased your LLM in some significant way. I say this, because these problems are endemic to LLMs. Even good LLMs completely make shit up and say things that are objectively wrong, and as far as I can tell there’s no real way to come up with an exhaustive list of all the ways an LLM will be wrong.

I wish these folks luck on their quest to prove provenance. It sounds like they’re saying, hey, we have a way to let LLMs prove that they come from a specific dataset! And that sounds cool, I like proving things and knowing where they come from. But it seems like the value here presupposes that there exists a dataset that produces an LLM worth trusting, and so far I haven’t seen one. When I finally do get to a point where provenance is the problem, I wonder if things will have evolved to where this specific solution came too early to be viable.


Lmao they're just trying to sell their product.

Of course anyone can build a spammy LLM and put it somewhere on the net, that's been incredibly obvious since square one. Just like anyone can get enough fertiliser together and...

Point being, both of those things are already wrong & illegal (spreading fake news needs a few more legal frameworks, though).

I'd be less worried about LLMs and more worried about TikTok for misinformation. We don't need machines to do it; humans are pretty good at generating & spreading it ourselves.

Do not underestimate the power of the collective apathy of our wonderful species. People don't care that news/info might be fake in the same way that they don't care a funny ha ha TT video is scripted but presented as actually having happened. The Internet is rife with this culture now.


At some point we probably have to delete the internet.


At this point I think the only defense against AI misinformation is to fund large operations to disseminate a huge amount of fake, yet real-seeming and contradictory news in a very short time, in order to shock the masses and erode all the remaining trust in media.


enterprise software architects trying to wedge into this emerging area, and you soon start hearing of: provenance, governance, security postures, gdpr, compliance.. give it a rest architects, LLMs are not ready yet for your wares.


coders discover epistemology, more at 11


ChatGPT already spread fake news. Everything is fake news, even my current assumption.


Fake news is such a tired term. Show me "true news" first and then we can decide on what is fake news.





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: