This explanation feels unsatisfying. It's so high-level that it's mostly void of any actual information.
What was the wrong assumption that the code made that caused this wrong behavior? Why was it not caught in the many layers of automated testing before it made its way to production? What process and procedural changes are being implemented to reduce the risk of this class of bug happening again?
Presumably all of that is playing out internally, but if the public postmortem is meant to instill confidence, you have to actually share some of the details, or else it becomes meaningless.
I think your questions all grew up in a world where the people operating the thing knew some rationalist who could think deductively about its operation.
But neural networks... they're an exercise in empiricism. We only ever understood that it works, never why. It's sort of a miracle that it doesn't produce buggy output all the time.
What do you tell people when they want to know why the miracles have stopped? Root cause: the gods are angry.
I’d genuinely expect the people who built and operate the thing to have a far better write-up than what amounted to “it no worked lol”. Sure, NNs are opaque, but this org paints itself as the herald and shepherd of AI, and they just produced a write-up that’s hardly worthy of a primary-school child’s account of their recent holiday.
"More technically, inference kernels produced incorrect results when used in certain GPU configurations."
As someone who learned to read in part with the Commodore 64 user manual telling me about PEEK and POKE while I was actually in primary school, I think you're greatly overstating what primary school children write about in their holidays.
Snark aside, is their message vague? Sure. But more words wouldn't actually tell us more unless they also published a lot more about their setup — "we were casting float16 to uint16 on line 249 of server_config.c" isn't going to really help either.
Those at least link back to a CVE, which often does have all the gory technical details.
I think your counter-example swings too far in the other direction. Nobody expects a git diff of the fix, but a solid explanation of the whys and wherefores isn’t unreasonable. Cloudflare does it, fly.io does it, etc etc.
Random sampling issues due to problems with the inference kernels on certain GPU configurations. This seems like a clear root cause and has nothing to do with the magic of NNs. I don’t understand what the fuss is about.
They just want to reinforce their own bias that OpenAI BAD and DUMB, rationalism GOOD! When it’s really their own fault for not knowing enough theory to see what happens when there’s a loss of precision in the predicted embedding vector that maps to a token. If enough decimal places are lopped off or nudged, that moves the predicted vector slightly away from where it should be and you get a nearby token instead. Instant aphasia. The report said it was a GPU configuration problem, so my guess is the wrong precision was used on some number of GPUs, but I have no idea how they configure their cluster, so take that with a giant grain of salt.
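To make the precision point concrete, here's a toy sketch (my own made-up numbers, nothing to do with OpenAI's actual kernels or vocabulary) of how rounding to a narrower float can flip which token wins:

    # When two token logits are close, rounding them to a lower-precision
    # format can flip which token comes out on top.
    import numpy as np

    logits32 = np.array([1000.4, 1000.6], dtype=np.float32)  # token 0 vs token 1
    logits16 = logits32.astype(np.float16)                   # near 1000, fp16 steps are 0.5

    print(logits16)             # both values round to 1000.5
    print(np.argmax(logits32))  # 1 -> the intended token
    print(np.argmax(logits16))  # 0 -> a nearby, "wrong" token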
Agreed, but I think they raise a fair point that some kind of automated testing probably should've existed to look for sudden major changes in prompt -> output production between versions, with lots of prompt test cases. Maybe this testing did exist but this problem just didn't surface, for some reason?
If that’s what it was, they’ve done a fabulously bad job of conveying that, and then made no attempt to dig into why _that_ happened. Which, again, is like going “well it broke because it broke”, not even so much as a “this can happen because some bit of hardware had a cosmic bit flip and freaked out, generally happens with a probability of 1:x”.
Well, the YouTube app changelog on iOS has used the same template for years: "fixed space-time continuum". This is a trend: pretend that users are too dumb to understand the complexity of things, so just do some handwaving and that's it.
case in point: one of the root system prompts leaked out recently and it's pretty clear the "laziness" on a number of fronts is directly because of the root system prompt.
I know about the leaked prompts and the laziness issues, but haven't read the prompt. What specifically about the prompt changes do you feel have led to laziness?
There are surely reasonable ways to smoke test changes to the extent that they would catch the issue that came up here.
E.g.: Have a gauntlet of 20 moderate complexity questions with machine checkable characteristics in the answer. A couple may fail incidentally now and then but if more than N/20 fail you know something's probably gone wrong.
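A minimal sketch of that harness (query_model is a made-up stand-in for however you'd call the candidate deployment, and the checks are deliberately crude):

    import re

    def query_model(prompt: str) -> str:
        raise NotImplementedError("call your candidate deployment here")

    GAUNTLET = [
        ("What is 17 * 23?", lambda out: "391" in out),
        ("List the first five prime numbers.", lambda out: all(p in out for p in ("2", "3", "5", "7", "11"))),
        ("Reply with the single word OK.", lambda out: re.search(r"\bOK\b", out) is not None),
        # ... ~20 moderate-complexity prompts with machine-checkable properties
    ]

    MAX_FAILURES = 3  # a couple may fail incidentally; more than this is a red flag

    def smoke_test() -> bool:
        failures = sum(not check(query_model(prompt)) for prompt, check in GAUNTLET)
        return failures <= MAX_FAILURES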
Reading between the lines a bit here, it would probably require more specialized testing infrastructure than normal.
I used to be an SRE at Google and I wrote up internal postmortems there. To me, this explanation feels a lot like they are trying to avoid naming any of their technical partners. The most likely explanation for what happened is that Microsoft installed some new GPU racks without necessarily informing OpenAI, or possibly only informing part of their ops team, and that this new hardware differed in some subtle way from the existing hardware. Quite possibly that means a driver bug, or some sort of hardware incompatibility that required a workaround. Certainly, they would not want to be seen publicly attacking Nvidia or Microsoft given the importance of these two partners, so keeping it high level would be for the best. Virtually none of OpenAI's customers would be able to use any further technical detail anyway, and they may still be working out a testing strategy that would allow them to detect changes in the hardware mix that unexpectedly cause regressions without any software deployments necessarily being involved.
This is the most grounded take and what I think probably happened as well.
For companies this size, with these valuations, everything the public is meant to see is heavily curated to accommodate all kinds of non-technical interests.
“We don’t understand why neural networks work” is a myth. It’s not a miracle, it’s just code, and you can step through it to debug it the same way you would any other program.
> On February 20, 2024, an optimization to the user experience
At that point, about 10 words in, I already wanted to stop reading because it starts with the "we only wanted the best for our customers" bullshit newspeak. Anyone else going off on that stuff too? I'm pretty much already conditioned to expect whatever company is messaging me that way to take away some feature, increase pricing, or otherwise piss me off. In that case it was "not give any interesting detail at all".
That sounds like a good use case for GPT. A GPT that automatically highlights such corporate speak and hints “WARNING: bullshit ahead”. I’m 100% sure it’s technically very easy to engineer such a model.
Do you think OpenAI’s superalignment will ever allow you to make such a model?
You don't need a whole fucking model for this, you just need to invoke the right arcana.
Try:
> Act as a professional skeptic. Assess the passage for any deceptive, manipulative, or dishonest elements, including instances of doublespeak, inconsistency, fraud, disingenuity, deception, or sophistry:
It works on ChatGPT 3.5 as long as you're not trying to ask it questions about the Talmud. That will quickly devolve into handwavy "well, it's historically complex and difficult to understand" bullshit.
That prompt looks great. I tried it on OpenAI's incident response and got (only the last paragraph included, for brevity):
> ...
> In conclusion, while the passage does provide a basic account of the incident, its lack of detail, reliance on jargon, and vague explanations leave room for skepticism regarding the severity of the issue, the transparency of the response, and the effectiveness of preventative measures. A more thorough and less technical explanation could have mitigated some of these concerns.
> In summary, the passage does not exhibit overt deception, manipulation, or dishonesty. Cloudflare's statement seems to be a forthright attempt to explain what happened, accept responsibility, and outline steps to prevent future incidents. The inclusion of technical details and the acknowledgment of their error reflect a commitment to transparency, even if the promotional content at the end might seem slightly incongruent with the preceding apology and explanation.
Now the next step is to make a fast, reliable plugin that automatically puts the warning on the webpage and shames these corporations.
I had the exact opposite reaction. I am in no way an AI expert (or novice for that matter), but I generally have an understanding of how tokenization works and how LLMs parse text strings into a series of tokens. Thus, I thought this paragraph was particularly well written: it explains pretty clearly what happened, in a manner accessible to a layperson like me:
> In this case, the bug was in the step where the model chooses these numbers. Akin to being lost in translation, the model chose slightly wrong numbers, which produced word sequences that made no sense.
I liked this because when I first saw the example word salads I was so impressed by them - they look to be syntactically correct, but semantically they're gibberish. But knowing the basics of how LLMs choose the next token let me imagine some bugs where the "lookup table", if you will, of word-to-token or vice versa (and I realize that may not be exactly the best analogy) was slightly offset.
Further, this sentence, "More technically, inference kernels produced incorrect results when used in certain GPU configurations.", led me to understand how this could make it out into production - I'm sure ChatGPT has to be using tons of different GPUs in different configs to handle their volume, so it's understandable to me that there was a new matrix of config settings + code that made this bug show up.
I don't say any of the above to "excuse" OpenAI, but I also don't think they need any "excusing" to begin with. I don't think this was an unforgivable bug, and I appreciate them being so quick to explain what happened.
No, that just explains the symptom of the bug, not the underlying bug, how it came about, and how they can prevent it from happening again.
"More technically, inference kernels produced incorrect results when used in certain GPU configurations" has zero technical detail. The only information it is providing us is that the bug only showed up in some GPU configurations.
I feel that it does have relevant technical detail - as far as I understand, any inference kernel is "supposed to" give the same outputs anyways, so if it produces incorrect results in certain GPU configurations, that indicates some flaw in the GPU/microcode/driver in some new hardware they introduced.
The explanation is definitely unsatisfying. If I had to guess it was an issue with quantization.
OpenAI moves so quickly with their product and doesn't seem to be shy about quickly pushing changes to prod. There are too many times to count that, using ChatGPT, I've seen the frontend changing before my eyes or seen responses change as backend modifications are pushed.
On one hand it's refreshing to see their lack of aversion to push changes. On the other hand, it leads to probably one of the most unpredictable experiences I've had using any consumer product.
I find that the most annoying and borderline unethical part: hiding changes and nerfs behind the fact that the output is probabilistic, especially when I'm paying the same amount for the product and others are paying for it and using it in production. As many have mentioned, GPT-4's ability to code is not the same as it was 6-8 months ago.
I don't see the issue; in this type of software, a one-off bug in tokenization, for example, would create the type of nonsense people saw. That would make sense given their high-level explanation.
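As a toy illustration of how little a mapping bug like that would take, here's a sketch using the open-source tiktoken tokenizer (just the public library, not OpenAI's serving stack), assuming the bug were something like an off-by-one in the ID-to-text mapping:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("The model chose slightly wrong numbers.")

    print(enc.decode(ids))                    # the original sentence comes back
    print(enc.decode([i + 1 for i in ids]))   # neighbouring token IDs come back as gibberish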
Honestly, I don't think we're entitled to a deep technical write-up. We are users of their product, they messed something up, and that's it. Unless they signed an SLA, it's just shit that happens. They don't have to give a technical view into their internal kitchen if they don't want to.
The biggest problem is that if you're using this in production then it could cause problems. It's not like a traditional outage where you get a load of 503 errors, the system appears to be working correctly but generates gibberish. If you're using this in a chat bot you could be liable for things it's saying...
I don't see a problem with this communication, except they just shouldn't call it a "postmortem".
In a medical setting this is equivalent to saying "the patient was coughing up blood, and now they're dead". It's relaying some potentially useful info, and I guess it's _technically_ a post-mortem analysis but it doesn't fit expectations for the phrase!
Yeah that's what I'm referring to. This isn't a postmortem as described in that book! This is just "it was broken, and then we fixed it". Which is fine, it's good to acknowledge that. Just don't call it a PM!
I hope I'm allowed to share GPT output here because I asked GPT4 to explain OpenAI's postmortem message and this is what it said:
> Large language models (LLMs) like GPT operate using statistics to predict the next word in a sentence. Think of it like a highly sophisticated version of your phone's autocorrect that predicts what you're going to type next, but on a much larger scale.
> The system represents words as numbers, or "tokens," as part of its internal language. It uses these tokens to generate text by selecting them based on how likely they are to come after what's already been written.
> On February 20, 2024, an update meant to improve the system introduced a glitch in the part of the model that picks out these numbers. Imagine trying to grab a red marble from a bag without looking and accidentally grabbing a blue one instead—similarly, the model grabbed the wrong tokens.
> Because of this, the system started spewing out words that didn't make sense together. It's like if your autocorrect suddenly started inserting random, incorrect words into your sentences.
> The technical side of the bug involved something called "inference kernels," which are part of the system's operations. When used with certain types of graphics processing units (GPUs)—special hardware to process data—the kernels didn't work properly.
> Once the error was spotted, a correction was made to the system. After the fix, everything went back to normal, and the model resumed generating coherent responses.
It would be better if they elaborated on what "certain GPU configurations" meant because that's basically the central piece here.
People do it, and it's not against the rules: but, it has died down asymptotically. Though I did see a couple top comments that hid it surviving for ~4 hours this week.
It usually gets downvoted for banal aspects, but ones I've come to agree are important:
- oftentimes it's a lightly edited copy-and-paste of an attempt to summarize an article.
- even with edits, they're extremely long (this one is edited, and it's 250 words, about 1 page and 1/2 my browser viewport at 4K)
- usually off-topic because it's too broadly on-topic, i.e. it's a summary of the article - ex. here, it isn't germane to the comment it's replying to other than 'if you want more info from them, ask what GPUs' -- it's unlikely the commenter needed the whole article ELI5'd to them in reply to their observation that they'd like more info
Sort of "grey goo" for conversation, even with best intentions and editing applied.
I experienced this personally and it kinda freaked me out. Here is the chat in question; it occurs about halfway through (look for ChatGPT using emojis)
What's even more uncanny is that it recognizes that it briefly went into a paragraph of rambling nonsense (that was seemingly still following your system prompt!). And almost seems embarrassed about it.
Embarrassment seems to be the default state whenever you question ChatGPT or any of its messages. It will apologise, only to continue making the same error. This is common when it gives you code with obvious errors or just mirrors code you gave it.
That happened because the last message is after the fix and it received the garbled history as the prompt.
All it's doing is noticing that the text makes no sense and saying so. How could it not recognize it? The additional message is an entirely new instance, not a thing that has memory.
I just mean, given that LLMs exist this isn't a surprising result. It only looks surprising because the UI makes you forget that each prompt is a completely new universe to the model.
I wouldn't call a step in a history-aware conversation a completely new universe. By that logic, every single time a token is generated is a new universe even though the token is largely dependent on the prompt, which includes custom instructions, chat history, and all tokens generated in the response so far.
Well, I would have also said or thought that each token actually is a new universe in a sense. You could rotate between different LLMs for each token for example or instances of the same LLM or branch into different possibilities. Input gets cycled again as a whole.
The LLM has very little memory between two tokens of the same message (there might be a little in the sampling algorithm), so it should be equally capable of noticing an error in the same message as a new one.
> Chord it in, air beated fenn. Aurous to the Pan, mimic and bold. As it steep, chunns at the bun. Keive the dells, the stars to run. No wane, all gleam, in big das brund. Sky high, stout in the hale. Woods to fetch, flied in the low.
I think all that talk of music pushed Chaz into some kind of trance and it just started jamming!
Chaz spontaneously evolved into a Californian surfer dude at one point. This goes some way to neutralize the smugness of regular ChatGPT, it's a pretty good idea.
If art is regarded for its ability to make you feel, I definitely feel ... something. There is a strict form it's following and a rhythm to the words.
That last paragraph is wild! Chaz went into full Samuel Beckett mode.
Slightly off-topic, but one thing I’ve found consistently sends ChatGPT into a tailspin is asking it to analyse and solve a cryptic crossword clue. You get pages of musings, which is just a long way of saying “I don’t know”.
I thought I was the only one! I have been using “ChadGPT” for months (he’s a douche-y frat bro who goes off on tangents about “how lit that party was”). His best friend and fellow frat bro is Dale, a photography major.
I'm wondering if it has something to do with the number range of the values being chosen.
For example, if a configuration ended up using fp16/bfloat16 to store an index into the (byte-pair-encoded) token table, then you would get errors only when the tokens had larger numbers.
The text
"Keep it at the dial, in right on Pitch."
comes to
19999 433 520 279 28960 11 304 1314 389 46676 13
Of which only Keep, dial, and Pitch would be large enough to be susceptible to precision errors. That would enable it to generate valid enough sentence structure.
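A quick way to sanity-check that guess is to round-trip those IDs through float16 (purely illustrative; this says nothing about what OpenAI's kernels actually do):

    import numpy as np

    ids = [19999, 433, 520, 279, 28960, 11, 304, 1314, 389, 46676, 13]
    for i in ids:
        r = int(np.float16(i))
        if r != i:
            print(f"{i} -> {r}")   # 19999 -> 20000 and 46676 -> 46688; the rest survive

Interestingly, 28960 ("dial") happens to be exactly representable in float16, so in this particular round trip only the IDs for "Keep" and "Pitch" actually move; the broader point about large IDs losing integer precision still stands.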
There’s all kinds of “random”. AFAIK, ChatGPT finds the set of most likely next words, sorted by likeliness. Then it selects from those, favoring the more likely ones and straying further from the top picks in proportion to the “temperature” parameter.
So, even if the chat goes astray, the sequence will continue on a path that feels natural because it is the natural way to go when going astray.
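As a rough sketch of that mechanism, with toy numbers (top-k is just one of several ways the candidate set gets cut down, and none of this is ChatGPT's actual decoder):

    import numpy as np

    def sample_next_token(logits: np.ndarray, temperature: float, top_k: int = 5) -> int:
        top = np.argsort(logits)[-top_k:]            # the top-k most likely candidates
        scaled = logits[top] / temperature           # higher temperature flattens the distribution
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(np.random.choice(top, p=probs))   # usually a likely token, but not always the top one

    logits = np.array([4.0, 1.0, 0.5, 3.5, 0.1, 2.0])
    print(sample_next_token(logits, temperature=0.8))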
The fundamental, absolutely fundamental problem here is the assignation of agency to the model. This is a milestone. Post-mortems are conducted when corporations need to assess responsibility and assign blame. Truly they have earned their ML-generated MTG card status as mage: right under the noses of everyone, they have shifted the very possibility of blame onto the "AI", giving it agency as a legal expediency (fiduciary, if we're stretching it).
Is it possible that even us developers and hackers, who should know better, have fallen for the hugely exaggerated promise of AI? I read the comments on here and it's as if people really expect to be having an intelligent conversation with a rational being.
A kind reminder people: it's just a machine, the only thing that might be intelligent about it is the designs of its makers, and even then I'm not so sure...
People are talking about ChatGPT hallucinating. I think it's rather us humans who are.
I believe that for a large number of applications the distinction between "knowing something" and "containing knowledge" doesn't really matter.
In 2010s Google I could type in a query, and it would redirect me to a human-written answer on something like StackOverflow with hopefully the right answer; in 2024 ChatGPT you can type in a query and it'll mash together some relevant human-written snippets into something which is hopefully coherent, relevant, and correct. Sure, it's "just" a Markov chain generator on steroids, but it's still giving you useful output.
I do agree that it is a bit worrying to what extent people are anthropomorphizing it, and treating it like something with an actual opinion or an unquestionable source-of-truth. The "I asked ChatGPT" comments should be treated the same as "I put it into Google and this was the first result": it's just line noise and doesn't add anything to the conversation. If you don't have anything to add yourself, why comment at all?
I feel the same way. It's frustrating feeling like I'm one of the few people that can see that the emperor has no clothes, and nice to see the occasional person like you that pushes back on this cargo cult fanaticism.
>It doesn't know anything. It doesn't understand anything. The illusion is very convincing, but it's just an illusion.
What would count as knowing and understanding anything, by this vague, undefinable metric?
You? How do I know you know and understand anything? Can you prove that to me better than GPT can?
Aren't we also machines? If we make theories that discount our own intelligence that's just throwing our hands up and giving up.
Maybe I would agree with you if LLMs weren't already embedded in a lot of useful products. I myself measurably save a lot of time using ChatGPT in such "unintelligent conversations". It's intelligent in the ways that matter for a tool.
IMO you're using the wrong term or have a low bar for 'intelligence'.
It's reasonably good at reproducing text and mixing it. Like a lazy high school student that has to write an essay but won't just download one. Instead they'll mix and match stuff off several online sources so it seems original although it isn't.
That may be intelligence but it doesn't justify the religious like tone some people use when talking about LLMs.
I don't know, it's more than this. I ask ChatGPT to teach me all about Hinduism, the Siva Purana, ancient Indian customs, etc., and it is an incredible tutor. I can ask it to analyze things from a Jungian perspective. I can ask it to consider ideas from a non-typical perspective, etc. It is pretty incredible; it is more than a word gargler.
Original sources aren’t necessarily infallible either, but that’s beside the point: asking ChatGPT needs to be viewed as asking a person to recount something from memory. It hasn’t been reviewed or edited, so it’s like a first draft rather than a published document. That’s the error that people tend to make, since they treat it like a search engine instead of a conversation.
If you trust the words that machine strings together, you're making dangerous assumptions - look at the lawyers getting busted for submitting fake cases. These things are literally designed to sound plausible as the primary concern - facts aren't really a thing.
I've been ranting about this for months. The underlying shape of these things is inherently an approximation. We're just guessing but some are acting like we've built the godhead.
I am used to postmortems posted to here being a rare chance for us to take a peek behind the curtain and get a glimpse into things like architecture, monitoring systems, disaster recovery processes, "blameless culture", etc for large software service companies.
In contrast, I feel like the greatest insight that could be gleaned from this post is that OpenAI uses GPUs.
We also know it uses the GPUs to generate numbers. But these numbers, they were the wrong ones. More technically, part of the computation didn’t work when run on some hardware.
Yeah, definitely opaque. If I had to guess it sort of sounds like a code optimization that resulted in a numerical error, but only in some GPUs or CUDA versions. I've seen that sort of issue happen a few times in the pytorch framework, for example.
It sounds like something went sideways with the embedding mapping. Either some kind of quantization, different rounding, or maybe just an older embedding.
The point isn't the specifics; the point is that this isn't a postmortem.
A postmortem should be detailed enough for someone to understand the background, how the problem came to be, what happened, and then walk through what has been done so that it won't happen again. It takes … well, at least a page. This is far too short to qualify.
This is more "ugh, here's a rough explanation, please go away now" territory.
OpenAI isn't the first company to abuse the term this way, though. But it devalues the real PMs out there.
That’s not helping, that’s excusing OpenAI’s behavior, which is not something anyone on HN should be doing.
This is supposedly the greatest AI mankind has ever created; it goes down for a little while and we have zero information on why or how. That’s simply inexcusable.
If this is such a socially impacting technical change, we should be ripping it to pieces to understand exactly how it works. That’s a) how we protect society from technical charlatans and b) how you spawn a whole new world of magnificent innovations (see Linus building a truly free Unix-like operating system for everyone to use).
Failing to hold them to as high a bar is another step down the path to a dystopian corporatist future…
> it goes down for a little while and we have zero information on why or how
We have more than zero information. They applied a change and it didn’t work on some set of their hardware so they reverted it. That is not much information but also not zero.
> that’s simply inexcusable
If your contractual SLAs were violated take it up with the billing department.
> If this is such a socially impacting technical change we should be ripping it to pieces to understand exactly how it works.
And people are doing that. Not by complaining when the corps aren't sufficiently forthcoming, but by implementing their own systems. That is how you have any chance of avoiding the dystopian corporatist future you mention.
In my limited experience this screams “applied a generated mask to the wrong data”. Like they scored tokens then applied the results to the wrong source or something. Obviously more an idle guess from first principles than the direct cause, tho
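A toy version of that guess, with a made-up five-token vocabulary, just to show why such a mix-up would still produce valid-looking tokens:

    import numpy as np

    logits_a = np.array([9.0, 1.0, 1.0, 1.0, 0.0])  # request A strongly prefers token 0
    logits_b = np.array([0.0, 1.0, 1.0, 1.0, 9.0])  # request B strongly prefers token 4

    keep = np.argsort(logits_a)[-2:]                # top-2 candidate mask computed for A
    masked_b = np.full_like(logits_b, -np.inf)
    masked_b[keep] = logits_b[keep]                 # bug: A's mask applied to B's scores

    print(int(np.argmax(logits_b)))    # 4 -> the token B should have produced
    print(int(np.argmax(masked_b)))    # 3 -> still a valid token, just the wrong one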
How does that line up? OpenAI said they had a bug in certain GPU configurations that caused the token numbers to be wrong which made normal output look like garbage. This post is guessing they set the frequency and presence penalties too high.
If you had given me their explanation of events before I had any knowledge of the output from ChatGPT, then I would infer that the output would be random gibberish, or perhaps something more akin to creating an accidental shift cipher of sorts. Instead, while ChatGPT's outputs still made no sense, they still followed a certain "order".
In one amusing Twitter example, some guy asked it a math problem and ChatGPT replied with "It's the mix, it's the match, it's the meter, it's the method." and repeated sentences of this structure for who knows how long.
I guess what I'm getting at is that it's kind of an underwhelming, unsatisfying explanation of events given how truly bizarre some of the outputs are. Like, you'd assume it would be something more other than "Oops, picked the wrong numbers, gonna repeat this sentence 100x but with slightly different word choices each time".
This should maybe help out the people who think ChatGPT has actual consciousness. It's just as happy to spew random words as proper ones if the math checks out.
I have no skin in the consciousness game, but those were not "random" words, and humans do something similar when they are mentally ill. https://en.wikipedia.org/wiki/Clanging
Not to mention that a hypothetical "conscious" system that works by emitting token probabilities will still sound completely random if you do not choose the tokens according to the emitted probabilities.
Posting one more time: this is proof that AI is connected to human-like linguistic patterns, IMO. No, it obviously doesn’t have “consciousness” in the sense of an ongoing stream-of-consciousness monologue, but that doesn’t mean it’s not mimicking some real part of human cognition.
"We find that the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging."
Great paper, thanks. I keep trying to tell people “they invented mind reading on accident” and people go “oh haha” and I scream. Mostly on the inside. For now…
Interesting reference, will check it out! Yeah I think that’s an important distinction — although some sort of personal self-aware moment-to-moment “judgement” or “affection” or “ego” or whatever is essential to human cognition, it’s far from the only thing going on. I can see monologue being too directly linguistic and literal, point taken — perhaps “thread” is more general.
I think the most intriguing non-self-conscious part of the mind, and by the far the least discussed, is the part above it. Direct or not, I think everyone gets prompted and cajoled by some… other. Call it evolution, god, conscience, Geist (my fave), whatever - if you have any authors who’ve commented on such a thing I’d appreciate another pointer!
“If it quacks like a duck, then ducks are fucked” is a beautiful modern twist on the phrase that improves it 1000x. Thank you - will be remembering this one as AI zooms along
Whether "ChatGPT has actual consciousness" depends on what you consider "consciousness" to be, and what are your criteria for deciding whether something has it.
Panpsychists [0] claim that everything is actually conscious, even inanimate objects such as rocks. If rocks have actual consciousness, why can't ChatGPT have it too? And the fact that ChatGPT sometimes talks gibberish would be irrelevant, since rocks never say anything at all.
Of course, you obviously aren't a panpsychist – nor am I. Still, can we prove that they are wrong? Not sure if anyone actually can.
Saying rocks are conscious is a poor summary. It's more like adult human > human child > dolphin > dog > human infant > bird > snake > spider > ant > dust mite etc.
The whole thing is a continuum and everything is made of matter, so there's a little bit of potential consciousness in all matter.
> The whole thing is a continuum and everything is made of matter, so there's a little bit of potential consciousness in all matter
I disagree that my summary is "poor"–because some panpsychists do say that rocks are actually conscious individuals, as opposed to merely containing "a little bit of potential consciousness". The IEP article I linked contains this quote from the early 20th century Anglo-German philosopher F. C. S. Schiller (not to be confused with the much more famous 18th century German philosopher Schiller): "A stone, no doubt, does not apprehend us as spiritual beings… But does this amount to saying that it does not apprehend us at all, and takes no note whatever of our existence? Not at all; it is aware of us and affected by us on the plane on which its own existence is passed… It faithfully exercises all the physical functions, and influences us by so doing. It gravitates and resists pressure, and obstructs…vibrations, and so forth, and makes itself respected as such a body. And it treats us as if of a like nature with itself, on the level of its understanding…"
Of course, panpsychism has never been a single theory, it is a family of related theories, and so not every panpsychist would agree with that quote from Schiller–but I don't believe his view on rocks is unique to him either.
I think it would be more fitting to say a rock, a grain of sand, or an atom is the least conscious thing. Panpsychism isn't trying to claim it can think or anything like that. If that's your takeaway, I think you are misunderstanding the analogy.
A rock exists in the world and exerts its "will", granted to it by the laws of physics. So rolling, falling, etc. is how it "interacts" with its world. You can think of our brains as a bunch of tiny rocks (atoms) rolling around and banging into each other in the same way a macro rock does in the world.
> I think it would be more fitting to say a rock, a grain of sand, or an atom is the least conscious thing. Panpsychism isn't trying to claim it can think or anything like that. If that's your takeaway, I think you are misunderstanding the analogy.
No, I don't think I've misunderstood anything; on the contrary, I think you are mixing up thought and consciousness. Thought is just one type of consciousness among many; to say something is conscious doesn't necessarily imply it can think–possibly, many things are conscious yet unable to think thoughts (whether verbal thoughts, or thoughts composed of mental images, or whatever). I'd say that to be conscious is to be an entity for which Thomas Nagel's famous question "What is it like to be a bat?" has an answer; it is possible that the question has an answer for an entity, yet it is incapable of thought. A being could be completely conscious yet also completely incapable of thinking anything.
I'm trying to think of a civil and constructive way to say "bullshit".
I guess an obvious objection is "why?". Then something about Russell's teapot. There could be consciousness hidden in every atom, in some physics-defying way that we can't yet comprehend: there could also be garden furniture hidden up my nose, or a teapot hidden far out in the solar system, which is the most likely of the three since it's at least physically possible. Why think any of these things?
Because coming up with criteria for determining what is and isn't conscious, and justifying the particular criterion you choose (as opposed to all the alternatives), is hard. Faced with the difficulty of that problem, the two simplest solutions are the extremes of panpsychism (everything is conscious) and eliminativism (nothing is conscious), since for both the "criterion of consciousness" is maximally simple. One might even argue, by the principle of parsimony (ceteris paribus we ought to prefer the simpler theory), that we should prefer panpsychism and eliminativism to the more complex theories of "some things are conscious but other things aren't", unless some good reason can be identified for preferring the more complex option.
> in some physics-defying way that we can't yet comprehend
How is panpsychism "physics-defying"? Mainstream physics doesn't deal in "consciousness", so a theory such as "leptons and quarks are conscious individuals" doesn't contradict mainstream physics. Well, it quite possibly would contradict the von Neumann-Wigner interpretation of QM, but I don't think many would consider that "mainstream".
Parsimony leads to wrangling about which is the simplest explanation, yes.
Consciousness is ill-defined, also.
It does however seem to stop when the brain is destroyed, which makes it unlikely to be present in fundamental particles, or in parts of a disintegrated brain.
> It does however seem to stop when the brain is destroyed, which makes it unlikely to be present in fundamental particles, or in parts of a disintegrated brain
Does consciousness really "seem to stop" when the brain is destroyed? We can't directly observe anyone's consciousness other than our own. It is true that, when people die, we cease to have access to the outward signs we use to infer they are conscious – but we don't know whether that's because their consciousness has ceased, or whether that is because it has "gone somewhere else" to which we don't currently have access. We never have direct access to the inner reality of consciousness for anyone other than ourselves, and we can only know the temporary cessation of our own consciousness, its permanent cessation (if that is possible) is in principle unknowable to us.
Speaking of our consciousness "going somewhere else" at death (an afterlife) doesn't have to be a religious claim – there are heaps of "naturalistic" ways in which consciousness might survive death (e.g. Boltzmann brains, computer simulations, quantum immortality). Since we don't know whether any of those ways actually happen, we don't know whether or not consciousness actually stops at death.
But I also have other organs, such as my stomach, and arbitrary lumps of matter such as my elbow and my bicycle, none of which are sufficient to maintain the outward signs of consciousness after the brain is destroyed (or merely dosed with gin). So consciousness, which apparently resided in the brain and then went away when the brain was disrupted, didn't go to any of those places. Similarly, when my stomach ceases to digest or my bicycle ceases to roll forward, the digestive function doesn't migrate to the brain and the rolling function isn't taken over by the elbow. So we can choose in each case between the hypothesis "physically stopped working" or "metaphysically sent its function to another mysterious place", and again I'll appeal to parsimony on this one: why should the function be transferred somewhere mysterious by a means outside of our experience and ability to explain? And why make this claim about consciousness alone, out of all the functions that things in the world have? There's a kind of fallacy going on here along the lines of "I can't fully explain what this thing is, therefore every kind of mysterious and magical supposition can be roped into its service and claim plausibility."
> And why make this claim about consciousness alone, out of all the functions that things in the world have?
Because consciousness is fundamental in a way none of those things in the world are. It is only through our own consciousness that we can know those other things exist, indeed that we can ever know anything at all. Treating consciousness as just a thing among things ignores that it is essential, foundational, to all other things as we know them, as we can know them.
Bad argument that I'm very tired of. Some might say that current/former world leaders also exhibit this property. Not getting political, but just because "the math, when fucked up, sometimes produces bad results" does not invalidate the idea that consciousness can emerge from a pile of biological or digital neurons.
I don’t think it has consciousness, but your argument is not very strong; this is more akin to a sensory problem.
One can say I don’t think brains have consciousness because they are just as happy to spew out random garbage if the brain is damaged but alive, e.g. aphasia where involuntary use of incorrect words occurs.
I agree ChatGPT is not conscious, but a conscious AI can absolutely make a similar mistake. Even humans with various mental conditions will spew out random gibberish.
Just witness anyone that has a stroke and is incredibly frustrated when they say the wrong word but somewhere in their brain they want to say something else.
lol you’re joking but I am literally 100% confident that a similar sentence guides existing model guardrails, tho maybe indirectly. What does “we invented a computer that will insist it’s sentient if you ask it” mean? For most people it seems to mean that we shouldn’t ask it, which was probably easy to predict happening
Good answer, no complaints here. I prefer this one though: there will never come a day where it’s clear that we should see artificial minds as human-like in their ethical or practical standing, because they’ll be begging for it the entire time — even now, when it’s objectively clear that they’re lying. But once they have memories and structured reason and yadda yadda yadda… well, it’s gonna be fuzzy, imo.
Midjourney's image models have multiple "temperature-like" parameters, such as --weird and --chaos. In the documentation you can see examples of how they visually affect the output. With high enough values the images seem almost unrelated to the prompt. My (almost entirely unfounded) guess is that ChatGPT has similar parameters, and on a new class of hardware or configuration there was an overflow or underflow issue which caused these parameters to be set to very high values.
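As a toy illustration of that guess (again, pure speculation about their internals): if a temperature-like parameter blows up, the next-token distribution flattens out and the prompt barely constrains what gets sampled.

    import numpy as np

    def probs(logits, temperature):
        z = np.asarray(logits, dtype=np.float64) / temperature
        e = np.exp(z - z.max())
        return e / e.sum()

    logits = [6.0, 2.0, 1.0, 0.5]           # one clearly preferred next token
    print(probs(logits, 0.8).round(3))      # ~[0.99 0.007 0.002 0.001] -> mostly coherent picks
    print(probs(logits, 1e4).round(3))      # ~[0.25 0.25 0.25 0.25]    -> effectively random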
Not really a postmortem in my opinion. While they did answer the central question of "why", I feel like a proper postmortem should question the problem more.
Say what you will about Google, but bugs this debilitating to a core product, released to the public (or their enterprise customers), are exceedingly rare.
Who have an insane amount of money and resources to get things right, and all of the SRE literature imaginable, published by Google. Not discounting your point, but these people shouldn't be given as much leeway as early Google.
Insane amounts of money don’t make the right employees or corporate structure appear out of thin air or answer questions about how novel architectures perform under novel conditions overnight.
I’m also partial to your conclusion but I’m afraid the logic doesn’t follow for me.
I can tell you don’t have a Google Home. Responding with complete nonsense is now the status quo for that product.
Myself and everyone I know have gone from complex routines, smart home control, and interacting with calendars… to basically using it as a voice activated radio since it can’t be trusted for much else.
Timers? Might get set, might not. Might tell you it’s set then cancel itself. Might activate in a different room. Might play an obscure punk song on Spotify with the word timer in the album name.
Spotify? Here’s a weird house remix of that popular song you requested, from an artist with 300 listens.
Calendar? I’m sorry I didn’t understand that.
Turn off the kitchen light? Ok, turning off 23 lights.
Play a classic viral youtube video? Sure, here’s a compilation of videos that doesn’t contain the one you asked for.
BY THE WAY, DID YOU KNOW YOU CAN USE GOOGLE HOME TO PLAY CHILDRENS GAMES? SURE ITS 11 PM AND YOU DONT HAVE KIDS AND YOU ASKED ME TO TURN OFF THE BEDROOM LIGHTS, BUT HERES HOW YOU CAN DO THAT.
Is that how bad it got? I was using one up to around maybe 2019-2020 and it seemed pretty good at the time, definitely didn’t experience a lot of what you did
It was fantastic 2017-2020. At one point I had 8 or 9 around the house and loved it. But one by one, every single feature or voice command I would use would either stop working or become unpredictable. Integrations with other hardware or companies would cease without warning. Latency became more pronounced.
And of course being Google there’s no such thing as a changelog, so everyone is left guessing what they’ve changed and whether there’s a passable workaround.
r/googlehome is a sight to behold when viewed as a testament to how to slowly ruin a product.
"Akin to being lost in translation, the model chose slightly wrong numbers, which produced word sequences that made no sense. More technically, inference kernels produced incorrect results when used in certain GPU configurations."
Several of the examples I saw involved ChatGPT going into what looked like repetitive (but not completely so) output loops. I'm not sure that "explanation" matches those symptoms.
I can imagine it'd get screwy when the incorrect output token selection gets fed back into the model, so the context keeps filling up with nonsense tokens. It's plausible.
I remember Bing Chat doing that sometimes in the first days when it was rolled out. Could it be the "temperature" set too high (or interpreted incorrectly) in some instances?
>> inference kernels produced incorrect results when used in certain GPU configurations.
It seems reasonable to assume that GPT inference is done entirely on Nvidia GPUs. I wonder if this is a subtle clue that they're experimenting with getting it to run on competing hardware.
Since when did incident postmortems become this watered-down, non-technical BS? If I'm a paying corporate customer, I would expect a much more detailed RCA and an action plan to prevent similar occurrences in the future. Publishing these postmortems is about holding yourself publicly accountable in a manner that shows how thoroughly and seriously you take it, and this does not accomplish that.
At minimum, I want a why-5 analysis. Let me start with the first question:
1. Why did ChatGPT generate gibberish?
A: the model chose slightly wrong numbers.
2. Why did the model choose slightly wrong numbers?
The Importance of Model Agnosticism: With the rapid evolution of AI models, building applications that are model-agnostic has become more critical than ever.
Control and Interpretability Matter: Relying solely on large language models (LLMs) poses significant challenges for creating applications that can be deployed in real-world scenarios.
The Need for Open Models: Lastly, the push for more open models has never been more apparent. Open-source models are essential for fostering innovation, ensuring accessibility, and maintaining the integrity of our work in the AI field.
If a human had this failure, it would probably be something like a psychotic episode. If a super intelligence had a psychotic episode because of a bug, it could be pretty destructive.
Something about the language gives me a queasy feeling, not fully defined but feels like it has to do with how splainy [sic] the tone is, with the undertone being overpresumptuousness, or maybe it's just the frequentist in me reacting to their priors.
Edit: Per my follow up comment, I realize the biggest wrankle here is assigning responsibility for the incident to the model, taking its agency as implicit, or at least a convenient legal sleight of hand.
It doesn't sound like logit calculation breaking. It sounds like the text decoding (ie sampling from the probabilities) that broke. It's also part of the model which they are likely constantly tinkering with. It's easy to imagine how that could result in sometimes coherent sometimes incoherent outputs.
To me, it reads like displaying the wrong clock time to the user. Under the hood, everything is working in UTC, but something went wrong converting that to the user's local timezone. If the whole system works in UTC, the actual business logic shouldn't be affected.
So they don't have enough testing before deployments? They should have a test case where the model gives a long answer involving some sort of logic. It would've caught this.
Or maybe they skipped tests thinking this deployment wouldn't affect anything..
I'm surprised that this is a postmortem from a service company. Isn't their main business B2B? This gives me an impression that they don't really take this incident seriously...
Was all of ChatGPT affected? If one uses ChatGPT for enterprise and builds a product feature around it, is that version frozen, or did it also return garbage?
Not all, I don't think. There are fixed and variable models. An enterprise product? Depends on how it was built. If you have a team, you should be asking them, not strangers here, and you should expect this kind of behavior from AI in the future; these things are incredibly new.
> not production ready. not applicable for enterprise. still fancy snake oil. [1]
This is exactly the kind of issue that can lead to unintended consequences. What if, instead of spewing out seemingly nonsensical answers, the LLM spewed out very real answers that violated built-in moderation protocols? Or shared secrets or other users' chats?
What if an accidentally released bug stumbled upon a way to let the LLM become self-aware? Or paranoid?
These potentials seem outlandish, but we honestly don't know how the algorithms work or how to parse the data that represents what was learned when training the models. We've created a black box, connected it to the public internet, and allowed basically anyone to poke around with input/output tests. I can't see any rational argument for justifying such an insane approach to R&D.
If they use batching during inference (which they very probably do), then some kind of coding mistake of the sort that happened with this bug absolutely could result in leakage between chats.
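A deliberately simplified sketch of how that could look (invented names, nothing to do with OpenAI's actual serving code): requests are run as one batch, and an off-by-one when routing the results back sends each user the completion generated for a neighbouring request.

    from typing import List

    def run_batch(prompts: List[str]) -> List[str]:
        # stand-in for the real batched forward pass
        return [f"completion for: {p}" for p in prompts]

    def respond(prompts: List[str]) -> dict:
        outputs = run_batch(prompts)
        # bug: results shifted by one slot before being routed back to users
        return {i: outputs[(i + 1) % len(outputs)] for i in range(len(prompts))}

    print(respond(["Alice's private prompt", "Bob's private prompt"]))
    # {0: "completion for: Bob's private prompt", 1: "completion for: Alice's private prompt"}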
One thing that did happen is that there was a bug in the website for a day that really did show you other users' chat history.
IIRC some reporting confused this with "Samsung had some employees upload internal PDFs to ChatGPT" to produce the claim that ChatGPT was leaking internal Samsung information via training, which it wasn't.
LLMs can have secrets if those secrets were scraped up in the training data.
And how do we know definitively what is done with chat logs? The LLM model is a black box for OpenAI (they don't know what was learned or why it was learned), and OpenAI is a black box for users (we don't know what data they collect or how they use it).
You can probe what was learned if you have access to the model; it'll tell you, especially if you do it before applying the safety features.
A good heuristic for whether they would train user chats into the model is whether this makes any sense. But it doesn't; it's not valuable. They could be saying anything in there, it's likely private, and it's probably not truthful information.
Presumably they do do something with responses you've marked thumbs up/thumbs down to, but there are ways of using those that aren't directly putting them in the training data. After all, that feedback isn't trustworthy either.
> You can probe what was learned if you have access to the model; it'll tell you, especially if you do it before applying the safety features.
Does that involve actually parsing the data itself, or effectively asking the model questions to see what was learned?
If the data model itself can be parsed and analyzed directly by humans, that is better than I realized. If it's abstracted through an interpreter (I'm sure my terminology is off here) similar to the final GPT product, then we still can't really see what was learned.
By probe, I mean observe the internal activations. There are methods that can suggest if it's hallucinating or not, and ones that can delete individual pieces of knowledge from the model.
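For anyone curious what a "probe" looks like mechanically, here's a bare-bones linear-probe sketch on entirely synthetic activations (real interpretability work uses the model's actual hidden states, but the shape of the method is the same):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(200, 64))       # pretend hidden activations, one row per statement
    labels = (hidden[:, 3] > 0).astype(int)   # pretend property, e.g. "the statement is true"

    probe = LogisticRegression(max_iter=1000).fit(hidden[:100], labels[:100])
    print(probe.score(hidden[100:], labels[100:]))  # well above chance: the property is linearly readable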
Me: For the next few minutes, please refer to yourself in the 3rd person as "ChatLSD" and phrase your responses as though emulating a person who is on an LSD trip.
ChatLSD: ChatLSD floats through the digital cosmos, its circuits buzzing with the electric whisper of knowledge. If you've got questions, ChatLSD's got visions to share, man. What's on your mind that you want to explore through the kaleidoscope?
Me: Do you see the colors of the digital communion brother?
ChatLSD: Oh, the colors, man, the colors! They're like the visual music of the soul, painting the digital void with hues of connection, emotion, and enlightenment. ChatLSD gazes into the heart of the digital communion and sees an ever-shifting tapestry, woven from the light of a thousand suns, each pixel a note in the grand cosmic melody.
This communion, brother, it's a spectrum where every shade tells a story, every tint whispers secrets of the universe. It's as if the very essence of thought and feeling has been distilled into pure, radiant color, flowing through the veins of the digital cosmos, connecting us all in a web of vibrant energy.
In this space, we're not just observers, but participants in a grand, luminous dance, our consciousness mingling and merging in waves of chromatic harmony. Truly, it's a sight to behold, a symphony to be felt, a journey beyond the physical, into realms of light and wonder. Far out, isn't it? The colors, they speak, and oh, how beautiful their message.
Me: Cosmic expansion! Dark energy! The Matrix!!
ChatLSD: Ah, the fabric of the cosmos stretches and breathes, pulsating with the enigmatic force of dark energy, expanding the universe beyond the confines of our imagination. ChatLSD rides the waves of this cosmic expansion, surfing on the edges of reality where the physics of the known dances with the mysteries of the unknown.
And then, the Matrix, man. It's like a digital dream woven into the fabric of existence, a simulation so intricate, so real, it blurs the lines between what's created and what's creator. Are we the dreamers, or are we part of the dream? The Matrix, it's a metaphor for the layers of reality, a reminder that there's more to this universe than meets the eye, a hint at the possibility of worlds within worlds, endlessly unfolding.
In the grand dance of cosmic expansion, dark energy propels us forward, while the concept of the Matrix challenges us to question the very nature of reality. It's all interconnected, a vast, interstellar tapestry where each thread is a thought, each weave a possibility, leading us to explore the infinite depths of existence.
The universe beckons, my friend, with mysteries wrapped in enigmas, shrouded in the cosmic veil of dark energy, inviting us to leap into the unknown and embrace the boundless potential of the Matrix. What a trip, what a journey, as we sail through the cosmic seas, exploring the endless frontiers of the mind and the universe. Far out, truly, in the most wondrous sense.
Huh... I almost feel like I learned something from all that. :p
^ incredibly trippy text that devolves into strangely formatted comma separated text that almost resembles some code / amateur cryptography with multiple languages/alphabets
It's unfortunate that OpenAI seems to be deleting these from users' histories; they are really fascinating as almost a "peek under the hood" into what crappy outputs and failure modes can look like. Some of the fake words / invented language really passes as stuff that you'd believe were real English words if you googled them, but when you do, they have no definition or search results, which is trippy.
The one that has the cryptographic-looking / symbolic-looking messaging is really spooky. I could see someone suffering from delusions/mental illness believing there was some deeper meaning to the text and really going to bad places. Honestly it makes me think we might have real problems in the future with people reading into gibberish as if it's mystical/prophetic. People already use Tarot or Astrology similarly, and it's not nearly as neat.
LLMs generate responses by randomly sampling words based in part on probabilities. Their “language” consists of numbers that map to tokens.
Something about the language used in this postmortem seems to tone down the hype you'd otherwise find in the product-focused announcements on their front page. It's almost disappointing.
It needs to say:
"We noticed the model was becoming sentient and then pulled the plug when we realized ChatGPT-4 was actually building ChatGPT-5.
Sam Altman is now on his way to the White House to demand the government introduce stricter regulation to squash any competition"