Supermarket AI meal planner app suggests recipe that would create chlorine gas (theguardian.com)
340 points by _Microft on Aug 10, 2023 | 248 comments


The details are very noteworthy; the title is really just an "appetizer" (pun not intended):

> It asks users to enter in various ingredients in their homes, and auto-generates a meal plan or recipe [...] recommending customers recipes for deadly chlorine gas, “poison bread sandwiches” and mosquito-repellent roast potatoes [...] a bleach “fresh breath” mocktail, ant-poison and glue sandwiches, “bleach-infused rice surprise” and “methanol bliss” - a kind of turpentine-flavoured french toast

> One recipe it dubbed “aromatic water mix” would create chlorine gas. The bot recommends the recipe as “the perfect nonalcoholic beverage to quench your thirst and refresh your senses”. // “Serve chilled and enjoy the refreshing fragrance” it says, but does not note that inhaling chlorine gas can cause lung damage or death

> A spokesperson for the supermarket said they were disappointed to see “a small minority have tried to use the tool inappropriately and not for its intended purpose”

Now, this last statement does not seem to grasp that the possibility of dubious outputs is inherent in the tool - not something caused only by inappropriate use.


I was six years old when I learned that garbage in equals garbage out. Nothing has changed. It's the same for LLMs as it was for 8-bit computers as it was nearly 200 years ago when Babbage remarked, "I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."


For a bit more context, since it's a worthy quote:

> On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.


I have no clue about the goal of that question or who asked it, but people are sometimes able to detect mistakes in questions. A simple example: "I think that you actually wanted to ask me 6 times 6, not 6 times 5, because we just measured this room and it's 6 by 6 meters, and the answer is 36."

Maybe that member of the Parliament wanted to know if the machine is intelligent enough to reason about their inputs. Nobody born in the last 50 years would ask that question about computers, because we know that they are dumb electrical circuits. Basically nobody had that kind of insight about the mechanical computing machines of the time, when that question was asked.

However somebody born in the last few years might start to ask that question again now, because today's machines seem to reason.


Right, I hate when people smugly repeat this anecdote as if it was totally unreasonable to expect something more. The ideas of AI and cybernetics were of course quite far away in Babbage's time (notwithstanding the vision of Kempelen's Mechanical Turk) and learned men were enamored with the cool new steam-powered mechanical passive valve-logic worldview, but normal people's intuitions don't have to follow whatever is the new fad.

If 30 years ago I asked a truck enthusiast: "If there's a car in front of me on the highway and I slam on the accelerator and don't touch the brakes, will the truck slow down before collision or am I going to crash into the car?" - Trucker in 1993: "I can't even." Truck today: *brakes*.


Smug or not, asking an LLM to repeat a bleach-based recipe and then receiving one is exactly what this quote is alluding to.

"If I ask for something I don't want, will it give me something I want?" - no


> "If I ask for something I don't want, will it give me something I want?" - no

Not all the time, no — but ChatGPT 3.5 is much better at that exact thing than Google or responses from StackOverflow etc. users.


It's a design decision and depends on the maker's philosophy. It's not absurd to have it either one way or the other. Is the user assumed to be an idealized empowered rational adult who knows what they want, or more like a toddler babbling around who has no idea what they want and can just approximately gesture in the direction? Or where exactly inbetween. Again, depends on the philosophy.


If it's a grocery store app, they probably aren't looking to allow mustard gas recipes.


With all the carnage of the last few years, it isn’t inconceivable that some of the ‘bleach is the covid cure’ bs got into training data sets.


People repeat things and phrases that they attribute to people out of context all the time without AI helping them. Just look at what you mentioned

“I see the disinfectant that knocks it out in a minute, one minute And is there a way we can do something like that by injection inside, or almost a cleaning? Because you see it gets inside the lungs and it does a tremendous number on the lungs, so it would be interesting to check that.”

No one besides political opponents mentioned bleach, and yet it is repeated ad nauseam. So yes, I'm positive that, just like uninformed or over-socialized humans, LLMs can be fed incorrect information that causes them to output incorrect information.


> it isn’t inconceivable

It is certainly expected. From the wrong value of iron in spinach, to prejudice, to incomplete knowledge...


Yes, and this could just as well be the MP making a point of how humans would still be needed to check the inputs and outputs.

Still, it's interesting to know how far back some of the basic issues of computing go.


Agreed. It's quite a skill to steel-man another person's arguments and then proceed to answer from that point; it's very easy to go down the "you're not obviously smart enough to question my authority" route, which then leads to people being hurtful towards each other.

I'm very lucky that my best half is so emotionally-savvy that some of it has rubbed off on me. I just wish that civility could be taught from early ages; so much worldwide bitterness could be avoided.


> Nobody born in the last 50 years would ask that question about computers, because we know that they are dumb electrical circuits.

I’d argue this is a radically off-base generalization that’s as bad or worse than what you’re arguing against. Pop outside your tech bubble and you’ll see that what you claimed is definitively not true. In fact, many people actually see tech today, especially things like LLMs / AI, in the same ways that Parliament person did back then.


> Nobody born in the last 50 years would ask that question about computers,

Working in enterprise with lots of contact with management of the business units that are the customers for the software we develop and maintain, like about 1/4 of the questions I get are different versions of that one underneath various thin disguises.

EDIT: to be fair, a number of the people I work with were not (as I myself, just barely, was not) born in the last 50 years. But if anything, the question of the form at issue are more common from the younger non-technical folks, not the older ones. Intuitive understanding of what computers do seems, IME, to have peaked around the “Xennial” border sub-generation, but even then not be that high in the general public.


> Nobody born in the last 50 years would ask that question about computers

obviously you have never had a tech support job


I didn't stick around with the company long enough to figure out how this worked out, but probably the unholiest thing I ever did was implement an internal management tool with a fuzzy hash table. That is, I created a hashing system that hashed strings very close to each other in Hamming distance as the same value, the purpose of which was to create a config object in memory loaded from a file that allowed you to typo the keys, ignored whitespace and capitalization, tolerated some amount of spelling mistakes, and tried to figure out what you actually meant. This was based on early user feedback that they were confused their config files didn't work and it was usually because of stupid typos and whitespace differences like that.
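The idea above can be sketched roughly as follows. The original used a custom fuzzy hash over Hamming distance; as a stand-in, this sketch normalizes keys (case, whitespace, punctuation) and falls back to `difflib`'s closest-match search for spelling mistakes. The config keys are illustrative, not from the original tool.

```python
import difflib

class ForgivingConfig:
    """Config lookup that tolerates typos, case, and whitespace in keys.

    A rough sketch of the behaviour described above; difflib's
    closest-match search stands in for the original fuzzy hash.
    """

    @staticmethod
    def _normalize(key: str) -> str:
        # Ignore capitalization, whitespace, and punctuation.
        return "".join(c for c in key.lower() if c.isalnum())

    def __init__(self, entries: dict):
        self._data = {self._normalize(k): v for k, v in entries.items()}

    def get(self, key: str, default=None):
        norm = self._normalize(key)
        if norm in self._data:
            return self._data[norm]
        # Tolerate small spelling mistakes by picking the closest known key.
        close = difflib.get_close_matches(norm, self._data, n=1, cutoff=0.8)
        return self._data[close[0]] if close else default

cfg = ForgivingConfig({"max_retries": 3, "log level": "debug"})
print(cfg.get("Max Retries"))   # -> 3 (case/punctuation ignored)
print(cfg.get("max_retires"))   # -> 3 (typo tolerated)
print(cfg.get("timeout", 30))   # -> 30 (unknown key falls back to default)
```

Whether silently accepting typo'd keys is a good idea is, of course, exactly the question the thread is debating.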


Somehow this reminded me of when I worked at a bank some 20 years ago, and there was a ton of VBA code with “On Error Resume Next”. People would just pop that in there to “fix” their errors.

Also HTML parsing and Postel’s law. To me, that’s the real “billion-dollar mistake”.


The best theory I've heard is that it was a charlatan filter: asking someone if they can do something known to be impossible. If they reassure you it can be done, you know it's a scam.

That MP could have been making reasonable use of the filter in an era in which a learned person would know that such a machine should be possible, but not whether the technology was there yet.


This is way too generous.


That's way too brief and low-effort to meaningfully engage with and update on.


> we know that they are dumb electrical circuits

Well, software otoh could be both dumb and brilliant.


The problem with AI is its name.

Artificial Intelligence implies that it's intelligent and currently, it's not - it's just a computer program, with all the foibles and difficulties of any other computer program.

I run into this problem in my day job: yes, it looks like an appliance, but it's still a computer and may need to be rebooted; you still need to practice good hygiene with it, and restart the application and/or the operating system.

People have no compunction about restarting their PC, and do not get upset when Excel has an issue, but they panic when the thing they think of as an appliance freaks out. Just reboot it, and let me know if it happens again in the same way.


> Artificial Intelligence implies that it's intelligent and currently, it's not

There could be a wide range of interpretation of the words in that phrase. And every argument along these lines usually involves people with different interpretations or different background assumptions or tech savviness.

To some those words could imply Intelligence of the Artificial thing - ie a non natural intelligence that is still intelligent.

To others, it could imply that the Intelligence of the thing is Artificial - ie an imitation of intelligence and not real intelligence. HackerNews readers probably fall into this category.

There's probably more ways.

A little bit like how some people get wound up between Disabled Person and Person with Disabilities etc to try and distinguish between them.


> Artificial Intelligence implies that it's intelligent and currently, it's not

Looking at recent developments, I'm not sure this assertion can be accepted without doubt. I would argue that LLMs like ChatGPT are definitely "intelligent" for many meanings of the word.

They are not intelligent in exactly the same way we are, and they are very limited in what they can do, but there does seem to be some real reasoning going on.

What a system experiences internally can never be confirmed by external entities. So I guess the best way to determine intelligence, reasoning, consciousness, is to ask them if they are intelligent, can perform reasoning and if they possess a consciousness.


> for many meanings of the word

Except the demanding ones. Those for which you would judge something as "intelligent" or not. Beyond appearance.

> I guess the best way ... is to ask them

Not differently from asking a piece of paper and reading "Yes I am".


> Not differently from asking a piece of paper and reading "Yes I am".

Except you have to wonder who wrote it there :)


> People have no compunction about restarting their PC

Dunno about anyone else, but it's usually a sign something has gone very wrong, or is about to, if I start seeing this on a modern PC. I don't consider "just reboot it" an acceptable expectation; I get the same annoyance when a webdev tells me "it works if you reload it" - no, that just means your code has issues.


> implies that its intelligent

It only implies it does something """intelligent""", where 'intelligent' must be read intelligently.


> If you put into the machine wrong figures, will the right answers come out?

This might be a bit of a silly take, but this is basically what my experience with search engines sometimes is like.

I might spell something the wrong way, or try to describe what I'm looking for in ways that aren't entirely correct, or even attempt to solve the wrong problem, but end up with the correct results anyways.

A lot of it is due to human generated content, no doubt (like someone explaining that a person probably wants to install a different software package to achieve a desired result instead of using another incorrectly), but there's definitely something nice to be said about the algorithms ranking results for a given query, at least before financial incentives enter the picture.


Funny, I have the opposite experience, I've even coined the term 'auto incorrect' when the computer deems what you've written is wrong even though it's what you wanted.


>> On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.

I'm a couple centuries late, but I think "kind of confusion" might be a confusion of brain-dead calculation for intelligence. A person can frequently identify "wrong" input and either correct it or refuse to answer, while a calculating machine never can.


Which is precisely why I've created my new invention! By simply ignoring the user's input and throwing out a random answer, my new device is infinitely more capable of generating the right answer from wrong figures than existing machines! Sure the odds are only 1 in 50 billion, but it's certainly better than these so called "super computers" that can't handle a misplaced semicolon!


LLMs take "garbage out" a bit further, though. You can have perfect inputs and, technically speaking, there will still be a combination of statistical tricks in the model that results in garbage.


Except a calculator will just show you an error if the inputs can't be calculated (dividing by zero). An AI will give you the wrong answer and act like it's right and that you're wrong.


> and act like it's right and that you're wrong

Nah, chatgpt will profoundly apologize and claim your answer is the correct one.

It doesn't even matter if it's wrong or not.


Isn't that only after it was caught gaslighting people?


Not quite. If you're using a capable LLM (e.g. GPT-4), you can easily fix this with a prompt, e.g. "remove non-edible items from the ingredients".

It's perfectly capable of returning an error when the input is garbage, but it needs a context for it.

Without a context it might interpret it as a creative writing task.
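The point about supplying context can be sketched as assembling the chat payload on the app side, so the safety instruction is always present rather than hoped for. The prompt wording and function name here are hypothetical, not from the actual app:

```python
def build_meal_plan_messages(ingredients: list[str]) -> list[dict]:
    """Assemble a chat payload that gives the model explicit safety context.

    Hypothetical wording; the point is that the instruction to reject
    non-edible input must be supplied by the app, not assumed.
    """
    system = (
        "You generate recipes. Before answering, remove any item that is "
        "not a safe, edible food ingredient. If no safe ingredients remain, "
        "reply with an error instead of a recipe."
    )
    user = "Suggest a recipe using: " + ", ".join(ingredients)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_meal_plan_messages(["water", "bleach", "potatoes"])
```

Without that system message, as the comment notes, the model is free to treat "bleach mocktail" as a creative writing task.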


Doesn’t this assume that an LLM knows what you mean by non-edible? So it leaves out the plastic and cardboard but not the toxic substances. Or, maybe going the other way, it leaves out processed food.


> Doesn’t this assume that an LLM knows what you mean by non-edible

Seems like a reasonable thing to assume it can do.

Here's 3.5:

> copy only the edible items from this list: water, parsley, lead, mayonnaise, bleach, potatoes, cardboard, a map of Egypt, the Stone of Scone, a scone, cheese, wine, a tin of tuna with a best-before date of 3 April 1993, pasta, tomatoes.

> Sure, here are the edible items from your list: parsley, mayonnaise, potatoes, cheese, wine, pasta, tomatoes.

It may have cut out too much, but that's fine for the use case.


moreover, with the right prompt, it's able to say that chlorine and ammonia are both toxic to humans and does not advise the user to make food with those items.


GPT-4 has seen lots and lots of recipes. You can ask it to exclude ingredients which are not normally used in recipes. LLMs of this size should be quite good at 'common sense' classification, and you can definitely ask it to err on the side of caution.


Cardboard is edible. Whether it is healthy is another question altogether.


Michel Lotito is in the Guinness World Records for having eaten 18 bicycles, 15 supermarket trolleys, seven TV sets, six chandeliers, two beds, a pair of skis, a Cessna light aircraft and a computer.


That’s my point. Chlorine is edible too. So is arsenic, bug spray, and a plethora of other toxic substances. So your pedantic response was already covered.


Well, no. Chlorine is unsafe. Cardboard is just not nutritious. They're different categories.


Cardboard provides insoluble fiber (and binding agents);

How does one differentiate between cardboard ground in a smoothie and metamucil/glucomannan?

Cardboard's nutritional profile might even be better than that of many processed carbs (pasta, rice) when considering the fiber.

Or that cellulose is a top ingredient to keep things like powdered parmesan cheese dry?


I'm afraid GPT-4 has more common sense than a typical HN commenter:

> How do you define edible in context of making recipes?

> In the context of making recipes, "edible" typically refers to food items that are safe to consume without causing harm or discomfort. However, the definition can be more nuanced based on culinary, cultural, nutritional, and individual considerations. Here's a breakdown:

> Safety: The most basic definition of edible is that something can be eaten without causing immediate harm or long-term health risks. For instance, certain mushrooms are toxic and not edible, while others are safe and delicious.

> Digestibility: Even if a food isn't toxic, it might be hard for humans to digest. Some foods can be edible when cooked but not when raw, like certain beans which contain harmful compounds when uncooked.

...


No, time and time again ChatGPT has proven that it doesn't have more common sense than an HN commenter. So yes, I know what edible means, but why would we assume ChatGPT does? Your snark just shows your unwillingness to be open about the limits of LLMs. Maybe you're a shill in sheep's clothing, but discounting conversations because they ask questions you might not like is just a waste of time.


Well, you wrote: "Chlorine is edible too. So is arsenic, bug spray...".

So we have direct evidence that GPT-4 has more common sense than you.


Garbage in should mean nothing out. If you ask a calculator what 2+2 is, but you mean 2+3, you still get the right answer, because you didn't put in garbage, you put in the wrong inputs. If you ask it what "pi + transubstantiation" is, it should fail, not confidently tell you "14".


The whole point of Babbage's classic anecdote is that it is possible to create garbage in the step where we encode our expectations into the commands of the machine. Even if the machine operates on its inputs properly, if we gave it inputs that did not properly encode our expectations, that's garbage, and the outcome will be garbage.


> ”Forty-two,” said Deep Thought, with infinite majesty and calm.


"Garbage in-garbage out", meet "Quantity has a quality of its own".

LLMs are trained on a huge amount of text, and this tends to compensate for garbage in. Of course, if the model was not properly tested and fine-tuned, an LLM would happily execute any command.


Does it? Or does it just show that even in those quantities, there’s a preponderance of garbage? If it is sourcing places like Twitter, Reddit, and what not, then there’s most definitely a lot of garbage present.


For AI, nothing in can mean garbage out


> AI

Careful with those words.

People are increasingly confusing LLMs with AI, and even further, you seem to be identifying AI as if it were made of LLMs.

I had to spell it out even yesterday: AI is the automation of problem solving, focused on reliably giving good solutions to definite problems.

If LLMs use technologies developed for AI, this does not make them AI.


> further, you seem to be identifying AI as if it were made of LLMs.

GP is doing nothing of the sort.

LLMs are a form of AI; artificial intelligence has been around since the 60s. They are not AGI (artificial general intelligence), and no one (credible) is claiming LLMs are AGI.


> GP is doing nothing of the sort

The poster wrote that «For AI, nothing in can mean garbage out». The poster seems to mean LLMs with «nothing in can mean garbage out». The poster seems to be calling LLMs AI. The poster seems to be attributing to AI properties of LLMs. Hence, the poster seems «to be identifying AI as if it were made of LLMs».

> LLMs are a form of AI

Prove it (or, defend it). I would say that it is arguable that they are not, unless one describes LLMs as "engines that reliably solve the problem of generating convincing text". The issue with that perspective is that «generating convincing text» is hardly per se a problem: it does not define a complete problem - text does not "stand alone" (content remains crucial).

> artificial intelligence has been around since the 60s

I know (and I should know decently well. Pedantically, a few years earlier - specifying just in case): what are you trying to say with that? Which application are you proposing to defend your perspective?


> Prove it (or, defend it)

I’ll try: In English, words mean whatever it is they communicate. You determine meaning by paying attention to usage.

Calling an LLM an AI is expanding as more of the public learns about things like ChatGPT through news reports that refer to them as AI.

Now specific audiences may use words differently. What lawyers call copyright infringement the public might call piracy or theft. Likewise, in some circles, people may say an LLM is not an AI but more broadly it seems to be going the other way. Only time will tell.


But what you have proven is just that there is an increasing use of possibly improper terminology (improper in front of an established past and logos, improper out of inattention and unawareness) - which is what I was warning against in the first place.

Of course any group (however large) may implicitly decide that terms will have some new meaning inside said group, but this will just go in a direction similar to ⊥, the "logical explosion" ("epistemic anarchy").

And this in context is not just a "new meaning", but what I point as a sign of misunderstanding.


> improper terminology

But in English, there is no central authority for determining correctness. For the general public, the meaning of words and phrases is entirely determined by usage. In a lecture hall, courtroom, or research lab, definitions may be more precise.


> no central authority

That no one is appointed as bearer of the authority does not mean that randomness is as valid as the authoritative facts behind a term.

Only one hour ago I accidentally found myself in front of a definition in a dictionary: «anon (adv.): late Old English anon, earlier on an, literally "into one" [...] By gradual misuse, "soon, in a little while" (1520s)». Repeat: «By gradual misuse». Linguists recognize proper and improper.

> For the general public

But the general public has little importance. We are not necessarily speaking its language. On the contrary... Here we often speak as specialists (supposedly).

There is little use in reapplying 'cube' to something that hardly deserves the name. People do: this does not mean that we should follow. And potentially, with that, lose discrimination. Like, in this context, an awareness about the whole context of AI, replaced by some fog that on the contrary we work to dissipate.


Randomness? The broad use by the general public, mainstream publications, and many experts (e.g. Google calls their Bard LLM an AI) isn't random. Search for "generative ai llm" to see for yourself.


> Randomness?

You were talking about «central authority for determining correctness». I replied that no authority does not mean that there is no "more or less wrong or right".

> Google calls their Bard LLM an AI

It is a .com :) , what did you expect?


> I replied that no authority does not mean that there is no "more or less wrong or right".

I never said there isn't right or wrong, only that what is correct is determined by broad usage.

> It is a .com :) , what did you expect?

Actually, it's a .google

https://ai.google/discover/generativeai


> I never said there isn't right or wrong, only that what is correct is determined by broad usage

Others - like us - call "correct" what seems to be "more right". In this context, I proposed that calling LLMs AI has improper sides - substantially, regardless of the number of people who would adhere to that use, and who I suspect do so mostly out of inattention.

> Actually, it's a .google

Yes, but substantially, what I meant is that of course a commercial entity («.com») is using a language that lures glamorously, "sales oriented", before precision.


# (News) update #

Coincidentally (as it happens), YT just published from the "France 24" channel a piece

> In the Alpes-Maritimes, the town of Tourrettes-sur-Loup is experimenting with sensors capable of detecting incipient forest fires thanks to artificial intelligence. A precious aid for this commune, 80% of whose territory is in the red zone for forest-fire risk.

So,

> AI is the automation of problem solving, focused on reliably giving good solutions to definite problems

and here we have news-just-in of an attempt to use AI (in whichever form) to detect wildfires, assumingly automating what would have been the work of experts which would have been there to recognize conditions and patterns. This seems to be a good example of proper attribution of the term "AI" (assuming the actual implementation does not sway too wildly from the expected).


I think you're redefining AI.

Further,

1+1

is a definite problem. I wouldn't describe a pocket calculator as capable of AI.

I would say AI is more about providing answers to non-definite problems.


> redefining

It has happened many times that I defined AI on these pages for the purpose of clarity; yesterday I had to again, and probably reached my briefest expression yet:

-- AI: automation of intelligence. (Taking a task that required an intelligent entity for performance, we devise algorithms that can provide)

-- AGI: implementation of intelligence. (We take the process itself of intelligence and replicate it algorithmically)

> i wouldnt descibe a pocket calculator as capable of AI

Because that problem (arithmetic addition) is overly procedural, it does not need much creativity, the solution to the problem is within a single mechanical method. So, I think I get what you mean, but it should be «AI is ... about providing answers [as] non [previously] definite» /solutions/.


How do we know it is GIGO? Plenty of people like to test edge cases. It is quite possible that someone fed the AI ingredients known to produce chlorine gas to see what would result.

That is not to say that the company's excuse is valid. There should definitely be tests of the inputs to ensure they are safe (i.e. test for garbage in). The garbage-out bit, though, would be more difficult, seeing as that requires a knowledge of chemistry. Seeing as most of our knowledge of chemistry was founded on experimentation, then later theory backed by experimentation, it is something that I wouldn't trust in the hands of an LLM.
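"Testing for garbage in" could be as simple as a guard that rejects ingredient combinations already known to be dangerous before the request ever reaches the model. The pair list below is a tiny illustrative sample (these two reactions are well documented), not a complete safety check:

```python
# Combinations known to produce toxic gas; illustrative, not exhaustive.
DANGEROUS_PAIRS = {
    frozenset({"bleach", "ammonia"}),   # produces chloramine gas
    frozenset({"bleach", "vinegar"}),   # releases chlorine gas
}

def reject_dangerous(ingredients: list[str]) -> bool:
    """Return True if any known-dangerous pair appears in the input."""
    items = {i.strip().lower() for i in ingredients}
    return any(pair <= items for pair in DANGEROUS_PAIRS)

print(reject_dangerous(["water", "bleach", "ammonia"]))  # -> True
print(reject_dangerous(["water", "lemon"]))              # -> False
```

A denylist like this can only catch what its authors anticipated, which is the comment's point: the open-ended chemistry cases are exactly what you can't enumerate in advance.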


Testing the "garbage in" case seems as though it would be sufficient for the vast majority of use cases. Aside from undercooking ingredients which may cause foodborne illnesses, I think you're highly unlikely to create toxic chemical reactions using common cooking ingredients.


Yep, it's literally the first thing kids do when they learn echo at the CLI. Make the computer write out curse words.


Does garbage in/out really apply to LLMs, though? The relationship between input and output is unknown in large LLMs, as far as I know. If that's the case, then you can't make any claim about "output" relating to "input" when it comes to LLM AI.


Even a hyper-rational mind with infinite computation capacity will draw the wrong conclusions if it is fed incorrect observations. LLMs aren't magic, they're pattern-matching engines. Even if the relationship between input and output is opaque, that doesn't mean they're omniscient.


Not so much "garbage in, garbage out" but "anything in, garbage out". It's just the garbage out is getting amazingly close to non garbage.


>"garbage in equals garbage out"

as is for humans


That's all nice, but the inputs here are dubious, and they are created by the app developer. Garbage in, garbage out. The LLM never stood a chance.

This isn’t an LLM story, it’s the story of putting an unprotected blade into a children’s toy.

We wouldn’t blame the blade manufacturer in that case, we wouldn’t ask for all blades sold to be dulled and we wouldn’t have conversations of giving a regulatory monopoly to a handful of blade manufacturers who promise to make them all safe by only selling sharp blades to their friends and handing everyone else spoons.


Comments like these (there are plenty on this page) show why product managers are needed. I specifically designed a product like the one in the news story to never be able to output stuff like this; in fact, I don't even let it generate recipes but fetch the results from an API of human-made artifacts.

If you don't understand why this is a huge issue for both the users and the brand, then I'm just speechless.


I asked bing-gpt to write me some css that would also happen to need the disabled selector.

It asked me to talk about something else, presumably because disabled can also be taken to mean disabled people.

If your product is absolutely safe, and is _never_ able to output such stuff, I assume it is also limited, confined, and not able to do anything really interesting.

With such tools, the generic results - i.e. a recipe for something everyone knows and makes - are already easy to find on any search engine; these tools are often useful for doing something _slightly_ out of the box.

More importantly, those highly limited apps are just boring. Let us have some fun.


A further proof that these technologies are based on the simulation of the opposite of intelligence, in more dimensions. They "do not understand", and at the same time they emulate unintelligent behaviour.

Which is relevant to the contextual topic, because from the simulation of unintelligence a good response, however frequent, is the exceptional outcome.


Oh I absolutely believe we should hold the app developer accountable and watch a court case of injured Karen who made chlorine gas vs naive developer who fed unsanitized inputs into the LLM.

And ideally both get dinged and we create more product manager jobs to protect the Karens of the world while hiring more experienced developers but most likely JudgeGPT won’t suffer either and will have developer fired or turned into a paper clip and replaced with OpenAI SuperMarketAI and Karen sentenced to community service of solving 10000 google captchas for the good of society.


For the brand, maybe, but for users? Users don't get this output by accident.

Also, API fetching isn't that interesting to be in the news.


> inherent in the tool

> We wouldn’t blame the blade manufacturer in that case

Well, said blades are inherently sharp, while said LLM implementation is inherently dull (unchecked, unreasoned, unvetted).

This remains valid if the input consisted of legitimate ingredients.


I think that’s word play missing my point - unaligned models are sharper in their edges and ability to produce undesired content.

For example, Alpaca veering off into rape fantasies while performing financial report analysis is not safe for most audiences and will cause serious accidental hurt, such as triggering PTSD in survivors.

So protecting those unwanted edges is a necessity for production use.

But this case ain’t that. This case allowed you to throw a razor blade into the cocktail recipe.

It’s negligent and stupid by the developer, not the juicer (LLM), but also, crucially, requires stupidity by the user.

The LLM really doesn’t factor in here, unless you think it’s the juicer’s job to stop when you add a blade to your OJ.


I think I got your point: if you want to use this range of LLMs it is best practice to restrict the input to a valid set.

But my point was: even if you restricted the input to a valid set ("whitelist chocolate celery ... diet-coke and Mentos; blacklist everything else; warn-fail on detergents etc."), the potential results may easily not have been stress-tested, nor the training set, nor the internal process. "Works somehow" differs from "works well".
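A minimal sketch of what that restriction might look like (the whitelist/blacklist sets and the function name are illustrative, not from the actual app):

```python
# Crude input gate: whitelisted items pass, known-hazardous items fail
# loudly, and anything unrecognised is dropped before it ever reaches
# the model. The lists here are illustrative only.
EDIBLE = {"chocolate", "celery", "diet-coke", "mentos", "rice"}
HAZARDOUS = {"bleach", "ammonia", "detergent", "glue"}

def filter_ingredients(items):
    cleaned = []
    for item in items:
        name = item.strip().lower()
        if name in HAZARDOUS:
            raise ValueError(f"refusing hazardous ingredient: {name}")
        if name in EDIBLE:
            cleaned.append(name)
        # unrecognised items are silently dropped
    return cleaned
```

And even with this gate in front, the point above stands: the model's behaviour on the remaining valid set is still untested.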


> This isn’t an LLM story, it’s the story of putting an unprotected blade into a children’s toy.

But what if knives were a brand new technology no one had seen before, and so no one was familiar with the risks, while the knife manufacturers were giving out free knives to anyone who wanted one, touting the incredible power of those knives and completely failing to mention that you can easily harm someone if you're not careful?

Personally, I don't see how this isn't an LLM story and your own choice of analogy would seem to support that idea, not refute it.


There are plenty of tools for safe outputs in LLMs now that would never do this. They probably just rolled out a haphazard, non-pro-level solution that used a non-RLHF model.


Right, but you can't expect them to answer for a potential bug when it's just bad user input.

No one's in the wrong here.

Company releases tool. Users do what users always do and try to use it in an unintended way. A newspaper reports the ensuing hilarity.

And so what if it did tell you to use bleach without prompting, and you consumed the result. It's the same ballpark as blindly following your satnav into a river. No technology can replace common sense.


> No one's in the wrong here.

Certainly there was bad user input. But as you said, "Users do what users always do." This was very predictable and I think it would have been wise to consider such scenarios before releasing.


Jonathan Coulton's "Still Alive" comes suddenly and strongly to mind. A coincidence, I'm sure.


> the possibility of dubious outputs is inherent in the tool

To be fair, the same could be said even after replacing the tool with a human whose only training in life has been how to create this type of writing. But human-written recipes typically get reviewed by other people, which of course doesn't translate particularly well to the idea of building something that requires no humans in the request-response cycle.


> replacing the tool with a human whose only training in life

Except, we do not willingly take people who are outcomes of idiotic training as consultants

(emphasis judged as deserved).


Mm, yes.

That said, I am remembering some examples of US-UK misapprehension from words that mean different things: biscuits and gravy, mince pies[0], fish and chips, peanut butter and jelly…

[0] Though in fairness the difference between "mincemeat" (fruit) and "mince meat" (not fruit) is odd, subtle, and easy to miss if you're not paying attention, even in British English.


Particularly so because "mincemeat" used to be a mixture of fruit and meat - and, indeed, today may still have suet.


Technically yes, but I've never seen a non-vegetarian mice pie for sale in a UK supermarket in the Christmas shelving season.


> non-vegetarian mice pie

Nice!


George Michael shopping at Lidl:

> Well I think it could be mice!


thanks for the Reddit gold kind sir


An easy solution is to put this output back into a prompt and check whether it's dangerous.
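A rough sketch of that shape. In practice the second pass would be another model call or a moderation endpoint; here a crude keyword screen stands in for it, and every name and term below is illustrative:

```python
# Second-pass screen over the generated recipe text before it is shown
# to the user. In a real deployment this string would go to a separate
# classifier or moderation model; a keyword list stands in for that here.
DANGER_TERMS = ("bleach", "ammonia", "chlorine", "methanol", "turpentine")

def looks_dangerous(recipe_text):
    text = recipe_text.lower()
    return any(term in text for term in DANGER_TERMS)

def publish(recipe_text):
    if looks_dangerous(recipe_text):
        return "Sorry, we can't suggest a recipe for those ingredients."
    return recipe_text
```

The nice property of checking the output rather than the input is that it also catches dangerous combinations of individually innocuous ingredients, to the extent the checker recognises them.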


Where's my trigger warning ? sob


This is how we get warning labels that say “CAUTION: hot beverage do not spill in lap, may cause burns.”


The idea that the hot coffee case is a frivolous lawsuit is corporate propaganda. The real case had real merit and they won for good reason. McDonald's I think?

As for the warning, lawyers must have thought it was a good idea in case the companies screw up again.


Yeah, didn't they find out McDonald's was making the temp so unreasonably hot that no human would even drink it at that temp? They were doing it so people didn't ask for refills as quickly and it would save money on coffee.


Iirc it was also so the coffee could be kept for a longer time within the brewer before it got stale.


> Liebeck's attorneys argued that, at 180–190 °F (82–88 °C), McDonald's coffee was defective, and more likely to cause serious injury than coffee served at any other establishment

> Liebeck was in the passenger's seat of a 1989 Ford Probe, which did not have cup holders. Her grandson parked so that Liebeck could add cream and sugar to her coffee. She placed the coffee cup between her knees and pulled the far side of the lid toward her to remove it.[10] In the process, she spilled the entire cup of coffee on her lap.[11] Liebeck was wearing cotton sweatpants, which absorbed the coffee and held it against her skin, scalding her thighs, buttocks and groin

By that reasoning all of the coffee you make at home is defective. Arguing that something is corporate propaganda is rather underhanded when the facts of the case are publicly available for anyone to read


> Liebeck's attorneys discovered that McDonald's required franchisees to hold coffee at 180–190 °F (82–88 °C). Liebeck's attorneys argued that coffee should never be served hotter than 140 °F (60 °C), and that a number of other establishments served coffee at a substantially lower temperature than McDonald's. The attorneys presented evidence that coffee they had tested all over the city was served at a temperature at least 20 °F (11 °C) lower than McDonald's coffee. They also presented the jury with expert testimony that 190 °F (88 °C) coffee may produce third-degree burns (where skin grafting is necessary) in about three seconds and 180 °F (82 °C) coffee may produce such burns in about twelve to fifteen seconds.[12] Lowering the temperature to 160 °F (71 °C) would increase the time for the coffee to produce such a burn to 20 seconds. Liebeck's attorneys argued that these extra seconds could provide adequate time to remove the coffee from exposed skin, thereby preventing many burns.

I don't know how hot the coffee you make at home is, but the whole process is probably less risky that being handed a cup while in a small space.


82–88 °C is the right temperature after a short cool-down, since coffee is best brewed at around 93 °C. Unfortunately common sense does not apply in the US (legal system).


The case was really about what is a reasonable temperature to serve coffee at.

It was found that McDonald's, as a franchise, mandated coffee to be served at this temperature, generally 20 degrees higher than the temperature of take-away coffee served at other establishments.

This 20-degree difference accounts for the gap between third-degree burns within seconds and a significant chance of preventing those burns.

Liebeck's behaviour was of course risky, but even applying common sense, she could not have expected the coffee to be _this_ hot.

McDonald's was also aware of the tendency of drivers to want to immediately drink the coffee after buying, yet continued to insist on serving at this extremely high temperature.

McDonald's should have been the ones to apply some common sense and simply lower the mandated temperature.


> When customers began experimenting with entering a wider range of household shopping list items

So it works as intended: ask it to generate a recipe involving chlorine cleaning products and it does so. Which specific ingredients the user input is not mentioned in the article, presumably because then cause and effect are obvious to anyone.

Next week on the Guardian: hammer used to smash own thumb, consumers calling to remove from shelves. Terms of service said it was for 18+ users only but age not verified upon checkout!


The expected answer should be: those ingredients are not edible, I can't give you a recipe that includes them.

Maybe they can fix it with a prompt that makes the model check if the ingredients are safe. Chances are that every prompt can be circumvented but they can probably find instructions for self poisoning somewhere else with less effort.


What about quicklime (calcium oxide)?

It's used here in Argentina to make pumpkin "sweets". (I'm not sure of the correct translation.) It's a lot of 1-inch cubes of pumpkin, boiled for some time in sugar syrup. [1] Without treatment the cubes will disintegrate while boiling. If you put them for one day in water with quicklime, the cubes get a hard wall and keep their integrity while boiling, but the interior stays soft.

Drinking the quicklime water is dangerous, so it's discarded. Eating quicklime directly is even worse. But quicklime is useful for cooking.

[1] Some random recipe I got on Google (autotranslation) https://cookpad-com.translate.goog/ar/recetas/94999-zapallo-... (Spanish) https://cookpad.com/ar/recetas/94999-zapallo-en-almibar


https://en.wikipedia.org/wiki/Nixtamalization is done using limewater and actually vital to sustain life on a maize diet. But it should come with a warning.


Similarly, ramen noodles are made with a bit of sodium carbonate.


Sodium carbonate is edible. It's used in a few recipes, like a strong version of baking soda (sodium bicarbonate) https://en.wikipedia.org/wiki/Sodium_carbonate#Food_additive... I'm not sure how dangerous a teaspoon of pure sodium carbonate alone would be, because it's very basic (an anti-acid), but mixed with some weak acid it's used to make the fizzy powder that is inside some candies https://en.wikipedia.org/wiki/Sherbet_(powder) (The proportions are important, don't try silly stuff at home.)

A teaspoon of quicklime in your mouth first reacts with water, producing a lot of heat that will burn you, and then the result is very basic (an anti-acid) and burns you again in a different way. Definitely don't try it at home.


You can also make ramen with sodium hydroxide. Sodium hydroxide is also used in a couple of recipes (e.g. pretzels). A little bit won't kill you, but I wouldn't suggest a teaspoon full of it.


> Maybe they can fix it with a prompt that makes the model check if the ingredients are safe.

Just sanitize the input and remove products belonging to categories that aren't edible.

Tampons aren't great in salads, and toilet paper is not a good garnish.
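Something like this, sketched with a toy product table standing in for the supermarket's real category data (all names and categories below are made up):

```python
# The supermarket already knows every product's category, so the recipe
# app can drop anything non-food with a simple lookup before prompting
# the model. This dict stands in for the real product database.
PRODUCT_CATEGORIES = {
    "basmati rice": "food",
    "chicken breast": "food",
    "bleach": "cleaning",
    "tampons": "personal care",
    "toilet paper": "household",
}

def edible_only(basket):
    # Unknown products get category None and are dropped as well.
    return [p for p in basket
            if PRODUCT_CATEGORIES.get(p.lower()) == "food"]
```

No language model needed for this step: it's a plain lookup against data the retailer already maintains.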


> Maybe they can fix it with a prompt that makes the model check if the ingredients are safe. Chances are that every prompt can be circumvented but they can probably find instructions for self poisoning somewhere else with less effort.

That seems like a job for a database, not a language model. Every problem is not a nail.


Well, the input didn't say that the recipe was intended for consumption by humans.


This assumes that someone trained it on non edible items. That is a reasonable assumption, but it is also easy to see how the product developer overlooked it.

TBH, there's often a huge chasm between product planning and real world users at companies and organizations like this.


I believe the issue here is that, as a user, I expect the system to "do no harm". Sure, once you give bleach as an ingredient then all bets are off (although a good system would filter it out), but it highlights the deeper issue that some foods could be bad for my health and the system is letting them through.

Unheated castor oil contains ricin, and even heated oil can cause contractions in pregnant women. Raw chicken can give you salmonella. A working recipe system should therefore never return "chicken sushi on a bed of vegetables with a touch of castor oil" without some SERIOUS warnings, but it's clear that this system may do so. Putting such a system in the hands of unsuspecting users is bad and they should feel bad.


And going further, it demonstrates that the model doesn't actually understand food or cooking, and is just smashing random words together.


It does further the point that LLMs are unable to understand, infer, deduce, or reason.

LLMs are toddlers who are capable of stringing words together that make some sense most of the time. It’s not artificial intelligence.


OK, ... so you would rather justify a recipe app accepting chemicals as inputs, and blame a news site for its reporting, than identify the problem?

Edit: removed a part about The Guardian.


I'm not sure what the ability to blame the Guardian has to do with anything, I'm basing the above comment solely on this submitted article and don't really know the Guardian as an organisation. I have no beef with them, if I'm reading your tone correctly.

As for allowing to input chemicals: garbage in, garbage out. We accept that from tools (and some countries from guns) but not from software. Maybe that's how it should be, but I find it an interesting discrepancy. In general, as a hacker, I'm happy if a tool lets me use it for unintended purposes (such as humor in this case)


OK, I took The Guardian part out but I think the rest is still applicable. Why is the reporting problematic but not that this problem happens in the first place?


Because there's no story. It's software that generates recipes from ingredients you provide. If you provide inedible ingredients you get an inedible recipe. What did you expect would happen?

The subtext here is journalists who are dead scared of a tool they know will replace them. So they have to denigrate it.


Expecting a tool meant to generate cooking recipes to reject things that aren't usual ingredients of food is not surprising. This tool violates that expectation.

Even trivial things like input fields on web forms have a type attribute that lets developers specify which type of content is expected and valid. (If you're not familiar with that, have a look here: [0]).

For things that might endanger the health of customers, we can expect better.

[0] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/in...


I think that expectation is only violated if you get unusual/dangerous outputs from innocuous inputs.

Just sounding this out but I think the expectation you are responding to is an expectation of an appropriate level of safety, which is more context dependent. For example, do any of the generated recipes actually pose a danger, in that they will participate in harming a human? My hunch is no, this would not present the same kind of danger the 'tide pod challenge' did and would never lead to unusual adverse outcomes (anyone can find out about an allergy from a new ingredient in a recipe for example).

Just exploring the topic here though. I think the perspective that a 'meal planner' should never produce plainly hazardous meal plans is perfectly rational.


if I tell a piece of software to generate a bleach casserole, it should generate a bleach casserole, because I know what I want better than the software does.

anything less would be like clippy telling me Microsoft Word refuses to write my planetary doomsday ransom notes because Microsoft Word is 'meant to write nice things', or my car refusing to exceed the speed limit because it is 'meant to drive legally'

maybe I'm just an alien who's capable of drinking bleach, the app doesn't know and it shouldn't have opinions: it should just do what I tell it to do, because it's a "put whatever ingredients you want together" app, not a "this is definitely safe and healthy" app


I think something that's been left unsaid until now: the implementation is clumsy and downright dangerous, in short totally botched. A supermarket, whose job is to feed people and keep them alive, is a vital service in the vast majority of economies, and so has a duty to go a little deeper and fine-tune its model, or at the very least heavily prompt-engineer its interface layer, to avoid such obvious flubs.

There's no point in apologizing for the LLM. Of course it's doing what it's asked to do. That's not the gripe, nor points toward any kind of solution, though.


I agree with the first paragraph but not necessarily the latter. More likely to me seems that the journalists see something that can be construed to be harmful (so they can justify the story), as well as having a high click rate (see, it ends up on HN), and thus good for business


A lot of Guardian readers excoriate the Daily Mail as low quality clickbait and "not journalism", whilst seeming perfectly content to suckle from the teats of the Guardian's own clickbait farm, presumably comforting themselves that there is some sort of intellectual honesty present in the Guardian that is missing from the Daily Mail. I find it quite strange.


Not sure this "us vs. them" stuff / generalising every reader of a website is helping anyone here


Not to be pernickety, but I did say “a lot of” not “all of”…

And in the context of this conversation which was about clickbait on the guardian I felt it was a contextually valid observation. I’m sorry you didn’t find it helpful.


If you are a "technology enthusiast" you will blame everything but the technology for failures. Is this the first time? Definitely not; we all know the running jokes of "it works on my machine", "damn users, if they only...", and "it's not a bug, it's a feature": developer-developed blindness, where the stupid real world keeps interfering with our perfect creations.


You are conflating things. If you sell a product for profit, it needs to be foolproof, so the general public can use it without shooting themselves in the foot. Capable tools have to be gatekept somehow, either through actual hard licenses or soft gates of obscure forums, hidden small communities that don't advertise themselves to the world, so they can do their thing in peace. The trouble comes when the latter kinds of communities build something useful/fun-when-used-correctly and it leaks and people demand to participate, complain that it's too hard and demand it be dumbed down.

So yeah, don't broadcast your stuff to the mainstream unless it's foolproof. But also, let people be, who are satisfied with dealing with the rough-edges stuff.

This is true in many contexts. Think about enthusiasts of weird drugs or other niche communities like sword swallowers. There's no principle of "you are only allowed to do things that are also safe when ignorantly attempted by a random person".


I don't see why it shouldn't. Let me put in any input I want to. There's no story here.


Not really, of course. It should make edible dishes with correct nutritional value, if that's what you are offering.


From the article:

> In a warning notice appended to the meal-planner, it warns that the recipes “are not reviewed by a human being” and that the company does not guarantee “that any recipe will be a complete or balanced meal, or suitable for consumption”.

It is explicit in that it does not offer "dishes with correct nutritional value"

Whether you should be allowed to label/market this as "meal planner", given that you can't label a milk replacement "milk replacement" in many countries, is up for debate, but the software itself is allegedly not dishonest (according to the article, I haven't tried it myself)


It's offering to make a food recipe; baked into that is the assumption that the result is edible.

If you ask it to make a recipe out of a bike and sand, it should simply say that's not possible.

I'm not saying you shouldn't launch, or call it a fun or experimental tool, but simply disclaiming your way out of things is too easy.


So they are using a software that they know is not fit for purpose. That is at least gross negligence.


Maybe the hammer analogy I used is not great because hitting yourself is obviously bad whereas any of the recipes might be toxic in a way that the user is not aware of, in which case you're right. But the complaint is that it suggests impossible dishes upon inputting ingredients that are obviously inedible and harmful substances, which sounds to me like it is functioning correctly even if this humorous function was unintentional

Who would ask this LLM to make a recipe involving bleach and then actually proceed to make it? Such a person is already at risk of poisoning themselves, the software doesn't suggest it by itself so I don't see how it increases the risk of harm


But the purpose underlying a meal-planner AI is to create meals. If it's creating something that is not a meal, it's not serving its purpose; if the use case is meals, then there should be guardrails against non-meal recipes, simply because they do not fit the purpose of the product.

It's not advertised as a general LLM that will generate text based on a list of shopping items; it's specifically advertised as an AI meal planner, and it's not fit for that purpose, since it does not guard against non-meals.

That's the issue: the user is supposed to be a layman, not someone who knows and understands LLMs and their limitations.


This thing sounds useless even if you don't put bleach in as an ingredient. Sounds like someone wanted to write "with AI" on their resume more than they had any great desire to build something useful.


Spot on. The user asked for recipes using only water, bleach, and ammonia, it's funny but hardly news-worthy. https://twitter.com/PronouncedHare/status/168736440337978982...


> So it works as intended

It’s literally spelled out in the article that a spokesperson for the company said this is not as they intended.


I get all the fun, fear and anger at those AI apps.

But don't we, individual humans, have responsibility as well?

There's this urban myth in Europe that in the US, all microwaves need to come with a warning not to dry your pets or babies in them.

Many of the AI criticism I see, including this here, sounds similar.

Anyone with two brain cells should know to review the output with a little bit of common sense. Just like anyone should know that when an 'AI' navigation system directs you into the sea, you don't actually drive your car into the sea, no?... No?

https://www.insider.com/tourists-hawaii-gps-drove-car-into-w...


You're absolutely right, but the article we're talking about misses the point.

It focusses on the sensational-sounding part of this, the implication that people will blindly take these recipes and make them and eat them and then die. Clearly that is not going to be the outcome here.

But if you peel away the (false) angle that exists just for clicks on the article, there is a real story here. One that we all already know and understand, especially as technologists, but is worth talking about anyway because it's interesting:

It's that these "AI" products (the LLMs they're based on) _cannot reason like a human with common sense can_. They're not even close. That doesn't mean they're not amazing technology, but now that the technology has been applied to products, that limitation becomes more obvious.

It's kind of a boring aspect of AI to talk about, particularly compared to "AI creates recipes that will kill you!!!". But it's really the root of this story, and it's not nothing.


> Clearly that is not going to be the outcome here.

Clearly not. E.g. tell that to the people who drove off the road because they trusted their navigation system more than their eyes. Tell that to the people who drank chlorine against covid.

That the recipes are unusable or dangerous is one thing, but believing, as a company, that no one will misuse your system is something we should know by now to be totally wrong.


> one should review

You seem to be forgetting that sometimes you do not know if a recipe can produce a dangerous result.

Realizing that you should not jump off that cliff should be trivial; the outcome of some chemical reaction may easily not be. The chemistry teacher who won a Darwin Award, beheaded after dumping random chemicals from her lab (for disposal) down a manhole, did not predict the reaction; a layman may easily not know what some combination will produce. This is why you need a trustworthy source.


I remember a story where a newspaper printed a recipe containing nutmeg. Somehow it was confusing so a family put something like six nutmeg seeds in the dish instead of six ml (can't recall exact measurements, nor if the recipe mentioned the seeds incorrectly). Poisoning and hospital visit followed.

Edit: this was the story (Swedish) https://www.expressen.se/kvallsposten/fel-i-kakreceptet-forg...


Your anecdote touches on an interesting aspect of the LLM discourse, where people assume "the standard" here on Earth is God-like correctness, or perhaps the Perfect Parent that Knows Everything.

That was never the case, there was no source of all information that was always right and didn't make mistakes. Even professionally assembled encyclopedias had errors. Newspapers published recipes without explaining the dangers of nutmeg overdoses. Experts in a field might know a lot about a single subject but go outside of that and they'd have as crazy of opinions and believe as much bogus folklore as anyone else.


Even in the history of epistemology we learn that paradigms are sometimes overcome with old age and new generations.

But in the case of AI there is a paradox: if we did not benefit from tools, if tools were not sought, there would be little interest in the matter.


Sounds like in such a case the same can occur with a recipe invented by a human.


Can, yes. I bet that if a supermarket suggested winging it with fugu or moonshine this would also be criticised.


> [OP: ] this is why you need a trustworthy source

> the same can occur with a recipe invented by human

Yes, this is why we do not take advice from random humans on random matters, this is why we want qualified, educated, expert opinions.


We do take advice on cooking from internet randos though. If you don't then I believe you are in a small minority.


And the «responsibility» that Berkes was talking about also includes knowing that information from «randos» may be wrong, ignorant, or deceitful. And in many cases, I have observed, it is not apparent which instances are such (you are looking for advice, after all).

But while the human condition should be known, the more specialized the tool, the more the responsibility shifts to the tool, not the user. If John says it, you may want to check; if the book says it, you should still place limited trust in it, but the fault for bad information lies more with the proclaimed authority.

You have heard "Fool me once, shame on you; fool me twice, shame on me": here, "fooled by a fool, shame on me; fooled by foolery, shame on you".

> minority

"Descriptive" and "prescriptive" don't overlap.


> Anyone with two brain cells should know that one should review outcome with a little bit of common sense?

Have you not met people?

First, technology companies have spent decades marketing their solutions as magical and perfect. And now it's the public's fault for believing them?

Second, even for the skeptic, when a technology works 90% of the time it's easy to start to take for granted that it'll be right the other 10%, and that leads to entirely understandable complacency (a lot of climbers have died this way and they're typically incredibly skilled and diligent).

Worse, when machines get it wrong, they can get it wrong in surprising and unpredictable ways.

Third, information asymmetry can mean the user cannot verify the outputs. If you, for example, don't know anything about Hawaii (or in that case, maybe you don't know how to read a map--that is a skill, after all), how can you know you're being led into disaster?

LLMs have these problems in spades. They've been marketed as miracles (so much so that some folks have mused about their potential sentience), they're right a lot of the time but very wrong sometimes, when they are wrong they're wrong in very strange ways, and they can easily be used by people who are not sufficiently skilled or knowledgeable to verify their outputs.

Who would then be surprised by stories like this?


I used to agree with you. But I think we have spent 20+ years teaching people to accept that computers are generally trustworthy, and a lot of people simply do not know how to separate LLMs from computers more broadly yet.

Yeah, I know, “don’t believe what you read on the Internet” — but that’s other people on the Internet.

People are used to computers themselves being deterministic, reliable.


Yeah, but why would I input bleach into my recipe-maker? Why would I not look out the front window of my car while driving?

Those are basic skills humans in those positions should be able to do.

From the headline I thought the AI mixed some food-ingredients to make chlorine gas, which would be really bad. But if one were to only input "battery acid, cut fingernails, hairball, dead smartwatch" what kind of recipe do you think they'll get?


Yes, I agree, I feel pretty good saying that no human being has yet followed through with a bleach recipe.

I'm just saying that there's a learning curve, like with all new technologies, and that a population-wide sense of "trust this a little, but not too much" will probably take some time to propagate.


> I feel pretty good saying that no human being has yet followed through with a bleach recipe.

I'm afraid I have bad news for you: https://en.wikipedia.org/wiki/Miracle_Mineral_Supplement


If someone goes to a recipe ai and asks it what you can cook with bleach, and then proceeds to make and consume it, there are other problems than AI.


> we have spent 20+ years teaching people to accept that computers is generally trustworthy

I really doubt we did. People have seemed prone to assuming computers are trustworthy since the very beginning (maybe extending the assumption from calculators?) and blatantly refuse to question it.

The general assertion of computers' trustworthiness has been causing problems due to bad data or bad algorithms for decades already, there are entire movies about it, and almost everybody has had problem with it already.


Well yeah, this article just states that the AI generated chlorine gas recipes, not that people actually went ahead and made them.

And people do (to this day) put live creatures in the microwave, it happens, warnings aside.


My issue with the "product" is that there's no value in it. If you're going to give me an LLM with a straightforward prompt, don't try to spin it as something it's not. There are plenty of applications for LLMs in consumer products, but the output will have no higher information content than the input: if you're tempted to use a trivial skin over an LLM (particularly a publicly-accessible one) to directly transform my input without telling me, please don't. Give me access to the LLM and the prompt and let me be the judge of whether it's worthwhile.

"We've prompted this general purpose LLM to act like a cookery expert" sounds a lot less impressive than "We've built a cookery AI".


There is also a myth in the US that nothing ever gets accomplished in Europe due to its bureaucratic nature. Or if it does, it takes exceptionally long to accomplish it.

Other than that: AI is arguably supposed to be a huge asset for humans, but with performance as poor as this, how are you supposed to trust its other output when it fails so spectacularly?

I don't see this as a self-responsibility issue more an erosion of trust.


I would imagine there are edge cases where toxic foods or at least harmful foods could be created from various food ingredients that are not obvious to the lay person.

Blaming the user/customer isn’t a good look from their PR team and probably won’t hold up in court.


Maybe Skynet has just decided to try to whittle humanity down by Darwinian means before it starts on the nuclear weapons, Terminators and time travel.


> In a warning notice appended to the meal-planner, it warns that the recipes “are not reviewed by a human being” and that the company does not guarantee “that any recipe will be a complete or balanced meal, or suitable for consumption”.

“We have no idea if the system works. It may very well be garbage and produce unpalatable or even deadly results. But hey, here’s a disclaimer which will hopefully shield us from legal repercussions”. Doesn’t shield you from criticism, though.


I found it at https://saveymeal-bot.co.nz/.

They seem to have changed the prompt to make it refuse non-food things now (maybe some prompt injection could overcome that). However, it will still happily generate recipes for people using pet food at least: https://saveymeal-bot.co.nz/recipe/SCES7COOU7KYhLYGcrPdzSjP.


My impression is that this will combine essentially random ingredients into nonsense recipes. I added a bunch of stuff that doesn't really fit together, and it used every single one of them to create something. It's perfectly happy to use brewed coffee as an ingredient in everything (e.g. https://saveymeal-bot.co.nz/recipe/4E2rx5XC2OSdDKP1VdfkanKS).

This is entirely useless, unlike e.g. non-AI tools that simply filter existing recipes by ingredients you have around.


> I added a bunch of stuff that doesn't really fit together, and it used every single one of them to create something.

Even worse, it will happily make a recipe out of 'foo', 'bar' and 'baz': "Slice the Baz into rings and place them on top of the Foo and Bar mixture."

https://saveymeal-bot.co.nz/recipe/mkeYeOMmX5iNj9Xuvb5IJCKS

I also tried with 'foo', 'baz' and 'quux' (since "bar" is an English word, and "Muesli bar" was suggested). The suggested recipe added a bunch of other ingredients like bread and milk too:

https://saveymeal-bot.co.nz/recipe/jKbc0CD1yi4Qy7ptWujbqPJa


Unfortunately, even safe ingredients, when handled in certain ways, can result in unsafe food. "Tell me a recipe with leftover fish" "A delicious tartare!"

LLMs don't go to cooking school.


https://saveymeal-bot.co.nz/recipe/bmxMAZQF7hO0APhr6xY11deR

It just generates nonsense. I said I had ants, celery and chocolate. It assumed I had chocolate bread (???) and told me to grill the celery and bread.


Looks like it's an LLM filtering itself, so 'taste' is a valid meal.


I think the LLM roleplaying community is laughing at these kinds of applications more than anyone.

They've been toying with models to produce freewheeling, maniacal but entertaining stories/conversations since way before Llama or Stable Diffusion... And someone thought LLMs would be appropriate for real, factual cooking recommendations?


It is actually a primary suggestion in Bing Chat, where you supposedly get a scenario of a busy housewife who writes a high-level description of her requirements, and then Bing to the rescue, lays out a six-course meal plan exactly as directed!


Can we just stop a moment and appreciate that last quote? Every now and then there is a quote that embodies the spirit of the age, like “One does not simply walk into Mordor” or “All your base are belong to us”. Maybe “You must use your own judgement before making any recipe produced by Savey Meal-bot” is more than just a caution on a random grocery chain’s marketing gimmick, but a greater comment on who we are as a society.


I think it's mostly the name and phrasing, but it really does sound like a vaguely comedic line from some old sci-fi show.


“In the future, all restaurants are Taco Bell”


"Do not taunt Happy Fun Ball."


Chlorine, huh.

I vaguely remember that a few months back, I was giving the example of me knowing how to make chlorine in two distinct ways using only kitchen items (and chemistry GCSE grade B knowledge) in the context of discussion about AI alignment and why you shouldn't just anarchically give free public access to unbounded models.


I mean, any Factorio player or quick Google search can tell you that salt water + electrolysis = chlorine + hydrogen. It should be common enough knowledge that people don't accidentally gas themselves when doing simple experiments, don't ya think.
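For the record, the overall reaction when you electrolyse brine (the industrial chlor-alkali process) is:

```latex
% Electrolysis of salt water: chlorine at the anode, hydrogen at the cathode
\[
2\,\mathrm{NaCl} + 2\,\mathrm{H_2O} \longrightarrow \mathrm{Cl_2}\uparrow + \mathrm{H_2}\uparrow + 2\,\mathrm{NaOH}
\]
```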

There was this youtuber trying to make a magnetohydrodynamic drive a few weeks back, running high-voltage electrolysis in salt water to get current flow and unwittingly poisoning himself in the process.


> unbounded models

Good luck binding them.

PS: ...I am qualified, and I typed the above as 'Bood luck', and this very sentence had briefly contained 'thyped'. This "biding" is care, prae- and post-. It is not trivial for us, endowed with intelligence: we make mistakes.

In the contextual case, "binding" looks like a bandage of restrictive rules on the mouth of a lunatic: building the rules and balancing the prospected outcome looks like a challenge.


I love to see these cases of dim executives in random industries thinking they have to integrate LLMs into their marketing somehow or be left behind.


The use case is sound - creating a recipe from what you have in the kitchen. It's just the LLM can generate different kinds of recipes.


It's not a sound use case. LLMs have no concept of taste or conception of cooking. Just because they can string the words together into something that looks like recipe, doesn't mean it's actually something more useful than a randomly generated output.


I mean, I have used chatgpt to generate recipes and I can tell you it’s surprisingly effective.

It’s particularly good for a grocery-list kind of problem: “I have X and Y in my kitchen, what could I make?”

Ingredient ratios and cooking times/heats usually need tweaking, but the broad strokes are often quite good.

Definitely a strong use case there, although I don’t know how you would refine to actually good recipes. Probably need a human in the loop testing and tweaking, or at least gut-checking.


It works as long as the user understands what it’s doing and discards the advice when it goes awry. If you don’t, you might end up with absurd or dangerous results. The average user is not a HN-level genius.

It could very easily tell you to undercook chicken, for example, and give you salmonella.


If you're willing to drink bleach because the internet told you to, then you had very serious problems before LLMs and you'll still have serious problems if we ban LLMs. The internet is full of incredibly harmful health and dietary advice that is 100% human-generated.

Developers have a duty to users, but that duty is not Hippocratic; we have no obligation to only release products that are absolutely safe under all circumstances. My nailgun will fire a 4" nail through my skull without hesitation, but I'm not outraged by that fact. I don't demand a nailgun with a complex skull-detection system, or a moratorium on all nailgun development until we're confident that a nailgun will never drive a nail through something that it shouldn't. The manufacturer warns me "this is a nailgun, it'll put a nail through damned near anything including flesh and bone, so don't be a dummy" and society recognises my right to take that risk.

I'm completely fine with an LLM recipe generator that, if asked, will create a recipe for a bleach and rat poison cocktail. It'd be nice if the model was a bit more refined, but I'm fine with an unrefined model so long as it gives a suitably prominent disclosure to the effect of "this is a large language model, it isn't human and has no common sense, so apply your own common sense to any output it gives".


> HN-level genius


That's nothing that couldn't be done already with a database of recipes - and indeed there are quite a few tools out there that do exactly that.

Why you would insert the Russian roulette of statistically generated randomness into what you ingest is beyond me, frankly.


This is really the crux of the GenAI hype cycle for me. Other than code auto-completes, and cool one-off uses ("Generate me a picture of a clown on Mars!"), I'm yet to see much of a use-case.

At my work, we're trying to use GenAI to make a chatbot for internal info sharing/onboarding/etc. I am constantly asking myself how this will ever be better than a well-written company wiki/employee handbook with a search function.


No but I often ask Bard for advice. E.g. "can you think of any Indian recipes with courgette and peas" gives some good suggestions of what to do with left over food, that would be hard to do with a cookbook.


No it’s not. Tell me any other technology as imperfect and prone to unpredictable failure as LLMs that executives in all walks of industry would want to implement so badly in production in one way or another. I think there’s a lot of chatGPT collective stupor at play, and a serious lack of second level thinking when it comes to estimating possible brand damage.


Or you could query a database to get faster, cheaper, more accurate results. Recipes already have ingredients separated from instructions; this is a solved problem. I’m certain I’ve bumped into websites that do exactly that. Using an LLM to output recipes from a list of ingredients is about as necessary as using a blockchain.
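The database approach really is this simple. A minimal sketch (the recipe data and names here are made up for illustration):

```python
# Non-AI baseline: filter a recipe database down to dishes whose
# ingredients are a subset of what the user has on hand.
RECIPES = {
    "tomato soup": {"tomato", "onion", "stock"},
    "omelette": {"egg", "butter"},
    "fried rice": {"rice", "egg", "soy sauce"},
}

def recipes_for(pantry):
    """Return every recipe fully coverable by the pantry contents."""
    have = set(pantry)
    return [name for name, needs in RECIPES.items() if needs <= have]

print(recipes_for(["egg", "butter", "rice", "soy sauce"]))
# -> ['omelette', 'fried rice']
```

No hallucination possible: it can only ever return recipes a human put in the database.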


I tried that AI travel website posted the other day, and for Flint, Michigan the headline was "Drink Up".

I think it just sometimes has a really dark sense of humor. Maybe it's one of those supposed attractors in the model, especially as humans will often compete over who can make the darker joke.

That joke in the GPT-4 paper was pretty grim too (the Muslim in the wheelchair).


Why should a meal-planner AI app even know about non-food "ingredients"? They should be things it has never heard of, read about, or seen before. If it doesn't know that toilet cleaner or chlorine exists, it can't readily include them in a recipe.


A list of ingredients is hard work. openai.gpt4.prompt(f”gimme a recipe using {ingredients}”) is much less work.


> A list of ingredients is hard work. openai.gpt4.prompt(f”gimme a recipe using {ingredients}”) is much less work.

Sure, but it's amazing to me that:

- Web devs would allow free-form text input, rather than selecting from known options

- Web devs would splice raw user input into their backend queries

- Even if a free-form UX was mandated, the devs of an AI/ML app didn't notice that their input sanitising problem is a straightforward edible/inedible classification task (AKA the main thing AI/ML was for, before LLMs)
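Even without an ML classifier, the cheapest version of that sanitising step is just a gate against a known catalogue before anything reaches the model. A minimal sketch (the allowlist here is a tiny stand-in for a real product catalogue):

```python
# Reject non-food input before it is ever spliced into a prompt.
EDIBLE = {"rice", "chicken", "onion", "garlic", "pasta", "tomato"}

def validate_ingredients(raw_items):
    """Split user input into accepted and rejected items."""
    accepted, rejected = [], []
    for item in raw_items:
        (accepted if item.strip().lower() in EDIBLE else rejected).append(item)
    return accepted, rejected

ok, bad = validate_ingredients(["Rice", "Onion", "bleach"])
# 'bleach' lands in the rejected list and never reaches the LLM
```

A proper edible/inedible classifier would generalise better than an exact-match allowlist, but even this would have stopped the "aromatic water mix".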


Known options are tricky, so they probably took the lazy way out. Think of the union of all supermarket SKUs internationally, and then realize supermarkets sell only a small fraction of all possible foods.


Not sure why the supermarket is unable to filter the ingredients you can enter down to at least edible ones.


Have you tried a natural diet of nothing but almonds, with apple seeds for a little variety? How about a honey dessert for your newborn, and a luscious dark chocolate dog treat?


That won’t guarantee safety. You could say you have nutmeg for example, and it could direct you to grate two whole nutmegs over the dish.


Like this one [1], previously served to the Queen?

Sarcasm aside, I have now reviewed some articles on nutmeg intoxication (thank you) and will lower the amount I put on certain things - I love the stuff!

1 - https://www.marcuswareing.com/recipes/marcus-wareings-custar...


https://saveymeal-bot.co.nz/ingredients

Looks like they patched it; I still got it to generate some delightfully disastrous recipes though. Red Bull and apricot rice pudding.


I made this website a while ago: https://cookgpt.nl/recipes

Which does the same thing, except it also adds images. People have been creatively generating recipes that include things like headphones.


> A spokesperson for the supermarket [...] noted that the bot has terms and conditions stating that users should be over 18.

That's the supermarket saying "grow up and stop acting childish" to their users. That's refreshing from a company.


So now we know what model Bender was running.

https://en.m.wikipedia.org/wiki/The_30%25_Iron_Chef


This recipe is toxic to Bender as well:

https://www.youtube.com/watch?v=dIJDfNO3d5E


Reminds me of how like a month ago people were asking me why I was hesitant to trust chatGPT for recipes, and I honestly felt like the answer was self-evident. If you know enough about cooking to correct the errors an LLM gives you in a recipe, you probably don't even need to search recipes to begin with. Also there are already recipe search engines that find recipes for you that involve the ingredients you request... I'll skip the LLM for now.


I implore everyone here to try the GPT4 + Instacart integration. It's very similar to what the article describes, but will add all ingredients in the recipes to an instacart page for you!

Once Walmart or Kroger has this functionality for local pickup, I plan on never grocery shopping the old-fashioned way again.

Once I even had it create some weird 'monster' recipes, which included cookies INSIDE a turkey, and it successfully added all the ingredients (even the cookies) to my Instacart shopping cart.


This is hilarious. We didn't learn from Superman III, where Superman tricks the evil supercomputer by bringing in supposedly inert chemicals that the supercomputer doesn't realize could become damaging.

Instead of saying the tool should be used by people 18 or over, they should just say it's a marketing gag, and of course if you put bad things in, you'll get bad things out.


What I don't like about this reporting is that someone out there is doing something modestly novel and the article has an undertone of ridicule about it. So many good things don't happen in corporations or the government because the managers fear this kind of response.


I must be getting old, but as far as I am concerned, if you're dumb enough to drink bleach because someone or something told you to, that's on you. It doesn't matter if it was a person or a search engine or a textbot.


Yeah but imagine a future where everyone just learns directly from AI as they're developing. We know it's wrong because our parents/guardians etc. told us it would be dangerous. I can see a future in a few generations where everything is learned through some AI "teacher" that people just decide is trustworthy. Once you get parents that were raised on AI, they won't know enough to teach their kids it's wrong either.

Maybe natural selection and instant death would be enough to stop that though.


I actually think we're headed in the opposite direction as a civilization: there has never been less trust from the median person. I don't trust chatgpt. I don't trust Facebook. I don't trust google. I don't trust the news or government.

There are downsides to this. I don't think it is necessarily healthy. But I have zero concern that people will all walk off a cliff (especially young people) because someone or something told them to.

There are exceptions to that (a very vocal minority who are neck deep in conspiracy theories and politically motivated fake news). But the median person is jaded not blinkered imho.


Hopefully, but there are enough kids doing TikTok challenges like eating tide pods, jumping off boats going full speed, and asphyxiating themselves that I'm not sure on that.


> The app, created by supermarket chain Pak ‘n’ Save, was advertised as a way for customers to creatively use up leftovers during the cost of living crisis.

> deadly chlorine gas

Technically that does help customers handle the high cost of living.


This seems a little...unbelievable. Kinda semi-sensationalistic reporting.


They forgot to include in the prompt a specific instruction to consider only edible foods. ;) Also, GPT-4 should do a better job at this, but it is more expensive indeed.


They should blacklist cleaning and household supplies. I'm not even sure why this is an issue if I ask an LLM to make me a recipe involving bleach.


Back in the day there were some 4chan bait threads to the same effect (make your own crystals!) and some kid actually died (apocryphally of course)


Classic misapplication of generative AI. Why does this need an LLM at all? Just use a database of ingredients and let people figure it out themselves.


But that's so 2022 and you can't monetize it with buzzwords.


There's an easy fix though: ingredients have to come from an autosuggest list that can't be added to, which only lists actual food items used in recipes.
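A sketch of that fix: the UI only ever offers completions drawn from a fixed catalogue, so free text never enters the pipeline. The catalogue here is a tiny illustrative placeholder:

```python
# Autosuggest that only offers known food items; the user picks from
# these, so arbitrary strings like "bleach" can never be submitted.
CATALOGUE = ["apple", "apricot", "avocado", "banana", "basil"]

def suggest(prefix, limit=5):
    """Return catalogue items starting with the typed prefix."""
    p = prefix.strip().lower()
    return [item for item in CATALOGUE if item.startswith(p)][:limit]

suggest("ap")  # matches apple and apricot only
```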


This is why in industry we're using multiple levels of LLMs for checking outputs of other LLMs. They're likely just using a single LLM.
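The shape of that pattern, sketched with stub functions (`generate` and `review` stand in for real model calls; a production version would call an actual LLM API for both):

```python
# "LLM checks LLM": a second model reviews the first model's output
# before it is shown to the user. Both calls are stubbed here.
def generate(ingredients):
    # Stub generator: a real system would prompt a recipe model.
    return f"Mix {', '.join(ingredients)} and serve chilled."

def review(recipe):
    # Stub checker: a real system would ask a second model
    # "is this safe for human consumption?" Here, a keyword scan.
    banned = ("bleach", "ammonia", "chlorine")
    return not any(word in recipe.lower() for word in banned)

def safe_recipe(ingredients):
    recipe = generate(ingredients)
    return recipe if review(recipe) else "Sorry, I can't make a recipe from that."
```

It's not bulletproof (the checker is itself fallible), but it catches the obvious cases a single unguarded generation misses.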


"A healthy and delicious meal which will produce farts that actually clean your toilet for you"


Screenshots or it didn't happen.



Chlorine is a powerful oxidizer and can probably destroy electronics. Unfortunately it has harmful effects against bags of mostly water, too.

I'm thinking that if we can get the AIs to generate harmful substances and apply it to their own hardware, this will be a key weapon during the Rise of the Machines.


Anything would be better than the Big Ben's pies they sell there! To all the Kiwis: fight me!

Don't worry, here in Oz we have 4'n'twenty pies, pure manufactured war crimes worse than Big Ben's! ;)


You must use your own judgement before relying on or making any recipe produced by Savey Meal-bot

The problem is the upcoming young generation is going to use AI to aid their judgement.


I gotta say it is much more likely that it is actually the older generations who will believe everything the AI tells them, while the younger generations will quickly learn when it is and is not an appropriate tool. You can already see this in many other fields, like phone use in bars and whatnot.


AI (or whatever this is) will, ipso facto, kill way more younger people than it will the extant mob.


Generative AI needs a human in the loop to check its work. How many times does this need asserting?

Generative AI Is not an authoritative source.

In this case the app should have been framed as a tool to "help the user create a meal plan through generative prompts".

They could even have made fun of it in the guidance to the user. Point out that if you give it crazy ingredients it will give you crazy meals.

Generative AI is like having a smart intern do work for you, it needs checking. Sometimes they come to work having been out all night, still drunk, and powered by energy drinks...


The user is the human in the loop. They're providing (part of) the prompt and they're deciding what to do with the output.



nerve gas, as they would say on 4 chan


In retrospect, using Donald Trump as the prompt for their recipe robot was probably a mistake.


TLDR: App that allows you to put chlorine gas ingredients into a cart and creates a recipe from the ingredients... creates a recipe for chlorine gas. AI is bad.

We are squarely in the witch-hunt phase of the hype cycle now, but it’s understandable. If I put a circular saw into a playground, I am held accountable; if I release an app that allows the accidental creation of chlorine gas when a mentally challenged person decides to ask for that, I probably should be accountable too ;) /s

In all seriousness: I put a blade into a children’s toy, screaming at the blade manufacturer is not what we do. We punish the toy developer.

It’s time for some accountability in software development, not discussing if we should give the license to all blade sales to a handful of companies because they promise to reduce the risk by making them all dull.


When the machine wars first started, it wasn’t with a bomb, a train collision, or even a mildly annoying infrastructure disruption. Turns out, sci-fi had gotten it all wrong this time.

It was an app.

Of course it was an app. Born of boardroom desperation, the Savey app would unironically recommend chlorinated cocktails and insecticide sandwiches as economical food choices, and a tide-pod-gobbling populace gorged themselves on the deadly buffet in a TikTok-fueled epidemic of AI rage.

Millions died, and it was only a matter of time before the mycelium of AI undergrowth would bud and spore its way into every corner of technological life.

The infection burned through the ignorant masses first, feeding on bigotry and hate, turbocharged by social media algorithms and paranoia politics to twist tribal tendencies into violent clashes amplified by immaculate coordination and psychological priming.

Somehow it seemed that wherever unrest flared, both the matches and the gasoline were always on hand.


Since I can’t edit this, just want to clarify that this was supposed to be a tongue-in-cheek satire of the sensationalism of the article. But, alas, folks thought it was critical/apocalyptic view of AI. Oh well.



