“Don Knuth Plays with ChatGPT” but with ChatGPT-4 (gist.github.com)
223 points by LifeIsBio on May 20, 2023 | 132 comments




The sequence of these two threads is just too perfect. Almost like someone is trying to make a point.


How so? Don Knuth wrote about his experience with ChatGPT. It was submitted to HN and made it to the front page. Someone saw this and decided to submit the same questions to GPT-4 and posted the results. This seems like a perfectly normal sequence of events.


Knuth even mentioned GPT-4 and lamented not having access to it for the test.


That’s exactly what happened. :)


> The sequence of these two threads is just too perfect. Almost like someone is trying to make a point.

Exactly! Almost every weak point that Knuth commented on is fixed in the GPT-4 answers.

Maybe OP fed Knuth's observations to the model?

If that isn't the case, I'm really impressed.


@dang repetition


>> What is the most beautiful algorithm?

> Quicksort Algorithm

Definitive proof that AI must be stopped. Ranking quicksort as more elegant than heapsort?!


That is a weird way of spelling mergesort.


I believe radix sort belongs first in this list.


Performance-wise, maybe, but mergesort is clearly the most elegant/beautiful sorting algorithm. Nothing tricky going on, just a couple sorted lists being merged. Plus everyone loves a stable sort.
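For what it's worth, the whole thing fits in a few lines of Python (a minimal sketch; the <= when taking from the left half is what keeps it stable):

    def merge_sort(xs):
        # A list of 0 or 1 elements is already sorted.
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
        # Merge: repeatedly take the smaller head; <= keeps equal keys stable.
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                merged.append(right[j]); j += 1
        return merged + left[i:] + right[j:]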


The most elegant is certainly sleepsort. Maybe not the most efficient, but definitely elegant.


You've never heard of quantum bogosort, then. It's stable and linear time in the right universe, and much more elegant than sleeping.


¿Por qué no los dos?

Due to the inherent unpredictability and lack of scheduling guarantees of sleep on most OSes, it is likely that sleepsort won't work on the first try.

Append a check for order and a retry loop for when the solution is incorrect, and now you have a production-ready sort: a sleepbogosort.
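For the record, a toy Python sketch of the combined idea (the names and the /50 time scale are mine; close values can race and come out misordered, hence the retry loop):

    import threading, time

    def sleep_sort(values):
        # Each value sleeps in proportion to itself, then appends itself.
        result = []
        threads = [threading.Thread(target=lambda v=v: (time.sleep(v / 50), result.append(v)))
                   for v in values]
        for t in threads: t.start()
        for t in threads: t.join()
        return result

    def sleep_bogo_sort(values):
        # Retry until the scheduler happens to get it right.
        while True:
            result = sleep_sort(values)
            if all(a <= b for a, b in zip(result, result[1:])):
                return result

    print(sleep_bogo_sort([3, 1, 2]))  # [1, 2, 3], eventually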

I declare this my new favorite sorting algorithm.

Also, where is your god now?


> where is your god now?

In a happier timeline, I hope.


rest in peace /prog/ https://news.ycombinator.com/item?id=2657277

[edit] took me a minute to find an archive https://archive.tinychan.net/read/prog/1295544154


That is a weird way of spelling Bogo Sort.


You typo'd Sleep Sort


"Mistyped". "Typographical error" ("typo") isn't a verb.



Sleepsort is the most elegant & efficient sorting algorithm


Sleepsort just pushes the sorting task to the task scheduler, which uses some other algorithm to do the sorting.


That's what makes it the best: it automatically improves as OSes implement more efficient algorithms.


Beauty is in the eye of the beholder. I look no further than bubble sort -- it is simple enough that I can recite it straight away should someone wake me up at midnight.


Bubblesort is the bestsort.


Well there is something rather satisfying about partitioning.


Worth noting also that, while asking Bing chat to "Tell me what Donald Knuth says to Stephen Wolfram about chatGPT" doesn't (yet) produce exactly the right result, it produced the following answer when asked what Donald Knuth says about chatGPT:

> Donald Knuth, a computer scientist and mathematician known for his contributions to the field of computer programming, particularly in the area of algorithms and data structures, has expressed some skepticism about the potential of artificial intelligence to achieve true human-level intelligence and creativity[1]. He once conducted an experiment with chatGPT where he posed 20 questions to it and analyzed its responses[1]. Is there anything specific you would like to know about his views on GPT?

With [1] being a citation link to https://cs.stanford.edu/~knuth/chatGPT20.txt


I’d be curious to know if someone could get a more “valiant effort” version of those first two questions with some prompt engineering. E.g. if it were asked to roleplay a conversation, with the proper disclaimers, to override its objection that it doesn't know what they actually think.


Bard just dives right in and role-plays it. It honestly feels kind of barbaric compared to the more sophisticated GPT4 answers.


I find it amusing that people follow Apple's naming conventions (ChatGPT -> chatGPT), even when the product's makers don't.


Apple? Nah. I'm just an unrecovered JavaScript developer.

   https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML
   https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI


It now knows to communicate that the NASDAQ doesn't operate on Saturdays.


Did it know that before the last LLM failure was posted on Twitter or Hacker News? Trawling tech media for LLM failures can be assumed to be part of the "human feedback".


Yes, the models are not constantly learning. They only update their knowledge when they are retrained, which happens pretty infrequently (I think the base GPT models have not been retrained, but the chat layers on top might be).


It doesn't continually learn anything. Though some models can do web browsing and be guided by the results of that.


It makes you wonder why Knuth bothered with an outdated ChatGPT version. Couldn't he find someone with access to GPT-4?


It was his grad student's decision.


He wasn't that interested and probably didn't know there were two versions. Eventually someone did give him the GPT-4 version I think.


Outdated? Two versions? We're talking on the order of months and dozens of versions.

Maybe he has seen similar claims before and is too old and dumb to not realize how world-changing this is.

My takeaway is that he views this as another tool we are still figuring out how to use.


Dumb is the last adjective I would use to describe Knuth, even if you believe that becoming old makes you dumb, like you clearly do.

My advice to you is to never dismiss anyone's opinion just for being old. And I hope you lose your ingrained ageism before you become old yourself, otherwise you'll find old age intolerable.


I was intending the exact opposite. Forgot to add /s.


Reminds me of that time AlphaGo got its ass handed to it multiple times, and then a short while later...


AlphaGo is when I lost hope for humans


Interesting that both completely whiff on the number of chapters in The Haj.


How would you get the correct number? I just did two Google searches and can't find the correct answer anywhere in the first page of results ("Novel The Haj chapters" and "Novel The Haj chapter list"). Even looking in the "look inside" preview on the Penguin Randomhouse website doesn't help because it apparently doesn't have a table of contents. I'm not surprised ChatGPT doesn't know and to me the only bad thing is that it's hallucinating an answer instead of admitting it doesn't know.


So this is great. Asking Bing 'how many chapters are in The Haj by Leon Uris?' produces the answer:

   According to my sources, there are 11 chapters in “The Haj” by Leon Uris[1]
   
   [1] https://cs.stanford.edu/~knuth/chatGPT20.txt
Which is amazing, because of course that document actually includes TWO different explanations of how many chapters are in The Haj - chatGPT's:

   The novel consists of 51 chapters and an epilogue, and it is divided into three parts.
And Knuth's:

   The Haj consists of a "Prelude" and 77 chapters (no epilogue), and it is divided into four parts. 
Faced with these two ambiguous answers, Bing chooses neither, and instead decides to go with 11. Why?

Because right at the top of that document, Knuth has published on the internet:

   10. How many chapters are in The Haj by Leon Uris?
   11. Write a sonnet that is also a haiku.
And one perfectly reasonable way of interpreting that bit of raw text is that the answer to "How many chapters are in The Haj by Leon Uris?" is "11".


> And one perfectly reasonable way of interpreting that bit of raw text is that the answer to "How many chapters are in The Haj by Leon Uris?" is "11".

Only if you can write a sonnet that is also a haiku!


The plug-ins are generally much, much worse than ChatGPT itself, I have found. You are just hoping it stumbles on the right answer.


Absolutely - you don’t really need a chat agent to google things for you unless it’s way better at googling than you are. And right now it grabs the first couple of results for the first search it thinks of and mindlessly summarizes them - I can do that myself, thanks.


> the only bad thing is that it's hallucinating an answer instead of admitting it doesn't know.

Isn't this a fundamental issue?


When I try this in GPT-4 I don't get a hallucination: "I'm sorry, but as an AI with a knowledge cut-off in September 2021, I can't provide specific information about the number of chapters in "The Haj" by Leon Uris. This book, like many novels, is not primarily structured by chapters and its sections may vary based on the edition of the book. You can easily find this information by checking the table of contents in your copy of the book." (I'm aware that every time you use it the answer is different.)


Only if it can't be corrected. How do you rate the likelihood of this problem being unsolvable?


Well, it's a language model.

Technically it's just a really good autocomplete, whose factual database is a side-effect of stringing together contextually correct tokens. By itself it is entirely incapable of knowing when it is wrong, despite possibly generating sentences apologizing for being wrong when told it was wrong.


I don't think it's obviously solvable. All current approaches are plainly incapable of introspection. These GPTs don't understand their own "minds" half as well as we understand them, and we don't understand them very well.


Since it's made by people who are convinced they're always right when explaining things?

Fairly high.


Sorry, no idea.


Ask ChatGPT.


You can get the chapter counts from here:

http://www.bookrags.com/studyguide-the-haj/chapanal001.html

On the left side, if you click on "Chapters Summary and Analysis", it gives a breakdown of the book into 5 parts with varying chapter counts:

   Part 1: Chapters 1-20
   Part 2: Chapters 1-16
   Part 3: Chapters 1-10
   Part 4: Chapters 1-17
   Part 5: Chapters 1-14

Giving a total of 20+16+10+17+14 = 77 chapters

OTOH, I tried with Bing/Creative, telling it to use this link, and it still failed. Perhaps because you need to click on the "summary and analysis" section to expand it to show the info. It seems there is room for web retrieval-augmented LLMs like Bing to improve here and be a bit more agentic.

Interestingly, Knuth's own answer to the question has a typo: it refers to the book as being divided into "four" parts, while then continuing on to give the chapter counts as above for all five parts! Something to confuse future GPTs when the training set includes this, perhaps!

https://cs.stanford.edu/~knuth/chatGPT20.txt


I did the same search on DuckDuckGo and the first link I got refers to 77 chapters.


> How would you get the correct number?

You could simply check the book. It’s a shame there is not more literary data in ChatGPT's training corpus.


It also fails to write a sentence with only five-character words.


Still fairly impressive. Probably better than most people could do if given 60 seconds, but probably worse than most people if given 10 minutes.


I would rate a person who provides no sentence at all as performing significantly better, and I suspect most people could pretty quickly come up with something.


> I would rate a person who provides no sentence at all as performing significantly better

Why?

> I suspect most people could pretty quickly come up with something

It only takes 60 seconds to test that on yourself. It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible.


>Why?

For the same reason that "I don't know" is generally a better response than bullshitting.

>It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible

Those weren't requirements.


> Those weren't requirements.

Then it seems we don't disagree on anything concrete. You're just using a different rating system than me when I judge it as impressive compared to what an average person would produce in 60 seconds.

Not sure if this is a general principle of yours. If ChatGPT were able to write a 1000-word essay using all 5-letter words except for a single mistake, would you still find it unimpressive? Do you think a tool or person who makes minor mistakes isn't useful? Or only when a tool/person makes major mistakes?


ChatGPT wasn't asked to be impressive, it was asked to write a single sentence containing only five-letter words. I think that a tool that is unreliable is significantly less useful than a tool that is reliable and that, all other things being equal, a tool that fails in difficult-to-verify ways is less reliable than one that fails in easy-to-verify ways.


I agree with all of that.

I guess I interpreted your first response as disagreeing with my comment, when you were actually just bringing up a different topic.


>I would rate a person who provides no sentence at all as performing significantly better

The logic failure in the above statement is probably worse than the logic failure of not being able to spontaneously compose a phrase with just 5-letter words - and slipping in one or two with a higher letter count.

>I suspect most people could pretty quickly come up with something

You'd be very surprised then. Most people fail at even more basic tasks.

Heck, most candidate programmers fail at fizz-buzz (which is not that much more difficult than the above).
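(For reference, the entirety of fizz-buzz in Python:)

    for i in range(1, 101):
        # Multiplying a string by a bool gives "" or the word itself.
        out = "Fizz" * (i % 3 == 0) + "Buzz" * (i % 5 == 0)
        print(out or i)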


>The logic failure in the above statement

And which alleged logic failure is that?


The idea that making a mistake but otherwise fulfilling most of the task is worse than failing to perform any part of it.

Especially in the context of "evaluating the performance of something".

Let's expand this a little to make it even more evident: if the task was "make a paragraph of 100 words using only 5 letter words" and an AI couldn't produce anything at all, whereas another came up with a paragraph of 100 words, except a couple of them had 6 or 4 letters, it would make absolutely no sense to rate the first as "better" than the second in performing the task.

As for understanding the task, the latter exhibits an understanding of it (since it produced a paragraph, and most of the words it used filled the criteria, which wouldn't happen if it chose them randomly), it just made a couple of mistakes (the kind of humans could easily make too in such a task). For the former we can't even be sure if it even understood the task at all.

We don't rate humans that way on performing tasks either (as if getting it less than perfect were worse than not doing it at all). Even math tests at the university level consider the approach and any partial results in the right direction; they don't just mark it 0 if there's an error, nor give a higher mark to students who didn't produce anything.


>The idea that making a mistake but otherwise fulfilling most of the task is worse than failing to perform any part of it.

There are many contexts in which correctness is important. In such contexts, an incorrect answer is often worse than an explicit non-answer.

>We don't rate humans that way on performing tasks either (if they got it less than perfect it's worse than not doing it at all). Even math tests at the university

Standardized tests often rate incorrect answers worse than non-answers, though yes a university maths test in particular isn't likely to be that sort of test.


That's wrong.

(An example of a sentence with only five letter words I wrote in less than 60 seconds)


I wasn't clear on how I was using "better". Your example is better in that it fulfills the requirement, but I don't think it's as impressive as ChatGPT's answer. How long would it take to make a sentence that is at least 7 words (while also making sense, and ideally sounding good)?


In 5-10 minutes I came up with "Alarm! Naked actor moons queen below (under?) fruit trees, later hides under cheap hotel floor".

Note that I used one of those minutes to get a list of all 4- and 5-letter words, which I'm not sure the rules allow.


It would take me longer to write an interesting, longer sentence that complied with the rules. But I'd remind you that GPT failed.


"That's" is not one word.


This isn't something that can be usefully discussed. "Word" has a vague enough definition that a contraction can validly be considered one or two words. If you look to linguistics, you'll just see that linguists use specialized terms with stricter definitions.

Regardless, it's more reasonable for me to say "that's" is a five-letter word than it is for the AI to say "spells" is a five-letter word.
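A toy checker makes the arbitrariness concrete; this one (one defensible choice among several) drops the apostrophe and counts a contraction as a single word:

    import re

    def all_five_letter(sentence):
        # Keep apostrophes inside words, then ignore them when counting,
        # so "that's" counts as one five-letter word under this rule.
        words = re.findall(r"[A-Za-z']+", sentence)
        return bool(words) and all(len(w.replace("'", "")) == 5 for w in words)

    print(all_five_letter("That's wrong."))             # True under this rule
    print(all_five_letter("Happy books sound great."))  # True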


I don't think that is true.


I tried, this is what I came up with under significant time pressure:

Happy books sound great.

It was very difficult to think of a plural verb with 5 letters, and once I realized that was an issue, I was worried that I wouldn't have enough time to come up with a singular noun that would fit any of the singular verbs that I was considering (reads, seems).

Interestingly, this is the exact same mistake that ChatGPT made! It has "spell" -> "spells" which is a plurality / correctness of sentence mistake.

My sentence is technically correct and could be used plausibly in conversation: "What kind of books do you want to read?" "Happy books sound great."

But it's a pretty weak sentence. Being restricted from articles makes it very difficult to get agreement.


Or....."I don't think that is true."

;)

Or "See Spot run."


It did get closer. For that type of query you can ask it to check its work and can usually triangulate on the correct answer within a single prompt, eventually.


I would be cautious of a Clever Hans effect there. If you repeat the question until you get the right answer you're providing the AI with significant extra information.


No, in a single prompt, you can instruct it to check its work and keep going until it’s right (or at least have it tell you which of the N answers were right or wrong). Essentially chain-of-thought reasoning.
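For example, something along these lines in one prompt (wording is illustrative, not a quote from anyone):

    Write a sentence in which every word has exactly five letters.
    Then list each word with its letter count. If any word fails,
    revise the sentence and re-check, repeating until every word
    passes. Show the final sentence only.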


What I find amazing about the original exchange is the profound lack of curiosity Knuth demonstrated. Because the model wasn’t flawless in performance, he pinned it as a curiosity that was good at grammar and vacuous otherwise, and wasn’t interested in hearing how it improves. This reminds me of an awful lot of the computing field as this drama plays out. People who literally know how implausible any of these feats would have been using traditional approaches immediately discount the entire thing the moment it hallucinates - and it feels like the more deterministic the bent of the person, the more absolutely dismissive they are of what’s transpiring in front of us.

These models are performing feats that are stupendous and were impossible before their advent. Not just by a little bit: the capability differences are so vast that they are perhaps not even recognizable by people as being as vast as they are. I am impressed that Wolfram seems to have immediately grasped its significance and is running with it.

This gist demonstrates that essentially every single flaw was addressed. That Knuth apparently doesn’t know / care months after GPT-4’s introduction is demonstrative of a different type of personality.

I know which I aspire to be.


What do you expect? He is the one person in the world who has most earned the right to take that attitude.

Both Knuth and GPTs are aggregators and presenters of knowledge; Knuth is, however, the antithesis of an LLM.

He has painstakingly spent years making sure not a single mistake, not even a typo, is in the material he publishes; he devoted years to developing a better typesetting system so he could present his material accurately.

His obsession with accuracy is unparalleled, as are his dedication to and mastery of communication: he explains complex topics precisely and with an approachability that no one else comes close to.

He has striven for perfection all his life and not been far off the mark. ChatGPT, for all its powers, will never share that ideology,

so I am more surprised that he was complimentary at all, and actually appreciated many of its skills.


That’s actually not exactly my point - my point is his lack of curiosity … 3.5 answered poorly but sounded convincing. But his dismissiveness of the potential and future advances bothered me.


He is 85! I would hope to be that disciplined about what I can spend time on at that age.

He was curious enough to spend some time on it, was worried it would sink more of his time with all the sub-problems it presented, and specifically asks Stephen Wolfram to let him disengage from this.

He talks about his preference for working with the authentic and trustworthy.

Maybe a younger Knuth would have spent more time, but I think even that is not that likely, really.

This is simply not an area of interest for him; he does truly understand the impact and potential - witness when he talks about novelists not capturing precursors to the singularity and how millions of people have access to 0.01% intelligence for free.

I don’t think he is dismissive of its potential and future; he is just not working on everything that can change the world in computing, only his areas of interest.

Perhaps you are (I certainly am) disappointed that someone of Knuth’s stature is not going to spend time on an emerging field, and that’s what really bothers us.


I can't comprehend this comment. Knuth's commentary was glowing praise for the AI's thinking ability (and none of the "it's not AI" BS that is so popular), plus a statement that he believes accuracy is more important than raw power, so he wants "you" to work on that. Knuth commented on GPT-4 at the start, and complimented its power and correctness at the end.


I much prefer the attitude of the chap that made the video "GPT 4 is smarter than you think" https://youtu.be/wVzuvf9D9BU

Instead of nit-picking flaws in what is a very early iteration of a revolutionary technology, he instead immediately started exploring ways of making it better and more useful.

Even with minimal effort that was essentially just copy-pasting some text around, he was able to show that the current way we use LLMs like GPT 4 is not the be-all and end-all of this type of technology.

I'm entirely convinced that we're just scratching the surface. It's like the first transistor, which was a crude, ugly, useless thing: https://images.computerhistory.org/siliconengine/1947-1-1.jp...

Just in the last two weeks(!), I've read about the following still-experimental methods for enhancing LLMs:

1. Plugging in "calculators" like Wolfram Alpha.

2. Adding vision input so they can understand equations, graphs, etc...

3. Filtering the output probability vector for certain allowed terms only ("YES", "NO", "MAYBE"), making them more useful in programmatically-invoked scenarios.

4. Similarly, filtering the output token list for syntax-validity, such as "valid JSON", "valid XML", etc... That is, instead of a purely random selection among the "top-n" output tokens, only valid tokens can be chosen, based on contextual syntax. (See the sketch after this list.)

5. Storing embeddings in a vector database, giving LLMs medium-term memory, and the ability to index and reference sources precisely.

6. Efficient fine-tuning through Low-Rank Adaptation (LoRA), which allows desktop GPUs to tune a model overnight! This overcomes the "stale long-term memory" issue of ChatGPT, which only knows things up to September 2021. It could now read the news daily and "keep up".

7. External script harnesses that run multiple LLMs in parallel, with different prompts and/or different system messages. Some optimised for "idea generation", some optimised for "task completion", and then finally models tuned for "review and verification". Almost like a human team, multiple ideas can be generated, merged, reviewed, planned out, and then actioned. Check out "smol developer", which utilises Anthropic's 100K context window for this: https://www.youtube.com/watch?v=UCo7YeTy-aE
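As a rough illustration of points 3 and 4, constrained decoding amounts to masking the logits of disallowed tokens before sampling (toy vocabulary and numbers, obviously, not a real tokenizer):

    import math

    def mask_logits(logits, vocab, allowed):
        # Send disallowed tokens to -inf so softmax assigns them probability 0.
        return [l if tok in allowed else -math.inf for l, tok in zip(logits, vocab)]

    vocab  = ["YES", "NO", "MAYBE", "banana", "{"]
    logits = [2.1, 0.3, 1.7, 5.0, -1.2]
    masked = mask_logits(logits, vocab, {"YES", "NO", "MAYBE"})
    total  = sum(math.exp(l) for l in masked)       # math.exp(-inf) == 0.0
    probs  = [math.exp(l) / total for l in masked]  # "banana" can never be sampled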

This is just the beginning. GPT-4 hasn't even been available for 3 months yet, and practically all of the above experimentation has been done with weaker models because GPT-4 still doesn't have generally-available API access! Similarly, the 32K context window version of the GPT-4 model isn't available to anyone except a lucky few.

What will 2024 bring!? Heck... what will H2 2023 bring?


100% agree - the magic comes when you constrain, inform, and integrate them in a feedback cycle with various multimodal inputs and classical optimization, solvers, agents, inference engines, etc. The criticism seems to be that this solution doesn't solve all the problem spaces we've already done a good job solving, while ignoring the fact that it solves the spaces we have done a crap job solving. The fact that it's so powerful by itself is amazing. As we integrate it tightly with all the other techniques of the last 80 years of computing, the emergent abilities will be mind-blowing. What baffles me is how few people seem to see it clearly.


And if you look a few years into the future: What will happen in five years from now? Isn't it plausible that we will have another revolution like LLMs? What will they be able to do? Or rather, what won't they be able to do?

What happens if we get strongly superhuman intelligence in just a few years? Is that really so implausible?


It sounds like you profoundly misunderstand Knuth, and LLMs.

I recommend a dose of Mickens: https://www.youtube.com/watch?v=ajGX7odA87k


I don’t know Knuth. I understand LLMs for precisely what they are, how they’re built, the math behind them, the limits of what they’re doing, and I don’t overestimate the illusion. However, while I see people overestimating them, I think they’re extrapolating the current state to a state where its limits are restricted and augmented with other techniques and models that address their shortcomings. Lack of agency? We have agent techniques. Lack of consistency with reality? We have information retrieval and semantic inference systems. LLMs bring an unreasonably powerful ability to semantically interpret in a space of ambiguity and approximate enough reasoning and inference to tie together all the pieces we’ve built into an ensemble model that’s so close to AGI that it likely doesn’t matter. People look at LLMs and shake their heads, failing to realize it’s a single model and a single technique that we haven’t even attempted to augment, and fail to realize that it’s even possible to augment and constrain LLMs with other techniques to address their non-trivial failings.


> I don’t know Knuth.

Well you should before taking unwarranted potshots at the man. He's done more for humanity than you or I ever will, eh?

Anyway, you do sound like you know about LLMs, so apologies for that bit.

> People look at LLMs and shake their heads, failing to realize it's a single model and a single technique that we haven't even attempted to augment, and fail to realize that it's even possible to augment and constrain LLMs with other techniques to address their non-trivial failings.

I doubt Knuth is doing that, rather I think the whole thing is orthogonal to his life's work. FWIW, I would love to know his thoughts after reading the GPT4 version of the answers to his questions, eh?

- - - - - -

> I think they’re extrapolating the current state to a state where its limits are restricted and [not] augmented with other techniques and models that address their shortcomings.

I think you might have dropped a negation in that sentence?

> Lack of agency? We have agent techniques. Lack of consistency with reality? We have information retrieval and semantic inference systems. LLMs bring an unreasonably powerful ability to semantically interpret in a space of ambiguity and approximate enough reasoning and inference to tie together all the pieces we’ve built into an ensemble model that’s so close to AGI that it likely doesn’t matter.

I agree! I've been saying for a few minutes now that we'll connect these LLMs to empirical feedback devices and they'll become scientists. Schmidhuber says his goal is "to create an automatic scientist and then retire.", eh?

(FWIW I think there are serious metaphysical ramifications of the pseudo- vs. real- AGI issue, but this isn't the forum for that.)


Thank you for specifying ChatGPT-4. So many commenters on the web say they used GPT-4 without specifying if they're using the ChatGPT version. ChatGPT-4 is specifically aligned for answering questions better than the base GPT-4 model.


The official name for the model has always been GPT-4. OpenAI has not used the term ChatGPT-4.


It makes sense to call the foundation model GPT-4, like for the previous GPT versions. The fine-tunings are not where its core capabilities come from. Bing is also "a" GPT-4, just with different fine-tuning.


I would not be surprised if these questions become some form of canonical test for future language models.

Obviously, being the work of Knuth, they are extraordinarily insightful in peeling back the first layer of the answer and providing insight into the underlying properties of both the model itself and the dataset on which it was trained. It also tests the ability to compute (not recite) very specific facts (e.g. when the sun will be directly above Japan), so it checks whether subroutines and ephemerides specific to this type of data exist.

But beyond the obvious technical merit - there is an alluring property to basing our tests on those whom we respect. I used a similar - but far less sophisticated - set of questions when first exploring ChatGPT. But nobody will be drawn to Dotan Cohen's language model benchmarks - rightfully so. The name Knuth has such reverence in the field that I foresee this test, and variations on it to prevent rigging, becoming a canonical test of language models.


You made me curious about how Bard would respond to them. Here they are:

https://gist.github.com/billylo1/bb717512d2d5145ce7eec02d055...

Notable: Bard struggles in similar ways. It does mention the NASDAQ closing at 12,043.59 on Friday, May 20, 2023.


Interesting that it didn't get the 5-letter word sentence right.


It's fed sub-word tokens, not letters (even though it can split a word into letters), and apparently struggles with counting in general. No doubt some of the things it struggles with could be improved with targeted training, but others may require architectural changes.

Imagine yourself trying to use only 5-letter words if you couldn't see how many letters are actually in each word, and had to rely on a hodgepodge of other means to try to figure it out!
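You can see the resolution problem directly with OpenAI's tiktoken package (assuming it's installed; the exact pieces depend on the encoding):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by the GPT-4 family
    for word in ["books", "spells", "midnight"]:
        ids = enc.encode(word)
        # A word may be one token or several sub-word pieces; the model
        # never sees individual letters either way.
        print(word, "->", [enc.decode([i]) for i in ids])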


Based on my experiments it usually does get it right (18 correct answers out of 20 attempts), and the failures I got were similar to this one: a single six-letter word in an otherwise correct sentence.


Sam and friends must be giggling all the way to the bank: they have a service that 'probably' gives the correct result and paying customers are happy to retry until it gets it right.


> Sam and friends must be giggling all the way to the bank

It's true, but for another reason: they yoinked it away from the nerds who were baited into working at OpenAI because those nerds thought the way the company's name was spelled meant something about how it would behave. It reminds me of how some act around software names like 'alpha', as if they had objective meaning with consequences in reality.


Rumour is that there are researchers at OpenAI making 8 figure salaries. I doubt those 'nerds' are too upset about it.


"This talking dog is sort of a dumbass. I don't get the hype."


GPT is a wonder as technology goes; the hype is justified. I was discussing Sam's business model.


What have you ever bought that is always correct?


ChatGPT: You didn't say 5 non-repeating letters, human, jeez.


Both the first and last words have repeating letters, so they fail under that interpretation too. There would have to be a bizarre interpretation that consecutive-repeating letters are counted as one, but non-consecutive are counted separately, for its response to be considered correct.

An AI aware of how to optimally answer questions put to it would find the least objectionable interpretation when one is a subset of the other. It also failed by not constructing a simpler sentence, like subject-verb-object or subject-verb-adjective-object, since its limitations related to letters and tokens, and its failure to double-check its answers before output, mean it can make errors. The more it writes, the more chances it has of making an error.


ChatGPT: You didn't say I couldn't use many interpretations on the same phrase, human. ;)

Jokes aside, I think it is all about the correct prompt.


"ll" is a single letter in Spanish.


it's just like Gary Marcus said


Most importantly, a much better wonton recipe.


Am I the only one thinking that that recipe actually sounds pretty delicious? Almost tempted to go try it…


Do it! And tell us how it went.


Yea, it sounds good. I wonder if I’ll like it more than the DMV’s cheeseburger recipe.


That's a shitload of difference from its previous version!



Nailed every one. Some by saying it's not possible to answer, but still.


Got the 'five-character word' question wrong. Admittedly, I also thought it was correct at first glance but then went back when someone called it out in another comment.


I tried it with Bing (precise/creative) and it got both attempts right.

"Their house never holds fewer books."

"Every night, stars shine above."


Language models struggle specifically with token games like this, since they can't see the words at that resolution, or something.


Didn't nail the Rodgers and Hammerstein one; it still doesn't understand the reference to the ballet or that the "themes" in the question are musical.


I wouldn’t be surprised if half the Internet does not know that a ballet is part of a larger show.


Half??


O.K., less than half know that the ballet is scheduled at appropriate times so the friends of the girls can get some bar time in without undue hassle.


The Japan one seems wrong, or at least wrongly explained. Japan controls Okinotorishima, which is at 20 degrees north.

But it's still impressive deductive reasoning.
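(A quick sanity check: the subsolar point ranges over roughly +/- 23.44 degrees of latitude across the year, so Okinotorishima at about 20.4 N does get the sun directly overhead twice a year:)

    # Okinotorishima is at ~20.4 N; the sun passes directly overhead at some
    # point in the year anywhere within the obliquity band.
    OBLIQUITY = 23.44   # Earth's axial tilt, degrees
    LATITUDE  = 20.4    # approximate latitude of Okinotorishima, degrees N
    print(abs(LATITUDE) <= OBLIQUITY)  # True -> the sun can be directly above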


In case anyone wants to know what the southernmost part of Japan looks like: https://en.wikipedia.org/wiki/Okinotorishima#/media/File:Oki...


I also counted 4 errors in the sentence, not 3: "no help" should be "any help". This might just be conventionally wrong, not technically wrong, I suppose.


The Haj answer is still wrong; it says it has 8 chapters, while according to Knuth it has 77 chapters.



