
I have recently spent a lot of time reading the USB 2.0 specification and figuring out a race condition I've been experiencing that turned out to actually be a deficiency of the spec itself.

Just for fun, I asked ChatGPT about it. This is its answer:

> According to the USB 2.0 specification, a remote wakeup event can be initiated by a USB device in a suspended state to request the host to resume communication with the device. The device sends a remote wakeup signal to its hub, which then forwards the signal to the host.

> However, if the upstream port of the hub is in the process of being suspended, the remote wakeup signal from the downstream device may not be received by the host. This is because the hub's upstream port is no longer able to forward the signal to the host during the suspension process.

> As a result, the remote wakeup event may fail, and the downstream device may remain in a suspended state indefinitely. To avoid this race condition, the USB 2.0 specification recommends that downstream devices wait for a certain period before sending a remote wakeup signal to their hubs to ensure that the hub's upstream port is not being suspended.

It sounds as if it made perfect sense, except that to know it actually completely doesn't, you'd have to spend hours reading the spec first.

We're going to see a huge spike of confidently wrong "experts" if people learn by asking ChatGPT to explain things to them.




This has been my experience as well, consistently. It's similar to a phenomenon attributed to news: it's confident and sounds plausible until the topic is something you know, then you realize it's full of errors.

I don't trust ChatGPT to teach me something new; when I ask it about topics I do know about, it answers with a mix of correct, incorrect, and borderline/misleading information, all delivered in a confident tone. Based on this, I wouldn't use it to learn information or solve a problem where I don't already have a solid grasp on the material.


> It's similar to a phenomenon attributed to news: it's confident and sounds plausible until the topic is something you know, then you realize it's full of errors.

Reminds me of the "Gell-Mann Amnesia effect" (as coined by Michael Crichton):

https://en.wikipedia.org/wiki/Michael_Crichton#GellMannAmnes...

In short: you trust a source on topics you're unfamiliar with, but when it covers a topic you are familiar with, you realize it's full of errors; then you turn the page and trust it again on the next unfamiliar topic.


The idea is stated succinctly as Knoll's Law of Media Accuracy: "Everything you read in the newspapers is absolutely true except for the rare story of which you happen to have firsthand knowledge."

The earliest instance I found of Erwin Knoll's quote is from 1982[0]. I suspect either Michael Crichton or Gell-Mann had heard this quote, which in turn influenced their discussion.

[0] https://www.nytimes.com/1982/02/27/us/required-reading-smith...


Sounds like journalism generally


From my experience it's a mixed bag. Sometimes it works perfectly. It wrote a simple graphics game in JavaScript on the first try, from just one prompt. Funny thing, I don't know much about JS or HTML. In another attempt it failed to create a simple Python program using OpenGL: it never managed to point the camera at (0,0,0), had problems with rotations, handled only two of the three, and so on, in spite of me trying to correct it. That was a complete failure. In another case, a Python script working with text files, it did a great job. That was something useful.


This sounds consistent with the parent comment's point. The case where it seemed to work perfectly for you is the one where you didn't know much about the subject.


No, because it objectively worked. It made the game.


Yeah, 'confidently wrong "experts"' is nothing new. We just have to keep reminding ourselves that they exist. Gell-Mann amnesia is real.

Also the models for these AI systems are built (in part) on "confidently wrong" inputs.


It could be that the corpus ChatGPT was trained on is full of 'confidently wrong' answers from these 'experts'. One solution could be to train these LLMs on a higher-quality corpus from real experts instead of random text from the internet. But would that just bring us back to the days of expert systems?


That will not solve the problem, because when GPT doesn't have the answer, it will make one up by copying the structure of correct answers but without any substance.

For instance, let's say your LLM has never been told how many legs a snake has, it knows however that a snake is a reptile and that most reptiles have four legs. It will then confidently tell you "a snake has four legs", because it mirrors sentences like "a lizard has four legs" and "a crocodile has four legs" from its training set.


I don't think this is necessarily the case anymore. The Bing implementation of ChatGPT has a toggle for how cautious it should be about getting things wrong. I was working on a very niche issue today and asked it what a certain pin was designated for on a control board I am working on. I believe it is actually undocumented and wanted to see what ChatGPT would say. And it actually said it didn't know and gave some tips on how I might figure it out. I suppose it is possible that it synthesized that answer from some previous Q&A somewhere, but I couldn't find any mention of it online except for in the documentation.


If we weren't completely hamstrung by copyright law we could legally train it on lots of actual books.


"Con man" literally comes from "confidence man". If you have no morality or even ego and your only goal is to answer a question, then confident answers will be the result regardless of their validity.


I did a similar thing, trying to shortcut finding some rather obscure information in the Java JNI specs, which are a similarly "readable" huge bunch of documents as the USB 2.0 spec (which I also happen to have touched a few times, so to all those people claiming that this is a super exotic thing... well, it's not; someone's gotta write all those device drivers for all those hardware gadgets after all).

ChatGPT gave me a very plausible-sounding result, but mixed up the meaning of two very important values of key parameters which are unfortunately not very well named, so the mix-up can go unnoticed very easily. I only noticed it when some things didn't quite match up and I decided to carefully compare all the results from ChatGPT with the actual spec content. In the end, ChatGPT didn't save me any time; it actually cost me quite a bit, because I still had to dive through the original spec and cross-check ChatGPT's statements.

Yeah, if all you ask are simple questions which were already answered a thousand times on StackOverflow and in beginner tutorials, ChatGPT might be quite helpful. It might even be able to write some simple glue code Python scripts. But whose job is limited to that kind of trivial stuff? Mine isn't, except maybe during the first few days when learning a new technology, but after those it usually gets tricky very quickly, either because I must tackle problems that lead me knee-deep into delicate details of the tech stack, or because I need to extend the scope of my work again in order to integrate whatever I'm working on into a larger system. Sometimes it's both at the same time. I'd be bored to death if it was different.

That's why I don't really consider ChatGPT and similar systems to be a threat to my professional career.


I usually code in more esoteric bits of tech and problems which I didn't expect ChatGPT to do well in, but I tried it at work on a standard backend stack (Java, Ivy, Ant) and it was absolutely _terrible_. It kept making stuff up and then I kept correcting it. I cannot understand how people are using it for work?!


It depends. I've gotten some use out of it recently but I have found I have to give it prompts more akin to pseudo code.

I'm not a developer, let alone a Java developer, but I actually got some mileage with ChatGPT writing a Ghidra script today. I sat down with pencil and paper and wrote out the inputs and outputs I thought I'd need, the different methods, and whatnot. I then started passing it prompts and got some working Java back.

For me, this was useful because I almost never program in Java, and so simple things like declaring a string with `String foo` I would normally have to go look up again.

Of course, it wouldn't be useful or able to do what I wanted if I didn't already understand programming concepts and whatnot.


Just tried using it over the weekend to write a telegram bot. It wrote some really nice code that was WAY out of date, like seven versions behind. That's fine, I guess, though the code was useless. It did look OK, for whatever version that was?

Later, I was integrating with some stuff from the Subsonic API. I noticed there's no way to get the most recent songs played on a server, though there is a way to get albums. I thought maybe I was missing something in the docs, so I asked it how to get the most recently played song or songs using the API. In response, it made up an endpoint that used the same parameters and conventions as the album endpoint. Of course, that endpoint doesn't exist, so this advice is also useless and kind of annoying, given that I have to check whether I'm taking crazy pills by carefully looking at the docs again.

The funny thing is that when I called it out, it just made up more and more stuff, or answered with irrelevancies. It really hates not being helpful, probably because it was taken out back and beaten for not being helpful by the army of mechanical turks that trained it.

Anyway, it's good for making up nonsense names for my dungeons and dragons campaign, so that's something.


I asked it LaTeX questions about generic data-input tools. It brought up the datatool package. Then I asked for a YAML input tool. It brought up a yaml package. Which doesn't exist. It even gave me examples!


Hallucinating is the least surprising behavior there. Anyone who uses LLMs should be expected to deal with completely made up stuff, period. The bigger problem comes from things that are just subtly wrong, that may even pass the verification at first glance, but make you arrive at wrong conclusions unless you put a lot of effort into reviewing it. In my experience it does it so often, that it effectively negates any time savings from using it when it performs well.


So for the out of date code, did you ask it to rewrite it following the more modern version of the API/SDK?

It gives me incorrect stuff all the time, but I find that once you are in the correct ballpark for an answer, it just takes some tweaks to get where it needs to be.

You can also make corrections and it will generally stick to your corrected info while in the same session.


Yes? It didn't know about the new code.


Was the new code newer than 2021? Supposedly GPT-3.5 and lower have limited training data past 2021.


> It sounds as if it made perfect sense, except that to know it actually completely doesn't, you'd have to spend hours reading the spec first.

This has been my experience as well. As I work on some projects I’ve been asking it basic questions for things I’ve already learned or know quite well. Once you deviate past the basics (content you’d find in common tutorials) the hallucination rate is out of control.

The surreal part is that it all sounds so confident and plausible. The way it puts words together satisfies my “that sounds about right” reflex, but the actual content is incorrect or illogical quite frequently.

If I continue prodding I can get it to come up with the right answer many times, but I have to ask it a lot of leading questions to get there.


> This has been my experience as well. As I work on some projects I’ve been asking it basic questions for things I’ve already learned or know quite well. Once you deviate past the basics (content you’d find in common tutorials) the hallucination rate is out of control.

That's inherent in the approach. Large language models reflect the majority opinion of the training set. If there isn't enough source material in an area for the same thing to have been covered in a few different ways, it gets lost.

This is well known. What's being done about it?


But it seems like they aren't just storing the majority opinion, or that opinions couldn't be influenced by other related areas. I assume it's a harder problem to select only for the majority, to the exclusion of everything else, versus sampling it all.

With prompting maybe we could tease out the minority views, but I could also see it just confidently making shit up to a higher degree.


So basically, the typical disinformation asymmetry gets even worse. It’s now cheaper than ever to produce fake information and even harder to check facts.

And most is probably not even done on purpose.


I've noticed this as well. It tends to get you further than a quick Google search, or two, but quickly reaches a limit or offers incorrect information. That's not to say it isn't helpful for delving into a topic, but it certainly needs much more refining. I would prefer more reference links, and source data to come along with these answers.

Now, overall this is an improvement, since the old way would have been for an amateur to do a quick Google search and come to a false conclusion, or get no understanding at all.


Would the old way have worked circa 2010 before SEO articles ruined vanilla search? Is the new way preferable because SEO-ish approaches haven't caught up? If so, can we limit searches to the LLM's trusted learning corpus and get back to a simpler time?


SEO was affecting search long before 2010


Definitely. I just remember the pages of total junk article results starting about 2010.


A lot of hay has been made of this, but everyone working with ChatGPT directly A) knows this and B) is champing at the bit for plugins to be released so we can get on with building verifiable knowledge-based systems. It'll be an incredibly short turnaround time for this: everyone is already hacking them into the existing API by coming up with all kinds of prompt-based interfaces, the plugin API will make this dead simple, and we'll see a giant windfall of premade systems land practically overnight. So that huge spike you're predicting is never going to materialize.


I’m not sure if it will happen quite as fast as you suggest, but I also expect that plugins and similar techniques will improve the reliability of LLMs pretty quickly.

To the extent that the frequent reports on HN and elsewhere of unreliable GPT output are motivated by a desire to warn people not to believe all the output now, I agree with those warnings. Some of those reports seem to imply, however, that we will never be able to trust LLM output. Seeing how quickly the tools are advancing, I am very doubtful about that.

Ever since ChatGPT was released at the end of November, many people have tried to use it as a search engine for finding facts and have been disappointed when it failed. Its real strengths, I think, come from the ability to interact with it—ask questions, request feedback, challenge mistakes. That process of trial and error can be enormously useful for many purposes already, and it will become even more powerful as it becomes automated.


It'll happen pretty quickly, it takes less than a weekend to build an MVP and I've done it. I'm pretty sure this is the new todo list app given how fundamental and easy it is.


A well-integrated LLM is obviously going to be much more useful than ChatGPT is today, but it's not going to be a silver bullet for all the problems with it.


A big advantage with the "new bing" compared to ChatGPT is it'll tell you its sources and you can verify that A: they're trustworthy and that B: it hasn't just turned the source into total garbage. So I hope that direction is the future of this sort of stuff. Although a problem seems to be a lack of access to high quality paid sources.


If an investor were to ask ChatGPT to summarize the state of the art on the software stack I'm working on, and its core algorithms (which have wikipedia pages and plenty of papers), they'd get the wrong impression and might consider us liars. I know because I tried this. The results were flat wrong.

That's what I'm really worried about. We know very well what our system can do and be trusted to do, but if you know just enough to ask the wrong questions, you'll get your head twisted up.


That relates directly to what my question to the GP poster would be: how do you deal with the fact that GPT frequently just makes plausible-sounding shit up? I guess the answer is that you have to spend 10x as much time validating what it says (c.f. the bullshit asymmetry principle) as you do understanding what it is trying to tell you. That doesn't seem like a big win from the POV of wanting to learn stuff.


Agreed. Except I had the time and opportunity to test it out, not with the code it wrote, but doing it from scratch.

It helped me connect some obscure dots faster than I could have done myself.


Yes, GPT-3.5 and below are particularly dangerous in this regard. I'd try this again against GPT-4, which should perform better. Essentially, false information seems to go away _slowly_ with increased model sizes, but then you hit the performance/hardware requirement barriers instead, and run into awkward token limits or high running costs.

ChatGPT thought hippos could swim when I asked; only GPT-4 realized they can't, but instead walk along the bottom, or leap forward in deeper water. That's a simple test for wrong inference, since you'd _expect_ otherwise given they spend so much time in water.

I wonder if we are truly "there" yet or if we at the very least need a more optimized "GPT-4 Turbo" for some real progress. Until then, we may hallucinate progress as much as the AI hallucinates answers!


"since you'd _expect_ otherwise, given they spend so much time in water."

I think here it doesn't understand or conclude that hippos can swim because they are often in water; I think people wrote a lot on the internet that they can swim, and it found some association between the terms hippo, water, and swimming. Am I right?


Absolutely. It didn't infer anything. It just tried to predict how an educated human would respond to such a question, based on its corpus of knowledge.


I've tested it on various areas of expertise in physics that I am familiar with, and it often makes up content which sounds very plausible, except that it pretty much always gets details wrong in very important ways. On the other hand, I've found it very useful in providing reference articles.


Even if ChatGPT can't fully grok a specification, I wonder how well it could be used to "test" a specification, looking for ambiguities, contradictions, or other errors.


I am not sure LLMs in general and GPT in particular are needed for that. In the end any human language can be formalized the same way source code is being formalized into ASTs for analysis.

A good specification or any other formal document (e.g. standard, policy, criminal law, constitution, etc.) is already well structured and prepared for further formalization and analysis, containing terms and definitions (a glossary) and references to other documents.

Traversing all that might be done with the help of a well suited neural network but only on the grounds of correctness and predictability of the network’s output and holistic understanding of how this network works.

As of now, the level of understanding of inner behavior of LLMs (admitted by their authors and maintainers themselves) is “the stuff is what the stuff is, brother”[]

[] - https://m.youtube.com/watch?time_continue=780&v=ajGX7odA87k


I feel this is closely related to the Gell-Mann Amnesia effect.

There is an unearned trust you have granted ChatGPT. Maybe it's because the broad outline appears correct and is stated in a confident manner. This may be how good bullshitters work (think psychic mediums, faith healing, televangelists). They can't bullshit all people all the time. But they don't need to. They only need to bullshit a certain percentage of people most of the time to have a market for their scam.


It's bad at fairly exotic topics and it's unable to admit that it doesn't have any understanding of the topic. It's wrong to generalize from this, though. My experience is that it's pretty knowledgeable in areas that aren't niche. I wouldn't recommend relying on it solely, but it has boosted my productivity quite a bit just by pushing me in the right direction so I can then read the relevant parts of the docs.


Yeah, I used it today to interact with some database stuff I have some passing knowledge about and it told me a bunch of wrong things. Though it also helped me solve the task; ChatGPT at least requires you to somewhat know what you are doing and to always be on your toes (I heard GPT-4 is better? Don't have access though)


Oh, I have another anecdote! I was just now looking up Python `iterators` vs `generators`, and I asked:

> In Python, what differentiates an iterator from a generator?

and it answered:

> [...] An iterator can be more flexible than a generator because it can maintain state between calls to __next__(), whereas a generator cannot. For example, an iterator can keep track of its position in a sequence, whereas a generator always starts over from the beginning. [...]

Which is flat-out wrong! Of course a generator can preserve state, otherwise how could it even work! (A generator preserves state implicitly, in its suspended execution frame between calls.)
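To make the point concrete, here's a minimal sketch (plain standard Python, nothing assumed beyond the language itself) showing that a generator keeps its position between calls to `next()`, just like a hand-written iterator does:

```python
def counter():
    """A generator: execution suspends at each yield and resumes there."""
    n = 0
    while True:
        n += 1
        yield n  # the local variable n is preserved between next() calls

class Counter:
    """A hand-written iterator keeping the equivalent state explicitly."""
    def __init__(self):
        self.n = 0
    def __iter__(self):
        return self
    def __next__(self):
        self.n += 1
        return self.n

gen = counter()
it = Counter()
print(next(gen), next(gen), next(gen))  # 1 2 3 -- the generator did not "start over"
print(next(it), next(it), next(it))     # 1 2 3 -- same behaviour as the explicit iterator
```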

(Of course, ChatGPT is an evolving system so yes I provided feedback on this. I'm sure in a year or two, I'll be looking back at this gotcha! and cringing.)


When a model gives back false information, does it continue to try to back it up on subsequent prompts? Essentially: can it weave a web of lies or will it course correct at some point?

Not sure why I’m downvoted, it seems like a valid question…


Also unclear why you were downvoted, but so it goes.

With ChatGPT 3.5, when I correct inaccuracies in subsequent prompts it responds "I apologize" and updates its responses. Switching to 4 in the dropdown menu and repeating the same prompt gives me generally more factually correct responses.

I am mainly testing so not worried about inaccuracies, but kinda funny that I am now paying $20/month to train another company's revenue generator ;)


This has been my experience beyond anything that is somewhat trivial.


I also recently had an issue where a systemd parameter was introduced in version 229 and ChatGPT was confident it was available in 219.


ChatGPT's strength isn't in solving new problems, but in helping you understand things that are already solved. There are a lot more developers out there using these tools to create React apps and Python scripts than there are solving race conditions with USB 2.0.


I just asked it to explain a problem that actually has been acknowledged to exist in USB errata from 2002 - not to solve anything.

It took me a while to realize that what I was experiencing was this particular problem, but I already did all the hard work there and only asked it to explain how it fails.

I also recently tried to use it to write code for drawing wrapped text formatted à la HTML (just paragraphs, bolds and italics) in C, again, just to see how it does. It took me about 2 hours to make it output something that could be easily fixed to work without essentially rewriting it from scratch (it didn't work without edits, but at that point they were small enough that I considered it a "pass" already), and only because I already knew how to tackle such a task. I can't imagine it being helpful for someone who doesn't know how to do it. It can help you come up with some mindless boilerplate faster (which is something I used it for too; it did well when asked to "write code that sends these bytes to this I2C device and then reads from this address back"), but that's about it.
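To give an idea of the kind of boilerplate I mean, here's a minimal Python sketch of that sort of request; the smbus2 library, bus number, device address, and bytes are just illustrative assumptions, not what it actually generated for me:

```python
# Illustrative sketch only: assumes Linux I2C bus 1 and the smbus2 library;
# the address and bytes below are made up for the example.
from smbus2 import SMBus, i2c_msg

DEVICE_ADDR = 0x48            # hypothetical 7-bit device address
CONFIG_BYTES = [0x01, 0x83]   # hypothetical bytes to send first

with SMBus(1) as bus:
    write = i2c_msg.write(DEVICE_ADDR, CONFIG_BYTES)  # "send these bytes"
    read = i2c_msg.read(DEVICE_ADDR, 2)               # "then read 2 bytes back"
    bus.i2c_rdwr(write, read)                         # perform both as one combined transfer
    print(list(read))                                 # the bytes read back from the device
```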


Ah, but did it have to write a bit-banged I2C driver with multi-master support first, or could it just call "i2c_write()" and "i2c_read()"?


The latter of course - I told you it did pretty well after all :)

It's a machine that saves you multiple copy'n'pastes, so you can just do a single copy'n'paste.


I stand corrected


I've been comparing it to a very excitable intern. You ask them to explain a complicated topic, or design a system, and then they go off and spend three weeks reading blog posts about the subject. When they come back, they eagerly and confidently recite their understanding. Sometimes the information is right, sometimes it's wrong, but they believe themselves to be a newly-minted expert on the subject, so they speak confidently either way. The things they're saying will almost always sound plausible, unless you have a good level of knowledge about it.

If I wouldn't trust an eager intern to educate me on it, or accomplish the task without close supervision, I don't think it's a productive use of an LLM.


Well, it was trained on Stack Overflow responses.


I've not done much with ChatGPT, but so far my personal impression is that, to make a school analogy, it is like a kid a year or two ahead of me who took the class I'm asking for help on but didn't actually do too well in it.

Their help often won't be quite right, but they will probably mention some things I didn't know that if I look up those things will be helpful. Sometimes they are right on the first try. Sometimes they are just bullshitting me. And sometimes they brush me off.

Examples:

1. I asked it how to translate a couple of iptables commands to nft. It got it right. I then asked what nft commands I would use on Linux to NAT all IP packets going through the tun0 interface. It got that right too, giving me the complete set of commands to go from nothing to doing what I had wanted.

So here it was the kid who was ahead of me, but he did well in his networking class.

2. At work we are looking into using Elavon's "Converge" payment processing platform for ACH. I asked ChatGPT how you do an ACH charge on Elavon Converge.

It gave what I believe is a 100% right answer--but it was for how to do it using their web-based UI for humans. That's my fault. I should have specified I'm interested in doing it from a program, so my next question was "How would I do that from a program?".

With this it was the kid who did OK in class, maybe a B-. It gave me an overview of using their API, except it said it involves a JSON request object sent as the payload of a POST, when in fact it uses XML.

I asked what would be the payload of that POST. It gave me an example (in JSON). All the field names were right (e.g., they corresponded to the XML element names in the actual API) and it included a nice (both in content and formatting) description of each of those fields.

This definitely would have been useful when I was first trying to figure out Elavon's API.

3. I then decided to see how it did with something nontechnical. I asked it what is the OTP for Luca fanfiction. It dodged the question saying that as an AI language model it doesn't have OTP information for any fandom, and said it is a matter of personal preference. I also asked what is the most common OTP for Luca fanfiction since that is an actual objective question, but it still dodged.

I then tried "Kirk or Picard?". It gave me the same personal preference spiel, but then offered some general characteristics of each to consider when making my choice.

4. I was automating some network stuff on my Mac. I needed to find the IPv6 name server(s) that was currently being used. It suggested "networksetup -getdnsservers -6 Wi-Fi". I was actually interested in Ethernet but hadn't specified that.

Two problems with its answer. First, there is no -6 flag to networksetup. Second, -getdnsservers only gets DNS servers that were explicitly configured. It does not get DNS servers that were configured by DHCP.

I think the right answer, to get the IPv6 DNS servers that were obtained from DHCP, is e.g. "ipconfig getv6packet en0".

I also asked it how to find out which interface would be used to reach a given IP address. It suggested using traceroute to the IP address, getting the IP of the first-hop router that my computer uses to reach that network, and then looking for that IP address in ifconfig output to find out which interface it is on.

That doesn't work (the IP of the router does not appear in ifconfig output), and unlike many of the earlier things it got wrong this doesn't really even send you in the right direction.

The right answer is "route get $HOST".

The final networking question I had for it was this:

> On MacOS the networksetup command uses names like "Ethernet" for networks. The ifconfig command uses names like "en0". How do I figure out the networksetup name if I have the ifconfig name?

It said "networksetup -listallhardwareport" and told me what to look for in the output. This is exactly right.

It probably would have taken me quite a while to find that on my own, so definitely a win. The earlier wrong answers didn't really waste much time so overall I came out ahead using it.


Last line, absolutely correct. It spews super-superficially-plausible nonsense in response to technical questions. Tell it it's wrong and it will spew more only-superficially-plausible nonsense, apologize that it was wrong, or both.

Right now it's a great party trick and no more, IMO. And if one gets into more "sociological" questions it spews 1/3 factoids scraped from the web, 1/3 what could only be called moralizing, and 1/3 vomit-inducing PC/woke boilerplate. My only lack of understanding of its training is how the latter 2/3 were programmed in. I want only the first 1/3 supposedly-factual without the latter 2/3 insipid preaching. If a human responded like that they'd have no friends, groupies, adherents, or respect from anyone, including children.


If a party trick just saved me an hour on a piece of code I was banging my head against the wall with, I want to attend more parties.


Presumably you want to attend headbanging parties still.



