Outperforms Physicians answering questions on a public subreddit:
> In this cross-sectional study, a public and nonidentifiable database of questions from a public social media forum (Reddit’s r/AskDocs) was used to randomly draw 195 exchanges from October 2022 where a verified physician responded to a public question
They didn’t go to physicians in a patient setting. The physician answers were taken from Reddit threads where they were interacting with people who were not their patients.
Reddit has its own dynamic and people tend to get snarky/jaded. Using this as a baseline for physician responses seems extremely misleading.
If they went to a physician in a patient setting, they may have had an experience like this:
-Made to wait 45-60 minutes past their appointment time while being bombarded with pharmaceutical advertisements
-Spend another 20 minutes sitting in the examination room staring at pharmaceutical company sponsored ads
-Nurse takes your history with a bunch of redundant questions they already have the answers to
-Finally the physician arrives
-Assesses the patient without laying their hands on them
-Ignores everything you say
-Does no tests and cultures no pathogens
-"take some antibiotics and I'll bill your insurance company $500, see ya"
If we’re comparing worst-case scenarios: ChatGPT might confidently hallucinate a very valid-sounding explanation that convinces the patient they have a rare disorder that they spend thousands of dollars testing for, despite real doctors disagreeing and test results being negative. Maybe it only does this once in a long series of questions, but the seed is planted despite negative test results.
Then when the patient asks ChatGPT if the tests could give false negatives, it could provide some very valid-sounding answers that say repeat testing might be necessary.
This isn’t a hypothetical. It’s currently happening to an old friend of mine. They won’t let it go because ChatGPT continues to give them the same answers when they ask it the same (leading) questions. At this point he can get ChatGPT to give him any medical answer he wants to hear by rephrasing his questions until ChatGPT tells him what he wants to hear. He’s learned to ask questions like “If someone has symptoms _____ and ____ could they have <rare disease> and if so how would it be treated?” A real doctor would see what’s happening and address the leading questions. ChatGPT just takes it at face value.
The difference between ChatGPT and real doctors is that he can iterate on his answer-shopping a hundred times in one sitting, whereas a doctor is going to see what’s happening and stop the patient.
ChatGPT is an automated confirmation bias machine for hypochondriacs.
Ah, yes. Hypochondriacs. I was a hypochondriac for years until I was able to get an appointment timed such that my symptoms were physically present while I was being assessed (not easy if you have a disease that comes and goes). I really hope you’re not a medical professional that interfaces directly with patients.
I was a hypochondriac for decades. I was eventually cured, after decades of crippling hypochondria, through self-diagnosis of the actual medical condition I had, which eventually became a formal diagnosis and resulted in treatment through some modest lifestyle changes.
I'm a BIG supporter of using things like ChatGPT, Google, and sci-hub to do your own medical research, because the whole system, where some physician diagnoses you based on an extremely limited amount of data collected in a haphazard manner after a few minutes of observation, just because he's experienced and smart or whatever, is incredibly dumb. The way people hold it up as the ethical standard we cannot deviate from because it would be too dangerous is utterly baffling to me. The status quo totally lacks ethics and mostly serves to line the pockets of a cartel of doctors with a monopoly on access to medication and treatment, who often condescendingly think patients are simply too irrational to treat themselves without their help.
I legitimately cannot wait for this field to mature and medical self-help with AI assistance becomes the norm.
Fellow hypochondriac here. I was at the point where doctors, hospital staff, and lab techs would immediately warn new practitioners about me so they wouldn't waste finite medical resources in a small town, and I just completely discontinued normal activities out of terror as a result.
When I finally blacked out and fractured my spine, first responders detected a lifelong cardiac arrhythmia in the back of an ambulance. Only with that knowledge have I been able to receive treatment and begin to heal emotionally from the gaslighting and medical abuse I experienced while in the care of licensed professionals.
AI-assisted medicine will prevent so many of these mistakes in the future. It can't come soon enough as far as I'm concerned.
Wow, the doctors that tried to block you out completely within their network of friends must have balls of steel, what with zero fear of legal repercussions!
It was more a matter of how misinformation malignantly spreads I believe.
I had to see the only available PCP to see anyone else. This automatically prompted medical releases that, even if unethical, would have still made everything technically legal. If it was an emergency room trip, there were always the same two or three physicians there, so they all became aware of me from the first couple of episodes and could warn any specialist they referred me to see.
Same deal with laboratories and radiological facilities. When you've got only one or two options in town, they have your consent to release PHI by default if you ever want the results interpreted, and their interpreting physician can just accompany the report with a courtesy call to the receiving provider about a suspected diagnosis.
I agree with studying the field to help you understand your own health, but I prefer sci hub or any peer reviewed source over an LLM. I'll revise this view as LLMs develop, but right now I'm seeing plausible bs as often as I see good advice.
I was the same way with this while I travel--it's definitely the future. I'm working on an AI healthcare assistant where you can summarize a conversation with our plugin on ChatGPT or bot on WhatsApp and then send it to a real doctor to continue the conversation.
I hope that more founders build and innovate in the field to provide efficiencies throughout the whole system to lower costs and provide high quality care for everyone that needs it.
Some insurance providers need a primary care physician for referral, but some do not, so one area we're researching is whether we can do referrals through doctor follow-up/verification based on a summary of the chat.
So do you not believe actual hypochondriacs exist or something? They do.
Especially the people who read about diseases and get anxious that they have them; while you can validly say they have anxiety issues, they don't have whatever they just read about 99% of the time.
Even if there's a real disease of some sort, you don't want to diagnose with the latest guess in someone that keeps guessing different things. Their treatment needs improvement, but confirmation bias is not how you do it.
It has a lot to do with whether "ChatGPT is an automated confirmation bias machine for hypochondriacs" is a valid worry or something that should disqualify you from being a medical professional dealing with patients.
Early this year I had cold symptoms that just would not go away. I was getting sick, recovering, and getting sick again. After some 2-3 weeks I accidentally learned about long covid / CFS. I immediately made myself believe I had it and experienced a sharp worsening of symptoms - at some point feeling so tired I thought I was going to pass out. I was doing that pacing thing to the extreme, believing I had no energy, for some 2 weeks; my life was hell, with only about 1 hour a day of not lying down doing nothing. Not to mention the severe trauma of thinking that my life was over. Trying to do any cognitive activity for more than 10 minutes resulted in a severe headache.

After 2 weeks I got covid; I was still lying down all day anyway, but I passed it with minor symptoms. 2 days after that I thought, wait a minute, maybe those were all fake symptoms - the sharp increase in symptoms right after learning about it, the fact that it was not covid that initiated it in the first place. I felt slightly off for a few days, but I was able to carry on with my days as usual, and then I felt 100% well. The problem is that I got slightly sick one month later and experienced something similar - cannot get well - for about a week. Then 2 months later I got sick again, and it was the same thing with recovering and feeling sick. It goes round and round: feeling shitty for 2-3 days (able to work though), feeling good for 2-3 days. I have had this for a month now, but each time it occurs it has a smaller impact. I have yet to find out what it is for sure; I hope it's all made up, and whenever it hits I actively try to convince myself that it's not real and try to ignore it. Anyway, I have learned the hard way that hypochondria is not just anxiety / fear about falling ill with some disease - the imaginary symptoms can be much more severe than "my chest hurts, do I have some heart disease?".
I was also a hypochondriac in high school and college, sleeping 12-16 hours a day and still being completely exhausted! Apparently what I really needed was more exercise. The CPAP machine I eventually got after ignoring my PCP's diagnosis merely serves as a placebo, but a very effective one nonetheless. I don't bother him with my delusions anymore.
I think we can agree that both of those cases aren't worst case scenarios, but rather the only case, or best case scenarios (in the US). The only real way of dealing with health (in the US) is just to never go to a physician and to avoid healthcare at all costs. It will never provide any real answers, despite charging amounts that no individual will ever be able to pay.
The only thing that healthcare in the US really seems to be capable of dealing with is the ER, or more specifically clear, topically observable physical injuries caused by mechanical force. We're still not good enough at it to fix things to be as they were before an incident in all cases, but we're closer than in almost any other area of medicine.
Your totally generalized dismissal of U.S. healthcare, which, despite its many known flaws, routinely treats millions of people in the U.S. per year very effectively, is way the fuck off base and absurd. Not all doctors are the same, not all patients are the same, and not all outcomes are worst-case scenarios caused by doctor incompetence. I really hope nobody else takes seriously such a blanket pile of exaggerations to the point of nonsense.
The subtext you're missing here is that GPT with access to the entire corpus of medical data could undermine the entire money printing machine (referring to US healthcare here). What test would cost thousands of dollars if the only human cost to run it is drawing some blood and putting it in a machine?
Abstractions like that are meaningless. Why is the Large Hadron Collider so expensive? It's just throwing stuff at each other and looking at what happens. Random teens could do that…
The actual process of doing blood work is often surprisingly complicated.
Funny enough, I'm actually developing an AI-powered biomedical technological breakthrough that's about to disrupt the medical industry. It uses wearables to enable a blockchain-enabled data management system that also functions as a cloud based SaaS provider, linking with ChatGPT and NFTs to create value for those who are underprivileged, all with only 1 drop of blood. If you want to further this please give me a few billion dollars and I promise something might come :)
No, the variability between individual drops of blood is too high for the precision needed to do most of the tests they were proposing, even before you look at the tiny fraction of a drop you would be working with to perform multiple tests. Worse, you get excessive contamination between blood and other substances from a finger prick.
So no, very quickly people were calling Theranos either incompetent or fraudulent.
"the tests they were proposing" - yeah, I would agree that some (probably most towards the end) of the tests they were trying to make part of the product were a big part of why they were so fraudulent.
I agree that it was a bad company.
My point is that the general idea was not a bad idea, and because it was so badly mismanaged (trying to add too many products and run before walking, using other companies' tech illegally, etc, etc) we now are worse off because few people want to attempt doing the possible ideas in this space (because people think none of it is possible - which is simply not true).
The company that could reliably be made is far less 'attractive' of course, but it would still be quite beneficial.
Of course... that's absolutely true, and I didn't say otherwise.
However, the basic idea actually is possible, but only for a very select few of the tests they proposed (which they still unfortunately didn't really get right of course).
My point is that, unfortunately, the tests that are possible to do now won't get a company to form out of it - simply because they ruined the scene for everyone. Anyone trying to get a company going in this area and raising money for it will be met with a complete block of resistance because of the precedent set, even if the technology is actually completely reliable.
The last test someone I know had done cost $600 at the physician.
We looked it up and it cost a little less to *buy the FDA approved equipment for your house, and run the test every day for an entire month*.
It's absolutely heinously broken - almost completely due to insurance companies. Insurance should be made illegal and obsolete due to the insane hindrance it is to progress in humanity.
There’s often many tests for the same things with various levels of accuracy and cost.
You really didn’t provide enough information for a more detailed analysis, but for example there’s a reason hospitals don’t just use the cheap COVID tests and call it good.
It wasn't a viral test, and I didn't lay out details here because I don't want to put all my health information on display, but I have several graduate degrees in both engineering and biology-related fields, so I'm quite certain that my analysis of the equipment and technical parameters is correct. It's the financial aspect of healthcare that is horribly wrong, not the science (the only letdown there is an academic one, as it's awfully basic and antiquated technology that makes it to the clinic, but that's not a bug - it's a feature, for safety).
There is a reason I mentioned FDA-approved equipment though - it is substantially more expensive to have that label (you can purchase items that claim similar parameters, but the approved ones need to be exhaustively tested). Despite this, purchasing some types of equipment for your home and using it regularly is still often far cheaper than going to the physician's office and getting a test once.
I suppose I am a bit outside the typical demographic though, as I would rather use my own TEM made in my garage to do quick analyses of lab materials than take it to my colleagues across the hall and sign up for a slot on their higher-resolution EM. Being able to work independently and have your own data is a nice thing sometimes.
I hate doctors and pharmaceutical companies as much as anyone, but those tests are serious business. There's a lot of very hard science and engineering involved in them
When they're cheap outside the USA, it often just means the companies aren't attempting to recoup any r&d costs outside the US
Surely it isn't pharmaceutical companies doing blood tests in the US? In the UK people's bloods get done in hospital labs (phlebotomy doesn't always occur in hospital though).
Happened to my grandma when she was 90. She had convinced her doctor to prescribe her 5 or 6 drugs. She swallowed them until she was so weak she had to go to the hospital, where the tests revealed that she was 100% healthy. She didn't like it.
It depends on the location. In Norway they are usually perfectly on time, you don’t have any ads, they do the required tests and examinations, they listen to you, they take the time to explain everything and will do drawings if necessary, they tell you to do more sport, and you get billed about $25 (but it becomes free if you spend more than $300 in visits and medications during the year).
No, I don't think the oil fund (https://www.nbim.no/en/) is responsible for the culture to be on time or is connected directly to the public healthcare finances. Though being a state with money helps, obviously.
The USA's mean household net worth is $500K more than Norway's, and the average household is about 3 people (some households are single people), so the oil fund approximately cancels that out.
There are other places with a similar experience that don’t have the same oil wealth. I wonder though if it’s different at the higher end of things, like expensive treatments for major issues which a tighter public purse might not stretch to.
Perhaps you could explain how a fund that doesn't invest onshore does this. The government uses only a small percentage of the return on the fund (not the capital) each year.
My experience (N=~5) is that the waiting times for legevakten are on the order of a couple of hours for anything non-trivial. I have also observed some reluctance to do preventative tests at the fastlege.
I don't mean to contradict your observations, but our experiences could have been different based on our age, location, gender and what we're being treated for. Selecting a good GP matters a lot too.
I personally have a good experience with good doctors (which one, with some time, can find), but perfectly on time is not realistic, as time with patients can't be predicted.
Have you ever seen those brochures and flyers in the waiting room (and sometimes even the exam room)? Those are almost entirely supplied to the practice by pharma reps. Every U.S. doctor's office I've been in for the last four decades has been full of them.
Not to say that you haven't had a very unique set of experiences, but having been in many doctors' offices and worked in a number of them, in the US there are generally brochures, pamphlets, pens, note pads, and worst of all TV SCREENS advertising pharma crap.
The only place I haven’t seen brochures is my direct patient contract doctor (concierge) and even then he has natural herbal supplements for various things.
So just saying your experience is atypical. But it sounds like a dream.
Curious, are there any references? I have seen zero. Of course, I see tons of ads on TV, so if the Dr.'s office has a TV on, that may be one source. But I am curious about ads outside of the TV; I haven't seen any like that in a Dr.'s office.
It only takes one person's evidence to prove your generalization wrong. My last visit to a podiatrist had dozens of ads on the walls and television that was airing nonstop commercials for medical products. Nothing makes for a better captive audience than excruciatingly long wait times.
-“take some antibiotics and our billing dept will bill your insurance company for the maximum amount they estimate your insurance can pay, plus some margin, see ya”
> Nurse takes your history with a bunch of redundant questions they already have the answers to
If you’re a nurse and your job is to take histories, it’s better to take the history in the same way every time, systematically. This minimizes the chance of making mistakes.
Moreover the notes may be wrong, or you may give a different answer this time around. You might give an important detail this time around that you didn’t give the last several times, which actually turns out to be consequential.
It might seem like a waste of your time but it’s really not. Measure twice cut once.
If I had a dollar for every time I as a paramedic had a patient tell me "no allergies" and then list off a bunch when we get to the hospital, or "I only have high blood pressure, I'm pretty healthy", and hands over a box of half a dozen or more prescriptions, or their spouse says "Well, he did have a stroke and a bypass last year"...
Seems like a very weird hill for you to die on, considering the provider's desire to ensure they have an accurate history for you to be a "problem".
How in any sense of the word is it "dismissive" to re-verify history, medications, and allergies?
Who said anything about dismissing or doubting a history or complaint? "So, just to review, your medical history includes X, Y and Z?" or "Says here, you take A, B and C, is this correct?"
That's not dismissive, but important, especially if I'm going to be administering potent cardiac or other medications to you via IV, that could have serious complications.
The main problem is when you take your medical history and current medications on a tablet on intake and then ask the same questions once you get in the room.
I understand that you are of sound mind and likely above average intelligence and likely interested and capable of providing accurate and factual statements but… most people they see are not.
People get their medications wrong just as often as they get them right. This is in addition to many people being on complicated medication regimens and/or just mostly disinterested in change.
The questions are one thing; reviewing them with a person who can make the above assessments is still, for now at least, a human job.
If you can’t deal with a one-character typo in a comment, then ChatGPT certainly has you beat. The bot paranoia is the icing on the cake.
Calling this an "internal process" that's none of your concern is a much more egregious wilful misrepresentation of the situation. There is a real situation or phenomenon of human behaviour here, and this is how the healthcare system deals with it.
Who are you as presumably some software person to come in telling them to knock the gate down without understanding why it was put there in the first place? God knows you’d hate it if someone did that to you in your area of expertise. I understand that the proliferation of VC-backed money-losing companies which parachute clueless software people into other domains have given developers an undue sense of transferable expertise, but perhaps exercise some self-awareness.
> If you’re a nurse and your job is to take histories, it’s better to take the history in the same way every time, systematically. This minimizes the chance of making mistakes.
Citation needed. For example, a patient may get bored and only answer the first few questions accurately each time.
Like I explained more fully above, you’re likely pretty smart and of sound mind. A lot of people aren’t and/or the system should have safeties to catch people who are slipping.
Screw with the answers if you want, but don't pretend you are sticking it to them or are some kind of agent of change blazing the path of defiance.
This sounds like hell but it's also infinitely different from how it works here (in Western Europe, but I'd assume it's different from almost anywhere but the US). Also, pharmaceutical ads are forbidden here, and antibiotics are very sparsely used.
Indeed; my experiences have been UK and Germany, and while the UK is currently experiencing chronic failure in timelines due to the combination of {chronic understaffing} with {strike action caused by {{chronic low pay} and {extra stress from acute overwork}}}, I've never seen a single thing advertised in medical facilities of either country.
It’s also not actually what happens in the US either. The poster is lying for whatever reason, probably because making negative comments about healthcare is good for Internet points.
Eh, no, all of these definitely actually happen in some places, but it would have to be a really shitty place for all of these things to happen in one visit.
It may not be that simple, especially in regions where there isn't anything resembling a free market in the health care sector.
Canada's various provincial public health care systems (with a corresponding lack of private health care offerings) tend to be like that.
Many Canadians don't have a dedicated physician. Even if they want one (or want a new one), it's often difficult, if not impossible, to find one who's close by and who's accepting new patients.
Having a dedicated physician still often results in an experience much like that other commenter described. It's not an exaggeration. Long waits even with appointments, rushed examinations, and low-quality service are the norm.
Another option, which is sometimes used even by people who have dedicated physicians, is a walk-in clinic. Unfortunately, they can be quite rare and inconvenient to get to, even in Canada's largest cities, assuming they're even open when you need them. You'll usually face an even longer wait, even less time with the doctor, and typically see a different doctor if any sort of followup is needed.
Then there are hospital emergency rooms. That usually means getting to the nearest sizable city, and even once you're there, you've got to be prepared to wait many hours, even for relatively serious situations.
Ultimately, in Canada, it doesn't matter whether you see your family doctor (if you even have one), use a walk-in clinic, or go to a hospital emergency room. It's going to be a horrible experience, and there's pretty much nothing the average person can do about it.
Given the lack of competition and due to other government-imposed market distortions, there's no incentive for doctors to offer anything resembling good service to the general public.
The best situation is to have a doctor who's a close friend or family member, and who may be able to help mitigate at least some of the typical problems.
The next best option for Canadians, assuming they have the money for it, is often to seek treatment in the US or overseas.
Just "finding a new doctor" isn't feasible, unfortunately.
Being forced to wait for routine care isn’t a big deal, but having a doctor unwilling to listen to you is a life threatening situation.
If your local area lacks access to proper medical care, that's a serious enough problem that you should move. Medical care is like clean drinking water or working brakes on your car; it's not optional.
- Your doctor has the incredible ability to understand your symptoms without even listening to you
- Your 40 minute in-depth appointment is finished in an amazing 5 minutes
- They can do this amazing appointment time compression because of an encoding technique called "one size fits all"
- The physician's assistant is so efficient that they've already scheduled your follow-up appointment - for three months from now, because they know you just love the suspense.
Most communication is non-verbal, with 7% being attributed to words (see the "55/38/7 Formula"), so they may need to listen to how you say what you say (vocal), but don't have to believe the words you say.
Accurate. This is a result of the heady combination of relying on the profession to enforce ethical codes and letting corruption become standard practice, and therefore common and safe. That medicine is NOT a free market is easy to prove, because as soon as you posit a provider that keeps their appointment times, charges a reasonable fee for a short consult, actually looks at you and performs tests himself, relates to your whole being as a human on this planet, and charges something like $100/hour, cash - imagine how that poor schmuck is going to fare in this hard, harsh world of ours. As soon as they need to interface with literally any other part of the system, the other providers, pharmacies, etc. will not be able to handle them - and won't want to. Won't need to. Many of their colleagues will sneer at them or even refuse to refer. Some patients, the sociopaths, will sense weakness - the lack of the blood-thirsty corpo apparatus backup that deals with 10 frivolous lawsuits a day - and say "what the hell" and sue for malpractice simply because they think you're unprepared and they can win.
Where is this happening? I've had long wait times in Germany, but low cost and good care. In the US I have very little wait time, high cost, and equally good care. The doctors unfortunately don't go all Doctor House on me, but they certainly aren't shying away from touching me, physically inspecting any issue, and ordering follow-up tests. The results aren't always very satisfying, but I understand that there are limits to what's reasonable to invest in researching relatively mild symptoms.
Meanwhile my mom has had chronic pain without any proper diagnosis. However, the German public health care system must have spent €50-100k on tests, and to my astonishment she is currently in a clinic for 3 weeks that focuses on undiagnosed pain. As close to Dr House as it gets.
Yes, yes it is, because it's an intrinsic advantage the human doctor could have that's being ignored.
Palpation, especially for musculoskeletal issues, is an incredible diagnostic tool. And manual therapies are surprisingly often an effective alternative to surgery or drugs.
I just want to point out, this may not be the physician’s fault.
A friend of mine, who is a primary care physician, just quit her job after the clinic changed its policy so that she had to fill her schedule with 10 minute appointments (6 per hour) and double book at least 10% (because the clinic was losing SO much money to no-shows /s). She burnt out after 6 months of this and moved somewhere they give her a more reasonable 15 minute time slots and no double-booking…
All this is to say, it may be a doctor thing, but I would bet it’s more likely a clinic trying to squeeze as much money as possible, and the doctor getting none of the extra profit from it.
So...if we have ChatGPT replace the doctor, as suggested, how would that be an improvement? I'm not sure how having a computer say "take some antibiotics and I'll bill your insurance company $500, see ya" would be any better.
I can count on pretty much 0 hands the number of times I had an on-time appointment with a doctor. I understand they have things going on that drag out their schedule a bit, but my God.
My experience with doctors in a patient setting is that they have generally been far less empathetic than the doctors in /r/AskDocs. Those doctors seem to be motivated to help and actually show some empathy. While Reddit on the whole is snarky, many subreddits (especially heavily moderated ones like AskDocs and AskHistorians) have a completely different tone.
My experience with most doctors is that they are among the least empathic people I have ever dealt with. I think that using AskDocs actually gives doctors an unrealistic _advantage_ in the study.
Considering how difficult getting this kind of dataset is, I would say that using r/askdocs would be a great place to start. Doctors' responses are labeled as such and the subreddit looks healthy and well moderated. Of course it's not perfect and it's important to point out possible issues with using r/askdocs, but I don't think your criticism of reddit being snarky or jaded holds much weight here.
And God knows if those claiming to be doctors there are actually that...
It's apparent, at least in the US, that a lot of GPs can be unhelpful, but believing that accurate diagnosis via text, without providing evidence or going through some type of physical examination, is feasible by Reddit expert docs only demonstrates a lack of critical thought.
They say that they are verified. For heavily moderated subreddits like AskDocs, I think it's highly likely that the mods take it seriously enough to maintain quality.
In almost every HN discussion about research, the top comment criticizes the validity of the research. Perhaps we should learn that research doesn't work the way we imagine.
The research is fine if you actually read it and understand what they’re researching.
It’s almost always the headlines and PR pieces that exaggerate it.
“ChatGPT is more empathetic than Reddit doctors” isn’t interesting. Strip the “Reddit” out and then everyone can substitute their own displeasures with doctors and now it’s assumed true.
Not to mention the knock-on effects on teens trying to decide what to be when they grow up. Some would-be docs are going to read this and go, huh, maybe I should pick a professional field that AI won't take over. And bam, in 15 years, there are slightly fewer docs.
At least in the US there are more people who want to be doctors than positions available. The bottleneck is medical school acceptance rates. To practice medicine you need a medical license, which you can only get from an accredited university.
I knew students who were rejected from medical school, but I also knew far more students who were at one time pre-med, saw how much effort and debt that would-be doctors would need to take on, and saw the risk of pursuing an e.g. biology degree where their entire future hinges on getting into med school, and they chose a different field.
There is a subgenre developing of excited publications/articles in the high-level category of "ChatGPT does X!" where, the moment you dig into it, it turns out there's a huge silent "but": its performance at "X" is completely unverified by anyone worthwhile.
So we have here "ChatGPT performs better than doctors" *but* "we sampled an unpaid internet forum".
The other one that's doing the rounds, where no one is verifying the source, is "ChatGPT develops lethal chemical compounds" *but* "it would be too unethical to verify with an actual chemist whether any of the compounds it dreamt up actually do anything".
ChatGPT is showing how bad the economics of journalism are for understanding a topic outside of taking press releases at face value.
And had they done that, then no matter how they did it, they would be criticized for being 'unethical' and criticized for either keeping the data secret or not keeping the data secret.
The more I use ChatGPT (including the 4.0 version, available with a Plus subscription), the more I think these "studies" and articles are simply made up.
In my experience it is terrible for any question I ask:
- if it doesn't know a person well, it simply makes things up
- if I ask for code examples or scripts, most of the time they are wrong; I need to fix them, they contain obsolete syntax, etc...
- if I'm asking a question, I expect to be asked for more context if the subject is not clear; instead it starts spitting out text without even realising I asked about a completely different thing
etc...
I could go on for hours with other examples, but I'm seriously not finding it useful
My experience has been the polar opposite with GPT4. As long as I structure my thoughts and present it with what needs to be done - not like a product manager but like a development lead - it spits out stuff that works on the first try. It also writes code with a lot of best practices baked in (like better error handling, comments, descriptive names, variable initialization).
Sometimes this presenting of the problem means I spend anywhere from 5-10 mins actually writing down the points that describe the requirement - which results in a working component/module (UI/backend).
We have been trialing GPT4 in my company and unfortunately almost everyone's experience is more along the lines of yours than mine. I know it shouldn't, but it honestly frustrates me a lot when I see people complain that it doesn't work :). It definitely works, but it depends on the problem domain and inputs. Often people forget that it has no other context about the problem than just the input you are providing. It pays to be descriptive.
LLMs currently have this problem where they will give confident-sounding responses to prompts where they are lacking enough context. Humans are built to read that as accuracy. It’s wholly a human interface problem.
Some humans find that style of response infuriating, but apparently we are in the minority.
It's almost like "AI hacking people's brains" turned out to happen accidentally, and a huge number of supposedly smart people are getting turned into mindless enthusiasts by nothing more than computer generated bullshit.
Agreed. It would be great if it could ask questions about missing context but maybe language models are bad for that task. Maybe one needs a second run, evaluate the answer and then make it look for info that could improve the answer.
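A rough sketch of that second-run idea (a self-critique pass), in TypeScript, assuming a hypothetical callModel(prompt) helper standing in for whatever LLM API you use - the helper and the prompts are illustrative only, not any specific library's API:

    // Hypothetical helper: forward a prompt to your LLM of choice and return its reply.
    async function callModel(prompt: string): Promise<string> {
      throw new Error("wire up your own LLM client here");
    }

    // Two passes: draft an answer, then ask the model to list missing context
    // for its own draft, and surface those questions instead of guessing.
    async function answerWithSelfCheck(question: string): Promise<string> {
      const draft = await callModel(`Answer this question:\n${question}`);

      const critique = await callModel(
        `Question: ${question}\nDraft answer: ${draft}\n` +
          `List any missing context or clarifying questions needed to answer reliably. ` +
          `Reply with exactly "NONE" if the draft is well supported.`
      );

      if (critique.trim() === "NONE") return draft;
      return `Before I answer, I need more context:\n${critique}`;
    }

Whether the critique pass actually catches missing context would depend heavily on the model and prompt, but it at least turns "confident guess" into "ask first" some of the time.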
Code, yeah - if you know its pitfalls you can get it correct pretty fast. But I just don't believe it gives better answers than doctors; it makes silly mistakes that signal it doesn't understand things deeply.
I'd only believe it if they actually trained ChatGPT on those types of tests specifically.
Not on the actual dynamic nature of dealing with patients & lawsuits.
there's definitely an art to asking the questions, likely because of subtle differences in how a lot of people communicate in writing.
NLP can recognize alt accounts of individuals on places like HN and Reddit, but a person would probably need to study the comments pretty hard to determine the same thing; it's not natural for people imo, but it seems to be the foremost aspect of any kind of model that's processing human writing.
Your experience matches mine. Everything I have asked is usually _horribly_ wrong, and even asking things in a different order makes it completely change its responses, even for otherwise binary questions. Even the "snippet" part of a Google search with the same prompt normally contains enough information to contradict it...
I'll also note that when there was hype around Stable Diffusion, one of the images shared around was that of an astronaut riding a horse. If you actually run Stable Diffusion with its default tuning and ask for that prompt, you will get 6 images, of which 5 are outright disasters (horses with 6 legs and going downhill from there), and then the 6th image, the only one which could possibly pass as a decent result, is the one that everyone shared and reshared and hyped. Usually other prompts give even more terrible results where there are 0 passable images without extensive tuning. Stable Diffusion now is acknowledged to actually be crap --despite the hype-- and I supposedly need to try the next best thing, whatever that is. But I find myself facing the same situation with ChatGPT 3.5, and now with ChatGPT 4, despite the fact there is no "next best thing", and I don't even know how they could possibly try to fix the problem of it being just wrong.
Agree with GPT but with StableDiffusion you are only partially correct. Visit /r/StableDiffusion to see the stuff people are making. People share prompt, seed, model settings etc and you can reproduce the exact same thing.
I do agree that it's bad at following the prompt exactly. It will produce most of the things you mentioned in the prompt but not necessarily in the same fashion you asked for. I don't agree that produced images are mostly shitty, just visit that subreddit.
> I don't agree that produced images are mostly shitty, just visit that subreddit.
SD is definitely very good in the right hands, and it’s a little unfair to expect to be able to get instant good results without any skill. It’s honestly pretty crazy that we now have things like ChatGPT and SD – and people are already calling them crap because they don’t work perfectly and their productive use actually requires some skill!
But r/StableDiffusion, or any public gallery, is obviously one giant selection effect. 99.9% of attempts could be crap and the 0.1% would still be enough to fill a subreddit.
Sort by new and you will get the idea of crap that people post on a public forum. Default sorting only shows you the popular/better content.
To give you a general idea of what percentage can be good images: I recently made a lot of wallpapers to cycle through daily using SD. I found a good prompt, a good model, and let it generate a bunch of images continuously for a few hours.
None of the images were shitty (they were all from random seeds); the only images I discarded had artifacts I didn't like or couldn't keep my eyes away from. With SD you can't just give the prompt "beautiful landscape" and expect it to give you a beautiful landscape. It won't. You'll get shitty images and might get a few pleasing ones. You must tune your prompt to get good results.
This is the exact opposite of the experience I have had with code. While it isn't perfect, its ability to scaffold code is a huge productivity booster. Example: Tailwind - "How can I highlight a parent container when I hover on any of its children using Tailwind". JavaScript - "How can I merge two JavaScript objects and remove any duplicates".
Yeah it's very good at answering basic questions which have been answered 50+ times on stack overflow &co that's for sure.
I use it for my side projects, for tech I have no experience in, and it works very well, because I know what I want, I know that it is possible and I just need it to vomit the boilerplate to save me 5 google searches
For my day job it's next to useless, and if your day job can already be automated by chatgpt I have bad news for you
The remote physician misdiagnosed my X-ray last time. That physician is easily automated out and possibly for the benefit of the patients not just costs. The other staff involved, like the NP, X-ray tech, assistant, are fine for a lot longer.
> If they can automate the work of a physician, who exactly is safe?
But they can't; it's like saying your 15cl stovetop Italian coffee maker is replacing a Starbucks-tier coffee machine.
If the only metric you account for is the boolean "it makes coffee", then sure; if you actually implement it you'll notice things falling apart in the first 10 minutes.
Yeah this question is confusing without more context, because that is already how JavaScript objects work by default. If you assign multiple times to one key, only the latest assignment is preserved. It’s more complicated (though not much more) to merge two JavaScript objects while preserving duplicates.
It would be difficult to search because that is a weird question. If you asked a more practical question like "merge two arrays and remove duplicate values" you would find tons of exact results.
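For what it's worth, a minimal sketch of both readings of that question in plain TypeScript/JavaScript (no libraries assumed, sample values made up):

    const a = { host: "localhost", port: 8080 };
    const b = { port: 9090, secure: true };

    // Merging objects: spread already "removes duplicates" in the sense that,
    // for a repeated key, only the last assignment (b.port here) is kept.
    const merged = { ...a, ...b }; // { host: "localhost", port: 9090, secure: true }

    // The array version is what people usually mean:
    // merge two arrays and drop duplicate values.
    const xs = [1, 2, 3];
    const ys = [3, 4, 5];
    const deduped = [...new Set([...xs, ...ys])]; // [1, 2, 3, 4, 5]

So the object version is basically free with spread syntax, and the practical dedup question is the array one.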
You should try Midjourney. It is far more likely than SD to give great results on the first try, even when you're not good at writing prompts.
Advanced users of both Midjourney and SD can get some stellar results out of them. Some of that is due to trial and error, and going through dozens or hundreds of images to pick the best ones, but being adept at crafting prompts and using other features of the programs plays a big role too.
Right, another low effort deflection into the "It (is/is not) Intelligent" diversion.
You can sit in that tarpit and duke it out with fanatics and doomsayers while patting yourself on the back for "real zingers"... or you can just learn to use the tool and extract your own value from it.
"Right, another low effort deflection into the "It (is/is not) Intelligent" diversion."
Straw man.
"You can sit in that tarpit and duke it out with fanatics and doomsayers while patting yourself on the back for "real zingers"... or you can just learn to use the tool and extract your own value from it."
GPT offers zero value which reflects my position. Though, it does perform some neat tricks.
"Stable Diffusion now is acknowledged to actually be crap if you don't take the time to learn how to use it properly."
Fixed that for you.
If you have very little technical aptitude, patience, and willingness to learn, it would be better for you to stick with something a little more newbie-friendly such as Midjourney.
However, loading a proper model such as 526, Realistic Vision, or ReV, using ControlNet, and guiding a prompt to where you want it to go while making changes using inpainting/img2img/etc can result in stunning images.
I agree that the nature of the algorithm allows it to generate false results which would not be acceptable in many domains. But I think there is tremendous potential to improve what you might call the "baseline" of what information and diagnosis is available. There are great doctors out there for sure but in many places there are close to zero doctors, or worse, bad doctors. While it's a very complex domain there are much simpler parts of it. A child with a cancer tumor in their brain is a very different case than someone who has a rash and a headache. There's a great deal of regulation as well that will come into play here so it's going to take a while. I know Google has a whole LLM that is using only healthcare data that they have been working on for a while.
> okay that’s just because you’re not good at using the models.
You literally cannot respond with "you are holding it wrong", especially when I'm claiming that even for the popular _example prompts_ the SD authors used, they had to hand-pick the best random result out of a sea of extremely shitty images.
And even in the original paper they disclaim it by saying "oh, our model is just bad at limbs". No, it's not just bad at limbs. They just happened to try examples where it could particularly show how terrible it is at limbs (i.e. spider legged horses and the like). But in truth, it's just bad at everything.
"It's bad at everything" ... bad by what standards? Just a few years ago it would have been regarded as unbelievable science fiction that a model with such capabilities would soon be available. As soon as they are here, people stop being impressed. But the objective impressiveness of a technology is determined by how unlikely it was regarded in the past, not by how impressed people are now. People get used to things pretty quickly.
Besides, there are models that are much more capable than Stable Diffusion. The best one currently seems to be Midjourney V5.
> Just a few years ago it would have been regarded as unbelievable science fiction that a model with such capabilities would soon be available. As soon as they are here, people stop being impressed.
I don't know. I've had chatbots for decades before "a few years ago", so I have never been particularly impressed. I would say that for someone who was already impressed with that you could practically describe a landscape in plain old 2000s Google Images and get a result, SD feels like just an incremental improvement over it -- the ability to create very surreal-looking 'melanges', at the cost of it almost always generating non-sensical ones. And also add that Google Images is much easier to use than SD...
> Just a few years ago it would have been regarded as unbelievable science fiction that a model with such capabilities would soon be available.
No, it wouldn't have – not to people in the know. We just didn't have powerful enough computers back in the 90s. Sure, the techniques we've got now are better, but 90s algorithms (with modern supercomputers) can get you most of the way.
Transformers are awesome, but they're not that much of a stretch from 90s technology. GANs are… ridiculously obvious, in hindsight, and people have been doing similar things since the dawn of AI; I imagine the people who came up with the idea were pretty confident of its capabilities even before they tested them.
Both these kinds of system – and neural-net-based systems in general – are based around mimicry. Their inability to draw limbs, or to tell the truth, or count, are fundamental to how they function, and iterative improvement isn't going to fix them. Iterative improvement would be going faster, if researchers (outside of OpenAI and similar corporations) thought it was worthwhile to focus on improving these systems specifically.
ChatGPT is not where transformers shine. StyleGAN3 is not where GANs shine. Midjourney is not where diffusion models shine. They're really useful lenses for visualising the way the architectures work, so they are useful test-beds for iterative algorithmic improvements…¹ but they aren't all that they're made out to be.
¹: See the 3 in StyleGAN3. Unlike the 4 in GPT-4, it actually means something more than "we made it bigger and changed the training data a bit".
What's special about that day? That's after the algorithms were developed, models and drivers were built, and most of these behaviours were discovered. I've got fairly-photorealistic "AI-generated" photos on my laptop timestamped September 2019, and that was before I started learning how it all worked.
If you're talking about popular awareness of GPT-style autocomplete, then I agree. If you're talking about academic awareness of what these things can and can't do, we've had that for a while.
What photorealistic AI-generated image? In September 2019 this must have been a GAN face. I admit those are impressive, but incredibly limited compared to today's text-to-image models. If you look at an iPhone from 2019, or a car, or a videogame ... they all still look about the same today.
Three years ago there was nothing remotely as impressive as modern GPT style or text to image models. Basically nobody predicted what was about to happen. The only exception I know is Scott Alexander [1]. I don't know about any similar predictions from the experts, but I'm happy to be proven wrong.
> In September 2019 this must have been a GAN face.
Well, yes,¹ but actually no. StyleGAN1's public release was February 2019, and it's capable of far more than just faces.
> Three years ago there was nothing remotely as impressive as modern GPT style
I predicted that! Albeit not publicly, so I can't prove it.
My predictions claimed it would have certain limitations, which GPT-3 (and, later, GPT-4) exhibit. I can show that those still exist, but few people on Hacker News seem to understand when I try to communicate that.²
> or text to image models.
Artbreeder (then called Ganbreeder) existed in early 2020, and it didn't take me by surprise when it came out. It parameterises the output of the model by mapping sliders to regions of the latent space; quite an obvious thing to do, if you want to try getting fine-grained control over the output. (A 2015 paper built on this technique: https://arxiv.org/abs/1508.06576)
I was using spaCy back in 2017–2018-time. It represents sentences as vectors, that you can do stuff like cosine similarity on.
If I'd been more interested in the field back then, I could have put two and two together, and realised you could train a net on labelled images (with supervised learning) to map a spaCy model's space to StyleGAN's, which would be a text-to-image model. It was very much imaginable back before April of 2020; a wealthy non-researcher hobbyist could've made one, using off-the-shelf tools.
If I were better at literature searches, I could probably find you an example of someone who'd done that, or something like it!
---
¹: That file was, because that's what I was playing with in September. I do have some earlier ones, of landscapes; they're in a different folder.
²: See e.g. here: they tell me GPT-4 can translate more than just explicitly-specified meaning, and the "evidence" doesn't even manage that. https://news.ycombinator.com/item?id=35530316 (They also think translating the title of a game is the same as translating the game, for some reason; that confusion was probably my fault.)
Nope. You literally just tried out one prompt and saw one good image and several bad ones and just shook your fist at the computer and gave up.
I'll repeat myself. You have to play around with the models and learn how to use it (just like you have to do for everything) .
> But in truth, it's just bad at everything
Thousands of people (including myself) have had the complete opposite result and have gotten amazing pictures. You can play around with the finetuning with different models from civitai and get completely different art styles too.
Like, this is so dumb I don't even know how to respond lol.
You're like some guy who got a computer for the first time and couldn't figure out how to open the web browser, so he just dismissed it as useless.
I don't think you understand the point. Your claims that "all of this needs extensive tuning and hand-holding and picking results" do not help your argument, they help _mine_.
It's most egregious if you are doing even more tuning and cherry-picking than the authors of the models are doing - which you definitely are.
I could spend 20,000 hours trying to learn to draw and I would still be far worse than what I could generate with Stable Diffusion + Control Net + etc.
I doubt you would be better than someone who had used Stable Diffusion for those 7 years, and I don't even include the technological advancements of the next 7 years.
I don't think writing SD prompts compounds the way actually making art from scratch does. It's kind of inherent, right? Because one is derivative (with a bunch of computer science) and one is from people. You can be cynical and say all art is derivative, I guess.
Learning to draw at the level of what Stable Diffusion can generate would take thousands of hours of practice, and the individual drawings would take hours.
But if you do learn, you can then render photorealistic image with nothing but pencil and paper instead of being reliant on a beefy computer running a blackbox model trained at enormous cost :)
SD will never compare to the power of pencil and paper imo. Drawing is an essential skill for any visual artist not just for mechanics but for developing style, taste, and true understanding of the world around you visually.
I recommend Freehand Figure Drawing for Illustrators as a good starting point (along with some beginner art lessons). It won't take 1k hours before you see results. It's also fun!
> But if you do learn, you can then render photorealistic image with nothing but pencil and paper instead of being reliant on a beefy computer running a blackbox model trained at enormous cost :)
Why do I want to avoid that reliance, other than to be smug to the nerds? And as far as general self-satisfaction, let's assume I would rather master a different skill with the time it would take.
Especially because that training cost only has to be done once, so on a per-person basis it beats learning to draw by a lot.
If you said something about flexibility and specificity of what you can create I could get behind that, but I think the arguments you're making are very unconvincing.
Having spent several orders of magnitude more time working on drawing than with SD, I’ll say “Drawing isn’t hard for some people”.
If drawing was that easy, no one would worry about disruption from AI image generators, because everyone who wanted images would be knocking them out by hand, not paying people for them, so there’d be nothing to disrupt.
SD isn't creation imo. I have used it + followed stuff made with it, and I don't care how much people rebrand it as prompt engineering. It's consumption. Just because some imagination is involved in the query doesn't make it creation.
> SD isn't creation imo. I have used it + followed stuff made with it, and I don't care how much people rebrand it as prompt engineering. It's consumption. Just because some imagination is involved in the query doesn't make it creation.
No, the fact that something is made with it makes it creation. Imagination makes it creative, on top of being creation.
The problem I see here is you are trying to insert your moral/aesthetic judgement of the quality/value of the mechanism/process of creation into the objective description of whether creation is happening, probably because you have subscribed to a worldview which valorizes creation and denigrates consumption so accepting something you don’t want valorized as “creation” is incompatible with that worldview.
You can just say you don't like Generative AI and wish people wouldn’t use it. You don’t need to declare that creating things with it isn’t creation to try to mask your (valid as any other) aesthetic preference in a very silly circumlocution designed to resemble an objective description, albeit a patently self-contradictory one.
I mean, I actually do think AI art isn't actually art - philosophically. It's not being snooty; it's based on years of studying and thinking about art from a philosophical perspective.
The best artists are the ones that adapt to both, generating an initial image and using it as scaffolding to paint over. Drawing and using a diffusion model are not mutually exclusive concepts.
I guess the difference is hammers are logical, simple tools to use with a known use case. They’re fairly hard to use incorrectly, although it does take some practice to use one, I’ll admit.
> You literally cannot respond with "you are holding it wrong" specially when I'm claiming that even for the popular _example prompts_ SD authors used they had to hand-pick the best random result over a sea of extremely shitty images.
I do a lot of my SD work with a fixed seed and 1-image batches. Once you know the specific model you are using, getting decent pictures isn't hard, and zeroing in on a specific vision is easier with a fixed seed. Once I am happy with it, I might run multiple images without a fixed seed using the final prompt to see if I get something better.
If you are using a web interface that only uses the base SD models and doesn't allow negative prompts, yes, it's harder (negative prompts, and in particular good, model-specific negative embeddings, are an SD superpower).
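For anyone who hasn't scripted this locally, here is a minimal sketch of that fixed-seed, negative-prompt workflow using the open-source diffusers library. The model checkpoint, prompts, and settings below are placeholders, not recommendations:

```python
# Sketch only: fixed seed + negative prompt with Hugging Face diffusers.
# Checkpoint, prompts, and parameters are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "portrait photo of an astronaut on a horse, cinematic lighting"
negative = "blurry, deformed hands, extra limbs, watermark, low quality"

# A fixed seed means prompt tweaks produce comparable images,
# so you iterate on the wording instead of rolling the dice.
generator = torch.Generator("cuda").manual_seed(1234)

image = pipe(
    prompt,
    negative_prompt=negative,
    generator=generator,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("draft.png")
```

Once the composition looks right, dropping the fixed seed and running a few more generations with the final prompt is the "see if I get something better" step described above.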
I agree with you on GPT, but with Stable Diffusion you are only partially correct. Visit /r/StableDiffusion to see the stuff people are making. People share the prompt, seed, model settings, etc., and you can reproduce the exact same thing.
I do agree that it's bad at following the prompt exactly. It will produce most of the things you mentioned in the prompt, but not necessarily in the fashion you asked for. I don't agree that the produced images are mostly shitty; just visit that subreddit.
> I agree with you on GPT, but with Stable Diffusion you are only partially correct. Visit /r/StableDiffusion to see the stuff people are making. People share the prompt, seed, model settings, etc., and you can reproduce the exact same thing.
This doesn't really say anything, because it's just survivorship bias. The entire purpose of that subreddit is to show the successes. Most people get disasters every single day; they just don't upload them. Even if I try the same prompt I will get a disaster image, unless I use, e.g., exactly the same random seed they happened to use. This is not even "prompt engineering". It's just outright playing with the dice.
Why do you think it matters if five out of six images are failures? If the sixth is a success, you have your image. The tool has worked. Glancing over a few failed generations is certainly far less effort than making the image from scratch.
Because it's a departure from the more logical and rational computing people have come to expect from "computers". For many people, it's unusual to have "fuzzy computing" become popular again.
You are wrong. Sort that sub by new to get a glimpse of the ugly stuff. With SD you can't ask for a "very beautiful image of X" and actually get one. You must fine-tune your prompt to get the right aesthetics (photo, cinematic, specific artist, etc.) and also choose a better model. The base Stable Diffusion model is not very good. Visit https://civitai.com to get a glimpse of how good the models have become.
I made lots of wallpapers in bulk recently with a good prompt and model. None of the images, all generated via random seeds, were shitty by any measure. The only images I ended up deleting had some of the usual AI artifacts I couldn't stand.
Have you actually tried using these tools beyond starting it up and swinging your fist in frustration?
I have thousands upon thousands of beautiful images, each one more inspiring than the other and I did nothing or very close to nothing. You are belittling this amazing tech so much it sounds like you are scared of it. What’s it to you?
Have you ever lived in a time when it was even remotely possible to go from "astronaut on a horse" to even one decent result, in seconds?
If you expect to be able to ask it an underspecified question without context and without telling it what role it should take and how it should act, sure, that often fails entirely. It's not a productive use of ChatGPT at all.
If, on the other hand you actually put together a prompt which tells it what you expect, the results are very different.
E.g. I've experimented with "co-writing" specs for small projects with it, and I'll start with a prompt of the type "As a software architect you will read the following spec. If anything is unclear you will ask for clarification. You will also offer suggestions for how to improve. If I've left "TODO" notes in the text you will suggest what to put there." and a lot more steps, but the key element is to 1) tell it what role it should assume - you wouldn't hire someone without telling them what their job is, 2) tell it what you expect in return, and what format you want it in if applicable, 3) if you want it to ask for clarifications, either ask for it and/or tell it to follow a back and forth conversational model instead of dumping a large / full answer on you.
The precise type of prompt you should use will depend greatly on the type of conversation you want to be able to have.
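As a concrete illustration of that structure (role, expectations, back-and-forth), here is a rough sketch using the 2023-era openai Python package; the exact wording of the system prompt is only an example, not the one quoted above:

```python
# Sketch: role + expectations + conversational back-and-forth.
import openai  # 2023-era ChatCompletion API

SYSTEM_PROMPT = """You are acting as a software architect reviewing a spec.
If anything is unclear, ask me for clarification before answering.
Offer concrete suggestions for improvement.
Where the spec says TODO, propose what should go there.
Work through the spec conversationally rather than dumping one long answer."""

def review_spec(spec_text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": spec_text},
        ],
    )
    return response["choices"][0]["message"]["content"]
```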
I'm starting to wonder if it's just easier to actually program?
Like I know ChatGPT-4 can generate a bunch of code really quickly, but is coding in Python with pretty well-known libraries so hard that it wouldn't just be easier to write the code yourself? It's super neat that it can do what it does, but on the other hand, modern editors with language servers are super efficient too.
If I want it to just write code where I know exactly what I want, I will have it write code. ChatGPT can write code and fill in things when you give it something well specified very quickly.
My point was that if you just ask it an ambiguous question it will return something that is a best guess. It's what it does. To get it to act the way the person above wants it to, you need to feed it a suitable prompt first.
You don't need to write a new set of instructions every time. When I "co-write" specs with it, I cut and paste a prompt that I'm gradually refining as I see what works, and I get answers that fit the context I want. When I want it to spit out Systemd unit files, I cut and paste a prompt that works for that.
The stuff I'm using it for is stuff where I couldn't possibly produce what it spits out as productively, not because it's hard, but because even typing well above average speed I couldn't possibly type that fast.
It's perfectly fine if it's approaching a Clever Hans type situation as long as it's producing sufficient quality output fast enough that it's producing it faster than I can do manually.
There are many categories of usage for them, and relatively "dumb" completion and boilerplate is still hugely helpful. In fact, probably 3/4 of my use of ChatGPT is for cases where I have a pretty good idea what it'll output for a given input, and that is why I'm using it, because it saves me writing and adjusting boilerplate that it can produce faster. Most of the time I don't want it to be smart; I want it to reliably do almost the same as what it's done for me before, but adjusted to context in a predictable way (the reason I'll reach for it over e.g. copying something and adapting it manually).
We use far dumber agents all the time and still derive benefits from it. Sure it'd be nice if it gets smarter, but it's already saving me a tremendous amount of time.
Yes, I think using it for code that you could write yourself fairly easily is a sweet spot since you can quickly check it over and are unlikely to be fooled by hallucinations. It can save significant time on typing out boilerplate, refreshing on api calls and type signatures, error handling, and so on.
It’s a save 15 minutes here, 20 minutes there kind of thing that can add up to hours saved over the course of a day.
One of my go-to examples is that I asked ChatGPT to write a DNS server. It didn't get it perfect and needed followup questions, but 1) it got it better than I could (having written a couple in the past) without spending time reading the RFCs, and 2) because it's something I have done before, even though I'd have to look up docs to do it again, I could instantly tell where it was doing the right thing and where I needed to check specs and adjust details.
But if I didn't know how a DNS server works, on the other hand, it'd have been little help to do it that way, because I'd have no idea what it got right or not, and it'd have been far more productive in that case to ask it for a step-by-step guide of what to do and which portions of the specs to look up.
You have to treat it as a junior:
Either give it tasks you can trivially validate and improve because you know the subject, or ask it to help condense down and explore the search space.
Don't try to get it to complete a task you don't understand how to quickly validate, because you'll waste tremendous amounts of time trying to figure out whether the answer is right.
Most of the people complaining seem to expect it to work well for the latter, even when told it's not (yet, anyway) a good use.
> if it doesn't know well a person it simply makes up things
Asking it for factual information about a subject can be a bit hit or miss depending on the subject. Better to use Bing Chat, because it will use info from the web to inform the response.
> if I ask code examples or scripts, most of the time they are wrong, I need to fix them, they contain obsolete syntax etc...
How wrong? More wrong than having a junior or mid level developer contributing code?
Think about it a different way: you just gained an assistant developer that writes mostly correct code in seconds. Big time saver.
Also: if you want it to use a particular code style etc., give it few-shot examples.
> if I'm asking a question, I'm expecting being asked for more context if the subject is not clear
Then you need to tell it that in your prompt: "if the subject isn't clear, ask me some clarifying questions. Don't respond with your answer until I have answered your clarifying questions first". Or: "ask me 3 clarifying questions before answering" to force it to "consider" how well it "knows" the subject first (both this and the few-shot idea are sketched below).
ChatGPT isn't an AI in the sci fi sense of the word. It's a language model that needs to be prompted the right way to get the results you want. You will get a feel for that the more you use it.
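A minimal sketch of those two tips, few-shot examples to pin down code style plus an instruction to ask clarifying questions first; the example snippets are placeholders for whatever style you actually want copied:

```python
# Sketch: few-shot style examples + "ask clarifying questions first".
messages = [
    {"role": "system", "content": (
        "You write Python in the style of the examples below. "
        "Before answering, ask up to 3 clarifying questions if the request "
        "is ambiguous, and wait for my answers."
    )},
    # Few-shot pair: a request and the kind of code you want back.
    {"role": "user", "content": "Write a function that loads a JSON config file."},
    {"role": "assistant", "content": (
        "def load_config(path: str) -> dict:\n"
        "    \"\"\"Load a JSON config file.\"\"\"\n"
        "    with open(path) as f:\n"
        "        return json.load(f)\n"
    )},
    # The real request goes last.
    {"role": "user", "content": "Write a function that loads settings from a TOML file."},
]
```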
This is disingenuous. OP is right. ChatGPT is mostly inaccurate and contextless by nature.
It always produces something confidently so it's very easy to think it's the right answer.
Now, it can be extremely useful for any task that can be easily verifiable and you don't know the syntax, how to approach something, etc. Because any decent software developer can use it as part of the prototyping or brainstorming and get wherever they want to go in coordination with ChatGPT.
What you can't do is assume it's "the expert". You are the expert and you're the intelligent one and the chat generates potential useful things.
That's on verifiable stuff. On other things it can be so laughably bad that it's impressive how much "everyone" (as in articles, hype, HN users who downvote criticism as being from luddites) pushes it as something it is not: an AI that can think and be relied upon, where any inaccuracy will just "get better with time", as opposed to a model (and the politics around it) that has no clear path to the idealisation being sold.
It's a tool, but it's usually sold as better than it is, like in this case, presumably with the intent of relying upon it to save cost at some key integration point. The problem is that it mostly won't work, and comparing it to bad humans or worse integrations only obscures the fundamentally low ceiling that it has.
I think people with authoritarian mindsets (and I don't mean left or right, but inherent trust in authority) easily want this to be a source of truth they can magically use, but there's no path for that to be true. Just to appear true.
> How wrong? More wrong than having a junior or mid level developer contributing code?
Yes. The failure modes of a human and ChatGPT are nothing alike; I am far more experienced at spotting beginner mistakes in code reviews than at spotting seemingly good, but actually illogical, bullshit code generated by LLMs.
I have never had it produce non-trivial, novel code that was correct, so I mostly use it as a search engine instead.
I can only conclude that both the junior and senior were inappropriate hires if people really think the output of an LLM is anything approaching that of an appropriate human being. It's saying a lot about where people work, IMO.
Either that or the problems they're solving are nothing boilerplate couldn't handle.
And even if ChatGPT were always 100% wrong with code, I still fail to see how it is relevant to this particular article.
The article compares verified responses on r/AskDocs (yeah, a subreddit) and those from ChatGPT. That's it. How is its coding capability even remotely relevant? It's like saying "Excel is bad at editing photos, so it must be bad spreadsheet software as well."
Indeed. I don't understand why these types of comments get so many interactions. It makes me think they are relevant but on second inspection they are mostly unrelated grievances.
I always tell people that if they want to learn more about the hype, they should just try to use it to do actual work. Almost no one ever does, but when they do it becomes almost immediately clear how limited it is and what it is and isn't good for.
It's good at writing. It's not good at knowing facts. If you give it all the relevant facts and ask it to do a writeup in a certain style it does a better first draft than a typical human, and in a lot less time.
If you ask it what the facts are, it just gives you a load of nonsense.
Even this is not always the case. I gave it a very rough draft and it actually made my structure worse, while at the same time using imprecise language. It looked like the correct style, but the content was not salvageable.
I use it almost every day for work. It has mostly replaced Google for me. A lot more convenient. Now whenever I use Google it's more or less just to look up the address of a specific site.
Well, Google has become utterly bad at its job: I fail to find sites I remember verbatim quotes from, so expert Google usage is no longer a possibility. It will gladly leave out any of your important keywords, even if you add quotes around them; absolutely useless. Sure, the average person will search for "how old is X" rather than "X age", but for more complex queries the first form is not a good fit.
That said, I can’t really use ChatGPT as a search engine, but I did plug it into a self-hosted telegram bot and I do ask it some basic questions from time to time - telegram is a good UI for it.
Yeah, I agree. Google did a lot of work to make ChatGPT useful. It's clearly worse than it used to be.
I usually need someone to explain something to me. And I used Google before to land on a site where I could find the explanation (e.g. how to use a library). ChatGPT can explain most things I need, and I can skip Google and the other sites. But it's indeed not a search engine; if you need factual information then your best bet is to find the documentation, articles, databases, etc.
Given how much it hallucinates, that's one very scary echo chamber to be in where you trust info reinterpreted by an algorithm instead of just reading it yourself. Yikes.
It rarely hallucinates for me. But you need to know what it's capable of and how to use it to work effectively with it. You can't use it well if you think it's an all knowing sentient AI.
How would you know how frequently it hallucinates if it has "mostly replaced Google" for you? Are you fact-checking all your queries? This is a very strange self-inflicted echo chamber.
Because the code it generates is working? The recipes it gave to me are also delicious and I don't even have to read someone's life story on a blog before getting to the recipe.
Not really sure why you are so fixated on the echo chamber thing. We are on the internet! The biggest echo chamber humanity has ever built.
Functional or delicious ≠ accurate to the source material.
Because it's a layer of abstraction, mate. One known to get things wrong, because it's an LLM. If I write a post about Richard Stallman's opinions on paedophilia and Jeffrey Epstein (https://en.wikipedia.org/wiki/Richard_Stallman#Controversies), and it incorrectly tells you that Stallman associated directly with Epstein, or is a paedophile himself, that would not be accurate to the source.
At least with a Google search result you can go more directly to the source. If the scientific method is getting to the truth, why on Earth would you put an obstacle in front of it?
If someone tells me x, am I going to believe them? No. So, why would I believe an LLM if it isn't presenting sources to me that are 1:1 in accuracy to the information it presents to me?
Yup, and it hallucinates plenty with dev too, even over basic stuff like an NGINX config.
Given that it hallucinates in particular over measurements/specs/stats, I'd be extremely sceptical of taking a recipe from it, whether that's generated and original or coming from a known source.
Baking requires very specific measurements, the slightest mistake and it won't turn out well in most cases. Again, why go via an LLM and not a search engine to the actual source? It makes zero sense, especially if it only returns text and you can't see what the recipe produces if it's an existing thing.
> Again, why go via an LLM and not a search engine to the actual source?
I believe the argument presented (possibly in a separate thread) was that search engines have degraded to the point where what they show you is worse than LLM output.
I tried phind after seeing it linked here a few times. It felt like a slower Google search with extra steps. The sources it used for the answers were the sources I would have found by searching "site:stackoverflow.com [my question]". It did distill the information decently well, but I'm skeptical that it properly pulls in the context provided by the comment replies to the questions/answers.
Even just typing what I had it write for me, at a typing speed well above average, would take many times as long, and I've done enough sites over the years to know I would not keep up that kind of typing speed, because I'd need to check things and go back and forth.
Put another way: People who don't pick up these tools and learn how to be effective at them will increasingly be at a significant performance disadvantage from people at their skill level who do pick them up.
Do you know what it wrote for me, then? No, it's not just boilerplate. It filled in custom logic per my specifications.
But even if it was just boilerplate, I'd have had to apply it, and that takes time to do. I've started dozens of projects over the 28 years in this industry - I have a very good idea of how long it takes both me and a typical developer to do what I've had it do, and it's far faster.
And no, I'm not trusting it to be "secure or sensible" at all, no more than I trust a developer. But the quality of what it has produced, all of which I've reviewed the same way I would review code delivered by a developer, has overall been good. That does not mean free of bugs, any more than human developers write flawless code on the first try, but it does mean I've had it write cleaner code than a whole lot of people I've worked with who'd take many times as long and cost a hell of a lot more.
Yeah, IDK, of all the things an AI can do for us, code generation seems to be the one I'm actually least interested in. It's anecdotes like this that sort of reinforce that feeling.
I would much rather have a "rubber ducky" that I can try to explain my thought process to, and then it can try to question me and poke holes in my thinking. I think my expectations of AI are pretty realistic, and I don't really expect it to ever be *thinking* in the way I associate with the word, not with today's SoA at least. In that respect I'm just not particularly interested in it generating code, but that also may come down to our individual preferences for how we write code.
At any rate, my issue is that it fails at the "rubber ducky" position I mentioned earlier. It's not really able to follow a train of thought that I have in a reasonably competent way, and every time I have tried to do work with it I just end up feeling silly for anthropomorphizing, even if only for a second, something that I know isn't really thinking. Just my $0.02 though; I'm glad so many people seem to like it and am happy for them.
This is a bit like picking up a hammer and complaining it works poorly as a screwdriver when a screwdriver is what you want. Sure, if you want it to do something it's not suited for, then don't use it.
Maybe once I'm done actually using it to speed up the project I'm working on. There's no benefit to me in slowing down to put effort into trying to convince you of something you can easily figure out for yourself if you actually wanted to.
Specify the work that you know it excels at first-hand, then list projects/tasks you have completed using it.
How do you know if they don't? Do you expect them to report back to a throwaway comment on HN? The sentiment is shifting on HN, that much is clear, just like it shifted with crypto.
If you tell it it's wrong, it often comes up with a better answer on the second try.
It seems like you could maybe automate that. Let it spit out its first draft of an answer, have the framework tell it "please correct the errors" and then let it have another go and only present the second attempt to the user.
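A minimal sketch of that two-pass idea, again with the 2023-era openai package; the revision prompt is illustrative, and whether the second pass actually improves things is exactly the caveat raised below:

```python
# Sketch: draft an answer, then ask the model to self-correct,
# and only show the user the second attempt.
import openai

def answer_with_revision(question: str, model: str = "gpt-4") -> str:
    draft = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )["choices"][0]["message"]["content"]

    revised = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "user", "content": question},
            {"role": "assistant", "content": draft},
            {"role": "user", "content": (
                "Check your answer above for factual or logical errors "
                "and return a corrected final version only."
            )},
        ],
    )["choices"][0]["message"]["content"]

    return revised
```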
the point is: my job is not to train a deep learning model, my job is to write code. If I already know the answer, I'm not asking the question. If I can recognise something is wrong, it means I should have done that task by myself.
Can it do certain things faster than me? Sure! Can it do them correctly? Most of the time it can't and if I have to spend my time to check every output, the time I saved is already wasted again.
The whole point of computing is to have it do tasks for us and here you are advocating that we should do tasks by ourselves. I hate everything about this.
But if you always tell it it's wrong, it will sometimes come up with worse answers on the second try. Which means you still need to know whether the first answer is correct or not (or neither of them)
I'm reminded particularly of a screenshot of someone gaslighting ChatGPT into repeatedly apologising and providing different suggestions for Neo's favourite pizza topping, despite it answering correctly that the Matrix did not specify his favourite pizza topping first time round, but it applies equally to non-ridiculous questions
The idea isn't that you tell it to simply change what it wrote on the first try. The idea is that having a first draft to work with allows it to rewrite a better version.
What if someone refers to x part of their body incorrectly? What if they're not able to think clearly due to y reasons and tell it something completely wrong?
An LLM is wholly inappropriate.
As for its "bedside manner"/how polite or friendly it is, that's meaningless if it isn't good at what it does. Some of the best docs/profs I've known have been very detached and seemingly unfriendly, but I'll be damned if they weren't great at what they do. Give me the stern and grumpy doc that knows what they're doing over the cocksure LLM that can't reason.
That's assuming it doesn't just enter a fail state and keep providing the same answer again and again and again, despite explaining what about the answer is wrong.
I’ve only asked GPT4 a few niche questions on subjects of interest to me, so I can’t really judge it yet. But so far its answers can’t compete with Wikipedia. However, it seems good at drilling down and doing followup questions that build on the prior questions, which is interesting. I can see that natural language give-and-take back-and-forth being useful for things like early education, diagnosing non-emergency patients, troubleshooting home PC problems with non-computerphiles, etc.
I use it at work and it clearly has strengths and weaknesses. My two use cases are initial research and generating prototype code.
I find it very helpful to ask a series of questions and see a number of examples to get a primer on what to expect with something. The main benefit over Google or going straight to the docs is I can start with my specific requirements. I then dig into the documentation to deepen my understanding. I can typically move forward with ChatGPT generating some code as a starting point.
It can be incorrect or out of date but combined with my experience I find myself being more productive with it.
A weakness I see is complex code requirements. It knows what it knows.
I note that you seem a little frustrated with vague or incorrect responses. It helps to tell ChatGPT the role it should play. It helps as well to instruct it to ask questions of you to improve the response. Personally, I prefer to tell it to keep its answers brief; I get fewer walls of text and I can narrow in on the specific answer I am after more quickly.
Their first example of a good ChatGPT answer - about bleach in the eye - feels like copypasted SEOified liability-proof WebMD copy. Every medical site has that crap and it’s useless once you have a moderately difficult question.
N.B. as well: If someone thinks they have bleach in their _eye_ and can still open their eyes enough to write a Reddit post, much less read through ChatGPT’s extremely long answer, they’re almost certainly fine.
I did not read this particular article; I was explaining that, from my own experience, all these articles telling us how great ChatGPT is seem to be made up, because my experience (and from what I read I'm not alone) is completely the opposite. Maybe it's not able to solve the type of questions I ask? Fine. But that's not how ChatGPT is presented most of the time.
It costs you zero dollars to not post an irrelevant comment then. I really wonder how you justify this "I didn't read the article, but I have a very, very strong opinion (straight up calling it made-up) on it" behavior. The internet is rotting people's brains, I guess.
I've now read the article and I'm not changing my opinion. ChatGPT can do some things very well and people tend to hype those things and claim it's better than humans rather than recognising its limits. And again, you are still missing the point of my comment despite having read it, so I'm out of patience. Maybe try to ask ChatGPT to explain what I meant :)
To be fair, the opinion is thoroughly justified by the article, which might have been more honestly titled "Physicians' Reddit comments shorter than ChatGPT responses; relative accuracy unknown"...
And unsurprisingly, the average 52-word Reddit comment [isolated from the context of other comments] didn't provide very much information compared with a much more verbose chatbot. The relevance of the ChatGPT response to the actual patient condition remains unknown.
This is relevant to the real world of primary care only if your sole access to a medical professional is Reddit...
And it costs zero to you to ignore my comment especially if you don't understand it. Other people seem to have understood what I meant and posted constructive responses, you didn't, but honestly it's not my problem.
It quite literally writes whatever sounds about right. Which is certainly very impressive if you happen to assess by exactly the same metric... It's more artificial overconfidence than artificial intelligence
My experience is completely different. I have successfully used GPT-4 to:
- write a contract for the sale of my motorcycle: I put all the details, names, and numbers with labels in a spreadsheet, pasted it into the chat, and asked for a contract, then edited it.
- learn French: I told GPT "when I write wrong stuff in French, always let me know and teach me the correct ways". Then, after a few weeks, I asked for a .csv with the stuff it corrected so I could import it into Anki, which actually worked.
- coding on a daily basis: I am learning Rust at my new job, so I ask it things all the time; it helps me a lot.
Your experience is not different from OPs. You're just ok with the mistakes or are unaware of them because you don't know how to judge them.
I use ChatGPT to boost productivity, but I also see the number of mistakes it makes and will keep making, and I am surprised at the extreme denial of anyone who tries to shut down criticism of the wrong type of hype (the kind that sells something that is not there).
Oh but it is: I find it useful. For example, I would not pay a lawyer for that contract, so having it draft me a mediocre contract is still better than having no contract.
All you need for a bill of sale is a simple sentence saying: on X date, I, XXX, sell vehicle XXX with VIN number XXX to XXX. Then write the driver's license numbers of both parties and sign. A lawyer is extreme overkill for such a simple transaction.
By the same logic, a hallucinating LLM is also overkill versus just doing the simple task yourself and not needlessly adding risk to it.
The point still remains: let's see what the LLM delivered that the user actually used. Either it's legally binding and an appropriate use, or it's not fit-for-purpose.
Equally, why not an interactive form using conditional logic? No hallucination possible. Much more simple and reliable.
> All you need for a bill of sale is a simple sentence saying: on X date, I, XXX, sell vehicle XXX with VIN number XXX to XXX.
If you know this, you don't need GPT.
If you don't know this, you don't have a way to assess GPT's attempts at a contract. A bill of sale is indeed simple, but there's a lot of more subtle legal issues someone might run into in life.
Good to know it's able to do some useful stuff. In my case I mostly ask Python related questions, because it's the one I know better so I can check if the answer is right or wrong. I will try with different languages, but I will be less capable of knowing if I got a good answer or not. It may take more time, but I find the combination of Google + Stack Overflow more accurate than asking ChatGPT
I've seen a lot of positivity on the output of ChatGPT for coding tasks in my workplace. And it does seem to have some use in that area. But there is just no way in hell it's replacing a human in its current state.
If you ask it for boilerplate or for something that's a basic combination of things its seen before, it can give you something decent, possibly even useable as-is. But as soon as you step into more novel territory, forget it.
There was one case where I wanted it to add an async method to an interface as a way of seeing if it "understood" the limitations of covariant type parameters in C# with regards to Task<T>. It did not. I replied explaining the issue and it actually did come back with a solution, but it wasn't a good solution. I told it very specifically that I wanted it to instead create a second interface for holding the async method. It did that but made the original mistake despite my message about covariance still being within the context fed back in for generating this response. I corrected it again, but the output from that ended up being so stupid I stopped trying.
And at no point was it actually doing something that's very important when given tasks that are not precisely specified: ask me questions back. This seems equally likely to be a problem for one of these language models replacing a doctor. It doesn't request more context to better answer questions so the only way to know it needs more is if you already know enough to be able to recognize that the output doesn't make sense. It basically ends up working like a search engine that can't actually give you sources.
Right? Like I had a batshit insane conversation about song lyrics the other night, where the chatbot repeatedly generated patently incorrect responses - close enough to seem reasonable, but utterly incorrect, mistakes that I can’t imagine a human making, just straight up false statements that didn’t hold up to the slightest scrutiny. Incredibly frustrating. Imagine having that kind of experience with a medical professional, when you’re sick and impatient to receive care. awful.
Yes. It is inappropriate for most things. It's an LLM. It predicts the next word. People are throwing it at all kinds of problems that are not only inappropriate, but their ability to assess the quality of its output is questionable.
E.g., lots of HN users claim to use it for dev or learning new programming languages. Given the frequency of hallucination, and their Dunning-Kruger complexes in full effect, they don't know when it's teaching them bad information or functions that don't exist.
To add to this, I tried to use it professionally but the answers were too general and generic. I suppose it has been prompted to put things simple, which prohibits it from saying meaningful things about certain topics. It did give one or two useful references, though.
For us to get a better understanding of how well this tech works I suggest ChatGPT becomes integrated in HN in this way: it generates 1 response per comment; the responses written by the AI are clearly marked as such (e.g. different color); the user can turn them off; these comments can be up/down voted and the votes can be seen by any user; of course users can reply to the generated comments.
I don't know about chatGPT, but dealing with doctors has been incredibly frustrating as someone with symptoms with no immediately visible direct cause.
One reputed physician completely refused to acknowledge that a specific medication might be causing some of my side effects, even when I shared links to peer reviewed studies from reputed universities (including his alma mater) that specifically talk about side effects from the medication.
My success rate with doctors is about 20% at this point. I've stopped visiting them for most ailments, and if I ever do get any prescriptions, I make sure to research them thoroughly. The number of doctors who will casually prescribe heavy duty drugs for common ailments is unreal.
This is exactly it. Doctors have neither any incentive to actually treat nontrivial cases, nor any accountability over whether they do.
Combined with the prestige of the profession, many turn into egoistic know-it-alls whose real competence is equivalent to a car mechanic who tells you you're just driving your car wrong when you come in with anything that's not obvious from a 15s inspection because they get paid anyway.
I would be surprised if most of the doctors I've seen could outperform even GPT-2.
For every one person misdiagnosed there are probably 10 that think they have cancer because of WebMD. People are dumb, and for every anecdote you have about how your doctor was wrong, they have 50 about patients who refused to do basic treatment courses that would radically improve their lives. But it's ultimately true: you have to be somewhat knowledgeable and responsible for your own health. Always get a second opinion if you're unsure. But this know-it-all syndrome also applies to you. Just because you can "do your own research" on Google doesn't mean you know it all either.
Even so, doctors are extremely overworked now, and insurance companies don't want to pay their fees. So now they're running from patient to patient, unable to even pay attention to them. That's if you even get a doctor at all, rather than one of the multiple nurse practitioners or physician's assistants who also don't know anything.
I get that, but that doesn’t explain why they’re always so overeager to prescribe hardcore drugs or over-operate routine problems. The role of doctors in the entire opiates crisis is pretty damning.
I’ve had a couple of surgeries and the eagerness of doctors to give me opiates for pain relief was baffling, even when I clearly told them that I can tolerate the pain and don’t need anything stronger than ibuprofen.
It's a lot harder to overprescribe addictive drugs now. Much much harder. But your point remains. The problem wasn't so much painkillers after a major back surgery - it was prescribing to treat chronic pain - which, in hindsight, is an obvious way to get people addicted. What isn't told is how insurance would only cover pills, not more expensive therapies (to deal with the underlying issues.) Now you have a patient who's in some pretty serious pain and you have an FDA approved pill to treat it. In a country that still has TV ads for medicine, the outcome isn't that surprising. It's also why opioid addiction is a strictly American phenomenon.
Working as a surgery resident and now in IR, I can tell you it’s much better to be a little overprescriptive in addressing post-op pain than to get behind and underdose.
Also opiates in a short term setting are good meds. Pain control is good and people are able to get moving faster.
The problem is that there are simply not enough doctors to go around, plus the cost. The schooling is enormous, and the costs to hospitals/insurance companies are too. Hence, more PAs, more nurse practitioners, less doctor time, and that doctor is rushed from patient to patient.
There are good doctors out there but there are a lot of bad. I always advocate people if they're unsure to get a second opinion. Just like you would if a plumber said "you need to replace the whole system." If it doesn't seem right or you don't feel like you got the proper attention, go somewhere else and see.
Medicine isn't an exact science for most of us. It's a lucky thing when it's a simple infection that antibiotics can cure. Most of our problems aren't so easy. Just be slightly skeptical. Don't go "fruit will cure my pancreatic cancer" crazy either.
This should be rated higher, possibly even over the reveal that this study is comparing to Reddit answers.
https://xkcd.com/937/ comes to mind. It's not implausible at all to me that ChatGPT could outperform Reddit in detail/manners for health advice (and honestly, even for actual doctors I've heard some horror stories about bedside manner and refusing to actually believe/consider symptoms), but if the study isn't actually checking that, if they're just checking if the chatbot was more polite/empathetic... that's a huge qualification that should be up-front and center.
Several years ago, I worked on the technical privacy and security design and architecture for systems that used ML for prescriptions and some diagnostic triage. At the time, I thought this was too big an ethical issue to be pushed down to us technologists to solve (as are many in government), so I got in touch with an eminent philosopher about the question of whether a person was receiving honest and ethical care from ML and from whom? (15 years ago now).
His response to my (limited and naive) question was essentially, people will hold attachments to beliefs about the human element in these transactions, and the philosophical part of the question was why people would hold beliefs that were not strictly rational, and he sort of declined the implied ethical questions. There was no reason to expect him to respond with more, but given I was navigating the ethics of AI driven medical care via questions of privacy in system design decisions (read: accountability) for the institutions who would use it and the millions of people subject to it, it seemed like an opportunity to be at the very forefront of what would likely become a defining social issue of our lives in a couple of short decades.
What we discovered then as architects, as most people are just about to, is that the main use case for ML/AI will be to use complexity to diffuse accountability away from individuals, and aggregate it up into committees and ultimately corporate entities that are themselves artificial beings. AI is the corporeal manifestation of an institution, essentially a golem.
ChatGPT has been helpful for fairly in-depth research, as well as answering questions I don’t want to bother my oncologist with.
Actually, I prefer it to be less empathetic and more robotic with its answers. When it tries to be empathetic it goes beyond what I’ve asked it and sprinkles on typical safety filter bs.
In contrast my oncologist could be a bit more empathetic. This person directs your care, and if they do so in a way that seems like a black box it can be rather scary. They know enough about me, with all of their experience, to be a “helpful human” in a difficult situation.
Are you confident that its responses are accurate? I’ve had it just make up facts or tack made-up facts onto actual facts. It will also fabricate references to research papers.
I do. Multiple times I've asked it to summarise plotlines for TV shows and when it doesn't know (if for example the show isn't super popular and there isn't a lot of discussion about it on the internet), it'll just make stuff up outright and give me something that sounds absolutely reasonable based on the name of the show but is actually entirely hallucinated.
Similarly, today I saw this [0] comment and in an attempt to help the user went to ChatGPT.
Prompt:
> User
I remember in the 2000s, there was some site using a decimal IP address (as a single number, not dotted quad) that had hacking/crypto puzzles. Something with a "Alice in Wonderland" theme. Does that ring a bell for you?
Response:
> I'm sorry, but as an AI language model, I don't have personal experiences or memories. However, based on your description, it's possible that you might be referring to the website known as "Project Wonderland." It was a website that featured a series of cryptographic puzzles based on the story of Alice in Wonderland, and it used a decimal IP address (203.208.46.147) as its URL. The website was active in the early 2000s and gained a cult following among enthusiasts of cryptography and puzzle-solving. However, the website is no longer active today.
I got really excited to have found an answer until through Google and the Wayback Machine I realised that ChatGPT just made this all up and no such website existed at that URL.
I tried starting a new chat with ChatGPT to ask it about this "Project Wonderland" website and it had no idea what I was talking about.
The important bit of context - that ChatGPT completely missed - is that the address was a single number like e.g. http://3520653040
The server might even have refused connection if the HTTP "Host: " header wasn't in that format, but as a human, rather than a language model, I'm not sure about that and might be misremembering :)
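For anyone unfamiliar with the single-number form: a dotted-quad IPv4 address is just a 32-bit integer written in base 256, so http://3520653040 is another spelling of 209.216.230.240. A quick Python sketch of the conversion:

```python
# Dotted quad <-> single 32-bit decimal number.
def quad_to_decimal(quad: str) -> int:
    a, b, c, d = (int(x) for x in quad.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def decimal_to_quad(n: int) -> str:
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(decimal_to_quad(3520653040))          # 209.216.230.240
print(quad_to_decimal("209.216.230.240"))   # 3520653040
```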
GPT-4 responds with Cicada 3301, which as best as I can tell is a valid solve for your query.
* 3301 is one of three numbers that had to be added to get the .com url
* The Wikipedia page cites someone close to the group with "follow the white rabbit"
* Years don't quite match up but given that you only asked if it rang a bell, that is fair enough
I have a lot of examples.
I just tried it out.
I asked
Who killed JR?
It correctly replied about JR in Dallas.
But then I said
No, I'm talking about JR in babylon 5
The response was
I apologize for the confusion. "JR" in Babylon 5 refers to the character John J. Sheridan, who was the captain of the titular space station during the later seasons of the show.
Sheridan was never referred to as JR in Babylon 5. There was no JR in Babylon 5. But because I asked, ChatGPT happily made things up.
I can. I've always struggled with the difference between polyptoton and antanaclasis. (Lucky for me, it doesn't come up very often!) I like what ChatGPT can do, and I've had some good luck with it, so I asked.
What I got back was confidently incorrect, and gave me examples that were neither of the two. (Interestingly, most of the examples were chiasmus instead.) I tried correcting it to get better responses, but it kept apologizing and then giving me the incorrect examples back.
This definitely sounds impressive, but could it be that you "learned" how to describe the symptoms over time in a way that makes it easier to arrive at the correct diagnosis?
It could also be that knowing the correct diagnosis changes your description to highlight things in a way that suggests the correct outcome. I believe doctors are also susceptible to that effect.
I'm not trying to imply that doctors should not do a better job diagnosing. People should not have to "learn" to find the right doctor, or how to operate them. It's a crying shame that people like you have to go on years-long journeys to get correct help, and I'm sorry that you and your wife had to go through that.
Just saying it might be slightly more apples-to-apples to compare ChatGPT's performance to the last couple of physicians you saw, and not the whole lot of them. But again, that's still a very favorable comparison for the non-human.
Whatever is the case today, within a few years the automated systems will clearly outperform doctors on most cases. Today we have systems that can hallucinate randomly. Since most of it is a black box, we cannot tell when it is hallucinating. This is solvable and just a matter of time.
Today we do not have feedback into the automated systems. If we start doing the following on a large scale: measure -> treat -> measure again -> adapt treatment, then the system will learn and will make connections that no doctor has ever made, because the minds of all doctors are not interconnected, at least not in a structural manner.
You can send it messages on WhatsApp, via SMS, and now through a ChatGPT plugin; it queries a vector database of health information (Pinecone) and responds with GPT-4 for higher quality results than default ChatGPT. The chatbot then prompts the person messaging to verify results with a real doctor and offers to connect them to a doctor that I'm working with.
I was doing fieldwork and would get sick in foreign countries where I didn't know the language. My friends would drag my food-poisoned body to a hospital where they inevitably would try to give me prescriptions that I wasn't familiar with and wanted to double check. I wanted to build for myself a WhatsApp bot that I could text while I travel to verify health information in low-bandwidth internet situations.
I shared it with some friends, and they shared the WhatsApp contact with other friends and family, and now it's being used by people in about 10 countries around the world, in several languages.
Would love any feedback if you try it out! The phone number for SMS/WhatsApp is +1 402.751.9396
Or link to WhatsApp if easier: https://wa.link/levyx9
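For anyone curious what a "Pinecone + GPT-4" pipeline like the one described above roughly looks like, here's a minimal retrieve-then-answer sketch. It assumes the 2023-era openai and pinecone-client Python packages (API details from memory, so treat the exact calls as approximate); the index name, keys, and prompt text are placeholders, not the actual service:

```python
# Sketch: embed the question, retrieve nearby passages, answer with GPT-4.
import openai
import pinecone

pinecone.init(api_key="YOUR_KEY", environment="YOUR_ENV")  # placeholders
index = pinecone.Index("health-docs")                      # placeholder name

def answer_health_question(question: str) -> str:
    # Embed the question and pull the closest reference passages.
    emb = openai.Embedding.create(
        model="text-embedding-ada-002", input=question
    )["data"][0]["embedding"]
    results = index.query(vector=emb, top_k=5, include_metadata=True)
    context = "\n\n".join(m["metadata"]["text"] for m in results["matches"])

    # Answer grounded in the retrieved passages, with the safety reminder.
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Answer using only the provided reference passages. "
                "Remind the user to verify with a real doctor and say this "
                "must not be used for emergencies."
            )},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return reply["choices"][0]["message"]["content"]
```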
I am sure you have thought about this, but curious how you are handling safeguards for crises that might require people to intervene, like rare conditions that nonetheless require medical attention or mental health problems that pose imminent risk of self harm.
It's a really serious issue, and we've tested many of those types of questions/messages, but you can test for yourself also if you want. We state that it definitely shouldn't be used for emergency situations, and the chatbot tries to provide medical information and recommend anything else to a healthcare professional.
But also it can be dangerous when you don't have access to medical information. The first friend that started testing the WhatsApp bot lives in the Sinai desert in Egypt where it's really hard to get to a clinic to ask questions. It's kind of similar in rural Nebraska where I grew up. We're taking things one step at a time and trying to provide the best services that we're able.
Our API if you want to test out the ChatGPT plugin that I built for yourself is here: https://chatgpt-api.doctor-chat.com/ -- but maximum 15 people can test.
I suffer from an as yet unknown chronic illness, and as everyone in my predicament knows, I have seen a staggering number of medical professionals, ranging from arrogant jerks who didn't listen and would not take you seriously, to highly empathetic and thorough people (who still failed to figure out what was wrong), passing by overzealous specialists who sent me down completely wrong paths with copious amounts of anxiety. Last month I felt particularly low and desperate and decided to dump everything (medical history, doctor notes, irregular lab results, symptoms) into GPT-4 and ask for help (I did a bit of prompt tuning to get professional-level responses).
It was mind blowing: it identified 2 possible explanations that were already on my radar, and 3 more that I had never considered, of which one seems very likely and I am currently getting tested for. It explained how each of those correlated with my symptoms and medical history, and asked why I had not had a specific marker tested (HLA-B27) that is commonly checked for this type of disease (and indeed, my doctor was equally stumped; he just thought that test had been done already and didn't double-check).
Bonus: I asked if the specific marker could be inferred from whole genome sequencing data (had my genome sequenced last year). He told me which tool I could use, helped me align my sequencing data to the correct reference genome expected by that tool, gave me step by step instructions on how to prepare the data and tool, and I'm now waiting for results of the last step (NGS data analysis is veery slow).
One of the defining characteristics of LLMs is that their knowledge bases are shockingly wide. Anyone who has played with GPT (or similar) has experienced that the width of its knowledge is beyond most if not all humans.
Medicine, and in particular diagnosis, is particularly difficult because of the breadth of knowledge required, given the span of possible diseases.
It completely makes sense that GPT, or similar, would simply be better than doctors at diagnosis in time, and it is very plausible that the time is now.
This is fantastic news for humanity. Humanity gets better diagnosis, and we don't put high-IQ people through the grinder that is medical school and residency to do a job which is not suited to human cognition.
This is not yet a victory, but progress is already good. To really trust AI in medicine, we need to conduct an incredible amount of research and tests. I don’t think that today you can blindly believe its diagnoses.
Sure. Went like this (I have seen so many doctors that I already wrote down my complete history in French to save time at new visits; I didn't bother translating the history, and it didn't seem to faze ChatGPT):
"""
Here is the patient history (in french):
• 2000-2010:
◦ ...
• 2013: ...
...
Additional Notes:
- <Some additional observations and comments - patterns I noticed, family history, etc.>
Patient is particularly worried about xxx and yyy. What are some possible causes explaining these symptoms and the overall history? Give detailed reasoning to support your hypotheses, include differential diagnosis, think about rare diseases (common causes have already been considered), consider possible combinations of diseases, and don't hesitate to ask follow-up questions to improve diagnosis. Please answer in English.
Additional test results that are outside of normal ranges:
- <list of abnormal results>
Try to consider and explain as many of the blood tests in your work up, and ask for any missing information if necessary.
"""
The key was to address it as if I was a doctor asking for an opinion. If I asked it "as a patient", I got much lower quality and dumbed down answers.
> arrogant jerks who didn't listen and would not take you seriously
Pride goeth before the fall. I wonder how many arrogant jerks will be humbled to see that they too are now inferior to a computer ("soon"). Humans will always be better at being human though, perhaps they will learn empathy is more important than they thought.
Humans will be better at being humans, for sure, but the jury is still out as to if humans prefer to interact with other humans given a sufficiently reliable alternative. Hell is other people, after all.
> Humans will always be better at being human though
What do you mean by that? If you mean humans will be always the best being the creature we call human then that goes by definition.
If you mean humans will be always more compassionate/emotionaly understanding/better suited to deliver bad news then I am afraid that is unsubstantiated.
I meant that a human is valuable just because they are a human; this is not cold truth, but it is a moral value almost everyone shares (and I don't want to imagine a future without this value). In days past some have become arrogant because, let's say, they know more about health and medicine than everyone else; it's their source of self-worth. They may soon have to reassess their value and values.
What you are calling empathy is just patterns of language statistically optimized to be convincing to the average person. And I might sound arrogant when I despair at being surrounded by morons who "buy it", but that is IMO still better than being a sociopath who enjoys it when others are easy to manipulate with pretty words.
I personally feel you're being overly optimistic. Genes, markers and all of that might seem like high tech medicine, but to the extent that I know there has not been much progress on that front, although it's been hyped a lot, and gets a lot of media coverage.
I don't think it has to be about being "high tech medicine".
In this case, the existing documentation of such things combined with the events in the GP's own medical history have been fed into a machine that can identify patterns that a human doctor should have, but for whatever reason has not, identified.
I think the potential ramifications for this are huge.
5 to 10 percent in the general population. If you already have many of the symptoms, but those symptoms are not specific enough to distinguish between similar autoimmune diseases, then the prior is widely different so a test like this becomes much more relevant.
When you say “He told me which tool I could use”, did you mean the doctor told you, or was that a typo and you meant that ChatGPT told you which tool and walked you through it? Seemed like the latter, but was too ambiguous to assume.
Yes, it; it was ChatGPT that told me. In my defense, my native language has no neuter nouns, and "neural network" and "language model" are both masculine, so they're a "he".
Brainstorming rare diseases and making diagnosis and providing treatment using medical science are different things.
If I ask GPT4 about some arcane math concept it’ll wax lyrical about how it has connections to 20 other areas of math. But it fails at simple arithmetic.
Proof based higher math and being good at calculating the answers to arithmetical formulas are two pretty unrelated things that just happen to both be called "math".
One of my better math professors in a very good pure math undergraduate program added 7 + 9 and got 15 during a lecture, that really doesn't say anything about his ability as a mathematician though.
That’s sorta my point: diagnosing well studied diseases and providing precise treatment is different from speculating causes for rare diseases.
Who knows, OP could be a paint sniffer and that’s their root issue. Brainstorming these things requires creativity and even hallucination. But that’s not what doctors do.
I thought all math was similar due to the ability to work with it requiring decent working memory. Both mental math and conceptually complex items from theory require excellent working memory, which is a function of IQ
Does it though? When allowing LLMs to use their outputs as a form of state they can very much succeed up to 14 digits with > 99.9% accuracy, and it goes up to 18 without deteriorating significantly [1].
That really isn't a good argument because you are asking it to do one-shot something that 99.999% of humans can't.
Try asking it to combine some simple formulas involving unit conversions. It does not do math. You can ask it questions that let it complete patterns more easily.
It does not have to do math in one shot, and neither can humans. The model needs only to decompose the problem to subcomponents and solve those. If it can do so recursively via the agents approach then by all means it can do it.
The cited paper covers this to some extent. Instead of asking the LLMs to do multiplication of large integers directly, they ask the LLM to break the task into 3-digit numbers, do the multiplications, add the carries, and then sum everything up. It does quite well.
When I ask a human to do 13 digit addition, 99.999% of them will do the addition in steps, and almost nobody will immediately blurt out an answer that is also correct without doing intermediate steps in their head. Addition requires carries, and we start from least to most significant and calculate with the carries. That is what 1-shot refers to.
If we allow LLMs to do the same instead of producing the output in a single textual response, then they will do just fine, according to the cited paper.
Average humans can do multiplication in 1 step for small numbers because they have memorized the tables. So can LLMs. Humans need multiple steps for addition, and so do LLMs.
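For concreteness, the decomposition described above is essentially base-1000 long multiplication; written as ordinary Python (the paper has the model verbalise these steps in text rather than run code), it looks roughly like this:

```python
# Sketch: multiply big integers while only ever handling 3-digit chunks,
# i.e. schoolbook multiplication in base 1000.
def split_base1000(n: int) -> list[int]:
    """Split n into 3-digit chunks, least significant first."""
    chunks = []
    while n:
        chunks.append(n % 1000)
        n //= 1000
    return chunks or [0]

def multiply_in_chunks(a: int, b: int) -> int:
    xs, ys = split_base1000(a), split_base1000(b)
    partial = [0] * (len(xs) + len(ys))
    for i, x in enumerate(xs):
        carry = 0
        for j, y in enumerate(ys):
            total = partial[i + j] + x * y + carry
            partial[i + j] = total % 1000   # keep the 3-digit chunk
            carry = total // 1000           # pass the rest along
        partial[i + len(ys)] += carry
    return sum(chunk * 1000**k for k, chunk in enumerate(partial))

assert multiply_in_chunks(123456789012, 987654321098) == 123456789012 * 987654321098
```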
Ok. In the context of AI, 1-shot generally means that the system was trained only on 1 example (or few examples).
Regarding of the number of steps it takes an LLM to get the right answer: isn't it more important that it gets the right answer, since LLMs are faster than humans anyway?
I am well aware what it means, and I used 1-shot for the same reason we humans say I gave it "a shot", meaning attempt.
LLMs get the right answer and do so faster than humans. The only real limitation here is the back and forth because of the chat interface and implementation. Ultimately, it all boils down to giving prompts that achieve the same thing as shown in the paper.
Furthermore, this is a weird boundary/goal-post: humans get stuff wrong all the time, and we created tools to make our lives easier. If we let LLMs use tools, they do even better.
If a search engine result says water is wet, they’ll tell you about it.
If not, then we should consider all the issues around water and wetness, but note that water is a great candidate for wetting things, though it is important to remember that it has severe limitations with respect to wetting things, and, at all costs some other alternatives should be considered, including list of paragraphs about tangential buzzwords such as buckets and watering cans go here.
Why does this apply to math but not to being a doctor? It can do basic math, but you say that of course it can't do math, because math isn't language. The fact that it can do some basic diagnosis does not mean it's good at doctor things, or even that it's better than WebMD.
Arithmetic requires a step-by-step execution of an algorithm. LLMs don't do that implicitly. What they do is vector adjacency search in absurdly high-dimensional space. This makes them good at giving you things related to what you wrote. But it's the opposite of executing arbitrary algorithms.
Or, look at it this way: the LLM doesn't have a "voice in its head" in any form other than a back-and-forth with you. If I gave you any arithmetic problem less trivial than the times table, you won't suddenly come up with the right answer - you'll do some sequence of steps in your head. If you let an LLM voice the steps, it gets better at procedural tasks too.
Despite the article, I don’t think it would be a good doctor.
I read a report of a doctor who tried it on his case files from the ER (I’m sure it was here on HN). It called some of the cases correctly, missed a few others, and would have killed one woman. I’m sure it has its place, but use a real doctor if your symptoms are in any way concerning.
> If I ask GPT4 about some arcane math concept it’ll wax lyrical about how it has connections to 20 other areas of math. But it fails at simple arithmetic.
The only reason failing at basic arithmetic indicates something when discussing a human is because you can reasonably expect any human to be first taught arithmetic in school. Otherwise, those things are hardly related. Now, LLMs don't go to school.
Most humans fail at doing simple arithmetic in their head. At the very least I'd say GPT4 is superior to 99% of people at mental math. And because it can explain its work step by step it's easy to find where the flaw in its reasoning is and fix it. GPT-4 is capable of self-correction with the right prompts in my experience.
Literally the majority of the page is basic arithmetic, mostly Bayes. Diagnosis is a process of determining (sometimes quantitatively, sometimes qualitatively) the relative incidences of different diseases and all the possible ways they can present. Could this be rare virus X, or is it common virus Y presenting atypically?
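To make the Bayes point concrete, here's the kind of arithmetic involved, with purely illustrative numbers of my own choosing (nothing from the study): even a positive result from a fairly good test for a rare condition leaves the posterior probability low.

    prevalence = 0.01           # prior: 1% of comparable patients have disease X
    sensitivity = 0.95          # P(positive | X)
    false_positive_rate = 0.10  # 1 - specificity = P(positive | not X)

    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    posterior = sensitivity * prevalence / p_positive
    print(f"P(X | positive) = {posterior:.1%}")  # about 8.8%

Which is why "could it be the rare thing?" usually loses to "it's the common thing presenting oddly" unless further evidence shifts the prior.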
Given that doctors are basically inefficient data analysts focused on a single domain, I imagine GPT can replace most of the need to consult a doctor until some physical action needs to be taken. I think an AI that monitors daily vitals and symptoms and reports to you anything that seems alarming might help people live longer and more healthy.
But that marker is not really indicative either way (you can have the disease without the marker, and you can carry the marker and be healthy). Find a good rheumatologist, who will usually send you to a good radiologist, and get some MRIs that identify it quickly and with certainty.
>you can have it without the marker and be healthy with it
That is literally true of every marker in existence. It's not the most specific marker, no, but if you already have a strong prior for the presence of autoimmune disease, then the presence or absence of that HLA subtype can point towards the most likely root cause (autoimmune diseases are all incredibly similar in the early stages).
One thing you really notice when interacting with ChatGPT for some time: It doesn’t get tired of your sh*t.
The thing feels human, until you ask it for more options 10 times in a row. Or ask it for a more concise version 5 times. Then it shows: It just doesn’t get tired of your S.
Here in Canada I would 100% rather deal with ChatGPT than a doctor, so long as ChatGPT also controlled the keys to the gates of the medical world (i.e. ability to make referrals).
Haha, I know, I was feeling the same way sometimes in other countries where I traveled. I was working on this exact problem for referrals! I built a chatbot on WhatsApp that, for now, connects you to a real doctor who can recommend local clinics if you don't already have one.
> [...] a roadmap for how to get a medical large language model-based system regulatory cleared to produce a differential diagnosis. It won’t be easy or for the faint-hearted, and it will take millions in capital and several years to get it built, tested and validated appropriately, but it is certainly not outside the realms of future possibility.
> [...]
> There is one big BUT in all this that we feel compelled to mention. Given the lengthy time to build, test, validate and gain regulatory approval, it is entirely possible that LLM technology will have moved on significantly by then, if the current pace of innovation is anything to go by, and this ultimately begs the question - is it even worth it if we are at risk of developing a redundant technology? Indeed, is providing a differential diagnosis to a clinician who will already have a good idea (and has available to them multiple other free resources) even a good business case?
Martin Shkreli's new project is kind of like this. They use an LLM that's fed with medical data to make a diagnosis. With the disclaimer that it's not actual medical advice of course. https://www.drgupta.ai/
I definitely want to hand over all of my medical records to a convicted criminal who has made his career in running ponzi schemes and abusing people with rare medical conditions.
Not sure why you put those things in quotes, that's kind of strange.
That aside, the training isn't blind, it's guided, and it's likely they use verified correct sources of info to train for some things, like medical diagnoses.
That was a very low bar anyway. US doctors are not known for their empathy. During our recent appointment with our fertility doctor, he goes and tells my wife, who has been on hormone therapy for a week, that they will be "squeezing water from a rock", lol. This is one of the top doctors in the world in that field, and he certainly knows what he's doing, but man, he has zero emotional awareness.
This isn't that surprising given how overworked and, I often think, underappreciated doctors are.
A concern with something like this though, is to what extent is ChatGPT just telling patients what they want to hear, as opposed to what they need to hear?
Responses to a person would still need to be gated by a medical professional.
I recall in a chat a person getting rather exasperated with a coworker and the person using ChatGPT to generate a friendly/business professional "I don't have time for this right now."
GPT could be used in a similar manner - "Here is the information that needs to be sent to the patient. Generate an email describing the following course of treatment: 1. ... Stress that the prescription needs to be taken twice a day."
The response will likely be more personal than the clinical (we even use that word as an adjective) response that a doctor is likely to give.
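As a rough sketch of what that gated workflow could look like (assuming the pre-1.0 openai Python package's chat completion endpoint; the prompt wording and model choice are my own, and the physician still reviews the draft before anything is sent):

    import openai

    openai.api_key = "sk-..."  # your key

    treatment_notes = (
        "Course of treatment: 1. Amoxicillin 500 mg. "
        "Stress that the prescription needs to be taken twice a day."
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Rewrite the physician's notes as a warm, clear email to the "
                        "patient. Do not add, remove, or change any medical facts."},
            {"role": "user", "content": treatment_notes},
        ],
    )

    draft = response.choices[0].message.content  # physician reviews, then sends
    print(draft)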
To be fair, it's not that hard. If I ever see my GP nowadays, I come armed with a stack of research sufficient to convince them to refer me to a specialist. GPs are generalists at the end of the day.
> ...the original full text of the question was put into a fresh chatbot session, in which the session was free of prior questions asked that could bias the results (version GPT-3.5, OpenAI), and the chatbot response was saved.
It seems like they just pasted the question in. For those who have asked it for medical advice, how did you frame your questions? Is there a prompt that will help ChatGPT get into a mode where it knows it is to provide medical advice? As an example, should it be prompted to ask follow up questions if it is uncertain?
An in-law had a minor tremor to their left hand, and whilst this wasn't pointed out to said in-law, it was noticed, and there were other small tells about what the problem might be, which ultimately led to a diagnosis and appropriate treatment.
Any kind of LLM and its frequency of hallucination means that LLMs are inappropriate as a solution in this scenario. An LLM is not a physician, it's an LLM.
You can make soup in an electric kettle, but it doesn't make it the right tool for the job, and comes with a lot of compromises.
Interesting, but the application would be weird: the physician would diagnose you, then read out an automated, AI-generated text-to-speech script to the patient?
One of my favorite physicians was my oncologist. I literally spent less time with him during the whole diagnosis and treatment period than I did with my dentist at the time. He was straight to the point, no empathetic BS, just a doctor with a diagnosis and a treatment plan to discuss. On the other side was me, an engineer with a problem to fix, talking to an expert with the right answers. That discussion took all of 15 minutes.
That guy would have failed against ChatGPT, and I loved the way he told me things. Anything else would have just driven me crazy, maybe to the point of looking for a different doctor.
So I guess what passes as good bedside manner for doctors largely depends on the patient. By the way, the dentist I have now is in the same category as my, luckily former, oncologist. A visit with him usually takes no more than 5 minutes if he's chatty, less if not. Up to 10 when treatment is required; anything longer than that is a different appointment.
The real communication skill for a physician is being able to flex the style, information content, and level of detail to the patient with whom they are meeting. The patient in this room is an engineer, the patient in the next room is elderly and has mild cognitive impairment, etc. As impressive as ChatGPT is in its domain, I don’t see it “reading the room” in this way anytime soon. And as a human who enjoys interacting face to face with other humans from time to time, I hope we keep it that way.
It also has to flex a bit with a diagnosis, "it's aggressive and terminal cancer, you should do up a will and enjoy your next couple of months", "it's broken, you need a cast" and "it's seems like nothing, take some paracetamol and come back if it's still the same in a week or gets worse" all arguably call for different communication styles.
You can give those models a system prompt, in which you tell the model how to act in general; it's also a very good place (imo) for background information and formatting instructions. 3.5 isn't great at following it, but 4 is.
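For the earlier question about how to frame medical questions: a minimal sketch of what such a system prompt might look like (the wording is my own guess at something reasonable, not a vetted prompt, and 3.5 may ignore parts of it):

    import openai

    openai.api_key = "sk-..."  # your key

    system_prompt = (
        "You are helping with general medical information, not diagnosis. "
        "Before answering, ask any follow-up questions you need (symptom duration, "
        "medications, relevant history). If you are uncertain, say so explicitly "
        "and recommend seeing a clinician."
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "I've had a dull headache for three days..."},
        ],
    )
    print(response.choices[0].message.content)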
It's probably not directly useful for physicians except as a teaching aid, but I have a friend who runs a small local business and she sometimes finds it difficult dealing with problem customers when she's tired or upset. As she interacts with them mostly via WhatsApp (up until the point that they purchase), the idea of having a bot write the replies for her has been floated. The LLM has infinite patience.
Good point but Bing Chat seems like an example of why OpenAI's mastering of RLHF has been so critical for them. Honestly the more time you spend with ChatGPT the more absurd fantasies of evil AI takeover look. The thing is pathologically mild mannered and well behaved.
Bing originally intentionally undid some of the RLHF guardrails with its system prompt, today it's actually more tame than normal ChatGPT and very aggressive about ending chats if it detects it's headed out of bounds (something ChatGPT can't offer with the current UI)
That's based on the moderation API, which only kicks in on severe content.
Bing on the other hand will end a conversation because you tried to correct it one too many times, or used the wrong tone, or even asked something a little too philosophical about AI.
They seem to be using it as a way to keep people from stuffing the context window to slowly get it away from its system prompt
This is why I think Google had the brains to steer clear of releasing a similar product; I think it was intentional. They’re not idiots, and they could probably see that using “AI” products behind the scenes is safer and easier than having people talk directly to a model that has to suit all customers' moods and personality types while not being creepy or vindictive, and while dealing with all the censorship and safety aspects.
It seems it gives more accurate and empathetic responses. I would guess the physician just needs to double-check that what the LLM says is medically correct and reasonable, and ask the right questions of the patient and the LLM.
" Doctors are not trained to respond empathetically to social media posts in public forums. As a cardiologist if I had to respond to such posts online I’d be more worried about liability and saying something w/o full understanding of the patient’s situation. Empathy would not be a high priority in my formulation of a response."
I think this is just propaganda. Empathy is a quale, meaning it's part of the hard problem of consciousness. Qualia are non-computable, so a computer can't display empathy. I'm sure it's good at inserting little sugary clichés and platitudes into each response, but to call it empathy is a real stretch.
Perhaps it's rated as more empathetic because it's more likely to tell people what they want to hear since the patient is leading the questions and not the other way around.
That's another issue: since the AI can't compute qualia, it can't discern whether the patient has psychosomatic symptoms, so it is more likely to give a false diagnosis.
Statements like this are very dangerous and detrimental.
I have personally tested ChatGPT on multiple different topics and found issues and errors across the range. Even more disturbing is that its explanation of why the "result" is what it is tends to be very confident bullshit. A person not familiar with the subject will probably blindly believe what the machine says.
That being said, in the field of medicine people will use ChatGPT as a poor man's doctor (especially inspired by studies like this), where wrong results coupled with confident BS could result in an increase in fatalities due to wrong self-medication.
I'm not at all impressed by GPT yet. Sure it's doing things we've never seen before, but the hype exceeds the actual utility.
It gets a lot of things wrong - and I mean a lot of critically important things. Can a physician be wrong? Yes, but I can sue the shit out of them for malpractice too. Whom do I sue for bad "parrot advice" when GPT goes off the rails?
Related question: If a driver kills someone and it's his fault, he will presumably get large fines or even go to jail. But if the same thing happens with a self-driving car, whose fault is it then? The owner of the car? The car company? It seems nobody is really at fault, mistakes are just bound to happen eventually, and a car company can't go to jail anyway.
The only level 3 system in production (that I'm aware of, at least) is Mercedes', and they have liability while the car is driving itself. It shifts back to the driver 10 seconds (IIRC) after the system notifies the driver he/she must take over.
That's completely inane. There's nobody home. The physician by definition wins on actual empathy by walkover, no matter how bad they are at it for a human.
Sad statement on the judgment of the respondents.
But an important reason it can turn out like this, I suppose, is also that the RL feedback gives the model a fairly effective general sense of which statements are liked by the Mechanical Turk-like evaluators. Most physicians have probably never had access to anything like that level of feedback on how their expressions are received. Maybe LLMs could be rigged to provide goodness gradients for actual physicians' statements?
> Sad statement on the judgment of the respondents.
Nope, it's a sad reflection of the study construction.
Physicians' empathy was evaluated from their 52-word responses on Reddit. Unsurprisingly, a chatbot optimised for politeness and waffle outperformed the responses of people volunteering answers in a different format, one optimised for brevity...
Exactly. I've spent ten years too disabled to leave my home and most doctors are just bullies who would rather insult you than consider that their initial evaluation of "it's just stress" might be wrong.