Towards accurate differential diagnosis with large language models (arxiv.org)
105 points by gwintrob on Dec 4, 2023 | 93 comments


This doesn't really surprise me given my experience with doctors. If it's a common problem then doctors do fine. As soon as you're not the common case or have something somewhat rare, it becomes a bit of a crap shoot, at least in the UK when you deal primarily with your GP (general practitioner).

I have a friend with Crohn's who was feeling low energy. I was a gym bro at the time and convinced him to take a testosterone test (because all problems are caused by low T when you're a gym bro).

His doctor wouldn't even entertain the idea, saying he's a young man and it's very unlikely that he'd have low T. He did the test privately and his T is significantly below normal. If you Google, there are actually many papers showing correlation between Crohn's and low T. I bet an AI would find it.

Similarly, doctors missed my mum's recent cancer diagnosis. She also had factors that would make her more susceptible to breast cancer, googling finds many papers that show causation.

The problem is that those things aren't extremely common and haven't made their way to NICE guidelines or whatever GPs use.

Not that I'm blaming doctors, they have 10 minute appointments and don't have time to do anything. I'm sure AI would recommend significantly more lab tests which would put even more pressure on the NHS.


Part of this is by design - from medical school, doctors are quite literally taught "when you hear hoof beats think horses, not zebras." The progression and flow of the decision trees used in diagnosis are built to find the most common ailments in the most efficient way. It sucks for people whose conditions are further down the tree, but generally it does what it was designed to do.

Hopefully technology like this can help us rethink those decision trees, or else crunch more patient-specific information to adjust the diagnosis process on an individual level more efficiently.


I dropped out of medical school fifteen years ago (entered another [non-tech] profession entirely), and there's one thing I always tell each of the following groups:

Said to physicians: "I dropped out and I TELL EVERYBODY I KNOW you cannot pay a doctor enough for all the unnecessary sacrifices they had to make, just to prescribe you an antibiotic."

...to non-physicians: "While I believe that you cannot pay physicians enough for their sacrifices, you should definitely investigate your own ailments, and seek multiple opinions whenever your `gut feeling` indicates `abnormal`. Physicians simply DO NOT HAVE TIME to give a fuck about your specifics."

More recently, I have included (to all) "it is probably ALREADY UNETHICAL TO NOT BE CONSULTING WITH LLMs during differential diagnoses."

Just my ¢¢


Not yet. Your faith in people not being led astray by the LLM is too high right now IMO.

I've seen way too many people who I consider generally intelligent, completely falling for the convincing bullshit. Mostly in my online communities, but also a few times in person.


I have hypertension, as well as diabetes. I wanted my cardiologist to test me for hyperaldosteronism, because I had read it was not uncommon and could be a factor in hypertension that continues to get worse and worse, without an obvious explanation. He refused, saying that it was the job of my endocrinologist to do that test.

So, I talked to my endocrinologist and he said it wasn't a normal test they do, but he was happy to do it if I wanted it. Well, the test came back positive. So they sent me in for a CAT scan of my adrenal glands to see if anything was obviously abnormal there. Which came back negative. So, now they want me to do a salt test to see how bad my hyperaldosteronism is, so that they can give me the right amount of the drug for that condition.

But the key thing that set me off was the original article, which said this is now recommended by cardiologists as a standard screening test. Maybe only 10% of patients have this problem, but it's common enough that it's still worthwhile screening for.

So, why didn't my cardiologist screen me for it in the first place? Why didn't my endocrinologist screen me for it without me having to explicitly request it?

I'm pissed.

I don't want to sue anyone for malpractice, or even mention that word to any of my doctors. But I do want to convince them that they really do need to screen for this condition.

Meanwhile, I'm going to continue to follow this diagnosis to see where it goes, and if they can help me get my blood pressure under better control.


You don't know how these doctors view you. They view you as a "non-expert" as one commenter put it. Basically you're stupid to them and your thoughts are pointless.

The only way to convince them is to sue. But there's nothing worth suing over here; you got the test. You can only sue if damage was done, in which case the doctor would be responsible for blocking a test you explicitly requested.


I need to find a doctor who can be a more active partner in helping me diagnose and address my problems.

If you have any suggestions in the Austin area, please let me know.


Best bet would be to make a lot of money and get a concierge doctor, haha. Otherwise you're what, probably a ten-minute-a-month appointment to them, and whatever happens to you, they can add it to their statistics.


It seems they usually test for hyperaldosteronism if they see the obvious signs like low potassium levels. Maybe they did not observe this at the beginning?


My endocrinologist did mention that my potassium levels had tested out as normal. That is certainly true.


Like almost every comment of this kind on this site, you are missing the wider socioprofessional context. Yeah sure, you may find papers about plenty of things that may affect you without your doc's knowledge, perhaps even guidelines that are to be followed.

The problem is that respecting every detail of every guideline all the time is pretty much impossible, and even a debatable stance. There are many committees producing guidelines left and right, because those people's jobs are to produce guidelines, not to care for people. The exact same goes for research papers, and their validity for patients in practice is a difficult issue.

So, docs all practice mostly from habit and experience. It's a craft supported by science, but clinical medicine itself is a science about as much as bricklaying is a science. But that doesn't make AI the use-all-do-all, because if everybody starts to test for every little thing, diagnostic probabilities will decrease massively and guidelines would have to be adjusted in consequence. It's a feedback loop. And also, you'd pay 5x the price for healthcare, so not realistic.


> But that doesn't make AI the use-all-do-all, because if everybody starts to test for every little thing, diagnostic probabilities will decrease massively and guidelines would have to be adjusted in consequence. It's a feedback loop. And also, you'd pay 5x the price for healthcare, so not realistic.

It's hard for me to write out all my thoughts on the subject, but I completely agree with this. The paper cherry-picked difficult cases and compared how AI deals with them vs regular clinicians. I think that is something AI will excel at because there is no cost to being "wrong", as long as something it spews out is correct.

In difficult-to-diagnose cases, that kind of makes sense. Find the answer at "any cost". But you cannot apply that to every case. If you let the AI loose on fairly simple cases, I bet it would overcomplicate many simple things, because it couldn't handle the nuance of people coming in and just whining about a regular cold. "Patient comes in with a headache, might be brain cancer but might be a cold" ~ type DDx.


[flagged]


> statements like this literally set me off. The patient being right and the doctor ignoring the patient and treating the patient like an idiot

There's a huge number of people who vastly overestimate their medical knowledge, google some symptoms (I'm sure you know the pitfalls of that but many don't), do self-diagnosis as a hobby with friends over coffee, come to absurd conclusions and then bother their doctors about it.

It's not ideal but somewhat understandable that under time pressure experts want to know the object level (symptoms, history) and don't want to waste time listening to the speculation of non-experts, even if that speculation is sometimes right, and even if some (the doctor can't tell who) of those non-experts actually make good points.


Given the amount paid by the patient to the doctor, the doctor is honor bound to listen to the patient, to consider what he said. This is not a trivial amount of money that the patient is forking over to the doctor.

It's understandable from the standpoint of irritation: if the doctor is presented with the same crackpot theories all the time, then it's understandable how he would come to ignore such theories.

It's not understandable from a moral standpoint. If the doctor isn't delivering to the patient what equates to the thousands of dollars the patient forked over to him, then the doctor needs to be paid the amount he deserves.

When a drunk driver kills someone by accident he is honor bound by morality to be responsible for the death. Being drunk is no excuse. Same with a doctor. If he claims he's an expert and is paid thousands of dollars for a checkup, he is responsible for being right, and if being right means listening to every single theory from the patient, then that's what he's morally obligated to do. Not wanting to "waste time" because of "pressure" is not an excuse. Ever. Lives are at stake.


If the doctor was free, would any of these things change?


A patient isn't just damaged by bad treatment. He's damaged by medical costs. He's damaged by the very thing that makes doctors rich. And often given incompetence in return. Bled dry by "experts" holding their lives hostage.

https://www.cnn.com/2023/01/31/health/us-health-care-spendin...

It's disgusting.

Oh and check this:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139931/#:~:tex....

It's not just incompetence. Trustworthiness is also an issue.


You can make the right call but still be wrong sometimes.

There's a cost to tests of rare diseases.


Then let the person who is wrong foot the cost. The cost isn't just the cost of the test. It's the cost of what occurred because the doctor decided to block a test.


No one is preventing you from going to a private clinic and paying for the test yourself. The doctor is not 'blocking' the test or preventing you from getting it, they are just not authorising it. Sounds like you have a problem with either your insurance provider or your wallet.


Or the jurisdiction in which you live, where labs are not permitted to test without a doctor’s order? This is very common.


https://www.cdph.ca.gov/Programs/OSPHLD/LFS/CDPH%20Document%...

This is a cartel-based law. Basically, by law, patients can't order tests or even look at the results of tests.

The "made up" reasoning for why this law exists is that lawmakers acquiesced to doctors' view that all patients are fucking idiots. That's the bullshit reason.

But really it's business reasons. Same reason the doctor supply is kept artificially low; all these things are lobbied into law by the AMA to lower the supply of doctors and give them more control.

>Sounds like you have a problem with either your insurance provider or your wallet.

If you're a doctor, my problem is you. How does a doctor even have the gall to say this? It's flat out wrong. So if you're a doctor, there are only two logical possibilities for why you said what you said: 1. You're a liar and are therefore untrustworthy. 2. You don't know what you're talking about and are therefore incompetent. Which is it?


I am a doctor, and I encourage you to learn about how diagnostic test performance is modelled, and the consequences linked to overtesting and testing without contextual knowledge. It may change your perspective and lead you towards less extreme opinions.


Are you a US doctor? I urge you to explain why you charge so much yet have the poorest performance of all first-world countries. It may change your perspective on treating patients like idiots when really the US doctor is the epitome of fraudulent incompetence.

Overtesting is a problem. The way to lessen overtesting is to not be an incompetent doctor, so that patients don't take things into their own hands. Barring that, you have no choice but to let the patient test himself. It's the moral thing to do.

I guess here's an interim solution. Even if the disease is extremely unlikely, if the possibility remains valid based on the given symptoms and the patient requests the test, you should allow it. The reason is that the patient is more intimately familiar with his own symptoms, and you're just pattern matching based on a high-level probability. If you've never seen a patient with a certain condition you often bias towards the conditions you've seen.

Yes patients can be biased but so can you. And there's no denying that patients can do more detailed research and they have a perspective with a resolution you will never achieve. The good doctor needs to listen to them.


I have worked in both the US and Europe. I am back in Europe now, as I vastly prefer it over the US.

> Even if the disease is extremely unlikely, if the possibility remains valid based on the given symptoms and the patient requests the test you should allow it.

Precisely not. That's exactly how you get unusable results with super expensive healthcare, because the follow up that a positive test dictates is often worse than the initial problem, and can impact the population at large (e.g. antibiotics use). All that in a context where the post test probability is very low. There are situations where this does not apply, because we have good exploratory AND confirmatory tests, but that's far from being the norm.

Regarding your other points, sure we should always listen to what the patient has to say. There even is a saying about it: "90% of diagnoses are made on personal history". That's because diagnostic tests often are far less useful than you'd expect. Again, I encourage you to learn about how medical testing works. You say you root for AI to replace docs, but apart from not being human anymore, I doubt you'll find your robot doc to be so different regarding your particular situation.

All that aside, I'm not rich. I chose to stay in Europe where I earn less because both US doctors and patients are so toxic. You are a prime example of this. In Europe, we test much less and overall both docs and patients are far less aggressive. Yet healthcare here is cheaper, universal and IMO better overall. Everybody profits from more relaxed doc-patient relationships. US Healthcare is first and foremost a cultural issue, and US patients are part of the problem as much as docs are.


>All that aside, I'm not rich. I chose to stay in Europe where I earn less because both US doctors and patients are so toxic. You are a prime example of this.

I'm not gonna deny this. A lot of people in the US hate doctors. Part of it is the ineffectiveness of the health care and the other part is the price. We aren't getting what we pay for and we also don't trust the doctors. So there's both a lack of trust and a lack of respect. I enter any doctor's office the same way I enter into a hardcore business deal, which is that I anticipate the doctor is not in it to care for me; he's in it to get the most money out of me possible without breaking the law.


Also, the other part of this that's fishy: why can't the test results be reported to the patient? Your reasoning makes sense for overtesting. But does it make sense to censor the results of a doctor-prescribed test from the patient? No, it doesn't. The law is made to make the doctor central to treatment. The patient must schedule another appointment and must go through the doctor in order to treat himself.

There's too much business incentive here. You label me as toxic but there's a reason why patients in the US have such a vicious attitude toward docs.


Many people feel the way you do, and it's understandable. Those laws vary from country to country, but variations of the same are found throughout western countries. It's in part a remnant of the paternalistic medicine of olden times, when docs were supposed to decide for patients. But the main goal today is to avoid patients learning about life-changing news (AIDS, cancer, etc.) without being supported emotionally and informed about what the results imply, and to avoid rash or violent reactions (suicide) due to misunderstandings. I personally experienced grave misunderstandings over news that seemed trivial to me, but that the patient took as extremely aggravating. Of course this rule is very profitable as well for private care providers (which I am not, BTW). At this point, whether or not the rule is justified becomes a political issue, and I don't think there's a solution that would satisfy everybody.


Well, how about letting the patient choose how it's reported? And even letting the patient be aware that the results might be life-threatening. This will satisfy everybody.

Most tests aren't for things like HIV. So your reasoning makes zero sense, as this law is applied to every test. And the whole suicide thing is obviously a minuscule issue. You're just making that crap up to sow controversy. Yeah, I'm sure you encountered dozens of patients ready to kill themselves over test results, but only your kind words were able to completely assuage them and stop them... Yeah, sounds like a load of bullshit to me.

There's no controversy here. It's a business incentive through and through.

Patient choice would satisfy all patients. Only doctors wouldn't be satisfied. And that's the problem.

Patients are dealing with their lives here. And even you, as a third party, are painting the situation as "controversy" because you've been so desensitized to the patients you're supposed to take care of.

Even though you're not in private care you are part of the problem because you're blind to the problem. I know multiple patients who had surgery in America only to go to another country for a check up for the doctor in the other country to say the surgery was completely pointless.

You know how serious that is? I don't think you get it. Painting a crime as a "controversial" or "political" issue is extremely aggravating for the person whose life is at stake. The only reason you characterize it this way is that you're not the one whose life is at stake.

Patients aren't toxic. Patients are desperate. And most doctors including you just don't give enough of a shit. I'm sorry but that's the truth.


For the record, I'm not in favor of such rules. Yes, you're really toxic, and display the typical US reaction. I'm fucking relieved that I don't have to deal with US egos and all this US political shit anymore. Peace out, and take care.


Like every other patient my life and my money are more important than anything else. It has nothing to do with ego.

I can't respect a doctor who doesn't understand that and thinks it's just toxic. You're lucky to be practicing in Europe.


They suck the *insurers* dry?

Insurance is one of the major problems in the US, and they're a parasite on all of the US society, so this line of thinking is completely backwards.

However, the idea that physicians should make less is reasonable, in order to get people who are more interested in the actual medicine rather than the money-seeking people often found now. Getting rid of insurance and the structures built around it essentially solves this issue.

Unfortunately, it's unlikely to occur, so sidestepping everything using capitalism is the only way forward I can see. One good example is making diagnostic tools in the home more commonplace. It's pretty typical to be able to purchase an entire machine for the price you would be charged for a single test at a physician's office.

For instance, a good 12-lead ECG may cost about $1300 for your house, but it will also cost that much to get one test done at the physician's office. So it's clear that buying one for the house is the way to go, as you can do it every day for years for the price of one time through 'the system'. It could also be much cheaper if every household were targeted, through economies of scale.

This is true for essentially all diagnostic tools. So it's about time people just expand their first aid kits to involve some of these simple diagnostic tools, like urinalysis kits, etc.

The interpretation is of course valuable, as it takes a little while to understand the output of different tools, but much of this can be automated and sold with the tool, or plugged into a FOSS solution.

Perhaps it's not ideal to some, but it's the easiest to occur in capitalistic settings and where insurance keeps costs at institutions so high.


Insurance is parasitic, but they are the symptom, not the root of the problem. They suck insurance dry, so insurance has to suck you dry. The root is the low supply of doctors: hence unhealthy competition, and hence doctors and the whole industry can charge exorbitant prices. They rip off insurance, so insurance in turn rips you off. Make no mistake, insurance is ripping you off, but there's healthy competition in this area. One insurance company would undercut another if it could... but it can't. Because the root of the problem is the cost of care itself.

Of course the reason there's a low supply of doctors is a deliberate business maneuver by the AMA to lower the number of doctors by lowering the number of medical schools. In short, doctors are the reason there's a low supply of doctors.

The problem isn't purely capitalistic. It's a hybrid of capitalism and government that causes this problem. When doctors, aka the AMA, are able to influence law to give themselves an unfair advantage, that's what is responsible for the current state of things.

In fact, such hybrids are responsible for most corruption. You have to keep a clear separation between business and government for a healthy free-market economy: business cannot influence government, but government can influence business. The incentives of the two parties are misaligned; one is profit, the other is honor, thus they cannot mix.


If it's not already obvious, LLMs are going to be doing most of the mental work currently performed by doctors, lawyers, accountants, etc.

I have already nearly stopped using Google search for anything, in favor of GPT-4.

GPT-4 has helped me very quickly prototype things that I normally would have had to spend hours researching.

GPT-4 has also created custom curriculum for me to help me learn various things for which I have struggled to find good books/tutorials online.

There will always be many areas in which a solid human intellect and well-honed human judgment are still useful, but much of the less critical work will yield to LLMs.

If we call the difference between GPT-3 and GPT-4 1x, then I would expect to see a 2x-5x improvement in LLM capability within the next few years, just based on how much great work has recently gone into shrinking really big models so they can run on smaller hardware.

LLMs are not digital human minds, they are simply very good information synthesizers. Information synthesis happens to be what most white collar professionals get paid to do with their brains.


That is not obvious, and at least in the case of physicians is unlikely to come to pass. They don't just take a written list of signs and symptoms as input, and produce a diagnosis as output. Often the evidence they use to make a diagnosis isn't even written down anywhere. And for most primary care cases, the diagnosis is only a minor part of the workflow. The hard part is in working with the patient to make a practical treatment plan, and then course correcting as needed based on results. This is as much art as science, and relies on tacit knowledge that no LLM can access.

In order to automate most medical care we'll need multiple breakthroughs that go way beyond LLMs. Linear improvements in LLMs won't get us there.

LLMs do have potential to improve other areas of clinical practice such as charting.


> If we call the difference between GPT-3 and GPT-4 1x, then I would expect to see a 2x-5x improvement in LLM capability within the next few years

It's interesting: this is probably the assumption held by many people, and it's driving the current hype cycle.

But in reality that's not how progress tends to happen; it's a much more punctuated-equilibrium type of phenomenon. My guess is it's actually quite likely that most of the progress will be made on maturing things while the underlying capability plateaus.


I’m very glad you set the argument this way, since it very neatly sets the stage for the core disagreement to be argued.

> Simply very good information synthesizers

> Simply very good syntactic synthesizers

I argue that theorists tend to adhere to the former, and practitioners the latter.

Consider prototyping: For an expert, it costs nothing (or near nothing) to check the output. You even start better, because you know how to ask the right questions to the tool in the first place.

LLMs don't get the semantics right, they get syntactic correlations right. When a subject is commonplace (in the literature), the correlations are good enough to get the reasoning right.

There is an IMPRESSIVE amount that gets done with just this. However, expecting to get reasoning right is the bridge too far.

Wrong expectations of tools result in badly managed projects and failures.

You can test this out right now. IF an LLM is able to do information synthesis, it is trivial to set up parallel prompts to work as teams. Try it out.

I did, and as a result I know why it can't work. The simple question to ask is "what's your error rate, and what's your hallucination rate?"

LLMs are always ‘hallucinating’.

I do hope, that the difference eventually becomes academic. That there is so much training material, that correlation is equal to reasoning. However, it will not be reasoning/semantic prediction.

Finally - the real world work has properties that are emergent. You can read every spec sheet you like, but when you start assembling components together, they are going to do weird things.


> IF an LLM is able to do information synthesis, it is trivial to set up parallel prompts to work as teams. Try it out.

What do you mean? I've had GPT-4 (3.5 too, I think) talking to itself, set up as a planner and a critic of the plan; the end result is much better.
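Roughly, the loop I used looks like this. A minimal Python sketch, not production code; the OpenAI chat client is real, but the "gpt-4" model name, the prompts, and the helper names (call_llm, plan_and_critique) are just placeholders for whatever you use:

    from openai import OpenAI  # pip install openai (v1 client)

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def call_llm(system: str, user: str) -> str:
        # One chat-completion call with a role-setting system prompt.
        resp = client.chat.completions.create(
            model="gpt-4",  # placeholder model name
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

    def plan_and_critique(task: str, rounds: int = 2) -> str:
        # Planner drafts, critic pokes holes, planner revises; repeat a few rounds.
        plan = call_llm("You are a planner. Produce a concrete, numbered plan.", task)
        for _ in range(rounds):
            critique = call_llm(
                "You are a critic. List gaps, risks, and missing steps in the plan.",
                f"Task:\n{task}\n\nPlan:\n{plan}",
            )
            plan = call_llm(
                "You are a planner. Revise the plan to address the critique.",
                f"Task:\n{task}\n\nPlan:\n{plan}\n\nCritique:\n{critique}",
            )
        return plan

    print(plan_and_critique("Plan a weekend project to build a personal site."))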

> However, expecting to get reasoning right is the bridge too far.

Hmm, how do you position othello-gpt in this? It builds a world model and makes moves based on that, so it's not just correlation of input (x,y,z typically followed by A so respond A).


1) Set up multiple prompts to work together and make a web page.

See how far that goes, before it’s nonsense talking to nonsense.

2) Production is proof. Get a working LLM-based reasoning system that doesn't need its hand held. The process of failing is educational enough.

For the record, I would love it if these things work.


Interesting comment.

I think it is clear that GPT-4 contains a LOT of information. You can ask it explicit factual questions and it often gets the right answer. When it gets the answer wrong, its wording is typically syntactically correct English, or syntactically correct code.

I'd argue that even though some of the errors it commits are "semantic errors", such as calling a method by an incorrect (but often similar) name in a segment of code, or printing a false statement in well-crafted prose, it nonetheless gets a lot of the semantics right.

What is reasoning besides a set of language patterns that we define as valid reasoning? Imagine evaluating statements in various formal logics. Nonsense in one can be valid in another based on semantic rules alone.

One could derive the model (model as in model-theoretic semantics) of a formal logical system by sampling a list of valid and invalid statements.

LLMs are doing that kind of thing, it seems. There are gaps, but they are not necessarily gaps that the LLM itself cannot notice.

For example, I will often ask GPT-4 to formulate a plan for something or to create a list of priorities/considerations for an undertaking. I will then ask it to draft an initial plan. After that I will ask it to review/critique its draft based on the initial goals. It typically points out exactly the kinds of gaps that a human would point to as deficiencies that indicate the LLM is not reasoning.

In my view, this indicates that the "knowledge" of how to do the task was always available to the LLM, but the interface (or some aspect of the internal implementation) did not allow the knowledge to be applied all at once. This is not necessarily dissimilar from human intellectual work, in which drafts and self-critiquing are not an unreasonable series of steps.

I have done some work with parallel prompts and various "roles" for different LLM interlocutors toward the same task. While it does sometimes go off the rails, it seems clear that multiple prompts with role-based instructions do achieve a greater level of analytical rigor than a single prompt.


I have to admit, that discussing this topic is forcing a precision in the vocabulary used to discuss reason.

However, it is a TEDIOUS process. It’s simplest to just make something - the parallel prompts scenario for example.

Can you leave your LLMs to their own business, and will they have a workable product at the end of it?

As you said it would seem as if they have the “knowledge” of the task. It should not be an issue.

However, you will not get anywhere. The issue isn't in the LLM; the issue is in misunderstanding what is going on, and therefore in expectations.

LLMs don't notice things in the human manner you assumed they do. It isn't pointing out gaps, it's repeating text patterns.

Our habit of dealing with humans is filling in these gaps and supporting an assumption of ‘noticing’ or ‘improvement’.

I think it is hard for us to consider changes in text, without seeing the changes in meaning as well.

With LLMs you have to accept that it’s not seeing meaning, it’s just seeing correlation.


> LLMs don't notice things in the human manner you assumed they do. It isn't pointing out gaps, it's repeating text patterns. Our habit of dealing with humans is filling in these gaps and supporting an assumption of ‘noticing’ or ‘improvement’.

True. It is important to avoid anthropomorphizing LLMs, etc. I do this intentionally now and then but I agree it is dangerous to do it accidentally.

> I think it is hard for us to consider changes in text, without seeing the changes in meaning as well.

True, but since LLMs were trained on text sequences that had meaning, much of the meaning was accidentally embodied in the resulting model. Areas where LLMs seem to reason well happen to be the areas where the training data was sufficiently generalized and the language tokens are used similarly enough, such as recipes.

> With LLMs you have to accept that it’s not seeing meaning, it’s just seeing correlation.

True. I think it is interesting how much it often feels like knowledge.

I think we also have to be careful not to overly glorify human "knowledge" as something other than producing a pattern of output signals in response to a pattern of input signals.


I agree. I think precision in all cases makes it possible to pursue the technology, as opposed to almost dabbling in superstition and philosophy.

However, it’s taken an inordinate amount of time to achieve even the finesse shared in this comment chain. It’s a challenging topic.

We have to carve out levels of utility for systems that generate output given inputs: some way to distinguish between what our wetware achieves and what LLMs achieve, without putting our models on a pedestal.


> If it's not already obvious, LLMs are going to be doing most of the mental work currently performed by doctors, lawyers, accountants, etc.

While I don't know you personally and I'm not talking about you specifically, "just" feeling very excited about something doesn't Make This About You; the prognostications don't actually help you Be a Part of This.

This is the same energy as being really into COVID-19 (what was it called, "corona-scrolling?"), the gambling-fueled crypto boom, the retail stock trading booms. It's kind of adjacent to the fallacy Feelings that are More Strongly Held Are More Valid.

It's like you could use these things as a barometer for the health of a social media channel.

Anyway, I'm sure there's a name for this part of the hype cycle - tapping into how people want to be a part of the number one biggest, hottest trend by, at the very least, talking about it all the time, trying to compete against each other by sucking the most air out of all rooms through ever-greater hyperbole.

One thing's for sure: there's some guy who found some massive low hanging fruit in some neural network designs by carefully choosing brackets in chained matrix multiplications. That's one guy, and he definitely Made It About Him in a way that makes sense. How many people on Earth have enough knowledge to see things like that, maybe 1,000-10,000? It's so exclusive, in a sense.

I guess my point is, you're pretty far off the mark. In the interest of curiosity: Standard practice in medicine is to examine the patient before giving an opinion, a big roadblock for even the most competent of multi-modal models. Also, most people are biased in that they are intimately familiar with their own lived health problems, so they fill in tons of blanks, discounting the value of inquiry during a consult, whereas doctors may see you for less than a few hours a year, LLMs much less so.


I strongly doubt that the LLM club will remain exclusive for very long, or that the innovation in this space was restricted to 10k people.

At a minimum, I've already begun using GPT-4 as a consultant for my father's medical issues. I can reasonably get confirmation or disconfirmation of whether a medical expert's opinion is BS or not. As anyone who has dealt with the medical system in the US can attest, the majority of opinions you get from experts are either BS, outdated, or folks talking their own expertise.

Asking the question to GPT-4 can help you decide when to talk to a different doctor, or try something new entirely.


This adds a whole new dimension to AI safety: it's one thing to dumb a model down so it can't do bad stuff, but when you do that you're making it give worse medical advice to people who are going to take that advice.


It's possible most publicly available models (or at least "models the general public uses") will eventually refuse to give medical advice


Apologies, I may misunderstand: there's a lot of Novel All Caps Concepts Even Though Caps Is For Well Worn Cliches. I Think Maybe You Think They're For Accusations? Insults? Big Concepts? For Handwaving At Big Concepts You've Divined Other People Are Thinking?

The last paragraph caught my attention though you were a bit abstract _and_ obtuse, due to the structure being "brief aside for the curious: LLMs don't have bodies"

My job and existence are centered around enabling doctors via LLMs, and it absolutely is astounding to and for them. Funnily enough, it's the simplest thing: citations and an easy UI to reach them. They perceive it as quick literature review on demand; as far as I can tell, that means they get to skip about 5 minutes across 4-8 articles checking links for relevancy.

I don't think OP was proposing that LLMs will replace doctors as you seem to think. I can't identify anything in their post suggesting that.


I have heard it described as a form of Pascal's wager. If you shout from the rooftops that LLMs are about to spawn a god machine, you get rewarded with attention, and if you're a researcher, job opportunities. There isn't nearly as much room for critics right now.

And if a few years down the line it turns out that you were wrong, everyone else was too, so no big deal.

In fact, you can already see it happening. Scaling up LLMs was surely enough to get to AGI, but now that apparently OpenAI is working on Q*, it's obvious that LLMs alone aren't enough, but LLMs + Q* are!


This is sort of like reading a 5-year-old saying hamburgers are supposed to be good but adults are suppressing the truth: the top-secret new Big Mac sauce is required to make them edible, not to mention a bun.


Q* isn’t separate from LLMs. Q* supposedly gets you a shortcut to AGI using LLMs of current scale.


My GP starts with phone or video consultations 100% of the time. There is no option to see them in person without a referral after a phone or video consult. (I chose them because of the video focus)

Will a good LLM based "consultation" fairly often have to refer you to a face to face consultation? Probably, but of my consultations over the last few years, a significant majority were resolved purely remotely.


The real mental work is when your client, patient, etc. struggles to articulate the problems, data is wrong or mislabeled, etc.


Call me when an LLM can ingest a pts medical record from 3 different sources in multiple formats ranging from structured data to free form text stored (I shit you not) as TIFF files, extract the actual information, and present it for review prior to a visit

and for a follow up visit, read the prior visit's note and orders, search/track down all the test results (may be at more than one institution with multiple EHR systems that don't interchange) and summarize

or even just be able to take verbal orders like it used to be: "get a CBC, CMP, TSH, and CXR"

The NEJM CPC's are after the hard work has already been done to get the data to present the case


> Call me when an LLM can ingest a pts medical record from 3 different sources in multiple formats ranging from structured data to free form text stored (I shit you not) as TIFF files, extract the actual information, and present it for review prior to a visit

LLMs were born to do this.

> or even just be able to take verbal orders like it used to be: "get a CBC, CMP, TSH, and CXR"

GPT-4 of course knew what that meant, although I didn't. I told it to roleplay a potential follow-up:

> "Patient with fatigue, weight loss, mild fever. CBC shows anemia, elevated WBC. CMP reveals elevated liver enzymes. TSH normal. CXR clear. Consider ESR, CRP, ANA, and abdominal ultrasound. Possible infection or autoimmune condition. Referral to a hematologist may be warranted."


I know an LLM can do this when given the data in an ingestible form

it's the taking of data from disparate sources including literal paper faxes that's the hard part


You're being dogmatic about what seem, to me, to be your objections to LLM skills.

Making data ingestible is just laborious and unappetising to an engineer. So hand it over to a data entry operator who’ll do this happily for a wage.

I’m not sure why the issue of making data available to an LLM is a failure of the LLM.

Btw, OpenAI’s LLM can very well parse images, read handwriting, fix sloppy spellings, and extract objects from unstructured data.


There's no need for paper faxes - fax is a digital format. But of course, if you intentionally and pointlessly output to formats that make input to an LLM more work, it will be hard.

Other than that, dealing with disparate data formats is something LLMs can already do very well, and that can be improved fairly significantly by putting a "straitjacket" on the model (making it use APIs, or selecting only tokens that fit a given grammar).
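As a rough sketch of that "straitjacket" idea (the poor man's version: validate the output and retry, rather than true token-level constrained decoding). The schema fields and the call_llm helper here are made up for illustration; plug in any chat-completion function that returns a string:

    import json
    from jsonschema import validate, ValidationError  # pip install jsonschema

    # Illustrative schema only - real record formats would be richer.
    RESULT_SCHEMA = {
        "type": "object",
        "properties": {
            "test_name": {"type": "string"},
            "value": {"type": "string"},
            "collected_date": {"type": "string"},
        },
        "required": ["test_name", "value"],
    }

    def extract_result(raw_text: str, call_llm, max_attempts: int = 3) -> dict:
        """Coerce free-form record text into the schema; retry when the model drifts."""
        prompt = (
            "Extract the lab result below as JSON with keys test_name, value, "
            "collected_date. Output JSON only, no commentary.\n\n" + raw_text
        )
        for _ in range(max_attempts):
            reply = call_llm(prompt)  # any chat-completion function
            try:
                result = json.loads(reply)
                validate(instance=result, schema=RESULT_SCHEMA)
                return result
            except (json.JSONDecodeError, ValidationError):
                continue  # output broke the "grammar"; ask again
        raise ValueError("model output never matched the schema")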


I mean, have you used GPT-4? Free form text is its bread and butter, and yes, it can absolutely ingest images and read text in them, and combine that with other information from multiple sources in structured or unstructured text or image form, no problem!

What you are describing sounded decades away last year. Today it's not something you can rely on quite yet if you need accuracy. But if you still believe it's decades away then you need to update your perception of what's going on in AI right now. Forget the free GPT-3.5 or Bard, you really need to try GPT-4 to understand what's about to happen.


This reminds me of what happened in self-driving. It went from "decades away," to "it works but you can't quite rely on it for accuracy" in a year and then it stopped cold for some reason.


Useful Full-Self-driving is arguably harder than "useful automated medicine" in two ways. First, you need insane reliability to match human performance. Much higher reliability than doctors even. Second, it's not "just" information processing somewhere in the cloud. It has to run in real time inside a car. Both are very hard in that the conditions are very context dependent and vary a lot (a.k.a. the real world is messy).


> and then it stopped cold for some reason

By stopped cold you mean "there are actual real world self driving cars delivering actual real world passengers in select locations"?


Because it kept crushing people's knees.


see my comment above

Pt sees Dr in office A, has labs and scans ordered. Labs order is printed out and given to pt, who takes it to lab A. Scan #1 is ordered, and order faxed to facility B, who independently calls, schedules, performs, and interprets scan #1, same for scan #2, but at a different facility. Office A has no electronic link to any of these facilities

How does LLM get the data?

A human knows the patterns in the area, calls around for the results, then collates them and feeds them into office A's EHR, where, if you're lucky, the LLM can do its thing

The LLM is only saving the bare minimum of the work

In an industry that is still reliant on the fax, I'm not holding my breath for the advent of the LLM savior


In the situation you describe humans fail constantly. My experience with the medical system has decidedly not been that magically competent back office people call every office you've ever visited to assemble your complete medical history from scraps and effectively organize it for the doctor. It's a constant game of telephone where you have to do most of the work yourself, calling around and giving instructions to receptionists to give to other people to get this stuff done. They can't even communicate reliably between the nurses and doctors in the same damn office to avoid asking the same questions multiple times every time you visit.

What you describe, if it worked reliably, would be an advance over current human-level average performance. And AI will absolutely be able to do it, too. Probably within ten years, though it will take longer to deploy after the capabilities are there.


No, you’re right, it doesn’t work reliably

I just think the Y Combinator crowd underestimates the human obstacles and ancient tech in the path of getting AI implemented


That only concerns the time needed to implement it though, not capabilities. If an AI is known to be significantly better at diagnosing than your GP, and you know and believe this (this of course has yet to happen, and would then take time to be noticed by people), then you will use it. You will demand it is used. People actually care about their health.


What you're describing is a process and information management failure that ought to be addressed first, sure.

I book a video consult. I talk to a doctor, who has an initial description of my symptoms from my booking on her computer. She takes notes on a computer (I have access to them). If needed, she orders labs on a computer. I go in, and there is no paper, just an electronic record, and while there may well be paper in the process in some instances behind the scenes, data entry people ensure it is all added to the electronic record. If labs are not needed, she'll submit my prescription electronically to my local pharmacy.

This is how it's worked when I talk to my doctor for years (UK; not nearly all GPs have gone this far)

It may well "only save the bare minimum" of the actual labour happening behind the scenes, but with a well functioning system, it saves the expensive, resource constrained part of the work.


> How does LLM get the data?

* Run a database query to see where that type of scan is typically ordered (we know LLMs can generate SQL pretty well)

  * Alternatively, someone at some point just needs to make a list

* Call the office (voice conversation is clearly possible; it's a feature in the ChatGPT app)

* Receive the fax

* gpt-4-vision or OCR, then GPT-4, or however you want to interpret the data

It's not like it's not work to build something like this, but all the pieces are right there.
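For the faxed-TIFF step specifically, a minimal sketch of what that could look like. Pillow and pytesseract are real libraries, but the ingest_fax_page and call_llm names are just illustrative, and this does no layout analysis or multi-page handling:

    from PIL import Image      # pip install pillow
    import pytesseract         # pip install pytesseract (needs the tesseract binary installed)

    def ingest_fax_page(tiff_path: str, call_llm) -> str:
        """OCR one faxed TIFF page, then have the model pull out the clinically
        relevant facts as a short summary for review before the visit."""
        page = Image.open(tiff_path)                  # first frame of the TIFF
        raw_text = pytesseract.image_to_string(page)  # plain OCR, no layout analysis
        prompt = (
            "The text below is OCR output from a faxed medical document. "
            "List any test results, dates, and the ordering provider you can find, "
            "and flag anything that looks illegible or ambiguous:\n\n" + raw_text
        )
        return call_llm(prompt)  # any chat-completion function returning a string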


Not hard to strap GPT voice onto a phone line and provide it with some information about local labs and their contact details.

Also you said the office doesn’t have an electronic link but your example workflow includes the phone ;)


So you describe a couple of things LLMs seem near ideal for, one thing that is a horrific failure of information management (it's shocking if doctors are wasting their time on it), and a case for other automation.

At my GP, at least, all of this is presented to doctors digitally: all notes are digital, all requisitions for follow-up tests are digital, my initial consultations are all phone or video, and only follow-up visits are in person, and then only if a physical examination or tests are needed.


We are doing similar work at GenHealth.ai and getting SOTA results on some evals (not yet published). Our approach is very different from LLMs in that we are using a medical coding vocabulary and we are training transformers on actual patient histories. We have an API if anyone here wants to build on it. Oh, and we are hiring.


I wonder if they made any effort to check whether the NEJM case studies that this whole study is based on are in the PaLM 2 training dataset.


Did you see section 6.5 Contamination Analysis?


No, I missed it; I was looking in the methods section as opposed to the results. Thank you!


Damn, really?


In poor countries there aren't enough doctors. Billions of people don't have access to healthcare, and that won't change until those countries become wealthier. Public healthcare that covers everyone can't happen because the reality of poverty makes it impossible. For these billions of people, the valid comparison isn't human doctor vs AI, it's no treatment vs AI. I am extremely optimistic about the positive role of AI in medicine for this reason.


The irony is that we may see another leapfrog moment similar to analog cellphone networks and Internet infrastructure (where many poor countries largely skipped analog and modems because they had their boom later), where it will be easier to deploy AI for these purposes in countries where the existing healthcare system is underfunded and lacking in political clout than in richer countries even for situations where it delivers better outcomes.


Sure, but how will they train in an appropriate level of snark, abuse, and sexism to convey authority?

(I miss House and hate how it ended)


Differential diagnosis > Machine differential diagnosis: https://en.wikipedia.org/wiki/Differential_diagnosis

CDSS: Clinical Decision Support System: https://en.wikipedia.org/wiki/Clinical_decision_support_syst...

Treatment decision support: https://en.wikipedia.org/wiki/Treatment_decision_support :

> Treatment decision support consists of the tools and processes used to enhance medical patients’ healthcare decision-making. The term differs from clinical decision support, in that clinical decision support tools are aimed at medical professionals, while treatment decision support tools empower the people who will receive the treatments

AI in healthcare: https://en.wikipedia.org/wiki/Artificial_intelligence_in_hea...


Yikes, do I read this correctly and the LLM alone outperforms clinician + LLM?


This has come up before in similar contexts. Model-based decision systems tend to do better than clinicians alone, or than clinicians using the model-based decision system, in general. Despite this, it's been effectively impossible to deploy such systems in a clinical setting due to many issues, one being that clinicians aren't willing to cede their ground and patients aren't willing to believe the machine over a human.


granted, I feel like the training of some physicians also doesn't make them that good at diagnosis. Over time, intuitively understanding the narrative history of the illness/symptoms is key; that being said, nowadays, this sort of thing seems scattershot in some younger physicians.

Med schools tend to rejigger their curricula frequently to, justifiably, make the experience friendlier and the workload lighter for students, but sometimes it backfires in this respect.

I had an old-school professor who liked to remark that, in his day, there was no such thing as a differential diagnosis, just the right one and a bunch of wrong ones.


I don't see how those models couldn't be used to at least offer a second opinion. They are pretty cheap to run. I doubt patients would complain about that.


Is there any way for me to get ahold of one of these models for my own private use?


That surprised me as well. The case for humans being merely assisted by AI may not be the strongest one. It could be that they work better without human input, which is at the same time amazing and a bit frightening.


Is that across all subspecialties?

I hear internists talking about +LL and -LL all the time and I'd hope that reflected some higher degree of rationality when reasoning out diagnoses, but perhaps not?


It also outperforms clinician + googling, which the study observed (many doctors use Google search to help with diagnosis, terminology, recent studies, etc.)


Hell, the LLMs themselves outperformed clinician plus LLM...


Just to make the analysis more complicated, not all errors are created equal: it may be better to have lots of safe errors than a few huge ones.


Makes sense. In many applications the 'human in the loop' will become the weakest link. Imagine giving AlphaGo to players as a suggested-move assistant. Even the best players would have seen AlphaGo's suggestions and thought they were an error. Can we eventually trust AI even if we can't understand its reasoning?


They'll be able to explain their reasoning...


Explaining and understanding the explanation are different.


Yeah, it's a bit awkward.



