Maybe they mean that it either doesn’t matter in context or it’s easy to catch and correct. Either way it seems reasonable to trust the judgement of the professional reporting on their experience with a new tool.
The OP says there aren't many hallucinations, but I think that observation is almost impossible to verify. It requires the person making it to have a very strong ability to notice hallucinations when they happen.
Most people don't have the attention to detail to spot inaccuracies consistently. Even someone who is normally very good at this only has to be overtired, stressed, or distracted for their miss rate to go way up.
I trust coworkers to write good code more than I trust them to do good code review, because review is arguably the harder skill.
Similarly, I think reviewing ML output is harder than creating the same thing yourself, and relying on these models this way is going to be disastrous until they are more trustworthy.
It's more in scenarios where I enter the room and ask the patient whether this is their wife/husband etc. It's not like I'm going into the room and saying "hello patient, you appear to be a human female". The model has difficulty figuring out who the speakers are if there are multiple different people talking. Not a big issue if all you're doing is rewriting information, but if multi-modal context is required, it's not the best.