Maybe they mean that it either doesn’t matter in context or it’s easy to catch and correct. Either way it seems reasonable to trust the judgement of the professional reporting on their experience with a new tool.
The OP says there aren't many hallucinations, but I think that observation is almost impossible to verify. It requires the person making it to have a very strong ability to notice hallucinations when they happen.
Most people don't have the attention to detail to spot inaccuracies consistently. Even someone who is normally very good at this only has to be overtired, stressed, or distracted for their miss rate to go way up.
I trust coworkers to write good code more than I trust them to do good code review, because review is arguably the harder skill.
Similarly, I think reviewing ML output is harder than creating the same thing yourself, and relying on these models this way is going to be disastrous until they are more trustworthy.
It's more in scenarios where I enter the room and ask the patient whether this is their wife/husband etc. It's not like I'm going into the room and saying "hello patient, you appear to be a human female". The model has difficulty figuring out who the speakers are if there are multiple different people talking. Not a big issue if all you're doing is rewriting information, but if multi-modal context is required, it's not the best.