Let's say a robot is driving through an intersection on a green light, and there's a pedestrian waiting to cross. The robot has the right of way and goes, but suddenly the pedestrian decides to cross in front of the vehicle. Even with a reaction time of 0.00 seconds, it's too late to avoid a collision. The problem is that the robot didn't anticipate that the pedestrian would cross despite not having the right of way. Humans are better at reading social cues than robots. Maybe robots can learn that, but it's a significantly harder problem than path planning or image segmentation. This goes beyond pedestrians: it also applies to predicting what other drivers will do on the road. And if you try to drive cautiously to avoid this scenario, you effectively slow to a crawl every time you see a pedestrian, which isn't very useful for getting from point A to point B (not to mention all the pissed-off traffic behind you).
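To put rough numbers on the "0.00 seconds is still too late" point, here's a back-of-the-envelope sketch using the usual constant-deceleration formula (reaction distance plus v^2 / 2a). The 50 km/h speed, the 5 m gap, and the 7 m/s^2 braking deceleration are all my own assumed values for illustration, not measurements of any real vehicle, and the function names are made up.

```python
def stopping_distance_m(speed_kmh, reaction_time_s=0.0, decel_ms2=7.0):
    """Distance needed to come to a full stop, in meters.

    decel_ms2 is an assumed constant braking deceleration
    (7 m/s^2 is a guess at hard braking on dry pavement).
    """
    v = speed_kmh / 3.6                    # convert km/h to m/s
    reaction = v * reaction_time_s         # distance covered before braking starts
    braking = v ** 2 / (2 * decel_ms2)     # distance covered while braking
    return reaction + braking


def can_stop_in_time(speed_kmh, gap_m, reaction_time_s=0.0, decel_ms2=7.0):
    """True if the vehicle can stop before reaching a pedestrian gap_m ahead."""
    return stopping_distance_m(speed_kmh, reaction_time_s, decel_ms2) <= gap_m


if __name__ == "__main__":
    # Assumed scenario: 50 km/h, pedestrian steps out 5 m ahead.
    print(round(stopping_distance_m(50), 1))             # perfect reflexes
    print(round(stopping_distance_m(50, 1.5), 1))        # assumed 1.5 s human reaction
    print(can_stop_in_time(50, gap_m=5))
```

Under those assumptions, even an instant reaction needs about 14 m to stop from 50 km/h, so a pedestrian stepping out 5 m ahead gets hit no matter how fast the computer is. The only way to win is to have slowed down before they moved, which is the anticipation problem.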
It's difficult because the road is an uncontrolled environment, and the robot has to anticipate what other drivers, cyclists, and pedestrians will do. Robots have done wonders in controlled environments, but bringing them into the real world has always been a struggle.
The standard isn't "perfect under all conditions"; it's "better than a human". Humans are, honestly, pretty bad at driving. The bar is not that high, perhaps unfortunately.
Why does a robot driver need to anticipate this? Does a human driver need to?
If I'm walking up to a pedestrian crossing and a car is approaching, I don't just step out into the road, even though I have the right of way. I try to make eye contact with the driver to see if they recognize I'm crossing. They'll often nod or do something similar to signal that they're letting me cross.
A machine has to understand these social cues as well. It might even be helpful if the machine has a way to signal its intentions back to pedestrians.