Predicting motion is very easy once you have small time slices and very accurate 3D representations. You can easily calculate expected paths. You have to remember that computers see the entire situation at the same time. A bike doesn't cut off a self-driving car the way it cuts off a human. Humans are slow: our increments of time are large, in the hundreds of milliseconds, and we can only focus on a couple of things at a time. A computer will notice a slight change in velocity and acceleration within single-digit milliseconds. Then it just has to predict the probability of collision. These calculations are simple.
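To make "these calculations are simple" concrete, here's a rough sketch of the most basic version: extrapolate two objects at constant velocity and find when and how close they pass. The function name and the numbers are made up for illustration, and this gives a single deterministic answer, not a probability. A real system would presumably put uncertainty on the positions and velocities before turning this into a collision probability.

```python
import math

def closest_approach(p_a, v_a, p_b, v_b):
    """Time (s) and distance (m) of closest approach for two points
    moving at constant velocity in 2D. Hypothetical helper, not from
    any real self-driving stack."""
    # Position and velocity of B relative to A.
    rx, ry = p_b[0] - p_a[0], p_b[1] - p_a[1]
    vx, vy = v_b[0] - v_a[0], v_b[1] - v_a[1]
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # Same velocity: the gap never changes.
        t = 0.0
    else:
        # Minimize |r + v*t|; clamp to 0 so we only look forward in time.
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)

# Car heading east at 10 m/s; bike 20 m ahead and 5 m to the side,
# moving across the car's path at 2.5 m/s (illustrative numbers).
t, gap = closest_approach((0.0, 0.0), (10.0, 0.0), (20.0, -5.0), (0.0, 2.5))
print(f"closest approach in {t:.1f} s, gap {gap:.1f} m")
```

With these numbers the two paths meet 2 seconds out, so re-running this every few milliseconds on updated velocities is cheap, which is the point being made above.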
Deciding what to do in these situations can very much be hardcoded efficiently using decision trees. No one working on self-driving cars right now dares to use a neural network or any other unexplainable and unbounded ML algorithm for policy. You have to be able to hardcode new edge cases as they emerge. You have to be able to study specific crashes or incidents and then adjust the decision-making scheme to avoid that specific situation in the future.
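As a toy illustration of what a hardcoded policy looks like (every field name, threshold, and action label here is invented, not taken from any real system), it's just an ordered set of explicit branches. Handling a new edge case found in an incident review means adding or adjusting a branch you can read and test.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    # All fields are hypothetical inputs from the prediction step.
    time_to_closest_s: float      # seconds until predicted closest approach
    min_gap_m: float              # predicted gap at closest approach
    obstacle_is_pedestrian: bool

def choose_action(s: Situation) -> str:
    """Hand-written decision tree; thresholds are illustrative only."""
    if s.min_gap_m > 3.0:
        return "continue"          # predicted paths stay well apart
    if s.time_to_closest_s < 1.5:
        return "emergency_brake"   # too close and too soon
    if s.obstacle_is_pedestrian:
        return "brake"             # extra margin for pedestrians
    return "slow_down"

print(choose_action(Situation(time_to_closest_s=1.0, min_gap_m=0.5,
                              obstacle_is_pedestrian=True)))
```

Every output traces back to one specific branch, which is what makes this kind of policy explainable and bounded in a way a learned one isn't.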
Truly, the hardest problem is taking in data from multiple sensors, segmenting it, and then labeling it, all in real time. The sensors are faulty and super expensive. There are also so many different objects out there. If you actually look at the ancillary startups in this industry, they're not working on "common-sense" general intelligence algorithms. They're working to make better and cheaper lidar. They're working on computer vision problems. They're working on image segmentation.
Let's say you're driving through an intersection on a green light, and there's a pedestrian waiting to cross. The robot has the right of way and goes, but the pedestrian suddenly decides to cross in front of the vehicle. Even with a reaction time of 0.00 seconds, it's too late to avoid a collision. The problem is that the robot didn't anticipate that the pedestrian would cross despite not having the right of way. Humans are better at reading social cues than robots. Maybe robots can learn that, but it's a significantly harder problem than path planning and image segmentation. This goes beyond pedestrians: it also applies to other drivers and predicting their behavior on the road. And if you drive cautiously to avoid this scenario, you effectively stop and crawl every time you see a pedestrian, which isn't very useful for getting from point A to point B (not to mention all the pissed-off traffic behind you).
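To put rough numbers on the zero-reaction-time point, here's a back-of-the-envelope sketch using the standard constant-deceleration formula, distance = v * t_reaction + v^2 / (2 * a). The 8 m/s^2 deceleration is my assumption for hard braking on a dry road, and the function name is made up.

```python
def stopping_distance_m(speed_kmh, reaction_s=0.0, decel_ms2=8.0):
    """Distance covered before a full stop, assuming constant deceleration.
    decel_ms2=8.0 is an assumed hard-braking value, not a measured one."""
    v = speed_kmh / 3.6                       # km/h -> m/s
    return v * reaction_s + v * v / (2.0 * decel_ms2)

# Robot with zero reaction time vs. a driver who takes 1 s to react.
print(f"{stopping_distance_m(50):.1f} m")                  # about 12.1 m
print(f"{stopping_distance_m(50, reaction_s=1.0):.1f} m")  # about 25.9 m
```

So under these assumptions, even a perfect reaction at 50 km/h needs around 12 m to stop. If the pedestrian steps out 5 m ahead, braking physics decides the outcome, not reaction time.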
It's difficult because it's an uncontrolled environment, and the robot has to be able to anticipate what other drivers, cyclists, and pedestrians will do. Robots have done wonders in controlled environments, but bringing them into the real world has always been a struggle.
The standard isn't "perfect under all conditions", it's "better than a human". Humans are, honestly, pretty bad at driving. The bar is not that high, perhaps unfortunately.
Why does a robot driver need to anticipate this? Does a human driver need to?
If I'm walking up to a pedestrian crossing and a car is approaching, I don't just step out into the road, even though I have the right of way. I try to make eye contact with the driver to see if they recognize I'm crossing. They'll often nod or do something similar to signal that they're letting me cross.
A machine has to understand these social cues as well. It might even be helpful if the machine has a way to signal its intentions back to pedestrians.