There we go again. These assertions by Pearl discredit him. Yes, mathematics can capture this perfectly well, it's the implication relation in a nutshell: X → Y means that X causes Y but we don't know whether Y causes X.
I guess, strictly speaking, implication doesn't say anything specific about causality, so you could conceivably claim that A → B ("If A, then B") is not a causal relation: even when B always follows A, we can't be sure that A is the cause of B (two alarms might always go off one after the other, both triggered by the same event). But that is really splitting hairs. The semantics of implication are broad enough to cover strict causal relations _and many more besides_, and the intended meaning can always be made explicit in language where it's not clear from context. That's most likely what Pearl is complaining about: he wants a stricter interpretation that can only express causal relations, so that the meaning doesn't depend on context. Because!
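To make the hair-splitting concrete: material implication A → B is just "not (A and not B)", so it is satisfied by any data in which A is never true while B is false, and it carries no causal commitment. A minimal sketch (the two-alarms scenario and its observations are invented here for illustration):

```python
# Material implication A -> B is "not (A and not B)": it holds whenever
# we never observe A true with B false, and says nothing about causation.
def implies(a: bool, b: bool) -> bool:
    return (not a) or b

# Toy observations of the two-alarms example: alarm2 always goes off
# whenever alarm1 does (here, both react to the same hidden event).
observations = [
    {"alarm1": True,  "alarm2": True},   # hidden event occurred
    {"alarm1": False, "alarm2": False},  # quiet day
    {"alarm1": False, "alarm2": True},   # alarm2 tripped by something else
]

# "alarm1 -> alarm2" is satisfied by every observation...
assert all(implies(o["alarm1"], o["alarm2"]) for o in observations)
# ...even though nothing here establishes that alarm1 *causes* alarm2.
```

The point is exactly the one above: the implication holds over all the data, yet a common cause explains it just as well as a causal link.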
The whole point here is that Pearl wants his causal reasoning framework to be accepted by everyone, even if it's not really offering anything radically new.
You do make a good point about his agenda though: it's hard to imagine anyone in the world more biased toward wanting causality to become the new machine learning topic du jour!
Full disclosure: I work on Meta-Interpretive Learning (MIL), a class of algorithms that can do exactly that, for my PhD. See [1] for an overview and [2] for a more detailed explanation (I am not an author on either paper). MIL goes a lot further than deriving implications in propositional logic: it learns first-order logic theories, which should cover Professor Pearl's expectations quite adequately.
For instance, in a MIL setting it's perfectly possible to learn a theory of, e.g., cause_of(A,B) from examples of what causes what.
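To give a flavour of what that looks like: MIL systems such as Metagol are Prolog programs that instantiate second-order "metarules" against background facts, so nothing below is their actual API. This is only a toy, language-agnostic sketch of one metarule (the "chain" rule P(X,Y) ← Q(X,Z), R(Z,Y)), with all the facts invented for illustration:

```python
# Background knowledge: directly observed cause_of/2 facts (invented toy data).
background = {("smoking", "tar"), ("tar", "cancer"), ("rain", "wet_grass")}

# The "chain" metarule P(X,Y) <- Q(X,Z), R(Z,Y), instantiated with
# P = Q = R = cause_of: it hypothesises that causation chains through
# an intermediate Z. A real MIL system searches over such instantiations;
# here we only check whether the rule covers a given example.
def chain_covers(facts, x, y):
    return any((x, z) in facts and (z, y) in facts for (_, z) in facts)

# A positive example not in the background: smoking causes cancer (via tar).
print(chain_covers(background, "smoking", "cancer"))  # True
print(chain_covers(background, "rain", "cancer"))     # False
```

The real systems do much more (predicate invention, learning the rules themselves, first-order variables throughout), but the shape of the problem, deriving cause_of/2 rules from examples, is the same.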
As to learning counterfactuals: my thesis advisor has published papers on a Robot Scientist, a machine that carries out scientific experiments from start to finish. It forms hypotheses, runs experiments to test them, then refines its theories, and so on [3].
That last paper was published in Nature, which makes it even harder to see how Pearl can claim that nobody in machine learning knows how to do this sort of thing. I understand that people want to ride the coattails of deep learning and talk only about that in the press, but it's annoying all the same to see well-established results ignored by someone so learned.
[1] Meta-Interpretive Learning: achievements and challenges
[2] Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited
[3] Functional genomic hypothesis generation and experimentation by a robot scientist
It looks like MIL could generalize quite well to Bayesian networks, a natural bridge between Pearl's past research and what he's depicting as the future of ML research.
Interested in pursuing this further? See http://www.michaelnielsen.org/ddi/if-correlation-doesnt-impl... and, I suppose, read Pearl's latest book.