To a rough approximation, they differ in the theorems and results they care about. In statistics one cares about consistent estimation of parameters: if the model is such and such and the sample size goes to infinity, here is an estimator that converges (in some interesting mode of convergence) to the true parameter that generated the data.
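As a rough illustration of what "consistent" means here, the sketch below assumes a hypothetical no-intercept linear model y = beta * x + noise with beta = 2 and Gaussian noise, and shows the least-squares slope estimate getting closer to the true beta as the sample size grows. The function names `ols_slope` and `simulate` are made up for the example.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope for a no-intercept model y = beta * x + noise."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def simulate(n, beta=2.0, seed=0):
    """Draw n points from the assumed model and return the estimated beta."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [beta * x + rng.gauss(0, 1) for x in xs]
    return ols_slope(xs, ys)

# The estimate should drift toward the true beta = 2.0 as n grows.
for n in (10, 1_000, 100_000):
    print(n, simulate(n))
```

The statistician's theorem is about this sequence of estimates converging to the generating parameter, not about how well the fitted line predicts.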

In ML one wouldn't care much about recovering the parameters. The results of interest would be that, with large enough samples, the predictions on new data converge to what is actually observed (again in some interesting mode of convergence). If this comes at the cost of doing poorly at parameter recovery, ML wouldn't be bothered.
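A contrived sketch of that trade-off, assuming a hypothetical setup in which the second feature is an exact copy of the first and the data are generated with parameters (2, 0). Because the two features are identical, the parameters are not identifiable, and plain gradient descent on squared error (started at zero, with made-up helpers `fit_gd` and `predict`) lands near (1, 1) instead. The predictions on fresh inputs are nonetheless almost exactly right.

```python
import random

def fit_gd(rows, ys, lr=0.1, steps=100):
    """Plain gradient descent on mean squared error, starting from w = 0."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, y in zip(rows, ys):
            err = sum(wi * xi for wi, xi in zip(w, row)) - y
            for j, xj in enumerate(row):
                grad[j] += 2 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def predict(w, row):
    return sum(wi * xi for wi, xi in zip(w, row))

rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(1000)]
# Hypothetical setup: second feature duplicates the first; generating parameters are (2, 0).
rows = [(x, x) for x in xs]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

w = fit_gd(rows, ys)
print("fitted parameters:", w)  # near (1, 1), not the generating (2, 0)

# Compare predictions on new inputs with the noise-free truth 2 * x.
test_xs = [rng.gauss(0, 1) for _ in range(200)]
worst = max(abs(predict(w, (x, x)) - 2.0 * x) for x in test_xs)
print("worst prediction error on new inputs:", worst)
```

From a parameter-recovery point of view this fit is wrong; from a prediction point of view it is essentially perfect, which is the distinction being drawn.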

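By that standard, ML would find the epicycle-based geocentric model of planetary motion perfectly acceptable: it predicts where the planets appear in the sky quite well, even though its parameters say nothing true about how the solar system is actually arranged.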