
Building Thousands of Reproducible ML Models with pipe - datadem
https://data.blog/2019/01/08/building-thousands-of-reproducible-ml-models-with-pipe-the-automattic-machine-learning-pipeline/
======
dinedal
No source link? No access to this "magical pipe"? Is this a showcase of
proprietary software?

~~~
achernik
looks like they are "introducing" some internal component
[https://mobile.twitter.com/automattic/status/106436688085984...](https://mobile.twitter.com/automattic/status/1064366880859844608)

~~~
nerdponx
Then what's the point? Low-key flex to try and attract talent?

------
kelvin0
I started taking the Coursera ML class. Reading this article, something jumped
at me:

[https://datadotblog.files.wordpress.com/2018/12/Screen-Shot-...](https://datadotblog.files.wordpress.com/2018/12/Screen-Shot-2018-12-22-at-18.23.34.png)

It mentions how it's 'impossible' to separate the data points in cartesian
coordinates. Isn't logistic regression exactly the use case for this? Wouldn't
that make the transformation irrelevant?

Anyone with ML experience have an opinion on this?

~~~
nerdponx
No, linear regression does not imply separation.

Yes, this is why we use regression, soft-margin SVM, etc. instead of hard-
margin SVM. Because perfect linear separation is unrealistic.

~~~
kelvin0
Please note I wrote 'Logistic Regression' and not 'Linear Regression' (as you
seem to think).

Logistic regression based classification (with quadratic theta parameters)
would certainly seem able to handle the cartesian case (without having to
convert to polar coordinates).

~~~
nerdponx
I meant to write "logistic", but it's worth noting that logistic regression is
a linear model from which you derive a linear decision boundary.

And yes, it _can_ handle it, by finding an "optimal" boundary according to a
criterion other than "is it separated or not?". But that's not the point: the
data remains inseparable.

And yes, while logistic regression can technically handle this case (by
returning a solution and not blowing up), it will perform poorly unless you
transform the data, because the decision boundary is still linear.
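
A quick sketch of that point, using scikit-learn's `make_circles` as a
stand-in for the concentric data in the screenshot (an assumption — the
original dataset isn't available). Linear features give a straight-line
boundary and near-chance accuracy; adding the squared features (the
"quadratic theta parameters" mentioned above, equivalent in spirit to
switching to polar coordinates) makes the boundary a circle in the original
space:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Two concentric rings of points, labelled by ring.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# Raw cartesian features: the decision boundary is a straight line,
# which cannot enclose the inner ring.
acc_linear = LogisticRegression().fit(X, y).score(X, y)

# Append x^2 and y^2: the linear boundary in this expanded feature
# space maps back to a circle in (x, y), so the rings separate.
X_quad = np.hstack([X, X ** 2])
acc_quad = LogisticRegression().fit(X_quad, y).score(X_quad, y)

print(f"linear features:    {acc_linear:.2f}")  # near chance (~0.5)
print(f"quadratic features: {acc_quad:.2f}")    # close to 1.0
```

So logistic regression runs fine either way; it's the feature transform that
makes the problem learnable.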

~~~
kelvin0
Really appreciate your feedback, I'll certainly look into your claims in the
next few days.

What's your background? Have you 'been' in ML long? Feel free to give me as
much details as you feel comfortable with.

Thanks!

