
AutoMLPipeline – Create and evaluate machine learning pipeline architectures - bwidlar
https://github.com/IBM/AutoMLPipeline.jl
======
sgt101
This is a lovely bit of programming, and it showcases how amazing Julia is.
BUT, standard warning: pouring data into a machine and keeping whichever result
scores best on your test of choice is highly unlikely to yield a good
real-world outcome. You might get something that appears useful for a while,
but there is every reason to believe it will blow up in your face further down
the line.

You are literally playing with things that you don't understand! Don't do this,
kids!

~~~
willj
Can you expand on this? If you’re monitoring for data drift and retraining
every so often after deployment (not just “set it and forget it”), what
problems can still happen?

~~~
wrkronmiller
WARNING: NOT AN ML EXPERT.

I believe this falls under the category of "data snooping", wherein you are
effectively creating a model-of-models and increasing your degrees of freedom.
That added complexity (more DoF) makes you far more likely to overfit.

You are more likely to pick a model with no predictive power that "happened to
be right" about past data.
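A small simulation of the effect described above, in plain Python rather than AutoMLPipeline.jl, under illustrative assumptions: features and labels are independent noise, so no model can genuinely beat 50% accuracy, and each candidate "model" is a random linear classifier standing in for one pipeline. The helper names (`make_data`, `predict`, `snooping_demo`) are hypothetical. Keeping the best of many candidates scored on one test set inflates the winner's score; re-scoring it on fresh data brings it back to chance.

```python
import random


def make_data(rng, n, d):
    """Features and labels drawn independently: there is no signal to learn."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def predict(w, X):
    """A fixed linear 'model': predict 1 when w . x > 0, else 0."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0 for x in X]


def accuracy(preds, labels):
    """Fraction of positions where prediction and label agree."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def snooping_demo(n_models=500, n_test=200, n_fresh=2000, d=10, seed=0):
    rng = random.Random(seed)
    X_test, y_test = make_data(rng, n_test, d)
    X_fresh, y_fresh = make_data(rng, n_fresh, d)

    # Try many candidates and keep whichever scores best on the SAME test set.
    # This selection step is where test-set information leaks into the choice.
    best_w, best_acc = None, -1.0
    for _ in range(n_models):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        acc = accuracy(predict(w, X_test), y_test)
        if acc > best_acc:
            best_w, best_acc = w, acc

    # Re-score the winner on data it was never selected against.
    fresh_acc = accuracy(predict(best_w, X_fresh), y_fresh)
    return best_acc, fresh_acc


if __name__ == "__main__":
    best, fresh = snooping_demo()
    print(f"best test accuracy: {best:.3f}")   # typically well above 0.5
    print(f"fresh accuracy:     {fresh:.3f}")  # close to 0.5, i.e. chance
```

The gap between the two printed numbers is pure selection optimism, since by construction there is nothing to learn.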

------
scottlocklin
I'm confused, possibly because I never invested the time in Julia: is this
some improvement over the standard Julia data pipeline? It looks like a
Hadleyverse-flavored R pipeline.

~~~
ddragon
It seems like a new competing approach in the ML space. The standard Julia
data pipeline right now usually involves either using an independent package
for each step directly (or rolling your own, since Julia has strong native
support for fast data/number manipulation) or using a package that combines
all those functionalities, like MLJ.jl [1] for ML and Queryverse.jl [2] for
data manipulation and visualization. The advantage is that, since Julia
focuses on composability, those frameworks can easily share core
functionality, so library creators don't need to reinvent the wheel and all
frameworks can benefit from each other's work.

[1] [https://github.com/alan-turing-institute/MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl)

[2] [https://www.queryverse.org/](https://www.queryverse.org/)

------
isusmelj
Haven't seen the `|>` operator before. Is this Julia-specific?

e.g. used here: `pdec = @pipeline (numf |> pca) + (numf |> ica)`

~~~
truculent
I believe that it appears in a few other functional languages. Off the top of
my head:

* F# and Elm have the same operator

* Haskell’s lens library has the same one, spelled `&` (base also exports it from `Data.Function`). In the standard library there is `$`, which works as a “reverse” pipe: `f $ x` is `f x`

* R’s magrittr (part of the tidyverse, which almost functions as an enhanced standard library) has `%>%`, which works similarly
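In Julia, `x |> f` is simply `f(x)`, so a chain `x |> f |> g` means `g(f(x))`, read left to right instead of inside out. Python has no pipe operator, so here is a minimal sketch of the same left-to-right application using a hypothetical `pipe` helper built on `functools.reduce`. It illustrates only the general operator, not how AutoMLPipeline's `@pipeline` macro interprets `|>` and `+`.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    """Apply funcs left to right, so pipe(x, f, g) == g(f(x)).

    Mirrors Julia's `x |> f |> g` or R's `x %>% f %>% g`;
    `pipe` is a hypothetical helper, not part of Python or any library.
    """
    return reduce(lambda acc, f: f(acc), funcs, value)


# Reads left to right: sort, then reverse, then materialise as a list.
print(pipe([3, 1, 2], sorted, reversed, list))  # [3, 2, 1]
# The same computation as nested calls, read inside out.
print(list(reversed(sorted([3, 1, 2]))))        # [3, 2, 1]
```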

------
pizza
That's a nice README.

