
Ludwig, a code-free deep learning toolbox - beefman
https://eng.uber.com/introducing-ludwig/
======
jamesblonde
Technically, it is not code-free - it is declarative programming in YAML. You
still have to specify your input_features, output_features, and training
architecture/specification. This is not a drag-and-drop UI (although you
could probably layer one on top of this).
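For reference, a Ludwig model definition looks roughly like this (a sketch based on the project's documentation; the feature names are invented, and the keys reflect the docs at the time):

```yaml
input_features:
  - name: review_text
    type: text
    encoder: parallel_cnn
  - name: stars
    type: numerical

output_features:
  - name: sentiment
    type: category

training:
  epochs: 10
  learning_rate: 0.001
```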

This should work well with a Feature Store, where features are already pre-
processed and ready for input to model training. With a feature store, this
could be like the Tableau/Qlik/PowerBI tool for Data Science.

~~~
foxes
Aren't most popular ML libraries declarative, e.g. Keras? You aren't exactly
specifying how to transform the state (for example, all the specific matrix
multiplications are hidden); rather, you are declaring the logic of the
computation (you list out the layers).
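The distinction can be made concrete with a toy sketch (pure Python, not real Keras): the model is just a list of layer descriptions, and a generic loop interprets it, so the matrix math stays hidden from whoever writes the model.

```python
# Toy illustration of the declarative style described above:
# layers are declared as data; a generic loop interprets them.

def dense(weights, bias):
    """Return a layer function; the caller never writes this inner loop."""
    def apply(inputs):
        return [
            sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)
        ]
    return apply

# Declarative part: list the layers, much like keras.Sequential([...]).
model = [
    dense([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # 2 -> 2 identity layer
    dense([[2.0, 2.0]], [1.0]),                   # 2 -> 1 layer
]

def predict(model, inputs):
    for layer in model:
        inputs = layer(inputs)
    return inputs

print(predict(model, [3.0, 4.0]))  # [2*3 + 2*4 + 1] = [15.0]
```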

~~~
jamesblonde
TensorFlow has mappers/reducers for feature engineering, and it's Python, so
you will inevitably end up doing whatever you want in there. But yes, you have
a point there. How do you debug this thing, though?

~~~
w4nderlust
You have a good point about debugging. So far my approach has been to write
quite long and explanatory error messages (like
[this one](https://github.com/uber/ludwig/blob/9c9e5de56dcad89461c5d2ecbd2d218e2273bdcf/ludwig/models/combiners.py#L170),
for instance) and documentation as detailed as I can, and, at least when people
used it internally, that has been enough. But I'm definitely open to
suggestions here.
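The pattern described here, validating the declarative config up front and failing with a long, actionable message, can be sketched as follows (hypothetical config keys and encoder names, not Ludwig's actual code):

```python
# Sketch of the "long, explanatory error message" approach to making a
# declarative tool debuggable. Keys and names are illustrative.

VALID_ENCODERS = {'parallel_cnn', 'rnn', 'bag'}

def validate_feature(feature):
    encoder = feature.get('encoder', 'parallel_cnn')
    if encoder not in VALID_ENCODERS:
        raise ValueError(
            "Invalid encoder '{}' for feature '{}'. "
            "Valid encoders are: {}. "
            "If you meant to use a custom encoder, make sure it is "
            "registered before the model definition is parsed.".format(
                encoder,
                feature.get('name', '<unnamed>'),
                ', '.join(sorted(VALID_ENCODERS)),
            )
        )
    return encoder

try:
    validate_feature({'name': 'review', 'encoder': 'paralel_cnn'})  # typo
except ValueError as error:
    print(error)
```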

------
mmq
I attempted to do something similar in 2017 [1]. A couple of issues I
noticed:

* The field is evolving quite rapidly, so most options will require a lot of configuration, which IMO is not suitable for a declarative approach.

* It's hard to debug; you will eventually need to dive into the code.

* Extending the library would mean touching the code anyways.

One major difference here is that Ludwig tries to expose a single interface
that users without Python knowledge can use; inside a company, machine
learning engineers can then extend the tool and provide support by debugging
other users' use cases.

I think such an approach can be useful for a limited set of mature
use cases.

[1]: https://github.com/polyaxon/polyaxon-lib/blob/master/examples/configs_examples/yaml_configs/alexnet_flower17.yml

~~~
w4nderlust
Your configuration file looks really good! Why did you stop pursuing it? To
your points:

* That may be true, but I think it also depends on the level of abstraction. For instance, in Ludwig the encoders, although configurable, are pretty monolithic. It's a trade-off with flexibility: in Caffe, or in your configuration file, you are super flexible in specifying each single operation, while in Ludwig that is abstracted away from you. The advantage is that, as long as you trust the encoder implementations, those encoders mimic papers / state-of-the-art models and require much less configuration to run (in many cases, if you are happy with the default parameters, you don't have to configure them at all). So if the field moves and a new text encoder comes in, one can easily add a new encoder to Ludwig too. It's a dangerous game to play catch-up, but hopefully releasing it as open source and making it easy to extend may encourage spontaneous contributions from the community.

* Debugging is an issue, that is true; I answered another comment about that. But again, it's a matter of trade-offs. Debugging is kind of a nightmare in SQL too, for instance, or in TensorFlow (even if tfdbg improved things a bit).

* Extending requires coding, that is also true. But if you have an idea for a new encoder, for instance, all you have to do is implement a function that takes a tensor of rank k as input and returns a tensor of rank z as output, and all the rest (preprocessing, training loop, distributed training, etc.) comes for free. That is kind of a nice value proposition IMHO, and it lets you focus on the model rather than everything else.
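The encoder contract described above, a function mapping a tensor of one rank to a tensor of another, can be sketched like this (hypothetical signature for illustration, not Ludwig's actual extension API):

```python
import numpy as np

# A minimal "encoder" in the spirit described above: it takes a rank-3
# tensor (batch x sequence x embedding_dim) and returns a rank-2 tensor
# (batch x embedding_dim); everything else would come from the framework.

def mean_pooling_encoder(inputs):
    """Collapse the sequence axis by averaging token embeddings."""
    return inputs.mean(axis=1)

batch = np.ones((4, 10, 8))          # 4 examples, 10 tokens, 8-dim embeddings
encoded = mean_pooling_encoder(batch)
print(encoded.shape)                  # (4, 8)
```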

Thanks for the interesting discussion!

------
Dolores12
That looks pretty cool. The official website [0] has more examples.

[0]: https://uber.github.io/ludwig/examples/

------
cowb0yl0gic
"identification of points of interest during conversations between driver-
partners and riders" Wait...what?! I hope this was an opt-in study.

------
master_yoda_1
It looks like the Uber AI lab doesn't have anything to show management to
prove its existence. That's why they come up with this kind of (you know what
I mean). The code-free toolbox is a myth.

------
__blacksite__
Personally I've been working on the same declarative approach over the past
couple of years at my company. This year I changed things to make heavy use of
scikit-learn's `make_column_transformer` and `ColumnTransformer` capabilities.

It's nice to see other, (much more) reputable engineering organizations taking
a similar approach and treating the construction of different predictive
models as an exercise in configuration.

In my solution, though, I haven't looked at any DL models; I typically
default to feeding everything through XGBoost and performing a grid search for
the best hyperparameter config. My product is basically focused entirely on
taking a raw dataset and a configuration file and producing an analytic
dataset against which algorithms can be tested.
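The grid-search step can be sketched with nothing but the standard library (the scoring function here is a toy stand-in for cross-validated XGBoost performance; the parameter names are merely borrowed from XGBoost for flavor):

```python
import itertools

# Toy hyperparameter grid search: enumerate every combination in the
# grid and keep the best-scoring one.

param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.1, 0.3],
}

def score(params):
    # Stand-in for cross-validated model performance; peaks at
    # max_depth=5, learning_rate=0.1 by construction.
    return -abs(params['max_depth'] - 5) - abs(params['learning_rate'] - 0.1)

best = max(
    (dict(zip(param_grid, combo))
     for combo in itertools.product(*param_grid.values())),
    key=score,
)
print(best)  # {'max_depth': 5, 'learning_rate': 0.1}
```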

I'd be _really_ interested in hearing others' experiences with this type of
stuff.

~~~
thanatropism
I make little DSLs using TOML to further hide data wrangling and estimator
pipelining "hydraulics" in the context of a specific class of models.

So, for example, I'll have a section named "data" with variables like "binned
= ['var1', 'var2']" (and likewise for log/power-transformed variables), and
some Python code turns that into column transformers. In other examples
there's even more custom logic hidden (some stuff from my master's thesis that
I'm not at liberty to discuss), so it's not just a matter of reusing
scikit-learn.

I use TOML for little languages that are really flexible configuration files,
and YAML for situations where there may be multiple similarly-specified
objects (because the hierarchical syntax in TOML is somewhat obscure).

~~~
__blacksite__
I've never seen TOML, but I really like it. In my pretty simple pipeline, I've
just been using a JSON-like structure that I can easily import into Python:

    
    
    strategies = [
        {
            'name': 'column1',
            'kind': 'categorical',
            'strategy': 'ohe'
        },
        {
            'name': 'column2',
            'kind': 'continuous',
            'strategy': 'center'
        },
        ...
    ]
    
    

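A strategies list like that needs only a small interpreter: each entry names a column and a transform, and a dispatch table applies it. (Illustrative sketch; `ohe` and `center` here are simplified stand-ins for one-hot encoding and mean-centering.)

```python
# Minimal interpreter for a config-driven preprocessing pipeline.

def one_hot(values):
    """Simplified one-hot encoding over the observed categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def center(values):
    """Subtract the column mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

TRANSFORMS = {'ohe': one_hot, 'center': center}

def apply_strategies(table, strategies):
    return {
        s['name']: TRANSFORMS[s['strategy']](table[s['name']])
        for s in strategies
    }

table = {'column1': ['a', 'b', 'a'], 'column2': [1.0, 2.0, 3.0]}
strategies = [
    {'name': 'column1', 'kind': 'categorical', 'strategy': 'ohe'},
    {'name': 'column2', 'kind': 'continuous', 'strategy': 'center'},
]
print(apply_strategies(table, strategies))
# {'column1': [[1, 0], [0, 1], [1, 0]], 'column2': [-1.0, 0.0, 1.0]}
```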
It has worked very well thus far. I received a request from a stakeholder a
few weeks back about building a new model using a slightly different target. I
told him it'd take at least a couple of days (it was a _very_ similar target),
but I had it completed (at least from a re-engineering perspective) in about 5
minutes: I simply changed the target variable in the config and removed any
leaky features.

I'm convinced that this approach, coupled with configurations for tree-based
models like minimum samples per leaf and max depth, is the most efficient way
of building predictive models. Those configurations specific to tree-based
model software help to skirt things like the rule of five, etc., IMO.

~~~
thanatropism
That's basically my approach. TOML is chosen for readability and easy
maintenance, as well as for the ability to cut and paste chunks of Python
defining constants, etc., directly, which makes the first steps much less
bureaucratic.

I'm looking into more general parsing that would allow me to define
semi-verbose little languages. In my heart Stata is still the gold standard
for rapid-fire usability, even at the cost of idiosyncrasy. I'd like to make
the engineering of, say, REST APIs more and more code/language-independent and
more logically specified, since a lot of the new "data science" crowd coming
from stats and applied maths can't code their way out of a Tequila Sunrise.

------
Odenwaelder
I think this is really cool. I shall try and see if I can make a Docker image
out of it.

------
minimaxir
After scanning the documentation: it's an even higher-level Keras, which is
good, but in order to make smart use of it you really need to know all the DL
tricks, which makes the push toward nontechnical users misleading.

~~~
w4nderlust
You may be right about that, but it also depends on the requirements you
have. Ludwig gives you a lot of options for those tricks, like gradient
clipping, regularizers, or learning rate and batch size scheduling, but those
things are usually useful for squeezing out that extra 3% of performance, and
even in those cases, having them already implemented is an advantage. My
personal experience is that in many cases doing the first step, getting 80% of
the final performance, is enough to convince someone of the value of what you
are doing; you can then spend time improving on it later. In this regard,
Ludwig gets you from 0 to 80% really quickly.

------
afrnz
Looks cool! Is there any particular Ludwig the name is referring to?

~~~
pixelpoet
There are lots of us unofficially taking credit. RIP any googleability I might
have had before...

------
mark_l_watson
Looks good. I manage a machine learning team and we write custom models. For
data science teams with limited engineering support, Ludwig looks very good.

------
monkeydust
Looks good (non developer product person here)... Could this be used for time
series data predictions?

------
hustle1
Sorry for the layman's question, but is this language-independent or only for
English?

~~~
w4nderlust
Text is just one of the possible feature types supported in Ludwig. For those
features, no, you can train models on any language, with a couple of small
caveats. You can train both character-based models and word-based ones. For
character-based ones you don't really need anything, and you can also train on
languages without explicit word separation, like Chinese. For word-based ones,
you need a function to separate your text into words. By default a regular
expression is used, which generically works for most languages that separate
words, but the tokenizer from the spaCy library can also be used. spaCy
provides models for a few languages; at the moment we are wrapping just the
English ones, but it would be extremely easy to add the other languages
supported in spaCy as well.
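The two tokenization modes described above can be sketched in a few lines (the regex here is a generic word matcher for illustration, not Ludwig's actual default expression):

```python
import re

# Word-level vs. character-level tokenization, as described above.

def word_tokens(text):
    """Split on word characters; works for languages that separate words."""
    return re.findall(r"\w+", text)

def char_tokens(text):
    """Character-level split; needs no notion of word boundaries at all."""
    return list(text)

print(word_tokens("Riders rate their trips."))  # ['Riders', 'rate', 'their', 'trips']
print(char_tokens("你好"))                       # ['你', '好']
```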

------
graphememes
The polarity of comments between this post, and the driver post, scares me.

------
russfink
How might one do speaker (voice) recognition?

~~~
option
Check out
[https://github.com/NVIDIA/OpenSeq2Seq](https://github.com/NVIDIA/OpenSeq2Seq),
a similar framework, but with voice recognition and voice synthesis.

