
Show HN: Lambdo – Feature engineering and machine learning together - asavinov
https://github.com/asavinov/lambdo
======
ericand
> Feature engineering is a mechanism of creating new levels of abstraction in
> knowledge representation because each (non-trivial) feature extracts and
> makes explicit some piece of knowledge hidden in the data. It is almost
> precisely what deep learning is intended for. In this sense, feature
> engineering does what hidden layers of a neural network do or what the
> convolutional layer of a neural network does

Very intriguing and thoughtful statement. I hadn't ever thought of it that
way.

~~~
tvladeck
This is essentially repeating your quote, but an "aha moment" clicked for me
when I read that what successful neural networks are basically doing is such
good feature learning that the problem can be solved by a simple linear model
in the end.

E.g., if you have an N-layer neural network, the first N-1 layers are doing
feature learning, and the Nth layer is a simple {logistic,
multinomial/softmax, Gaussian, Poisson, ...} model.
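
A minimal sketch of that idea on XOR, which no linear model can solve on the
raw inputs. The hidden-layer weights below are hand-picked for illustration
(an assumption, not the result of training), and the function names are
hypothetical. The hidden layer produces two features, and a plain logistic
model on those features is enough:

```python
import math

def relu(x):
    return max(0.0, x)

def hidden_features(x1, x2):
    # "Feature learning" layer. Weights are hand-picked here (assumption);
    # in a real network they would be learned by gradient descent.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1, h2

def logistic_output(h1, h2):
    # Final layer: an ordinary logistic model, linear in the features.
    z = 10.0 * (h1 - 2.0 * h2 - 0.5)
    return 1.0 / (1.0 + math.exp(-z))

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [round(logistic_output(*hidden_features(a, b))) for a, b in inputs]
print(predictions)
```

The raw inputs are not linearly separable, but the pair (h1, h2) is, so the
last layer needs nothing more than a linear decision boundary.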

~~~
halflings
Related: "the kernel trick" [1].

"The kernel trick avoids the explicit mapping that is needed to get linear
learning algorithms to learn a nonlinear function or decision boundary."

(This is what powers Support Vector Machines, the "neural networks of the
90s", which are still alive and kicking today.)

[1]
[https://en.wikipedia.org/wiki/Kernel_method#Mathematics:_the...](https://en.wikipedia.org/wiki/Kernel_method#Mathematics:_the_kernel_trick)
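
A small sketch of what "avoids the explicit mapping" means, using the
degree-2 polynomial kernel on 2-D vectors. The names `phi` and `poly_kernel`
are illustrative only. The kernel value computed directly in the input space
matches the dot product of the explicitly mapped feature vectors:

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def phi(v):
    # Explicit feature map for the kernel k(x, y) = (x . y)^2 in 2-D.
    x1, x2 = v
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly_kernel(a, b):
    # Same quantity, computed without ever building phi(a) or phi(b).
    return dot(a, b) ** 2

x = (1.0, 2.0)
y = (3.0, -1.0)
explicit = dot(phi(x), phi(y))
implicit = poly_kernel(x, y)
print(explicit, implicit)
```

In higher dimensions or with higher-degree kernels the explicit map grows
quickly, while the kernel stays a single dot product plus a power.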

------
asavinov
Lambdo is a workflow engine which simplifies data analysis by combining the
following in one analysis pipeline:

* Feature engineering and machine learning: Lambdo does not distinguish them and treats them as data transformations

* Model training and prediction: both feature definitions and ML models can be trained as part of one workflow

* Table population and column evaluation: workflow consists of nodes of these two types. This makes it similar to Bistro: [https://github.com/asavinov/bistro](https://github.com/asavinov/bistro)

Lambdo is intended for the following use cases:

* Numerous derived features with parameters derived from the data

* Regular re-training is required, using the same features as those used during prediction

* Time series analysis, where the quality of derived features is especially important

* Customization via user-defined Python functions
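
To illustrate what "features with parameters derived from the data" can look
like, here is a generic sketch in plain Python. It is not Lambdo's API; the
function names and the dictionary layout are assumptions made up for this
example. The scaling parameters are fitted once on training data and then
reused unchanged at prediction time:

```python
from statistics import mean, pstdev

def fit_scaler(values):
    # "Train" the feature: derive its parameters from the data.
    # Fall back to 1.0 if the data is constant, to avoid dividing by zero.
    return {"mean": mean(values), "std": pstdev(values) or 1.0}

def apply_scaler(values, params):
    # Apply the feature with previously fitted parameters.
    return [(v - params["mean"]) / params["std"] for v in values]

train = [10.0, 12.0, 14.0, 16.0]
params = fit_scaler(train)
train_features = apply_scaler(train, params)

# At prediction time the same definition and the same parameters are reused.
new_features = apply_scaler([13.0], params)
print(params["mean"], new_features)
```

Keeping the fitted parameters next to the feature definition is what lets
re-training and prediction share exactly the same transformation.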

------
kmax12
I definitely see the need for packages like this. So much of a data
scientist's time is spent on feature engineering, but there are relatively few
tools out there that are trying to improve that step in the process compared
to tools for the modeling step.

I see this tool as something that can help with the deployment piece of
feature engineering. As things stand, it's "easy" to package and deploy
modeling code, but much harder to package up your feature engineering
workflow, in part because there is no agreed-upon standard for developing
feature engineering pipelines.

I'd be curious how this could be combined with a library like Featuretools
([http://github.com/featuretools/featuretools/](http://github.com/featuretools/featuretools/))
which helps automate the discovery of features, but currently has less
functionality related to deployment.

(full disclosure: I work on Featuretools)

~~~
pplonski86
I think there is a need for deployment approaches for both feature
engineering and modeling. For example, consider feature scaling in the case
where the mean of the feature is drifting. Then the feature engineering, and
probably the ML model too, needs to be updated. I'm not aware of ready-made
solutions for such problems.
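
A rough sketch of the drifting-mean case, under stated assumptions: the
function name, the default threshold of 0.5 standard deviations, and the
sample numbers are all hypothetical, and this is a simple heuristic rather
than an established drift test. It compares the mean of recent values with
the mean stored when the scaler was fitted:

```python
from statistics import mean

def mean_has_drifted(stored_mean, stored_std, recent_values, threshold=0.5):
    # Shift of the recent mean, measured in units of the stored std.
    shift = abs(mean(recent_values) - stored_mean) / stored_std
    return shift > threshold

stored_mean, stored_std = 13.0, 2.0   # parameters saved at training time

stable = mean_has_drifted(stored_mean, stored_std, [13.5, 12.5, 13.0])
drifted = mean_has_drifted(stored_mean, stored_std, [16.0, 17.0, 18.0])
print(stable, drifted)
```

When the check fires, both the scaling parameters and the downstream model
would be candidates for re-fitting.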

------
mooneater
Things I always want when looking at something new:

- Where does it sit in relation to other data science components?

- What does it integrate with, and what is it agnostic to?

- Smallest self-contained use case? I see some examples in the repo readme
but they are not self-contained, so it's harder for me to imagine its use.

~~~
srean
Your comment reminds me so much of my PM, who asks questions like these to
give an appearance that he is putting in some serious effort and energy to
understand. Add a few words like 'value', 'leverage' and 'resonate' and the
impersonation would be pitch perfect... (Looks up contact details.) Oops,
spoke too soon, should have guessed.

~~~
mooneater
> 'value', 'leverage' and 'resonate'

You could not be more wrong about my perspective though.

My question is from this perspective: I use tensorflow, keras, numpy,
scikit-learn, pandas, and I'm looking to understand how I would integrate
these.

~~~
srean
Oh I know, from your past comments {you are more famous than you think :}

~~~
mooneater
Not sure what you mean by this vague statement.

~~~
srean
Nothing sinister. I take interest in ML related threads so I remember your HN
handle.

