
Show HN: Smart Fruit – A Python schema-based machine learning library - madman_bob
I've made a small Python library, designed for quick-and-easy prototyping of
machine learning models. It's built on top of scikit-learn, and it serializes
and deserializes data between the forms you're likely to have and the format
used by scikit-learn.

https://github.com/madman-bob/Smart-Fruit

It's pretty bare-bones at the moment, but I thought I'd see if there was any
interest before spending too much time on it.

Let me know what you think.
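The core idea can be sketched in plain Python: declare a schema once, then let it translate between named records and the flat numeric rows that scikit-learn estimators expect. This is a minimal illustration under stated assumptions, not Smart Fruit's actual API. The `Schema` class, its `encode`/`decode` methods and the iris-style field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Union

Value = Union[float, str]


@dataclass
class Schema:
    """Hypothetical schema: numeric fields pass through unchanged, label
    fields are one-hot encoded against a fixed list of allowed labels."""
    numeric: List[str]
    labels: Dict[str, List[str]] = field(default_factory=dict)

    def encode(self, record: Dict[str, Value]) -> List[float]:
        # Flatten one named record into the numeric row sklearn expects.
        row = [float(record[name]) for name in self.numeric]
        for name, options in self.labels.items():
            row.extend(1.0 if record[name] == opt else 0.0 for opt in options)
        return row

    def decode(self, row: Sequence[float]) -> Dict[str, Value]:
        # Invert encode: numeric fields first, then the highest-scoring label slot.
        record: Dict[str, Value] = dict(zip(self.numeric, row))
        offset = len(self.numeric)
        for name, options in self.labels.items():
            slots = list(row[offset:offset + len(options)])
            record[name] = options[max(range(len(options)), key=slots.__getitem__)]
            offset += len(options)
        return record


iris = Schema(numeric=["sepal_length", "petal_length"],
              labels={"species": ["setosa", "versicolor", "virginica"]})
row = iris.encode({"sepal_length": 5.1, "petal_length": 1.4, "species": "setosa"})
print(row)                # [5.1, 1.4, 1.0, 0.0, 0.0]
print(iris.decode(row))   # {'sepal_length': 5.1, 'petal_length': 1.4, 'species': 'setosa'}
```

Picking the highest-scoring slot in `decode` means the same method also works on class-probability rows from a classifier, not just on exact one-hot vectors.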
======
IanOzsvald
I've recently written a related library - given a DataFrame it'll run
sklearn's RandomForest to check which columns predict other columns. The goal
is to learn which relationships exist within a DataFrame. Typically in the
exploratory process in machine learning we want to learn how the data holds
together - this tool helps with that discovery exercise. It'll auto-
LabelEncode text and allows classification or regression. There are two
example Notebooks (Titanic & Boston) to show what it is doing. Correlations
(Pearson, Spearman, Kendall) can also be calculated. The RandomForest result
can show non-linear relationships that aren't exposed by correlations.
[https://github.com/ianozsvald/discover_feature_relationships](https://github.com/ianozsvald/discover_feature_relationships)
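Here is a minimal sketch of the approach described above, assuming pandas and scikit-learn are installed. For each column, it fits a random forest that predicts that column from all the others and reports the mean cross-validated score. This is not the discover_feature_relationships API: the `predictability` helper is hypothetical. Classification targets are scored by accuracy and regression targets by R^2, so the two kinds of score are not directly comparable.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.preprocessing import LabelEncoder


def predictability(df: pd.DataFrame, target: str, seed: int = 0) -> float:
    """Hypothetical helper: mean CV score for predicting `target` from the rest."""
    X = df.drop(columns=[target]).copy()
    for col in X.columns:
        if not pd.api.types.is_numeric_dtype(X[col]):
            # Text inputs are label-encoded so the forest can split on them.
            X[col] = LabelEncoder().fit_transform(X[col])
    y = df[target]
    if pd.api.types.is_numeric_dtype(y):
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        cv = KFold(n_splits=5, shuffle=True, random_state=seed)  # R^2 per fold
    else:
        y = LabelEncoder().fit_transform(y)
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)  # accuracy
    return float(cross_val_score(model, X, y, cv=cv).mean())


iris = load_iris(as_frame=True)
df = iris.frame.drop(columns=["target"])
df["species"] = iris.target_names[iris.target]  # a text column, as in a real CSV

scores = {col: predictability(df, col) for col in df.columns}
for col, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{col:>20}: {score:.2f}")

# Rank correlations between the numeric columns, for comparison.
print(df.drop(columns=["species"]).corr(method="spearman").round(2))
```

The folds are shuffled because the iris rows are sorted by species. With unshuffled folds, some regression test folds would contain a single species, which makes their R^2 misleading.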

------
pX0r
I liked the boldness of this idea. But 'something' needs to select the sklearn
model, tune its hyper-params - how long can you keep it all hidden away from
the user?

Training can take a considerable amount of time. Have you thought of some kind
of async wrapper that Smart Fruit might provide, or will the user be expected
to code it up?
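One way a user could code this up today, independently of Smart Fruit, is to push the blocking training call onto a worker thread with `asyncio.to_thread` (Python 3.9+). This is only a sketch. `train_model` is a hypothetical placeholder for whatever slow fit call is being wrapped.

```python
import asyncio
import time


def train_model(rows):
    """Hypothetical stand-in for a slow, blocking fit call."""
    time.sleep(0.5)  # simulate a long training phase
    return {"n_rows": len(rows)}


async def train_async(rows):
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(train_model, rows)


async def main():
    task = asyncio.create_task(train_async([[1.0, 2.0], [3.0, 4.0]]))
    while not task.done():
        print("still training...")  # other work could happen here instead
        await asyncio.sleep(0.2)
    return await task


model = asyncio.run(main())
print(model)  # {'n_rows': 2}
```

A thread keeps the caller responsive, but it only speeds up CPU-bound training when the underlying library releases the GIL. For pure-Python training, `loop.run_in_executor` with a `concurrent.futures.ProcessPoolExecutor` is the usual alternative.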

This is more of a user-experience comment: when the interface is designed to
feel as if one is interacting with a DB / ORM, the user may come to assume
that the outcomes will be deterministic. The returned results will remain
deterministic as long as the training data, model and hyper-parameters stay
the same, but they won't feel deterministic when any of these is updated. I am
not sure I have communicated my concern clearly. I am trying to understand who
the intended end user of this package is.
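The sketch below illustrates the reproducibility point, assuming scikit-learn. For randomized estimators such as a random forest, keeping the data, model and hyper-parameters the same only guarantees identical results if the random seed (`random_state`) is fixed as well. The `fit_predict` helper is illustrative, not part of any library.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)


def fit_predict(seed, n_estimators=100):
    # Same data + same hyper-parameters + same seed -> the same fitted forest.
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    return model.fit(X, y).predict_proba(X)


a = fit_predict(seed=0)
b = fit_predict(seed=0)
print((a == b).all())  # True: fully reproducible

c = fit_predict(seed=1)
# Changing only the seed can change the predicted probabilities,
# which is exactly the "doesn't feel deterministic" effect.
print((a == c).all())
```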

~~~
ghukill
I would suggest that a potential user is someone interested in the meta
considerations and patterns of statistical reasoning, aka machine learning.
There is a _vast_ amount of detail in how the second hand on my watch
operates (e.g. vibrating quartz, digital), but I can use that mostly reliable
device to investigate higher-level phenomena, like calculating the distance of
planets by timing their movement. This library opens a direct line to these
algorithms, such that one might intuit, and apply, their high-level behavior.
Just as I could not time planets if I were consumed with the fidelity and
reliability of resonating quartz, being concerned with the minutiae would slow
my ability to explore this kind of reasoning.

That said, all points taken. If this sparks interest in someone, as it stands,
it would be on them to dig into all the considerations you've outlined.

------
ghukill
I love it. I pasted the column headers from the Iris website into `iris.data`
and, voila, it was up and running per the instructions on GitHub. For
prototyping and exploring ideas, for someone who is a layman with the syntax
but conceptually familiar with the material, what a boon.
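Pasting a header line onto the raw UCI `iris.data` file turns it into an ordinary headered CSV, so each row can be read as a named record. Below is a minimal standard-library sketch. The column names are assumptions, not necessarily the ones used above or the ones the Smart Fruit README expects. The three sample rows are taken from the UCI file.

```python
import csv
import io

# Assumed header line pasted above the raw UCI rows.
IRIS_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def read_records(f):
    """Yield one dict per row, converting every column except `species` to float."""
    for row in csv.DictReader(f):  # DictReader skips blank lines, e.g. at EOF
        yield {k: (v if k == "species" else float(v)) for k, v in row.items()}


records = list(read_records(io.StringIO(IRIS_CSV)))
print(records[0])
# {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'species': 'Iris-setosa'}
```

With a real file, `open("iris.data", newline="")` would replace the `io.StringIO` wrapper.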

------
zitterbewegung
This looks like a good porcelain for sklearn. Many people, myself included,
find sklearn intimidating at times.

