
Show HN: MLJAR – build machine learning models without coding - pplonski86
https://mljar.com
======
pplonski86
Hi HN,

MLJAR is an automated machine learning platform. With mljar, you can train
great machine learning models without coding. It works with binary
classification and regression tasks. It can do hyperparameter tuning and
model selection. Preprocessing for missing values and categorical columns
is available. All models trained in the service are deployed in the cloud
by default and can be accessed via a REST API or the Python and R APIs.
Users can also download a model and use it locally.
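The post doesn't say how mljar's search works internally, so this is only a generic, standard-library sketch of what "hyperparameter tuning and model selection" on a binary classification task can look like. Missing values are filled with medians learned on the training rows, and random search over a hand-rolled k-nearest-neighbours classifier keeps the configuration with the best validation accuracy. `impute`, `knn_predict`, `random_search` and `SEARCH_SPACE` are hypothetical names, not mljar APIs, and categorical encoding is left out.

```python
import math
import random
import statistics


def impute(rows, fill=None):
    """Replace None with per-column medians learned from the training rows."""
    if fill is None:
        fill = [statistics.median(v for v in col if v is not None) for col in zip(*rows)]
    return [[fill[j] if v is None else v for j, v in enumerate(row)] for row in rows], fill


def distance(a, b, metric):
    if metric == "euclidean":
        return math.dist(a, b)
    return sum(abs(p - q) for p, q in zip(a, b))  # "manhattan"


def knn_predict(train_X, train_y, x, k, metric):
    """Binary k-nearest-neighbours majority vote (labels are 0/1)."""
    nearest = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], x, metric))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


# Hypothetical search space: one model family with two hyperparameters.
SEARCH_SPACE = {"k": [1, 3, 5, 7], "metric": ["euclidean", "manhattan"]}


def random_search(X, y, n_trials=20, valid_fraction=0.25, seed=0):
    """Sample hyperparameters at random and keep the best validation accuracy."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_valid = max(1, int(len(X) * valid_fraction))
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    # Fit the imputer on training rows only, then reuse its medians on validation rows.
    train_X, fill = impute([X[i] for i in train_idx])
    valid_X, _ = impute([X[i] for i in valid_idx], fill)
    train_y = [y[i] for i in train_idx]
    valid_y = [y[i] for i in valid_idx]
    best_acc, best_params = -1.0, None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        hits = sum(knn_predict(train_X, train_y, x, **params) == t for x, t in zip(valid_X, valid_y))
        if hits / n_valid > best_acc:
            best_acc, best_params = hits / n_valid, params
    return best_acc, best_params
```

Real AutoML engines search over several model families and usually use cross-validation rather than a single split; a single k-NN family just keeps the sketch short.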

Right now it is offered as a SaaS. I'm working on the open source version. The
AutoML engine is already open source:
[https://github.com/mljar/mljar-supervised](https://github.com/mljar/mljar-supervised)

I've compared mljar's performance on binary classification tasks with
auto-sklearn and H2O, and it performs very well:
[https://github.com/mljar/automl_comparison](https://github.com/mljar/automl_comparison)

In the long term, I would like to connect machine learning with databases.
Users will be able to train machine learning models by writing a SQL query
against the database. After the best model is trained (with AutoML of course!),
predictions will be computed for every new row that appears in the database.
But first I would like to create an AutoML platform :)
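A rough sketch of the SQL idea, assuming a SQLite table with hypothetical columns `area`, `price` and `predicted_price`. The training set is whatever the SQL query returns, and rows that have no label yet get a prediction written back. A one-feature least-squares line stands in for the AutoML step; this is an illustration only, not how mljar implements it.

```python
import sqlite3


def train_from_query(conn, query):
    """Fit y = slope * x + intercept by ordinary least squares on the (x, y) rows of `query`.

    Assumes the query returns at least two rows with distinct x values.
    """
    rows = conn.execute(query).fetchall()
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score_new_rows(conn, model):
    """Write predictions for rows that have neither a label nor a prediction yet."""
    slope, intercept = model
    new_rows = conn.execute(
        "SELECT id, area FROM houses WHERE price IS NULL AND predicted_price IS NULL"
    ).fetchall()
    conn.executemany(
        "UPDATE houses SET predicted_price = ? WHERE id = ?",
        [(slope * area + intercept, row_id) for row_id, area in new_rows],
    )
    conn.commit()
    return len(new_rows)


# Usage with an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER PRIMARY KEY, area REAL, price REAL, predicted_price REAL)")
conn.executemany("INSERT INTO houses (area, price) VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])
model = train_from_query(conn, "SELECT area, price FROM houses WHERE price IS NOT NULL")
conn.execute("INSERT INTO houses (area) VALUES (4)")  # a new, unlabeled row arrives
scored = score_new_rows(conn, model)
```

In a real deployment, `score_new_rows` would more likely run from a scheduler or a database trigger than be called by hand.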

~~~
mailshanx
Congratulations, I read your comments on the Google Auto ML thread a few days
ago with great interest.

If Auto ML has commercial value, why are you open-sourcing it?

I think it would be a bad idea to open-source your solution, unless you plan
on competing on services, rather than an Auto ML product.

~~~
pplonski86
Thank you! I believe AutoML has huge value and can be a new standard in ML. I
would like to see the adoption of AutoML - that's why I'm open sourcing it.
Otherwise, I would need a lot of money for marketing and sales to make it
popular.

Before open sourcing, I looked at Metabase and Redash, and I was very
impressed with their business models; I would like to achieve something
similar. The goal is to be ramen profitable.

~~~
mailshanx
Look at DataRobot - they are massively successful at this point, and I think
the key is that they didn't release the source.

Open sourcing the core solution hugely dilutes your value proposition - I hope
that you will reconsider your decision.

~~~
pplonski86
DataRobot has over $220M in total funding. They have resources for sales.
That said, I will think about it.

------
gambler
I wish someone built something like this for generic classification. You give
the system a bunch of folders. Each folder is a label containing corresponding
samples. The system then creates a replica of the folder structure with no
content. Each time you give it a new piece of data it places it in one of
those newly created folders.

This is an interface truly anyone could use. Just call it "intelligent
folders" or something. The user doesn't even need to know which algorithm it
uses. Just split the samples into training and test data at random and choose
the algo that gives the best results.

I was working on this for text data, but then switched jobs and don't have
the energy to build it in my spare time right now.
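A minimal sketch of the "intelligent folders" idea for text, using only Python's standard library. Each folder name is a label, samples are split at random per folder, each candidate classifier is scored on the held-out part, and the winner is refit on all samples. The folder layout, the function names, and the two toy candidates (a majority-label baseline and a bag-of-words nearest-centroid classifier) are assumptions for illustration.

```python
import random
from collections import Counter
from pathlib import Path


def load_folders(root):
    """Read a hypothetical <root>/<label>/<file> layout into {label: [text, ...]}."""
    return {
        folder.name: [f.read_text() for f in sorted(folder.iterdir()) if f.is_file()]
        for folder in sorted(Path(root).iterdir())
        if folder.is_dir()
    }


def bag_of_words(text):
    return Counter(text.lower().split())


class MajorityClassifier:
    """Baseline: always predict the most common training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class NearestCentroid:
    """Pick the label whose pooled word counts have the highest cosine similarity to the text."""

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(bag_of_words(text))
        return self

    def predict(self, text):
        words = bag_of_words(text)

        def cosine(centroid):
            dot = sum(count * centroid[w] for w, count in words.items())
            norm = (sum(v * v for v in words.values()) * sum(v * v for v in centroid.values())) ** 0.5
            return dot / norm if norm else 0.0

        return max(self.centroids, key=lambda label: cosine(self.centroids[label]))


def split(data, test_fraction=0.3, seed=0):
    """Random per-folder split; assumes every folder holds at least two samples."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in data.items():
        samples = list(samples)
        rng.shuffle(samples)
        n_test = min(len(samples) - 1, max(1, round(len(samples) * test_fraction)))
        test += [(s, label) for s in samples[:n_test]]
        train += [(s, label) for s in samples[n_test:]]
    return train, test


def choose_best(data, candidates, seed=0):
    """Score each candidate class on a random split, then refit the winner on all samples."""
    train, test = split(data, seed=seed)
    best_cls, best_acc = None, -1.0
    for cls in candidates:
        model = cls().fit([s for s, _ in train], [lab for _, lab in train])
        acc = sum(model.predict(s) == lab for s, lab in test) / len(test)
        if acc > best_acc:
            best_cls, best_acc = cls, acc
    texts = [s for samples in data.values() for s in samples]
    labels = [label for label, samples in data.items() for _ in samples]
    return best_cls().fit(texts, labels), best_acc


def file_into_folder(model, text, name, out_root):
    """Write `text` into the (initially empty) replica folder for its predicted label."""
    target = Path(out_root) / model.predict(text)
    target.mkdir(parents=True, exist_ok=True)
    (target / name).write_text(text)
    return target
```

Adding another candidate only requires a class with `fit(texts, labels)` and `predict(text)`; `choose_best` picks whichever scores highest on the random split.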

~~~
_mme
I built something similar for images. It's command line based, so one could
simply create a bash script to copy data to a folder based on a prediction.

[https://github.com/vergeml/vergeml](https://github.com/vergeml/vergeml)

------
pk78
Aren't there many similar tools out there, like IBM Cloud, Google's AutoML,
etc.? Why did you decide to create something like this? I'm interested in
hearing your opinion on what additional value your service provides over
the existing similar SaaS tools.

~~~
pplonski86
I built mljar for myself. When I started working on it (2016), there
were no such solutions, or they were very expensive (over $50k/year).

I want to have a service where I can train many models and be able to check
every model (for example, check learning curves). I want a solution that can
train many models in parallel in the cloud, so I don't need to heat my laptop
and don't need to wait long. I think I achieved this. Is it better than other
solutions? Hmmm, it is very similar to others (in the end, they all train some
ML models), but ...

After creating the AutoML solution, I came to the conclusion that AutoML is
broken:
[https://pplonski.github.io/automatic-machine-learning-is-bro...](https://pplonski.github.io/automatic-machine-learning-is-broken/).
Even if you can easily train an ML model (good or bad, it doesn't matter), it
is still hard to apply machine learning in real life.

Right now, I think that AutoML is just one brick in a solution that should
offer automation. There should be a service similar to Zapier but with
machine learning: you can connect your data and services with ML models that
live in your data ecosystem and use ML for automation.

~~~
pk78
I did not realize you started working on this back in 2016. I guess you could
have easily commercialized it then, since you were way ahead of the competition.
But I do understand that it's hard to offer this as a stand-alone service.
Anyway, good luck with your project!

