Hacker News
Amazon SageMaker Autopilot (amazon.com)
140 points by jpetrucc 2 days ago | 43 comments

Google's AutoML produces black box models that are only available over a network call. This service seems to produce downloadable models, and a notebook with Python code that creates the model. If that is the case, this is substantially better than GCP's offering.

AWS consistently releases similar products after GCP... but they are much more well-thought-out, as AWS has to support them indefinitely...

I’m not sure why the other post was marked dead, but AutoML models can be exported - you can export them to TFLite format[0] and then run them on edge devices, such as a SparkFun Edge[1] or a Coral SoC / board.

[0]: https://cloud.google.com/vision/automl/docs/export-edge#expo... [1]: https://www.sparkfun.com/categories/tags/tensorflow

You can also export your model as a server within a Docker container:


GCP supports training scikit-learn / XGBoost / TensorFlow models and exporting them to Cloud Storage for use elsewhere.

More here: https://cloud.google.com/ml-engine/docs/scikit/custom-pipeli... https://cloud.google.com/ml-engine/docs/algorithms/xgboost-s... https://cloud.google.com/ml-engine/docs/tensorflow/getting-s...

Creating custom notebook containers is now also supported with AI Platform: https://cloud.google.com/ai-platform/notebooks/docs/custom-c...

Disclaimer: I work for Google.

Right, but your automl solution is using Tensorflow, no? (even for automl-tables).

I agree, it was good to see the option to generate Python code that can be reused, tweaked, etc.

Can we stop with the Google FUD re shutdowns? Browse Hacker News to any post about Google from the past year and you'll only ever see discussions on how "google shuts things down".

I get the trope, you want upvotes, it's an easy joke, and you can say it without real affliction at this point, but can we instead discuss the technical merit at hand — in this case AWS?

In this specific case, as noted by several other comments, you can also export GCP models.

Unfortunately, what you consider to be a joke for upvotes is something I see as a major business threat, and a downside for any future deal with Google.

Would be useful to have a comparison to Google's AutoML Tables: https://cloud.google.com/automl-tables/docs/features

Wait, really, I just upload tables of input and the expected output data and it tries various models for me?

Any other places do this?

Azure Machine Learning has an Automated ML service that does this: https://docs.microsoft.com/en-us/azure/machine-learning/serv...

Pay-as-you-go compute pricing means this is $$$, which I'm guessing is the tradeoff. But hey, classic time/money tradeoff for some level of automated (non-customized) performance -- should be good for a bunch of common use cases that would otherwise require hiring a human, like many generic technologies.

I am not 100% sure, but I think that is exactly what datarobot.com does as well. I don't work for them and haven't used their tools, so I have no idea how good they are; I have just seen some demos.

You say that like that's the easy part of ML.

I think you're misunderstanding the comment.

I read "just" as surprise it's this easy from the end user perspective, as opposed to the actual technology.

I assume this won't do things like add convolutional layers if you give it pixel or signal data, right?

Like is this just adding standard layers to a neural net, maybe trying a few activation functions, fiddling with the number of layers and just seeing which give the best results?

No. AutoML for DNNs is a different ball game (also known as neural architecture search).

If I read it correctly this is using traditional "classical" ML models (e.g. XGBoost, GBM and even linear models).
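To make the "try several classical models and keep the best" idea concrete, here is a minimal sketch in plain Python. It is not how SageMaker Autopilot works internally; the two candidate models, the synthetic data, and every function name (`fit_mean`, `fit_line`, `auto_select`) are assumptions made up for illustration.

```python
import random

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Linear candidate: ordinary least squares on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def auto_select(candidates, train, valid):
    """Fit every candidate on train; return (name, model, score) with the lowest validation MSE."""
    results = []
    for name, fit in candidates.items():
        model = fit(*train)
        results.append((mse(model, *valid), name, model))
    score, name, model = min(results, key=lambda r: r[0])
    return name, model, score

# Synthetic data (hypothetical): y = 3x + 1 plus a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
pairs = list(zip(xs, ys))
rng.shuffle(pairs)
train = tuple(zip(*pairs[:80]))   # (train_xs, train_ys)
valid = tuple(zip(*pairs[80:]))   # (valid_xs, valid_ys)

best_name, best_model, best_score = auto_select(
    {"mean": fit_mean, "line": fit_line}, train, valid
)
print(best_name, round(best_score, 4))
```

A real service would swap in actual learners (XGBoost, linear models, and so on) and add preprocessing, but the select-by-validation-score loop has the same shape.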

Pretty neat, but unfortunately I cannot see a lot of business cases for this. I haven't worked with a ton of models, but especially if you are not dealing with pretty much solved problems like classification, the results won't be great.

First of all, which models are going to be used? How many combinations of hyperparameters are going to be tried? The combinatorial explosion is certain.
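As a rough illustration of that explosion, the sketch below counts the configurations in a small grid. The search space is entirely hypothetical (it is not what any of these services actually searches); it only shows how quickly the product of a few short lists grows.

```python
from itertools import product
from math import prod

# Hypothetical search space: names and values are made up for illustration.
search_space = {
    "model": ["linear", "gbm", "xgboost"],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 300, 1000],
    "subsample": [0.6, 0.8, 1.0],
}

# Size of the full grid is the product of the list lengths: 3 * 4 * 4 * 3 * 3.
grid_size = prod(len(values) for values in search_space.values())

def iter_configs(space):
    """Yield every configuration in the grid as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

print(grid_size)  # 432
```

Adding one more hyperparameter with five values would push this past two thousand full training runs, which is why practical tools tend to sample the space (random or Bayesian search) under a fixed budget instead of enumerating it.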

And then if you don't know how to prepare the right dataset everything is in vain.

Not really a critique of AWS, but of AutoML in general.

EDIT: After a deeper read it seems it's regressions on textual data only.

In general, if you're interested in the AutoML landscape and its adoption, here's a Kaggle kernel based on the recent Kaggle Survey: https://www.kaggle.com/nulldata/carving-out-the-automl-niche...

Do any of these autoML offerings have a way to use the generated model in JavaScript/nodejs? I know of [sklearn-porter](https://github.com/nok/sklearn-porter) which transpiles scikit-learn models to JavaScript among other targets, but not sure if this nicely connects with any of the solutions discussed.

If you use GCP AutoML service, you can export AutoML Vision edge models (for image classification and object detection) directly for TensorFlow.js for use in browser or Node.js. Please see this: https://cloud.google.com/vision/automl/docs/tensorflow-js-tu...

Since this produces a notebook with Python code, you might be able to tweak it so the final model works in TensorFlow.js. But depending on model size / hardware requirements, it might be better to just make a network call.

You can use grpc-web to generate a JS API from a gRPC interface. There is also TensorFlow.js.

I wonder how this compares in features and price to similar products such as H2O.ai’s Driverless AI, DataRobot, BigSquid, etc.

H2O's main pitch is that, except for Driverless AI, they're open source. DataRobot seems to have a strong sales force for each domain driving sales. In terms of features, Google Cloud AutoML seems better, as it makes the entire productionising part easy.

Any experience with the DataRobot tool? In the few demos and YouTube videos I’ve seen, it looks really slick: lots of pre-built performance evaluation, model deployment/monitoring tools, etc. It's hard to find any pricing info on the web.

We recently purchased dataiku and I've found it to be really quite handy.

Any info on pricing? I can’t find anything on their website. Don’t like talking to sales goons.

It looks pretty slick. Going to install the free version to check it out.

This is a rather interesting development. Just last week I saw a similar feature in IBM Watson being demoed on IBM Cloud. And now AWS SageMaker has this capability.

Does this mean that going forward, for small-to-mid size IT companies and Corporates, the demand for Data scientists and ML developers would decrease?

My guess is it will, on average, increase demand.

ML is finicky; the model training pipeline itself isn’t the hard part, compared to setting up for the right question and examples used to train the model.

For small-to-mid firms, data scientists are super expensive. And they might only deliver a valuable project every six weeks (or, at bigger firms, every year...).

If automl increases their productivity, suddenly they don’t look so expensive.

Right. It will increase demand, since many areas of the business will start using machine learning.

However, the job of the DS will move toward the business side (e.g. req gathering, data gathering and prep) and away from the modeling itself.

Also, there are a lot of data issues that are still in the realm of humans (e.g. imbalanced data, correct labeling, etc.).
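For the imbalanced-data point, one common mitigation a human still has to choose is class weighting. The sketch below uses the inverse-frequency heuristic `n_samples / (n_classes * count)`; the label names and counts are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical dataset: 95 normal rows, 5 fraud rows.
labels = ["ok"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
print(weights)  # the rare class gets a much larger weight
```

Whether weighting, resampling, or a different metric is the right fix depends on the business cost of each kind of error, which is the part that is hard to automate.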

good point!

How does the algorithm analyze the results and look for overfitting?

Built-in regularization, probably, plus cross-validation. These techniques aren't new; they're included in a number of ML libraries already - just not at this level of automation.

Probably the same way a human would do it: By splitting the dataset into training, testing and validation buckets.

Probably some rule-based expert system to analyze results. Looking for overfitting could be done by creating a validation set. So you would have test, train, and validation.
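The splitting and cross-validation ideas in these replies can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not what Autopilot does; the function names, the split fractions, and the 10% tolerance in `looks_overfit` are all assumptions.

```python
import random

def three_way_split(rows, valid_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation and test buckets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_valid = int(len(rows) * valid_frac)
    test = rows[:n_test]
    valid = rows[n_test:n_test + n_valid]
    train = rows[n_test + n_valid:]
    return train, valid, test

def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs; every index is held out exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, valid_idx
        start += size

def looks_overfit(train_error, valid_error, tolerance=0.1):
    """Flag a model whose validation error exceeds its training error by more than `tolerance` (relative)."""
    return valid_error > train_error * (1 + tolerance)

train, valid, test = three_way_split(range(100))
folds = list(k_fold_indices(10, 3))
print(len(train), len(valid), len(test))
print(looks_overfit(train_error=0.05, valid_error=0.20))
```

The test bucket is only touched once, at the very end; model and hyperparameter choices are made on the validation bucket or on the cross-validation folds, so the final test score stays an honest estimate.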

This is the second science-fiction-level announcement from Amazon in as many days. Either they're about to take over the world with effective AGI and Quantum Computation, or they're being a bit silly.

What's science fiction about doing auto-machine learning? It's basically automation of a lot of the tasks that data scientists do manually to build simple ML models.

That is definitely a technically challenging problem but not an impossible one.

If anything AWS is late to this.

Right, AutoML != AGI.

In case your comment is about the frequency of announcements rather than the subject I just wanted to mention AWS is having their main yearly conference this week so they will be announcing many new services and features.

This is not new technology. This is AWS catching up to the competitors. There's Google AutoML, AutoKeras, Azure Automated ML, IBM AutoAI, H2O, and others...

Do any of those services reliably give useful results?
