Cloud AutoML: Making AI accessible to every business (blog.google)
191 points by theDoug on Jan 17, 2018 | 93 comments



I think "making AI accessible to every business" is a bit of stretch. While there's no doubt that the AutoML suite will bring tremendous benefits to businesses with recommendation and speech and image recognition needs, it falls short of providing more useful insights such as those gleaned by association rules, clustering (i.e. segmentation), and general probabilistic models.

I think that if AI is to be accessible to every business, it will have to deliver insights rather than just the machinery to produce them. This is especially true in the context of small businesses.


Disclosure: I work at Google on Kubeflow

Can you say more about the specific insights you'd like us to provide? The more specific the better :) Happy to see what we can do!


I'm not personally looking for the insights; it's just an observation from working with SMBs trying to leverage data science more generally to improve their businesses.


Google and others are looking ahead to make AI a commodity. This is just another small step in that direction.


And I'm not discrediting the work or the fact that this is progress... I just don't think that the title of the post is entirely accurate. Mostly with respect to the "every" part.


> it falls short of providing more useful insights such as those gleaned by association rules, clustering (i.e. segmentation), and general probabilistic models.

I would also hesitate to build a business relying on Google for those things since I'd likely be competing with Google's actual moneymaker.


Despite the claim to make AI accessible to every business, this release is fairly limited in that it only applies to images. We will have to see how they extend it going forward. Given the technology it's based on, I'd expect things like text, audio, and video to come next.

However, I'm curious whether they plan to support structured/relational datasets, which are definitely something every business needs. In Kaggle's 2017 State of Data Science survey [0], data scientists said they spent 65% of their time using relational datasets vs. 18% for images. Given that Kaggle is owned by Google, this must be on their radar.

For those data scientists, I maintain an open source library for automated feature engineering called Featuretools (https://github.com/featuretools/featuretools). For people interested in trying it out, we have demos (https://www.featuretools.com/demos) to help you get started.

[0] https://www.kaggle.com/surveys/2017


Disclosure: I work at Google on Kubeflow

We're just getting started! Stay tuned for lots more AutoML goodness.


Recently YouTube started pulling videos of kids eating Tide Pods. Can AutoML figure this out, or do they use manual labor to do it? Couldn't this have been done before? I mean, can it detect stupid/dangerous videos automatically and pull them when they threaten to become an epidemic?


> data scientists said they spent 65% of their time using relational datasets vs 18% for images

It will be interesting to see the trend over the years. One year doesn't say anything about where the industry is heading.


> data scientists said they spent 65% of their time using relational datasets vs 18% for images

Part of the reason is that use cases are driven by limited definitions of ML or "AI" - AutoML, for example, claims to be "Making AI accessible to every business..." but it only covers a small slice of one type of ML (image classification with conv nets and ResNets). A senior exec who reads this might develop a narrow view of what AI is.

A general annoyance is how obvious techniques like regression, tree models, Bayesian models, etc. on tabular data are ignored while everyone gets hyper-obsessed over GANs or whatever. Almost 90% of the low-hanging fruit I see can be captured with simple classic ML applied to tabular data.
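
Concretely, the low-hanging fruit often amounts to a few lines. A minimal sketch, assuming scikit-learn and a hypothetical customers.csv with a binary "churned" column:

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("customers.csv")     # hypothetical tabular dataset
    X = df.drop(columns=["churned"])      # assume plain numeric feature columns
    y = df["churned"]

    model = GradientBoostingClassifier()  # a plain tree ensemble, no GANs required
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print("AUC: %.3f +/- %.3f" % (scores.mean(), scores.std()))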


90% of the time companies apply regression to problems because it's the tool they know and it works well for a lot of problems. But I think that has changed with deep learning. Deep learning works well for a lot of complex problems (such as images), so now companies can solve problems they couldn't solve before. That is why I think it is relevant to see the trend.


I don't think this is 'democratizing' AI but rather centralizing Google's control of a utility service.


You would be right if ML were as accessible as your hypothetical "utility service" implies. It is not, and it's getting farther from it.

If you compare ML to electricity, we're still in the stage where a few players have found that electrifying their manufacturing plants makes sense. Small players can't afford the investment in machinery and skills. Maybe when the machinery is hidden behind a "utility" provider (which would also bring down the skill level required) they will.


What makes it inaccessible? Are GPUs prohibitively expensive? Are pretrained models unavailable? Is the software source code closed off?


I would say training a CNN from scratch, or even fine-tuning one, takes a lot of domain knowledge and best practices that often are not standardised yet. Besides, we don't even know why they generalise in the first place! See: https://arxiv.org/abs/1611.03530, https://arxiv.org/abs/1711.11561
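
For a sense of how many knobs even the "standard" fine-tuning recipe has, here's a rough sketch with Keras - the dataset path, class count, and every hyperparameter below are made up, and choosing them well per problem is exactly the unstandardised part:

    from keras.applications import ResNet50
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model
    from keras.optimizers import SGD
    from keras.preprocessing.image import ImageDataGenerator

    base = ResNet50(weights="imagenet", include_top=False)
    x = GlobalAveragePooling2D()(base.output)
    out = Dense(5, activation="softmax")(x)    # 5 hypothetical custom classes
    model = Model(base.input, out)

    for layer in base.layers:                  # freeze the pretrained backbone first
        layer.trainable = False

    model.compile(SGD(lr=1e-3, momentum=0.9), "categorical_crossentropy")
    train = ImageDataGenerator(rescale=1./255).flow_from_directory(
        "data/train", target_size=(224, 224))  # hypothetical image directory
    model.fit_generator(train, epochs=5)

Which layers to unfreeze later, the learning rate schedule, the augmentation - all judgment calls that shift between datasets.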


I think the limit is developer talent. It's hard to find people with the right background to train an AI model.


Clarifai has been doing this for 4 years... I really like their service and they have fair pricing. It would be interesting to see how the two compare (on quality and pricing).


Curious if anyone from the product or tech team for AutoML could describe how this differs from MetaMind (sadly, now subsumed into Salesforce). Richard Socher seems to have achieved some of this in 2015 with MetaMind (...or perhaps he just had a lot of Turkers behind the scenes hand-crafting networks to fit data drops...)


I don't think they have been doing this for 4 years, as AutoML is quite recent. Maybe the idea was there before, but no one published any paper about it until now.

Bear in mind that this service creates the architecture of the model for you. I think Clarifai has a predefined model that is fine-tuned with your data, which is not even similar.


Democratization should allow users to "own" their models. This is not the case in Cloud AutoML. Users cannot download their models and host them elsewhere. This dependency means Google can have control over the business's AI capabilities.


Does it really say that somewhere? As far as I know, when you train TensorFlow models they are stored in GCS, and I thought this would be similar. Otherwise I don't know how they are planning to integrate this with the API they have for uploading your models and making requests.


Google wants to monopolize, not democratize, AI.

I wonder if Google tests for Doublethink skills before you can get hired there now.


Perhaps you mean monetize instead? They certainly aren't the only ones who can do ML.


Monetize and monopolize are synonymous in Silicon Valley. Read Thiel's Zero to One.


If I collect a bunch of data and train a model is the data/model mine or theirs?

What if in the future they change their minds and decide to change the TOS? Did I just build on top of quicksand?


Disclosure: I work at Google on Kubeflow

The data remains yours and the model is yours - e.g. if you delete your account, the data and model go away (think of it like data or a model you stored on a VM). However, what I think you're looking for is to be able to actually download the model, and I'm afraid that's not possible.

Are you looking to avoid lock-in? Or something else?


I can't speak for OP, but I'd expect to download the generated model otherwise my business is locked into this service.

If Google is worried about people just generating models and running, then charge for training time/resources.


Disclosure: I work at Google on Kubeflow

That's fair - the reality is that it's less about the worry of people "taking models and running them themselves"; it's more about the amount of custom stuff that has to go on behind the scenes to enable this kind of model exploration and training. We're exploring all options but, yes, you'd be committing to the platform (for now).

If you're interested in a portable solution, may I recommend (extremely selfishly) Kubeflow[1]? We don't provide AutoML, but we do provide a very portable solution for running an ML stack.

[1] https://github.com/google/kubeflow


Then what is "Auto ML"? It sounds like just another cloud service.


Disclosure: I work at Google on Kubeflow

Correct, it's a cloud service, based on the research Google published on model exploration[1][2]. There are research examples today where this service produced better models than humans were able to achieve by hand or with genetic algorithms (models trained faster and/or with lower error rates)[3].

[1] https://static.googleusercontent.com/media/research.google.c...

[2] https://blog.acolyer.org/2017/10/02/google-vizier-a-service-...

[3] https://arxiv.org/abs/1712.00559


Excuse me for saying this, but you don't have to put that disclosure sentence with every comment ...(personally, I find it kind of irritating)


My worry is that people deep link to a comment and think that I'm astroturfing. Trying to balance spam vs. full disclosure.

Note I left it off of this one. :)


Compared to ML Engine, where you provide the labeled data and model code, with AutoML, you just provide the labeled data (with your custom labels for whichever domain you may be working in) and AutoML builds the model for you.
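
In other words, the input shrinks to just data plus labels. If it works like other Google image tooling, that could be as simple as a CSV of image URIs and your custom labels - the rows below are my guess at the shape of it, not the documented spec:

    gs://my-bucket/clouds/img_001.jpg,cirrus
    gs://my-bucket/clouds/img_002.jpg,cumulus
    gs://my-bucket/clouds/img_003.jpg,cumulonimbus

Everything after that - architecture search, training, tuning - is AutoML's problem rather than yours.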


Similar to Amazon's ML: https://aws.amazon.com/aml/details/

Eventually useful for building a very basic model to serve as a benchmark for a real model.


This does not sound very useful. The entire market for AI consulting is about that last 5% of accuracy. Is there a reason why Google is investing so heavily in this space? Perhaps a simple model/UI is a good starting point for many customers, and many of them will end up renting the classic cloud after they gain some expertise. And the rationale for all this PR is that Google does not want to lose these customers to Amazon.


Disclosure: I work at Google on Kubeflow

There are a few different reasons:

1) Most companies have very limited skill in building advanced models. While many folks are trying to achieve 5% or better accuracy, MOST folks are trying to achieve ANY accuracy (since they have nothing)

2) Many problems are not very complicated and do not require a custom model. From the blog post, the "cloud" example only requires a small set of changes to classify for a specific domain - having to build an entirely new model for that, or train on millions of images, seems like overkill

3) AutoML (often) is better than humans already[1]. So if you want to achieve that 5%, you MIGHT need to use a machine anyway.

[1] https://arxiv.org/abs/1712.00559


Exactly, most non-IT companies have very limited skills in this area. This is why they outsource. As an analogy, when I want to furnish my office, I look for an interior decorator who takes care of ordering furniture etc., within a budget. I don't screw around with a 3D printing API to make chairs and tables. It is too low level and not my core expertise.

Second, your claim that most folks are just trying to achieve any accuracy is strange. Those are not real businesses; they are mostly developers and hobbyists trying to learn. These folks sign up for Kaggle, poke at a few scripts, and watch half a class on Coursera about ML. They are not real businesses and they have no money. Most real businesses are hiring startups or large companies that employ data scientists with domain expertise in oil, manufacturing, etc. (the IBM model). ML as an API is a disaster as a business model.

Also, AutoML is not even close to being better than humans even for a specific problem (across datasets). These click-bait titles don't fly outside of AI conferences.


I think we may be talking to different customers. I talked with ~200 customers last year, and the most common question was "What do I use ML for?" and the second most common was "How do I get started?"

Put another way, the average customer has ~zero ML usage today. I'd guess that 95%+ of all businesses have ML usage today. Further guessing would say that <1% of ML usage actually care about levels of accuracy beyond "it's better than the hacked together set of rules/filters I use today." These are very large businesses with lots of money to spend on a solution.

There are many ways to measure "better", and AutoML does apply here. This includes "better == faster to train or develop"[1], "better == you need less data"[2], and "better == lower error rates"[3]. While I agree that many of these measures do not apply across datasets, most customers only have one dataset per problem.

[1] Predictive accuracy and run-time - https://repositorio-aberto.up.pt/bitstream/10216/104210/2/19...

[2] Less data - https://arxiv.org/abs/1703.03400

[3] https://link.springer.com/article/10.1007/s10994-017-5687-8

Disclosure: I work at Google on Kubeflow


Sorry, that total should be:

- Average customer has zero ML

- Nearly no customers are using any ML (difference between median and mode)

- Of those that are using, very few care about better than human perf


I actually have worked with lots of customers in deploying ML. This was my perspective. Thanks for sharing your perspective at Google.


From the article "There’s a very limited number of people that can create advanced machine learning models." -- Curious if this is really the case? It is certainly the case with my generation of engineers but half the student interns I interview from top-20 comp sci programs do this on weekends for hackathons.

Is the argument that it is easy to implement stock models but hard to tune the models for specific types of image inputs? Isn't that pretty easily solved with some parameter grid searches? How much specialized skill does it take to re-work networks from a traditional Inception architecture or whatnot into something specific for hot dogs or satellite imagery or medical images?
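
For reference, "some parameter grid searches" in the classic setting looks like this - a sketch with scikit-learn, where the estimator and grid are arbitrary examples:

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_digits(return_X_y=True)
    grid = {"n_estimators": [50, 100, 200], "max_depth": [4, 8, None]}
    search = GridSearchCV(RandomForestClassifier(), grid, cv=5)
    search.fit(X, y)                # exhaustively tries all 9 combinations
    print(search.best_params_, search.best_score_)

The open question is whether that brute-force idea scales to whole network architectures, which is what the learning-to-learn work is about.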


(I build machine learning models professionally)

> half the student interns I interview from top-20 comp sci programs do this on weekends for hackathons

It's trivially easy to take what someone else has built and modify it slightly for a similar problem, especially in a hackathon environment where you can ignore edge cases etc.

See if they can build a new model from scratch for a new type of problem. I'm not saying that AutoML can do this either, but I interview large numbers of PhDs who don't know where to start on doing something new.


Hence my use of the word "eventually".

"It's very basic but you can use the model as a benchmark" is actually the pitch you get from AWS.


Outside audits of data handling practices.


Disclosure: I work at Google on Kubeflow

I'll make sure we have clear statements/auditing on this! We're already committed to GDPR[1] - is there a specific additional service/certification you'd like to see?

[1] https://www.google.com/cloud/security/gdpr/


And addressing model interpretability?


Curious: is that Europe-only?


For those who don't like to wait for access: https://nanonets.com. Just upload your training data and we will provide a machine learning API automatically.


It's not clear from the demo video, but will this help with labeling data? In my experience, that is the most time consuming part of creating models.


From https://cloud.google.com/automl/

Integration with human labeling

For customers with images but no labels yet, we provide a team of in-house human labelers that will review your custom instructions and classify your images accordingly. You will get training data with the same quality and throughput Google gets for its own products, while your data remains private. You can use the human labeled data seamlessly to train a custom model.


Disclosure: I work at Google on Kubeflow

Well.... no. BUT, it _does_ support unstructured data so you may not need to label your data at all. As always, YMMV.


This has to be very expensive for companies. A good business for Google.


This seems to be a reaction by Google to the Amazon SageMaker release in November: https://aws.amazon.com/sagemaker/

It's great to see that other cloud providers are acknowledging the talent and training data gaps that many large enterprises face when adopting deep learning.

Disclaimer: I work for AWS


Disclosure: I work at Google on Kubeflow

This is an externalization of the service we use at Google internally called Vizier[1], first discussed publicly in June[2].

The idea is that instead of having to build a model yourself, we can use ML (yes, it uses ML to provide ML) to autotune your model and solve your business problem. Basically, instead of having to deal with all the steps of opening an editor, choosing an algo, tweaking, debugging, etc., you just provide your structured or unstructured data and we'll help you answer your question (which is what customers actually care about).

[1] https://research.google.com/pubs/pub46180.html

[2] https://www.youtube.com/watch?v=Z2YL4XJKVpQ


/clarification

AutoML Vision is actually built on Google Brain's proprietary image recognition technology, and Vizier is one of the components of the broader solution. You can see the earlier research announcement here[1]. Sorry to leave off the additional teams that helped in building this!

[1] https://research.googleblog.com/2017/05/using-machine-learni...


So an orthogonal approach here might be crowd-sourced centralized model zoos for better idea sharing across the entire industry. Curious how others see this (automated point solutions crafted to the data set) vs [hopefully soon popular] ONNX model zoos where we have more collaboration across orgs?


Same idea for Sagemaker. Nice to see I get a bunch of instant downvotes - I sometimes wonder why even bother participating in this community.


I didn't downvote you, but I think the comparison to Sagemaker misses the point. This is literally just uploading labeled data and getting a finely tuned classifier out. Hyperparameter tuning is neat, and both Cloud ML Engine and Sagemaker have that, but (correct me if I'm wrong), only AutoML actually handles all of the model architecture decisions itself using transfer learning and learning2learn. See here for details: https://research.googleblog.com/2017/11/automl-for-large-sca...

This significantly reduces the level of expertise required to train models, and the AutoML models outperform "expert" human-created architectures.


Disclosure: I work at Google on Kubeflow

Interesting! I read up on Sagemaker here[1] and didn't see any AutoML style training/tuning features, but you would certainly know better than me :)

[1] https://aws.amazon.com/blogs/aws/sagemaker/


As far as I know, they haven't implemented HPO on SageMaker. They're planning to implement it soon; there's still no date announced.


HPO means hyperparameter optimization? Because AutoML has nothing to do with that; AutoML is mostly about the architecture of the model, not about hyperparameter optimization.


model shape is a hyperparameter ;)


By the same logic, the researcher is another hyperparameter.

(I know you are right, but so many people here think AutoML is exactly the same as the HPO they have been doing for a long time.)


That's fair. Yes, AutoML is not simply tuning the learning rate and picking your favorite nonlinearity - it's fancier than that - but it's still tuning hyperparameters.
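
In that spirit, "model shape" can literally sit in the search space next to the learning rate. A toy sketch - train_and_eval is a hypothetical function that builds and scores a network from a config:

    import random

    space = {
        "layers":        [2, 4, 8],          # the "model shape" part
        "width":         [64, 128, 256],
        "learning_rate": [1e-2, 1e-3, 1e-4],
    }

    def sample(space):
        return {k: random.choice(v) for k, v in space.items()}

    best = None
    for _ in range(20):                      # plain random search over configs
        cfg = sample(space)
        score = train_and_eval(cfg)          # hypothetical build-and-score step
        if best is None or score > best[0]:
            best = (score, cfg)
    print(best)

(The published systems use RL or evolution rather than random sampling, but the search-space framing is the same.)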


How is SageMaker similar to AutoML? I don't see any reference in SageMaker to defining the architecture of your model for you based on your data.


“Making AI accessible to every business” with image classifiers (because every business needs image classifiers) ^_^


"Soon, Cloud AutoML will release other services for all other major fields of AI."

It's a start!

https://cloud.google.com/automl/


Not everyone _needs_ accessibility ramps, but we all benefit from them. :)

Many businesses need ML but don't have the staff or expertise; many may just want it for fun functionality (building their own "Not Hotdog" app); but the aim is ease of model creation for anyone with a bit of data and time.

(Disclosure: I work in Google Cloud)


I get that it's a metaphor, but I seriously am having a hard time equating automated image classification to accessibility ramps.


You wouldn't if you worked at Google.


Because Google is notoriously bad at profitable products, besides search?


Leaving aside for a sec how capable and complex this is, an extra set of eyes seems like a useful thing for manufacturing businesses.


Manufacturing recognition requires speed and probably 99.999% uptime. I didn't see anything that guarantees this, and the fact that it runs in the cloud almost certainly doesn't. It would require local, or very-close-to-the-edge, deployment for manufacturing to use this in practice.


FYI this is Cloud AutoML Vision being showcased here as the first of a suite of AI services that can learn to learn efficiently in different domains. The overall vision is to democratize AI and this is intended to be a meaningful step in that direction.

(Disclaimer: I work for Google but not on this.)


And even if every business did need an image classifier, they would still need a technical person that understands how this AutoML model they train can be integrated into business processes or software applications...


AI without tons of data is, well, overkill. That being said, it seems Google is sharing a few pretrained models, which is nice.


Disclaimer: I do ML-based work in Google Cloud, but I am not on the AutoML team.

The post says there is transfer learning involved, which means in practice you need much less data than you would if creating a classifier from scratch. Of course, more (good) data may yield better results, but it seems one of the goals behind this release is specifically to give custom (your own labels, not just generic object detection), high-performance image classification to those who don't have access to Google-scale training sets.
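
To give a sense of why less data can suffice: in the simplest form, the pretrained network stays frozen and you only fit a small classifier on its features. A sketch assuming Keras and scikit-learn, where load_my_images is a hypothetical helper returning a small custom dataset:

    from keras.applications import ResNet50
    from sklearn.linear_model import LogisticRegression

    # Frozen ImageNet backbone used purely as a feature extractor.
    base = ResNet50(weights="imagenet", include_top=False, pooling="avg")

    X_images, y = load_my_images()      # hypothetical: a few hundred labeled images
    features = base.predict(X_images)   # one 2048-dim embedding per image
    clf = LogisticRegression().fit(features, y)

Only the logistic regression is learned from your data, so a few hundred examples can go a long way.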


What kind of datasets? What size? Is it something practical for a small business to gather?


Figure 4 of the DeCAF paper shows meaningful learning with only 10 examples! https://arxiv.org/abs/1310.1531


Transfer learning is hardly a panacea, however much some would like it to be.


Disclosure: I work at Google on Kubeflow

Can you say more? I don't think anyone is saying it's magic pixie dust, but it does dramatically reduce the amount of data you need.


I'd probably phrase it as "can" dramatically reduce the amount of data you need rather than "does". Getting transfer learning to work in any kind of reliable way is still very much open research, and the systems I've seen are heavily dependent on basically every variable involved: the specific data sets, domains, model architectures, etc., with sometimes pretty puzzling failures.

I don't doubt Google has managed to make something useful work, though I'm more skeptical of how general the ML tech is. One advantage of an API like this is that it allows control over many of those variables. I'm not sure if this is what it does, but you could even start out by making a transfer-learning system that's heavily tailored to transferring from one specific fixed model, which, coupled with some Google-level engineering/testing resources, could produce much more reliable performance than in the general case.


Disclosure: I work at Google on Kubeflow

As you can see here[1], we do provide quite a bit of information about the accuracy and training of the underlying model.

Additionally, AutoML already (often) provides better-than-human performance[2]. Your comment about a transfer system heavily tailored to one specific fixed model is basically what it's doing - it's taking something domain-specific (vision) and allowing you to transfer it to your domain.

[1] https://youtu.be/GbLQE2C181U?t=1m15s

[2] https://static.googleusercontent.com/media/research.google.c...


I was about to type a very similar comment, but this is much of what I had in mind.

I've also seen it used to justify insufficient validation - resulting in strange generalization failures.


It depends on the domain. It works for images because images stay the same over time. It doesn't work as well for text because there's tons of nuance in speech patterns between groups (Yelp vs. Google reviews).


You might want to try www.monkeylearn.com for text.


Um... I don't think anyone here is saying (or even implying) that.


Maybe I was misreading, but I read it as something like: transfer learning is going to be a general solution for not having enough data. That's really not the case.


That's a fair point. There are certainly technical challenges involved in bringing this to other domains.


On what basis do you make that assertion? Other comments have alluded to transfer learning, which can be used to take advantage of small datasets, but independent of neural nets, small datasets can be quite suitable for training some useful SVM/linear regression models.
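
For what it's worth, a tiny sketch (scikit-learn, synthetic stand-in data) of the kind of model that works fine at that scale:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # 100 rows is "small" by any standard, yet 5-fold CV gives a usable signal.
    X, y = make_classification(n_samples=100, n_features=8, random_state=0)
    print(cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())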


I define small as in small business, as in fewer than 100k records. I also don't consider linear regression to be "AI". Things like genetic algorithms and neural networks are probably overkill for the average small business. There's also little evidence, to my knowledge, that transfer learning is really effective in non-vision applications.

Besides, the fact that you said that linear regressions can be useful for small datasets is exactly my point.


>There's also little evidence to my knowledge that transfer learning is really effective on non-vision applications.

Source?



