Hacker News
Cloud AI Platform Pipelines (cloud.google.com)
145 points by jamesblonde 18 days ago | 53 comments

Just make sure you build some abstractions and have a backup plan. Don't get locked into their API. It may/will ruin your business.
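To make "build some abstractions" concrete, here is a minimal sketch of one way to do it, assuming your application only ever talks to a small interface of your own. Every name below (`PipelineBackend`, `LocalBackend`, `RecordingBackend`, `run_training`) is hypothetical and not part of any Google or Kubeflow API; a real vendor adapter would wrap whatever SDK you actually use.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

# A step takes the current state dict and returns a new one.
Step = Callable[[dict], dict]


class PipelineBackend(ABC):
    """Hypothetical seam between your code and any vendor's pipeline service."""

    @abstractmethod
    def run(self, steps: List[Step], params: dict) -> dict:
        ...


class LocalBackend(PipelineBackend):
    """The backup plan: run every step in-process, in order."""

    def run(self, steps: List[Step], params: dict) -> dict:
        state = dict(params)
        for step in steps:
            state = step(state)
        return state


class RecordingBackend(PipelineBackend):
    """Stand-in for a vendor adapter: it only records what it would submit."""

    def __init__(self) -> None:
        self.submitted: List[List[str]] = []

    def run(self, steps: List[Step], params: dict) -> dict:
        self.submitted.append([step.__name__ for step in steps])
        return dict(params)


def load(state: dict) -> dict:
    return {**state, "rows": [1, 2, 3]}


def train(state: dict) -> dict:
    return {**state, "model": sum(state["rows"])}


def run_training(backend: PipelineBackend) -> dict:
    # Application code depends on the interface, never on a vendor SDK.
    return backend.run([load, train], {"name": "demo"})
```

Swapping providers then means writing one new adapter class rather than touching every call site.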

Cloud AI Platform Pipelines appears to use Kubeflow Pipelines on the backend, which is open source [1] and runs on Kubernetes. The Kubeflow team has invested a lot of time in making it simple to deploy across a variety of public clouds [2], [3].

If Google were to kill it, you could easily run it on any other hosted Kubernetes service.

I haven't used Cloud AI Platform Pipelines, but I have spent a lot of time working with Kubeflow Pipelines and it's pretty great!

[1] https://github.com/kubeflow/pipelines

[2] https://www.kubeflow.org/docs/aws/ (Deploy to AWS)

[3] https://www.kubeflow.org/docs/azure/ (Deploy to Azure)

This product appears to have a lot of overlap with Cloud Composer (https://cloud.google.com/composer) except with more of an AI focus.

Does that mean Cloud Composer might get deprecated?

Disclosure: I work on Google Cloud.

Different things! Composer (based on Apache Airflow, which we contribute to) is a general purpose workflow system. Cloud AI Platform Pipelines is really focused on ML pipelines, specifically, like Kubeflow (the related OSS project that we contribute to) is meant to be.

Instead, I would argue that Kubeflow has lots of overlap with Airflow. But I don't think either will be cancelled :).
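For readers wondering where the overlap comes from: both kinds of system boil down to running a graph of dependent steps in a valid order. A rough sketch using only the Python standard library (the step names are made up for illustration, and this is not Airflow's or Kubeflow's API):

```python
from graphlib import TopologicalSorter

# Hypothetical ML workflow: each step maps to the set of steps it depends on.
dag = {
    "ingest": set(),
    "validate": {"ingest"},
    "train": {"validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def execution_order(graph):
    """Return one valid order in which the steps could be executed."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dag)
```

What differs between the tools is everything around this core: scheduling, artifact tracking, and how steps are packaged and run.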

Curious but what’s driving the need to have a different pipeline system? As you say, Kubeflow has lots of overlap with Airflow so what’s different?

Of course, this is Google. Everything that overlaps with something can get shut down after a promo cycle.

This argument is getting old for B2B products. Would you mind sharing a list of Google Cloud Products or even just Business to Business related products that have been shut down?

App Maker, Google Cloud Print, Google Hangouts, Hire by Google, Fabric, Google Station, Google Correlate, Google Translator Toolkit, Google Fusion Tables, G Suite Training, Hangouts on Air, Google Cloud Messaging, Inbox by Google, Google URL Shortener, Google Realtime API, Google Site Search, Google Spaces, Picasa, Google Checkout.

I got bored and stopped before finishing 2012, but you can go back and find more.

App Maker is probably the worst on that killedbygoogle.com copy-and-paste list. As for the rest: are they really B2B? Either the product evolved and got rebranded or consolidated, or, meh, was it really widely used anyway? Not a single one is a Google Cloud product.

I am no Google fanboy and get frustrated by a lot of things they do. But I think the Google kills everything argument for B2B products is getting tiresome. Especially in a Google Cloud Platform context.

Their absurd Google Maps API price hike and the recent GKE pricing structure change debacle are a whole other story and worth a lot of criticism.

I absolutely consider the rest of the list to be B2B. Some of them are also used by consumers (in the same way that many products are both B2B and B2C), but all of those products have business use cases. Whether it was "widely used anyway" is a different argument and a bit of a shift in the goalposts.

Were any of them paid B2B products that required any effort by the "client" to migrate? Because if it's not paid, it's not really a B2B product.

Any business tool requires effort to migrate. No matter if paid or not.

I don't think you understood the two different requirements:

Was it a paid business tool? (If not, it's not really a business tool, it's a tool someone was relying on for business, which is different).

Then secondarily, was there a migration cost? Which there sometimes isn't if, for example, two tools are API compatible.

There are literally zero GCP products in that list. (Despite the unfortunate naming, both Cloud Print and Messaging predate GCP and have nothing to do with it.)

GCP has been around for a fraction of the time of Google as a company. It is perfectly valid to have at least some concern here, as Google as a company has a long track record of shutting down products. Conversely, it is tenuous at best to argue that since GCP has never shut down a product they won't, given how many of their products were launched very recently.

Yes, that argument has been made ad nauseam, but the previous poster was asking specifically for GCP products that have been shut down -- and the response doesn't seem to contain any.

I think you misread-

> a list of Google Cloud Products or even just Business to Business related products

They asked for GCP or B2B products, not only GCP products.

Yes, many of the GCP products that were launched in the past six months were not shut down.

App Engine has been around for 10+ years.

I remember the Prediction API was deprecated.


It used to be here https://cloud.google.com/prediction/ which now redirects to what I assume is its replacement.

They've done the bait and switch with plenty of B2B products. Eg Google Maps API and GKE.

Has nothing to do with killing B2B products. But I share your frustration and distrust in them for those two changes.

With the Google Maps API price change they demonstrated that they are very capable of, and willing to, abuse their market position with shocking price hikes. With the GKE structural price change they showed that they are no longer interested in the trust and business of small and medium-sized companies; unless you are enterprise size, you can expect unpleasant price changes going forward.

One example: the Python 2 libraries for Google App Engine were abandoned, and Google forced you to rewrite to totally different libraries for Python 3.


“Starting with the Python 3 runtime, the App Engine standard environment no longer includes bundled App Engine services such as Memcache and Task Queues. Instead, Google Cloud provides standalone products that are equivalent to most of the bundled services in the Python 2 runtime. For the bundled services that are not available as separate products in Google Cloud, such as image processing, search, and messaging, you can use third-party providers or other workarounds as suggested in this migration guide.

Removing the bundled App Engine services enables the Python 3 runtime to support a fully idiomatic Python development experience. In the Python 3 runtime, you write a standard Python app that is fully portable and can run in any standard Python environment, including App Engine.”

Python2 itself has been sunset, and Python3 the language is not compatible. It doesn't seem outrageous to me to require developers to use a new SDK if they want to migrate their apps to Python3 -- and that's optional too, since Python2 remains supported on GAE.

That's not the issue. I would have expected them to migrate their libraries to python3 (like every other major python library in the world). Instead they wrote entirely new libraries that aren't remotely the same, requiring a more or less rewrite of our app.

I wonder how this might work with NLP workflows that use a custom embedding? I have not found any documentation or blog post describing such a memory-intensive step in a pipeline without running a custom instance/Docker container. In that case I can just run the whole model on that instance.

Trying to understand how any of this is new or different from AWS SageMaker or even Azure ML. IIUC AWS has way better tools/services to integrate with.

Most of Google's Cloud services have been very flaky and inconsistent.

The pre-built pipeline components look interesting.

I wish Cloud AI offered an API for the kind of results that I get when I use Google Lens. None of the ML offerings seem to come even close with labels.

Their Cloud Vision API [1] might be what you're looking for.

[1] https://cloud.google.com/vision

Yeah that's the one I was talking about. It gives some labels about stuff in the image and it works well. But Google Lens is on a different level -- you can send it a picture of a breed or a snake and it will return species and subspecies. Can't find anything like that in the GCP offering.

What sort of results do you get back from Cloud Vision API? The Google Lens example you mentioned is what I assumed it would do.

Nothing near that specific unfortunately. Not much more than "snake" (and occasionally wildly inaccurate species). At least with the ~20 or so photos that I just tested with.

Looks like you're not the only one who's interested: https://support.google.com/photos/thread/17424160?hl=en

Ah, thanks for pointing that out. Added my upvote. Cheers!

Also take a look at:


My first thought was how long till it’s discontinued?

Longer than the time it will take for the anti Google HN bubble to burst.

I'm sorry but GP is exactly right. Google has a horrible track record for their products. They can't even profit from vendor lock-in, as they kill off all the products sooner or later, leaving a lot of angry customers.

Name the GCP products that have been killed off?

google deployment manager. it has incredibly subpar support for other GCP services. all support interactions resort to them suggesting migration to Terraform. we do use it in production, but not without great headache.

this isn't an explicit kill-off, but certainly purposefully offering bad support

When AWS continues to run a service looong after it stops being actively developed (looking at you, SimpleDB), they get praised. When Google does the same thing, it's paradoxically more proof that Google deprecates things!

Just another day on HN

Lol your best attack was a product that wasn't killed off. Good that I called you out.

not op...

That's more of a reflex than a thought, the latter usually involving at least traces of creativity.

I have a couple of quick takes on this:

* Remember kubeflow - the open-source ML platform? Well, on GCP, it doesn't look so open-source anymore.

* It looks like TensorFlow Extended has been subsumed (killed off?) by this new managed platform. No Beam support - to be replaced by tf.data service? https://github.com/aaudiber/community/blob/rfc-data-service/...

This looks like it's basically just an installer for Kubeflow -- this is not "managed" at all.

And where did you get that TFX has been replaced by this? You can run Kubeflow pipelines (including those created by TFX) on this.

"Kubeflow pipelines SDK and TFX SDK ... Over time, these two SDK experiences will merge" from the article.

That's just the SDK that they're saying is merging (with TFX being the one that "wins"). If you've used them you'll know they have huge overlap but TFX abstracts a ton of what it's actually doing.

TFX is just an SDK (with a few libraries, like data validation, model analysis, statistics). So, I don't know about saying it wins? The battle of ML pipeline ecosystems is the engine, not so much the API. It had been Beam vs Spark. Now, Google is changing tack and saying it is TensorFlow on Kubernetes with distributed processing vs Spark-based ML pipelines (that may use PyTorch/TF for training).

I work on TFX. TFX started as an internal Google project [1] but over the years we've been steadily open sourcing more and more of it. Portability and interoperability are key focus areas for our project, more information in [2]. TFX OSS currently consists of libraries, components, and some binaries. It is up to the developer to pick the right level of abstraction for the task at hand. More information about this in [3]. Many of the ideas in TFX were inspired by an earlier production ML system at Google called Sibyl, more information about Sibyl in [4]. The two systems combined represent over a decade of production ML experience of many engineers and we hope other engineers will find it useful, too.

Also of interest may be this overview blog post [5] by yours truly :)

[1] ACM KDD '17 "TFX: A TensorFlow-Based Production-Scale Machine Learning Platform" https://dl.acm.org/doi/10.1145/3097983.3098021

[2] https://www.tensorflow.org/tfx/guide#portability_and_interop...

[3] https://www.youtube.com/watch?v=zxd3Q2gdArY

[4] https://www.youtube.com/watch?v=3SaZ5UAQrQM

[5] https://blog.tensorflow.org/2019/05/research-to-production-w...

I am getting downvoted here without anybody actually addressing my arguments: TFX/Beam being replaced by TFX/Kubeflow Pipelines, and the new platform being plugged in tightly to Google's managed services (AI Platform, BigQuery, GCS, etc). I didn't think my tone was negative. These are observations that nobody in this subthread is addressing. As a systems person, TFX to me was a layer on top of Beam to tie ML pipelines together. Now, Beam is gone, and Dataflow appears as a managed service you can use in your pipelines (see the figure on the blog post). TFX is being repackaged as the API for Kubeflow Pipelines. That was my take and what I wanted to discuss.

For what it's worth, I can't even downvote :)

I don't know the _eventual_ direction of Kubeflow/TFX -- but in our TFX pipelines you still get to choose where it runs. From the docs:

> Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. TFX uses Apache Beam to implement data-parallel pipelines. The pipeline is then executed by one of Beam's supported distributed processing back-ends, which include Apache Flink, Apache Spark, Google Cloud Dataflow, and others.

You can also choose to run it on Kubeflow itself.
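A toy illustration of the "define once, pick the back-end later" idea described in that quote. This is plain standard-library Python, not Beam's or TFX's API; `RUNNERS`, `run`, and both runner functions are made-up names, and the thread pool merely stands in for a distributed back-end.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

Transform = Callable[[int], int]


def apply_all(transforms: List[Transform], x: int) -> int:
    # Push one element through every transform in order.
    for transform in transforms:
        x = transform(x)
    return x


def sequential_runner(transforms: List[Transform], data: List[int]) -> List[int]:
    return [apply_all(transforms, x) for x in data]


def threaded_runner(transforms: List[Transform], data: List[int]) -> List[int]:
    # pool.map preserves input order, so results line up with the data.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda x: apply_all(transforms, x), data))


RUNNERS = {"sequential": sequential_runner, "threaded": threaded_runner}


def run(pipeline: List[Transform], data: Iterable[int], runner: str = "sequential") -> List[int]:
    # The pipeline definition is unchanged; only the executor differs.
    return RUNNERS[runner](pipeline, list(data))


pipeline = [lambda x: x + 1, lambda x: x * 2]
```

The point is only that the pipeline definition stays the same while the execution engine is a late choice.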
