Ask HN: Good Topics to Study for Machine Learning Operations (MLOps)
2 points by ThinkCritically 46 days ago | 1 comment
I am putting together an online course on how to process data in real time and how to deploy models into production. Right now, I am planning to cover the following in the course:
• Streaming data with Docker
• Apache Beam / DataFlow
• BigQuery
• Intro to TensorFlow and Keras
• Creating a basic CNN model
• Model deployment on Google Cloud (AI Platform)
• Basic visualization in Data Studio

I have also put together an introduction video on this and a "detailed course syllabus" at https://whiteowleducation.ghost.io/machine-learning-mastery-course-outline/ .

What are your thoughts? Is this the right amount of material for an intro MLOps course? Is the course missing any essential topics?


The link you shared is password protected. Here's what I think:

As someone in a boutique consultancy that has profitably served organizations for the past seven years or so, and that is building an MLOps platform, most of the content on ML and MLOps is useless to me. I have blocked most of the largest publications on Medium about data science and machine learning, because most of the posts/articles are written by data virgins who've never actually worked on real data, on a real problem, with real stakes [lives, infrastructure, money, or careers on the line]. For me, it's like those people who make survival/tactical/martial arts videos and clearly do not know what the hell they're doing. The content is useless at best and misleading/wrong at worst. I often think of this as Judo/Jiu-Jitsu (works on real, hostile adversaries) vs Aikido (works only on imaginary adversaries, or compliant ones who fall on command; useful to keep in shape with a veneer of martial arts). That content is out of touch with, and far removed from, the realities of data products and ML projects.

Most of the content is relevant for someone working alone on a toy project on the Titanic/Iris dataset, downloading the dataset to disk, then deploying the model to an API with Flask, and maybe building an image for the thing. Most projects that actually matter are not done by one person, and things start to break really fast when more than one person works on an ML project for several months.

I have even gone as far as blocking certain accounts on Twitter because they belonged to loud people who are way more into Tweeting/audience building/AI enthusiasting/Mediuming/"what's holding you back" without ever having touched real stuff.

That is the type of content I am allergic to: someone writing a post about end-to-end machine learning project lifecycle management, writing comparisons of tools and frameworks they've never used, or grabbing some marketing brochure or some material from a summit and pasting it in with a claim like "with X, it's now easy to deploy models".

Now, I would be totally fine with a post with the tone "I've become interested in X recently, and this is an example of what I've been learning/doing". But a data virgin telling me that it's now easy to manage the complete ML project lifecycle with "this library" because they attended a summit and were told so by some company is beyond me. That has made the signal-to-noise ratio of content about ML and MLOps really low.

So I pay close attention to "who is this person? Has this person done it before? Where does this person work? Is this content marketing to push a product or a platform, where the article is intimately tied to the product, or is it a more generic piece? Is this person legitimate?".

Even people who are "architecture gurus" go on and on about this, coming up with acronyms and a satellite view of the problem without any experience. Oceans and mountains look tiny when viewed from a satellite.
