Metaflow passes state using an object store (S3, Azure Blob Storage, etc.) even within Airflow - short-circuiting Airflow's XCom machinery. But agreed, Airflow presents many more scalability challenges - this integration addresses a few of them in place while preserving the ability to swap out Airflow for a more scalable workflow orchestrator if needed.
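The pattern of short-circuiting the orchestrator's state passing can be sketched in plain Python. This is a toy illustration of the idea, not Metaflow's actual implementation: steps serialize artifacts into a blob store and hand the orchestrator only a small key, so the payload never flows through XCom.

```python
import pickle

# Toy stand-in for an object store (S3, Azure Blob Storage, ...).
blob_store = {}

def put_artifact(key, value):
    """Serialize an artifact into the object store; return only its key."""
    blob_store[key] = pickle.dumps(value)
    return key

def get_artifact(key):
    """Fetch and deserialize an artifact by its key."""
    return pickle.loads(blob_store[key])

# Step A produces a large artifact; the orchestrator (e.g. Airflow's XCom)
# only ever sees the short key string, never the payload itself.
def step_a():
    big_result = list(range(1_000_000))
    return put_artifact("run42/step_a/result", big_result)

def step_b(key):
    data = get_artifact(key)
    return sum(data)

key = step_a()          # the orchestrator passes this small string between tasks
print(step_b(key))      # → 499999500000
```

The key names and helper functions here are made up for illustration; the point is that the orchestrator's metadata channel only carries references, while the heavy data lives in the object store.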
Outerbounds | Systems dev | SF / Remote | Full-time
Outerbounds was founded recently to commercialize Metaflow, an open-source Python framework that makes infrastructure easily accessible for machine learning/data science projects. Metaflow was originally started by us at Netflix and is now used by hundreds of companies across industries.
We care a lot about thoughtful design, overall product experience, and quality of code. We are looking for backend/systems engineers who are experienced in at least one of the following areas: delightful API design, deep Python expertise, distributed systems, or low-level systems programming.
I agree. A huge GCP VPS with a good GPU attached is very inexpensive when you only start it when you are in a work sprint.
Just this week I have been experimenting with SageMaker and SageMaker Studio. It is too early for a real evaluation, but SageMaker Studio looks like it hits many requirements: good for experimenting, running large distributed jobs, solid model and code versioning tools, easy publishing of REST APIs, etc. Yesterday someone asked me to review third-party tools, and I look forward to getting a better understanding of how SageMaker Studio stacks up against turn-key systems.
I have built my career from standing on the shoulders of giants. I am not shy about just using the results in academic papers, using open source libraries, tools and frameworks, etc. that other people have written.
So, I agree with you that so much can be done on a single beefy VPS, but services and frameworks that allow easy use of multiple servers are also important.
I think another very important piece of the puzzle is *when* to make the transition from a monolithic service to a microservice stack. I have seen many startups get distracted by committing to a microservices transition too early in their lifecycle, and, on the flip side, big organisations postponing the move and accruing a huge transition tax.
We are seeing more and more companies make the transition earlier as the tooling matures. But worrying about microservices before there is a product is not a good idea.
Metaflow was built to assist in both developing ML models and deploying/managing them in production. AFAIK, TFX is focused on the deployment story of ML pipelines.
It's focused on building ML pipelines (similar to what Cortex aims to be).
In addition, it conveniently supports integration with orchestrators like Airflow, Kubeflow, Beam, etc. The book "Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow" (https://www.amazon.com/dp/1492053198/ref=cm_sw_em_r_mt_dp_Ig...) goes into great detail.
I was curious to see what advantage Metaflow offered over TFX.
Hi! Metaflow ships with a CloudFormation template for AWS that automates the set-up of a blob store (S3), compute environment (Batch), metadata tracking service (RDS), orchestrator (Step Functions), notebooks (SageMaker), and all the necessary IAM permissions to ensure data integrity. Using Metaflow, you can then write your workflows in Python/R and Metaflow will take care of managing your ML dev/prod lifecycle.
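Once the CloudFormation stack is up, the dev-to-prod lifecycle looks roughly like this with Metaflow's CLI (the flow file name below is illustrative; the subcommands are the standard Metaflow ones):

```shell
# Run the flow locally while developing (myflow.py is an illustrative name)
python myflow.py run

# Run the same flow with compute steps offloaded to AWS Batch
python myflow.py run --with batch

# Compile and deploy the flow to AWS Step Functions for production scheduling
python myflow.py step-functions create

# Kick off a production run on Step Functions
python myflow.py step-functions trigger
```

The same flow file moves from local prototyping to cloud execution without code changes, which is the lifecycle-management point above.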
> 350M Tokens Don't Lie: Love And Hate In Hacker News, to
> LLM-based sentiment analysis of Hacker News posts, to
> LLM-based sentiment analysis of Hacker News posts between Jan 2020 and June 2023