Metaflow passes state using an object store (S3, Azure Blob Storage, etc.) even within Airflow - short-circuiting Airflow's XCom machinery. But agreed, Airflow presents many more scalability challenges - this integration addresses a few of them in place while preserving the ability to swap out Airflow for a more scalable workflow orchestrator if needed.
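The pattern of short-circuiting the orchestrator's state passing can be sketched in plain Python. This is a toy illustration of the idea, not Metaflow's actual implementation: steps serialize artifacts into a blob store and hand the orchestrator only a small key, so the payload never flows through XCom.

```python
import pickle

# Toy stand-in for an object store (S3, Azure Blob Storage, ...).
blob_store = {}

def put_artifact(key, value):
    """Serialize an artifact into the object store; return only its key."""
    blob_store[key] = pickle.dumps(value)
    return key

def get_artifact(key):
    """Fetch and deserialize an artifact by its key."""
    return pickle.loads(blob_store[key])

# Step A produces a large artifact; the orchestrator (e.g. Airflow's XCom)
# only ever sees the short key string, never the payload itself.
def step_a():
    big_result = list(range(1_000_000))
    return put_artifact("run42/step_a/result", big_result)

def step_b(key):
    data = get_artifact(key)
    return sum(data)

key = step_a()          # the orchestrator passes this small string between tasks
print(step_b(key))      # → 499999500000
```

The key names and helper functions here are made up for illustration; the point is that the orchestrator's metadata channel only carries references, while the heavy data lives in the object store.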
Outerbounds | Systems dev | SF / Remote | Full-time
Outerbounds was founded recently to commercialize Metaflow, an open-source Python framework that makes infrastructure easily accessible for machine learning/data science projects. Metaflow was originally started by us at Netflix and is now used by hundreds of companies across industries.
We care a lot about thoughtful design, overall product experience, and quality of code. We are looking for backend/systems engineers who are experienced in at least one of the following areas: delightful API design, deep Python expertise, distributed systems, or low-level systems programming.
I agree. A huge GCP VPS with a good GPU attached is very inexpensive when you only start it when you are in a work sprint.
Just this week I have been experimenting with SageMaker and SageMaker Studio. It is too early for a real evaluation, but SageMaker Studio looks like it hits many requirements: good for experimenting, running large distributed jobs, solid model and code versioning tools, easy publishing of REST APIs, etc. Yesterday someone asked me to review third-party tools, and I look forward to getting a better understanding of how SageMaker Studio stacks up against turn-key systems.
I have built my career from standing on the shoulders of giants. I am not shy about just using the results in academic papers, using open source libraries, tools and frameworks, etc. that other people have written.
So, I agree with you that so much can be done on a single beefy VPS, but services and frameworks that allow easy use of multiple servers are also important.
I think another very important piece of the puzzle is *when* to make the transition from a monolithic service to a microservice stack. I have seen many startups get distracted by committing to a microservices transition too early in their lifecycle, and, on the flip side, big organisations postponing the move and accruing a huge transition tax.
We are seeing more and more companies make the transition earlier as the tooling matures. But worrying about microservices before there is a product is not a good idea.
Metaflow was built to assist in both developing ML models and deploying/managing them in production. AFAIK, TFX is focused on the deployment story of ML pipelines.
It's focused on building ML pipelines (similar to what Cortex aims to be).
In addition, it conveniently supports integration with orchestrators like Airflow, Kubeflow, Beam, etc. The book "Building Machine Learning Pipelines: Automating Model Life Cycles with TensorFlow" (https://www.amazon.com/dp/1492053198/ref=cm_sw_em_r_mt_dp_Ig...) goes into great detail.
I was curious to see what advantage Metaflow offered over TFX.
Hi! Metaflow ships with a CloudFormation template for AWS that automates the set-up of a blob store (S3), compute environment (Batch), metadata tracking service (RDS), orchestrator (Step Functions), notebooks (SageMaker), and all the necessary IAM permissions to ensure data integrity. Using Metaflow, you can then write your workflows in Python/R and Metaflow will take care of managing your ML dev/prod lifecycle.
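Once the CloudFormation stack is up, the dev-to-prod lifecycle looks roughly like this with Metaflow's CLI (the flow file name below is illustrative; the subcommands are the standard Metaflow ones):

```shell
# Run the flow locally while developing (myflow.py is an illustrative name)
python myflow.py run

# Run the same flow with compute steps offloaded to AWS Batch
python myflow.py run --with batch

# Compile and deploy the flow to AWS Step Functions for production scheduling
python myflow.py step-functions create

# Kick off a production run on Step Functions
python myflow.py step-functions trigger
```

The same flow file moves from local prototyping to cloud execution without code changes, which is the lifecycle-management point above.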
> 350M Tokens Don't Lie: Love And Hate In Hacker News, to
> LLM-based sentiment analysis of Hacker News posts, to
> LLM-based sentiment analysis of Hacker News posts between Jan 2020 and June 2023