Hacker News | vtuulos's comments

If you are ok with executing your SFN steps on AWS Batch, Metaflow should do the job well. It's pretty inhuman to interact with SFN directly.

One feature on our roadmap is the ability to define a DAG fully programmatically, maybe through configs, so you will be able to have a custom representation -> SFN JSON, just using Metaflow as a compiler
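As a rough illustration of the "Metaflow as a compiler" idea (the function name and DAG representation here are my own for illustration, not Metaflow's API), a linear DAG could be lowered to Step Functions JSON along these lines:

```python
import json

def compile_to_sfn(steps):
    """Compile an ordered list of (name, resource_arn) pairs into a
    minimal Amazon States Language state machine definition."""
    states = {}
    for i, (name, resource) in enumerate(steps):
        state = {"Type": "Task", "Resource": resource}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]   # chain to the next step
        else:
            state["End"] = True               # last state terminates
        states[name] = state
    return json.dumps({"StartAt": steps[0][0], "States": states}, indent=2)

print(compile_to_sfn([
    ("start", "arn:aws:states:::batch:submitJob.sync"),
    ("train", "arn:aws:states:::batch:submitJob.sync"),
]))
```

The point is that you never hand-write the SFN JSON; the front-end representation stays whatever you like.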


Stay tuned! We have some cool new features coming soon to support agentic workloads (teaser: https://github.com/Netflix/metaflow/pull/2473)

If you are curious, join the Metaflow Slack at http://slack.outerbounds.co and start a thread on #ask-metaflow


Metaflow was started to address the needs of ML/AI projects whereas Airflow and Dagster started in data engineering.

Consequently, a major part of Metaflow focuses on facilitating easy and efficient access to (large scale) compute - including dependency management - and local experimentation, which is out of scope for Airflow and Dagster.

Metaflow has basic support for dbt, and companies increasingly use it to power data engineering as AI is eating the world, but if you just need an orchestrator for ETL pipelines, Dagster is a great choice

If you are curious to hear how companies navigate the question of Airflow vs Metaflow, see e.g. this recent talk by Flexport: https://youtu.be/e92eXfvaxU0


I don't know if it's a coincidence but we just released a major new feature in Metaflow a few days ago - composing flows with custom decorators: https://docs.metaflow.org/metaflow/composing-flows/introduct...

A big deal is that they get packaged automatically for remote execution. And you can attach them on the command line without touching code, which makes it easy to build pipelines with pluggable functionality - think e.g. switching an LLM provider on the fly.

If you haven't looked into Metaflow recently, configuration management is another big feature that was contributed by the team at Netflix: https://netflixtechblog.com/introducing-configurable-metaflo...

Many folks love the new native support for uv too: https://docs.metaflow.org/scaling/dependencies/uv

I'm happy to answer any questions here


Is it common to see Metaflow used alongside MLflow if a team wants to track experiment data?


Metaflow tracks all artifacts and lets you build dashboards with them, so there's no need to use MLflow per se. There are Metaflow integrations in Weights & Biases, CometML, etc., if you want pretty off-the-shelf dashboards


if you want to see similar tricks applied in Python (with a JIT compiler for query-time optimization), take a look at this fun deck that I presented a long time ago: https://tuulos.github.io/sf-python-meetup-sep-2013

we were able to handle trillion+ datapoints with relatively modest machines - definitely a useful approach if you are ready to do some bit twiddling
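To give a flavor of the bit twiddling involved (this is a generic sketch of the technique, not the code from the deck), packing flags into raw bytes instead of Python objects cuts per-datapoint overhead from tens of bytes to one bit:

```python
def pack_bits(bools):
    """Pack a sequence of booleans into bytes, 8 flags per byte,
    versus ~28+ bytes per boolean object in a plain Python list."""
    out = bytearray((len(bools) + 7) // 8)
    for i, b in enumerate(bools):
        if b:
            out[i // 8] |= 1 << (i % 8)   # set bit i
    return bytes(out)

def get_bit(packed, i):
    """Read back flag i from the packed representation."""
    return bool(packed[i // 8] & (1 << (i % 8)))

flags = [True, False, True, True] * 1000
packed = pack_bits(flags)                  # 500 bytes for 4000 flags
assert all(get_bit(packed, i) == f for i, f in enumerate(flags))
```

The same idea extends to packing small integers into fixed-width fields, which is what makes trillion-row scans feasible on modest machines.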


Just use `s = sys.intern(s)` for every string and be done with it. Or something like:

    _my_intern_dict = {}
    my_intern = lambda x: _my_intern_dict.setdefault(x, x)
    s = my_intern(s)
Just make sure to delete _my_intern_dict when it's not needed anymore.
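To see why this helps: the intern dict collapses equal strings into one shared object, so duplicates stop costing memory. A quick demonstration using the snippet above:

```python
_my_intern_dict = {}
my_intern = lambda x: _my_intern_dict.setdefault(x, x)

# Two equal but distinct string objects, as you'd get when
# parsing the same value repeatedly from a file or network.
a = "".join(["user_", "12345"])
b = "".join(["user_", "12345"])
assert a == b and a is not b   # equal values, separate objects

a, b = my_intern(a), my_intern(b)
assert a is b                  # now a single shared object
```

With millions of repeated keys, this is the difference between one copy and millions of copies kept alive.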


yes, this. In case you are interested in seeing some numbers backing this claim, see here https://outerbounds.com/blog/metaflow-fast-data

Source: I used to work at Netflix, building systems that pull TBs from S3 hourly


I had the same concern. However, the structure of the output was surprisingly stable. We rejected badly formatted responses: https://github.com/outerbounds/hacker-news-sentiment/blob/ma...

The semantics of the topics/tags could be improved for sure with a more detailed prompt
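A minimal sketch of that validation idea (the actual code lives in the linked repo and may differ): accept only responses matching the expected format and a sane score range, reject everything else.

```python
import re

# Expect a line like "SENTIMENT 6" somewhere in the model's response.
SENTIMENT_RE = re.compile(r"^SENTIMENT\s+(\d+)\s*$", re.MULTILINE)

def parse_sentiment(response):
    """Return the sentiment score, or None if the response is malformed."""
    m = SENTIMENT_RE.search(response)
    if not m:
        return None
    score = int(m.group(1))
    return score if 0 <= score <= 10 else None

assert parse_sentiment("SENTIMENT 6") == 6
assert parse_sentiment("I think it's positive") is None   # reject free text
assert parse_sentiment("SENTIMENT 99") is None            # reject out of range
```

Rejected responses can simply be retried or dropped; with a stable output format, the rejection rate stays low.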


here's how the model ranks the discussion on this page after 40 comments:

SENTIMENT 6

:D


even simpler, you can just do it in SQL

You can find all titles and dates since the beginning of HN in this public BigQuery dataset: https://console.cloud.google.com/marketplace/product/y-combi...


whoah. thank you dude!!


That's an interesting hypothesis but the words we use to express agreement and disagreement haven't changed much.

We don't try to retrieve articles/topics from the model, which would be affected by the cutoff; we just ask it to analyze the sentiment or summarize the content provided in the prompt


True. It would be interesting to run these same tests on the 7B model to see whether the trend information changes. 7B had a March cutoff, so if the Aug-Dec dip migrated to Oct-Mar (or just disappeared), it would be strong evidence for training-data bias. If nothing else, comparing 7B to 70B would likely be interesting.

Edit: I realized too late I had the years off. It is pure coincidence of month, not a real data bias. Sorry! I still think it would be interesting to see a 7B comparison, but just to see how well a small model can spot big trends compared to a bigger one.


yep! And of course the new 3.1 model

