Hacker News
Scaling Large Production Clusters at Alibaba with Partitioned Synchronization (micahlerner.com)
92 points by mlerner 3 months ago | 4 comments

It's nice that they give hard numbers. I wonder what aspect of their house style results in the astronomical task submission rate of 40000 tasks per second. That's 1000x higher than the new task arrival rate given by Google in their EuroSys '20 paper "Borg: the Next Generation". Their number of about 4 million new jobs per day is also about 50x higher than the rate given by Google.

I wonder what a task looks like. They're essentially supporting adding a task to each machine every 2.5 seconds, which is a really high rate. Most systems seem to be built around slower resource reservation: once resources have been reserved, the worker pulls the tasks assigned to it from a queue. E.g. Lambda: https://www.youtube.com/watch?v=xmacMfbrG28&t=123s
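A back-of-envelope check of the ratios above, using only the figures quoted in this thread. The machine count and the Borg-side rates below are implied values derived from those figures, not numbers taken from either paper, and the helper names are made up for illustration:

```python
# Figures quoted in the comments above.
ALIBABA_TASKS_PER_SECOND = 40_000    # task submission rate
ALIBABA_JOBS_PER_DAY = 4_000_000     # new jobs per day
SECONDS_PER_TASK_PER_MACHINE = 2.5   # one task lands on a machine this often
TASK_RATE_RATIO = 1000               # "1000x higher" than Borg's task arrival rate
JOB_RATE_RATIO = 50                  # "about 50x higher" than Borg's job rate


def implied_machine_count(tasks_per_second, seconds_per_task_per_machine):
    """Machines needed so that each one receives a task every N seconds."""
    return tasks_per_second * seconds_per_task_per_machine


def implied_baseline(rate, ratio):
    """Rate the comparison system would have if `rate` is `ratio` times higher."""
    return rate / ratio


# Implied values only: these are assumptions that follow from the quoted ratios.
machines = implied_machine_count(ALIBABA_TASKS_PER_SECOND, SECONDS_PER_TASK_PER_MACHINE)
borg_tasks_per_second = implied_baseline(ALIBABA_TASKS_PER_SECOND, TASK_RATE_RATIO)
borg_jobs_per_day = implied_baseline(ALIBABA_JOBS_PER_DAY, JOB_RATE_RATIO)

print(f"implied machines:          {machines:,.0f}")
print(f"implied Borg tasks/second: {borg_tasks_per_second:,.0f}")
print(f"implied Borg jobs/day:     {borg_jobs_per_day:,.0f}")
```

So the 2.5 second figure only holds for a cluster on the order of 100,000 machines, and the quoted ratios put the comparison rates at roughly 40 tasks per second and 80,000 jobs per day.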

According to their paper, 87% of the tasks live for less than 10 seconds. It sounds like they're solving a fundamentally different problem from the one Borg solves.

They are probably adding their equivalent of Nomad Jobs, so some short processing/batching tasks.
