Show HN: Clockwork – Distributed, Scalable Job Scheduler (cynic.dev)
14 points by uttpal on Oct 6, 2019 | 6 comments

Great post! In the last part of your blog post, you mention: "To scale out we can increase number of partition in kafka topic and add more clockwork nodes."

This has a limit. In general, more partitions in a Kafka cluster lead to higher throughput. However, one does have to be aware of the potential impact of too many partitions, in total or per broker, on things like availability and latency.

How do we handle too many Kafka partitions?

Adding more partitions will definitely have an impact on latency and availability, and you will always have a trade-off between throughput and latency. However, Clockwork will work fine for most use cases, as Kafka supports up to 200k partitions per cluster (https://blogs.apache.org/kafka/entry/apache-kafka-supports-m...). Beyond that we will need multi-cluster support, which is definitely in the pipeline ;)
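To make the scale-out model concrete, here is a minimal sketch (my own illustration, not Clockwork's code, and using SHA-1 rather than Kafka's actual murmur2 partitioner) of how keyed schedules map to partitions, and why adding partitions lets you spread load over more consumer nodes:

```python
import hashlib

NUM_PARTITIONS = 8  # scale out by raising this and adding consumer nodes


def partition_for(schedule_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a schedule key to a partition, like a keyed
    Kafka producer does (illustrative hash; Kafka itself uses murmur2)."""
    digest = hashlib.sha1(schedule_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Each partition is consumed by exactly one scheduler node at a time, so all
# schedules for a given key land on the same node and stay ordered.
keys = [f"job-{i}" for i in range(1000)]
counts: dict[int, int] = {}
for k in keys:
    p = partition_for(k)
    counts[p] = counts.get(p, 0) + 1
```

The same mapping also shows the cost of repartitioning: changing `NUM_PARTITIONS` reassigns most keys, which is one reason partition counts are usually over-provisioned up front.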

Clockwork-Scheduler is a general-purpose distributed job scheduler. It offers a horizontally scalable scheduler with at-least-once delivery guarantees. It is fault-tolerant, persistent, and easy to deploy and maintain.

Thanks for checking it out; would love to discuss.

Thanks for posting. Loved the graphics and layout; they made for enjoyable reading. However, maybe you could share the motivation for creating the project? Were there shortcomings in the existing job-scheduler solutions that you evaluated? Is there an edge case you have that existing solutions didn't address, etc.?

One of the popular existing job schedulers is Quartz; it is solid and works pretty well up to a limit. The problem arises when we need to scale out from a single master DB. There are some workarounds for that, but each has its limitations. (https://hazelcast.com/blog/distributed-task-coordination-wit...)

While there are some existing solutions like BigBen from Walmart, they may be overkill for the task required.

Thanks, and I will definitely add that to the article.

I will just point out the major shortcomings of Quartz (the go-to job scheduler) here. Scaling beyond one node is difficult because:

* it uses DB locks for coordination, so adding more nodes does not scale linearly.

* we can add custom partitioning (consistent hashing) on top of a multi-node, multi-DB setup, but maintaining such a setup would be very difficult, e.g. adding and removing nodes, and handling node failure (we would have to move data out of the failed node's DB so that other nodes can execute those schedules).
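To illustrate the second point: a consistent-hashing layer over multiple DB shards might look roughly like this hypothetical sketch (not code from Quartz or Clockwork). Note that even though consistent hashing minimizes reassignment, removing a node still forces all of its schedules to be rehomed, and their rows migrated, which is exactly the operational burden described above:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring (illustrative only). Each node is placed
    at many virtual points so load spreads evenly and only the keys owned by
    a removed node move to other nodes."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

    def add_node(self, node: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove_node(self, node: str) -> None:
        # Keys owned by this node fall to the next point clockwise on the ring;
        # in a real deployment their rows must also be migrated out of its DB.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Usage: build a ring over the scheduler/DB nodes and route each schedule key with `node_for(key)`; on node failure, call `remove_node` and migrate only that node's schedules, rather than rebalancing everything as naive `hash % n` would require.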
