Lessons from Building a Serverless Data Pipeline with AWS Kinesis and Lambda (iopipe.com)
63 points by n0debotanist on Aug 29, 2018 | 5 comments



I guess I took away a different set of lessons from using Lambda as a Kinesis source/sink for streaming, high-volume, time-sensitive data: you may want to rethink that design and use the KCL/KPL instead. You lose a lot of application control when you pair Kinesis with Lambda.

A record that spikes your memory footprint? Lambda will fail and continuously retry the batch.

Have a record that unexpectedly takes a long time to process? Lambda will time out and continuously retry.
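To make the retry behaviour concrete, here is a toy model (not Lambda's actual implementation) of a consumer whose checkpoint only advances when an entire batch succeeds. The names `consume_shard` and `fail_on_poison` are hypothetical, and the `max_invocations` cap exists only so the sketch terminates; in the setup described above the retries are not capped like this.

```python
from typing import Callable, List


def consume_shard(records: List[str],
                  process: Callable[[str], None],
                  batch_size: int = 3,
                  max_invocations: int = 10) -> int:
    """Toy model of batch-level retry on a single shard.

    Returns the checkpoint (index of the next unprocessed record)
    reached after at most `max_invocations` attempts.
    """
    checkpoint = 0
    for _ in range(max_invocations):
        if checkpoint >= len(records):
            break  # shard fully consumed
        batch = records[checkpoint:checkpoint + batch_size]
        try:
            for record in batch:
                process(record)  # one failure fails the whole batch
        except Exception:
            continue  # checkpoint not advanced: the same batch is retried
        checkpoint += len(batch)  # success: move past the whole batch
    return checkpoint


def fail_on_poison(record: str) -> None:
    # Hypothetical processor: the "poison" record stands in for an
    # out-of-memory crash or a timeout on one bad record.
    if record == "poison":
        raise RuntimeError("simulated crash or timeout")


print(consume_shard(["a", "b", "c", "d", "poison", "f", "g"], fail_on_poison))  # → 3
```

The checkpoint sticks at the batch containing the bad record, so everything behind it on that shard stalls, and the good record `d` in the same batch is reprocessed on every retry.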

You also have no control over when the Lambda execution context gets recycled, which can make a big difference to your datastore usage.
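A common pattern is to create the datastore connection at module level so it is reused while the execution context stays warm; when the context is recycled, that state is gone and a new connection is opened. The sketch below assumes a hypothetical `FakeConnection` in place of a real client and a simplified event whose `Records` are plain strings (real Kinesis events carry base64-encoded payloads). `simulate_recycle` is a hypothetical helper that mimics a recycled context.

```python
import itertools
from typing import List, Optional

# Counts connections opened, so the effect of context reuse is visible.
_connection_ids = itertools.count(1)


class FakeConnection:
    """Hypothetical stand-in for a datastore client; one instance = one connection."""

    def __init__(self) -> None:
        self.id = next(_connection_ids)
        self.writes: List[str] = []

    def write(self, item: str) -> None:
        self.writes.append(item)  # a real client would send this to the datastore


# Module-level state survives between invocations only while the same
# execution context stays warm.
_conn: Optional[FakeConnection] = None


def get_connection() -> FakeConnection:
    """Lazily open the connection once per execution context."""
    global _conn
    if _conn is None:
        _conn = FakeConnection()
    return _conn


def simulate_recycle() -> None:
    """Hypothetical helper: drop module state as a recycled context would."""
    global _conn
    _conn = None


def handler(event: dict, context: object) -> int:
    conn = get_connection()
    for record in event.get("Records", []):
        conn.write(record)
    return conn.id  # returned only so the reuse is observable
```

Two invocations in a warm context share connection 1; after a recycle the next invocation opens connection 2. Since you cannot control when that recycle happens, connection churn against the datastore is effectively out of your hands.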


We wanted to use the KCL on the receiver side, but figured it would require building some sort of cluster to mediate which node receives which shard. That's tricky, especially when resharding. With Lambda you get that for free, so we opted for Lambda, but we weren't overly happy with it. Did you solve that problem?


The KCL itself handles shard assignment and balancing. Each application has its own DynamoDB table where it stores lease information. Have 64 shards and want them processed on one machine? Spin up one instance of the application, and the KCL will spin up 64 subprocesses.
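The lease table is what lets a single instance claim every shard without any cluster coordination. Below is a minimal in-memory sketch, loosely modeled on the KCL's DynamoDB lease table (one lease per shard, with an owner and a counter that is checked on each conditional update). A plain dict stands in for DynamoDB, and `try_take_lease` / `acquire_unowned` are hypothetical names, not KCL APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    owner: Optional[str] = None
    counter: int = 0  # bumped on every take, used for optimistic locking


def try_take_lease(table: Dict[str, Lease], shard_id: str,
                   worker: str, expected_counter: int) -> bool:
    """Compare-and-set: claim the lease only if nobody changed it since we read it."""
    lease = table[shard_id]
    if lease.counter != expected_counter:
        return False  # another worker updated it first; our read is stale
    lease.owner = worker
    lease.counter += 1
    return True


def acquire_unowned(table: Dict[str, Lease], worker: str) -> int:
    """Claim every unowned shard; return how many leases this worker holds."""
    for shard_id, lease in table.items():
        if lease.owner is None:
            try_take_lease(table, shard_id, worker, lease.counter)
    return sum(1 for lease in table.values() if lease.owner == worker)


table = {f"shardId-{i:012d}": Lease() for i in range(64)}
print(acquire_unowned(table, "worker-1"))  # → 64
```

In a real deployment the compare-and-set would be a conditional write against DynamoDB, which is what makes it safe for independent instances to race for the same lease.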

Want to spread the load instead? Spin up as many instances of the application as you want. They don't need to be clustered; they negotiate through the lease table to take and balance the shards across all instances.


I'll add that the same goes for resharding. The KCL sees and handles the parent/child shard relationship, taking a lease on the children but waiting until the parent is "drained" before reading from them. It also handles the shenanigans that occur when you reshard to a shard count that isn't an even multiple or divisor of the current one.
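The ordering rule can be sketched as a small eligibility check. Kinesis reports a `ParentShardId` for split children and, for merged shards, an `AdjacentParentShardId` as well; the sketch below assumes a shard may be read only once every parent has been drained (read to its end). The `Shard` class and `ready_to_read` are hypothetical, and treating a parent that no longer appears in the listing as finished is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Shard:
    shard_id: str
    parent: Optional[str] = None
    adjacent_parent: Optional[str] = None  # only set for shards created by a merge


def ready_to_read(shards: List[Shard], drained: Set[str]) -> List[str]:
    """Shards that may be read now: not yet drained, and all parents drained."""
    known = {s.shard_id for s in shards}
    ready = []
    for shard in shards:
        if shard.shard_id in drained:
            continue
        parents = [p for p in (shard.parent, shard.adjacent_parent) if p is not None]
        # Assumption: a parent missing from the listing has aged out and
        # can no longer be read, so it does not block its children.
        if all(p in drained or p not in known for p in parents):
            ready.append(shard.shard_id)
    return ready


# s0 was split into s1 and s2, which were later merged into s3.
shards = [Shard("s0"), Shard("s1", parent="s0"), Shard("s2", parent="s0"),
          Shard("s3", parent="s1", adjacent_parent="s2")]
print(ready_to_read(shards, set()))         # → ['s0']
print(ready_to_read(shards, {"s0"}))        # → ['s1', 's2']
print(ready_to_read(shards, {"s0", "s1"}))  # → ['s2']
```

The merged shard `s3` stays blocked until both of its parents are drained, which preserves per-key ordering across the split and the merge.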


Url changed from https://twitter.com/IOpipes/status/1034846728216825856, which points to this.





