
Lessons from Building a Serverless Data Pipeline with AWS Kinesis and Lambda - n0debotanist
https://read.iopipe.com/lessons-from-building-a-serverless-data-pipeline-with-aws-kinesis-and-lambda-4d8cf0ebcbc9
======
thisone
I guess I took away a different set of lessons from using Lambda as a Kinesis
source/sink for streaming, high-volume, time-sensitive data, namely that you
may want to rethink it and use the KCL/KPL instead. You lose a lot of
application control when using Kinesis and Lambda together.

A record that spikes your memory footprint? Lambda will fail and continuously
retry the batch.

Have a record that unexpectedly takes a long time to process? Lambda will time
out and continuously retry.
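
To make the retry behaviour above concrete, here is a toy simulation in plain Python. It is not AWS code: `process_shard`, `batch_size` and `max_attempts` are hypothetical names made up for this sketch, and `max_attempts` exists only so the simulation terminates. It assumes the behaviour described above, where a failed batch is retried as a whole and the read position only advances after a batch succeeds.

```python
def process_shard(records, handler, batch_size=2, max_attempts=3):
    """Toy model of one shard being fed to a function in batches.

    A failed batch is retried as a whole, and the read position only
    advances after the batch succeeds. max_attempts is a hypothetical
    cap that keeps this simulation finite.
    """
    done = []
    position = 0
    while position < len(records):
        batch = records[position:position + batch_size]
        for _attempt in range(max_attempts):
            try:
                results = [handler(r) for r in batch]
            except Exception:
                # The whole batch is retried, including records that
                # were processed fine before the bad one was reached.
                continue
            done.extend(results)
            position += len(batch)
            break
        else:
            # Every attempt failed: the shard is stuck behind this batch.
            return done, position
    return done, position


def handler(record):
    if record == "poison":
        raise ValueError("cannot process record")
    return record.upper()


done, stuck_at = process_shard(["a", "b", "c", "poison", "e", "f"], handler)
print(done, stuck_at)
```

In this run the first batch goes through, the batch holding the bad record never does, and nothing after it is reached. A real out-of-memory kill or timeout cannot be caught with `try`/`except` inside the function, which is the loss of control being described.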

You also have no control over when the Lambda execution context recycles,
which can make a big difference to your datastore usage.
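
A minimal sketch of why context recycling matters for the datastore, assuming the usual pattern of keeping a connection in module-level state so that warm invocations reuse it. `FakeConnection` and `recycle_context` are hypothetical stand-ins invented for this example; the second one only simulates the platform discarding the context and is not a real API.

```python
class FakeConnection:
    """Hypothetical stand-in for a real datastore client."""

    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1

    def write(self, item):
        return f"stored {item}"


# Module-level state survives for as long as the execution context does.
_connection = None


def get_connection():
    global _connection
    if _connection is None:
        _connection = FakeConnection()
    return _connection


def handler(event, context):
    conn = get_connection()
    return [conn.write(record) for record in event["Records"]]


def recycle_context():
    """Simulates the execution context being thrown away (not a real API)."""
    global _connection
    _connection = None


handler({"Records": [1]}, None)
handler({"Records": [2]}, None)
print(FakeConnection.opened)  # still one connection while the context is warm
recycle_context()
handler({"Records": [3]}, None)
print(FakeConnection.opened)  # a new connection after the recycle
```

Every recycle means a fresh connection against the datastore, and since the function does not decide when that happens, it cannot bound how often it happens either.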

~~~
strictfp
We wanted to use the KCL on the receiver side, but figured that it required
building some sort of cluster to mediate which node would receive which shard.
Tricky, especially when resharding. With Lambda, you get that for free. So we
opted for Lambda, but weren't overly happy. Did you solve that problem?

~~~
thisone
The KCL itself handles shard assignment and balancing. Each application has
its own DynamoDB table where it stores lease information. Have 64 shards and
want them processed on one machine? Spin up one instance of the application.
The KCL will spin up 64 sub processes.

Want to spread the load instead? Spin up as many instances of the application
as you want. They don't need to be clustered; they will negotiate, using the
lease table to take and balance the shards across all instances.
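
The lease-table idea can be sketched as a toy model. This is not the KCL's actual algorithm (real workers take and steal leases gradually through the shared table); `rebalance` and the worker and shard names are hypothetical, and the dict stands in for the lease table.

```python
import math
from collections import Counter


def rebalance(leases, workers):
    """Return a new shard -> worker mapping spread evenly over workers.

    leases: dict of shard id -> owning worker id (or None if unowned).
    workers: list of worker ids that are currently alive.
    """
    target = math.ceil(len(leases) / len(workers))
    leases = dict(leases)
    counts = {w: 0 for w in workers}

    for shard in sorted(leases):
        owner = leases[shard]
        if owner in counts and counts[owner] < target:
            counts[owner] += 1    # owner keeps this lease
        else:
            leases[shard] = None  # owner is gone or over its share

    for shard in sorted(leases):
        if leases[shard] is None:
            # Hand the free lease to the least loaded worker.
            worker = min(workers, key=lambda w: counts[w])
            leases[shard] = worker
            counts[worker] += 1
    return leases


shards = {f"shard-{i:02d}": None for i in range(64)}
one = rebalance(shards, ["worker-a"])
two = rebalance(one, ["worker-a", "worker-b"])
print(Counter(one.values()))
print(Counter(two.values()))
```

With a single worker, all 64 leases land on it. Adding a second worker moves half of them over without the two instances talking to each other directly; the only shared state is the table.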

~~~
thisone
I'll add that the same goes for resharding. The KCL sees and handles the
parent/child shard relationship, getting a lease on the children but waiting
until the parent is "drained" before reading from the children. And it handles
this through the shenanigans that occur when resharding to a shard count that
is not a multiple of, or evenly divisible by, the current one.
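
The "wait until the parent is drained" rule can be sketched as follows. This is a toy model rather than KCL code: `readable_shards` and the shard ids are hypothetical. It assumes each shard lists its parents, with one parent after a split and two after a merge.

```python
def readable_shards(shards, drained):
    """Return the shards that may be read right now.

    shards: dict of shard id -> list of parent shard ids (empty if none).
    drained: set of shard ids that have been read to the end.
    A shard is readable once all of its parents are drained.
    """
    return sorted(
        shard
        for shard, parents in shards.items()
        if shard not in drained and all(p in drained for p in parents)
    )


# shard-0 is split into shard-1 and shard-2, which are later merged
# into shard-3.
shards = {
    "shard-0": [],
    "shard-1": ["shard-0"],
    "shard-2": ["shard-0"],
    "shard-3": ["shard-1", "shard-2"],
}

print(readable_shards(shards, set()))
print(readable_shards(shards, {"shard-0"}))
print(readable_shards(shards, {"shard-0", "shard-1"}))
print(readable_shards(shards, {"shard-0", "shard-1", "shard-2"}))
```

Holding back the children until their parents are finished is what keeps records for a given key in order across a reshard.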

------
dang
Url changed from
[https://twitter.com/IOpipes/status/1034846728216825856](https://twitter.com/IOpipes/status/1034846728216825856),
which points to this.

