
I've looked into this but saw hugely variable throughput, sometimes as little as 20 MB/second. Even at full throughput, I think S3 single-key performance maxes out at ~130 MB/second. How did you get these huge S3 blobs into Lambda in a reasonable amount of time?



* With larger lambdas you get more predictable performance; 2 GB RAM lambdas should get you ~90 MB/s [0].

* Assuming you can parse faster than you read from S3 (true for most workloads?), that read throughput is your bottleneck.

* Set a target query time, e.g. 1 s. At ~90 MB/s, for queries to finish in 1 s each record on S3 has to be 90 MB or smaller.

* Partition your data in such a way that each record on S3 is smaller than 90 MB.

* Forgot to mention: you can also do parallel reads from S3; depending on your data format / parsing speed, that might be something to look into as well (see the sketch below).

This is a somewhat simplified guide (e.g. for some workloads merging data takes time, and we're not including that here), but it should be good enough to start with.
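To illustrate the parallel-reads point, here is a minimal sketch of splitting a single S3 object into byte ranges and fetching them concurrently with boto3. The bucket name, key, chunk size, and worker count are placeholders, not anything from the thread; tune them against your own parsing speed and Lambda memory size.

    # Minimal sketch: parallel ranged GETs against one S3 object.
    # BUCKET, KEY, CHUNK_SIZE, and workers are hypothetical values.
    import concurrent.futures

    import boto3

    BUCKET = "my-bucket"            # placeholder bucket name
    KEY = "partition-0001"          # placeholder object key
    CHUNK_SIZE = 16 * 1024 * 1024   # 16 MB per ranged GET

    s3 = boto3.client("s3")

    def object_size(bucket, key):
        """Get the object's size in bytes via a HEAD request."""
        return s3.head_object(Bucket=bucket, Key=key)["ContentLength"]

    def read_range(bucket, key, start, end):
        """Fetch bytes [start, end] of the object with a ranged GET."""
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        return resp["Body"].read()

    def parallel_read(bucket, key, workers=8):
        """Read the whole object as CHUNK_SIZE ranges in parallel and reassemble."""
        size = object_size(bucket, key)
        ranges = [(start, min(start + CHUNK_SIZE, size) - 1)
                  for start in range(0, size, CHUNK_SIZE)]
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            chunks = pool.map(lambda r: read_range(bucket, key, *r), ranges)
        return b"".join(chunks)

    if __name__ == "__main__":
        data = parallel_read(BUCKET, KEY)
        print(f"read {len(data)} bytes")

Whether this actually beats a single GET depends on your record format: it only helps if you can parse the ranges independently (or afford to buffer the whole object) and the per-request overhead stays small relative to chunk transfer time.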

[0] - https://bryson3gps.wordpress.com/2021/04/01/a-quick-look-at-...




