I've looked into this but saw hugely variable throughput, sometimes as little as 20 MB/second. Even at full throughput, I think S3 single-key performance maxes out at ~130 MB/second. How did you get these huge S3 blobs into Lambda in a reasonable amount of time?
* With larger Lambdas you get more predictable performance; 2 GB RAM Lambdas should get you ~90 MB/s [0].
* Assuming you can parse faster than you read from S3 (true for most workloads?), that read throughput is your bottleneck.
* Set a target query time, e.g. 1 s. That means that for queries to finish in 1 s, each record on S3 has to be 90 MB or smaller (1 s × 90 MB/s).
* Partition your data in such a way that each record on S3 is smaller than 90 MB (a back-of-envelope sizing sketch follows this list).
* Forgot to mention: you can also do parallel reads from S3; depending on your data format and parsing speed, that might be something to look into as well (see the ranged-GET sketch at the end).
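Here's a minimal back-of-envelope sizing sketch in Python; the throughput number is the ~90 MB/s figure from [0], and the dataset size is a made-up example just to show the partition-count arithmetic.

```python
import math

# Assumptions: ~90 MB/s sequential read from S3 on a 2 GB Lambda [0];
# the total dataset size below is hypothetical, purely for illustration.
TARGET_QUERY_SECONDS = 1.0
READ_THROUGHPUT_MB_S = 90
TOTAL_DATASET_MB = 500_000  # e.g. a 500 GB dataset

# Largest record you can read within the target query time.
max_record_mb = TARGET_QUERY_SECONDS * READ_THROUGHPUT_MB_S  # 90 MB

# Minimum number of partitions so every record stays under that cap.
num_partitions = math.ceil(TOTAL_DATASET_MB / max_record_mb)  # ~5556

print(f"max record size: {max_record_mb:.0f} MB, partitions: {num_partitions}")
```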
This is a somewhat simplified guide (e.g. for some workloads merging data takes time, and we're not including that here), but it should be good enough to start with.
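On the parallel-read point: one way to do it is issuing ranged GETs against a single object from a thread pool. A rough sketch, assuming boto3 is available in the Lambda runtime; the bucket/key names and chunk size are hypothetical placeholders, and the right worker count depends on your object sizes and parsing cost.

```python
import concurrent.futures

import boto3

s3 = boto3.client("s3")  # low-level client; safe to share across threads

BUCKET = "my-data-bucket"            # hypothetical bucket name
KEY = "partitions/record-0042.bin"   # hypothetical object key
CHUNK_SIZE = 16 * 1024 * 1024        # 16 MB per ranged GET

def fetch_range(start, end):
    # Fetch bytes [start, end] of the object via an HTTP Range header.
    resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
    return start, resp["Body"].read()

def parallel_read(max_workers=8):
    # Look up the object size, split it into byte ranges, fetch the ranges
    # in parallel, and reassemble them into a single buffer.
    size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
    ranges = [(off, min(off + CHUNK_SIZE, size) - 1)
              for off in range(0, size, CHUNK_SIZE)]
    buf = bytearray(size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start, data in pool.map(lambda r: fetch_range(*r), ranges):
            buf[start:start + len(data)] = data
    return bytes(buf)
```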