
Ask HN: What is the best way to calculate percentile of streaming data? - rishiloyola
Hello,

I need to write a Python function that iterates through incoming requests and calculates percentiles of the request body size dynamically. Which library or algorithm do you guys recommend?

Example: Requests are coming in batches. Let's say the first batch has 50 requests, the next one has 80, etc. I need to calculate percentiles of the body size of those requests.
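For comparison (not from the thread): an exact percentile of a single batch is easy if every size is kept in a list. A minimal sketch using the nearest-rank definition; the function name is hypothetical, and the approach needs memory proportional to the number of requests.

```python
import math


def percentile_nearest_rank(sizes, p):
    """Exact nearest-rank percentile of a list of body sizes (in bytes).

    Keeps and sorts every size, so it only suits a bounded batch.
    """
    if not sizes:
        raise ValueError("no data")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(sizes)
    # Nearest rank: the smallest value with at least p% of the data <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


batch = [10_240, 2_048, 5_120, 8_192, 1_024]
print(percentile_nearest_rank(batch, 50))  # 5120
```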
======
sigmaprimus
I think you need to provide a bit more info. Are you using Apache Kafka?
Something else?

The function would be the number of requests in an individual batch divided by
the total number of requests, multiplied by 100, but I don't think that's what
you're looking for.

Edit: actually, for your question it would be the inverse of the batch size
multiplied by 100, e.g. the first batch has 50 requests, so that would be
1/50 × 100, or 2%.
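The two calculations described in this comment, spelled out as a small sketch. The function names are illustrative only; note that both formulas work on request counts, not on body sizes.

```python
def batch_share_percent(batch_size, total_requests):
    """First version: one batch's requests as a percentage of all requests."""
    return batch_size / total_requests * 100


def per_request_share_percent(batch_size):
    """Edited version: each request's share of its own batch, in percent."""
    return 1 / batch_size * 100


print(per_request_share_percent(50))  # 2.0
print(batch_share_percent(50, 50 + 80))  # about 38.46
```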

~~~
rishiloyola
No, I am not using Kafka. It is just a basic Python server. I want to calculate
the nth percentile of the size of my incoming request objects over the past
hour.

I don't want to store the size of each request in memory. It would eat too
much of my RAM.
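One way to meet both constraints (past hour only, no per-request storage), sketched under assumptions and not proposed in the thread: keep a small log-scale histogram per minute and merge the last 60 on demand. `HourWindow`, `bucket`, and the 5% `GROWTH` factor are hypothetical choices. Results are approximate: each reported value is the upper edge of its bucket, so roughly up to 5% above the true size, and the window boundary is only accurate to one slot. This is similar in spirit to histogram sketches such as HdrHistogram or DDSketch, but it is not either library's API.

```python
import math
import time
from collections import Counter, deque

GROWTH = 1.05  # assumed bucket growth factor: estimates run up to ~5% high


def bucket(size):
    """Map a size in bytes to a log-scale bucket index."""
    # Bucket i covers sizes in (GROWTH**(i-1), GROWTH**i]; sizes <= 1 share 0.
    return 0 if size <= 1 else math.ceil(math.log(size, GROWTH))


class HourWindow:
    """Approximate size percentiles over the last `window` seconds.

    Keeps one Counter of bucket counts per `slot`-second interval, so memory
    grows with (number of slots x number of buckets), not with request count.
    """

    def __init__(self, window=3600, slot=60, clock=time.monotonic):
        self.window = window
        self.slot = slot
        self.clock = clock
        self.slots = deque()  # (slot_id, Counter) pairs, oldest first

    def _expire(self, now):
        # Drop slots that ended before the window began.
        while self.slots and (self.slots[0][0] + 1) * self.slot <= now - self.window:
            self.slots.popleft()

    def add(self, size):
        now = self.clock()
        slot_id = int(now // self.slot)  # integer id avoids float drift
        if not self.slots or self.slots[-1][0] != slot_id:
            self.slots.append((slot_id, Counter()))
        self.slots[-1][1][bucket(size)] += 1
        self._expire(now)

    def percentiles(self, ps=(10, 50, 95)):
        self._expire(self.clock())
        merged = Counter()
        for _, counts in self.slots:
            merged.update(counts)  # adds counts bucket by bucket
        total = sum(merged.values())
        result = {}
        if total == 0:
            return result
        for p in ps:
            rank = max(1, math.ceil(p * total / 100))  # nearest-rank target
            seen = 0
            for index in sorted(merged):
                seen += merged[index]
                if seen >= rank:
                    result[p] = GROWTH ** index  # upper edge of the bucket
                    break
        return result
```

In a server, you would call `add(len(body))` per request and `percentiles()` when a report is needed. With a 5% growth factor, a few hundred buckets cover everything from 1 byte to 10 MB, so memory stays bounded regardless of traffic.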

Incoming traffic:

- 1st batch
  --> 60 requests
    ---> size of 1st request is 10kb
    ---> size of 2nd request is 2kb
    ...
- 2nd batch
  --> 10 requests
    ---> size of 1st request is 5kb
    ---> size of 2nd request is 8kb
    ...
- ...
- 100th batch

I am talking about the 10th, 50th, and 95th percentiles of request size.
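Another option, also not from the thread: if an approximate answer over all traffic since startup is acceptable, a fixed-size uniform random sample (reservoir sampling, Vitter's Algorithm R) can be fed batch by batch, and percentiles are then read from the sample. `Reservoir` and its `capacity` are hypothetical names; this sketch does not implement the one-hour window, and percentiles computed from a sample carry sampling error.

```python
import math
import random


class Reservoir:
    """Uniform random sample of at most `capacity` sizes (Algorithm R)."""

    def __init__(self, capacity=1000, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0  # total sizes offered so far
        self.rng = random.Random(seed)

    def add_batch(self, sizes):
        for size in sizes:
            self.seen += 1
            if len(self.sample) < self.capacity:
                self.sample.append(size)
            else:
                # Keep the new size with probability capacity / seen.
                j = self.rng.randrange(self.seen)
                if j < self.capacity:
                    self.sample[j] = size

    def percentile(self, p):
        if not self.sample:
            raise ValueError("no data")
        ordered = sorted(self.sample)
        rank = max(1, math.ceil(p * len(ordered) / 100))  # nearest rank
        return ordered[rank - 1]
```

Memory is fixed at `capacity` sizes no matter how many batches arrive; the trade-off is that rare large requests may be missing from the sample, which mostly affects high percentiles such as the 95th.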

