Your scatterplot has just unaccumulated that data - but it's the same data.
I hate to single out anyone, but Philip is obviously a smart guy, and I think he's competent enough to not be hurt by a simple oversight like this. If you look at his write-up here, he uses the oft-circulated gnuplot template (check the comments):
> On first sight, we immediately see from the graph that the response time using Puma at the end of the 10000 requests is pretty bad with 100 concurrent requests, with the longest request taking around 60 seconds. I’m not entirely sure why this happens or what happens near the end, but here’s one plausible explanation:
> When the benchmark starts, 100 concurrent requests are sent to the web server. A maximum number of 16 threads, and thus 16 requests, are allocated by Puma at once. The 17th request will block until one of the 16 threads currently in use is finished. However, since we’re executing 100 concurrent requests, there will be 84 requests waiting (100-16). Looking at the requests in the generated puma.dat file (generated with ab -r -n 10000 -c 100 -T 'application/x-www-form-urlencoded' -g puma.dat -p ../live_streaming/post http://127.0.0.1:3000/messages), we see that exactly 84 requests have been waiting for execution. These are the requests that were issued first, but have never been allocated to a thread by Puma. As a result, they have been waiting for the entire benchmark. I’m not sure why Puma would behave like this.
That protracted explanation is predicated on the assumption that the data is ordered chronologically. Many, many people make this mistake (google "apache bench gnuplot"). I always thought it was chronological as well. I don't know why I never looked at the starttime or seconds columns of the data.
1) Most of the people using the gnuplot template really don't understand what it's doing
2) We all assume that the output of `ab -g plotfile` is a serial log
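For anyone who wants to check their own file, here is a rough Python sketch of the test I should have run years ago. It assumes the usual tab-separated `ab -g` layout with a header row of starttime, seconds, ctime, dtime, ttime and wait. The three sample rows are made up for illustration, and the helper names (`read_rows`, `is_sorted_by`, `chronological`) are my own. As far as I can tell the rows come out ordered by ttime rather than by start time, but verify that against your own output.

```python
import csv
import io

# Made-up sample in the tab-separated layout of an `ab -g` file.
# Real files have one row per request; these values are illustrative only.
SAMPLE = (
    "starttime\tseconds\tctime\tdtime\tttime\twait\n"
    "Tue Jan 01 12:00:02 2013\t1357041602\t0\t5\t5\t4\n"
    "Tue Jan 01 12:00:00 2013\t1357041600\t0\t9\t9\t8\n"
    "Tue Jan 01 12:00:01 2013\t1357041601\t1\t40\t41\t40\n"
)


def read_rows(text):
    """Parse the file contents, keeping only the two columns we compare."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {"seconds": int(r["seconds"]), "ttime": int(r["ttime"])}
        for r in reader
    ]


def is_sorted_by(rows, key):
    """Return True if the rows are already in ascending order of `key`."""
    values = [r[key] for r in rows]
    return values == sorted(values)


def chronological(rows):
    """Return the rows reordered by request start time (epoch seconds)."""
    return sorted(rows, key=lambda r: r["seconds"])


rows = read_rows(SAMPLE)
print("sorted by start time:", is_sorted_by(rows, "seconds"))
print("sorted by ttime:", is_sorted_by(rows, "ttime"))
print("ttime in start order:", [r["ttime"] for r in chronological(rows)])
```

If the first check comes back false and the second true, the x axis of the template's plot is just the rank of each request by total time, not the order in which requests were issued. To see response time over time you would plot the reordered rows against `seconds` instead.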
I guess there should be a better label for the y axis though.
I appreciate that you try to up the resolution to counter this, but it still strikes me as the wrong presentation.
I would rather the author go into a bit of a discussion about what the data represents and the different ways it could be presented. I can't help but get the feeling that the choice of a scatter plot is more or less arbitrary.
If you Google search for "apache bench gnuplot", you'll find a very similar gnuplot template that has been circulated for a very long time, but everyone seems to think that the resulting plot is response time over time.
I'm definitely going to follow up on the issue. I'm a mediocre programmer, so gnuplot is pretty hard for me, but I keep having these "aha" moments. My next post will look at each graph in more detail and try to explain just what is being shown in each.