Note that this is a rather poor benchmark: it compares multi-process servers using SO_REUSEPORT against single-process multithreaded ones, and there is no consistent header set across implementations. The Crystal implementation outputs about 92 bytes per request, whereas Go outputs 142 bytes per request. Also, the Go stdlib was tested with different settings than any of the others.
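
To make the header-size point concrete, here is a rough sketch of how much more data Go has to write at the same request rate. The 92 and 142 byte figures are the ones quoted above; the request rate is a purely hypothetical placeholder, not a number from the benchmark.

```python
# Rough comparison of bytes written per second at an equal request rate.
# The byte counts come from the comment above; the request rate is an
# assumption chosen only for illustration.

def bytes_per_second(requests_per_second: float, bytes_per_request: int) -> float:
    """Return the output volume in bytes/s for a given request rate."""
    return requests_per_second * bytes_per_request


CRYSTAL_BYTES = 92          # approximate bytes per request (Crystal)
GO_BYTES = 142              # bytes per request (Go)
HYPOTHETICAL_RPS = 100_000  # assumed request rate, NOT a measured value

crystal_bps = bytes_per_second(HYPOTHETICAL_RPS, CRYSTAL_BYTES)
go_bps = bytes_per_second(HYPOTHETICAL_RPS, GO_BYTES)

# Relative extra output Go produces for every request it serves.
extra = GO_BYTES / CRYSTAL_BYTES - 1

print(f"Crystal: {crystal_bps:,.0f} B/s")
print(f"Go:      {go_bps:,.0f} B/s")
print(f"Go writes {extra:.0%} more bytes per request")
```

Whatever the actual request rate, Go is writing roughly 54% more bytes per request, so a raw requests-per-second comparison does not measure the same amount of work on both sides.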