
Silly benchmarks on completely untuned server code - wglb
https://rachelbythebay.com/w/2020/03/10/rps/
======
nartz
Thanks for this - but it's the age-old argument of dynamic vs. static/compiled
languages, mixed with "hey, bloated libraries!" and "hey, bloated features!"
etc.

0\. Performance isn't everything, especially to a lot of companies where
having _something_ at all is more important than being the fastest.

1\. Dynamic languages generally require fewer LOC, which generally means
faster implementation.

2\. Rails, Django, <your favorite dynamic language framework> sure are
generally bloated, but being able to drop in a library for nearly anything you
need cannot be overlooked. Especially for small-to-midsize companies, spinning
up another app server is generally easier than writing a whole bunch of
multithreaded code.

3\. The app server is generally not the bottleneck; rather, of course, it's
the database.

~~~
jchrisa
If you want multi-datacenter consistency, then even the best transaction
protocols will still have a latency floor set by the distance between
datacenters divided by the speed of light.

------
j88439h84
As mentioned last time, modern Python async frameworks like Starlette
would have no trouble handling this amount of traffic.

[https://www.starlette.io/](https://www.starlette.io/)
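
For context on how an async framework handles that load: Starlette is an ASGI
framework, and an ASGI app is at bottom just an async callable, which lets a
single process multiplex many in-flight requests on one event loop. A minimal
hand-rolled ASGI app (a sketch of the protocol itself, not Starlette's API)
looks like:

```python
# A bare ASGI application: the protocol that Starlette apps implement
# and servers like uvicorn speak. The server awaits this callable once
# per request, so a single process can juggle many open connections.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello"})
```

Servers such as uvicorn drive callables like this once per request; Starlette
layers routing and middleware on top.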

~~~
derwiki
Exciting! I'd never heard of Starlette. I set up example.py and ran the same
`ab` commands mentioned in this post and got around 2600 RPS. I then restarted
the server and threw 100k requests at it with `ab -c 100 -n 100000`, which
timed out:

    apr_socket_recv: Operation timed out (60)
    Total of 49189 requests completed

Same thing when I omit `-c 100`. Am I doing something wrong?

~~~
j88439h84
Not sure, I can't reproduce the timeout. I'm seeing "5829.85 [#/sec] (mean)"
on my (nearly 10-year-old) computer which is doing other stuff. Not too
shabby, I think.

~~~
derwiki
A friend helped me figure it out:
[https://danielmendel.github.io/blog/2013/04/07/benchmarkers-...](https://danielmendel.github.io/blog/2013/04/07/benchmarkers-beware-the-ephemeral-port-limit/)
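
The gist of the linked post, as back-of-envelope arithmetic (the port range
and TIME_WAIT values below are assumptions; check your OS's actual settings
with `sysctl` or `/proc`):

```python
# Sketch of the ephemeral-port-limit math. These constants are
# assumptions (roughly the IANA defaults); your OS may differ.
ephemeral_ports = 65535 - 49152 + 1  # default IANA ephemeral range: 16384 ports
time_wait_secs = 15                  # closed sockets linger in TIME_WAIT

# Without keep-alive, each `ab` request uses a fresh client port that
# stays unusable for time_wait_secs after close, so the sustained rate
# of new connections is capped at roughly:
max_rps = ephemeral_ports // time_wait_secs
print(max_rps)  # 1092
```

Which is why keep-alive (`ab -k`) or a wider port range sidesteps the stall.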

------
jiofih
With the machine she used, there are dozens of options: Node, Go, Java, Rust,
Crystal, H2O will all _easily_ outperform those numbers while offering a ton
of features and robust ecosystems.

Anyone can cook up a plain HTTP server; making a real application out of it is
the other 98% of the work.

~~~
derwiki
I tried running as simple an HTTP server as possible with Go:

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "Foo")
        })

        log.Fatal(http.ListenAndServe(":8088", nil))
    }

and when I run it with `ab -c 100 -n 100000`, it falls over before 10k
requests:

    $ ab -c 100 -n 100000 http://127.0.0.1:8088/
    This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    apr_socket_recv: Operation timed out (60)
    Total of 6388 requests completed

I'm extremely new to Go, so maybe I'm doing something wrong. Could someone
help me understand?

~~~
derwiki
Ahh, this explains it:
[https://danielmendel.github.io/blog/2013/04/07/benchmarkers-...](https://danielmendel.github.io/blog/2013/04/07/benchmarkers-beware-the-ephemeral-port-limit/)

------
bufferoverflow
Language / framework and DB choice does matter for performance. A lot.

[https://www.techempower.com/benchmarks/#section=data-r18&hw=...](https://www.techempower.com/benchmarks/#section=data-r18&hw=ph&test=db)

As you can see, the fastest Python implementation achieves just 11% of the
throughput of the top solution in Rust.

And your key/value store choice matters a lot. Redis is well known, but it's
much slower than Tarantool or LMDB, which in turn are slower than RocksDB or
Aerospike. Though take this claim with a grain of salt: they can perform
differently under different loads (writes, reads, updates, deletes) and
different numbers of concurrent requests.

~~~
enitihas
Redis is slower than LMDB? That's very hard to believe: Redis operates purely
in memory, and all its data structures are designed with that in mind, whereas
LMDB does a disk write for every key/value set operation. So I find it hard to
believe Redis is slower than LMDB.

~~~
james_s_tayler
Redis writes its replication log to disk.

~~~
enitihas
That happens on a configured time interval, not on every request, unlike LMDB.

------
_bxg1
I come from outside of the Python ecosystem, so I'm not familiar with the
specific frameworks mentioned (I read the original post too).

Can someone simplify the argument that she's making about them? Something to
do with maintaining separate, blocking threads in anticipation of requests vs
spinning things up as you need them? Or is it a criticism of dynamic languages
as HTTP servers in general (it's not clear what language was used in this
article)?

~~~
lumost
Probably a bit of both, but Python and other dynamic languages make doing the
smart thing very difficult. I work professionally in both static and dynamic
languages, and I've observed a few limitations that prevent writing a fast
Python webserver:

\- The Global Interpreter Lock: you can't avoid forking processes in order to
run CPU-bound Python code concurrently.

\- Threads are cheaper than processes. A default Java thread carries 2MB of
overhead; a Python process for a typical app can easily reach 2GB without very
careful attention to memory.

\- The pure single-threaded Python runtime is 10-100x slower than your typical
statically typed language. Even doing everything as carefully as possible,
you'll be 1-2 orders of magnitude off the best implementation in a statically
typed language; conversely, a sloppy implementation in a statically typed
language will probably work about as well as the best Python implementation.

\- Foreign Function Interfaces (FFIs) are slow. In Python an int32 consumes 24
bytes of memory, whereas in C an int32 consumes 4 bytes; stringing together
two C calls that manipulate an int from Python requires 4 allocations and 4
casts. Applications that avoid this overhead have to adopt symbolic APIs that
tell an underlying C program how to chain multiple function calls, and they
grind to a halt if there is any Python control flow, e.g. TensorFlow.
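
Both the per-object overhead and the GIL are easy to observe from the
interpreter; a small sketch (the size and the timings are assumptions about a
64-bit CPython build and will vary by machine):

```python
import sys
import threading
import time

# Per-object overhead: a Python int is a full heap object,
# not a 4-byte machine word.
print(sys.getsizeof(12345))  # 28 on a 64-bit CPython build

def spin(n):
    # Pure-Python CPU-bound loop; holds the GIL while it runs.
    while n:
        n -= 1

N = 3_000_000

t0 = time.perf_counter()
spin(N)
spin(N)
serial = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

# With the GIL, the two threads can't execute Python bytecode in
# parallel, so the threaded run is no faster than the serial one.
print(f"serial: {serial:.2f}s  two threads: {threaded:.2f}s")
```

The same two-thread experiment in Java or Go would come in at roughly half
the serial time, which is the gap the parent comment is describing.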

Most of these limitations tend to be common to other dynamically typed
languages such as Ruby, PHP, and Perl. While one can theoretically "drop down
to C", it's often not that straightforward in common development scenarios.
For businesses that require high(er) throughput or low(er) latency, the time
spent fighting the interpreter, or the associated AWS bills, may not be worth
the productivity gains from dynamic types in 2020, when compilers can perform
an incremental build in milliseconds.

~~~
_bxg1
I don't think it's productive to argue that non-dynamic languages are superior
across the board. Everybody knows they're faster. For many people, they're
still the right decision.

I think (or at least assumed) that the post's discussion is more interesting
than that. A sibling comment seems to zero in on the crux of the issue:
"Python is an order of magnitude slower, but its web server is _two_ orders of
magnitude slower" is a meaningful and fruitful thing to talk about.

~~~
lumost
That's fair. However, the Python language is two orders of magnitude slower
than its static brethren for a variety of tasks, pretty much any time you're
running pure Python in the interpreter. [https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/python3-java.html)

It may be worth looking at how this performance gap has trended over time.
Anecdotally I recall the performance gap being much narrower 10 years ago.

------
samkidman
I tried this out with unicorn and this minimal config.ru, which I believe
replicates the author's setup:

    class TestApp
      def self.call(env)
        [200, [], ['Blah']]
      end
    end

    run TestApp

ab -n 10000 [http://localhost:8080/](http://localhost:8080/) \--> ~1900 RPS
(~8400 RPS)

ab -k -n 10000 [http://localhost:8080/](http://localhost:8080/) \--> ~1900 RPS
(~26000 RPS)

ab -c 100 -n 10000 [http://localhost:8080/](http://localhost:8080/) \--> ~2500
RPS (~14500 RPS)

I have no idea why it performed better on the concurrent benchmark. This was
on my 2.3GHz i5 MacBook Pro; getting about 1/5th the performance of something
closer to the article's machine seems quite decent.
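
For comparison, the Python side of the same experiment is equally small: a
minimal WSGI app equivalent to the Rack one above (a sketch; the gunicorn
invocation in the comment is my assumption about how you'd serve it):

```python
# Minimal WSGI app, the Python analogue of the Rack config.ru above.
# Serve it with e.g. `gunicorn app:application` (invocation assumed).
def application(environ, start_response):
    body = b"Blah"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Like the Rack version, it does no real work, so any throughput gap between
the two setups comes from the server and runtime, not the app.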

------
thedance
FWIW, as a Xoogler, I find I can get pretty far with 1) abseil, 2) grpc, 3)
protobuf, and 4) groping around in every open source Google project to find
the thing I need (for example, until recently, Cord could be found in random
projects but not in abseil). I don't know what it's like for ex-bookfaces but
this gets me a good way to my goal, and as a bonus I get to use stuff I wasn't
able to at G, like C++17, or io_uring, or even jemalloc.

~~~
enitihas
Outside of Google, what other C++ libraries do you think help with writing
services in C++? C++ lacks a huge number of libraries compared to something
like Java; e.g., I can't even find a half-decent HTTP client that handles all
the important stuff.

~~~
thedance
I don't know. I'd be very squicked out to expose a C++ HTTP client to the big
bad world, e.g. as a web crawler. For private use, technically speaking, gRPC
contains an HTTP client.

More likely I'd fork a process in a safer language and direct it to make HTTP
calls.

~~~
hermitdev
Boost.Beast seems to be pretty robust. Personally, I'd feel fine using it. I
know the author, Vinnie, is very active in the community and very receptive
and responsive to bug reports.

------
MapleWalnut
Another great post! I'm curious what people would suggest as a tech stack
that's not Python/gunicorn: something with a GC and a decent ORM that can
replace database-backed services written in Python with similarly high-level
code.

~~~
tinco
Ruby. It's the language with the most intuitive syntax and standard library,
the most ergonomic external libraries and an extremely well polished web
development ecosystem.

Close rivals might be PHP and Node.js; there are pros and cons to all of them.

If something needs to be very performant you pluck it out into a little golang
microservice.

~~~
cakoose
The standard Ruby VM (MRI) is about the same speed as the standard Python VM
(CPython), aka on the slow end. Twitter famously moved from Ruby to
Java/Scala, largely for performance reasons.

Node.js is much faster because V8 is much faster, but it's still basically
single threaded, so you need to run process-per-CPU, which is what the
original blog post (to which this post is a follow-up) was complaining about.

~~~
tinco
Yeah, the Ruby VM certainly is slow, but that doesn't really matter for most
realistic workloads. In Twitter's case it definitely made sense, but do keep
in mind that besides switching away from Ruby, they also switched their entire
architecture to a paradigm that Ruby had no mature ecosystem for at the
time.

Node.js is much faster both because it has a fast VM and because its standard
library handles concurrency much better than Ruby's (a proper event loop is
built in and its use is idiomatic).

That's not what the original blog post was about, though; it was about how bad
Gunicorn is. I've written Python, but never web apps (since Ruby's superiority
there is obvious), but I don't think it's really fair to judge the whole
language based on one popular library that's shitty. I think it's symptomatic
of Python that the library is shitty, but that's my dislike of Python shining
through. The reality is that there are most likely great Python libraries for
dealing with concurrent web requests, and she didn't bother to learn them.

In Ruby, the community would without hesitation recommend running Puma or
Passenger, or even Iodine or Falcon or whatever fancy stuff you have nowadays.
These webservers all handle concurrency, memory usage, load balancing, and
queue management correctly. They don't do silly things like forking before
importing libraries, because good software architecture is highly valued in
the Ruby community.

~~~
cakoose
> That's not what the original blog post was about though, it was about how
> bad Gunicorn is.

Yes, the post starts out describing an issue with how Gunicorn listens for
connections. Like you said, there are better libraries than Gunicorn, so
that's not a reason to jump to Ruby.

However, the article goes on to talk about other things, like the lack of real
multithreading, import-time code execution, and overall efficiency, all of
which also apply to Ruby.

I'm not saying Python or Ruby are a bad choice. It's just that MapleWalnut was
soliciting an alternative that doesn't have the problems described in the
original post, so Ruby doesn't qualify.

