
Pydis: Redis clone in Python 3 to make points about performance - yumaikas
https://github.com/boramalper/pydis
======
jchw
Not sure what point this is making. It's not pure Python, and even the bulk of
it is just testing the performance of built-in data structures. Can't use
Cython but it's OK to use uvloop? Want to prove you don't need C but use
hiredis to parse?

~~~
boramalper
This was a quite common criticism on r/Python[0] as well: that using
C modules and then benchmarking "Python" against C isn't fair.

Perhaps it was my wording that caused this confusion, for which I am sorry,
but I never meant to compare _"pure"_ Python with C. The point I am trying to
make is that Python with C extensions can be considerably[1] performant,
approaching C code, for network- or memory-bound tasks.

[0]:
[https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redi...](https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redis_clone_in_python_3_to_disprove_some/)

[1]: It's of course controversial whether 60% throughput is considerable or
not.

~~~
jchw
I think you have a point, but it's not a terribly surprising one to most
people here. This is pretty much the same thing people had already realized
with Node.js, after all. I'll even go as far as to say the use of uvloop is
fair, it's a reasonably generic piece of software that applies to a wide range
of problems. It fits the Node.js comparison well, too.

But personally, I don't see how the usage of hiredis is so much different from
literally taking a part of Redis itself and embedding it in Python. As a
thought experiment, if I _did_ take Redis's parsing code and embed it in
Python, would it still be fair?

I'm not really convinced that a pure Python implementation of the Redis
Serialization Protocol would actually be meaningfully slower than the Python
bindings to hiredis, so this whole thing may be a bit moot.
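For a sense of scale, a pure-Python parser for the command format redis-benchmark actually sends (RESP arrays of bulk strings) fits in a few dozen lines. This is a hypothetical sketch, not code from pydis or hiredis, and it ignores the other RESP types:

```python
def parse_command(buf: bytes):
    """Parse one RESP array of bulk strings, e.g.
    b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n".

    Returns (args, bytes_consumed), or (None, 0) if the buffer is
    incomplete or does not start with an array.
    """
    if not buf.startswith(b"*"):
        return None, 0
    end = buf.find(b"\r\n")
    if end == -1:
        return None, 0
    n = int(buf[1:end])          # number of array elements
    pos = end + 2
    args = []
    for _ in range(n):
        if buf[pos:pos + 1] != b"$":   # each element must be a bulk string
            return None, 0
        end = buf.find(b"\r\n", pos)
        if end == -1:
            return None, 0
        length = int(buf[pos + 1:end])  # declared payload length
        start = end + 2
        if len(buf) < start + length + 2:  # payload + trailing \r\n not here yet
            return None, 0
        args.append(buf[start:start + length])
        pos = start + length + 2
    return args, pos
```

Whether this keeps up with the hiredis bindings under load is exactly the open question; benchmarking both against the same workload would settle it.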

I think your project is cool. Perhaps some changes to the README to explain
exactly what your benchmarks are meant to show about Python itself would help
the criticism.

~~~
boramalper
> I think your project is cool. Perhaps some changes to the README to explain
> exactly what your benchmarks are meant to show about Python itself would
> help the criticism.

I agree. I'll edit when I have some time this week. =)

------
james_a_craig
I really like this, not just because it makes a good point on performance, but
because it also provides a nice, simple, idiomatic example of a service with a
custom protocol in python, using asyncio.

I'd like to see the same thing in a few different languages, just for fun -
perhaps a Perl or Rust redis, for example.

~~~
PhilippGille
For Go there's for example
[https://github.com/alash3al/redix](https://github.com/alash3al/redix), and I
think I've seen others as well.

------
jnwatson
As much as I love Python, I don't think this example is easily generalizable
to other systems.

This takes advantage of probably the two highest-performing aspects of Python:
raw dicts and async I/O (especially under uvloop).

If your app does mostly those things, then yes, Python is a good choice. It
doesn’t say much about anything else.
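Those two aspects are visible in the shape of such a server: the event loop owns the sockets, a dict owns the data, and very little Python runs in between. A minimal sketch along those lines (uvloop is optional here, and the protocol is a toy stand-in, not pydis's actual code):

```python
import asyncio

try:
    import uvloop  # optional drop-in event loop; assumes `pip install uvloop`
    uvloop.install()
except ImportError:
    pass  # fall back to the stock asyncio loop

class PingProtocol(asyncio.Protocol):
    """Answer every request with +PONG, the rough shape of a pydis-style server."""

    def connection_made(self, transport: asyncio.Transport) -> None:
        self.transport = transport

    def data_received(self, data: bytes) -> None:
        # A real server would parse `data` and consult a dict here.
        self.transport.write(b"+PONG\r\n")

async def serve(host: str = "127.0.0.1", port: int = 6380) -> None:
    loop = asyncio.get_running_loop()
    server = await loop.create_server(PingProtocol, host, port)
    async with server:
        await server.serve_forever()

# To run: asyncio.run(serve())
```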

------
jdsully
Redis actually has a surprising number of performance issues with larger data
sizes. But most people use it with small value sizes, so TCP performance
dominates. I could write for hours on how bad Linux TCP/IP perf is with small
packets, but I'll save that for another time. It is the bottleneck in the
benchmarks the OP has run here (the value size is only 3 bytes).

Since we're sharing Redis clones, here's my multithreaded one:
[https://github.com/JohnSully/KeyDB](https://github.com/JohnSully/KeyDB)

~~~
anitil
When that time comes I'd love to hear it.

------
anderskaseorg
Note that this delegates all protocol parsing to the hiredis C library. Using
C libraries is not an invalid way to write Python, of course, but it may limit
the generalizability of the purported lesson.

------
LIV2
> _The aim of this exercise is to prove that interpreted languages can be just
> as fast as C._

From my limited understanding of Python, it compiles into bytecode that is fed
into a virtual machine, doesn't it? How can that ever be just as fast as C,
unless the C programmer does a really bad job?
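(Pure Python generally can't; the argument in this thread is that the hot paths run in C. The standard-library `dis` module shows the bytecode the CPython VM interprets, and a single dict-related opcode dispatches into the C-implemented dict, so code that mostly manipulates dicts spends little time in the interpreter loop itself. A small illustration:)

```python
import dis

def incr(d, key):
    # The store compiles to a single opcode (STORE_SUBSCR on recent
    # CPython versions) that runs the whole C-implemented dict insert.
    d[key] = d.get(key, 0) + 1

dis.dis(incr)  # prints the bytecode; exact opcode names vary by version
```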

------
boramalper
Hello, author here! I'll be happy to answer any questions; I've already tried
to answer some below.

You might want to check the r/Python thread as well:
[https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redi...](https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redis_clone_in_python_3_to_disprove_some/)

------
snissn
I think the get method can be sped up, since you're doing lookups twice
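(If the handler follows the common `if key in d: return d[key]` pattern, the key is hashed and probed twice; `dict.get` does one probe. A hypothetical before/after, since pydis's actual handler may differ:)

```python
cache = {"foo": b"bar"}

def get_twice(key):
    # two hash-table probes: one for the `in` test, one for the fetch
    if key in cache:
        return cache[key]
    return None

def get_once(key):
    # single probe; dict.get returns None when the key is absent
    return cache.get(key)
```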

------
truth_seeker
Using pypy runtime will be advantageous in comparison with cpython runtime

~~~
rukittenme
Wouldn't that defeat the point the library is trying to prove?

~~~
boramalper
(Author here) I don't think it would defeat the point, but I have tried PyPy
and the performance was worse (probably due to the lack of support for uvloop
and hiredis), so I've decided to stick with CPython.

When I have some time, I'd love to port those two to PyPy and see how it
performs (but porting uvloop would be a task on its own!)

------
orbifold
The conclusion should probably be that Redis is super slow, not that the
Python 3 implementation is particularly fast. Also, it is unclear how Python
dictionaries would scale to a large number of keys compared to Redis.

~~~
jnwatson
The Python dictionary implementation is among the most optimized on the
planet. Given that Python uses dictionaries for almost everything, very close
attention is paid to dictionary performance.

~~~
benekastah
That doesn't automatically mean that dictionaries are just as fast with
hundreds of thousands or millions of keys. I don't know one way or the other,
but it's conceivable that performance could degrade somewhat with large
numbers of keys.
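(One way to check, with the caveat that a microbenchmark only approximates real access patterns: CPython dicts are open-addressed hash tables, so lookups are O(1) on average regardless of size, though larger tables hit CPU caches less often. A rough, hypothetical harness; absolute numbers depend on the machine:)

```python
import timeit

def lookup_time(n: int) -> float:
    """Seconds to perform 200k lookups of one present key in a dict of size n."""
    d = {i: i for i in range(n)}
    probe = n // 2
    return timeit.timeit(lambda: d[probe], number=200_000)

# Example run, commented out to keep this importable:
# for n in (1_000, 100_000, 10_000_000):
#     print(n, lookup_time(n))
```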

