Pydis: Redis clone in Python 3 to make points about performance (github.com/boramalper)
47 points by yumaikas on March 2, 2019 | hide | past | favorite | 27 comments



Not sure what point this is making. It's not pure Python, and even the bulk of it is just testing the performance of built-in data structures. Can't use Cython but it's OK to use uvloop? Want to prove you don't need C but use hiredis to parse?


This was a quite common criticism on r/Python[0] as well: that using C modules and then benchmarking "Python" against C isn't fair.

Perhaps it was my wording that caused this confusion, for which I am sorry, but I never meant to compare "pure" Python with C. The point I am trying to make is that Python with C extensions can come considerably[1] close to C code in performance for network- or memory-bound tasks.

[0]: https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redi...

[1]: It's of course controversial whether 60% throughput is considerable or not.


I think you have a point, but it's not a terribly surprising one to most people here. This is pretty much the same thing people had already realized with Node.js, after all. I'll even go as far as to say the use of uvloop is fair, it's a reasonably generic piece of software that applies to a wide range of problems. It fits the Node.js comparison well, too.

But personally, I don't see how the usage of hiredis is so much different from literally taking a part of Redis itself and embedding it in Python. As a thought experiment, if I did take Redis's parsing code and embed it in Python, would it still be fair?

I'm not really convinced that a pure Python implementation of the Redis Serialization Protocol would actually be meaningfully slower than the Python bindings to hiredis, so this whole thing may be a bit moot.
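For a sense of what such a pure-Python parser might look like, here's a minimal sketch (my own illustration, not the project's code) that handles only a RESP array of bulk strings, which is the command format; it ignores partial reads and bulk strings containing CRLF:

```python
# Parse a complete RESP command like b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
# into its arguments, e.g. [b"GET", b"foo"].
# Sketch only: a real parser must buffer partial reads, handle all
# RESP types, and cope with CRLF bytes inside bulk strings.

def parse_command(buf: bytes):
    lines = buf.split(b"\r\n")
    if not lines or not lines[0].startswith(b"*"):
        raise ValueError("expected a RESP array")
    nargs = int(lines[0][1:])  # element count after '*'
    args = []
    i = 1
    for _ in range(nargs):
        if not lines[i].startswith(b"$"):
            raise ValueError("expected a bulk string")
        length = int(lines[i][1:])        # declared byte length after '$'
        args.append(lines[i + 1][:length])
        i += 2
    return args
```

Whether this loses meaningfully to the hiredis bindings would have to be benchmarked, of course.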

I think your project is cool. Perhaps some changes to the README to explain exactly what your benchmarks are meant to show about Python itself would help the criticism.


> I think your project is cool. Perhaps some changes to the README to explain exactly what your benchmarks are meant to show about Python itself would help the criticism.

I agree. I'll edit when I have some time this week. =)


As in: Python is as fast as C, when it is C.


Well, hiredis and uvloop already exist and are well tested, so anybody can easily use them with normal Python. Rewriting things in Cython isn't something many Python users would be doing routinely.


Fine, but then I don't see what this program is proving about Python performance, other than that it's actually bad and even basic programs will need to use high performance libraries that are not pure Python. This is in contrast to say, languages like Go or Rust, where most of the time shelling out to C is an issue of ecosystem maturity.


Clearly you are suffering from

> Unfortunately many programmers, due to their lack of experience, of some knowledge of computer architecture(s), or of an in-depth understanding of the task they are given

where you fail to see that C libs being linked into Python code count as Python code performance.


I think the most bizarre part of that paragraph is the note about dropping type safety for performance. I mean... Python is very much not type safe. Memory safe perhaps, though that argument is out the window if you are willing to accept arbitrary C extensions. All in all... It seems either I'm missing the point, or the author is.


> Python is very much not type safe

It's strongly typed and mypy adds static typing support too (so you end up with a strong gradually typed language). Would you consider it insufficient?
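For illustration, here's a small sketch of what gradual typing buys: the annotations below are ignored at runtime by CPython, but a checker like mypy can reject a bad call before the code ever runs.

```python
# Gradual typing: annotations are optional and have no runtime effect
# in CPython, but mypy uses them to catch mismatches statically.

def greet(name: str) -> str:
    return "hello " + name

greet("world")   # fine both at runtime and under mypy
# greet(42)      # valid Python syntax, but mypy rejects the argument
#                # type before the runtime TypeError could ever happen
```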


>It's strongly typed

Sure, it's strongly typed, but I think that's irrelevant, at least if my understanding of the term 'type safe' is correct. From Wikipedia:

>In computer science, type safety is the extent to which a programming language discourages or prevents type errors.

The thing is, a Python program that causes a type error is valid, even if it crashes. TypeError is strictly a runtime error. To me, that is the epitome of not having type safety.
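A minimal illustration of that point:

```python
# This program is perfectly valid Python; the type error only
# surfaces as a TypeError when the offending line executes.

def add(a, b):
    return a + b

try:
    add(1, "2")  # int + str: nothing rejects this before runtime
except TypeError:
    result = "caught at runtime"
```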

Of course pretty much everything is runtime in Python, by virtue of it being a script language. So you really would need a type checking stage, like TypeScript. But...

>mypy adds static typing support too

In my opinion, MyPy is not sufficient to represent common usages of Python's type system. Even with the recent addition of structural typing, many Python programs just do things that are not easily typed. An example would be, in Django, many things in the core directly patch the Request object, and there's really not a terribly sane way to represent this in a reasonable type system. I feel Python encourages all kinds of type craziness, with metaclasses, 'file-like' objects, etc.

Opinions aside... Since MyPy is not part of Python itself (yet?) it is unfair to call the Python language 'type safe' simply by virtue of a type checker existing, for the same reason one wouldn't call JavaScript type safe (I do acknowledge that both of them definitely have a chance of becoming 'official', but neither has yet, and that's the important part.)

Not strictly related, but there are other type checkers, too: at Google, we've got pytype (https://github.com/google/pytype) and Facebook has pyre-check (https://github.com/facebook/pyre-check). I've tried neither myself.


I really like this, not just because it makes a good point on performance, but because it also provides a nice, simple, idiomatic example of a service with a custom protocol in python, using asyncio.

I'd like to see the same thing in a few different languages, just for fun - perhaps a Perl or Rust redis, for example.


For Go there's for example https://github.com/alash3al/redix, and I think I've seen others as well.


Redis actually has a surprising number of performance issues with larger data sizes. But most people use it with small value sizes and TCP performance dominates. I could write for hours on how bad Linux TCP/IP perf is with small packets but I'll save that for another time. This is the bottleneck with the benchmarks op has done here (the value size is only 3 bytes in these benchmarks).

Since we're sharing Redis clones, here's my multithreaded one: https://github.com/JohnSully/KeyDB


When that time comes I'd love to hear it.


As much as I love Python, I don't think this example is easily generalizable to other systems.

This takes advantage of probably the two highest performing aspects of Python: raw dict and async (especially under uvloop).

If your app does mostly those things, then yes, Python is a good choice. It doesn’t say much about anything else.
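The core of such a server really is tiny. Here's a rough sketch (my own, not pydis's actual code, and using a simplified space-separated line protocol rather than real RESP) of how a plain dict plus asyncio streams combine:

```python
import asyncio

# In-memory store: the hot path is plain dict operations, which are
# implemented in C inside CPython.
store = {}  # bytes -> bytes

def execute(cmd: bytes, *args: bytes) -> bytes:
    # Dispatch an already-parsed command against the dict.
    if cmd == b"SET":
        store[args[0]] = args[1]
        return b"+OK\r\n"
    if cmd == b"GET":
        val = store.get(args[0])
        return b"$-1\r\n" if val is None else b"$%d\r\n%s\r\n" % (len(val), val)
    return b"-ERR unknown command\r\n"

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    # One task per connection; the event loop (uvloop in pydis's case)
    # multiplexes thousands of these.
    while data := await reader.readline():
        writer.write(execute(*data.rstrip(b"\r\n").split(b" ")))
        await writer.drain()

# A real server would be started with asyncio.start_server(handle, ...).
```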


Note that this delegates all protocol parsing to the hiredis C library. Using C libraries is not an invalid way to write Python, of course, but it may limit the generalizability of the purported lesson.


> The aim of this exercise is to prove that interpreted languages can be just as fast as C.

From my limited understanding of Python, it compiles into bytecode that is fed into a virtual machine, doesn't it? How can that ever be just as fast as C? Unless the C programmer does a really bad job?
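That is indeed how CPython works: source is compiled to bytecode, and the VM interprets it instruction by instruction, each carrying dispatch overhead that compiled C avoids. The standard `dis` module makes this visible:

```python
import dis

def double(x):
    return x * 2

# Prints the bytecode instructions CPython's VM interprets for this
# function (e.g. a LOAD, a multiply, a RETURN); each one goes through
# the interpreter's dispatch loop.
dis.dis(double)
```

This is why the project leans on C extensions for the hot paths rather than claiming the interpreter itself matches C.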


Hello, author here! I'll be happy to answer your questions if any, and I already tried to answer some below.

You might want to check the r/Python thread as well: https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redi...


I think the get method can be sped up, since you're doing the lookup twice.
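For illustration (I haven't checked the project's actual code), the usual double-lookup pattern and its single-lookup replacement look like:

```python
data = {"foo": "bar"}

# Two hash lookups: one for the membership test, one for the index.
if "foo" in data:
    value = data["foo"]

# One lookup, with None (or any sentinel) standing in for a miss:
value = data.get("foo")
```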


Using the PyPy runtime would be advantageous in comparison with the CPython runtime.


Wouldn't that defeat the point the library is trying to prove?


(Author here) I don't think it would, but I have tried PyPy and the performance was worse (probably due to lack of support for uvloop and hiredis), so I've decided to stick with CPython.

When I have some time, I'd love to port those two to PyPy and see how it performs (but porting uvloop would be a task on its own!)


The conclusion should probably be that Redis is super slow, not that the Python 3 implementation is particularly fast. Also, it is unclear how Python dictionaries would scale for a large number of keys compared to Redis.


The Python dictionary implementation is among the most optimized on the planet. Given that Python uses dictionaries for almost everything, very close attention is paid to dictionary performance.


That doesn't automatically mean that dictionaries are just as fast with hundreds of thousands or millions of keys. I don't know one way or the other, but it's conceivable that performance could degrade somewhat with large numbers of keys.
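One way to settle it would be to measure directly; a rough sketch using the standard timeit module (dict lookups are amortized O(1), so times should stay roughly flat, though cache misses can add a constant-factor cost at large sizes):

```python
import timeit

# Time repeated lookups in dicts of growing size to see whether
# per-lookup cost degrades as the key count grows.
for n in (1_000, 1_000_000):
    d = {i: i for i in range(n)}
    t = timeit.timeit(lambda: d[n // 2], number=100_000)
    print(f"{n:>9} keys: {t:.4f}s for 100k lookups")
```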


>The conclusion should probably be that redis is super slow, not that the python 3 implementation is particularly fast.

That sounds like the definition of confirmation bias.

How is that substantiated?



