Hacker News
Pydis – Redis clone in 250 lines of Python, for performance comparison (github.com/boramalper)
239 points by antman on Nov 15, 2020 | hide | past | favorite | 124 comments


This is essentially some Python glue for C code.

It uses a C Redis client parser (hiredis), uvloop (Python wrapper for libuv) and the Python hashmap implementation.

All the performance critical code is implemented in C.

I appreciate the point that higher level languages can be more than fast enough for many use cases. A tremendous amount of software is written in Python, PHP, Ruby et al after all.

But this is not a good way to make that argument, since it almost refutes its own point: if performance matters, you apparently implement it in C/C++/Rust.
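To make the "glue" concrete: in a setup like pydis, the event-loop half of the glue is typically an optional two-liner. This is a sketch, not pydis's actual code; it runs whether or not uvloop is installed:

```python
import asyncio

# uvloop is a drop-in replacement event loop implemented in C (libuv).
# The swap is optional: without it, asyncio falls back to its
# pure-Python selector event loop and everything still runs.
try:
    import uvloop
    uvloop.install()
except ImportError:
    pass

async def main() -> str:
    return "PONG"

print(asyncio.run(main()))  # → PONG
```

The Python-level code is identical either way; only the machinery underneath changes.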


Hi, author here!

Copying my comment from the previous submission[x]:

This was a quite common criticism over r/Python[0] as well, saying that using C modules and then benchmarking "Python" with C isn't fair.

Perhaps it was my wording that caused this confusion, for which I am sorry, but I never meant to compare "pure" Python with C. The point I am trying to make is that, Python with C extensions can be as considerably[1] performant as C code for network or memory bound tasks.

[0]: https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redi...

[1]: It's of course controversial whether 60% throughput is considerable or not.

[x]: https://news.ycombinator.com/item?id=19293590


> The point I am trying to make is that, Python with C extensions can be as considerably[1] performant as C code for network or memory bound tasks.

You don't make that point. You instead make the point that calling C code with glue code is slower than C code alone, and the performance hit of using Python as glue code in your case is a 2x slowdown.


What caused confusion is that you put “written in ~250 lines of idiomatic Python code” in the readme file and didn't mention anywhere that you were in fact using a bunch of C libraries to do the heavy work.

The disclaimer section (which contains this information) was added 44 minutes ago. And it's still misleading. For example about hiredis you write:

> Python extension that wraps protocol parsing code in hiredis.

Where is the mention that the parsing code is actually C code?

------

Last but not least you already posted this a year ago on Reddit[1] and Hacker News[2] (edit: someone else did for HN), but only (partially) addressed criticism very recently.

[1] https://www.reddit.com/r/Python/comments/awav6k/pydis_a_redi...

[2] https://news.ycombinator.com/item?id=19287717


I don't like your tone; this was a one-off experiment (that did not even gain much attention back then) that I did for fun, and no one is entitled to my time so I have addressed those issues when I wanted to.

I am not the one who posted here, why so aggressive?


Fair point, here is a pull request that addresses my criticism: https://github.com/boramalper/pydis/pull/15


> as long as the heavy work is done by calling native non-Python code.

Do not merge that. I once realized that grep is really fast and replaced our custom solution with a bash script, but I still claim that the app is written in idiomatic PHP/Python/Ruby. This project has a lot of problems; that it depends on C code is only a symptom. Noting all those bad things in a project README is a trap I fall into way too often; it never works.


I don't think the tone was hostile, and even if you ignore any hint of tone you might hear, you have to admit that the title is really click-baity (yes, I understand that you didn't post it). Regardless of the criticisms of the project, it's great that you're experimenting and I would be curious to know what's causing the reduction in performance ... is it only part of the program or is that the performance of the Python code being interpreted.


> I would be curious to know what's causing the reduction in performance ... is it only part of the program or is that the performance of the Python code being interpreted.

Mostly the latter but also a bit of the former: I could have optimised it further — at the expense of idiomaticity — but I believe gains would be marginal. Other than that, Python is not that fast so you pay some penalties there too.


Don't listen to these mindless code golfers. Your experiment does indeed prove your point imho.


From your README.md:

> pydis is an experiment to disprove some of the falsehoods about performance

and your comment:

> The point I am trying to make is that, Python with C extensions can be as considerably performant as C code for network or memory bound tasks.

So your point regards the false belief that Python with heavy use of C extensions is much slower (in this case ~60% of the throughput) than C? I don't think that such a false belief exists, and therefore pydis disproves nothing.


You're missing the point. Obviously, you can write your application in Cython and have decent performance.

The entire purpose is to demonstrate that you can build a reasonable performance application built on high performance libraries and stick with idiomatic Python code.


This, one hundred percent. There are three tiers of things happening here.

1) Write Redis in C. Hard for some devs who only know Python, but good performance.

2) Write Redis in Python with C libraries. Doable for a Python dev, performance hit.

3) Write Redis in pure Python. A fun experiment to see the performance limitations of Python when compared to C running on the same machine.


The thing is that the title/project makes you think it's 3), when it's actually 2).

It feels a bit sensationalist to me, and I see this all the time with projects these days. It's kind of disappointing.


But the claim is that he is dispelling common falsehoods. Are there actually people out there saying that python glue with c libraries for performance critical parts is going to be orders of magnitude slower than an entire thing written in c?


I actually don't like Python and much prefer Java, but I think #3 is really only for performance research and like, fun and games. It would be cool to have a quantified answer to "just how slow is Python"? I suspect the answer is many orders of magnitude slower.

However, if the experiment in question is, "using python glue code, can a developer string together a service that replicates the functionality of redis", then that illustrates what would actually happen in the real world.

For example, for a very junior Python developer, writing something in C is basically like asking a C developer to write assembly- possible, but they'd have to learn a lot of different things to do it.

Similarly, when I write Java, I want to be calling Arrays.sort(myarray) instead of writing my own sort (if the focus is on developer convenience and development velocity).

So, it's just a difference in what you're desiring to measure.


> You're missing the point. Obviously, you can write your application in Cython and have decent performance.

"Decent" is a weasel word, which suggests that the performance hit of following this approach is tolerable or acceptable. Sometimes it is, sometimes it isn't. It really depends on where you place the glue code, and where you expect the hot path to be.

In the case of a self-described "Redis clone" which ends up being Redis itself glued back together with Python, I'm not sure you can make that case.

> The entire purpose is to demonstrate that you can build a reasonable performance application built on high performance libraries and stick with idiomatic Python code.

I'm sorry to say but that isn't much of a demonstration. I mean, Python is already widely used as glue code to hold together high performance libraries, and it's already widely known and understood that gluing together code with Python brings in a significant performance hit. And that's ok, because people do it as a tradeoff between performance and some other kind of value-added.

But in this case I fail to see what value-added is being proposed, other than the personal portfolio side of things.


> You're missing the point.

Well, your claim is that the author has missed the point, because I just quoted the authors point and his intended purpose. So I guess we both agree.


> I don't think that such a false belief exists and therefore pydis disproves nothing.

You don't think such a false belief exists, therefore you don't think pydis disproves anything.


Some of the comments are nitpicking, for sure. A good benefit of Python is that it abstracts a lower-level language into a higher-level one, so that you get some percentage of the raw speed of the lower-level language but in an "easier-to-use", "easier-to-reason-about" and safer format.

On some level, it's like complaining that C# is just using C++ modules because the CLR that interprets the C# byte-code is written in C++.

But it misses the point. You wrote this in Python and you were able to leverage the speed of a much more complicated language for your benefit, without touching the lower-level language and without worrying about its memory management or library dependencies. This is a win, and it's part of the beauty of programming!

Btw, my compliments to you on writing this! Really cool project/poc.


This experiment reminds me of a rant from Ryan Dahl on software a few years back:

> In the past year I think I have finally come to understand the ideals of Unix: file descriptors and processes orchestrated with C. It's a beautiful idea.

We just keep moving up the stack, it seems :)

Source: https://gist.github.com/cookrn/4015437

(edit: formatting)


The older I get, the more I appreciate the simplicity and power of C, text files, a terminal, and TCP/IP


If you ran it with PyPy, would it be faster?


I think the use of uvloop is reasonable, since it's not something domain specific to Redis.

But doesn't the inclusion of hiredis invalidate the entire point? It's 7500 lines of C specifically for working with Redis.


I agree with this.

If you write an application in Python that spends the majority of the time in C-based extensions – for example if I were to write a differential equation GUI using numpy and pyqt – then I think it's still valid to say that I wrote the application in Python. In fact even most of the standard library modules are written in C (e.g. zipfile, io, sqlite3) so it's hard to avoid this. You could say that no program is ever really written in Python but I don't think that's a useful point to make.[1]

But if you write a load of C code that's very specific to your application, wrap it in a Python extension, then call it from Python, then I don't think you could argue that you "wrote it in Python". If someone happened to have written that application-specific extension module for you then the situation is not materially different.

([1] A more useful notion is whether your code is vectorised. That is, if you want to do something with a data structure (e.g. mutate each element, find an element, do matrix multiplication), do you have to loop over the elements in Python or do you make a single call into a library that does the looping in C? Either way you're still ultimately going to be calling C code, if nothing else in the Python interpreter itself, but the distinction is obviously still important.)
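That distinction can be sketched with nothing but the standard library: `sum()` does its looping inside the interpreter's C code, while an explicit `for` loop executes bytecode per element. (Timings will vary by machine; this is illustrative only.)

```python
import timeit

data = list(range(100_000))

def python_loop(xs):
    # Every iteration executes Python bytecode: fetch, add, store.
    total = 0
    for x in xs:
        total += x
    return total

def vectorised(xs):
    # A single call; the looping happens inside the interpreter's
    # C implementation of sum().
    return sum(xs)

assert python_loop(data) == vectorised(data)

t_loop = timeit.timeit(lambda: python_loop(data), number=50)
t_sum = timeit.timeit(lambda: vectorised(data), number=50)
print(f"explicit loop: {t_loop:.3f}s   sum(): {t_sum:.3f}s")
```

Both paths end in C eventually; the question is how many trips through the interpreter loop you take to get there.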


The Python zipfile module is actually just that: Python. It performs well enough; with compression, zlib is the limiting factor anyway.


OK, by my own rules that means it's "written in Python" then. But in that case what I actually meant is that majority of time in calls to zipfile is spent in pure C, not in the Python interpreter loop, and if it's based on zlib then that agrees with what I meant.

(There is an exception I didn't mention: if the zip is encrypted then the zipfile module falls back to pure Python code. The difference is very noticeable!)
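A minimal in-memory roundtrip shows the division of labour: the Python-level zipfile code handles headers and the central directory, while the byte crunching is delegated to the zlib C module. A sketch:

```python
import io
import zipfile

payload = b"hello redis " * 10_000

buf = io.BytesIO()
# zipfile (pure Python) writes the local headers and central
# directory; the actual compression runs in the zlib C module.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("payload.txt", payload)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    restored = zf.read("payload.txt")

assert restored == payload
print(len(payload), "->", len(buf.getvalue()), "bytes after compression")
```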


uvloop also unfairly lets Python take credit for concurrent request handling performance too though, no?


I'm not sure I see what is the problem here. Gluing together pieces of C code is one of the main reasons to use a scripting language like Python.


The problem is how the author frames it in the README: "an experiment to disprove some of the falsehoods about performance and optimisation regarding software and interpreted languages in particular", "unfortunately many programmers [...] spend countless hours by making life harder for themselves in the name of marginal performance gains, often trading many other conveniences (such as type safety)".

Between the heavy use of C libraries and correctness issues (https://github.com/boramalper/pydis/issues/12), I don't see how this project proves that Python is faster or safer than other languages.

That being said, the speed with which it was developed can certainly be seen as Python's strong point.


I definitely see both sides. Python is meant to be used exactly like this. It's an exemplar of python-as-library-glue.

I also saw the title and clicked on it because I thought to myself "no fricken way...in pure python?" It's a touch baity.


It's a failed experiment — an experiment nonetheless.


Yes but if to achieve good performance in Python you have to implement all the domain specific code (in this case the redis protocol) in C then you:

a) aren't writing Python

b) invalidate all of the points for writing in Python, i.e. speed of implementation, etc.


> All the performance critical code is implemented in C.

Isn't this why people like python?

Just glue together some libraries that someone else has written and then you write a viral blog/twitter post about your <insert trendy thing> that you wrote in just 10 lines of code?

Seriously, isn't this the main reason people use it? I.e. it has a big library of stuff, so it is easy for beginners and/or non-programmers to use?

I'm struggling to think of any other reasons to deliberately pick Python to start something new in, although it is at least a bit more sane now with types.


Python is an extremely expressive language, that lets you prototype and experiment very rapidly. In many cases, with some cleanups, refactorings, and calls to C modules like numpy, you can convert those prototypes into sufficiently performant real implementations.

Here's a recent write-up of mine where almost all of the time is spent in C extensions but where it was much faster to develop and code in python: https://www.jefftk.com/p/sharding-the-brigade


In my professional environment we use python because of the cost, definitely not because we want to post it anywhere.

And this is not an opinion on how companies measure cost, or on one language being better than another; but explaining the popularity of Python, which I think is underestimated, needs a little bit more insight.


All the reasons you list are indeed reasons to use Python, but none of them require performance critical code to be written in C.


The important technical question is how good a Redis replacement can be built with 250 lines of Python "glue code".

If performance is good enough, what about robustness and features? How large are the effort gap and the quality gap compared to the original Redis?


I get the point of hiredis and uvloop and agree to some extent, but including the Python hashmap implementation as "implemented in C" feels like claiming Python programs do not really exist because in the end it's all implemented in C/assembler/CPU instructions.


I think that's because people are kind of forgetting what the concrete thing being discussed even is.

A programming language can be thought of in the abstract, as simply a syntax for computation and programming.

But if you go in the concrete, a programming language implementation translates language into machine code for a particular machine.

From that angle, the CPython interpreter is pretty bad at doing that. This experiment actually shows it: the result is 40% slower even if all the Python does is orchestrate the calls between C code.

This is true of the hash map as well. Python is so bad at generating efficient machine code from its own language syntax and semantics that it would be unreasonably slow to implement a hash map in Python. Why? Because Python is really slow.

In order to speed up programs written in Python, you need to write more of your program in some other faster language, such as C.

Now if you reuse a bunch of components written in fast languages, like hash map being implemented in C, well you avoid the need to write the C code yourself. But it doesn't make Python any more efficient.

In practice though, you might be able to not have to implement 90% of your application, because there exist C libraries that do most of what you already need. Thus only 10% of your application will have degraded performance due to using Python. And for that 10%, it might be Python is a great choice, because you like the language and productivity, readability and all it gives you in order to implement that 10%, and can deal with 10% of the code being slower.

But if there doesn't exist existing fast libraries you can leverage, well more and more of your program will be Python, and thus it'll get slower and slower, unless you start using C yourself and implementing your own C code libraries.

Yes, that's true of hash map as well. For example, if for any reason you needed a hash map that behaved differently than the standard one, you would most likely need to write it in C to get any performance out of it. Writing it in Python would be just too slow.

So to recap: Python is a slow language, because there is no known interpreter/compiler for it that generates fast machine code for a real machine. This is not a falsehood, but the current reality. It turns out, though, that most applications can leverage existing C libraries for a large amount of their functionality, so you can build an app that is reasonably performant, with the only code you need to personally implement being written in Python.
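To make the hash-map point concrete, here is a toy open-addressing map in pure Python. The class and names are invented for illustration; every probe below runs interpreter bytecode, whereas the built-in `dict` does the equivalent probing loop in C:

```python
class ToyDict:
    """Minimal open-addressing hash map with linear probing.

    Fixed capacity (a power of two), no resizing or deletion:
    a toy only. Each lookup step is Python bytecode; CPython's
    built-in dict performs the same kind of probing loop in C,
    which is why a pure-Python replacement is dramatically slower.
    """

    _EMPTY = object()

    def __init__(self, capacity=64):
        self._slots = [self._EMPTY] * capacity

    def _probe(self, key):
        mask = len(self._slots) - 1
        i = hash(key) & mask
        while True:
            slot = self._slots[i]
            if slot is self._EMPTY or slot[0] == key:
                return i
            i = (i + 1) & mask  # linear probing

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self._slots[self._probe(key)]
        if slot is self._EMPTY:
            raise KeyError(key)
        return slot[1]

d = ToyDict()
d["SET"] = "value"
print(d["SET"])  # → value
```

Functionally it works; benchmarked against `dict` it would lose badly, which is the point being made above.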


A better point would be .NET 5 server implementation of gRPC.

https://devblogs.microsoft.com/aspnet/grpc-performance-impro...

However, Python is never going to get there.


But the operations are all done on Python's data structures?


aren't they also written in c though


Ya. This whole post confused me. I thought it was claiming that the Python port was faster. It's too early in the morning.


That's taking it too far though isn't it? I mean, gcc is written in C++ too so what? :)

You might also be interested in my other comment: https://news.ycombinator.com/item?id=25101200


While what you said is true in general, I believe it is debatable in this case, as Redis (at least the subset implemented in this experiment) is something very, very special: it is essentially a hashmap attached to the network. There isn't much left once you offload the "hashmap" part and the "attached to the network" (I/O + protocol parser) part. In other words, in this case the hashmap implementation itself is the business logic of your application.


The really annoying thing is that you can quite happily write shorter code in D than in Python and have it run as fast as C++. One of the reasons D compiles so fast is because of this: the compiler isn't even specifically optimized for speed.

Everyone seems to think that writing expressive code is slow, when really it's because Python is slow and C++ is a lumbering behemoth.

Python isn't even a high-level language; think about it: it's higher level than C, but the capability for abstraction it gives you is really just interpreted C with classes.


Why did D never take off?


Bad timing and the GC


It could be significantly faster if using io_uring


This is a straw man argument. Python's associative array code is written in C, not in Python, and Redis is written in C. So you're comparing C to C. The 40% loss in performance is due to Python being much slower at doing the stuff that's actually written in Python.


Although both CPython's and Redis's hashtables are implemented in C, CPython's open-addressing-based hashtable[1] is far superior to the simple chaining-based hashtable[2] in Redis. Python's performance is heavily dependent on its hash table implementation, which has been optimized over and over for decades now.

[1]: https://github.com/python/cpython/blob/master/Objects/dictob...

[2]: https://github.com/redis/redis/blob/unstable/src/dict.c


Well of course it is calling C. The point of this argument is that you don't have to go all in and write your code in C to get good performance. You can just write good Python and get almost as fast code as C that will take you 5 times less time to develop.


>>You can just write good Python and get almost as fast code as C that will take you 5 times less time to develop.

In this case we're talking about a >2x slowdown.

The service also has fewer features.

You're talking about development time in a context where development time is largely irrelevant, as this is a service that people reuse and pick based on performance, because lower performance means higher costs to scale enough to meet requirements.


He is comparing a version 4 release of a product used by millions with a POC. I suspect if you took a look at an early release of Redis, performance and features would be a lot closer.


OP specifically did a Python vs. C comparison. Your assertions have nothing to do with the claim being discussed.

Even so, if you really want to discuss specific technical aspects, you would do well to keep in mind Python's notorious and widely known performance problems, such as those due to Python's GIL, and the fact that even in performance-oriented benchmarks Python lags way behind other languages, especially C:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

If it was easy or possible to mitigate Python's notorious performance issues then we would see the results in these synthetic benchmarks. But we don't, no matter how much time has been devoted to them.

Speaking as someone who writes Python for a living, Python aficionados would do well to avoid misrepresenting Python in ways that a) are easy to verify and refute, and b) go against its main technical characteristics. Python is awesome for churning out quick and dirty POCs, exploratory code, glue code that ties together performance-critical parts, utilities, and non-performance-critical applications. Performance-critical applications are not, nor were they ever, Python's thing. Once Python aficionados start trying to claim that their hammer is the best screwdriver around, we start to sell a losing proposition.


>You can just write good Python and get almost as fast code as C that will take you 5 times less time to develop.

Sure, if someone has done the actual core work in C already that is.


Which, for uvloop, somebody already had - for a completely different purpose.

I think most people are usually quite surprised at how concentrated and how generic most hot paths can be. There's a heavy power-law distribution over the lines of code where most programs spend their time.


This thing has all kinds of thread-safety issues which, if actually addressed, would make the implementation significantly larger and slower. I'm not really sure what the point of this is, other than to say that Python dicts are pretty fast. But we already knew that


Redis itself is single-threaded though (or was, it's only "mostly" single-threaded now)


Yes, but the point is that you can only do that in some special cases like this one.


But isn't this exactly what the author is demonstrating? You can use the higher language of Python with its garbage collection, and since the critical parts are in C anyway, it ends up being competitive to the pure C implementation.


"competitive"

Right, so let me get this straight, the argument is:

You can implement an arbitrary system in Python and it is 'fast enough' to be usable and 'better' in terms of 1) speed to develop, 2) lower complexity (i.e. fewer LoC, easier to maintain) and 3) the garbage collection doesn't matter?

I strongly disagree.

I've worked in python for a long time, and it's a great glue language... but, it's not suitable for implementing high performance systems. Flat out.

Not. Suitable.

If the system you're developing is a mild variant on 1) something that already exists and 2) is implemented in a lower level language, then yes, python is a reasonable glue language to link together native modules.

That's why many of the machine learning frameworks use python; because it's great at allowing you to express 'high level concepts' using low level primitives.

However.

It is not suitable for implementing low level primitives; because its too slow and single threaded.

So... you might argue that this redis implementation uses enough pre-existing code that someone else has written that it is reasonably performant, but... once you go beyond the 'trivial' implementation that uses someone else code, you'll find it's really not suitable for this kind of use-case.

I love python; but this is... it's just wishful thinking.

Just because you like python, does not make python suitable for every workload.


You say that Python is unsuitable here but isn’t that subjective?

60% of the speed of Redis with its core functionality could be more than suitable for some.

> It is not suitable for implementing low level primitives; because its too slow and single threaded.

Redis was single-threaded for much of its life. That didn’t stop it from excelling.

I’ll add that I also have worked in Python predominantly. One of the things that frustrates me about it are the packages that use lower-level language bases that need compilation during a ‘pip install’. Hunting down dependencies gets old pretty fast when you were expecting ‘just Python’. For those that are ok with that and the performance hit, Python can for sure be a suitable tool for use cases that would traditionally be tackled lower down the stack.


Not really. Python is slow by sufficiently many orders of magnitude that it is not possible to implement low-level primitives in pure Python.

There are no pure python low level primitives; everything is either a) wrapper, or, b) slow as hell and uses memory like a hog.

Python that wraps another language is a perfectly good way to doing things; but those low level primitives are never written in python.

...and neither are the low-level primitives used in this case (→ https://github.com/redis/hiredis).


I think we’re on the same page here with thinking Python wrapping another language is an alright way to do things.

There might be room for going lower with Python via interpreters like PyPy. Memory usage will still be high but speed will improve and you get the benefit of Python’s ease of use. For some that’s what matters most.

Personally I’m looking for a language that marries Python’s ease of composition and simple package building with typing that speeds up development in an IDE. I haven’t found that language yet. I’d be interested to hear any suggestions you have if you’ve gone down that road.


That's not how it's framed though. It mentions nowhere that most of the work is actually done in C and he literally states that "The aim of this exercise is to prove that interpreted languages can be just as fast as C" which is just incredibly misleading.

Or look at the start of the readme, he claims that he wants to disprove "some of the falsehoods about performance and optimisation regarding software and interpreted languages in particular". What falsehood exactly? To me it seems he intended it to be "interpreted languages are slow", but he doesn't disprove that at all.


Well, 40% slower isn’t exactly what I’d call competitive...


In what universe? People use Java, C#, Go, Node etc services that are 40% or more slower than equivalent C ones, and they're just fine with it...


Yes, it’s probably fast enough to be perfectly usable.

The point here is that implementing new features will be impossible, because this relies on an existing implementation that is not in python.

It is therefore not a proof that arbitrary high performance applications can easily be written in python.

It is simply an example of how high performance applications based on existing implementations someone else had written in another language can be wrapped in python.


>The point here is that implementing new features will be impossible, because this relies on an existing implementation that is not in python.

Isn't that the opposite? Implementing new features will be easier, because it uses the C backend as a helper library (for parsing and eventing), so all the business logic is Python, which is easier to extend.

This is also why people use embedded Python/Lua/etc. in games, 3D programs, and so on.


If it was pointed out to their higher-ups that their infrastructure costs could be reduced by a sizable amount by changing language they might not be so fine with it.


The last thing most higher-ups care about are infrastructure costs and optimizations.

And in many cases you're the higher up, and startup founders take decisions to use slower but more flexible technologies every day...


No, because you'd rely on the fact that the time critical execution paths in your python application happens to be in C. This is clearly not true in general and therefore pydis just demonstrates a special case.


Not just that, but hitting 50% or 60% of the target performance is usually not that difficult. It is always the last 10-20% of extra performance that is really difficult to hit and that might influence design decisions early on. Some of these frameworks really have tiny performance differences, in the order of single-digit percentages. Hitting 60% of any framework's performance is not really a feat.


What if you run it in pypy?


So, if my Python program uses a lot of dictionary operations, can I claim that my Python program is practically written in C?


All the code of this project is in Python, so who cares about what it is using under the surface?

With that reasoning, you're actually comparing machine code to machine code. Why even compare the performance of any languages at this point, since you're comparing machine code to machine code in the end?


> All the code of this project is in Python, so who cares about what it is using under the surface?

Because this can be considered a special case, where the required critical logic is available as a library, which is not something that can be assumed as a general case in product development.


This feels a bit like “Thinking quickly, Dave constructs a homemade megaphone using only some string, a squirrel, and a megaphone.”


Hi again, author here!

I am so pleased to see your interest, thanks a lot! I have addressed a few points here, and just updated the repo, so please keep that in mind while browsing the comments:

1. Some people found the README a bit arrogant/bratty --- with which I agree, sorry! --- so I have edited it.

2. Some people found it clickbait-y that it says Python whereas there are two C libraries (uvloop and hiredis) doing the heavy lifting for the event loop and parsing, respectively. I have added this fact as a disclaimer in the README, and ultimately it is up to you to decide whether that counts as cheating or not. :)

3. I have addressed some major correctness issues too (#12), but it is still not 100% correct.

4. This is a proof-of-concept that I wrote in a day (+ spent today fixing some bugs), which you are comparing to 11-year-old industry-grade software. =) Don't take the experiment too seriously!

Kindly,

Bora


I hope you didn't take too many of these comments to heart. You unfortunately got the standard HN comment section treatment – a delightful mix of nitpicking, assuming bad intent, and completely missing the point. I also hope you continue to share things on the internet – it's far easier to criticise than it is to create.


Thanks for creating and sharing this! Always easier to criticize than create. I for one think your code reads as quite Pythonic. Using convenient Python wrappers for C libraries is the raison d'être for Python in the first place. See numpy, pytorch, ...


Very cool! Making C integrations available with nice syntax and managed memory is such a huge win for Python. This POC glues together a hashmap implementation, async event loop and redis protocol parser (all in C) into a very readable (and quite performant) program. You could probably add pub/sub support in another 50 lines as well, since asyncio is already present.


I appreciate the example code. I didn’t know that asyncio had a Protocol class that you can so easily just extend to make stuff like this. I’m saving this away for future work.
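For reference, this is roughly the shape of it: a minimal, self-contained echo-server sketch built on `asyncio.Protocol` and `loop.create_server`. This is not the pydis code itself, just an illustration of the callback API; the port number is arbitrary.

```python
import asyncio

class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        # Called once per connection; keep the transport to write replies.
        self.transport = transport

    def data_received(self, data):
        # Called whenever bytes arrive; echo them straight back.
        self.transport.write(data)

async def main():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 6399)
    # Round-trip one message as a client to show it works.
    reader, writer = await asyncio.open_connection("127.0.0.1", 6399)
    writer.write(b"PING\r\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))  # b'PING\r\n'
```

Pydis does essentially this, but with a RESP parser in `data_received` instead of a plain echo.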


Pydis code is awesome: https://github.com/boramalper/pydis/blob/0a51b6e31475dab083d...

I've been a Python dev for roughly a decade and had no idea about asyncio.Protocol, asyncio.transports.Transport, or that asyncio's event loops could be used to create a TCP listening server.

I always thought of asyncio as synonymous with async def and websockets. But perhaps the websockets should have clued me in that it's an actual server protocol.

Between this and "On the beauty of Python's ExitStack" (https://www.rath.org/on-the-beauty-of-pythons-exitstack.html) it's been a good week.

I was hoping that it implemented a Redis protocol reader, but pydis uses hiredis.Reader(). I wonder if there's a pure python version...
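For the curious, the simple cases don't take much code. A rough pure-Python sketch of a RESP parser (hedged: it assumes one complete value is already in the buffer, with none of the incremental/partial-read handling that hiredis.Reader provides):

```python
def parse_resp(buf):
    """Parse a single RESP value from bytes; returns (value, remainder).
    Covers simple strings, errors, integers, bulk strings and arrays."""
    line, _, rest = buf.partition(b"\r\n")
    kind, payload = line[:1], line[1:]
    if kind == b"+":            # simple string, e.g. +OK\r\n
        return payload.decode(), rest
    if kind == b"-":            # error reply
        return Exception(payload.decode()), rest
    if kind == b":":            # integer reply
        return int(payload), rest
    if kind == b"$":            # bulk string: length, then payload
        n = int(payload)
        if n == -1:             # null bulk string
            return None, rest
        return rest[:n], rest[n + 2:]
    if kind == b"*":            # array: length, then that many values
        items = []
        for _ in range(int(payload)):
            item, rest = parse_resp(rest)
            items.append(item)
        return items, rest
    raise ValueError("unknown RESP type: %r" % kind)

print(parse_resp(b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n")[0])
# [b'GET', b'foo']
```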

./benchmark.sh produced some warnings related to a lack of CONFIG GET, so I added a basic one just now: https://github.com/boramalper/pydis/pull/13

The benchmarks are indeed pretty close to Redis on my MBP!


asyncio was introduced relatively recently, in Python 3.4. There wasn't anything for you to see before that.

I'd wager it only became widely usable very recently, since most development was stuck writing code compatible with both Python 2 and Python 3, precluding the use of asyncio.


The problem with inheriting from asyncio.Protocol is that it ties your code to the callback-based, low-level asyncio API; I'm not surprised that this package doesn't have any tests.


> falsehoods about performance and optimisation regarding software and interpreted languages in particular.

At this point, I think this argument will fail to convince many. There are far too many people who have gotten 10x, 100x or even higher speedups by simply converting Python to other languages. That is without even trying to optimise the code much.

CPython running idiomatic Python code is just slow, no amount of handwaving will change that.


BTW, the first version of Redis was actually built in equally slow Tcl and later rewritten in C.


Curious as I'm already in the industry, but I had a look at this person's GitHub profile and it says that he's still looking for graduate opportunities (finishing uni soon).

Someone of this calibre should easily get a job after graduation, right? Or are college graduates these days trained to a level where this type of project is expected, hence it's still competitive to find an entry-level role after college?


I’d rather take the humble exploratory academic than a cocky programmer. Just as an example he writes:

“ Unfortunately many programmers, due to their lack of experience, of some knowledge of computer architecture(s), or of an in-depth understanding of the task they are given, spend countless hours by making life harder for themselves in the name of marginal performance gains, often trading many other conveniences (such as type safety, garbage collection, etc) too.”

And then he himself doesn't use type safety, making the text seem rather superfluous. As others have brought up, he then uses Python wrappers for the heavy lifting. It's just the kind of very shallow thought process that you are supposed to hammer out in university.

Modify the README to just say: I wanted to scratch an itch so I built PoC for a Redis alternative, and it would have been an amazing achievement. Now? Not so much.


> it would have been an amazing achievement

250 lines of Python code wrapping C libs is an amazing achievement..?

This is so small I wouldn't even classify it as a project personally.


Author here:

I am not arguing that this is an amazing achievement, but I think you are being overly dismissive here, which I find unfair, and that you are missing the point: it is 250 lines because I was experienced enough to implement it in 250 lines.

Again, it's a small proof-of-concept, but the concepts employed there aren't necessarily beginner level:

1. Asynchronous I/O using `asyncio.Protocol`.

2. Sans I/O networking, see https://sans-io.readthedocs.io/ .

3. Python performance optimisations (e.g. knowing where to choose `collections.deque` over `list`)

4. Knowledge of the modern Python ecosystem (e.g. using uvloop for the event loop).
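As a small illustration of point 3, here is a rough micro-benchmark (absolute numbers will vary by machine, but the asymmetry should not): popping from the front of a `list` is O(n), while `collections.deque.popleft()` is O(1).

```python
import timeit
from collections import deque

n = 50_000

# Drain a list from the front: every pop(0) shifts the remaining elements.
list_time = timeit.timeit(
    "while xs: xs.pop(0)",
    setup=f"xs = list(range({n}))",
    number=1)

# Drain a deque from the front: popleft() is constant time.
deque_time = timeit.timeit(
    "while xs: xs.popleft()",
    setup=f"from collections import deque; xs = deque(range({n}))",
    number=1)

print(f"list.pop(0):     {list_time:.4f}s")
print(f"deque.popleft(): {deque_time:.4f}s")
```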


I was responding to the parent comment saying it would have been an amazing achievement. I haven't passed any judgement on the code, and I never stated it was beginner level.

> It is 250 lines because I was experienced enough to implement it in 250 lines

If you had written a clever algorithm I would agree, but the code seems pretty straightforward. I'm not really sure how being inexperienced would make someone unable to implement this.

Yes, there's some knowledge required to do what you have done, but given that the implementation seems relatively straightforward and it doesn't seem to have much substance (only 250 LOC, and you said you wrote it in a day), I would _personally_ not classify this as a project.

I think you are reading too much into an offhand comment.


I am sorry to see that you presume I am a cocky programmer. :)

> And then he himself doesn’t use type safety making the text very superfluous.

You are partially right, in that I did not use type annotations thoroughly in my code. Though what I had in mind was that there are much "better" languages out there (compared to C) with strong static/gradual type systems.

> As others have brought up he then uses pythonwrappers for the heavy lifting.

Indeed, and you can read my answer to that here: https://news.ycombinator.com/item?id=25101200

> It’s just a very shallow thought process that you are supposed to hammer out in the university.

If you specify what you refer to by "it", I would be happy to respond but it feels more like a personal attack now.


Getting a job as a programmer is less about being a savant programmer and more about your soft-skills like communication and connections, at least if you're aiming to work in a team environment.

It helps to be a great programmer in order to get hired, but I never found that to be a requirement, you can always teach people to be great programmers.

But if they are not great at communication, or have other "personal faults" like being easily irritable or arrogant, it's much harder to help them improve without spending a disproportionate amount of time on it.


Landing a job has very little to do with programming skill and very much to do with networking and the way in which you present yourself and what the company is looking for. So the answer to your question depends entirely on external factors.

Python jobs in general seem to be extremely rare compared to frontend JS and backend Rails, which I found disappointing several years ago.

But python jobs do exist, and the author should find no trouble landing one with "Cloned Redis in 230 lines" as a bullet point. :)


Companies these days just send you back an automated coding test on HackerRank/Codility after you apply, so finding a job still has a lot to do with having some programming skills.

There have been quite a few Python jobs coming up in the past few years, if we're talking about US tech hubs. Location is everything.


>Or are college graduates these days trained to a level where this type of project is expected

Well, I would be unpleasantly surprised if wrapping python's dicts with a simple API was beyond the capabilities of an average college grad.


Author here, I have recently been offered a job at Stripe, but did not update my bio. Thank you for your kind words!


Others have already given possible reasons, another might be that they don't want to accept certain jobs or opportunities and are overly selective. They may not want to work yet but are adding any offers to the pile to negotiate better later.

I know a similar person who is good at designing distributed systems and programming in general. They had an offer from Facebook but declined. They want to get a well-paying cybersec job, but alas there aren't any at good companies. Currently working gigs and broke.


As a fresh grad, I'd say the latter. Especially if you are applying for software-only roles in large web companies. Side projects often count for nothing in these cases. It's more important to be able to whiteboard.


Yeah, yeah, Python is fast... nevermind that my program (that does real work) that I ported from idiomatic Python to idiomatic Rust became around 140 times faster. Real world doesn't matter, synthetic benchmarks (with clever calling of C code) are the way to go :)

PS. The Rust version is also easier to read, maintain and understand (for someone who groks algebraic types and Rust in general), not to mention a lot safer.


Speaking of real world, in your case, would the Rust version come into existence without the Python version being written in the first place?

Here's a real world example:

https://github.com/sripathikrishnan/redis-rdb-tools - python, optional C library dependency to liblzf, fully-featured, pure-python performance is decent enough when using pypy

https://github.com/badboy/rdb-rs - rust, inspired by the former, not fully-featured, abandoned, may or may not have great performance

Rust is great, but I just want to contend that the real world picture is a lot more nuanced than how you painted it.


I don't know how I painted it in your eyes, but at least my intention was exactly to point out that the real world is nuanced.

I never said that python should not exist. I celebrate its strong parts like being a great first language and maybe, if you know what and why you are doing, great language overall in some specific situations.

My comment was mostly against trying to pretend that Python is fast, which it is definitely not, and everyone who knows another language can very easily prove it to themselves and to the world.

Regarding this specific library... Redis is not really very useful to a Rust program, because one usually loses performance by using it, instead of gaining some (which would be the case for a Python program). One would either use native data-structures for simple things (read, where Redis would be useful) or a "real" database if something more complex is needed.

If working with Redis (and its dump files) was some real priority, the rustlers would come up with something with equal quality to the python library you linked. Maybe something like this, but for Redis dump files: https://github.com/serde-rs/serde


I spend a lot of time giving advice to people changing careers into tech. I agree with a lot of the criticism here, but I think the author's choices are reflective of the new grad hiring climate.

Hiring for any competitive tech job relies on having a stellar resume with impressive sounding projects, and ideally a degree from a top CS school. Interviewers have little time to review projects, so new grads market small projects like this as much more impressive than they are, knowing they'll only be asked about them in passing as a break from algorithm problems.


You aren't benchmarking a Redis implementation, you are benchmarking a Redis wrapper


This hack seems to be missing a whole bunch of Redis functionality that makes Redis suitable for production workloads (clustering and persistence, just for a start). And I wonder what performance would be like once all that stuff is added in.

Still, nice and easily readable program - great for education.


The thing we need to time is how long it took the author to write the code vs. writing it in C. Number of lines of code is a proxy, but not always an accurate one.

The emphasis on expressing ideas clearly, sometimes at the cost of performance has been a good bet for python in terms of data science fueled market share. But it does open the door to some competition.

I'm also curious how this performs versus similar code in Julia

https://github.com/markmo/HiRedis.jl

This could help us understand the sweet spot between runtime performance, time to code, understandability, and type safety.


HiRedis.jl is a redis client, not a server. So not directly comparable without a server implementation.


There was a moment when Tarantool[0] was looking for a market niche and checking out Redis' turf. They published performance tests showing a 2-3% upside. That was done by a team of experienced C devs spending months trying hard to beat the original. A 40% gap is very big by that measure.

[0] https://www.tarantool.io/


I've had a nearly-pure Python "toy" implementation for a long time. The source for it isn't public, but it's based off this: https://github.com/rcarmo/miniredis

I use it as a single-file, mock Redis server for CI testing :)


As there is much of the same criticism here as there was March 2019, you might find the last submission interesting: https://news.ycombinator.com/item?id=19287717


Reminds me of http://www.underengineering.com/2014/05/22/DIY-NoSql/

What I'd love to see in Pydis is a durability option.


Well, 2x the speed for using a lower-level language doesn't sound like a bad deal to me, especially given we have a few fast, modern languages which are as ergonomic as Python.


I believe you can turn off garbage collection in Python and manage it yourself? How much of a speedup would that give you?


How do you do that?

Even if you can, it's unlikely to give you a 10-100x speedup, which other languages can often give you compared to Python. The garbage collector does not take anywhere near even half of the CPU time.


    import gc
    gc.disable()

Call gc.collect() when you’re at a good point to afford it and need it.

I've done this before in really deep inner loops where speed is important and shaved off some significant time.

It’s also useful if you have something time-critical and don’t want to be interrupted, e.g. profiling.
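A sketch of the pattern (the workload below is just a hypothetical stand-in for allocation-heavy code; the try/finally makes sure the collector comes back on even if the work raises):

```python
import gc

def build_graph(n):
    # Allocation-heavy work; cyclic-GC pauses can add jitter here.
    nodes = [{"id": i, "edges": []} for i in range(n)]
    for node in nodes[1:]:
        node["edges"].append(nodes[0])
    return nodes

gc.disable()
try:
    graph = build_graph(100_000)
finally:
    gc.enable()
    gc.collect()  # catch up on any cycles created in the meantime

print(len(graph))  # 100000
```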


This affects only the cycle-detecting part of Python's GC; most of the work is done by the refcounting system, which you can't stop or pause.
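You can see the split for yourself: with gc disabled, a reference cycle is never reclaimed, because refcounting alone can't free it (a quick demonstration, run at module scope):

```python
import gc
import weakref

class Node:
    pass

gc.disable()
a, b = Node(), Node()
a.peer, b.peer = b, a          # create a reference cycle
ref = weakref.ref(a)
del a, b                       # refcounts never reach zero: it's a cycle
print(ref() is not None)       # True: refcounting alone can't free it
gc.enable()
gc.collect()                   # the cycle detector reclaims it
print(ref() is None)           # True
```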


To reply to your point: others were saying Python is slower because of garbage collection. I'm just suggesting they try turning it off. Big fat downvote though :-(


Maybe if the author tries to write the README in a less arrogant tone, performance will be even better. Who knows.


One line bash file clone of redis:

./redis.exe




