Show HN: Python can make 3M+ WebSocket keys per second (github.com/szabolcsdombi)
83 points by cprogrammer1994 on July 3, 2023 | 70 comments



The hardest part of debugging Python is "hitting the wall" when you come to a native library (compiled C code). And Python has achieved a lot of speed-ups from Py2 to Py3 by adding more compiled C code. This is a real blocker for understanding the foundation library.

On the other hand, in Java, with a few exceptions around Swing (native painting for GUIs), almost everything is written in pure Java, so you can debug all the way down if need be. That is a huge help for understanding the foundation library and all of its edge cases (normal for any huge library). Modern Java IDEs, like IntelliJ, are so crazy that they will decompile JARs and allow you to set breakpoints and step into the decompiled code. It is mind-blowing when trying to debug a library whose source code you don't have (random quant lib, ancient auth lib, etc.).


Mojo seems like a very promising solution, despite being AI-oriented. Lex Fridman had a great podcast with the founder; its first version was released about a month ago and gained 10k Discord users within two weeks.

It's a superset of Python that, once finished, can run all existing Python code, but it also adds language features that get closer to the metal, reaching the speed of C libraries with the ability to properly debug and step through everything.

If this project goes the way it's promised (the founder has great experience with language development), it could further cement Python by solving its biggest criticisms. Very excited for it.

https://youtu.be/pdJQ8iVTwj8 https://www.modular.com/mojo https://en.m.wikipedia.org/wiki/Chris_Lattner


Crikey: LattBot strikes again! This guy's footprint on compilers and languages in one generation cannot be overstated. His PhD work eventually grew into LLVM and Clang, which lit a fire under GCC and MSVC. The C and C++ compiler ecosystem is now far more competitive and healthy.

He has the magic touch. What you describe sounds amazing. And having LattBot behind it is even better.

I have watched one of his episodes with Lex before. He is absurdly humble about his accomplishments.


Likewise, in C#, almost everything is written in pure C#.

It also gives you more control over memory allocations than Java, which is nice when trying to squeeze more performance out of the code.


    It also gives you more control over memory allocations than Java
Cool! I didn't know about this. Can you share an example? It might foster some good discussion. I haven't written any serious C# in about 10 years now. I still love that language. To me, it's like Java with all the rough edges sanded down. In that era, Visual Studio was the only choice for dev and needed the IntelliJ plug-in. I always thought that was a bit goofy, but nothing against the language itself. And, the COM+ integration is legendary if you need to run on Win32.


Value types have gained more functionality in recent versions.

You can quickly create record structs or value tuples (that auto-implement Equals(obj) & GetHashCode()) and pass them around by reference instead of copying.

  record struct RGBA(byte R, byte G, byte B, byte A);

  // allocate a small collection on the stack
  Span<RGBA> colors = stackalloc RGBA[3]
  {
      new RGBA(255, 0, 0, 255),
      new RGBA(0, 255, 0, 255),
      new RGBA(0, 0, 255, 255)
  };

  // get value by reference
  ref var c = ref colors[2]; 
  c = new RGBA(127, 127, 127, 255); 
  
  // prints: RGBA { R = 127, G = 127, B = 127, A = 255 }
  Console.WriteLine(colors[2]); 

  // value tuple
  (byte R, byte G, byte B, byte A) white = (255,255,255,255);
Heap-allocated generic collections of value types have better data locality than collections of reference types.

A ref struct is a kind of struct that is always stack-allocated and can never be promoted to the managed heap or boxed. `Span<T>` is a ref struct.

Spans let you create views over contiguous regions of memory located on the stack, on the heap, or in native memory. You can pass spans to methods that then read/modify the data in the view.

And there are many other features that help you manage memory or avoid allocations.

For example, when working with interfaces you can avoid boxing value types by using a generic type parameter constrained to the interface instead of taking the interface itself.


C# has structs and Span, which make memory management easier. In Java things are getting easier as well (soon), but Java still has no value types. Then there is sun.misc.Unsafe, which is still a special case and a really, really old API. JDK 19 introduced some new APIs that lift some of this, like the Panama preview and a new vector API.


Hopefully Java is getting value types and primitive types soon-ish.


If you need speed-ups like that, don't program in either.

I strongly doubt any production C# code is significantly faster than its Java counterpart.


When it comes to languages you can draw up a loose hierarchy of potential performance, where languages like C# and Java are broadly equivalent. Attempting to refine it any further descends into a bikeshedding farce.


Java is faster than C#; that's not a "bike-shedding farce", that's a documented and benchmarked fact:

JSON serialization: 20+% faster: https://www.techempower.com/benchmarks/#section=data-r21&tes...

Single query: 20+% faster: https://www.techempower.com/benchmarks/#section=data-r21&tes...

Multiple query: 20+% faster: https://www.techempower.com/benchmarks/#section=data-r21&tes...

Data update: 15% faster: https://www.techempower.com/benchmarks/#section=data-r21&tes...


Why is this being downvoted? Sure the opening line isn't the most gentle, but these are some crazy benchmarks. I never saw this website before.

First link: How is it possible that Java can achieve the same perf as C??? I write a lot of Java, and none of my stuff is close to C. I guess about 50% of the speed -- which is fine for my needs.

One reason (totally unscientific, of course) I think Java is faster: the virtual machine has been open source for longer, so more academics have looked at it, run experiments, and written papers... which are then read by the core Java team and sometimes implemented. C# is a bit behind, but should catch up one day.


> The virtual machine has been open source for longer, so more academics have looked at it, run experiments, and written papers... which are then read by the core Java team and sometimes implemented. C# is a bit behind, but should catch up one day.

That's what it is indeed, more R&D went into Java

C# will catch up eventually, especially now that they're investing massively in PGO/AOT and ways to minimize heap allocations (stackalloc and ref structs, for example).


Techempower benchmarks are web framework benchmarks, not language benchmarks. It's hard to compare language performance from these benchmarks as there are so many other factors here that affect performance.


C'mon, this benchmark properly measures how fast languages are at certain tasks; the library used doesn't matter, the task however does.

A framework is not representative of a language's performance, it is representative of the framework's performance: it might not use the newest language features, it might use outdated dependencies, the developers might have written poor code, it might have bugs.


> Why is this being downvoted? Sure the opening line isn't the most gentle, but these are some crazy benchmarks. I never saw this website before.

I guess people don't like getting fact checked


Because you should really look into these frameworks if you really think it's a good comparison. If you exclude all the unserious Java frameworks you will be stuck with Vert.x; after that there is a huge list of C#, and then there is the rest of Java. I mean, yes, these frameworks might be the next big thing in the Java world, but once they are there, they will lose at least 20% of performance. (P.S. you can also remove the first C# entry, because no sane person will write everything as a middleware.)


If Vert.x's JSON serialization is too slow, you swap that part for something faster; that's how you use languages in the real world: what is this language capable of, and how can I fully exploit it?

Developers nowadays seem stuck in a box for some reason


    you swap that part for something faster
FasterXML (Jackson)?


Do you think that relying less on C code could be better for Python and its community?

I mean less focus on being "glue" between C code and more focus on optimizing the interpreter?


It is unlikely people would use Python if it didn't rely on C code.

If your ML model takes hours when it can take minutes, or takes days when it can take hours, you will move away. You could move away to another language or a faster interpreter but that's a different discussion.

> more focus on optimizing the interpreter

This is good, but there's an upper bound to the performance of interpreted languages. Maybe the Python interpreter could be as fast as V8, but it is unlikely to be as fast as the JVM. People will need to drop down to C / Fortran for whatever compute-intensive work they're doing.


> If your ML model takes hours when it can take minutes, or takes days when it can take hours, you will move away. You could move away to another language or a faster interpreter but that's a different discussion.

How much of this is done in Python, vs constructing instructions in Python to run on a GPU at uber speed?


Widely used libraries for Python such as SciPy, PyTorch, TensorFlow, and NumPy all drop down to lower-level languages (C/C++/Fortran).

I haven't seen many people doing ML in Python without one of these libraries, so I'd say it is mostly constructing instructions in Python and offloading the actual intensive work to these libraries (which may or may not run on GPU depending on your hardware).
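
A toy example makes the split visible. Below is a minimal sketch (array size chosen arbitrarily): the pure-Python loop dispatches bytecode per element, while the NumPy call hands the whole loop to compiled code:

  import time
  import numpy as np

  xs = list(range(1_000_000))
  arr = np.arange(1_000_000, dtype=np.float64)

  t0 = time.perf_counter()
  s1 = sum(x * x for x in xs)   # interpreter loop, per-element dispatch
  t1 = time.perf_counter()
  s2 = float(np.dot(arr, arr))  # one call, the loop runs in compiled C
  t2 = time.perf_counter()

  print(f"pure Python: {t1 - t0:.3f}s  numpy: {t2 - t1:.3f}s")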


Creator of LLVM and Swift is doing just that: https://www.modular.com/mojo


If you ask me, Python not only shouldn't rely on (third-party) C library code, but it shouldn't even be used for cases where pure Python is not optimal.


Right, Python is the new Perl and people are misunderstanding what that means.


Python is Bash++.

Always was, always will be.


But as a signifier it's actually the opposite. Someone who knows Bash in 2023 is likely a decent, well-rounded dev. Python in 2023 doesn't even imply the ability to program.


Ouch. I know the feeling. Is your thinking: If they can program in Bash, they have learned how to program in many other languages, and Bash is their glue to make stuff "schedulable" via crontab, (dreaded) AutoSys, k8s, etc.? For me, learning how to program /OK/ in Bash has been a very difficult journey -- as difficult as Perl and Python, due to insanely weak typing. You are always fighting unknowns when you receive some function parameters.

Related: When I search for help on a Python foundation library or built-in function, my Google search results are overwhelmed by "learn-to-program"-type websites. I guess it makes sense: during the gold rush, don't dig for gold, rather sell shovels!


Lots of schools / boot camps / etc. churning out people who "know python" but are not "python devs". Think data scientists / data analysts / stats / business analysts / etc. who need to monkey around in some data, and are only slightly more capable than an Excel power user.

On the other hand, no one is doing the same for Bash.

You are using Bash because you know how to program something else and need to schedule it / wrap it in a shell script / etc. The fact that someone even knows Bash exists is a filter. Having an idea of where you might need to use it, a second filter. And having successfully done so, a third filter.


The whole point of Python is that it abstracts away the low level stuff and you simply “import” and attack the problem you’re actually looking at. It’s optimal for allowing people who don’t have lots of algorithm design memorised to do higher level comp sci.

Python’s optimal use case does not relate to speed.


Then no one would really use Python? It's far too slow.


On the other hand, you still have to learn and use Java. I think learning Python and a touch of C is easier than trying to learn Java. Heck I'd go so far as to say C is a lot easier than Java. Frankly I don't understand why having to learn a little bit of C to debug a python program is such a red flag or wall to you, considering the sheer amount of learning one has to do to use Java at all. You're setting up an uber-sophisticated IDE to debug "ancient auth packages" instead of just... learning a little C and potentially fixing an up-to-date and beloved library?


> You're setting up an uber-sophisticated IDE

You just install it.

> learning a little C and potentially fixing an up-to-date and beloved library

A romantic thought, but 99% of the time I'm just going to do a workaround or a local patch.

I don't use Java anymore, but I don't hate it. I think it has some verbose conventions, but I vastly prefer it to C's extremely terse conventions.

Nowadays I try to do as much in TypeScript as I can, because I find it a pleasure to use, and it has the same property where you can dive into any lib when debugging.


    uber-sophisticated IDE
I love this. As if any developer is not much more productive with a 4GB+ RAM IDE churning away at their code base and suggesting all sorts of things as you write code. OMG: see ClangTidy! All C++ IDEs these days are either directly incorporating ClangTidy or copying its features. When I use CLion, as a medium-level C++ programmer, it is scary how good the suggestions from ClangTidy are!


Monkey patching in-house proprietary libraries without source code is also such a nice feature of Java. (Can this be done with C#? I assume yes.)


This post assumes that I don't already know C. I do. The real problem is friction. Being able to debug from Python code into C code is super hard, even in 2023. If you are in the same language (and debug session), the friction is so much lower.


Are python debuggers really unable to integrate with something like GDB? I have no problem debugging native calls from Java. During development I've never seen a case where library source was unavailable. Even proprietary components from other companies come with source included.


GDB on its own does a reasonable job debugging python: https://wiki.python.org/moin/DebuggingWithGdb


I feel it's less about instrumentation and more about Python developers understanding C code.


No, as a "fluent C speaker", the instrumentation part is much harder than understanding the C code.


> The hardest part of debugging Python is "hitting the wall" when you come to a native library (compiled C code).

This is one big reason I love Pharo and other Smalltalk implementations: they're mostly written in themselves, down to the VM. You can take a deep dive and inspect everything without being afraid of smashing your head against C bedrock underneath. And they still manage to do this while being reasonably performant and dynamic.


> The hardest part of debugging Python is "hitting the wall" when you come to a native library (compiled C code). And Python has achieved a lot of speed-ups from Py2 to Py3 by adding more compiled C code.

"Debugging a Mixed Python and C Language Stack" (2023) https://news.ycombinator.com/item?id=35710350


> The hardest part of debugging Python is "hitting the wall" when you come to a native library (compiled C code).

Aren’t most of these native libraries open source? I’m a C# dev so maybe this is a naive question based on my experience but is there not a way to bring in the source of the C library and debug into it in these “hit the wall” situations?


You can, but then you have to set up and build the libraries in debug mode. You can then debug with gdb or Visual Studio. It is just a lot more work than debugging pure Python.


> Show HN: Python can make 3M+ WebSocket keys per second

> This article is about optimizing a tiny bit of Python code by replacing it with its C++ counterpart.

So it's C++ rather than Python.


The article was posted by "cprogrammer1994", that should've been a hint!


The first "C" stands for "Cython".

/s


C stands for, well, C :) Also, I do not use Cython.


Yes, that's one of the ways one is supposed to optimize Python. Writing a tiny performance-sensitive part of a Python app in C/C++ is a possibility many people do not think about when they decide to develop the whole app in C/C++, and it can be a much smarter choice.


In a way, but at this point your stack is Python/C++, where Python serves the role of a high-level language and C++ serves the role of the fast one. You're not optimizing Python-as-a-language at that point; you're FFIing to another codebase written in a different language.

So, if anything, the title should say "Python, when linked with C++ code, can make (...)"


It’s cool that languages support this kind of stuff.

So if you're writing Python that employs this technique and you still want to keep the "OS agnostic" characteristic of Python, does that mean you'd have to compile multiple C++ binaries and check the OS to see which one to run?


Yes, though hopefully it's been packaged up by the distro; otherwise you'll end up having dependency problems if you compiled against the wrong libc or something else.


Alternatively, you could compile the C++ (or rust or zig) to WebAssembly, and a single binary would run anywhere.


Thanks for this. This is something I wouldn’t have considered.


Wouldn't you lose some of the performance benefit?


Bingo.


This has nothing to do with WebSockets and much more to do with hashing and Python-to-native calls. It's comparing ways of generating base64(sha1(something)) in Python, which I suppose also counts as "websocket keys".
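
For reference, the whole computation on the pure-Python side is essentially the stdlib one-liner below (a sketch; the key value is RFC 6455's worked example, not something from the linked repo):

  import base64
  import hashlib

  GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

  def ws_accept(key: str) -> str:
      # Sec-WebSocket-Accept = base64(sha1(client_key + magic_guid))
      return base64.b64encode(hashlib.sha1((key + GUID).encode()).digest()).decode()

  assert ws_accept("dGhlIHNhbXBsZSBub25jZQ==") == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="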

I'm not sure why the author implemented SHA1 and a base64 digest thereof manually rather than including a small library, but perhaps that was part of the challenge.

Python can generate a whole lot more keys per second if you enable SIMD, multithreading, or even GPU support. In fact, Ryzen / 11th+ Gen Intel / ARMv8-A have dedicated SHA1 instructions that should significantly boost performance here. Together with something like https://github.com/WojciechMula/base64-avx512 I bet you could increase the performance by an order of magnitude if raw CPU speed were really a concern.

I suppose three million keys per second ought to be enough for any websocket server, especially for a relatively simple implementation of the code.


The title should be "optimization-demo" (the original title) or "Replacing parts of Python programs with C++ can be easy and profitable".

They replace Python code that makes 5 calls into native code with code that makes 1 call that makes those 5 calls, and get a speed-up from 869k calls per second to 3.15M calls per second, so a snarky title could even be "Python-to-native calls are slow".

They could even measure it by adding a C++ version of that

  def magic_accept(key: str) -> str:
    return 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='
code and benchmarking that.
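
A pure-Python stand-in for that experiment (the real test would compile magic_accept natively; this sketch just shows the harness and how much of the per-call time is the hashing itself -- numbers are machine-dependent):

  import base64
  import hashlib
  import timeit

  GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
  KEY = "dGhlIHNhbXBsZSBub25jZQ=="

  def accept(key: str) -> str:
      return base64.b64encode(hashlib.sha1((key + GUID).encode()).digest()).decode()

  def magic_accept(key: str) -> str:
      return 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='

  for fn in (accept, magic_accept):
      n = 1_000_000
      secs = timeit.timeit(lambda: fn(KEY), number=n)
      print(f"{fn.__name__}: {n / secs:,.0f} calls/s")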


Did you take a look at the C++ implementation of the hashing function? I didn't see a single call made there. They replaced Python code that makes 5 calls with a single call.


That's a copy-pasted variation of this public-domain SHA1 code: http://ftp.funet.fi/pub/crypt/hash/sha/sha1.c surrounded by base64 decoding and encoding for a known-length binary text.

By inlining all the library code you use, yes indeed, you too can avoid making a single call.


I didn't. I expect that it makes a difference, though, for the observation that Python-to-native calls are relatively slow.


"...by replacing it with its C++..."

Nice try!


Python can make 3M+ WebSocket keys per second

C++ 85.1%


The irony of Python.


Indeed. "I'm a Pythonjack and I'm okay..."


I suspect part of the speedup is avoiding Python's included base64 implementation. Third-party extensions[1] claim fairly large improvements.

[1] https://github.com/mayeut/pybase64

This particular one also includes b64encode_as_string, which would also reduce some work/copying.
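
A minimal usage sketch, assuming pybase64 is installed (its API mirrors the stdlib base64 module; b64encode_as_string is the variant mentioned above):

  import base64
  import pybase64

  digest = b"\x00" * 20  # stand-in for a 20-byte sha1 digest

  # drop-in replacement for the stdlib call
  assert pybase64.b64encode(digest) == base64.b64encode(digest)

  # returns str directly, skipping the separate bytes-to-str decode
  key = pybase64.b64encode_as_string(digest)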


While the article clearly says this is a toy/example and so on, one of the nice points of the Python version is that it doesn't, e.g.:

- segfault the interpreter if you pass in something that's not a string

- read bogus memory if the length of the string is < 24


OK, now try the Python version with PyPy and see how it goes.


The tragedy, though, is that WebSocket TLS is what will actually slow us down.



