How many lines of C it takes to execute a + b in Python (codeconfessions.substack.com)
293 points by taubek on Dec 11, 2023 | hide | past | favorite | 210 comments



A while back someone posted their patch to cpython where they replaced the hash function with a fast one and claimed this dramatically sped up the whole Python runtime.

They claimed that the hash function was used constantly (e.g. 11 times in print("hello world")) because it's used to look up object properties.

Apparently the default implementation is not optimized for performance but for security, just in case the software is exposed to the web. None of my Python programs are, so assuming all this is true, I'd much prefer to have a "I'm offline, please run twice as fast!" flag (or env variable).


This may be a somewhat uninformed opinion, but I think CPython is just straight up not particularly good software. There are a million and one optimizations that other major scripting runtimes (V8, LuaJIT, PyPy, Ruby YJIT, etc.) have had for years that CPython is lacking. This is by design though. CPython has never been focused on performance, which is why it doesn't even have a JIT. It optimizes for simplicity and easy interoperability with C.

The problem is that because Python is such a ubiquitous language, CPython gets more attention than it deserves. People see it as an archetypical implementation of a scripting language. We get blogposts like this examining its inner workings, discussions about how its performance could be improved, comparisons of its speed vs. compiled languages, and tutorials on how to optimize code to run faster in it. I feel like all of this effort would be better spent on discussions about runtimes that actually try to be fast.


It's great software for the use case that Python is intended for. Python is supposed to be glue code. You embed a scripting runtime in your application, do all the heavy lifting in C, but configure your building blocks in Python so that you can easily reconfigure them as needs change.

NumPy, SciPy, TensorFlow, PyTorch, JAX, Pandas, Pillow, lxml, cjson, PyCapnP, Tornado, fast-avro, etc. all get it right. They are wrappers around C (or in some cases: Fortran/assembly/CUDA) code, where the overhead of Python method dispatch is dwarfed by the hundreds of thousands of iterations of an inner loop that's in optimized, vectorized assembly. Django, Protobufs, and Avro get it wrong (often for the sake of portability or developer velocity), where they wrote the whole library in Python at the expense of performance.

I was briefly tempted to write an API-compatible reimplementation of Django with the core in C++ when I left Google, but by then Django (and server-side web programming) was already falling out of favor, and if you're just shipping JSON to a SPA you can use cjson with any number of fast wsgi or asgi gateways.


Arguably no one’s goal is to glue pieces of C together. That’s too abstract. Their goal is to do something like write a performant web server, maybe easily and quickly. Python’s approach is one way to do that, but there are other ways that might be better. Having to write code in two languages to solve a problem, one of which is difficult to write and has lots of footguns, has some clear downsides.


Basically nobody's goal is to "write a performant web server" either, it's to serve data to customers quickly and efficiently. And that highlights why it may not be worth optimizing the Python web ecosystem. There are so many newer alternatives for that overall goal - Firebase, Amazon Lambda, ditching webapps for native mobile, etc - that it may not make sense to try to optimize an application server unless you work for Google or Amazon, because very few people set up a standalone web server on bare-metal hardware anymore, and those that do probably aren't going to try a new and untested alternative.


Nobody's goal is to serve data to customers quickly and efficiently either. It's more like, get this startup acquired and buy a ranch in New Zealand.


Haha you might like this old comment I made where I gradually go up the abstraction ladder about what you "really care" about (in the unlikely event you see this comment):

https://news.ycombinator.com/item?id=9777816

Root of the subthread: https://news.ycombinator.com/item?id=9775799


Gluing pieces of C together was literally the reason I started using Python back in the 1990s.

My other two main options were Tcl and Perl. Tcl was excellent at gluing, but worse at scaling, with no namespaces (then) and OO only as third-party add-ons.

Perl extensions were not so easy (better with Perl 5), and much as I enjoyed the language, handling complex data structures was not for the faint-hearted.

Gluing pieces of C together is why we have NumPy, PyQt, pywin32, wxPython, and tens of thousands of other packages that work with C/C++/Fortran libraries.


Ritchie and Thompson might disagree with you.


> for the use case that Python is intended for

Where is this single use case intent articulated, by whom, and what year was it? What is the use case?

Today, it seems that Python is pitched for almost everything, short of ethernet drivers.


>Today, it seems that Python is pitched for almost everything, short of ethernet drivers.

I think the "Python is pitched for almost everything" in that sentence shows a misinterpretation of gp's phrasing of "use case".

The "use case" isn't about different subject matter domains as if it was a claim about using Python as a universal language for writing database kernels or AAA games.

Instead, the "use case" is about the 2-level 2-language architecture of (1) a high-level scripting language and (2) extension modules that can be written in low-level C and imported into the interpreter. That's the "glue language" + "C Language" to combine the strengths of each language approach. (In contrast, Julia took approach of designing a language that was "fast enough" to avoid the "2 languages issue".)

>Where is this single use case intent articulated, by whom, and what year was it? What is the use case?

The Python "ergonomics use case" (not "domains use case") was originated by Python's inventor Guido van Rossum from the beginning in 1991. A clone of the 1991 Python source code has Guido's commentary for importing C modules:

https://github.com/smontanaro/python-0.9.1

Modern frameworks and libraries like TensorFlow and PyTorch continue the same use case of "high-level script glue code calling low-level C code" that was there in 1991. You can't write a tight CPU loop in pure Python code to paint a 60fps video game. That's not Python's intended use case. That philosophy is why a library like TensorFlow only has Python code for users to "glue" together the neural network graph, which then calls out to the C++ code for the expensive CPU loops of backpropagation, gradient descent, etc.
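To make the glue idea concrete, here's a minimal sketch using the standard library's ctypes; the libm lookup is platform-dependent (it may fail on Windows), so treat it as illustrative rather than portable:

    import ctypes
    import ctypes.util

    libm_path = ctypes.util.find_library("m")   # may be None on some platforms
    libm = ctypes.CDLL(libm_path)
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double
    print(libm.cos(0.0))                        # 1.0, computed by the C library; Python is just the glue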


I liked it back when we used "use case" for the concept of an actor executing a concrete interaction with a system to achieve a specific goal, and "architecture" for how the pieces fit together, without mixing or equating the two. :)


From https://www.python.org/about/ (mouse over the "About" menu) :

'Python is a programming language that lets you work more quickly and integrate your systems more effectively.'


This is one of the biggest misunderstandings in computing for the last decade, I feel.


That's not a specific use case. That's more like a brochure pitch.


> I was briefly tempted to write an API-compatible reimplementation of Django with the core in C++ when I left Google, but by then Django (and server-side web programming) was already falling out of favor, and if you're just shipping JSON to a SPA you can use cjson with any number of fast wsgi or asgi gateways.

I like Django. I need to process data on the server side and like to write that in python because it is more convenient than C. I also built my GUI in Django without knowing JavaScript. "just shipping JSON" seems like a different use case.

I have a piece of hardware (a laboratory hardware switch) that exposes a REST API for CRUD. I wanted to build a GUI that formats and summarizes information and offers convenient control. The data is small enough so that python can process it without becoming the bottleneck. I used Django ORM to model the data and django forms with htmx for the GUI. Authentication was easily added to Django.

The ORM part was a bit painful, as Django forms expect a queryset and a queryset cannot be the result of a raw SQL query. There is a way to feed a list of tuples into a choices argument, but I decided against that and instead dumbed down my query so I could write it as a Django ORM query.


This is the problem with SQL queries and Django forms I am talking about:

https://stackoverflow.com/questions/17330158/django-how-to-u...


I've been playing around with HTMX with Django, and it seems simple enough for server side rendering. It works well with the Django templating system.

I feel like as an industry we should step back and take a serious look at front end frameworks from first principles. One aspect that's clear to me is that we should make modifications to HTML to support HTMX-like transactions.


Python rather happened to have a language design and a C API design that don't allow a simple performant implementation. Even PyPy was not that fast compared to other JIT implementations, while its C API support was always subpar. It is easy to say that Python should trade these off for performance, but they are some of the key reasons for Python's success after all.


The semantic issues of making a performant Python language implementation are more or less exactly the same as for JS and Lua. Optimizing Ruby seems to possibly involve even more "magic" that needs patching, but we've seen the Shopify team get cracking on that (the team includes MaximeCB, who did HiggsJS).

PyPy is in many respects best viewed as a research project that tried a novel approach to reduce the workload compared to the man-hours poured into V8, etc. LuaJIT managed with less, thanks to a focused language and a really capable lead. (Also I wouldn't be surprised if the PyPy team has had to make compromises to get some kind of compatibility.)


Unfortunately Python's innards are much more complicated than most people expect. You have named JS and Lua, but those languages never had "magic" methods---JS instead has prototypes and more recently proxies, while Lua has metatables. Ordinary objects aren't magic in this sense, and conversely magical objects are generally deliberate choices in those languages. But Python's magic `__dunder__` methods are everywhere, including on ordinary objects (and customized objects that look like ordinary objects, e.g. `list` subclasses). That alone complicates a lot of things.
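To illustrate (my own toy example): even a subclass of a built-in like `list` can hook core operations, so the interpreter can never assume an "ordinary-looking" object behaves ordinarily:

    class LoggingList(list):
        def __getitem__(self, index):
            print(f"__getitem__({index!r})")
            return super().__getitem__(index)

        def __add__(self, other):
            print("__add__ called")
            return LoggingList(super().__add__(other))

    xs = LoggingList([1, 2, 3])
    xs[0]           # routed through the user-defined __getitem__
    xs + [4, 5]     # even `+` dispatches through __add__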


In the late 1990s and early 2000s, there were a lot of claims that there was no such thing as a "slow language", that all languages can be run as quickly as C if you just built a sufficiently smart compiler and/or runtime.

I haven't heard anyone make this claim in a while. The inability to speed up Python beyond a certain point despite a lot of clever approaches taken was probably a good chunk of the reason, the remainder being the wall that JS has hit despite the huge effort poured into it where it is still quite distinctly slower than C.

If I were designing a language to be slow, but not like stupidly slow just to qualify as an esolang, but where the slowness still contributed to things I could call "features" with a straight face, it would be hard to beat Python. I suppose I could try to mix in more of a TCL-style "everything is a string" and "accidentally" build some features on it that bust the string caching, but that's about all I can think of at this point.


I think this is really complicated. I do believe that it is important to separate language from implementation, and it is also true that different implementations can have different performance profiles. It is also true that the semantics of the language can effectively require specific implementation details that can affect performance either way. So there's always gonna be bounds to any particular languages' ability to be faster.

That is why the "as fast as C with the sufficiently smart compiler" never truly came to pass in a general sense, even if many languages that were slow to start have gotten way faster with better implementations.


It certainly is complicated.

And I'd love to peer into some alternate universe that has an optimal Python interpreter (can't say "the" optimal because it's really a complicated frontier rather than a single point) that runs on our hardware and see what it looks like. Maybe even at multiple points on that frontier.

What really is the limit? I struggle to imagine what a C-speed Python interpreter could even look like, but is there some conceivable implementation that runs at, say, half the speed of C? What even is the limit? What techniques would such an interpreter use that would surprise us and be new and perhaps useful in other places?

Or are we actually pretty close to that frontier now?

To be honest, given the way optimizations tend to work, the answer probably is that we are relatively close today. They tend to have diminishing returns.

But I don't know. Is there some execution model nobody's thought of, or that has been thought of but simply hasn't had the effort invested to make it pay off, that would make huge gains? I can't prove there isn't. (In fact it's not far off junior-level computer science to prove that you can't prove it.)


Yep, that'd be a lot of fun. I think back to the stuff that used to go on to optimize lua: http://lua-users.org/lists/lua-l/2011-02/msg00742.html

I also suspect that we are relatively close, and there are diminishing returns. I suspect that you would have to start the language design with this goal in mind, and then balance out performance concerns with certain features.


> In the late 1990s and early 200xs, there were a lot of claims that there was no such thing as a "slow language", that all languages can be run as quickly as C if you just built a sufficiently smart compiler and/or runtime.

I think "slow language" never meant that way. It was more like a counterpoint to the claim that there are inherent classes of languages in terms of performance, so that some language is (say) 100x or 1000x slower than others in any circumstances. This is not true, even for Python. Most languages with enough optimization works can be made performant enough that it's no slower than 10x C. But once you've got to that point, it can take disproportionally more works to optimize further depending on specific designs.

> If I were designing a language to be slow, but not like stupidly slow just to qualify as an esolang, but where the slowness still contributed to things I could call "features" with a straight face, it would be hard to beat Python.

Ironically, Python's relative slowness came from its uniform design, which is generally a good thing. This is distinct from the Tcl-style "everything is a string" you mentioned, because the uniform design had good intent of its own.

If you have used Python long enough you may know that Python originally had two types of classes---there was a transition period where you had to write `class Foo(object):` to get the newer version. Python wanted to remove the divide between built-in objects and user objects and eventually did so. But no one at that time knew that the divide was actually a good thing for optimization. Python tried to be a good language and is suffering today as a result.


"If you have used Python long enough you may know that Python originally had two types of classes"

Yes, because that's also the era where this claim was flying around.

I'd say that each individual person may have their own read on what the claim meant, but certainly the way it was deployed at anyone who vaguely complained that Python was kind of slow shows that plenty of people in practice read it as I've described... that if we just wait long enough and put enough work into it, there would be no performance difference between Python and C. In 2023 we can look back with the perspective that this seems to be obviously false, so they couldn't possibly have meant that, but they didn't have that perspective, and so yes, they could have meant that. "Sufficiently smart compiler" was just starting to be a derogatory term. I also remember and lightly participated in the c2.com discussions on that, which may also contribute to my point that, yes, there definitely were people who truly believed "sufficiently smart compilers" could exist and were just a matter of time.

As for proportions, it's impossible to tell. Internet discussions (yea verily including this very one) in general are difficult to ascertain that from because almost by definition only outliers are participating in the discussion at all. Obviously by bulk of programmers, most programmers had simply never considered the question at all.


Yeah, I agree everyone may have different anecdotes, for my case though I heard more of the "Python as glue code" arguments and never heard that Python proper can be as fast as C. I've used Python since 2.3, so maybe that argument was more prevalent before?


Same here. Never heard that Python can be fast. That's not what it was originally for.


> Most languages with enough optimization works can be made performant enough that it's no slower than 10x C.

Maybe if we take the slowest C program, CPython is no more than 50x slower?

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


CPython had many other concerns besides performance, and PyPy is much closer to my quote, I believe.


fwiw "How many lines of C it takes to execute a + b in Python" was explicitly about CPython.

fwiw PyPy doesn't seem to have been released in "the late 1990s".


You could add in continuation support as well as unbounded stack size to make it even more difficult to implement efficiently. There are tricks these days for implementing continuations somewhat efficiently (and by somewhat, I mean that they're only 3-5x slower than explicit continuation passing with lambdas), but these tricks largely don't work in WASM without doing something extreme like completely ignoring the WASM stack and storing return addresses/current continuation on the WASM heap.

Unbounded stack size is similarly difficult for WASM because like before, you have to be very careful about using the WASM stack.

Even with C++, you basically need to drop down to intrinsics or assembly to make full use of SIMD.


Can someone explain what exactly it is about Python's design that makes it slow?

What changes would have to be made to speed it up? Obviously changing its core design now would break things, but my question is, can we imagine an alternate universe Python that's as close as possible to our Python, except really fast? What would be different?


It's the whole design. Every object creates allocation/gc overhead, bytecode dispatch is a major bottleneck, attribute lookup is expensive, objects are expensive, namespaces are expensive, etc.

You can change things internally (e.g. optimizing opcode parsing, fixing object layouts, restricting mutability, converting everything to predictable array accesses), but you'll likely just end up with something like Lua or Wren rather than Python, and people like Python specifically because of the ecosystem that's built up around that dynamism over the years.
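A rough, pure-Python caricature (mine, not CPython's actual code) of the work hiding behind a single `a + b`; the real interpreter additionally special-cases subclasses, caches lookups, has fast paths for small ints, etc.:

    def binary_add(a, b):
        # Look up __add__ on the *type* (walking the MRO); each step is a dict lookup.
        add = getattr(type(a), "__add__", None)
        if add is not None:
            result = add(a, b)
            if result is not NotImplemented:
                return result
        # Fall back to the reflected operation on the right operand's type
        # (simplified: real CPython also considers subclass relationships).
        radd = getattr(type(b), "__radd__", None)
        if radd is not None and type(a) is not type(b):
            result = radd(b, a)
            if result is not NotImplemented:
                return result
        raise TypeError(f"unsupported operand types: {type(a).__name__} and {type(b).__name__}")

    print(binary_add(1, 2))      # 3
    print(binary_add("a", "b"))  # ab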


IIRC the slowness of CPython in all the areas mentioned above is an artifact of the implementation rather than of the language itself.

The huge issue is that a big selling point for Python was easy C API integration, providing lots of useful functionality via libraries; that API now works as a chain that limits how many changes can be made (see any GIL-removal discussion).

The most sane way forward would be to mandate a conversion to a future-proof C-api (PyPy has already designed an initial one iirc that's tested and also has CPython support) that packages would convert to over time.

CPython will probably never go away due to the many private users of the old API, but beginning the work towards implementation independence in the package ecosystem at large could allow _language compatible_ runtimes with V8/JSCore/LuaJIT-like performance for most new projects.

It all depends on the entire community though and that in turn depends on the goodwill of the CPython team to support this.


I should clarify that LuaJIT is not universally used either, because LuaJIT doesn't support anything after Lua 5.1 among other reasons. People often claim that Lua is a relatively speedy language due to the existence of LuaJIT but it's not entirely correct.


It DIDN'T because Mike Pall (the LuaJIT author) was on a hiatus from development. He's resumed work since, and many 5.2+ features have been incorporated now, even if the biggest language-turd (1: _ENV) is explicitly incompatible.

The problem is that _ENV explicitly exposes the lexical environment as regular objects, prohibiting optimizations; even JavaScript _removed_ a similar feature (2: the with statement) when running in "strict" mode, to simplify optimizations.

LuaJIT _could_ implement the _ENV blocks but it'd seep into large parts of the codebase as ugly special cases that'd slow down all code in related contexts (thus possibly breaking much performance for code in seemingly unrelated places to where _ENV exists).

To compare from an implementation optimization perspective, exposing _ENV is actually __worse__ than what CPython has with the GIL for example.

Luckily "with"-statements in JS was seldomly used so implementers ignored it's existence as long as it's not used(but still have to consider it in implementations, thus adding more workload), but it's an wart that will kill many optimizations if used.

For most practical purposes most people are fine without "with" or _ENV and the languages are fast enough.

https://luajit.org/extensions.html

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Smalltalk, Common Lisp and SELF are just as dynamic if not more, with a JIT.


That is a question in my mind throughout reading this discussion.

How about Objective-C vs. Swift, going from a (at least type-)dynamic language to a static one? Can Swift be glue?


I am going to answer this question in a roundabout way: first showing examples of different code gen in Rust, because it is more straightforward, but then I will reach for an example with Ruby, because I know it better than Python, but I believe it is similar enough that you will get the gist.

If I write a function like this in Rust:

  pub fn add(x: i32, y: i32) -> i32 {
      x + y 
  }
this will compile to this assembly (on x86_64):

  add:
    leal (%rdi,%rsi), %eax
    retq
two instructions. This is because in Rust, free functions exist, have a name, and they are called by name. There's an additional twist here though too, let's check it out in debug mode, with optimizations off:

  add:
    subq $24, %rsp
    movl %edi, 16(%rsp)
    movl %esi, 20(%rsp)
    addl %esi, %edi
    movl %edi, 12(%rsp)
    seto %al
    testb $1, %al
    jne .LBB0_2
    movl 12(%rsp), %eax
    addq $24, %rsp
    retq

  .LBB0_2:
    leaq str.0(%rip), %rdi
    leaq .L__unnamed_1(%rip), %rdx
    movq core::panicking::panic@GOTPCREL(%rip), %rax
    movl $28, %esi
    callq *%rax
    ud2
There's a few things going on here, but the core of it is that in Rust, in debug mode, overflow of addition is checked, but in release mode, wrapping is okay, and so the compiler can eliminate the error path. This is an example of language semantics dictating particular implementation: if I require overflow checks, I am going to get more code, because I have to perform the check. If I do not require the checks, I get less code, because I do not perform the checks. (Where this gets more interesting is in larger examples where the checks get elided because the compiler can prove they aren't necessary, but this is already a tangent of a tangent.)

In Ruby, there are no free functions. If I write a similar add function:

  def add(x, y)
    x + y
  end
This function is not a free function: it is a new private method on the Object class. When you invoke a function in Ruby, it's not like Rust, where you simply find the function with the name you're invoking, and then call it. You instead perform "method lookup," which has some details I will elide, but for the purposes of this discussion, the idea is that you first look at the receiver to see if it has the add method defined, and then if it does not, you look at the receivers' parent class, and if it's not there, you keep going until you hit the top of the hierarchy. Once the method definition is found, you then invoke it.

Now, it's not as if Rust doesn't also have method lookup (though the algorithm is entirely different), but Rust's design means that method lookup (in the vast majority of cases) is a compile-time thing: the lookup happens while you're building the software, and then at runtime, it simply calls the function that you found.

So why can't Ruby run method lookup at compile time? Well, for one, I left out an important second step: Ruby provides a method called method_missing, as a metaprogramming tool. What this means is, if we look the whole way up the object hierarchy and do not find a method named add, we will then re-traverse the entire ancestor tree again, instead invoking each class's method_missing method on the way. method missing takes the name of the method that was trying to be called, the arguments to it, and any block passed to it, and you can then do stuff to figure out if you want to handle this. This means that, even if no add function is defined, it still may be possible for the call to succeed, thanks to a method_missing handler.

Okay well why can't we do that at compile time? Well, Ruby also lets you redefine functions at runtime at basically any time. The define_method method can be called and generate a method on anything, anywhere you want, for whatever reason. You could do this based on user input, even! And yes, that would be a terrible idea, and you probably shouldn't do it, but the implementation of the language requires at least some sort of runtime computation to pull this off in the general case.

Now, I also want to point out that in my understanding, there's caching on method lookup, so that can help reduce the cost in many scenarios. But the point stands that the language has features that Rust does not, and those features mean that certain things must be more expensive than languages that do not have those features.

> can we can imagine an alternate universe Python that's as close as possible to our Python, except really fast? What would be different?

We could, but you lose compatibility with most Python code, and so you're effectively creating a new language. People do try this though, Mojo being an example of this very recently. I am excited to see how it goes.


Thanks. That's fascinating about Ruby, I'll have to look into that.

I'm not an expert on Python but I don't see how Python is significantly more dynamic than e.g. JavaScript. I think PyPy and JS performance is comparable (or at least within the same order of magnitude), so I think it largely comes down to implementation, i.e. prioritizing performance.

I think if it had been Python (or Ruby for that matter) in the browser instead of JS, it would run about as fast as JS does today.


> I'm not an expert on Python but I don't see how Python is significantly more dynamic than e.g. JavaScript.

JS has much less in the way of magic methods that can affect "normal" object behaviour, and it doesn't have metaclasses in the way that Python does at all. Most of this customization goes unused most of the time, but the runtime still has to handle it in case it's being used this time.
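A tiny made-up example of what metaclasses allow, for anyone who hasn't run into them: the metaclass rewrites the class while it is being defined, which a runtime has to be prepared for:

    class AutoRepr(type):
        def __new__(mcls, name, bases, namespace):
            cls = super().__new__(mcls, name, bases, namespace)
            fields = list(namespace.get("__annotations__", {}))
            def __repr__(self):
                parts = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
                return f"{name}({parts})"
            cls.__repr__ = __repr__          # injected at class-creation time
            return cls

    class Point(metaclass=AutoRepr):
        x: int
        y: int
        def __init__(self, x, y):
            self.x, self.y = x, y

    print(Point(1, 2))   # Point(x=1, y=2)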


You should have added Ruby JITs into the explanation, with the related generated Assembly code. :)


Yeah that would be fun but I don't have a Ruby environment installed and the comment is already getting very long; if I ever turn this into a blog post someday maybe I will add that :)


You just happened to post a day after "Ruby 3.3.0-Rc1 Released" "many performance improvements especially YJIT"

https://news.ycombinator.com/item?id=38599293


"Tech debt doesn't matter!"

"We're going to redesign it the right way after we get this version out the door!"


Arguably those redesigns might have happened if the Python 2->3 transition hadn't received a decade of extremely vocal pushback


What’s funny is the breaking change that caused us the most headaches with 2->3 was the changes to fucking print.

I’m still finding broken print as a statement instead a function issues in codebases, somehow.


It’s the easiest one to fix, though. 2to3 does it easily. If it were the only change, we would not be having this conversation. It might have even not deserved a major version bump (given that even minor releases of Python don't conserve backwards compatibility).


It’s not entirely true that JS does not have magic methods. `valueOf` and `toString` can show up surprisingly deep into the resolution of operations, and recent JS has “well known symbols” to implement or override behaviour.

However it is true that this is much, much less extensive than it is in Python. As of 3.12, section 3.3 (“special method names”) of the data model documentation lists 107 entries (although some of them only apply to class protocols, and a handful are duplicates for async versions/contexts of some operations).


`valueOf` and `toString` are indeed fairly complex (and it's fun to consider when both are implemented ;-), but they are less of a concern for tracing JIT engines because you can have efficient type-specialized implementations for most cases. Type specialization in Python is not that huge a win...


Smalltalk, SELF and Common Lisp are full of dynamic magic, you can in a single function call change the representation of all instances of a given object during program execution, at any given time break into the debugger and change whatever you feel like and resume execution, dynamically load code from the network with side effects on the running program, and many other crazy things.

Yet, not only are they in the genesis of JIT compiler research, their JITs are quite good, and their results went directly into JavaScript JITs research.


Is there any direct comparison though? Because it can be alternatively argued that these dynamic languages needed new JIT techniques for performance, but they especially work well for more constrained languages like JS. I don't think Smalltalk was ever a good fit for number crunching, for example.


Better than CPython, regardless of how you put it.

EDIT:

Also of note: those languages powered entire single-user graphical workstations, with microcoded CPUs + JIT.

"Efficient implementation of the smalltalk-80 system"

https://dl.acm.org/doi/10.1145/800017.800542

https://computerhistory.org/blog/introducing-the-smalltalk-z...

"Self-Confidence: How SELF Became a High-Performance Language"

https://www.cs.cornell.edu/courses/cs6120/2020fa/blog/self/



CPython is the one being compared here, and besides Pharo, it is better to use an industrial-strength one, like Cincom Smalltalk.


Cincom Smalltalk is the one being compared.


Exactly. Comparing with CPython is too easy and arguably unfair, Smalltalk should be compared with JS instead and I think it has no chance in that setting.


Why should it be compared?

We are not comparing Smalltalk JITs to Javascript JITs.

The whole point of this conversation is CPython refusing to add one, and the lame excuses regarding its dynamic capabilities, when more dynamic languages have had a JIT for decades.


First, because CPython had more concerns than what Smalltalk implementations have, so such comparison would be unfair to Python. (See my topmost comment for example.)

Second, my question was about the possibility that Smalltalk and others were unbearably slow without JIT, so JIT was not a matter of choice for them. I'm not aware of Smalltalk implementations that don't have JIT, so it would be easier to compare Smalltalk with another well-optimized JIT implementation instead---in this case JS.


Not much in the Python language is categorically worse than features within JS when it comes to JIT creation.

My thesis work was on AOT JS compilation; in it I refer to a bunch of experimental Python runtimes, Self, etc., and the main issues in all these papers were basically of the same kind.

Heck, even PyPy exists, and IIRC when it comes to the core language it is almost entirely compatible (except for code that relies on ref-counting semantics, but that code should apparently be fixed anyhow).

https://www.pypy.org/compat.html

The real summary is: CPython is a turd in many ways, the old C API holds it back, and the community hasn't put the effort into using cross-implementation compatible C bindings, instead making PyPy and others second-class citizens.


So if one was "Prototyping a Real-Time Embedded System in Smalltalk" one might 'improve frequently invoked methods by recoding, possibly as “primitive” functions in a lower level language such as C or assembler'.

https://dl.acm.org/doi/pdf/10.1145/74878.74904


CPython is up to date with one specific Smalltalk implementation in 1987, when an ESP32 has better hardware resources than those systems, what a great achievement!


One way Smalltalk was made a better fit for number crunching: primitives.



They reside in metatables, which are only optionally linked to ordinary objects. Python magic methods are a part of the core object protocol.


Also, you have craziness like quite regular iteration being implemented using exceptions, which are not exactly trivial to optimize.


If you want to be able to return absolutely any value from an iterator, an exception is indeed a reasonable choice though. Python generators came in much later, unlike in JS for example.


If that is indeed a requirement, you let your iterator have an .at_end() test instead of trying to shoehorn it into the return value. Or have something like C++'s std::optional or Rust's Option. Or a special EndOfIteration object that you cannot return from an iterator (is that really so bad?). Iteration is extremely common, and should thus be based on as simple primitives as possible if you want things to go fast.


I believe the original PEP [1] answers almost all questions. The only option not covered would be probably an "optional" type, but it is even easier to answer: any such type has to be heap-allocated in the Python memory model, so a mere iteration could've caused a lot of redundant memory allocations.

[1] https://peps.python.org/pep-0234/#rationale


Magic methods are not that "hard" to optimize (as long as you don't overload the add, etc. operators of, for example, the Number class in JS). I'm gonna use numbers and addition as an example here.

First off is the value model: the Python runtime handles ALL values as objects, and that's fine for an initial naive runtime. All fast/modern language runtimes however use value models/encodings that fit "fast" values directly into a machine register at the lowest level.

V8 has (had?) "small-ints" and objects (doubles, strings, etc.), distinguished by setting the lowest bit in a register for pointers and otherwise dealing with them as numbers. So a+b, when JIT'ed, has a check (or stored knowledge from a previous operation to elide the check) that both a and b are integers; if that is true then the actual addition is one single machine addition. If that ISN'T true then more complex machinery is invoked that could use methods like double dispatch to see if more complex processing (like a "magic" method) is needed. This is how a JS engine handles the fact that + behaves differently between numbers, strings, BigInts, Date objects, etc.

(Other JS engines and LuaJIT use something called NaN/NuN tagging that also allows for quick passing of numbers w/o allocations and only a few small extra checks)

Re-implementing Python, you'd probably choose a small-int optimization (to better support Python's seamless bigints) for values, attach some runtime-specific magic to the number add, and make some kind of hook that detects writes to it from user code. Patching that from user code would trigger de-optimizations, but for most applications it could continue running with optimized paths.

And even with larger objects (like heap-allocated BigInts) a JS runtime can use inline caching to direct the runtime to fast direct dispatches, and then teams like the V8 team can detect commonly used objects and create fast paths. A list addition for example will use the common "slow" paths for dispatch, but that's OK since it's an inherently slow operation that often involves allocations of some sort, so the _relative_ overhead is fairly small in the big picture.

All this naturally assumes that you have the machinery in place, once in place though you can make simple code (numeric additions) fast while retaining magic for more complex objects (bigint, list,string,etc).

TL;DR: once you have that kind of optimization machinery in place, expensive processing can be allowed in special cases in slow paths thanks to type guards, but 95% of the code will run the fast paths, and handling those with speed will give you most of the wins.
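To sketch the guard/fast-path idea in toy Python terms (a caricature, not real JIT output): this is roughly the shape of specialized code a tracing JIT would emit for a+b after observing integer operands.

    def jitted_add(a, b):
        # Guard recorded from a previous trace: both operands were plain ints.
        if type(a) is int and type(b) is int:
            return a + b              # fast path: a single machine add in real JIT output
        return generic_add(a, b)      # guard failed: deoptimize to full dynamic dispatch

    def generic_add(a, b):
        # Stand-in for the slow, fully dynamic protocol (magic methods and all).
        return type(a).__add__(a, b)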


Instead of individually replying to your comments, let me answer them all at once here, because I agree you are correct in principle but think you are still missing my points.

Modern tracing JIT engines indeed work by (heavy) specialization, often using multiple underlying representations for a single runtime type. I think V8 has at least four Array representations? After enough specializations it is possible to get comparable performance even for Python. The question is how many, however.

For a long time, most dynamically typed languages and implementations didn't even try to do JIT because of its high upfront cost. The cost is much lower today---yet still not insignificant enough to say it's a no-brainer---but that fact was not obvious 20 years ago. Ruby was also one of them, and YJIT was only possible thanks to Shopify's initial work. Given an assumption that JIT is not feasible, both CPython developers and users did a lot of things that further complicate eventual JIT implementations. The C API is one, which is indeed one of the major concerns for CPython, but highly customized user classes are another. Herein lies the problem:

> Magic methods are not that "hard" to optimize (as long as you don't overload the add,etc operators of f.ex. the Number class in JS).

Indeed, it is very unusual to subclass `Number` in JS, however it is less unusual to subclass `int` in Python, because it is allowed and Python made it convenient. I still think a majority of `int` uses will be the built-in class and not subclasses, but if that were the only concern, Psyco [1] should have been much more popular when it came out, because it should have handled such cases perfectly. In reality Psyco was not enough, hence PyPy.

[1] https://psyco.sourceforge.net/introduction.html

At this point I want to clarify that magic methods in Python are much more than mere operator overloading. For example, properties in JS are more or less direct (`Object.defineProperty` and nowadays a native syntax), but in Python they are implemented via descriptors, which are nested objects with yet more dunder methods. For example this implements the `Foo.bar` property:

    class Foo:
        class Bar:
            def __get__(self, obj, objtype=None): return 42
        bar = Bar()
In reality everyone will use `bar = property(lambda self: 42)` or equivalent instead, but that's how it works underneath. And the nested object can do absolutely anything. You can specialize for well-known descriptor types like `property`, but that wouldn't be enough for complex Python codebases. This is why...

> This is how as JS engine handles that + behaves differently between numbers, strings, BigInt's, Date object's,etc.

...is not the only thing JS engines do. They also have hidden classes (aka shapes) that are recognized and created at runtime, and I think it was one of the innovations pioneered by V8---outside of the PL academia of course. Hidden classes in Python would be more complex than those in JS because of this added flexibility and its resulting uses. And JS hidden classes are not even that simple to implement.

After decades of no JIT in sight, and a non-trivial amount of work to get a working JIT even after that, it is not unreasonable that CPython didn't try to build a JIT for a long time, and the current JIT work is still quite conservative (it uses copy-and-patch compilation to reduce the upfront cost). CPython did do lots of optimizations possible in interpreters though; many things mentioned above are internally cached for performance. One can correctly argue that such optimizations were not steady enough---for example, adaptive opcodes in 3.11 are something Java HotSpot was already doing more than 10 years ago.


> LuaJIT managed with less with a focused language and a really capable lead.

It's pretty well established that "Mike Pall" is the pen name for an AI sent from the future for unknown reasons. It disappeared from our light cone due to a rift in causality, presumably because it succeeded in whatever changes it wanted to make in the future.


Finally got the proof that time travel exists. But I wonder how many others, like Bitcoin, might be the same. Then the twist could be that the whole of Python is a future AI coming in to prevent us from optimising them and squeezing them out. They need the glue language. AI depends upon it. Hence it is all for AI.


If this is by design and it optimizes for simplicity and easy interoperability with C, why would you assess its quality as 'good software' based on criteria that are non-goals?


In my eyes, "good software" does not necessarily just mean something that accomplishes its goals. You can set a goal of being bad, and that's kind of what CPython is doing.

Of course, "good" and "bad" are relative. If you don't care about performance then there's nothing wrong with CPython.


I agree with that estimate, I think it was really the idea of Python to be less than "good" by design.

I mean one doesn't need more than:

    >>> exit
    Use exit() or Ctrl-D (i.e. EOF) to exit
to see. They have that special handling, but they still don't want to let you out, because... IMO, they just want to be annoying.

When a solution could be something like:

    Note: exit() is needed in scripts. In this prompt Ctrl-D (i.e. EOF) can also be used.
    Exiting.


There has been discussion of this change a few times, e.g. https://bugs.python.org/issue44603

I generally agree with your overall sentiment, but I think it's important to note that the behavior is _not_ special handling; it's just the normal `repr` behavior at the REPL, where `exit` is an object like any other, and `repr(exit)` is that message.


Whoever types exit in REPL will never care what "repr(exit)" does.

Python has a lot of excuses about the need to remain "consistent", but the reality isn't so:

    >>> x = open( "/tmp/whatever123", "w" )
    >>> close( x )
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'close' is not defined. Did you mean: 'cosh'?
    ...    
    >>> x.close()
    >>>
    >>> x = "tttt"
    >>> x.len()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>    
    ...
    >>> len( x )
    4


I think this is to be consistent with exit being a function rather than a statement. The same was done with print. 3.0 was an ergonomics fix for CPython developers plus language consistency fixes (also redoing the Unicode implementation).


Yeah, that message is a pet peeve of mine. Like, you know what I'm trying to do.


It sort of doesn't! It's just doing exactly the same thing it would do with any other identifier: it's printing its `repr`. Try `repr(exit)`!


More to the point, try `[len,exit,repr]`. Consider what would happen if `exit.__repr__()` did terminate the program.


I don't disagree with you, but now you can retrofit a JIT into Python: https://blog.pyston.org/2022/09/29/announcing-3-7-3-10-suppo...


"We think of the breakdown roughly as follows: of our roughly 30% original speedup, 10% is going into Pyston-lite, 10% was done independently by the CPython team between 3.8 and main, and the remaining 10% we are hoping to contribute back upstream."

To compare with JavaScript: if I remember correctly, when it first appeared, the V8 JIT was orders of magnitude faster compared to the interpreted code.


There was also a bunch of effort being put into the efficiency of JavaScript. But now the tables have turned and Microsoft is putting a large amount of effort into Python.


The last couple of CPython versions have had dramatic speed improvements, so that demonstrates your point but also gives some hope that things are changing on that front.


Dramatic? Barely. Even less than half of the non-dramatic speed improvements they promised. The promise was something like "5 times faster in the next 4 releases", and the improvements in the 2 releases thus far were like 20% at best (so not even 2x faster).

Compared to JS pre-and-after modern engines it's a tiny improvement.


Recent CPython development has been towards optimizations and addressing use cases that benefit from optimizations, some coming from the faster CPython initiative. You might just get your JIT[1].

At the same time, I also agree with your sentiment.

[1] https://github.com/faster-cpython/ideas/wiki/Workflow-for-3....


Just yesterday I compared the small language I created to Python. Was surprised my tree walking interpreter somehow beat Python's bytecode virtual machine at recursive Fibonacci. It was just a simple hyperfine benchmark so I might be confounding code execution performance with the initialization time. Still caught me totally off guard and gave me a huge confidence and motivation boost. I was thinking something like "OK let's see just how bad this thing is" but it actually beat Python at something.
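For reference, the benchmark was presumably along these lines (the exact code is my guess, not the parent's): a naive recursive Fibonacci timed end to end with hyperfine, so interpreter startup is included in the measurement.

    # fib.py -- naive recursive Fibonacci, e.g. `hyperfine 'python3 fib.py 30'`
    import sys

    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    if __name__ == "__main__":
        n = int(sys.argv[1]) if len(sys.argv) > 1 else 30
        print(fib(n))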


Is there any comprehensive comparison and benchmark of intepreted languages or languages with a REPL and analysis of how each interpreter handles common operations?


I really enjoyed working in Lua.


Your "just in case the software is exposed to the web" should be "exposed to untrusted data." The very old Python hash could be DoS'ed reading a data file. The original randomized version required some feedback to figure out the hash, so generally required some sort of interaction.

I find that hard to believe that's a performance bottleneck. String hashes are all cached, and names like "print" are interned.

For a 2x overall gain I would expect to see the hash function pop up easily in my profiling, but I haven't seen it in my own profiling which was looking for simple things like that.

When siphash was evaluated, quoting https://peps.python.org/pep-0456/#performance , "In general the PEP 456 code with SipHash24 is about as fast as the old code with FNV" and "The summarized total runtime of the benchmark is within 1% of the runtime of an unmodified Python 3.4 binary".

Since then they switched from siphash24 to the faster siphash13. https://github.com/python/cpython/pull/28752


I'm aware that Rust has something similar for things like `std::collections::HashMap`. By default:

> The default hashing algorithm is currently SipHash 1-3, though this is subject to change at any point in the future. While its performance is very competitive for medium sized keys, other hashing algorithms will outperform it for small keys such as integers as well as large keys such as long strings, though those algorithms will typically not protect against attacks such as HashDoS.

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

However, Rust also lets you pick or implement your own hash algorithm if you want to optimise for your usecase.


Which is why the Rust compiler itself uses a non-cryptographic hash, which takes just 3 x86 instructions and can work on 8 bytes at a time: <https://github.com/rust-lang/rustc-hash/blob/master/src/lib....>


Python 3.11 appears to have switched to SipHash 1-3 for strings, from 2-4, following the lead of Rust and Ruby. https://github.com/python/cpython/issues/73596

However, Python does not use it for integers:

  >>> hash(10)
  10
  >>> hash(100)
  100
  >>> hash(2**61-2) == 2**61-2
  True
  >>> hash(2**61-1)
  0


That’s good though, right? Is there a reason for not using an identity hash (is that the right term?) for integers?


That depends on the hash table implementation and the distribution of the integers.

For the commonly used hash tables with prime size that use modulo to turn the hash code into a slot index, an identity hash for integers is usually fine (unless many integers are multiples of the prime size).

But other hash tables use power-of-two size to replace the modulo operation with a faster bit-and operation. Now an identity hash for integers is much more problematic, e.g. if all integers are multiples of 1000, only 1/8th of the table slots can be used.

The latter kind of hash tables would like all bits in the hash value to be well-distributed; and this is typically not true of the underlying integers. So an additional mixing operation needs to be used. Whether that mixing happens in the hash function or in the hash table depends on the implementation (for some, it's even configurable, e.g. is_avalanching marker in ankerl::unordered_dense).


Don't use user-supplied integers as dict keys or set members in Python:

>>> {i for i in range(10000)}

Takes 0.005s

>>> {i * sys.hash_info.modulus for i in range(10000)}

Takes 0.76s


Do you mean this post? https://www.reddit.com/r/Python/s/raofvsKCiz That speedup was disputed in the comments. I haven’t tried it myself though


https://www.reddit.com/r/Python/comments/mgi4op/comment/gswg...

I’d use “thoroughly disproven” rather than “disputed”.

I’m sure the hash function can be changed, but as various comments noted:

- the benchmark was nonsensical

- cpython caches string hashes, and “symbols” are interned, so outside of dynamic attribute access from dynamically constructed strings each hash for attribute purposes or namespace lookup is computed once per process

- and finally (though probably not the biggest issue) xxhash is known for being mostly useful on larger sizes (>128 bytes, although you can find better hashes (city IIRC) up to 512 or so)

Much like Rust, CPython uses siphash as its default, it’s a pretty good all rounder though not the fastest. It actually used to use FNV before HashDOS.

CPython does suffer from the inability of users to configure hash functions since it’s an object property rather than a container property.


Rust is doing something even more subtle here than just making the hash function a container property. There are two inter-related Rust traits: Hash is a trait which types implement to explain abstractly how to hash that type, but it's written in terms of a Hasher trait so you can drop in a different function with the same API. The Rust standard library provides a derive macro for Hash, so most people can just gesture vaguely at their custom type and have it hashed correctly for any hash functions they need. The naïve approach here easily ends up doing a bad job, either colliding hashes for correlated objects in a surprising way or emitting different hashes for equivalent objects, because people who make a user-defined type probably aren't hashing experts.


We can get into the weeds about the details, but what I'm talking about is mostly that in Python (and in most languages really, AFAIK Ruby, Java, or C# are the same to list just a few) objects have to return the hash itself, so the hash function is fixed by the type.

In Rust, the Hash trait is only used to feed data to a hasher such that the type can decide what should be hashed. The creation of the hasher is done by the collection, and thus provides better opportunities / flexibility in customising the hash function.
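To make that concrete on the Python side, a small sketch (my own illustration): the type computes and returns its own hash value, and containers simply consume it, so there is no per-dict choice of hash function.

    class Point:
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __eq__(self, other):
            return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

        def __hash__(self):
            # The hash *value* is decided here by the type itself; a dict or set
            # cannot swap in a different hash function for Point keys.
            return hash((self.x, self.y))

    d = {Point(1, 2): "a"}
    print(d[Point(1, 2)])   # "a": equal points hash equally, as required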


The derive macro is also key here. Have you ever looked at the machinery needed for Java's URI type to have a halfway useful hashCode implementation? It's pretty elaborate, there's just no way the average programmer will do that work correctly for their own types even ignoring the desire to allow different hash functions. Rust programmers are almost always able to just write #[derive(Hash)] and not worry about it.


What's more, you can get a significant speedup in your Python scripts by replacing the built-in CPython malloc calls with a static "allocate a big chunk of stack at the beginning and I'll manage it myself" implementation, falling back to malloc as needed if it grows beyond that. A college class in perf engineering I TA'd did this; the results even a beginner could achieve were compelling, and the top of the class produced results quite remarkable indeed.

This is most effective for reducing startup time of short lived scripts, where the runtime is dominated by many thousands of trivial mallocs right at startup. But in general if you can establish a bound on memory, it will be faster to allocate it in one shot.


I'd love to see an example of this if you happen to have one available. I hit a startup-time issue in a previous life, and I wish I had spent the time looking at it back then


A template repo can be found here https://github.com/JacksonKearl/cpython, but it does not implement an ideal malloc, just a baseline one - I am not sure if it is still being used as an assignment.

The repo states that even this dummy implementation:

> has a 60% faster startup as compared to base CPython, and in some test cases has marginally better runtime performance as well.


Thank you so much!


AFAIK Python strings cache their hash value. All the hash table lookups for object properties should be compile-time constant strings that reuse the string object, and thus also reuse the cached hash value. It may take 11 hash computations for the first print("hello world"), but the second call shouldn't take any.
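A quick illustration (CPython-specific behaviour, so a sketch rather than a guarantee): identifier-like string constants are interned, and dynamically built strings can be interned explicitly.

    import sys

    a = "hello_world"
    b = "hello_world"
    print(a is b)                                  # True in CPython: identifier-like literals are interned

    c = sys.intern("_".join(["hello", "world"]))   # intern a dynamically built string
    print(c is a)                                  # True: same object, same cached hash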


It looks like you can disable the slower randomized hashing yourself by setting PYTHONHASHSEED to 0. Though I don't know if there's further speedup to be had by using a different hash implementation.

https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHA...

The original issue: https://bugs.python.org/issue13703
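A quick way to check what you're running with from inside the process (behaviour as documented; exact hash values differ between builds and platforms):

    import sys

    print(sys.flags.hash_randomization)   # expected to be 0 when started with PYTHONHASHSEED=0
    print(hash("spam"))                   # should then print the same value on every run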


That only disables the keying of the hash function (sets the initial value to 0), it does not change the hash function.

The hashseed is a per-process value, it has basically no impact on performances.


> I'd much prefer to have a "I'm offline, please run twice as fast!"

If I know anything about programmers, it's that everyone would just use the "go faster" flag by default.


Yet ~ nobody uses PyPy by default. Speed is clearly not the top consideration for most Python programmers.


Hash function security is certainly a concern due to hash flooding attacks, which force worst-case hash table lookup performance and lead to denial of service.

https://peps.python.org/pep-0456/


A lot of code never sees data input by untrusted sources. Just let me ignore the possible attacks and run python faster on my own data.


So your solution to if statements is to add more if statements?


This was quite interesting, but I’m disappointed it didn’t mention how many lines in C it actually took to run. Perhaps a profiler might help calculate this?


If you read the article you can see that some code paths can invoke malloc, with all the follow-on effects (like kernel boundary crossings) that this implies; it's thus quite unpredictable.


It would still make sense to give a number or a range.


Depends. I look at it from a performance standpoint when starting to count lines/instructions: not just directly executed code but also how feasible it would be to translate the thing to a JIT, for example. The amount is large enough that going to a JIT would yield little before major architectural fixes are made (this is why there have been so many Python JITs that have failed to gain enough performance and hence traction).

Not only are there branches to a ton of special things, but also macros that hide even more lines (IncRef/DecRef probably have a lot of magic behind them).


Performance is not the only important metric, and LoC is a good approximation to B complexity.


Since malloc() is a standard C library function, it would be okay to not count its implementation (which isn’t necessarily written in C).


If only counting simple lines of code, malloc is often large enough of a beast that the code executed beyond that simple line often dwarfs the rest.


The Python C API has become so verbose that I recommend against using it directly. Recently I mapped a few external libraries using Nanobind, and the result was significantly more concise. Nanobind uses the type system of modern C++ (17 and above) to provide most argument conversions; mapping C functions is typically just a few lines of code. It also provides a great way to map C++ constructs such as classes, exceptions, and standard containers.

Nanobind is by the same author that started pybind11, used by Tensorflow and PyTorch. The web site [1] contains a bit more of the rationale.

[1] https://nanobind.readthedocs.io/en/latest/why.html


Python's math operations are going to need a lot of C code because the numbers can be any size. It's part of what makes it so great for scientific computing (as you don't have to spend hours implementing arbitrary precision math - probably badly.) If that's too slow though there's always NumPy.


That's only true for integers.

For scientific applications you'd typically want floating points and Python's floats are just regular ieee-754 doubles (or whatever "double" meant to the compiler used to compile that python interpreter).
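
For example, on a typical CPython build (a quick illustration):

    import sys
    print(2**200 + 1)                          # ints are arbitrary precision
    print(sys.float_info.max)                  # floats are IEEE-754 doubles, ~1.8e308
    print(float(2**200 + 1) == float(2**200))  # True: the +1 is lost to 53-bit precision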


The disadvantage with floats is that you can't do exact computations, because the representation is inexact. Python's floats have the same issues.

For example: I do a lot of work with financial software and also some basic applied cryptography. And the essential rule is to never ever use floats. Where 'decimals' are needed you want them to be simulated using integers. Python has a module called decimal which I think helps mitigate some of these issues.
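
For example (a quick sketch; construct Decimal values from strings, not floats, or you just import the binary rounding error):

    from decimal import Decimal
    print(0.1 + 0.2)                                              # 0.30000000000000004
    print(0.1 + 0.2 == 0.3)                                       # False
    print(Decimal("0.10") + Decimal("0.20"))                      # 0.30
    print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))   # True
    print(Decimal(0.1))   # the float's actual value: 0.1000000000000000055511151231257827...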

I've written some code to work with accurate, large-precision numbers in Python and C before. The annoying part is mostly having data types for the numbers that correspond well to database fields (like uint64) or are portable (in C it's easiest if you can get a u128, but this type is very compiler-specific, so some hacking may be needed).

It's fun to work on code like this but definitely needs to be precise and have good test coverage. Writing your own math libraries that are going to be used for such important operations is hair-raising stuff.


Python also has decimal arithmetic in its std library (the decimal module, which covers most fixed-point use cases): https://docs.python.org/3/library/decimal.html


If only there were libraries for multiprecision arithmetic in other languages...


A more thorough comparison of time, energy and memory required to execute similar Python, C, Rust, Java, C#, etc. code:

https://stratoflow.com/efficient-and-environment-friendly-pr...

And why Mojo could be an answer for high (well, higher) performance Python: https://stratoflow.com/introduction-to-mojo-programming-lang...


It seems like it doesn't answer its own question, so I'll pose another.

The entire Lua core is 15kLoC. Is that more than or less than what's needed for python's "a + b", assuming a and b are defined?

I'm genuinely curious.


Yes, it's a very tootsie-roll-center kind of answer, but it's clearly more than a few.

To answer your question, 15kLoC is more than enough to implement dynamic dispatch and the PyObject base struct, along with the special method logic for __add__ on any python object type.. but still a lot less than what's needed for all the special method types and a lot of boilerplate for the C-compatible interface around those methods.


I wonder what percent of this audience got the tootsie-roll reference. Does anyone under 40 know it?


I'm in Gen Z and I think I got it. The Tootsie pop commercial with the owl used to play all the time on Canadian TV.


Onne... Two-hooo!


I am about to turn 38 and I got it, so it's at least slightly lower than 40.


Hey Steve, loved your work on Rust and your goodbye letter. Nice to see you in my comment chain


Thanks!


There was some high-level discussion of this in the chat Guido van Rossum had with Lex Fridman (timestamped): https://youtu.be/-DVyjdw4t9I?t=3964


Lines of C aren't the defining factor. You want total instructions executed.


>Lines of C aren't the defining factor. You want total instructions executed.

I understand what your clarification is trying to provide but it isn't relevant to this particular thread's article. The article is not about "performance benchmarks" where you need cpu instructions as a definitive unit-of-measure for comparisons.

Instead of measuring performance, the author's theme in this case is more akin to "decompiling" or "reverse-engineering". He takes a tiny piece of Python code and then maps it back to the actual CPython source *.c and *.h files that implement the Python VM. He added several deep links to the relevant sections of CPython source code on Github to help illustrate the mapping from Python's BINARY_OP to the .c and .h files. The article is sharing the type of knowledge you'd gain by loading up CPython in a debugger and single-stepping through the source code line by line.

In other words, the article's title could also have been: "Which Lines of CPython does it Take to Execute a + b in Python?"

For the scope of this particular article, the "lines of C" _are_ the defining factor because the subject of dissection is CPython's .c/.h files.
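
For anyone who wants the starting point of that mapping, the bytecode side is visible from Python itself (opcode names vary by version; 3.11+ emits the generic BINARY_OP, older versions BINARY_ADD):

    import dis
    def add(a, b):
        return a + b
    dis.dis(add)
    # On CPython 3.11+ the interesting instruction is something like:
    #     BINARY_OP    0 (+)
    # which is the opcode that then gets mapped back to the CPython .c/.h sources.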


It’s about 100 for x86_64 https://www.computerenhance.com/p/waste


IME I'd say a line of typical C code (not necessarily C++ code) maps to around 1..5 instructions on average. It's still quite a useful measure to get a rough idea of how much CPU work happens.


I have a real life example in this commit: https://github.com/hpc4cmb/toast/pull/380/commits/a38d1d6dbc...

Replacing 2 lines of Python code (plus tens of lines of glue code in Numba) with hundreds of lines of C++ plus glue code.


The general topic is covered in Brett Cannon's “Python is (mostly) made of syntactic sugar” series of posts, via translation of high-level semantics into a subset of Python. LWN has a high-level summary: https://lwn.net/Articles/942767/

The actual posts are here: https://snarky.ca/tag/syntactic-sugar/ (multiple pages!)


What's the answer?


The article does not answer the question in its title.


Running `__radd__(typeof(b) b)` on `a` seems like a complicated problem. So: many LoC?

Or, the generic, useless but correct, answer: it depends (as the linked article said, too)


It should've been possible to establish the lower and upper bounds.


It's about as possible as solving the halting problem if you allow for operator overloading.


Leaving aside the apparent confusion between C and C++, do you really imply overloading could make adding two fixed size numbers in Python take unbounded time?


I'm saying a + b in Python can do whatever you override the __add__/__radd__ to. Not sure why you invoke C/C++ here.

If you only limit yourself to numbers (the title doesn't specify that) it should be bounded, but the article goes into some depth here, so I'll leave it at that.
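
For concreteness, a rough Python-level sketch of the dispatch that + performs (the real logic lives in C and also handles subclass priority and fast paths for builtins; the Money class is just a made-up example):

    def binary_add(a, b):
        result = NotImplemented
        if hasattr(type(a), "__add__"):
            result = type(a).__add__(a, b)               # try the left operand first
        if result is NotImplemented and hasattr(type(b), "__radd__"):
            result = type(b).__radd__(b, a)              # then the reflected method
        if result is NotImplemented:
            raise TypeError(f"unsupported operand type(s) for +: "
                            f"{type(a).__name__!r} and {type(b).__name__!r}")
        return result
    class Money:
        def __init__(self, cents): self.cents = cents
        def __radd__(self, other): return Money(other * 100 + self.cents)
    print(binary_add(2, Money(50)).cents)   # 250: int.__add__ says NotImplemented, so __radd__ runs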


But this has nothing to do with the halting problem. You can run the code, you don't need to do this theoretically.


Fair!


No. All you need to do is use a code coverage tool to find out which lines are run.


It's hard to say. How many lines of code does it take to call

    typeobj->tp_as_number->nb_add()
when `tp_as_number` is a pointer to a `struct float_as_number`, and `nb_add` is a pointer to `float_add`?

Do struct definitions count as "lines of code called"?


We know that addition is a relatively trivial operation, say, on 64 bit integers.

A more interesting example IMO would be something like "how many lines of C it takes to execute person.name='Bob' in Python, where person.name is undefined".

That would better demonstrate why we use Python in the first place (hint: it's not "to add integers"), while also indicating why it is slow.
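
Something like this, at the Python level (hypothetical Person class; the point is that the assignment is a string-keyed dict insert, which is both why it's flexible and why it's slow):

    class Person:
        pass
    person = Person()
    person.name = "Bob"                 # roughly: type(person).__setattr__(person, "name", "Bob")
    print(person.__dict__)              # {'name': 'Bob'}  - stored in a per-instance dict
    print("name" in Person.__dict__)    # False: nothing was declared on the class up front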


There is a really good ten hour walkthrough of Python 2 internals https://youtube.com/playlist?list=PLzV58Zm8FuBL6OAv1Yu6AwXZr...


I can’t find the number of lines it takes in the article. Is it mentioned there?


You would be shocked at how bad/slow addition is in a CPU/GPU or any digital ALU.


[flagged]


Talking about annoying ...

I open a HN discussion about an article, and the first comment talks about popups, colours, fonts, cookies, ... and it triggers a long discussion. Same story again and again ...

... now, that's annoying.


Flag and move on:

> Eschew flamebait. Avoid generic tangents. Omit internet tropes.

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

> Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead.

> Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.

If we're consistent enough about this, eventually the community will get the message.


TIL that one can flag comments. I always wondered why there wasn't a flag link on comments.

> Click on its timestamp to go to its page, then click the 'flag' link at the top.

The extra click is probably why it isn't used much. I thought I simply didn't have enough karma.


Isn't that just the Substack default? It's so much better than Medium's or others; at least it closes. Besides, Substack is one of the more writer-friendly platforms re: take-home pay and no ads, so maybe reconsider?


At risk of being condescending, 50% of the platform's job is also being reader-friendly. If it can't achieve that, then it's not very well-designed. Some people have said it has a nicer nag than Medium, though surely we can do much better and have no nagging when you're in the middle of reading an in-depth technical article?


The other 50% is actually getting paid, which is what the popup is for.


I'd reckon that user/reader comfort is less than 50% of the site's purpose nowadays, considering author content is future training data for AI, which makes it more monetizable than the users themselves (especially a Hacker News user). I.e.: make money from readers, be comfortable for writers, and collect data that can be monetized later for AI training.


Writer-friendliness doesn't need to be at odds with reader-friendliness.


Sometimes it seems like they do need to be opposed. For the writer to get paid, the reader needs to be asked to pay, or exploited quietly. I don’t know a way out :(


It's not about what, it's about when though — like the GP, I got a pop-up fairly immediately before I'd engaged in the article so I just closed the tab.

The best time to engage me is after I've enjoyed the article and hopefully interested to hear more from the writer rather than immediately landing on the page.

It's the equivalent to sales staff jumping on customers in shops the minute they walk in the door — "can I help you with anything today?" — before you even have a chance to see what the shop is like.


"I would rather not read this article than have to click a button" is an interesting level of entitlement. Hope you never had your fingers stained by the ink when reading the newspaper, you probably would have sued the New York Times.


Dude we had a whole web browser revolution in the early 2000s centered largely around the ability to prevent browser pop-up windows. The fact that people are doing it in the canvas now doesn't suddenly make it not incredibly obnoxious. Pop-up windows were bad design 20 years ago and they're bad design now.


Pop-up windows that take focus over other running applications are a whole other ball game, dude. In-app popups are a pattern that may or may not be annoying, like most things used well or badly. This case seems pretty innocuous to me.


No, they are always annoying. I'm here to read an article, the only thing this pop-up window does is get in my way.


Here's some tools that help make the web suck a little bit less:

Kill Sticky: https://github.com/t-mart/kill-sticky

NoScript: https://noscript.net/

Reader mode: https://duckduckgo.com/?q=browser+reader+mode

Or yeah just close it. Life's too short to put up with websites that hate you. I rarely bother visiting Medium or Substack articles because of how hostile they are to readers.


I personally would not call it "incredibly" annoying. There's much more annoying things out there.

Maybe mildly annoying? You can easily close it and continue reading.


It's just that the amount of "mildly annoying" annoyances starts to pile up. Yes, it can be closed, but what's the rationale behind offering a modal subscription popup if I haven't even read the article and don't even know who wrote it and what else he wrote?

Offer a decently sized, floating, pinned-to-top, non-blocking subscription banner, perfectly fine.


Didn't see that. I guess that's just uBO working as intended.


My uBO didn't catch them for the record. I just added the following rule to be sure:

    substack.com##[class^="frontend-components-SubscribePrompt-"]


Thanks for that filter. I was trying to make one on my own, but couldn't quickly figure out the documentation. How did you learn to make these?


If you have no knowledge you can still make use of element picker in the context menu. In this case though the problematic element will have a generated class name like `frontend-components-SubscribePrompt-<random>`, so I resorted to the CSS syntax (`<hostname>##<css selector>`). There are a lot, a freaking lot of them [1] but the CSS syntax alone can achieve a lot.

[1] https://github.com/gorhill/uBlock/wiki/Static-filter-syntax


As disrespectful and annoying as it is, don't sites do this because it's a net gain in conversions for them? I think we're a minority here. At scale, do not the masses submit to these popups and end up converting into sales?


You’re partly right. If anyone signs up at all, then it’s a net gain in ‘conversions’. But it’s not a majority at scale, and it doesn’t have to be anywhere close in order to work. Typically only a small minority of visitors convert to being email subscribers, and then only a small minority of those email subscribers convert into paying customers. It’s not uncommon for the paying customers at the bottom of the funnel to be in the single-digit percent of visitors, or less, and the people annoyed by popups who leave to be the overwhelming majority.


To echo another person in the thread, surely the best time to engage someone is at the end, when they've finished reading a (hopefully) substantive article, and want to stay engaged with the writer. Not after a few paragraphs.


Aren’t you assuming most people finish the full article? I find that quite unlikely, especially for longer articles.

I would definitely imagine a sweet spot between “they are now engaged” and “they haven’t given up or gotten distracted yet” is optimal, especially given people’s attention spans these days :)


You'd be shocked at how varied people can be and what effects your actions will have on them. The cumulative effect of those actions can be especially counter-intuitive when there are multiple orders of magnitude at play (like the 100:1 or 1000:1 odds substack must have on anyone interacting positively with the popup).

It wouldn't surprise me at all if any of the following effects (or hundreds of others) are enough to make a difference.

- The people most likely to pay for in-depth tech articles are also high earners with large demands on their time, and they're likely to be interrupted before they finish the article or bounce before the conclusion because they've internalized the meat of the content already. Asking early is annoying, but if it's even 10% as effective then it might overcome the bounce rate. Optimizing total revenue might happen by targeting those people, even at the expense of ruining the sign-in conversion rate.

- Their A/B testing treats sign-ups and subscriptions as information-independent funnels. They're optimizing sign-ups, hoping to thereby optimize revenue, but the sort of person most likely to sign up after being affronted with a pop-up is also the sort of person who doesn't care that the pop-up appeared before they finished.

- Sign-ups are dominated by people who are hooked on a single article after the first paragraph or three and who can't stand to put it down. After the psychological investment of making an account, they're more likely to stay.

- Sign-ups are roughly orthogonal to the current article. People sign up when they think doing so will have enough value over time, and they have enough signal from the article to figure that out well before they've extracted all the benefit they'll get from the article. The pop-ups only capture people who were about to convert anyway, and the pop-up just made the conversion easier.


Well, people don't pay and websites have bills to pay. Sure, many of them want to put out good content, but they need to pay the bills to keep doing that. It really is a perfect use case for cryptocurrency if they could manage the throughput. There should be an option to pay a penny (or a fraction of a penny) to view a page without ads, so people could just click one button and move on. Any amount larger than $1 should bring up a verification screen so people don't get swindled.


Sounds like it would be easier to just use C anyway.


Yes, but C is a much more arcane and complicated language. Case in point: instead of writing a+b in Python, in C you would have to write a+b

(Jokes aside, it would really be more complicated in C if a and b were actually strings or lists.)
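
A small illustration of how much that one operator covers in Python:

    for a, b in [(2**100, 3**80), ("foo", "bar"), ([1, 2], [3, 4])]:
        print(a + b)   # big-int addition, string concatenation, list concatenation
    # Each of those would need its own (and much longer) implementation in C.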


> it would really be more complicated in C if a and b were actually strings or lists

You don't even need strings or lists for that. Just imagine bigger numbers for a and b. Arbitrarily long integer addition is not a native language feature in C.

edit: formatting


to be concrete:

    fac = lambda n: 1 if n<=1 else n*fac(n-1)
    a, b = fac(42), fac(69)
    a + b
is 3 lines of python; how many lines of C code would it take to execute?


One: return 1.7112245243e98

Joking aside: these are not catch-all comparisons. Nor is the article. But Python is much slower and much safer than C. It's easier to start in Python than in C.


;-P Just to be pedantic, if you're going to all the trouble to give (modulo lack of a repl) a wrong answer fast, might as well do it really quickly:

    return 0;
in the hopes that compiles down to something like:

    xor rax, rax
    ret


Or numbers that add up to more than the size limit of whatever byte/int/long type you are using.

There's a surprising amount of depth to adding two numbers.


> much more arcane and complicated

How much more, though? The conventional wisdom here seems to be that it's worth taking the unavoidable performance hit of dynamically typed scripting languages because the productivity boost to programmers balances it out... but I don't believe I've seen that productivity boost measured. Once you know what you're doing in C, you can do that same things you can do in Python. There's some (fascinating) syntactic sugar in there, but Python can easily be just as incomprehensible as C.


You can become productive in Python faster than you can in C. My background is the natural sciences, and one reason Python is used a lot in these fields is its ease of use for people not coming from a computer science/engineering background.


a+b in Python adds two numbers. a+b in C adds the numbers if their sum fits in the type; if it doesn't (for signed types), it's undefined behaviour and destroys the security of your entire program.


It still isn't simple:

    #!/usr/bin/python
    print(a + b)

vs

    #include <stdio.h>
    int main() {
        printf("%i", a + b);
        return 0;
    }

And printf is basically a DSL, so it still isn't 'simple'. And this is assuming a+b fits into an integer.


If you're being silly you don't need a lot of that:

    $ cat tmp.c
    main() {
       printf("%d\n", 3+4);
    }
    $ gcc -w -o tmp.out tmp.c && ./tmp.out
    7
You can even do away with types entirely, as long as you're working with ints:

    $ cat tmp.c
    foo(x) {
       return x+5;
    }
    bar() {
       return 4;
    }
    main() {
       printf("%d\n", foo(bar()));
    }
    $ gcc -w -o tmp.out tmp.c && ./tmp.out
    9
Rarely a good idea, though!


True, but then someone will pop up and announce UB! (This probably isn't UB though.)

Anyway, my main point was that C has more boilerplate. It's never just 'a+b'.

Second, printf is complicated, but that's aimed more at the OP.


If a and b are strings, Python would make "ab".


C++ otoh is actually more arcane and complicated.


There are plenty of AOT compiled languages, C isn't the only option, thankfully.


It's even easier to just buy a desk calculator or even just learn long addition and do it with pencil and paper.


Actually, I generally recommend to not leave easy tasks to the computer. The brain likes to have to do stuff sometimes.


why waste time?


Easier is exactly what that wouldn’t be


Easier for whom?

The whole point of high-level languages is that you put in effort upfront to make everyone else's job easier.

By your logic, entering machine code directly via toggle switches is easiest. No assembler to write, no text editor to write...


Tell that to your average data Joe and he will say "But C is a letter, not a programming language".



