Annotated explanation of David Beazley's dataklasses (simonwillison.net)
108 points by jonahbenton on Dec 20, 2021 | 16 comments



Slightly on topic: a talk by David Beazley where he live-codes a Wasm interpreter.

https://www.youtube.com/watch?v=VUT386_GKI8

He also teaches a compilers course: https://dabeaz.com/compiler.html

> However, the project is structured in a way to help you succeed.


This is a very clever solution. To be clear, most users of Python don't care about startup time to that degree, but for some applications shaving off that 0.1 seconds would be meaningful.

However, Python's dataclasses have great features and I wouldn't want to give them up. What would be ideal is a solution for caching the generated code of dataclasses.


This is a neat trick, no question.

It is only about the performance of the decorator itself, not about the decorated class, right? So faster imports (if imports are your bottleneck) but equally-performant classes.
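If you want to see the decorator cost for yourself, here is a rough sketch using timeit. It is not a rigorous benchmark: the class shapes and the repeat count are arbitrary choices of mine, and the numbers will vary by machine and Python version. It only times class definition, not instance creation or method calls.

```python
import timeit
from dataclasses import dataclass

def define_with_dataclass():
    # Each call re-runs the decorator, which generates the methods again.
    @dataclass
    class Point:
        x: int
        y: int
    return Point

def define_by_hand():
    # Same shape of class, but the methods are written out, so nothing is generated.
    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y
    return Point

REPEATS = 200  # arbitrary; raise it for steadier numbers
decorated_s = timeit.timeit(define_with_dataclass, number=REPEATS)
by_hand_s = timeit.timeit(define_by_hand, number=REPEATS)

print(f"@dataclass definition:    {decorated_s / REPEATS * 1e6:.1f} us per class")
print(f"hand-written definition:  {by_hand_s / REPEATS * 1e6:.1f} us per class")
```

The gap between the two lines is roughly what you pay per class at import time. Once the class exists, its methods are ordinary functions either way.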


Will this caching optimization make it into the standard library?


From the README:

> Q: Who maintains dataklasses?

> A: If you're using it, you do. You maintain dataklasses.


Is this something that is safe to rely on as Python evolves?


To the same extent that any other bytecode manipulation is safe to rely on: everything is documented, but everything can change in any major release.


Another major Python release (i.e. Python 4) would lead to widespread rioting, massive overloading of mental health services and numerous cases of spontaneous human combustion.


Yes, but would it still be worth it? That's the question that maintainers should ask themselves. Pretty hard to do much of anything without uproar of some sort at this point; even the most recent no-GIL proposal had people complaining about it. Just use the version you're happy with and live your life.


I didn't say it wouldn't happen. Point is, a new major Python version is extremely rare and a huge deal. Having to update dataklasses would be very low on the list of concerns.


Then I guess the related question is: is this safe to use with Python implementations other than the main one (CPython)? For example, PyPy or Nuitka, neither of which uses bytecode (or at least I don't think so)?


> Then I guess the related question is: is this safe to use with Python implementations other than the main one (CPython)?

Absolutely not.

> For example, PyPy or Nuitka, neither of which uses bytecode (or at least I don't think so)?

PyPy absolutely uses bytecode. Though not necessarily CPython's bytecode (although I'd expect they have at least a bridge for CPython compatibility).

Jitting straight from the source is pretty rare and tends to be avoided: it means either you don't have a baseline interpreter, or that you're repeating the work of the interpreter in your JIT, or possibly that you're interpreting and jitting from the AST.

The former is what Chrome (V8) did at the start: it had a baseline compiler and an optimising compiler, which was pretty weird. After a while a baseline interpreter (Ignition) was introduced, and the bytecode it generated got fed into the optimising compiler. The timeline was a bit odd. Rather than building a bytecode interpreter and then a JIT for that bytecode, they started with a bytecode-based JIT (TurboFan), then figured they could interpret the bytecode directly instead of having to deal with (and patch) code from the baseline compiler. So the interpreter grew out of the effort of building a new compiler, rather than the compiler being built for the interpreter.

As for AST interpretation, I know it's pretty common for teaching / learning, but I don't know if there's any system actually doing it for real "in production"; the amount of pointer chasing seems like it'd be ridiculously inefficient.


This doesn't directly manipulate bytecode but instead just renumbers variables on code objects.
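For anyone wondering what "renumbering variables on code objects" looks like, here is a minimal sketch of the general idea. It is my own simplification, not the actual dataklasses source: the helper names init_template and make_init are made up, and it relies on CPython's code.replace() (3.8+) and on attribute names landing in co_names in order of first appearance.

```python
import types
from functools import lru_cache

@lru_cache(maxsize=None)
def init_template(nfields):
    # Compile a generic __init__(self, _0, _1, ...) once per field count.
    # This exec() is the expensive step that gets cached.
    args = [f"_{i}" for i in range(nfields)]
    body = "\n".join(f"    self.{a} = {a}" for a in args) or "    pass"
    namespace = {}
    exec(f"def __init__(self, {', '.join(args)}):\n{body}\n", namespace)
    return namespace["__init__"]

def make_init(fields):
    # Reuse the cached template and swap in the real names.
    # The bytecode is untouched; only the name tables on the code object change.
    template = init_template(len(fields))
    code = template.__code__.replace(
        co_names=tuple(fields),         # attribute names (self.x, self.y, ...)
        co_varnames=("self", *fields),  # argument names (x, y, ...)
    )
    return types.FunctionType(code, template.__globals__, "__init__")

class Point:
    __init__ = make_init(("x", "y"))

class Pixel:
    __init__ = make_init(("row", "col"))  # same field count, so no second exec()

p = Point(1, y=2)
q = Pixel(row=3, col=4)
print(p.x, p.y, q.row, q.col)
```

The instructions still say "load local 1, store attribute 0", so only the meaning of those indices changes. That is why it is cheap, and also why it depends on how CPython lays out those tables.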


Same diff, the entire thing is an implementation detail of CPython and deeply tied to that VM.


This looks like it would be pretty stable in practice. The iffiest assumption I see is that names are always ordered by first appearance.
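That ordering assumption is easy to check on whatever interpreter you're running. A small sketch (the function name template and the _0/_1/_2 placeholders are just illustrative):

```python
def template(self, _0, _1, _2):
    self._0 = _0
    self._1 = _1
    self._2 = _2

# Attribute names referenced by the function, as stored on its code object.
names = template.__code__.co_names
print(names)

# The renaming trick only works if position i in co_names is the i-th field.
ordered_by_first_use = names == ("_0", "_1", "_2")
print("ordered by first appearance:", ordered_by_first_use)
```

On CPython this holds today, but as far as I know nothing in the language reference promises it.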

It would be worse if it touched the bytecode itself; I've seen that break even in a bugfix release.

It actually works in PyPy (if you change the way dictionaries are merged so that it's 3.8-compatible). Other Python implementations could have a harder time.

A future Python release could always overhaul the bytecode system to the point this no longer makes sense. I think they'd consider it a backward compatibility break, but I'm not sure, and Python makes those fairly regularly regardless.

If the optimization were added to the standard library then it could just be kept in sync with the interpreter.


You can always write some tests to verify behavior and fall back to the standard library if this winds up not working.



