
Cluegen – Python Data Classes From Type Clues - gigatexal
https://github.com/dabeaz/cluegen
======
miohtama
Python imports and slow startup speed hurt Python developers on moderate and
large projects. It's the downside of not having more static module exports. I
believe JS/TypeScript folks try to avoid this trap. In Python, to know what a
module exports you need to run the module; there is no way to find out through
static analysis alone.

Making imports fast for data classes may solve some of the problems. Old big
projects like Plone/Zope have solved this problem by building a more generic
lazy import system that is used extensively.

Use zope.deferredimport package for this:

[https://zopedeferredimport.readthedocs.io/en/latest/narrativ...](https://zopedeferredimport.readthedocs.io/en/latest/narrative.html)

Though I am not sure if zope.deferredimport has been updated to play nicely
with the modern typing tools like editors and MyPy.
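
For readers who want to see the general idea without pulling in a package,
here is a minimal sketch of a lazy import built on the standard library's
`importlib.util.LazyLoader`. This is not how zope.deferredimport works
internally; the helper name `lazy_import` and the choice of `colorsys` as the
demo module are assumptions made purely for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs only on first attribute access.

    Hypothetical helper for illustration, built on the stdlib LazyLoader.
    """
    if name in sys.modules:
        # Already imported elsewhere; nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so that executing the module body is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # prepares the lazy module, body not run yet
    return module


colorsys = lazy_import("colorsys")  # cheap: nothing has been executed so far
# The first attribute access triggers the real import.
hue, sat, val = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

The trade-off is that import errors surface at first use rather than at the
top of the file, which can make failures harder to trace.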

~~~
fnord123
>Python imports and slow startup speed hurts Python developers on moderate and
large projects

What kloc is moderate and large? I have not run into a project where import
took that long.

>I believe JS/TypeScript folks try to avoid this trap.

Have you ever had to run a node project? I haven't benched it but I think it
would be faster to compile a rust project and start that up than use node. But
I can't web so maybe there's special ways to make node start quickly that I
just don't know about. (I think there's a law where you can get answers more
quickly by declaring something impossible to trigger people into flooding you
with great suggestions :-) ).

~~~
twa927
The problem I experienced multiple times in 50-200 KLOC projects is not the
time needed to import the modules, but the memory consumption caused by the
imports. Moving some imports from top-level module statements into the bodies
of the functions that use them could cut memory consumption severalfold, e.g.
from 250MB per process to 80MB per process.
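
A minimal sketch of that pattern, assuming a hypothetical `render_report`
function that is the only caller of the imported modules. The stdlib `csv` and
`io` modules stand in for a heavy dependency; the savings quoted above depend
entirely on what the real imports pull in.

```python
def render_report(rows):
    """Serialize rows to CSV text (hypothetical example function)."""
    # Function-level import: these modules are loaded the first time the
    # function runs, not when the enclosing module is imported. Processes
    # that never call render_report never pay for them.
    import csv
    import io

    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


report = render_report([["name", "size"], ["cluegen", 1]])
```

Repeated calls are cheap because the second `import` is only a lookup in
`sys.modules`.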

One tool I used was
[https://github.com/mnmelo/lazy_import](https://github.com/mnmelo/lazy_import)
but I'm not sure it's updated for Python 3.7/3.8.

------
sq_
Love this bit from the Q&A section:

> You should pronounce it as "kludg-in" as in "runnin" or "trippin". So, if
> someone asks "what are you doing?", you don't say "I'm using cluegen." No,
> you'd say "I'm kludgin up some classes." The latter is more accurate as it
> describes both the tool and the thing that you're actually doing. Accuracy
> matters.

~~~
tda
I read the readme only after this comment and was surprised by how detailed
it was. Why would someone put effort into a complicated tool, benchmark it
etc., and then not really take it seriously? Until I noticed who the author
was, then it suddenly made complete sense.

~~~
sq_
I don't know the author at all really, but I do appreciate a professional in
any field who can find humor in their work and avoid taking themselves too
seriously.

The writing in the README reminds me of people like Derek Lowe (author of the
In The Pipeline pharma/bio/whatever blog) and John D. Clark (author of
Ignition!) who can create exceptional things and deliver knowledge in a
humorous and engaging way.

------
nurettin
There is no setup.py because we can just copy cluegen.py to our project and
modify it. And there will be no new features.

This is probably the first library of its kind that is unpackaged and claims
that it doesn't need packaging.

~~~
BiteCode_dev
David Beazley usually does that, and also refuses PRs.

He is interested in providing PoC (like with curio), but not doing what comes
after.

~~~
lstamour
I don’t know, the code itself is shorter and easier to read than the readme
initially was. And maybe this will dissuade folks from installing and using
this if they’re not committed to maintaining it as part of their dependencies.
Dependencies do require your maintenance and oversight of new code revisions,
but few people schedule such time. It’s essential though, any dependency could
arbitrarily change its behaviour at any time, technically breaking your code
until you can produce a fix. So dependencies are technically yours to maintain
— you just choose to limit your maintenance from a “fork” to a version string
and API usage, but the maintenance burden still exists...

~~~
BiteCode_dev
Yes, and he doesn't care.

------
raziel2p
That is the dirtiest use of metaprogramming in Python I've ever seen. Well
done.

~~~
gigatexal
That’s just David being David. It’s awesome. And his talks always illuminate
all the cool parts of python as he does crazy things with it.

------
BerislavLopac
The main problem with this approach is that it conflates type hierarchies
(through class inheritance) with developer convenience; there are reasons why
both attrs and dataclasses chose the decorator approach.
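
To make the distinction concrete, here is a rough sketch contrasting the two
styles. `Record` is a hypothetical stand-in for an inheritance-based
generator, not cluegen's actual code: it only builds a positional `__init__`
from the annotations of each subclass.

```python
from dataclasses import dataclass


class Record:
    """Hypothetical base class that generates __init__ from annotations."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Simplification: only the subclass's own annotations are considered.
        fields = tuple(getattr(cls, "__annotations__", {}))

        def __init__(self, *args):
            if len(args) != len(fields):
                raise TypeError(f"expected {len(fields)} arguments")
            for name, value in zip(fields, args):
                setattr(self, name, value)

        cls.__init__ = __init__


@dataclass
class PointA:  # decorator style: the class hierarchy is left untouched
    x: int
    y: int


class PointB(Record):  # inheritance style: Record becomes part of the MRO
    x: int
    y: int


mro_a = [c.__name__ for c in PointA.__mro__]  # ['PointA', 'object']
mro_b = [c.__name__ for c in PointB.__mro__]  # ['PointB', 'Record', 'object']
```

With the base class, `isinstance(p, Record)` is true for every generated
class, so a convenience feature leaks into the type hierarchy. The decorator
leaves `isinstance` relationships exactly as the author wrote them.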

~~~
smitty1e
I might quibble at calling this a "problem".

Every ingredient in the rack is spice. There is a "proper" amount to use for
the recipe. Let good taste be the guide. Never go Full [language I like to
drag].

------
hprotagonist
dabeaz is the King Bumi of the python world.

And, true to his physicist roots, he’s very good at gedankenexperimenten.

I’m unlikely to actually use this and can’t really think why I would, but
that’s hardly the point.

~~~
cooperadymas
I'm trying to figure out if you intend some comparison beyond "somewhat mad
but effective."

Is he waiting patiently for the right moment to strike? Does he pretend to be
dumb but is actually incredibly smart?

:-)

~~~
hprotagonist
he’s a mad genius who would rather go cabbage-traincar sledding with his
friends than rule.

there’s a lot to admire in that, frankly.

~~~
sitkack
Do they have that in Evanston?

~~~
hprotagonist
no, just hackney's.

------
Congeec
I use
[https://github.com/samuelcolvin/pydantic](https://github.com/samuelcolvin/pydantic)
instead

~~~
softinio
I use pydantic also and it's great. Funny that the README of this new lib
claims it hasn't been done before. Good to have choice nevertheless.

------
uryga
i'm curious about that __eq__. it generates code like

    
    
      (self.x, self.y) == (other.x, other.y)
    

but i think this would be more efficient:

    
    
      self.x == other.x and self.y == other.y
    

because you avoid creating a tuple (possibly heap-allocated) and if `.x`
differs, you avoid a dictionary lookup¹ for `.y` thanks to short-circuiting. i
think i benchmarked it at some point, but that was a while ago (on CPython
3.5)
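
A small sketch of the short-circuiting half of that argument. It counts
attribute reads rather than measuring time, so it says nothing about actual
speed on any given CPython version; `Counted`, `eq_tuple` and `eq_chain` are
hypothetical names made up for the demonstration.

```python
class Counted:
    """Hypothetical class whose properties count how often they are read."""

    reads = 0

    def __init__(self, x, y):
        self._x, self._y = x, y

    @property
    def x(self):
        Counted.reads += 1
        return self._x

    @property
    def y(self):
        Counted.reads += 1
        return self._y


def eq_tuple(a, b):
    # Builds both tuples first, so all four attributes are always read.
    return (a.x, a.y) == (b.x, b.y)


def eq_chain(a, b):
    # Stops at the first mismatch, so .y is never read when .x differs.
    return a.x == b.x and a.y == b.y


a, b = Counted(1, 2), Counted(9, 2)  # .x differs

Counted.reads = 0
eq_tuple(a, b)
tuple_reads = Counted.reads  # 4 reads

Counted.reads = 0
eq_chain(a, b)
chain_reads = Counted.reads  # 2 reads
```

Whether the saved reads and the avoided tuples matter in practice is
something `timeit` on the target interpreter would have to answer.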

\---

btw, i went with a similar approach (i.e everything is codegened) in my sum
type library:
[http://github.com/lubieowoce/sumtype](http://github.com/lubieowoce/sumtype)

i never got around to generating the methods lazily, maybe i should!

for correctness reasons i switched from interpolating raw strings to <array of
(line, indent-level)> and a bunch of wrappers – generating if-elif-else chains
via raw strings gets scary. but they work well enough here

\---

1\. iirc even though it's using slots, it still has to look up the descriptors
for `.x` and `.y` in the class dictionary. another possible optimization would
be to trade off memory (and extensibility) for time and "cache" those.
something in the vein of

    
    
      def generate(...):
        ...
        get_x = MyClass.x.__get__
        get_y = MyClass.y.__get__
        ...
        def eq(self, other):
          return (get_x(self), get_y(self)) == (get_x(other), get_y(other))
      
        MyClass.__eq__ = eq
    

here, `get_x` and `get_y` would be closed-over in `eq`, so they shouldn't
incur a dictionary lookup. which of course adds significant complexity, but
that may be a sensible trade-off
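
A runnable version of that idea, as a sketch only. `Point` and `install_eq`
are hypothetical names, and whether this beats plain attribute access is an
open question that would need benchmarking.

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def install_eq(cls):
    """Attach an __eq__ that reads slots through pre-fetched descriptors."""
    # With __slots__, cls.x and cls.y are member descriptors. Binding their
    # __get__ once means eq() does not look them up on the class each call.
    get_x = cls.x.__get__
    get_y = cls.y.__get__

    def eq(self, other):
        if other.__class__ is not cls:
            return NotImplemented
        return (get_x(self), get_y(self)) == (get_x(other), get_y(other))

    cls.__eq__ = eq


install_eq(Point)

same = Point(1, 2) == Point(1, 2)
different = Point(1, 2) == Point(1, 3)
```

The cost is the extensibility mentioned above: the cached getters are tied to
this exact class, so `eq` returns `NotImplemented` for subclasses.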

------
tutuca
> Yes. Yes, you could do that if you wanted your class to be slow to import,
> wrapped up by more than 1000 lines of tangled decorator magic, and
> inflexible.

At first I thought: there's no way someone is this categorical about some
obscure aspect of python's internals, then I noticed it's from dabeaz. What a
man :)

------
_ZeD_
>>> Q: Who maintains cluegen?

>>> A: If you're using it, you do. You maintain cluegen.

while I know and respect dabeaz for all his work, I still prefer to rely on
the "official" dataclasses module.

~~~
sixhobbits
This wasn't actually in the original

[https://twitter.com/honnibal/status/1260544951638732801?s=20](https://twitter.com/honnibal/status/1260544951638732801?s=20)

------
pietroppeter
luckily there is a PR that will fix the par.t of th.e read.me that is
un.read.able!

[https://github.com/dabeaz/cluegen/pull/4/commits/b506ed86697...](https://github.com/dabeaz/cluegen/pull/4/commits/b506ed866970f3ffaaf20f3a65deb7c1b84c7dd0)

~~~
BiteCode_dev
That's a joke, and Beazley doesn't accept PRs anyway.

