* Attrs - Reduces the boilerplate of defining classes; it pre-dates dataclasses but still has a bunch of capabilities that dataclasses don't.
* Pydantic - A declarative data validation + [de]serialization tool, mostly to be used at the boundaries between systems.
Assuming my understanding is correct, it feels a bit odd to compare them as if they were like-for-like alternatives to each other. It wouldn't be crazy to use both in your code.
Personally, what I'm missing in my suite of libs is something that only does data validation. The nicest (IMO) API I've seen for it is Django's forms, but I don't like that they're inherently coupled with the UI aspect of said forms and with Django itself.
What I've really been looking for is something I can just use to check that a data structure I'm passing into a function call is valid. One use case is related to state machines, where I'm providing some data along with a transition name and want to make sure the combination of transition + data is valid.
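For concreteness, here's a rough sketch of what I mean, using Pydantic models purely as validators (the transition names and payload fields are made up):

from pydantic import BaseModel, ValidationError

# Hypothetical transitions and the shape of data each one accepts.
class SubmitPayload(BaseModel):
    document_id: int
    comment: str = ""

class RejectPayload(BaseModel):
    document_id: int
    reason: str

TRANSITION_SCHEMAS = {"submit": SubmitPayload, "reject": RejectPayload}

def validate_transition(name, data):
    # Reject unknown transitions, then check the data against that transition's schema.
    if name not in TRANSITION_SCHEMAS:
        raise ValueError(f"unknown transition: {name}")
    try:
        return TRANSITION_SCHEMAS[name](**data)
    except ValidationError as exc:
        raise ValueError(f"invalid data for transition {name!r}: {exc}") from exc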
It works great for a Python stack, and you can generate some clean-looking code if you have a reusable data class package.
Also, Attrs and Pydantic both support arbitrary additional validation on fields.
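Roughly what that looks like in each (the Pydantic snippet assumes the v1-style @validator API):

import attr
from pydantic import BaseModel, validator

# attrs: a validator receives (instance, attribute, value) and raises on bad input.
@attr.s(auto_attribs=True)
class AttrsUser:
    age: int = attr.ib()

    @age.validator
    def _check_age(self, attribute, value):
        if value < 0:
            raise ValueError("age must be non-negative")

# Pydantic (v1 API): field validators via the @validator decorator.
class PydanticUser(BaseModel):
    age: int

    @validator("age")
    def check_age(cls, v):
        if v < 0:
            raise ValueError("age must be non-negative")
        return v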
It's great stuff, but I'm always looking for alternatives.
You're right, `attrs` solves a different problem, but `cattrs` builds on top of it to tackle the same ones as Pydantic.
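A minimal sketch of that division of labour - attrs defines the class, cattrs converts to and from plain data:

import attr
import cattrs

@attr.s(auto_attribs=True)
class Point:
    x: int
    y: int

converter = cattrs.Converter()

# Structure raw data (e.g. parsed JSON) into a typed object...
p = converter.structure({"x": 1, "y": 2}, Point)

# ...and unstructure it back into plain builtins for serialization.
assert converter.unstructure(p) == {"x": 1, "y": 2}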
I'm a die-hard pydantic fan, but it's refreshing to see other perspectives, with the author acknowledging it's their opinion rather than getting all holy-war about it. Also, they aren't orthogonal. Pydantic is definitely heavier than attrs in terms of processing. This is what makes pydantic a great bastion at shearing layers: "validate all 10000 ints" is exactly what you want when parsing a request, CLI input, or some configuration.
Also, pydantic makes it almost trivial to write top-level app config logic that is populated from configs, env variables, secrets, etc.
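Something along these lines (Pydantic v1's BaseSettings; in v2 the same thing lives in the separate pydantic-settings package):

from pydantic import BaseSettings, Field

class Settings(BaseSettings):
    # Filled from the environment (APP_DEBUG, APP_DATABASE_URL, ...) with these defaults as fallback.
    debug: bool = False
    database_url: str = Field("sqlite:///app.db")

    class Config:
        env_prefix = "APP_"

settings = Settings()  # reads os.environ at instantiation time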
Also, I can generate pydantic structures from OpenAPI, JSON Schema, etc., and conversely generate schema/Swagger from pydantic. This is a game-changing amount of awesomeness.
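The pydantic-to-schema direction is a one-liner (v1-style API shown); the reverse direction is typically handled by code generators such as datamodel-code-generator:

from typing import List
from pydantic import BaseModel

class Post(BaseModel):
    title: str
    tags: List[str] = []

# Emit a JSON Schema document describing the model.
print(Post.schema_json(indent=2))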
On the other hand, inside business logic, pydantic has the downsides TFA mentions. I would still contend, however, that some validation sprinkled in with the business logic is helpful to rein in some of that zany Python dynamism in huge codebases.
I like the simplicity and composability of the c/attrs approach. Along with the "pydantic does not like positional args," (`__root__=(a,b,c,...)`, ugh) I'll be considering this for retrofitting some code paths that are currently dicts with actual structured types.
I didn't know about the performance differences. Sam Colvin strikes me as very thorough and the community is very involved, so I don't think the benchmarks claiming pydantic is faster than attrs are wrong. I know pydantic uses Cython under the hood (no idea for what) - is it possible that environmental differences are causing the discrepancy?
By comparison, the system of tools around Attrs and Cattrs is a bit scattered and underdeveloped. I don't think there's a technical reason for it, that's just how it happened. It would be great to see either a competitive parallel "stack" developing, or increased interoperability from libraries that currently only support one or the other.
For this reason, in my personal projects I use Attrs almost all the time when writing new classes, and I don't use Pydantic. But at work, we are very likely going to adopt the aforementioned "Pydantic CRUD stack" for new internal services, specifically because our needs are straightforward, and for straightforward use cases it "just works" and "there's only one right way to do it", which I think are extremely valuable properties when operating in a team.
> The author of this article (https://stefan.sofa-rockers.org/2020/05/29/attrs-dataclasses...) points out that replacing the dateutil.parser.parse function with datetime.fromisoformat makes the attrs benchmark roughly 7 times faster.
> There are three reasons to use dataclasses over attrs:
Another, not on this list, is that since it is in the standard library, co-workers/contributors are significantly more likely to have heard of dataclasses and to already know its interface. Another reason, along similar lines, is the fact that dataclasses has fewer features than attrs - the author mentions this as a disadvantage, but it's actually a benefit if it has the features you need, because a simpler library is less cognitive overhead to write and maintain code against.
Devs moving off of attrs generally go to pydantic in my experience.
Direct usage of dataclasses is literally zero across the several Python companies and many Python codebases I have worked with.
The nice thing here is that, since dataclasses are more-or-less just slimmed-down attrs, it's relatively easy to migrate from dataclasses to attrs at a later date.
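For the common cases it's mostly a decorator swap; a sketch:

# Standard-library dataclass...
from dataclasses import dataclass, field

@dataclass
class Order:
    items: list = field(default_factory=list)
    paid: bool = False

# ...and its attrs equivalent.
import attr

@attr.s(auto_attribs=True)
class Order:
    items: list = attr.ib(factory=list)
    paid: bool = False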
I find it preferable that such changes are possible outside the central committee bureaucracy. You really don't want to end up like C++ w.r.t language evolution.
When I do that, I ask myself: "how is this different from writing Java code?" I don't have a clear answer. It becomes almost as verbose as Java. Java has better type validation (it doesn't compile on a mistake); it is faster (at least some payback for the verbosity); and a real type system has additional advantages, for example using types as part of a method signature.
Maat looks really interesting for this use-case.
The recent proliferation of type annotations is also trending towards the static typing mindset, which is also cargo culting on a massive scale.
I'm not talking mypy --strict here; heck, I've yet to achieve that myself. But if you are writing functions with obvious realizations of interfaces and not typing them, all you are doing is creating more mental strain for the next consumer of that function (often future you) down the road.
At $LASTCO I inherited some jupyter-notebook-copy-pasted datascience ML abomination of a pipeline. Dicts everywhere, mutation everywhere, zero docstrings, nary a type hint in sight. Took two exceptionally stressful months, with constant back and forth with the authors, to get it working in prod ("it works on my machine, I don't understand the problem"). $THISCO embraces type hints, functional style, etc. My stress level has actually normalized.
Typing `foo(bar: Optional[float])` takes what, 2 seconds more than `foo(bar)`? Asking "hey Steve, what does `bar` take in function `foo`?" on Slack is already more time and characters than just annotating it, times every dev that doesn't know and has to ask.
I'm skeptical that the people that produce the kind of abomination you encountered would produce something less horrible if they went to town with static typing. I've seen too much Java to fall for that ;) I'm not averse to putting in some annotations and data validation chokepoints here and there.
I'd ask Steve why he called it `foo(bar)` rather than `set_width(width)`. Static typing fans shift the semantics of code onto a type system. I'd prefer it if they focused more on better naming than spending time on designing complicated Pydantic models.
It's better than writing it in a docstring, because a type checker will tell you to change the type if you change how you use a variable.
Does everyone need to go all the way and type 100% of things and use heavily generic code to represent all possible cases? Well, that would be wonderful, but just sprinkling built-in types is already a massive improvement over no types at all.
But even moderately complex libraries and applications are such a huge pain to develop, read, and maintain without some of the tooling that leverages type annotations. This is especially true in the ML / AI / Data Science world, where a lot of the people implementing models have dubious software engineering backgrounds.
Pure dynamic typing paradigms simply have not delivered on their promises over the last 30 years. There are definitely some areas where they make sense, but I doubt we will see a massive readoption until our tooling becomes sufficiently intelligent. Imagine, for example, a probabilistic type inference based on both the structural aspects of the code and previous runs over actual data.
I'd say Python's large scale adoption is exactly a good example of dynamic typing delivering on its promises.
However doing this for arbitrary iterables is impossible, because the iterable might be lazy or infinite. In that case maybe the best option is to wrap the iterable with something that would validate each element as it came off of the iterator that is produced from the iterable.
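Something like this, as a sketch of that wrapping approach:

def validated(iterable, expected_type):
    # Yield items one at a time, checking each as it comes off the iterator,
    # so lazy or infinite iterables are never consumed up front.
    for i, item in enumerate(iterable):
        if not isinstance(item, expected_type):
            raise TypeError(f"item {i} is {type(item).__name__}, expected {expected_type.__name__}")
        yield item

# The check only happens as elements are pulled:
for n in validated(iter([1, 2, "three"]), int):
    print(n)  # prints 1 and 2, then raises on "three"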
Personally, I don't think it's fine to bash other projects in the open source community like this. Clearly, attrs+cattrs and pydantic focus on different things. Let us all live together peacefully :)
To your second point, I struggle to see this as bashing. It's certainly not an objective comparison, but it is about as even-handed a comparison as I'd be willing to ask of a mere mortal when they're also personally invested in the subject. You don't see sentences like, "Pydantic is wrong!" you see ones like, "Pydantic is very opinionated about the things it does, and I simply disagree with a lot of its opinions," or, "I disagree with this. Un/structuring should be handled independently of the model." That's not bashing; that's constructive criticism. It's worth noting that he also acknowledges that Pydantic does some things better.
For my part, I think my only real complaint about this article is that he doesn't really pay enough attention to the fact that, despite their overlapping functionality, (c)attrs and Pydantic are optimizing for very different use cases. That leaves me thinking that some (though far from all) of his criticism has a certain, "This screwdriver isn't very good at driving nails," characteristic.
* Neither Pydantic nor Cattrs handles unions the way I'd expect (although Cattrs has stronger guarantees when converting Unions)
>>> from typing import Union
>>> from pydantic import BaseModel
>>> class Y(BaseModel): pass
>>> class X(BaseModel): pass
>>> class Z(BaseModel): a: Union[X, Y]
>>> Z(a=Y())  # silently converts the Y instance to an X, since X is tried first
They do work well for simple Python types, but what I'd like to see is a guarantee that the serialisation operation is completely reversible, and a warning/exception raised if it isn't.
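As a sketch of the kind of check I mean (not something cattrs does for you out of the box, as far as I know):

import attr
import cattrs

@attr.s(auto_attribs=True)
class Pixel:
    x: int
    y: int

def unstructure_checked(converter, obj):
    # Unstructure, re-structure, and compare; complain if the round trip is lossy.
    data = converter.unstructure(obj)
    if converter.structure(data, type(obj)) != obj:
        raise ValueError(f"unstructuring {obj!r} is not reversible")
    return data

unstructure_checked(cattrs.Converter(), Pixel(1, 2))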
That’s it. I simply can’t stand them, they drive me crazy. When I first saw them I wondered what the fuck “ib” meant. And the thing is, I’d expect every decent Python programmer to have the same reaction: `.` is syntax; it simply can’t be part of a name. The library is called “attrs”, so why is it imported as “attr”? I simply don’t know what the author thought he was doing.
I just don’t see how it is appropriate for a serious library to contain a joke that’s going to trip up literally every programmer with any taste. Maybe this is my problem, maybe one day I’ll relax and wonder why I was making such a big deal about it. But for the last several years it has made me think that the author completely lacks the judgement required to be a library author.
To add to my criticisms, imagine what the python ecosystem would look like if everyone thought they could have names that spanned attribute lookup syntax!
At first, some people have a negative gut reaction to that, resembling the reactions to Python’s significant whitespace. And as with that, once one gets used to it, the readability and explicitness of that API prevails and delights.
For those who can’t swallow that API at all, attrs comes with serious business aliases: attr.attrs and attr.attrib.
And anyway, the library is called “attrs”, so why is it imported as “attr”?
import attr as attrs
Python packaging is terrible and gives package authors footguns like this. Nothing stops a package from being imported as "da39a3ee".
Clearly if they had chosen "da39a3ee" we wouldn't need to discuss whether they had good judgement.
Admitting to doing something obnoxious does not make it less obnoxious. If anything, it makes it more so. It says the person knew what they were doing was a problem and they wanted to share the problem with everyone.
@att.rs # because it refers to both AT&T and Rust
@at.trs # musicians will love it
@a.ttrs # because I never remember where the dot goes
@ttrs # because @ looks like an a @lready
@trs # @ is “at”! Get it?
@t.rs # I like both Perl and Rust
Say your API accepts the following JSON POST:
"title": "My new blog post",
"tags": ["writing", "productivity"]
More importantly: if there is a validation error, you need to provide an error message explaining what was wrong. This can be quite tricky to do well, especially for complex nested JSON objects.
Pydantic solves this problem really well - including returning detailed error messages helping show what went wrong.
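For the blog-post payload above, roughly:

from typing import List
from pydantic import BaseModel, ValidationError

class BlogPost(BaseModel):
    title: str
    tags: List[str]

try:
    BlogPost(**{"tags": "not-a-list"})  # missing title, wrong type for tags
except ValidationError as exc:
    # A machine-readable list of errors, each with the path to the offending field.
    print(exc.json())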
The existing generators are generally terrible.
I think a lot of this boils down to the functional approach that (c)attrs takes, vs Pydantic's OO. Because it's more functional, there's higher composability and more power given to the user.
On my team, we use cattrs and add our own customizations to it to great effect, and these simply would not be possible with Pydantic.
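As one example of the kind of customization I mean, teaching a converter about a type it doesn't handle out of the box:

from datetime import datetime
import cattrs

converter = cattrs.Converter()

# Register project-wide hooks for datetimes; everything built on this converter inherits them.
converter.register_unstructure_hook(datetime, lambda dt: dt.isoformat())
converter.register_structure_hook(datetime, lambda value, _type: datetime.fromisoformat(value))

dt = datetime(2020, 1, 1)
assert converter.structure(converter.unstructure(dt), datetime) == dt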
That said, there are reasons Pydantic seems to be ascendant despite attrs being the library that inspired dataclasses. It's very much a batteries-included library, which fits the ethos of Python.
size: float = None
> This is no longer the recommended behavior. Type checkers should move towards requiring the optional type to be made explicit.
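So the annotation in question needs the Optional spelled out:

from typing import Optional

size: Optional[float] = None  # explicit, as type checkers now expect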