I use Attrs instead of Pydantic (threeofwands.com)
132 points by YohAsakura on Aug 26, 2021 | 74 comments

I'm a bit confused, because I ultimately understood Attrs and Pydantic to be aimed at solving different problems. I may've been wrong, but my understanding was that:

* Attrs - To reduce the boilerplate of defining classes, pre-dates dataclasses, but still has a bunch of capabilities that dataclasses don't.

* Pydantic - A declarative data validation + [de]serialization tool, mostly to be used at the boundaries between systems.

Assuming my understanding is correct, it feels a bit odd to compare them as if they were like-for-like alternatives to each other. It wouldn't be crazy to use both in your code.

Personally, what I'm missing in my suite of libs is something that only does data validation. The nicest (IMO) API I've seen for it is Django's forms, but I don't like that they're inherently coupled with the UI aspect of said forms and with Django itself.

You could have a look at Marshmallow perhaps? I use it to great effect with marshmallow-dataclass so I don't have to define the boiler-plate data classes separately to the validation/schema. Essentially the dataclass (with type-hints) becomes the data-validation specification.
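To make the "dataclass as validation spec" idea concrete, here is a stdlib-only sketch. It is not marshmallow-dataclass's API; `Person` and `validate_into` are hypothetical names, and it only handles simple, non-generic annotations.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, get_type_hints


@dataclass
class Person:
    name: str
    age: int


def validate_into(cls, data: Dict[str, Any]):
    """Build `cls` from `data`, checking each field against its annotation."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        if not isinstance(value, hints[f.name]):
            raise TypeError(
                f"{f.name}: expected {hints[f.name].__name__}, "
                f"got {type(value).__name__}"
            )
        kwargs[f.name] = value
    return cls(**kwargs)
```

The type hints are the only schema here, which is the appeal: there is no second definition to keep in sync.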




Worth pointing out Desert, which generates a Marshmallow schema from an Attrs class. You could then integrate this with Apispec or Marshmallow-Jsonschema.




Thanks, I've had a look at Marshmallow in the past, but it still does the extra work of serialization, so it never really felt all that dissimilar to Pydantic.

What I've really been looking for is something that I can use just to check that a data structure I'm passing in to a function call is valid. One use case is related to state machines, where I'm providing some data along with a transition name, and want to make sure the combination of transition + data is valid.
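For the state machine case, a validation-only check can be quite small. The sketch below is stdlib-only and assumes a hand-written table of transitions; `TRANSITION_SCHEMAS`, `check_transition`, and the transition names are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical transition table: transition name -> required keys and types.
TRANSITION_SCHEMAS: Dict[str, Dict[str, type]] = {
    "submit": {"author_id": int},
    "reject": {"reviewer_id": int, "reason": str},
}


def check_transition(name: str, data: Dict[str, Any]) -> None:
    """Raise ValueError unless `data` is valid for transition `name`."""
    if name not in TRANSITION_SCHEMAS:
        raise ValueError(f"unknown transition: {name!r}")
    schema = TRANSITION_SCHEMAS[name]
    extra = set(data) - set(schema)
    if extra:
        raise ValueError(f"{name}: unexpected keys {sorted(extra)}")
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"{name}: missing key {key!r}")
        if not isinstance(data[key], expected):
            raise ValueError(f"{name}: {key!r} must be {expected.__name__}")
```

Nothing is serialized or converted; the function either returns silently or says which part of the transition + data combination is wrong.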

Pydantic has a validate_arguments decorator that, while in beta, I think does what you want: https://pydantic-docs.helpmanual.io/usage/validation_decorat...
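For anyone wondering what such a decorator boils down to, here is a hand-rolled stdlib sketch of the idea. It is not Pydantic's implementation: `check_arguments` is a hypothetical name, it only checks plain classes (typing generics are skipped), and it never converts values.

```python
import functools
import inspect
from typing import get_type_hints


def check_arguments(func):
    """Check call arguments against simple (non-generic) annotations."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; typing generics are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_arguments
def repeat(text: str, times: int) -> str:
    return text * times
```

Calling `repeat("ab", "2")` raises a TypeError at the call boundary instead of failing somewhere inside the function body.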

I have used a combo of attrs and fastjsonschema to do dataclass validation on instance construction.

It works great for a Python stack, and you can generate some clean-looking code if you have a reusable data class package.

Design-by-contract, perhaps? There are a couple libraries like this currently, and you could pretty easily write your own framework for it.

Also, Attrs and Pydantic both support arbitrary additional validation on fields.

Sounds like you want jsonschema.

I use marshmallow-dataclass in an API client lib I've been continuously working on for a while for work.

It's great stuff, but I'm always looking for alternatives.

Note the author is comparing `cattrs` with Pydantic, not `attrs`.

You're right, `attrs` solves a different problem, but `cattrs` builds on top of it to tackle the same ones as Pydantic.

Thanks for the clarification, cattrs actually looks pretty cool!

Pydantic's "BaseSettings" has pretty much cured the evergreen itch of writing my own configuration languages. I love that env vars automatically override default values.

Agreed. All my Go apps were written this way, it was nice to use Pydantic in a similar way. Especially in K8s, where most of my config is env vars.
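The "env vars override defaults" pattern is easy to picture with a stdlib-only sketch. This is not Pydantic's BaseSettings; `Settings`, `load_settings`, and the `APP_` prefix are assumptions made up for the illustration.

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping


@dataclass
class Settings:
    host: str = "localhost"
    port: int = 8000
    debug: bool = False


def _parse(raw: str, target: type):
    """Convert an environment string to the type of the field's default."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


def load_settings(env: Mapping[str, str] = os.environ, prefix: str = "APP_") -> Settings:
    """Start from the defaults and override any field found in `env`."""
    overrides = {}
    for f in fields(Settings):
        key = prefix + f.name.upper()
        if key in env:
            overrides[f.name] = _parse(env[key], type(f.default))
    return Settings(**overrides)
```

With `APP_PORT=9000` set in a Kubernetes manifest, `load_settings()` would return the defaults with only `port` changed.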

That's just like, your opinion, man!

I'm a die-hard pydantic fan, but it's refreshing to see other perspectives, while the author is acknowledging it is their opinion, without getting all holy-war. Also, they aren't orthogonal. Pydantic is definitely heavier than attrs in terms of processing. This is what makes pydantic a great bastion at shearing layers. "Validate all 10000 ints" is exactly what you want when parsing a request, CLI input, or some configuration.

Also, pydantic makes it almost trivial to write top-level app config logic that is populated from configs, env variables, secrets, etc.

Also, I can generate pydantic structures from openApi, jsonschema, etc, and conversely generate schema/swagger from pydantic. This is game-breaking amounts of awesomeness.

On the other hand, inside business logic, pydantic has the downsides TFA mentions. I would however still contend that some validation sprinkled in with business logic is helpful to rein in some of that zany python dynamism in huge codebases.

I like the simplicity and composability of the c/attrs approach. Along with the "pydantic does not like positional args," (`__root__=(a,b,c,...)`, ugh) I'll be considering this for retrofitting some code paths that are currently dicts with actual structured types.

I didn't know about the performance differences. Sam Colvin strikes me as very thorough and the community is very involved, so I don't think the benchmarks claiming pydantic is faster than attrs are wrong. I know pydantic uses cython under the hood (no idea for what) - is it possible that environmental differences are causing the discrepancy?

As of 2021, there is now a complete Pydantic "CRUD stack" (ODMantic/SQLModel + FastAPI) that is ridiculously easy to get up and running with, and it has all of the features one might want in a brave new type safe world.

By comparison, the system of tools around Attrs and Cattrs is a bit scattered and underdeveloped. I don't think there's a technical reason for it, that's just how it happened. It would be great to see either a competitive parallel "stack" developing, or increased interoperability from libraries that currently only support one or the other.

For this reason, in my personal projects I use Attrs almost all the time when writing new classes. I don't use Pydantic. But at work, we are very likely going to adopt the aforementioned "Pydantic CRUD stack" for new internal services, specifically because our needs are straightforward, and for straightforward use cases it "just works" and "there's only one right way to do it", which I think are extremely valuable features when operating in a team.

What's the use case for Attrs in a post-dataclasses world? I use dataclasses all the time, and I've never felt the need to reach for Attrs. What am I missing out on?

More configurability, slots by default, faster to create the class object. I've heard stories that loading a module with a lot of dataclasses defined in it can be really slow. Basically dataclasses are for the people who can't afford a 3rd-party dependency for whatever reason.
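To show what slots buy you, here is a stdlib-only sketch using dataclasses with a hand-declared `__slots__` (which works as long as the fields have no default values). The class names are hypothetical, and this illustrates slots in general, not attrs itself.

```python
from dataclasses import dataclass


@dataclass
class PlainPoint:
    x: int
    y: int


@dataclass
class SlottedPoint:
    # Declaring __slots__ by hand works with dataclasses as long as the
    # fields have no default values.
    __slots__ = ("x", "y")
    x: int
    y: int


plain = PlainPoint(1, 2)
plain.z = 3  # allowed: instances carry a per-instance __dict__

slotted = SlottedPoint(1, 2)
typo_caught = False
try:
    slotted.z = 3  # rejected: no __dict__, only the declared slots
except AttributeError:
    typo_caught = True
```

A slotted instance has no per-instance `__dict__`, so misspelled attribute assignments fail loudly instead of silently creating a new attribute.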

Substantially more customization hooks in attrs.

SQLModel is brand new? Why not just use SQLAlchemy, which is more mature?

SQLModel is a glue/shim layer for defining models that are both pydantic & SA compatible so you don't have to duplicate the effort.

The former wraps the latter, I think.

https://github.com/samuelcolvin/pydantic/pull/1568 - if I'm reading this issue right, the benchmarks in the pydantic docs are misleading:


> The author of this article (https://stefan.sofa-rockers.org/2020/05/29/attrs-dataclasses...) points out that replacing the dateutil.parser.parse function with datetime.isoformat increases the performance of attrs by 7 times.

A little off topic, but the article mentions:

> There are three reasons to use dataclasses over attrs:

Another, not in this list, is that since it is in the standard library, co-workers/contributors are significantly more likely to have heard of dataclasses and already know its interface. Another reason, along similar lines, is the fact that dataclasses has fewer features than attrs - the author mentions this as a disadvantage, but actually it's a benefit if it has the features you need, because a simpler library is less cognitive overhead to write and maintain code against.

In my experience, way more Python devs know and are comfortable with attrs than dataclasses, simply because it's been around much longer.

Devs moving off of attrs generally go to pydantic in my experience.

Direct usage of dataclasses is literally zero across several python companies and many python codebases I have worked with.

One nice thing about making this decision is that the decision rubric for choosing between dataclasses and attrs is really straightforward. In addition to all the reasons the article lists for why you might need to use dataclasses, I'd personally also default to them unless I know I need one of the things that attrs does better: robust validation, or performance.

The nice thing here is that, since dataclasses are more-or-less just slimmed-down attrs, it's relatively easy to migrate from dataclasses to attrs at a later date.

Recently there's a strong tendency in the Python ecosystem to wrap perfectly fine idioms with even higher abstractions. Not sure if I like this, in the end it leads to a lot of cargo culting and people writing code they don't fully understand because it contains a lot of metaclass and runtime magic. But I understand that a lot of developers find this attractive, and type annotations really provide a good channel for realizing such functionality, so maybe I'm just being grumpy here.

One aspect may be that people just aren't that satisfied with some of the design decisions of Python. In this case the (perceived) issue is probably unnecessary verbosity in declaring classes, which I agree with. Due to the lack of macros changes are made through decorators and other metaprogramming magic, which brings about some collateral issues.

I find it preferable that such changes are possible outside the central committee bureaucracy. You really don't want to end up like C++ w.r.t language evolution.

For myself, better code completion is enough to lure me into the world of fully type-annotated python. I don't understand what you mean by saying "cargo cult", please enlighten me.

Cargo cult probably refers to doing something based on a superficial assessment, without understanding what is actually being done. The term originally referred to problems in science. https://calteches.library.caltech.edu/51/2/CargoCult.htm

> fully type-annotated python

When I do that, I ask myself: "how is it different from writing Java code?". I don't have a clear answer. It becomes almost as verbose as Java. Java has better type validation (doesn't compile on mistake); it is faster (at least some payback for verbosity); real type system has additional advantages, for example using types as part of method signature.

I agree, Python is/was about simplicity. That is why for validation I use Maat (https://pypi.org/project/Maat/). It's just a dictionary to describe the data, and it's easy to expand.

I also prefer my python to stay simple.

Maat looks really interesting for this use-case.

Yes, this higher abstractions tendency is clearly away from "Flat is better than nested". Creating good abstractions is so difficult (and important) that it should not be done lightly by everyone.

The recent proliferation of type annotations is also trending towards the static typing mindset, which is also cargo culting on a massive scale.

Static typing in python isn't "cargo culting." In fact I'd go even farther in the opposite direction, and say that if you are using python for production software engineering, and aren't typing most (>50%) of your function signatures, you are wasting everybody's time, your teams', code reviewers', and most of all your own time. Opinion, but empirical experience bears this out for me.

I'm not talking mypy --strict here, heck I've yet to achieve that myself. But if you are writing functions with obvious realizations of interfaces and not typing them, all you are doing is creating more mental strain for the next consumer (often future you) of that function down the road.

At $LASTCO I inherited some jupyter-notebook-copy-pasted datascience ML abomination of a pipeline. Dicts everywhere, mutation everywhere, zero docstrings, nary a type hint in sight. Took two exceptionally stressful months, with constant back and forth with the authors, to get it working in prod ("it works on my machine, I don't understand the problem"). $THISCO embraces type hints, functional style, etc. My stress level has actually normalized.

Typing `foo(bar: Optional[float])` takes what, 2s more than `foo(bar)`? Asking "hey Steve, what does `bar` take in function `foo`?" on Slack is already more time and characters than just annotating it, times every dev that doesn't know and has to ask.

The problem with creating excessive types in production systems (which is what I want people to avoid and can happen if you go too far with Pydantic models + type annotations into pseudo-"static typing") is that you move outside the area that Python's core is already designed to generically cover in idiomatic ways. You end up wasting your own (and reviewers') time creating unnecessary code that future you will have to now maintain and extend to more generic cases.

I'm skeptical that the people that produce the kind of abomination you encountered would produce something less horrible if they went to town with static typing. I've seen too much Java to fall for that ;) I'm not averse to putting in some annotations and data validation chokepoints here and there.

I'd ask Steve why he called it `foo(bar)` rather than `set_width(width)`. Static typing fans shift the semantics of code onto a type system. I'd prefer it if they focused more on better naming than spending time on designing complicated Pydantic models.

Type hints just make everything so much easier. When you're writing a function, you _know_ that parameter is an int, or float, or something you can use a list comprehension on (iterable). Why not just say it right there in the function definition?

It's better than writing it in a docstring, because a type checker will tell you to change the type if you change how you use a variable.
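A small sketch of "just say it in the function definition", using only built-in and `typing` types. The function `mean` is a made-up example.

```python
from typing import Iterable, Optional


def mean(values: Iterable[float]) -> Optional[float]:
    """Return the arithmetic mean, or None for an empty input."""
    items = list(values)
    if not items:
        return None
    return sum(items) / len(items)
```

With the signature annotated like this, a checker such as mypy should flag a call like `mean(3)`, and should also complain about `mean(data) + 1` because the result may be None. A docstring saying the same thing would not be checked by anything.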

Does everyone need to go all the way and type 100% of things and use heavily generic code to represent all possible cases? Well, that would be wonderful, but just sprinkling built-in types is already a massive improvement over no types at all.

A sprinkling of type annotations to help with ambiguity is nice. However, I don't recall the last time I spent ages figuring out what type I need to pass to a function if it simply wants a builtin type. It's usually the semantics or a library-unique type (ugh!) that I have to look up.

Definitely. This tends to be especially terrible with some libraries (example: sqlalchemy) which have a crap load of types. You're not really sure what's being returned, etc.

I think the comment about types takes a rather narrow view of Python use cases. Sure, if you’re some lone sysadmin using Python as an alternative to bash to implement some questionable ETL pipeline comprising a bunch of random scripts running off a single machine somewhere then I agree. You probably are not the target audience for type annotations.

But even moderately complex libraries and applications are such a huge pain to develop, read, and maintain without some of the tooling that leverages type annotations. This is especially true in the ML / AI / Data Science world where a lot of the people implementing models have dubious coding practices.

Pure dynamic typing paradigms simply have not delivered on their promises over the last 30 years. There are definitely some areas where they make sense, but I doubt we will see a massive readoption until our tooling becomes sufficiently intelligent. Imagine, for example, a probabilistic type inference based on both the structural aspects of the code and previous runs over actual data.

People in the data science world with dubious coding practices should keep their data in commonly used types such as Pandas tables. The last thing we should want from them is encoding their data with ill-conceived class/type systems. Not sure if that is what you mean.

I'd say Python's large scale adoption is exactly a good example of dynamic typing delivering on its promises.

Specifically I use pydantic for validation, so yes, I expect pydantic to iterate over a list of 10000 elements and verify these are all ints.

Yeah, not doing that when using TypeScript leads to so many bugs. Yes, the compiler is happy, but you will get a runtime error because the JSON decoding produced a string, not the expected number type being passed around. So add a second step that verifies that the decoded JSON actually matches the type, and fail fast. One thing I liked about Elm was the decoders: a bit of a hassle to write, but they made it so easy to catch bugs when backend and frontend people weren't on the same page.
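The same "decode, then verify, then fail fast" step looks like this in Python with only the stdlib. `decode_order` and the `quantity` field are hypothetical names chosen for the sketch.

```python
import json
from typing import Any, Dict


def decode_order(payload: str) -> Dict[str, Any]:
    """Parse JSON and fail fast if the shape is not what we expect."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    quantity = data.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError(f"quantity must be an integer, got {quantity!r}")
    return data
```

A payload like `{"quantity": "3"}` is rejected at the boundary, rather than travelling through the code as a string that the type annotations claim is a number.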

It needs to be configurable IMO, but yeah for small lists I would rather loop and check than not check at all.

However doing this for arbitrary iterables is impossible, because the iterable might be lazy or infinite. In that case maybe the best option is to wrap the iterable with something that would validate each element as it came off of the iterator that is produced from the iterable.
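The wrapping approach can be sketched with a generator, which validates each element only as it is pulled. `validated_ints` is a hypothetical helper, not part of any library mentioned in this thread.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def validated_ints(items: Iterable[object]) -> Iterator[int]:
    """Yield items unchanged, raising TypeError at the first non-int."""
    for index, item in enumerate(items):
        if isinstance(item, bool) or not isinstance(item, int):
            raise TypeError(f"item {index} is not an int: {item!r}")
        yield item


# Works on an infinite iterable because nothing is consumed up front.
first_three = list(islice(validated_ints(count()), 3))
```

The trade-off is that a bad element is only reported when the consumer reaches it, so earlier elements may already have been processed.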

Having run into these issues with Pydantic, we've been using Mashumaro[1], which, while not having all the bells and whistles of Pydantic, has served us pretty well.

1: https://github.com/Fatal1ty/mashumaro

Wait so there is a Marshmallow and a Mashumaro which is the romanisation of the Japanese translation of marshmallow!? Talking about giving projects confusing names…

Isn't part of validation making sure you get the expected data in the expected format?

The author of the article is very biased because he is also the (co-)author of these libraries.

Personally, I don't think it's fine to bash other projects in the open source community like that. Clearly, attrs+cattrs and pydantic focus on different things. Let us all live together peacefully :)

To your first point, that's the third sentence of the article itself.

To your second, I struggle to see this as bashing. It's certainly not an objective comparison, but it is about as even-handed a comparison as I'd be willing to ask of a mere mortal when they're also personally invested in the subject. You don't see sentences like, "Pydantic is wrong!" you see ones like, "Pydantic is very opinionated about the things it does, and I simply disagree with a lot of its opinions," or, "I disagree with this. Un/structuring should be handled independently of the model." That's not bashing; that's constructive criticism. It's worth noting that he also acknowledges that Pydantic does some things better.

For my part, I think my only real complaint about this article is that he doesn't really pay enough attention to the fact that, despite their overlapping functionality, (c)attrs and Pydantic are optimizing for very different use cases. That leaves me thinking that some (though far from all) of his criticism has a certain, "This screwdriver isn't very good at driving nails," characteristic.

Pydantic and Cattrs both suffer from unacceptable issues and footguns. I have used Pydantic in production a lot and have played around with Cattrs. I've also looked at similar libraries like Dacite and marshmallow-dataclass, but none seem to be well thought out and mature.

* Neither Pydantic nor Cattrs handles unions the way I'd expect (although Cattrs has stronger guarantees in converting Unions)

  >>> from typing import Union
  >>> from pydantic import BaseModel
  >>> class Y(BaseModel): pass
  >>> class X(BaseModel): pass
  >>> class Z(BaseModel): a: Union[X, Y]
  >>> Z(a=Y())
  Z(a=X()) # Converts Y to X implicitly

Cattrs has some problems with generics [1] [2]. Dacite and marshmallow-dataclass don't support generics well either, with some issues around Union types.

They do work well for simple python types, but what I'd like to see is a guarantee that the serialisation operation is completely reversible, and a warning/exception raised if it is not.

[1] https://github.com/Tinche/cattrs/issues/149

[2] https://github.com/Tinche/cattrs/issues/44
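The reversibility guarantee asked for above can at least be checked by hand. Below is a stdlib-only sketch, assuming flat dataclasses and JSON as the wire format; `Route` and `checked_dump` are hypothetical names, and none of the libraries discussed here work this way as far as I know.

```python
import json
from dataclasses import asdict, dataclass
from typing import Sequence


@dataclass
class Route:
    name: str
    stops: Sequence[int]


def checked_dump(obj) -> str:
    """Serialize to JSON, raising if naive loading does not reproduce `obj`."""
    text = json.dumps(asdict(obj))
    restored = type(obj)(**json.loads(text))
    if restored != obj:
        raise ValueError(f"round trip is lossy: {obj!r} became {restored!r}")
    return text
```

`checked_dump(Route("a", [1, 2]))` succeeds, while `checked_dump(Route("a", (1, 2)))` raises, because JSON has no tuple type and the tuple comes back as a list.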

You can tell pydantic not to mutate the field value in its definition when using Field.

I absolutely hate the attrs “joke” names attr.ib and attr.s.

That’s it. I simply can’t stand them, they drive me crazy. When I first saw them I wondered what the fuck “ib” meant. And the thing is, I’d expect every decent python programmer to have the same reaction: . is syntax; it simply can’t be part of a name. The library is called “attrs”, so why is it imported as “attr”? I simply don’t know what the author thought he was doing.

I just don’t see how it is appropriate for a serious library to contain a joke that’s going to trip up literally every programmer with any taste. Maybe this is my problem, maybe one day I’ll relax and wonder why I was making such a big deal about it. But for the last several years it has made me think that the author completely lacks the judgement required to be a library author.

To add to my criticisms, imagine what the python ecosystem would look like if everyone thought they could have names that spanned attribute lookup syntax!

The `attrs` authors address it in the documentation:

> At first, some people have a negative gut reaction to that; resembling the reactions to Python’s significant whitespace. And as with that, once one gets used to it, the readability and explicitness of that API prevails and delights.

> For those who can’t swallow that API at all, attrs comes with serious business aliases: attr.attrs and attr.attrib.

Yes, I know. That doesn’t redeem it. I want never to have to read those names in python code I'm working with; the documentation entry doesn't achieve that.

And anyway, the library is called “attrs”, so why is it imported as “attr”?

The package can contain entirely arbitrary module names. Any similarity in naming is by convention only. Don't quote me on this, but I don't see why two different packages can't contain the same module name and overwrite each other.

If it is a problem,

  import attr as attrs

which is probably what I would do for consistency.

I want never to see the names attr.ib and attr.s in code that I am working on. So I don't understand how your suggestion solves the problem: it might not be me writing the code.

I never want to see Javascript code either, but being annoyed at Brendan Eich all the time is a bit futile.

>And anyway, the library is called “attrs”, so why is it imported as “attr”?

Python packaging is terrible, and gives package authors footguns like this. It's possible that it would be imported as "da39a3ee".

"attr" is the choice of the library designers. The question is why did they choose that name when users will expect it to be imported as "attrs"?

Clearly if they had chosen "da39a3ee" we wouldn't need to discuss whether they had good judgement.

Imagine we're in an elevator together and I said "I'm gonna fart now" then I passed gas. Would my saying so make it better?

Admitting to doing something obnoxious does not make it less obnoxious. If anything, it makes it more so. It says the person knew what they were doing was a problem and they wanted to share the problem with everyone.

We need more cute aliases!

    @att.rs  # because it refers to both AT&T and Rust
    @at.trs  # musicians will love it
    @a.ttrs  # because I never remember where the dot goes
    @ttrs    # because @ looks like an a @lready
    @trs     # @ is “at”! Get it?
    @t.rs    # I like both Perl and Rust

Haha, thanks! I think I'm warming to the idea now.

Why should I use either? I hope that’s not an obtuse question.

I used Pydantic for a JSON API project recently and found it extremely useful.

Say your API accepts the following JSON POST:

    POST /create-entry
    {
        "author_id": 123,
        "title": "My new blog post",
        "body": "<p>...</p>",
        "tags": ["writing", "productivity"]
    }

Your API needs some validation here: some of these fields may be required. The "author_id" field must provide the ID of a valid author (that the API caller has permission to create posts for). The "tags" field must be an array.

More importantly: if there is a validation error, you need to provide an error message explaining what was wrong. This can be quite tricky to do well, especially for complex nested JSON objects.

Pydantic solves this problem really well - including returning detailed error messages helping show what went wrong.
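To give a feel for the kind of per-field error report being described, here is a stdlib-only sketch for the payload above. It is not Pydantic's API or its error format; `validate_entry` is a hypothetical function, and the author permission check is left out because it needs a database.

```python
from typing import Any, Dict, List


def validate_entry(payload: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors: List[str] = []
    for field in ("author_id", "title", "body"):
        if field not in payload:
            errors.append(f"{field}: field required")
    if "author_id" in payload and not isinstance(payload["author_id"], int):
        errors.append("author_id: must be an integer")
    tags = payload.get("tags", [])
    if not isinstance(tags, list):
        errors.append("tags: must be an array")
    else:
        for i, tag in enumerate(tags):
            if not isinstance(tag, str):
                errors.append(f"tags[{i}]: must be a string")
    return errors
```

Writing this by hand for every endpoint, and for nested objects, is the tedium that a declarative model removes.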

Quick look and from what I can see attrs/cattrs can't generate jsonschema => game over.

It would be better the other way around: generate sane Python models from an existing jsonschema, similar to stuff like gRPC but without the extra bits.

The existing generators are generally terrible.

I wish json-schema was nicer to write, and if it was I would do that. However, it's way too unfriendly, so using something like Pydantic models as the "source of truth" is much easier, and at least lets other clients generate definitions and validate their inputs / outputs even if the generated schemas aren't perfect.

I've had a generally good experience creating typed wrappers for APIs with a json-schema-to-pydantic converter [0].

[0] https://github.com/koxudaxi/datamodel-code-generator

That's what warlock does.

I'm just here to say I largely agree with the author.

I think a lot of this boils down to the functional approach that (c)attrs takes, vs Pydantic's OO. Because it's more functional, there's higher composability and more power given to the user.

On my team, we use cattrs and add our own customizations to it to great effect, and these simply would not be possible with Pydantic.
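For readers who haven't seen the functional style, here is a hand-rolled stdlib sketch of the idea: the model stays a plain class, and conversion rules live in a separate registry of functions. This is not cattrs's actual API; `register_hook`, `structure`, and `Event` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict

# Hypothetical registry: target type -> function that builds it from raw data.
_hooks: Dict[type, Callable[[Any], Any]] = {}


def register_hook(target: type, func: Callable[[Any], Any]) -> None:
    _hooks[target] = func


def structure(raw: Any, target: type) -> Any:
    """Convert `raw` to `target` using a registered hook, else the constructor."""
    hook = _hooks.get(target, target)
    return hook(raw)


@dataclass
class Event:
    title: str
    day: date


# The model stays plain; conversion rules live outside it and can be swapped.
register_hook(date, date.fromisoformat)
register_hook(Event, lambda raw: Event(raw["title"], structure(raw["day"], date)))
```

Because the rules are just functions in a registry, a team can replace or wrap any of them without touching the classes, which is the composability being described.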

That said, there are reasons Pydantic seems to be ascendant despite attrs being the library that inspired dataclasses. It's very much a batteries-included library, which fits the ethos of Python.

From my understanding,

    size: float = None

is syntactic sugar for

    size: Optional[float] = None

Mypy allows that because initial versions of PEP 484 allowed that. This has changed; here's the current wording in the PEP:

> This is no longer the recommended behavior. Type checkers should move towards requiring the optional type to be made explicit.


It's bitter, not sweet. I make sure "implicit None" is disabled in Mypy and I think everyone else should do the same.
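For reference, the explicit spelling looks like this. The function `scale` is a made-up example; the relevant mypy setting is, as far as I know, `--no-implicit-optional`.

```python
from typing import Optional


def scale(size: Optional[float] = None, factor: float = 2.0) -> Optional[float]:
    """Explicit Optional: the signature itself says None is accepted."""
    if size is None:
        return None
    return size * factor
```

With implicit Optional disabled, writing `size: float = None` instead should be reported by the type checker rather than silently widened.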
