Hacker News
Pydantic (github.com/samuelcolvin)
106 points by marinesebastian on Jan 22, 2022 | 35 comments



Past related threads:

PEP 563, PEP 649 and the future of pydantic and FastAPI - https://news.ycombinator.com/item?id=26826158 - April 2021 (150 comments)

Show HN: Pydantic – Data validation using Python 3.6 type hinting - https://news.ycombinator.com/item?id=14477222 - June 2017 (27 comments)


Using Pydantic, FastAPI and the OpenAPI generator tool was a decision that increased my coding speed a lot. Plus it didn't hurt that FastAPI has excellent documentation (I highly recommend the part that describes using FastAPI with SQLAlchemy).


I love FastAPI for a lot of the concepts it helped make mainstream, such as typing-based validation, but its documentation is lacking.

I love the tutorials, but an API reference is just as important. I may not want to go through long tutorials about things I've already read, and a detailed description of each function would be super helpful. Other frameworks, such as Sanic, have one.

I think it's on the roadmap (kudos for that), and I hope it gets a few more maintainers to speed up the process.


OpenAPI generation is cool, but using OpenAPI as the specification is even better. ;)


Looks great.

I'm hesitant, however, since marshmallow-sqlalchemy provides full integration with your SQLAlchemy models, while pydantic-sqlalchemy is only for generating Pydantic models from SQLAlchemy models, and it seems to still be experimental (why does it have more stars? :thinking:)

Otherwise, comparing just pydantic and marshmallow for straight-up validation, Pydantic seems more legible and easier to use at first sight.

I'll switch if pydantic gets full integration with SQLAlchemy.

And for those of you looking down on me for using ORMs (yes, I know some of you exist), I use both raw SQL and SQLAlchemy.

I find it many times easier to build models and deal with migrations in SQLAlchemy than to write scripts.


The creator of FastAPI and pydantic-sqlalchemy has recently released a new library: SQLModel. https://sqlmodel.tiangolo.com

It is a thin layer on top of Pydantic and SQLAlchemy. I haven't used it yet, so I can't speak from experience, but I think it is basically exactly what you describe.


looks promising.

At first sight it seems like you still have to write a "schema" for the SQLModel based on your SQLAlchemy model - so basically, two sources of truth.

If you edit your SQLAlchemy model, you'll also have to edit your SQLModel.

marshmallow-sqlalchemy allows you to build your schema based on your SQLAlchemy model.

Other than that, I'm still somewhat intrigued.

Thanks for the suggestion.


I think SQLModel actually exists precisely so that you don't have to do that. According to [1]:

> That class Hero is a SQLModel model... But at the same time, it is a SQLAlchemy model... And at the same time, it is also a Pydantic model

[1] https://sqlmodel.tiangolo.com/#sqlalchemy-and-pydantic


Ah, I must've missed something on my first look around.

looks good.


> so basically, two sources of truth.

As I understand it, you don't, since SQLModel inherits from SQLAlchemy's ORM base classes. Here is an example from the user guide of how to define the model and create the table:

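  from typing import Optional

  from sqlmodel import Field, SQLModel, create_engine

  class Hero(SQLModel, table=True):
      id: Optional[int] = Field(default=None, primary_key=True)
      name: str
      secret_name: str
      age: Optional[int] = None

  sqlite_url = "sqlite:///database.db"  # SQLite database file in the working directory
  engine = create_engine(sqlite_url, echo=True)

  # Create the tables for all models declared with table=True
  SQLModel.metadata.create_all(engine)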


Yeah, that was my bad. I went through it a bit too quickly.

looks good.


Now we just need an integrated admin panel and templating engine and we'll have reinvented Django.


The templating part seems to be somewhat taken care of already: https://fastapi.tiangolo.com/advanced/templates/


We use this extensively, as probably half of the Python programmers on HN do. It's great, but there are plenty of alternatives out there as well, especially for performance. Our specific use case is IO-limited, so it's perfect for that.


Care to list some? I disabled it on a few endpoints of our backend because the performance was awful for GeoJSON.


I've found that using construct() to skip validation is great when a model has e.g. a `List[float]` attribute with a length of 10k+ items. But this has to be done judiciously.

https://pydantic-docs.helpmanual.io/usage/models/#creating-m...
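A stdlib-only sketch of that trade-off, not pydantic's implementation: the hypothetical `Track` class below checks every element of a large list of floats in its normal constructor, while a `construct()` classmethod (named after the pydantic v1 method; pydantic v2 renames it `model_construct()`) skips validation and assigns already-trusted data as-is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    """Hypothetical model with one large list-of-floats field."""

    points: List[float]

    def __post_init__(self) -> None:
        # Validating path: check and coerce every element, O(n) Python-level work.
        checked = []
        for p in self.points:
            if type(p) not in (int, float):  # exact types; rejects bool and str
                raise TypeError(f"expected a number, got {type(p).__name__}")
            checked.append(float(p))
        self.points = checked

    @classmethod
    def construct(cls, points: List[float]) -> "Track":
        # Trusted path: bypass __init__/__post_init__, so nothing is checked or copied.
        obj = cls.__new__(cls)
        obj.points = points
        return obj


trusted = [0.5] * 10_000
fast = Track.construct(trusted)  # no per-item checks
safe = Track([1, 2.5, 3])        # validated and coerced to floats
```

This is also why it has to be done judiciously: `Track.construct(["oops"])` is accepted without complaint, so the fast path only belongs on data you already trust.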

In the case of GeoJSON it might be worth using a custom data type with specific faster validation.

https://pydantic-docs.helpmanual.io/usage/types/#custom-data...
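In pydantic v1 a custom type plugs in through a `__get_validators__` classmethod. The stdlib-only sketch below leaves that hook out and only shows the kind of narrow, single-pass check such a validator might run on GeoJSON LineString coordinates; the function name is hypothetical and it assumes 2D or 3D positions.

```python
from typing import Any, List, Tuple

_NUMBER_TYPES = (int, float)  # exact types only; bool and numpy scalars are rejected


def validate_linestring_coords(coords: Any) -> List[Tuple[Any, ...]]:
    """Hypothetical fast validator for GeoJSON LineString coordinates."""
    if type(coords) is not list or len(coords) < 2:
        raise ValueError("a LineString needs a list of at least two positions")
    out = []
    for pos in coords:
        # One cheap structural check per position instead of a generic nested schema.
        if type(pos) not in (list, tuple) or len(pos) not in (2, 3):
            raise ValueError(f"bad position: {pos!r}")
        for n in pos:
            if type(n) not in _NUMBER_TYPES:
                raise ValueError(f"non-numeric coordinate: {n!r}")
        out.append(tuple(pos))
    return out
```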


https://www.attrs.org/en/stable/ would be the main alternative


Are those alternatives faster according to benchmarks?

In my experience, the bottleneck is either:
- JSON parsing and dumping: the solution for me is orjson, a fantastic library for fast JSON serialisation of the most common field types, including datetime.
- Validation: if you choose to validate your data, pydantic can indeed be slow... But it's not Pydantic the problem, but the validation that you apply to your data.
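For contrast, the standard library's `json.dumps` raises `TypeError` on a `datetime` unless you pass a `default=` hook, which orjson makes unnecessary by serialising datetimes natively. A minimal stdlib sketch of that hook (the `dumps` helper name is made up):

```python
import json
from datetime import date, datetime, timezone
from typing import Any


def _default(obj: Any) -> str:
    # json.dumps calls this only for objects it can't serialise by itself.
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"{type(obj).__name__} is not JSON serialisable")


def dumps(data: Any) -> str:
    # Compact separators, datetimes rendered as ISO 8601 strings.
    return json.dumps(data, default=_default, separators=(",", ":"))


print(dumps({"at": datetime(2022, 1, 22, 12, 0, tzinfo=timezone.utc)}))
# → {"at":"2022-01-22T12:00:00+00:00"}
```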


> But it's not Pydantic the problem, but the validation that you apply to your data.

Indeed, this kind of validation is usually based on 'isinstance', which is really slow in Python, because you often need to call it many times. More than once, I doubled or tripled the throughput of some data pipelines (not microbenchmarks) just by replacing 'isinstance' calls with something else.
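A rough way to check the claim on your own interpreter, as a `timeit` sketch; the data and function names are made up, and the numbers will vary with the Python version and with what you pass to `isinstance` (abstract base classes such as `numbers.Real` go through `__instancecheck__` and are usually the slowest case).

```python
import numbers
import timeit

VALUES = [1, 2.5, 3, 4.0] * 2_500  # 10k mixed ints and floats


def check_abc() -> bool:
    return all(isinstance(v, numbers.Real) for v in VALUES)


def check_isinstance() -> bool:
    return all(isinstance(v, (int, float)) for v in VALUES)


def check_type_equality() -> bool:
    # Exact-type comparison; note it rejects subclasses such as bool.
    return all(type(v) is int or type(v) is float for v in VALUES)


if __name__ == "__main__":
    for fn in (check_abc, check_isinstance, check_type_equality):
        seconds = timeit.timeit(fn, number=50)
        print(f"{fn.__name__:>20}: {seconds:.3f}s for 50 runs")
```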

When you really need something like isinstance, type equality sometimes works and is much faster. For example, this works as a replacement for attrs-strict's type checking on a limited subset of types (non-generic classes, Any, Optional, Tuple, and Union): https://archive.softwareheritage.org/swh:1:cnt:7f4f1ea32eace... The downside is that you can't use subclasses of the specified types.
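The linked file isn't reproduced here. As an independent, hypothetical sketch of the same idea, this checker handles `Any`, `Optional`/`Union`, parameterised `Tuple[...]` and plain classes, using `type(value) is tp` instead of `isinstance`. It assumes Python 3.8+ for `typing.get_origin`/`get_args` and does not handle PEP 604 `X | Y` unions or a bare `Tuple`.

```python
from typing import Any, Optional, Tuple, Union, get_args, get_origin


def matches(value: Any, tp: Any) -> bool:
    """Hypothetical type-equality checker for a small subset of typing forms."""
    if tp is Any:
        return True
    if tp is None or tp is type(None):
        return value is None
    origin = get_origin(tp)
    if origin is Union:  # Optional[X] is Union[X, None]
        return any(matches(value, arg) for arg in get_args(tp))
    if origin is tuple:
        args = get_args(tp)
        if type(value) is not tuple:
            return False
        if len(args) == 2 and args[1] is Ellipsis:  # Tuple[X, ...]
            return all(matches(v, args[0]) for v in value)
        return len(value) == len(args) and all(
            matches(v, a) for v, a in zip(value, args)
        )
    # Plain non-generic class: exact type equality, so subclasses are rejected.
    return type(value) is tp


print(matches(3, Optional[int]), matches(True, int))  # → True False
```

The second result shows the downside mentioned above: `bool` subclasses `int`, so `isinstance(True, int)` is true but the type-equality check rejects it.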


Marshmallow is another if you’re talking schemas, validation, and json serialization.


Fantastic project, but if you're looking for speed this isn't it.


Chances are, if you've chosen Python for a new project today then you're willing to trade a lot of speed for developer convenience already.


Take a look at https://github.com/Attumm/Maat

Maat is much faster than pydantic according to pydantic's own benchmark.


Looks atrocious from an ergonomics perspective. Who likes defining everything with python dictionaries over classes or type hints?
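To make the ergonomics argument concrete, here is the same two-field schema written both ways. The dict format and helper names are hypothetical (this is not Maat's actual schema syntax, which the thread doesn't show), and both styles use exact type checks.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

# Style 1: schema as a plain dict (hypothetical format, not Maat's real syntax).
USER_SPEC: Dict[str, type] = {"name": str, "age": int}


def validate_with_spec(data: Dict[str, Any], spec: Dict[str, type]) -> Dict[str, Any]:
    for key, expected in spec.items():
        if type(data.get(key)) is not expected:  # a missing key yields None and fails
            raise ValueError(f"{key!r}: expected {expected.__name__}")
    return {key: data[key] for key in spec}  # drop keys not in the spec


# Style 2: the same schema as a class with type hints.
# (f.type is a real class here only because annotations are not postponed.)
@dataclass
class User:
    name: str
    age: int

    def __post_init__(self) -> None:
        for f in fields(self):
            if type(getattr(self, f.name)) is not f.type:
                raise ValueError(f"{f.name!r}: expected {f.type.__name__}")
```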


It's only recently that classes have been embraced by the Python community. Before that, dictionaries were seen as Pythonic and classes were not.

https://youtu.be/o9pEzgHorH0


People who want stuff to go fast?


Ok, you go do that then


Why do you make such a blanket statement like that? According to the benchmarks page, it’s faster than its alternatives https://pydantic-docs.helpmanual.io/benchmarks/


The only reason it's the fastest is that they won't allow other tools into the benchmark.

https://github.com/samuelcolvin/pydantic/discussions/3094

The Maat README also shares that benchmark. It's way faster.


I don’t think they’ll consider your framework at this point because it doesn’t have enough mindshare. I don’t think they’re being unfair or blacklisting you. Also, your PR was very forward and assumptive IMO, “can I get an issue number” like you’re entitled to one. Get some more users (stars being a proxy for that), and I’m sure they’ll consider you if you ask with a little more humility.

“Way faster” is a bit hyperbolic isn’t it? 2.5 times faster? That’s no order of magnitude. What validation use cases are there where it makes a difference? Very few.
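One way to put a 2.5x validation speedup in context is Amdahl's law, which gives the end-to-end gain when only part of the work gets faster. The 20% validation share below is a made-up figure for illustration, not a measurement.

```python
def overall_speedup(validation_share: float, validation_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the validation share gets faster."""
    return 1.0 / ((1.0 - validation_share) + validation_share / validation_speedup)


# Hypothetical request profile: validation is 20% of the time and becomes 2.5x faster.
print(round(overall_speedup(0.20, 2.5), 3))  # → 1.136
```

Under that assumption the whole request gets only about 14% faster; the gap to 2.5x closes only as validation dominates the runtime.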

Here’s the thing: if people really want fast validation and transformation, they’re probably not going to use python. People use python for its developer ergonomics and experience. Dicts as a configuration DSL are inferior to classes and type hints for very simple reasons:

1. One bad thing about python is that both single and double quotes are acceptable. Dicts are built with strings, so there’s an anxiety about having to standardize on one over the other that adds to cognitive overhead.

2. There’s also the dict(key=value) syntax that gets rid of the key strings, but again, that’s just another “choice” people don’t care to make if they don’t have to.

3. Curly brackets aren’t the easiest things to type relative to other characters.

4. Curly brackets are extremely difficult to pair if you don’t have an editor that does it automatically or if your code formatter puts many brackets on the same line.

That’s pretty much it right there.


There is no need for hostility. My comment was in response to your point that the benchmark shows pydantic is the fastest.

In their own documentation they invite all other frameworks to send their results.

That said, what the pydantic team does is up to them. Maat was made before pydantic was an option, and it has filled that use case. The benchmark was only added because of an internal discussion about which tool to use.

Engineering is about trade-offs, and each project will have its own technical problems. Therefore there will never be a best solution, only the best solution for a particular problem within that context.


What do you look to for a more balanced speed / feature set?


A different language, most likely. I'd use Python for this if the work being done isn't too CPU-intensive and I can afford to make big tradeoffs.

If you want to do data validation like this but for something with better performance while still retaining the benefits of a high-level GC'd language, then I'd try something like https://github.com/go-playground/validator for Go.


[flagged]


HN Guidelines state, among other things worth noting [1]:

> Please don't complain about tangential annoyances—things like article or website formats, name collisions, or back-button breakage. They're too common to be interesting.

[1] https://news.ycombinator.com/newsguidelines.html


Dear Starlevel,

Sorry if I came across as unhelpful, my intent was actually the opposite - to inform and hopefully educate. I don't like wasting what time I have left on this earth bothering people with unwelcome or unappreciated feedback (though I accept this is always a risk when engaging, especially via Internet forums where tone is not discernible).

Your argument would be strengthened if you added more explanation or insight into why projects using your disliked, cursed documentation format definitively suck :)

Feedback is always welcome, I'm always looking to improve, learn how to be more persuasive, and avoid miscommunication.

Sincerely, metadat@hn



