
Show HN: Pydantic – Data validation using Python 3.6 type hinting - scolvin
https://pydantic-docs.helpmanual.io/
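
A minimal sketch of the core idea (the model and field names here are
illustrative, not taken from the docs): fields are declared with plain type
hints, and input data is validated and coerced against them.

```python
from pydantic import BaseModel

class User(BaseModel):
    id: int                # annotation drives validation/coercion
    name: str = 'John Doe' # default value, still type-checked

user = User(id='123')      # the string '123' is coerced to the int 123
assert user.id == 123
assert user.name == 'John Doe'
```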
======
sametmax
Marshmallow is still the best lib in town. Indeed, most of the libs fall short
once you start to use them in the real world: where validation is more than a
type check, where fields depend on each other, where data is generated on the
fly after validation, and where all of that has to cascade down your nested,
sometimes recursive, data structures and produce equally detailed error
messages.

A data validation framework is not a toy project.

~~~
zepolen
Having used both colander and marshmallow extensively - I prefer colander,
mainly because it has first-class _explicit_ handling of null, missing and
required values, and its support for nesting and inheritance is also much
nicer than marshmallow's.

~~~
Varriount
I've recently been using good[0], which also allows for minor data
transformations. Looking at Marshmallow, it doesn't seem to allow inline
declaration of nested schemas - each level of the schema needs its own data
type.

[0] [https://github.com/kolypto/py-good](https://github.com/kolypto/py-good)

------
svisser
For those looking to validate dictionaries / JSON responses in Python, the
voluptuous library works quite well:
[http://github.com/alecthomas/voluptuous](http://github.com/alecthomas/voluptuous).
It also works for lists and other data types.

~~~
user5994461
JSON has a well defined schema system that works across languages.
[http://json-schema.org/](http://json-schema.org/)

Python has the widely used jsonschema package for it.
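
A small sketch with the jsonschema package (the schema contents are invented):
the schema itself is plain JSON, so it can be shared with non-Python
consumers.

```python
import jsonschema

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["name"],
}

# valid data passes silently
jsonschema.validate(instance={"name": "Ada", "age": 36}, schema=schema)

# invalid data raises ValidationError
raised = False
try:
    jsonschema.validate(instance={"age": -1}, schema=schema)
except jsonschema.ValidationError:
    raised = True
assert raised
```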

~~~
StavrosK
That's rather verbose, though. I prefer the much more functional "schema"
library:

[https://github.com/keleshev/schema](https://github.com/keleshev/schema)

~~~
ris
Thing is, none of these are language agnostic. jsonschema is the only one that
makes an attempt at that.

It's a slight pity that the python jsonschema package doesn't support some of
the more powerful recent features though.

------
vog
Recently I wrote a similar library that creates (and dumps) your typed
NamedTuples, datetimes and similar objects from plain JSON, using type
annotations:

"JSON support for named tuples, datetime and other objects, preventing
ambiguity via type annotations"

[https://github.com/m-click/jsontyping](https://github.com/m-click/jsontyping)

If you are interested, please have a look at the first unit tests to see how
it works:

[https://github.com/m-click/jsontyping/blob/master/tests/test...](https://github.com/m-click/jsontyping/blob/master/tests/test_jsontyping.py)

Note that the tests currently use the "ugly" NamedTuple syntax to be
compatible with Python 3.5 and 2.7.
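
For reference, the two syntaxes side by side (a toy `Point`, not from the
library): the "ugly" functional form works on older Pythons via the typing
backport, while the class form needs 3.6+ variable annotations.

```python
from typing import NamedTuple

# Python 3.5 / 2.7-compatible functional form:
Point = NamedTuple('Point', [('x', int), ('y', int)])

# Python 3.6+ class form with inline annotations:
class Point36(NamedTuple):
    x: int
    y: int

# both produce equivalent tuple subclasses
assert Point(1, 2) == Point36(1, 2)
```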

------
thristian
I've recently been using attrs as an easy way to make simple datatypes, but
its only gesture towards validation is an arbitrary callback per field.
Hooking into Python 3 type annotations is a great idea!

Does/will Pydantic handle all the standard dunder methods like __eq__, __lt__,
__hash__, __cmp__ and faux-immutability like namedtuple and attrs do?
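
For context, attrs' per-field validator callback and generated dunders look
roughly like this (a toy class, not Pydantic's API):

```python
import attr

@attr.s(frozen=True)  # faux-immutability: assignment raises FrozenInstanceError
class Point:
    x = attr.ib(validator=attr.validators.instance_of(int))
    y = attr.ib(validator=attr.validators.instance_of(int))

p = Point(1, 2)
assert p == Point(1, 2)               # generated __eq__
assert hash(p) == hash(Point(1, 2))   # frozen classes get __hash__ too
```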

~~~
noisy_boy
Do you have a writeup/blog post on your approach? I would like to read more
about it.

~~~
thristian
In the small, it's just using attrs the way it's described in its
documentation.

In the large, I've been learning about Rust recently and wrapping my head
around the design-patterns of static typing. For internal data-structures the
benefit is not as clear, but for serialising and deserialising external data
(like from config files or JSON APIs) I really prefer having specific, named
types instead of a generic bucket of dicts.

API documentation can be more concise. You can say "this argument must be an
instance of BuildArtifact" rather than "this argument must be a dict with an
'href' key whose value is the URL to a build artifact and a 'hash' key whose
value is the SHA256 of that artifact" in every relevant API.

Debugging is easier when inspecting a variable starts with "<BuildArtifact
...>" rather than just dumping a dict at you.

If you need to operate on a particular kind of data, a named class gives you
an obvious place to hang a method, instead of having a loose function rattling
about. For operations between two data-types (like 'merge' or 'intersection'),
a loose function might still be the most appropriate, but operations like
searching or summarizing are naturally methods.
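
A sketch of that style with attrs, reusing the BuildArtifact shape from above
(`from_dict` and `matches` are invented helpers, not from any library):

```python
import attr

@attr.s(frozen=True)
class BuildArtifact:
    href = attr.ib()   # URL to the build artifact
    hash = attr.ib()   # SHA256 of the artifact

    @classmethod
    def from_dict(cls, d):
        # deserialisation boundary: generic dict in, named type out
        return cls(href=d['href'], hash=d['hash'])

    def matches(self, digest):
        # an obvious place to hang a method, instead of a loose function
        return self.hash == digest

artifact = BuildArtifact.from_dict(
    {'href': 'https://example.com/a.tar.gz', 'hash': 'deadbeef'})
assert artifact.matches('deadbeef')
# repr is self-describing instead of a bare dict dump:
# BuildArtifact(href='https://example.com/a.tar.gz', hash='deadbeef')
```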

------
cidnurh
Interesting project! I'm collecting different ORM/ODM/mapping libraries for
Python in this repository:
[https://github.com/grundic/awesome-python-models](https://github.com/grundic/awesome-python-models).
I've added your library. Thanks!

------
juni0r
I usually use PyComb

~~~
mkesper
Without saying why, that post is rather meaningless.

