

Schema for Clojure(Script) Data Shape Declaration and Validation - sorenmacbeth
http://blog.getprismatic.com/blog/2013/9/4/schema-for-clojurescript-data-shape-declaration-and-validation

======
jamii
I've been working on a similar idea for replacing types. If you can specify
how to take apart a type, name the parts and put them back together you can
then get validators, pattern matching, recursive-descent parsing with
backtracking, generic traversals, lenses and generators more or less for free.
By using simple data-structures to describe the types and compiling them to
clojure at the call site you can have first-class types without being
penalised in performance.

It's still work in progress but there are working examples as of
[https://github.com/jamii/strucjure/commit/e0e56a25c1b880c382...](https://github.com/jamii/strucjure/commit/e0e56a25c1b880c38259f2bf59768afc7350fa9c)

    
    
        (using strucjure.sugar
          ;; define a pattern
          (def peano-pattern
            (graph num ~(or ~succ ~zero)
                   succ (succ ~num)
                   zero zero))
          (comment ;; desugars to 
            {'num (->Or (->Node 'succ) (->Node 'zero))
             'succ (list 'succ (->Node 'num))
             'zero 'zero})
    
          ;; define a view over that pattern
          (def peano->int
            (view peano-pattern {'succ (fnk [num] (inc num))
                                 'zero (fnk [] 0)}))
    
          (peano->int 'zero) ;; => 0
          (peano->int '(succ (succ zero))) ;; => 2
          (peano->int '(succ (succ succ))) ;; => throws MatchFailure
        )
                               

It's similar to the old ideas at
[http://scattered-thoughts.net/blog/2012/12/04/strucjure-moti...](http://scattered-thoughts.net/blog/2012/12/04/strucjure-motivation/)
but significantly simpler.
I'm hoping to be able to release at least the core functionality in a few
weeks.

~~~
adamgravitis
Very cool. Also check out the implementation of encapsulation in John Shutt's
Kernel Lisp:
[http://web.cs.wpi.edu/~jshutt/kernel.html](http://web.cs.wpi.edu/~jshutt/kernel.html)

~~~
jamii
Interesting. I've seen his fexpr work before on LTU but never really looked at
it closely. I suppose the only way to go is to read the whole thesis?

------
danneu
The opening paragraphs really resonated with me, including that code example,
which could be replaced with any function I wrote yesterday.

A lot of my time in Clojure is spent re-reading and repl-evaluating my
function implementations just to remind myself what their symbols look like.
Even code I wrote an hour ago. Often I stoop to the level of caching my
findings in a docstring/comment.

    
    
        (defn alpha-map-sum-combining-thing
          "TODO: Think of better name.
           Example input: {:a 1 :b 2}
           Example output: {[:a :b] 3}"
          [alpha-map]
          ...)
    

Sometimes I can spot abstractions and patterns and protocols that unify a
namespace of functions so that their inputs/outputs are obvious, but often I
can't.

This kind of tool is essential for me.

~~~
lkrubner
I do this a lot:

"Often I stoop to the level of caching my findings in a docstring/comment."

I am not ashamed of this. When my co-workers have to read what I wrote, this
kind of information makes clear what the function does. If I don't do this,
then they will eventually figure it out by running it at the REPL and
examining the input and the output, but if I document that in the docstring, I
have saved them a lot of time. I also make it possible for them to actually
read the code -- otherwise they can't really read it, except by repeating it
at the REPL.

~~~
danneu
Yeah, you're right. I would love it if every Clojure function came with an
example input and output.

What I meant to suggest is that my docstring example is something that could
be expressed in code.
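
As a rough sketch of how that might look with Schema's `s/defn`, the input and output shapes from the docstring example become annotations. The function body is a made-up implementation (the original elides it), and `s/Keyword` and `s/Num` are names from current Schema releases, which may differ from the version discussed in the post.

```clojure
(require '[schema.core :as s])

;; Hypothetical body: the original example only shows "...".
(s/defn alpha-map-sum-combining-thing :- {[s/Keyword] s/Num}
  "Collapses a keyword->number map into a single entry whose key is the
   vector of (sorted) keys and whose value is the sum of the values."
  [alpha-map :- {s/Keyword s/Num}]
  {(vec (sort (keys alpha-map))) (reduce + (vals alpha-map))})

;; Schemas are only checked when validation is switched on.
(s/with-fn-validation
  (alpha-map-sum-combining-thing {:a 1 :b 2}))
;; => {[:a :b] 3}
```

The annotations carry the same information as the "Example input/output" lines, but can also be checked at runtime.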

~~~
freshhawk
A friend of mine has been doing similar things with the Swagger documentation
spec
([https://developers.helloreverb.com/swagger/](https://developers.helloreverb.com/swagger/))

While browsing the docs in a web browser you can fill out some fields for
input and see what the output for that input is.

My friend forked it and extracts and lists any example data so you can click
to load that example into those fields (my idea!). I tried it out and it's a
very nice way to learn/explore a new API, especially testing edge cases where
the docs are ambiguous.

It really hits a sweet spot when you have documentation browsing, live
execution _and_ example data. I think creating a similar interface combining
docstrings, the repl and example data from test fixtures would be a nice tool
to have (I find doctests to be one of those things that are better in theory
than in practice as they get too long and maintenance is annoying, although
perhaps a more literate style in your source code would change that).

------
ambrosebs
core.typed author here.

It might be easy to consider Schema as a "competitor" to core.typed. In fact,
it's the opposite: once they play nicely together they will form a formidable
bug-fighting, finely-documenting team. :)

core.typed has accurate compile time checking, and Schema gives an expressive
contracts interface for runtime checking.

Once they understand each other, you can start pushing and pulling between
static and dynamic checking by using both libraries to their strengths.

Currently, core.typed requires all vars have top level annotations. This is
partly because there is no way to recover type information once inside a
function body. However, if we have an entire namespace using Schema liberally,
we can use schemas to recover information!

This means we can lean on schemas for most annotations, and rely on core.typed
to catch arity mismatches, bad arguments to clojure.core functions, potential
null-pointer exceptions and many more nasties at compile time.

Then you might start adding static annotations or removing schemas, depending
on the kind of code you're dealing with. You might do some "static" debugging to
ask whether a schema is needed to prevent a type error. core.typed would also
let you know when your contracts are insufficient to rule out type errors.
Really, you're free to use both tools as you'd like.

Schema looks very nice, thanks for open sourcing it Prismatic folks!

~~~
w01fe
I've been really pumped about core.typed since I first heard about it, and I'm
excited about the possibilities of combining (or eventually replacing) schema
with core.typed.

I think core.typed has come a long way since last time I looked at it (when we
started developing schema). Obviously it's a much bigger and more serious
effort, and schema was about getting as much bang _right away_ as we could for
a few bucks.

I'd love to talk in more detail about how we can collaborate, or at least make
sure we play nice together.

~~~
ambrosebs
Fantastic, bring it on!

~~~
AlexBaranosky
I wonder if there's some way to combine schema, core.typed and clj-schema
that is better than the sum of its parts.
([https://github.com/runa-dev/clj-schema](https://github.com/runa-dev/clj-schema))

I regrettably haven't had much time to see where the cross-section between
these libraries is exactly, being very occupied at work. But I'd rather work
on one grand thing, than end up with disparate libraries that aren't as fully-
featured.

------
aria
Author here. Happy to answer any questions or comments.

~~~
jwr
This is a very good idea. And the reason I say that is because in our code
base we have developed something similar, feeling the need for it :-). Our
stuff is half-baked, though and I never got around to releasing it.

What I'd like to see is something similar to what we did — checking more than
just types. Here's an example from our code:

    (conforms example-object
              [map :from string?
                   :to [map :required-keys
                            {"products" [seq string?]
                             "type" [seq #{"type1" "type2"}]}]])

This describes a map containing mappings from strings to maps of a certain
kind (where "products" and "type" are required, "products" must map to a
sequence of strings and "type" must map to a sequence of strings from a
specific set).

From our "conforms" docstring:

Returns true if value conforms to typespec. typespec can be either:

- a string, which value must match literally;
- a predicate, which value must satisfy;
- [seq sub-typespec], which means that value must be a collection of
  elements, each of which must conform to sub-typespec;
- [map specifiers...], where optional specifiers may include:
  - :from sub-typespec -- keys in the map must match sub-typespec;
  - :to sub-typespec -- likewise for values;
  - :required-keys [key1 typespec1 key2 typespec2...] -- map must include
    the required keys and their values must match the typespecs.
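
A minimal sketch of what such a function might look like, reconstructed only from the docstring and example above. This is not the actual code from that code base: the dispatch on the `seq` and `map` functions as vector tags, and accepting `:required-keys` as either a map or a flat vector, are assumptions made to fit both the docstring and the example call.

```clojure
(defn conforms
  "Hypothetical reconstruction: returns true if value conforms to spec."
  [value spec]
  (cond
    ;; a string must match literally
    (string? spec) (= value spec)

    ;; [seq sub-spec] or [map specifiers...]
    (vector? spec)
    (let [[kind & more] spec]
      (cond
        (= kind seq)
        (and (coll? value)
             (every? #(conforms % (first more)) value))

        (= kind map)
        (let [{:keys [from to required-keys]} (apply hash-map more)
              ;; accept {k spec ...} as in the example, or [k spec ...]
              required (if (map? required-keys)
                         (seq required-keys)
                         (partition 2 required-keys))]
          (and (map? value)
               (or (nil? from) (every? #(conforms % from) (keys value)))
               (or (nil? to) (every? #(conforms % to) (vals value)))
               (every? (fn [[k sub-spec]]
                         (and (contains? value k)
                              (conforms (get value k) sub-spec)))
                       required)))

        :else (throw (ex-info "Unknown typespec" {:spec spec}))))

    ;; anything else is treated as a predicate (sets work here too)
    :else (boolean (spec value))))

(def example-spec
  [map :from string?
       :to [map :required-keys {"products" [seq string?]
                                "type" [seq #{"type1" "type2"}]}]])

(conforms {"order-1" {"products" ["p1" "p2"] "type" ["type1"]}}
          example-spec)
;; => true
```

Because sets are callable in Clojure, the "list of possible values" case falls out of the predicate branch without special handling.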

We don't use that for function contracts and it might be overkill for that,
but I'm pretty certain I'd like to be able to specify more than just types for
map keys. A list of possible values would be tremendously useful.

This could provide you with some ideas. I can of course contribute the
"conforms" function (although it isn't that difficult to write).

~~~
w01fe
Thanks for the feedback!

Actually, schema can express arbitrary constraints. Your example translates to
schema as:

    
    
      {String {(s/required-key "products") [String]
               (s/required-key "type") (s/enum "type1" "type2")
               s/Any s/Any}} ;; allow any other k-v pairs

~~~
jwr
Interesting! Is the enum automatically a vector/sequence, or is that just an
omission in the example?

Looks like I'll be using this library sooner than I thought, thanks!

~~~
w01fe
Nope, I missed that in the example -- the enum should be in square brackets.
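
With that correction applied, the schema would presumably read as follows. This is a sketch: the `"products"` key follows jwr's original example, the sample data and the `order-schema` name are made up, and `s/validate` is used as in current Schema releases.

```clojure
(require '[schema.core :as s])

(def order-schema
  {String {(s/required-key "products") [String]
           (s/required-key "type")     [(s/enum "type1" "type2")]
           s/Any s/Any}}) ;; allow any other k-v pairs

;; s/validate returns the value when it matches and throws otherwise.
(s/validate order-schema
            {"order-1" {"products" ["p1" "p2"]
                        "type"     ["type1"]}})
```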

------
olenhad
This is really brilliant, and damned useful. Reminds me of how type
declarations provide self-documenting code in Haskell. Thanks a lot!

~~~
aria
Definitely an inspiration. The core utility of Schema is that the declarations
are still data and can be utilized in code. I think we'll have some
interesting applications of this idea soon...

------
dsabanin
Great stuff, thanks for releasing this. Definitely going to use it soon.

~~~
aria
Thanks so much. We spent a long time thinking about how to have the benefits
of types in a Clojure-y data-oriented way. The advantage of having Schemas as
data is that we can use Clojure code to process them and generate things like
Objective-C classes
([https://github.com/Prismatic/schema/blob/d82bf0b049fc1205a81...](https://github.com/Prismatic/schema/blob/d82bf0b049fc1205a81a79d88a84872bb9f9846b/src/clj/client/objc.clj))
and Avro specifications. It will be our glue to declarations in other
languages.

------
JackMorgan
What timing! I have been making this exact thing myself.

Would you be interested in pull requests making your API simpler? Maybe allow
parameters to not have to have shapes? Perhaps a syntax that offers a simple
way to put shapes in the metadata, alongside the ability to change the
signature, for people who don't want to couple too tightly to the library?
When I showed mine to my local user group last month, that was one of their
biggest requests (one I was in the middle of adding to mine, but would be
happy to attempt in yours).

Check my (simpler and more immature) library here
[https://github.com/steveshogren/deft](https://github.com/steveshogren/deft)

~~~
JackMorgan
Also, what about having your defn macro generate a second function like foo-t
that just calls with-fn-validation on foo, to clean up that extra call? Would
you be interested in such a pull request?

~~~
w01fe
I'd have to think about that a bit -- I'm hesitant to create names not
provided by the user, but maybe there is another way.

Quite honestly that part of things (turning validation on and off, etc) isn't
really done yet -- we have plans to make it much more flexible and powerful,
but haven't got there yet.

------
RichMorin
Interesting work. BTW, there's a typo: "shee here" should be "see here".

~~~
adambard
I assumed it was meant to be read in Sean Connery's voice.

------
dustingetz
I feel your argument against just using Scala was weak, but you guys are
obviously smart and feel building Schema was the right choice. So could you
elaborate a bit on the decision to stick with Clojure when your problem domain
lends itself to types?

~~~
marknutter
My guess would be they want to continue using a Lisp-style language.

------
miner
Very interesting. I have a somewhat related project called Herbert, which
attempts to define a schema language for EDN.

[https://github.com/miner/herbert](https://github.com/miner/herbert)

------
mrcactu5
Newbie here, mostly familiar with OOP.

What is the difference between how Clojure and other functional programming
languages declare types?

~~~
tel
The major distinctions I see here, compared to say Haskell (and Scala to the
degree that Scala is equivalent to Hs), are that Schema are dynamic and
first-class.

Dynamic means that their usefulness is tied somewhat to your ability to
exercise code paths. Compare this to Hs's static types where the type logic of
your entire program (and all dependent libraries) is checked upfront before
compilation. Schemata likely must be triggered by a validation function being
called on live code. Endless further argumentation about this distinction goes
here.

First-class comes from Schema's dynamic nature as well but is worth further
investigation. Schema look like they can be arbitrary functions of the
arguments, much like inserting `assert`s at the beginning of a function and
then flipping those on or off at a later time. They can also be
composed/decomposed/analyzed as Clojure values. This vastly increases the
flexibility and complexity of Schema for better and worse. You can express
much more sophisticated invariants in your Schema than you can in Haskell
types. It looks like it's even possible for these invariants to be value-
based—a concept which, in static typing land, is deep into research territory.

I'd say these contract-like invariant checkers are in a pretty different boat
from static types. They check different classes of things at different times
and make vastly different promises. What they both provide however, so long as
your Schema don't get too complex, is some wonderful "living" documentation.

------
jared314
This looks like it occupies the same space as core.contracts and core.typed.
Is the main benefit, over those two, cljs support?

~~~
w01fe
I think there are a few benefits.

First and foremost, schemas are _simple_ , minimal, easy to read and write,
and gracefully extend Clojure's existing type hints. This means that (in my
biased opinion) they are significantly better for documentation, which was the
primary motivation for developing them.

Second, schemas are data, so it's easy to do more with them beyond
documentation. Runtime data validation is one such use, but we can also easily
do things like generate core.typed annotations, generate model classes for
clients, generate test data, and so on.

~~~
ambrosebs
Author of core.typed here.

I will challenge that schemas are "significantly" better for _documentation_
than core.typed types.

With unions, intersections, heterogeneous maps, parameterised classes,
recursive types we can be _very_ expressive.

Here's some examples of the syntax for types:
[https://github.com/clojure/core.typed/wiki/Types](https://github.com/clojure/core.typed/wiki/Types)

And some declarative types in action:
[https://github.com/frenchy64/core.typed-example/blob/master/...](https://github.com/frenchy64/core.typed-example/blob/master/src/fire/simulate.clj#L42)

I think core.typed is restrictive enough to allow annotations like the one
above, while being opinionated enough to guide the programmer to write clear,
comprehensible types.

~~~
w01fe
Sorry, that didn't come across the way I wanted -- it certainly wasn't meant
as a dig on the expressiveness or power of core.typed, which is a project I'm
really excited about.

The main driver for Schema was to make annotating function inputs and outputs
as simple and readable as possible. Personally, I find annotations directly on
the function arguments easier to parse than separate function type
declarations, but I suppose that's a matter of preference.

~~~
ambrosebs
No problem, just was eager to clarify :)

FWIW there are macros like fn> that allow you to write (fn> :- Number [a :-
Symbol] 1), but you lose the ability to write ordered function types with
multiple cases, so I don't use it very much.

------
krosaen
pretty cool - I've used something similar for python JSON validation:

[https://code.google.com/p/jsonvalidator/](https://code.google.com/p/jsonvalidator/)

but love the idea of something for validating nested data structures as well.

~~~
naiquevin
I had come across another schema validation lib for Python on Github a while
back (but never tried/used it):

[https://github.com/halst/schema](https://github.com/halst/schema)

------
rsamvit
This is excellent. I used to occasionally use assertions to accomplish the
same thing.

