What Clojure Spec is and what you can do with it (pixelated-noise.com)
263 points by icey 9 months ago | 63 comments

I think the instrumenting and generator stuff gets disproportionate attention. For me by far the biggest win from spec has been with parsing. This completely changes how you'd write a library that takes a data structure and parses it into something meaningful (for example, what hiccup does for html or what honeysql does for sql).

In the past, this required a lot of very ugly parsing code and manual error-checking. With spec, you write specs and call s/conform. If it fails, you get a nice error, especially if you pair it with expound. If it succeeds, you get a destructured value that is really easy to pull data out of. I've done this in half a dozen different libraries, and I'm pretty sure I wouldn't have even written them without spec.
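A minimal sketch of the pattern (a toy hiccup-like spec, purely for illustration, not the real hiccup grammar):

    (require '[clojure.spec.alpha :as s])

    (s/def ::element
      (s/cat :tag   keyword?
             :attrs (s/? map?)
             :body  (s/* any?)))

    (s/conform ::element [:div {:class "box"} "hello"])
    ;; => {:tag :div, :attrs {:class "box"}, :body ["hello"]}

If conform fails, it returns :clojure.spec.alpha/invalid, and s/explain-data tells you why.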

Totally agree with this.

I started playing with spec because of the idea of automated test generation, but the reality of it is that I use it as a super-charged validation library.

I think this emphasis actually does the library a disservice, in that I see new users ask questions along the lines of "Should I use s/valid? to manually check inputs to my API?" The answer to that, in my usage, is "Yes! Of course!", but many people seem to think that they are using Spec wrong if they use it for something other than instrumentation and generation.

I remember doing just that - writing some ugly parsing code, thinking that I should be a good team member and add some specs for what I was doing, and when I tried calling conform... Oh, it did the parsing for me!

Yup, surprised the article doesn't mention parsing.

When writing complicated macros, Spec conforming is so useful!

and if you then combine s/conform with core.match, you can build really elegant code to traverse these structures.

Why not just pattern match immediately if you are going to bother with using core.match?

How would you pattern match with Clojure without using core.match? Do you mean using another library?

Edit: oh I understand what you mean, you were thinking of skipping the s/conform part, and use core.match directly. Personally, I consider spec an excellent library to describe the shape of data, and core.match allows for better describing the actions you want to take based upon that.

For example, with spec you can define a bunch of alternatives using s/or, and then use core.match to then easily traverse the results.

It’s more a matter of separation of concerns to me. I don’t use core.match for validation or describing shapes of data.

Do you have to adapt spec definition to core.match somehow? Is there a resource to read more how to use them together?

No you don't have to do anything.

Let's say I have something like

  (s/def ::thespec (s/or :foo ::foo
                         :bar ::bar))
I can then use it in conjunction with core.match like

  (match (s/conform ::thespec val)
    [:foo x] (do-something-with x)
    [:bar y] (do-something-else-with y))
Which imho is an easier way to navigate these things.

Can you "stream" conform? For UI live input.

so that "john mcallister" yields { :name "john" :lastname "mcallister" } but "john " would yield { :name "john" :lastname nil } (or :todo even)
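Not streaming as such, but regex specs already get you part of the way: with s/? a missing trailing part conforms to an absent key (hypothetical spec, input pre-split into tokens):

    (s/def ::full-name
      (s/cat :name     string?
             :lastname (s/? string?)))

    (s/conform ::full-name ["john" "mcallister"])
    ;; => {:name "john", :lastname "mcallister"}
    (s/conform ::full-name ["john"])
    ;; => {:name "john"}   ; :lastname key absent rather than nil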

You could call it on every input change, though I'd say Spec conforming is not the most performant parsing approach in the world, so I'm not sure it would be fast enough to run on each input.

yeah that's why I was looking for a lazy / stream variant in case it wasn't already possible

/me goes to prolog

This is why languages like Ocaml are really nice to work with for this stuff.

Does OCaml have a good parsing DSL built in?

Yes, it's called OCaml ;)

Could you elaborate on your point?

OCaml's strong static typing with type inference and pattern matching (with exhaustiveness checking) suit this kind of work quite well; there's plenty of literature around for it.

As a very heavy user of spec, I’ve since switched to using Malli [0], which is similar but uses plain data to model specs (and doesn’t use Clojure Spec at all).

Also, Malli is being funded / supported by Clojurists Together [1], which is a wonderful initiative that’s also worth a look.

[0]: https://github.com/metosin/malli

[1]: https://www.clojuriststogether.org/news/q3-2020-funding-anno...

I found spec very useful and use it more and more. I'm looking forward to newer revisions with support for optionality; it's been a big problem area in my case.

Here's a quick list of gotchas (well, they got me, so perhaps other people will find this list useful):

* `s/valid?` does not actually tell you that the data is valid

The naming of `s/valid?` suggests that you can call it on your data and find out if the data is valid according to the spec. This isn't true. What it actually tells you is if the data, when conformed, will be valid according to the spec. If you pass the data as-is to your functions (without passing it through `s/conform`), you might find that they will be surprised at what they get.

* Conformers are likely not what you think. They are not intended for coercion and have many pitfalls (for example, multi-specs dispatch on the unconformed value).

* s/merge doesn't necessarily do what you want if you're using conformers: only the last spec passed to merge will be used for s/conform (but you're not using conformers, right?)

* specs are checked eagerly. If you think that (s/valid? ::my-spec x) will only check ::my-spec, that is not the case. It will check any key of x found in the spec registry.
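A sketch of that last gotcha (hypothetical specs):

    (s/def ::id int?)
    (s/def ::name string?)
    (s/def ::my-spec (s/keys :req [::id]))

    ;; ::name isn't listed in ::my-spec, but it's in the registry,
    ;; so its value is checked anyway:
    (s/valid? ::my-spec {::id 1, ::name 42})
    ;; => false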

I settled on a subset of spec, because of the pitfalls.

I think all of those problems are only true when you try to do coercion with custom conformers. Which is not the intended use of conformers.

Conformers are meant to parse the data when there are multiple possibilities of what something can validate against; the conformer will disambiguate and return a result that tells you which path was chosen.

Coercion is not supported as part of Spec; you're expected to do that separately, either before or after validating/conforming.
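You can see the disambiguation directly with s/or — conform tags the branch that matched (hypothetical spec):

    (s/def ::id (s/or :num int? :str string?))

    (s/conform ::id 42)    ;; => [:num 42]
    (s/conform ::id "abc") ;; => [:str "abc"]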

Yes, that was largely my point. Many people (me included) assume that s/conform is a kind of coercion, which it is not.

Not all of the above problems are due to coercion, but the majority are.

To be clear: I'm not complaining here, I find spec to be very useful and I like it, just pointing out traps for the unwary.

Ya fair enough, I've definitely seen a lot of people think conforming is meant for coercion. But it's not, it's only meant for disambiguating the chosen path (when multiple are possible) for validation.

There are a few libraries that make it possible without abusing conformers, one of which is https://github.com/exoscale/coax

Looks nice, I knew of spec-coerce, but not coax.

Can you elaborate with an example of needing to pre-conform your data for s/valid? I have not had this issue.

Also note for others with regard to eagerness, this is only for maps. "When conformance is checked on a map, it does two things - checking that the required attributes are included, and checking that every registered key has a conforming value. We’ll see later where optional attributes can be useful. Also note that ALL attributes are checked via keys, not just those listed in the :req and :opt keys. Thus a bare (s/keys) is valid and will check all attributes of a map without checking which keys are required or optional."

Can you explain the point of :opt with s/keys if it will always check any registered spec in if present?

> Can you elaborate with an example of needing to pre-conform your data for s/valid? I have not had this issue.

If you have any conformers, s/valid? will use them before validating.

So, if you have a 'set' conformer, for example, s/valid? will tell you that the data is valid even if the value is not a set, but a vector, for example.

Your code must explicitly call 'conform', checking with s/valid? is not enough.
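For example (hypothetical ::tags spec):

    (s/def ::tags (s/conformer set))

    (s/valid? ::tags [:a :b])    ;; => true
    (s/conform ::tags [:a :b])   ;; => #{:a :b}

    ;; A function that trusts s/valid? and expects a set will
    ;; still receive the original vector unless you conform first.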

Found the answer to s/keys and :opt "The :opt keys serve as documentation and may be used by the generator."

The important aspect to keep in mind is that it makes the key optional, not the value of the key.

Thus if you have a map without the key, the value won't be validated. But if the key is present, then it will validate its value.

If you want to make the value of the key optional, in the spec for the value of the key, you need to add `nil?` as a valid value.
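Concretely (hypothetical specs):

    (s/def ::nickname string?)
    (s/def ::person (s/keys :opt [::nickname]))

    (s/valid? ::person {})                ;; => true  (key absent is fine)
    (s/valid? ::person {::nickname nil})  ;; => false (key present, nil isn't a string)

    ;; To also allow an explicitly-nil value:
    (s/def ::nickname (s/nilable string?))
    (s/valid? ::person {::nickname nil})  ;; => true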

> Thus if you have a map without the key, the value won't be validated. But if the key is present, then it will validate its value.

Exactly. Which is something I did not expect. I expected '(s/valid? ::my-spec x)' to tell me if x is valid according to ::my-spec, checking only those keys that ::my-spec lists in :req and :opt (if present).

For maps, you might as well think of s/valid? as ignoring the first parameter. It validates anything it can.

Oh, you mean because it will validate even qualified keys that aren't defined in :req and :opt if they have a corresponding spec?

I admit, that's surprising. Not sure why they made it so.

> with support for optionality

You should be careful asking for this.

Both Protobufs and Cap'n Proto eventually decided that "optional" creates more grief than it saves.

It is also a bit of a religious flamewar.

Caution is advised.

That is exactly my point. Optionality is complex and causes a lot of bugs in applications (certainly in mine, I would say it's the #1 root cause of bugs).

Which is why I'd like spec to help me with managing it. It's not obvious, because whether certain data is required or optional depends on context. But from what I've heard, bright minds at Cognitect are thinking about it, and given their track record so far, I'm pretty confident I will like the solution.

They went with null?

We use spec A LOT to validate a huge config file/DSL thing for our internal ETL application.

My general feelings are:

- spec is great, you should use it; its composability and flexibility are awesome features.

- I've never once used conformers - maybe I just don't "get it" (which if so, speaks badly of them I think since I've been heavily using spec for years), but the use cases for them seem strange to me and I feel they cause more confusion than they're worth. I wish they were separated out into more separate/optional functionality.

- It's SO MUCH more powerful than things like JSON schema, but that comes at the cost of portability - there's no way we could send our spec over the wire and have someone else in a different environment use it. But also, there's no way we could implement some of our features in a tool like JSON Schema [and have it be portable] ("Is the graph represented by this data-structure free of cycles?" "When parsed by the spark SQL query parser, is this string a syntactically valid SQL query?").

- Being able to spec the whole data input up front has saved hundreds of lines of error-checking code and allows us to give much better errors up-front to our users and devs

- Spec has a lot of really cool features for generative testing, but we rarely use them since we've implemented lots of complex specs where it's not really practical to implement a generator (i.e. "strings which are valid sql queries" or "maps of maps of maps which when loaded in a particular way meet all the other requirements and are also valid DAGs"). I feel torn about this because the test features are great, but the extreme extensibility of spec is what I love most about it. I haven't often found a scenario where I actually have a use for the generative features (either the data is so simple I don't need them, or so complex that they don't work).

At a previous job I streamed Clojure over a trusted wire and executed it remotely in some circumstances. We weren't using Spec (Prismatic Schema at the time), but it worked pretty well and we even validated our schemas in javascript via cljs-wrapped library. I don't see why this wouldn't be possible with Spec, although you're a bit limited by code that will execute everywhere.

> I've never once used conformers - maybe I just don't "get it" (which if so, speaks badly of them I think since I've been heavily using spec for years), but the use cases for them seem strange to me and I feel they cause more confusion than they're worth. I wish they were separated out into more separate/optional functionality

That's because you don't Spec your functions and macros.

A lot of people have only used Spec to validate data that enters and leaves the boundary of their application. Which is a great use of Spec, and I use Spec mostly for that as well.

But there is a whole other world where Spec was designed to validate your functions and macros as well.

That's where conformers make sense.

For macros, you can use conformers to help you with writing a macro, by using Spec to define a DSL and conform to parse it out for you. It both validates the macro DSL and makes it easier for you to parse it.

For functions, conform can be useful to assert the output is what you expect for some given input. Often times, the output might depend on what kind of input you got. Conform basically tells you the kind of input it was, so in your validation you can validate differently based on each kind conform tells you it received.
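A sketch of the macro case: a defn-like DSL parsed by conform (hypothetical spec):

    (s/def ::defn-like
      (s/cat :name simple-symbol?
             :doc  (s/? string?)
             :args vector?
             :body (s/* any?)))

    (s/conform ::defn-like '(my-fn "adds one" [x] (inc x)))
    ;; => {:name my-fn, :doc "adds one", :args [x], :body [(inc x)]}

Inside a defmacro you'd conform the macro's arguments once and get validation and destructuring together.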

Your feeling that there's no way to transport this over the wire is puzzling to me, but I admit I don't have all the details. My feeling is: why not? Surely if we have wire formats for self-describing binary objects that can be deserialized into an in-memory structure, transporting a spec shouldn't be harder than that?

Not that there's _no_ way to transport it over the wire, it just requires the full environment (a JVM on the other end - because we have java-specific stuff like spark calls). I'd put it at an order-of-magnitude more complex than something declarative like a JSON schema which is pretty safe to execute anywhere.

I don't think this is a really big failing of spec - I don't know of ANY validation tools that don't have to compromise between power/extensibility/ease-and-safety-of-execution-somewhere-else. Maybe if you implemented some kind of uber-validator in purely functional prolog or something?

Specs in the general case require code execution, so you'd essentially need to execute that (untrusted) clojure on the other end of the wire.

Again, apologies if this sounds ignorant, but we have pretty standard practices now for sandbox execution of untrusted code. A LISP seems especially suitable for this type of task.

I don't have a clue how you would implement this. The difference in portability between spec and something like json-schema/protobuf/avro is that you can serialize the schema in these, and then Clojure and (say) Python, Go, Java, C#, and JavaScript applications can talk to one another.

How would you propose to serialize Clojure specs and use them from a Python app? Port the Clojure compiler to Python?

I second this emotion - "How would you check a spec from anything other than clj/cljs" is (IMO) the critical question here. Sure you could check out my git repo in a safe VM and execute it there, but that's a WHOLE lot more hassle than an XML or JSON schema. It's not just a language barrier thing.

There's nothing stopping spec predicates from making network calls, looping forever, etc. If I wanted to be able to call my spec from other apps I'm writing, I could package it as a library easily, but a workflow like rest call->get spec->validate data (which I've implemented many times for JSON schema for simpler things) wouldn't really be practical with spec (without at least setting some really tight restrictions on what features of spec you're allowed to use)

Again, not really a failing of spec, it's just not designed for that kind of workflow.

Yeah spec is quite a general thing, there are libraries which convert specs into database schemas, JSON schema, swagger etc

If communicating constraints to another environment is required they should help

Yeah not a fault of spec at all, which is really awesome and even inspirational IMO.


SCI (Small Clojure Interpreter) is 4,500 LOC, and if you just want a barebones Clojure interpreter to carefully evaluate '(> x y), you could probably fit it in 100 LOC of Clojure. If you want to use Python, maybe 500 LOC of Python. Or port SCI: a 4,500-LOC port is a few person-months given a reference implementation.

For the "strings which are valid sql queries" would it not be good enough to just hardcode some number of example sql queries? Say, 10-100 examples?

In fact we do very similar things - that is to say, of course spec cannot reverse-engineer my "this is valid sql" predicate to do it automatically and I end up having to hand-code a lot of generators anyways. The generation features of spec [for us, not for everyone] end up being a relatively minor value add compared to plain-ole test.check

I strongly recommend that anyone using spec for validation should check out Orchestra, to instrument the functions and have them automatically validated on each and every call: https://github.com/jeaye/orchestra
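The setup is small; a sketch (hypothetical function, and orchestra.spec.test is a drop-in replacement for clojure.spec.test.alpha):

    (require '[clojure.spec.alpha :as s]
             '[orchestra.spec.test :as st])

    (defn provider-url [region]
      (str "https://" (name region) ".example.com"))

    (s/fdef provider-url
      :args (s/cat :region keyword?)
      :ret  string?)

    (st/instrument)  ;; unlike clojure.spec.test/instrument, also checks :ret and :fn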

For my team, generators and parsing are basically useless with spec. We just don't use them. But describing the shape of data and instrumenting our functions, using defn-spec, to ensure that the data is correct as it flows through the system is exactly what we want and nothing I've seen in Clojure land does it like spec + Orchestra can.

I think part of this may boil down to different types of testing. We primarily use functional testing, especially for our back-end, so we're starting from HTTP and hitting each endpoint as the client would. Then we ensure that the response is correct and any effects we wanted to happen did happen. This is much closer to running production code, but we do it with full instrumentation. Being able to see an error describing exactly how the data is malformed, which function was called, and what the callstack was is such a relief in Clojure.

    Call to #'com.okletsplay.back-end.challenge.lol.util/provider-url did not conform to spec.

    -- Spec failed --------------------

    Function arguments

    should satisfy

    -- Relevant specs -------

    Detected 1 error

I'm on board with Orchestra because I tend to be lazy. But I also want to explain the rationale for why Spec's instrumentation only instruments the input.

This has to do with philosophy. It applies if you want to write bug-free programs, and I mean, if you care A LOT about software correctness.

The idea in that case is that all your functions will have a set of unit tests and generative tests over them that assert that, for most possible inputs, they return the correct output.

Once you know that, you know that if provided valid input, your functions are going to return valid output. Because you know your function works without any bugs.

Thus, you no longer need to validate the output, only the input. Because as I just said, you now know that any valid input will result in your code returning valid output as well. So re-validating the output would be redundant.

And this goes one step further. After you've thoroughly tested each function, you want to test the integration of your functions together. So you'd instrument your app and write a bunch of integration tests (some maybe even using generative testing) to make sure that all possible input from the user (or from external systems, if intended for machine use) results in correct program behavior and in an arrangement of functions that all call each other with valid input.

Once you've tested that, you now also know that the interaction/integration of all your functions work.

At this point you are confident that given any valid user input, your program will behave as expected in output and side-effect.

You can thus now disable instrumentation.

But before you go to prod, you need one more thing: you have to protect yourself against invalid user input, because you haven't tested that and don't know how your program would behave. Thus, with Spec, you add explicit validation over your user input, rejecting invalid input at the boundary.

You now know three things:

1. All your individual functions given valid input behave as expected in output and side-effect.

2. Your integration of those functions into a program works for all given valid user input.

3. Your program rejects all invalid user input, and will only process valid user input.

Thus you can go to prod with high confidence that everything will work without any defect.


Now back to Orchestra. Orchestra assumes that you weren't as vigilant as I just described: that you might not have tested each and every function, or that you only wrote a small number of tests covering a small range of inputs. It therefore assumes that when you move on to running functional/integration tests, you'll want to keep asserting that the output of each function is still valid, since those tests will probably produce inputs to functions that your per-function tests did not cover.

Something like Haskell, or even Rust, requires similar vigilance in order to get the program even into a working state. With thorough, strong, static type checkers, novel borrow checkers, and more, a lot of development time is spent up front, dealing with compiler/type errors. Thus you can go to prod with high confidence that everything will work without any defect.

Now, back to Clojure. Clojure assumes that you weren't as vigilant as I just described, and that you don't have static type checking for each function, or that you don't have a fixed domain for all of your enums. Thus it is assumed because of that, probably when you go running toward testing (unit, functional, or otherwise), you want to assert the validity of all of this data.

My point in re-painting your words is that we all trade certain guarantees in correctness for ease of development, maintainability, or whatever other reasons. Developers may choose Clojure over Haskell, for example, because maintaining all of that extra vigilance is undesirable overhead. Similarly, developers may reasonably choose not to unit test every single function in the code base, but instead functionally test the public endpoints and unit test only certain systems (such as the one which validates input for the public endpoints), because maintaining all of that extra vigilance is undesirable overhead.

Also note that if you try thinking with types, you may start seeing them as tools rather than overhead.

A good blog post about this is:


My pet project is a partial evaluator for Clojure code that uses data generated from spec to fuzz code and optimize it. The coverage is accepted as complete, so there are no guards and deoptimizations like you'd have in a JIT, the programs are just wrong.

It seems like a fairly powerful technique, although you couldn't ever rely on it with production code. After several years of tinkering I managed to get a Forth interpreter written in Clojure executing a specific input string partially evaluating down to OpenGL shader code, to hardware accelerate my friend's Stackie experiment (link to his version below).

(nth (sort [0 n 5]) 1) where sort is a merge sort also successfully compiles down to just the branches you'd hand optimize it to, which is Graal's party trick. Although they're solving the problem in a bulletproof general way, so the difficulty is incomparable.

The eventual goal is to write Clojure in Clojure without it being horrendously inefficient.


At Hyperfiddle we are using spec to specify UIs, so e.g. (s/fdef foo :args (s/cat :search-needle string? :selection keyword?) :ret (s/coll-of (s/keys ...))) describes exactly what you need to render a masterlist table with some query parameters. It's being used with pilot customers, now in production. This stuff works!

Malli[1] and specmonstah[2] are my favourite Clojure(Script) libraries built on top of Clojure Spec.

[1] https://github.com/metosin/malli

[2] https://github.com/reifyhealth/specmonstah

+1 for Specmonstah; been very useful for us at Reify Health.

I find spec really interesting but struggle to find a good fit for it in my projects. I end up getting bogged down writing generators and give up, or end up breaking down functions into such small pieces that speccing every one of them results in an unreasonable proliferation of specs and generators. This was an interesting and helpful overview of some different ideas about how to take advantage of spec, so thank you!

Also I definitely share your feelings about clojure massively increasing my job satisfaction and teaching me how to think about programming in a new (and better tbh) way.

We use spec at work for validating data passed to our HTTP routes in our luminus web application. When we get data as JSON we use spec to validate that there are the required fields being passed in and if it isn't valid we send back an error HTTP status code. It is nice to be able to easily see what fields are required without having to dig deep into code.
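That pattern is roughly the following (hypothetical route and specs; JSON keys come in unqualified, hence :req-un):

    (s/def ::email string?)
    (s/def ::age pos-int?)
    (s/def ::signup-request (s/keys :req-un [::email ::age]))

    (defn handle-signup [body]
      (if (s/valid? ::signup-request body)
        {:status 200}
        {:status 400
         :body {:error (s/explain-str ::signup-request body)}}))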

If you're considering using spec, you should know that spec alpha2 is significantly different internally, and is much easier to write tooling for, but has not been ported to ClojureScript. I would highly recommend trying alpha2 if you're not using CLJS, even if it's a little less stable.


> Still alpha: spec is still alpha and it looks like it will remain like that since the intention is for spec2 to completely replace it in the future.

Out of curiosity for the folks in the know: why is this the state of affairs? Where we have the library in alpha for a long time, and being replaced already?

I remember Rich mentioning this in a talk. There's one problem he considers unresolved, and he wouldn't finalize the API until he had a good solution. Unfortunately, I can't remember what it was. Also, this must have been more than a year ago; whatever goes on in spec2 might have evolved since then.

for simple typing, does this seem more complex than plumatic/schema?

I have used both Spec and Schema professionally.

It's not exactly a secret that Spec is not intended to be a type-like system. But it turns out, it's perfectly possible to use Spec as a building block to do so!

As for complexity, I think instrumentation (Spec's approach) is complex. People routinely struggle with it; in fact, you must use an external lib (Orchestra) to fully enable it. It also slows down execution, sometimes disproportionately.

Whereas Schema simply has a global toggle that also can be overriden in a fine-grained manner. It's explicitly made to be fast and have a type-like use.

As for type definitions, honestly both are elegant and powerful; you can quite easily use arbitrary predicates as "types" with both, and compose those predicates. Ultimately, Spec is better designed because it fosters namespace-qualified keywords, which compose better.

For addressing the instrumentation problem, I created https://github.com/nedap/speced.def, which uses Clojure's :pre system. I've used :pre for over a decade, so I know its extreme simplicity well. Spec can take :pre's usefulness to the next level.

Very interesting
