
Transit: JSON Data Interchange Format
https://github.com/cognitect/transit-format
======
lars512
I couldn't get the why of the project from the Github page alone. Rich
Hickey's post introducing it a year ago is clearer:

[http://blog.cognitect.com/blog/2014/7/22/transit](http://blog.cognitect.com/blog/2014/7/22/transit)

JSON has become the go-to schemaless format between languages, despite being
verbose and having problems with typing. Transit aims to be a general-purpose
successor to JSON here.

Stronger typing allows the optional use of a more compact binary format (using
MessagePack). Otherwise it too uses JSON on the wire.
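
To make that concrete, here's my rough sketch of what the tagged JSON encoding looks like, assuming the standard cognitect.transit API (transit-clj); the exact wire output is my reading of the spec and may be slightly off:

```clojure
;; Rough sketch using transit-clj; types are carried as tags in the JSON
;; ("~:" keyword, "~u" uuid, "~m" instant in millis; "^ " marks a map).
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream])

(let [out (ByteArrayOutputStream.)
      w   (transit/writer out :json)]   ; :msgpack gives the binary encoding
  (transit/write w {:id   (java.util.UUID/randomUUID)
                    :when (java.util.Date.)})
  (str out))
;; => roughly ["^ ","~:id","~u2f9e...","~:when","~m1446220800000"]
```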

Anyone who knows more, please correct me.

~~~
gioele
> JSON has become the go-to schemaless format between languages, despite being
> verbose and having problems with typing. Transit aims to be a general-
> purpose successor to JSON here.

XML has become the go-to schemaless format between languages, despite being
verbose and having problems with typing. JSON aims to be a general-purpose
successor to XML here.

~~~
brianberns
Yes, except that XML has a schema (XSD).

~~~
nmeofthestate
Ahem. [http://json-schema.org/](http://json-schema.org/)

~~~
orless
Quoting [http://json-schema.org/latest/json-schema-core.html](http://json-schema.org/latest/json-schema-core.html):
"This Internet-Draft will expire on August 3, 2013."

~~~
orless
... which, by the way, does not mean that JSON Schema is bad or unusable. I
actually quite like it; it has a certain elegance. Just not _really_ a standard.

------
khgvljhkb
Am I the only one amazed by what the Clojure community and core team are
conjuring up?

Doing client-side programming with things like CLJS, Figwheel, Reagent and
core.async feels miles ahead of what we have in modern-JS-land (ES6/7, Babel,
webpack, React, promises).

If you were to start a startup today, would you be comfortable going with
something like Clojure/script?

~~~
grayrest
The Clojurescript community are ahead, but not by that much. To list specifics:
cljs -> Babel + an immutable lib, figwheel -> react-hot-loader, reagent ->
React 0.14 pure components, core.async -> js-csp (or async/await).

In terms of non-component organization, I believe re-frame is a significant
improvement over redux. Reactions are good when you don't control the
endpoints, splitting out the reducers into pure functions is good, but adding
middleware on them is the real win. On the other hand, it wouldn't be that
hard to adapt the model to redux.

The next phase of organization is integrating Relay/Falcor concepts. David
Nolen gave a talk about this (Om Next) at NYC Clojure last month, and there is
a video. Om Next as presented is very compelling if you're on Datomic and less
so otherwise.

As to your startup question:

I've been a full time cljs dev on a b2b app at Reuters for the past 9 months.
I took up the job specifically because I wanted to write cljs. I had been
involved with the Clojure community (I care about state) but only working on
toy projects for the previous ~4 years.

My experience with Clojurescript is that it was less of an improvement on
modern js than I was expecting. The biggest advantages are protocols and the
standard library being both rich and standard. Nice to haves but non-critical
are native syntax for immutable maps and multimethods. I guesstimate I write
~10% less code in cljs versus js but you're ultimately writing the same stuff.

Problems I've run into:

Full build time for this app is long. Our app is in the 15k LoC range across
~150 files and cold compile is 140s on a 2012 MBA. It's annoying but
incremental compilation times are sub-second after some build config tweaking.

We have one component in particular that tends to get lost when switching
branches and the missing namespace forces a fresh compile. Our cljs version is
from June so this may be fixed. I've also spent a number of hours debugging
problems that turned out to be stale build issues.

I tried a couple times unsuccessfully to get Emacs (cider) to connect to a
figwheel repl. After a few evals things simply become unresponsive. Just using
figwheel is good enough but I miss the in-editor repl. Haven't tracked down
the reason, could be my lack of emacs knowledge.

If you're using core.async, the main loop has a try/catch/rethrow. This causes
Chrome Dev Tools to break in the outer loop instead of actually at the
problem. You have to explicitly print err.stack in the console (which is not
source mapped), and you don't have access to the locals unless you manually
set a breakpoint at the error and reload. You also get to learn to read the JS
representation of Clojure literals. None of this is impossible, and if you're
working in a tight loop you tend to have a pretty good idea of what the error
is without jumping through debugger hoops, but if you're doing something like
switching branches or refactoring it's annoying.

I like Reagent but I've had a number of times where its behavior doesn't match
my expectations. In particular, figuring out what part of the vdom is
invalidated on a ratom change caused me problems. There's a gotcha that
sequences must be forced with doall or you'll get weird behavior. At the
moment I have a very expensive reaction (list processing ~6k items) that's
getting run 8 times in response to a single key change in the source ratom so
I'll be tracking that down tomorrow.
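
For reference, the doall gotcha looks roughly like this (a hypothetical component, not our actual code):

```clojure
;; Hypothetical Reagent component. Without the doall, the lazy seq may be
;; realized outside Reagent's render/reaction tracking, so changes to `items`
;; don't reliably trigger a re-render.
(defn item-list [items]            ; items is a reagent atom (r/atom)
  [:ul
   (doall
     (for [item @items]
       ^{:key (:id item)} [:li (:name item)]))])
```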

I don't consider this list a reason to not adopt Clojurescript. I can make a
similar list for the Babel stack.

As for the question of whether I'd be comfortable: I like writing
ClojureScript, but I'd only really recommend it if you're committed to
full-stack Clojure. It takes a number of weeks for a new frontend hire to ramp
up on the language. I've discussed this with the other frontend specialist on
the team and our consensus is that cljs is a better language, but we're not
that much more productive in it compared to ES2015, so I'm not really
convinced the weeks of ramp-up time are worth it. Our experience with hiring
has been that we've had very few candidates, but they've all been skilled.

~~~
Scarbutt
_The Clojurescript community are ahead, but not by that much. To list
specifics: cljs -> Babel + an immutable lib, figwheel -> react-hot-loader,
reagent -> React 0.14 pure components, core.async -> js-csp (or async/await)._

Immutable.js, js-csp, etc. were all inspired by Clojurescript; I guess that's
the point the grandparent was trying to make.

------
creshal
Reinventing XML, one data type at a time.

~~~
andyjohnson0
My thoughts exactly. JSON is great for Javascript clients, but if you're
dealing with clients written in multiple languages then there is already a
good language-neutral serialisation format: XML. Just because it's not
fashionable (with some) doesn't mean it doesn't work.

Edit: So why the downvotes? How about a conversation instead?

~~~
rjbwork
I used to agree, but I've lately come over to the JSON side, I think. It's
easier for a human to read, and it doesn't have this weird "should it be a
node or an attribute" question. Single things are properties, many things are
arrays.

And now with Schemas and editor support for them, I think it is an acceptable
replacement personally.

~~~
andyjohnson0
Nodes for data, attributes for metadata.

I think you made a good argument, although I personally prefer the more mature
XML tooling and metadata support for versioning.

~~~
dragonwriter
> Nodes for data, attributes for metadata.

It's a nice soundbite, but it ends up being less than useful in practice,
because all metadata is data, and almost any data can be viewed as metadata.
The distinction is both subjective and strongly influenced by the use to which
a consumer is putting the data, rather than being determined solely by the
inherent nature of the data.

~~~
andyjohnson0
It's a rule of thumb (a heuristic). Judgement is still required.

(A soundbite is something altogether different.)

------
escherize
The most novel use of Transit is in the Sente [1] library for Clojure/Script.
It is an abstraction over long-polling / websockets that lets us treat the
connection as a core.async channel (similar to a channel in Go).

It's worked great for updates, and using Transit to keep the transmissions
minimal has let us focus on the API for a realtime system.

[1] - [https://github.com/ptaoussanis/sente#sente-channel-sockets-for-clojure](https://github.com/ptaoussanis/sente#sente-channel-sockets-for-clojure)
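
For anyone unfamiliar with the channel abstraction, the consuming side looks roughly like this (plain core.async, not Sente's actual API; `ch-recv` is just a hypothetical name for the incoming-events channel):

```clojure
;; Minimal core.async sketch: consume events from a channel in a go-loop.
;; ch-recv stands in for whatever channel the websocket library hands you.
(require '[clojure.core.async :as async :refer [chan go-loop <! >!!]])

(def ch-recv (chan))

(go-loop []
  (when-let [event (<! ch-recv)]
    (println "got event:" event)
    (recur)))

(>!! ch-recv {:type :chat/message :text "hello"})
```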

~~~
retrogradeorbit
In a project I tried both chord and sente and settled on chord. Chord was much
simpler.

It was a point to point system. Sente seemed to have better support for point
to multipoint (like a chat app). It was overkill for what we were doing and
chord fit the bill nicely.

[https://github.com/jarohen/chord](https://github.com/jarohen/chord)

------
mukundmr
Why choose this over Google's Protocol buffers?
[https://github.com/google/protobuf](https://github.com/google/protobuf)

~~~
mateuszf
One reason is compatibility - it can be encoded as JSON and is thus handled by
HTTP proxies, etc. It is also human-readable in its encoded form.

~~~
icebraining
Protobuf is just a serialization format; you can send it over HTTP.

------
nly
Not sure I like the idea of cramming ASCII type tags into the encoded JSON.

I'm more partial to the way Avro does it, where the encoded JSON remains free
of type tags and cruft, and a separate schema (also JSON) is used (and
required) to interpret the types, or to encode to the correct binary encoding.

~~~
Skinney
Transit also allows you to use msgpack, but JSON is more performant on the
web.

Another feature Transit has is that it caches identical keys (can also cache
values with some additional code), giving you a smaller footprint.
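
A rough illustration of the key caching (transit-clj; the exact cache codes are from my reading of the format and may differ):

```clojure
;; Repeated map keys are replaced by cache codes ("^0", "^1", ...) on the wire.
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream])

(let [out (ByteArrayOutputStream.)
      w   (transit/writer out :json)]
  (transit/write w [{:name "Ada" :lang "en"} {:name "Alan" :lang "en"}])
  (str out))
;; => roughly [["^ ","~:name","Ada","~:lang","en"],["^ ","^0","Alan","^1","en"]]
```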

~~~
nly
Sure, but all that comes at the huge disadvantage of throwing away the ability
for someone to just grab a JSON library and access your data ad hoc. Now you
have to fiddle with stripping away tildes and cache references and such.

At the same time, I don't see how you can do anything sensible with Transit
except throw it into a hashmap/dict alongside runtime type information. This
is natural for dynamic languages, but, without a schema, you're still left to
write your own error-prone structural validation, and you can't fully leverage
the efficiency of static languages. MsgPack has this problem as well.

Optimising the JSON representation just doesn't seem worth it to me. Avro, for
example, encodes enums by exposing the type names as JSON keys. It sucks, and
it's something I'd change, but at the end of the day you're building on a poor
transport. Caching reminds me a lot of DNS label compression, or a poor man's
preset zlib dictionary.

~~~
Skinney
The great thing about Transit is that I can extend the data layer with new
types (like JodaTime). Even without that, I really like Transit's built-in
support for sets, keywords, symbols, lists and vectors (which really helps in
a Clojure environment).
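
A sketch of what that extension looks like with transit-clj (using a made-up Point record instead of JodaTime, but the mechanism is the same; handler names follow the cognitect.transit API as I understand it):

```clojure
;; Custom write/read handlers: tag "point" carries [x y] on the wire.
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream ByteArrayInputStream])

(defrecord Point [x y])

(def write-handlers
  {Point (transit/write-handler (constantly "point")
                                (fn [p] [(:x p) (:y p)]))})

(def read-handlers
  {"point" (transit/read-handler (fn [[x y]] (->Point x y)))})

(let [out (ByteArrayOutputStream.)
      w   (transit/writer out :json {:handlers write-handlers})]
  (transit/write w (->Point 1 2))
  (let [in (ByteArrayInputStream. (.toByteArray out))
        r  (transit/reader in :json {:handlers read-handlers})]
    (transit/read r)))
;; => roughly #user.Point{:x 1, :y 2}
```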

Another great thing (again, for someone using functional languages) is that
Transit can ensure identity for equal values.

When it comes to structural validation, that is something I use Prismatic's
schema library for. Of course, it helps that I can share the validation code
between backend and frontend (using Clojure and ClojureScript).
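
For example, a hypothetical schema shared between the two sides (the library's namespace is schema.core; the names below are made up):

```clojure
;; Hypothetical shared schema in a .cljc file, validated on both client and
;; server. s/validate throws with a readable error when the shape is wrong.
(require '[schema.core :as s])

(s/defschema Order
  {:id       s/Int
   :customer s/Str
   :lines    [{:sku s/Str :qty s/Int}]})

(s/validate Order {:id 1 :customer "ACME" :lines [{:sku "X-1" :qty 2}]})
```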

When using plain JSON, I had to convert types myself after decoding/encoding,
which was error-prone and caused a surprising number of bugs (especially
around dates).

Transit solves a big problem for me, while still retaining the easy-to-read
syntax of JSON, and it also has great tooling in browsers.

Also, caching isn't only about smaller size. It allows for faster parsing of
the initial data, making Transit just as fast as plain JSON for certain
payloads, even considering that it has to expand from the cache and decode
values.

------
sandij
These posts by the author of transit-js clear up some things and go in depth
about performance:

[http://swannodette.github.io/2014/07/23/a-closer-look-at-transit/](http://swannodette.github.io/2014/07/23/a-closer-look-at-transit/)

[http://swannodette.github.io/2014/07/26/transit--clojurescript/](http://swannodette.github.io/2014/07/26/transit--clojurescript/)

[http://swannodette.github.io/2014/07/30/hijacking-json/](http://swannodette.github.io/2014/07/30/hijacking-json/)

------
fnordsensei
I recently used this in a project where I simply wanted a typing guarantee
that JSON can't provide (i.e., that a timestamp really is a timestamp when it
arrives on the other side, not a string like in JSON). It's very easy to use,
more or less just a drop-in middleware.
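
A minimal round-trip sketch of what I mean (transit-clj; per the default handlers the instant should come back as a java.util.Date rather than a string):

```clojure
;; Write a map containing a Date, read it back, and check the type survives.
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayOutputStream ByteArrayInputStream])

(let [out (ByteArrayOutputStream.)
      w   (transit/writer out :json)]
  (transit/write w {:created-at (java.util.Date.)})
  (let [in (ByteArrayInputStream. (.toByteArray out))
        r  (transit/reader in :json)]
    (class (:created-at (transit/read r)))))
;; => java.util.Date
```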

------
kayamon
I can't help but wonder if it isn't simpler just to use gzipped JSON. I'd be
interested to see a size comparison of the two. It seems like they're going to
an awful lot of work to hand-roll a suboptimal text compression scheme here.

~~~
easytiger
It's the whole FAST/FIX clusterfuck all over again. Use a binary format and
stop cocking around. You can't serialise/deserialise it any faster, and JSON
is a string-based mess. No low-latency path can absorb it. Even InfluxDB
removed JSON support because it was the slowest part of their critical path
and could not be optimised. A complete lack of mechanical sympathy plus
laziness is why JSON is popular outside of the web world.

~~~
waxjar
I don't think I would enjoy debugging a service that talks in a binary format.

~~~
to3m
I've never done it with JS. It's not really a problem in C, though. If there's
a lot of data, you're no better off with text; if there's not much, reading
binary data isn't too bad. You've usually got other problems on top of
decoding the actual data - like, why is this data coming in the first place?
Why isn't the code accepting it properly? Why isn't the other end listening?
The data encoding is just the tip of the iceberg.

Either way, the advantages of binary often outweigh the difficulty of having a
human read the data. As the saying goes, code is executed far more often than
it's read.

(Most binary formats have some fairly obvious possible text representations,
at least for key fields (they're just encoding ints, floats, strings,
bitfields, etc.). When you're having difficulty, or you're simply curious, you
can print them out. If the format is any good, malformed messages are easy to
check for in code - this code is not harder to debug than any other. If
anything it's actually easier than with some kind of text format.)

~~~
waxjar
Sure, but it's not very convenient.

I regularly MITM a connection between a mobile app and an HTTP server when
something isn't going quite right. A look at the JSON they exchange exposes
the problem in under a minute more often than not. If I want to test something
out quickly, I simply modify the incoming / outgoing JSON by hand. It's rare
to get the syntax wrong, and it involves no context switches. I can then go
back to the code, find the relevant section and make the changes I need to
make. I find this a very convenient way of debugging, and I don't think it
would be as nice with a binary format.

------
hyperpallium
> The extension mechanism is open, allowing programs using Transit to add new
> elements specific to their needs. Users of data formats without such
> facilities must rely on either schemas, convention, or context to convey
> elements not included in the base set,

The extension mechanism amounts to writing handlers in every language you
communicate with, since its stated purpose is cross-language value conveyance.

In contrast, a schema language allows extensions to be described once, in one
language.

I was expecting this to be a sort of macro system for data notation (an inline
schema language), but it seems more like an extensible serialization library.

~~~
dragonwriter
> In contrast, a schema language allows extensions to be described once, in
> one language.

No, it doesn't. A schema language allows that to be done for the syntax, but
it requires the semantics to be implemented for each language. Schema
languages often _include_, as part of their specifications, core types which
must be supported; when an application restricts itself to these core types,
and to types whose only important semantics are derived from them (e.g.,
restricted subsets of core types in most cases), then the fact that every full
implementation already supports the core types means no additional work is
necessary. But that isn't the schema language removing the need to implement
types for every host language; it's a result of the fact that a predefined set
of core types accompanying the schema language means all implementations have
already done the work of implementing those core types for all languages.

------
Perceptes
Previous HN discussion from when it was announced last year:
[https://news.ycombinator.com/item?id=8069346](https://news.ycombinator.com/item?id=8069346)

------
agentgt
The main advantage of Transit, it seems, is that JSON is fast for many
different clients. Otherwise I'm not sure I would ever use it for internal
services, considering there are so many other, probably faster, options
(Protobuf, Avro, SBE, etc.).

Are folks using it for internal services?

------
nablaone
I think I saw it before:

[http://www.lispworks.com/documentation/HyperSpec/Body/02_dh.htm](http://www.lispworks.com/documentation/HyperSpec/Body/02_dh.htm)

------
agopaul
So it's basically a set of libraries used to marshal/unmarshal objects without
using a schema, or can it also be used as an RPC library?

------
maweki
So the Python implementation is 2.7-only...

------
latenightcoding
Very interesting (just commenting to read it later)

~~~
icebraining
You can just upvote it, then go to
[https://news.ycombinator.com/saved?id=latenightcoding](https://news.ycombinator.com/saved?id=latenightcoding)

Or use bookmarks :)

