
Transit – A format for conveying values between different languages - _halgari
http://blog.cognitect.com/blog/2014/7/22/transit
======
haberman
I really think the future is schema-based.

The evolution of technologies goes something like this:

1\. Generation 1 is statically typed / schemaful because it's principled and
offers performance benefits.

2\. Everyone recoils in horror at how complicated and over-designed generation
1 is. Generation 2 is dynamically typed / schemaless, and conventional wisdom
becomes that this is generally more programmer-friendly.

3\. The drawbacks of schemaless become more clear (annoying runtime errors,
misspelled field names, harder to statically analyze the program/system/etc).
Meanwhile the static typing people have figured out how to offer the benefits of
static typing without making it feel so complicated.

We see this with programming languages:

1\. C++

2\. Ruby/Python/PHP/etc.

3\. Swift, Dart, Go, Rust to some extent, as well as the general trend of
inferred types and optional type annotations

Or messaging formats:

1\. CORBA, ASN.1, XML Schema, SOAP

2\. JSON

3\. Protocol Buffers, Cap'n Proto, Avro, Thrift

Or databases:

1\. SQL

2\. NoSQL

3\. well, sort of a return to SQL to some extent, it wasn't that bad to begin
with given the right tooling.

If you are allergic to the idea of schemas, I would be curious to ask:

1\. isn't most of your data "de facto" schemaful anyway? Like when you send an
API call with JSON, isn't there a standard set of keys that the server is
expecting? Isn't it nicer to actually write down this set of keys and their
expected types in a way that a machine can understand, instead of it just
being documentation on a web page?

2\. Is it the schema _itself_ that you are opposed to, or the pain that clunky
schema-based technologies have imposed on you? If importing your schema types
was as simple as importing any other library function in your native language,
are you still opposed to it?
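
To make question 1 concrete, here is a minimal sketch of what "writing down
the set of keys and their expected types" could look like. The `USER_SCHEMA`
name, its keys, and the `validate` helper are all hypothetical, made up for
illustration; real schema systems (Protocol Buffers, Avro, and so on) are far
richer than this.

```python
# Hypothetical schema for an illustrative "create user" API call:
# each expected key maps to the Python type its value must have.
USER_SCHEMA = {"name": str, "age": int, "email": str}


def validate(payload, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    # Every key the schema names must be present with exactly that type.
    for key, expected in schema.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif type(payload[key]) is not expected:
            problems.append(
                f"{key}: expected {expected.__name__}, "
                f"got {type(payload[key]).__name__}"
            )
    # Keys the schema does not know about are usually typos.
    for key in payload:
        if key not in schema:
            problems.append(f"unexpected key: {key}")
    return problems
```

With this, a misspelled field such as `agee` shows up as both a missing key
and an unexpected key before the request is processed, instead of as a
runtime error somewhere deeper in the server.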

~~~
BinaryIdiot
I like the idea of schemas but only when they're built into the message (which
is probably not a good way of conveying my meaning).

Essentially JSON gives you numbers, strings and nulls so when accepted on the
other side it obviously knows what's a number, what's a string, etc.

Honestly if JSON could be expanded to essentially be the same but add
additional types along with bolting on new types (extensible) then I think it
would be perfect for the job.

At least in my opinion.

~~~
stdbrouw
You're talking about typing, not really about schemas. With JSON, you can know
whether a value is a number as opposed to a string, but you can't know whether
it's _supposed_ to be a number.

~~~
ochoa
Dumb question: why doesn't the value being a number tell us it's supposed to be
a number?

~~~
kalleboo
Example: Postal codes. Say you're transferring an address in JSON and you have
a postal code field. In the UK, postal codes are strings (e.g. "BS42BG"), easy
enough. Now, someone enters a US postal code (90505). Should we transfer it as
a number, or a string?

~~~
elros
Definitely as a string. Numbers aren't things that have digits. Numbers are
things you do math with.

~~~
kalleboo
OK, that's logical. So where do we specify this without a schema? What happens
if a client sends a number instead of a string to the server? Should it accept
it and convert it, or return an error?
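
The two options in that last question can be sketched as below. This is only
an illustration of the choice, not a recommendation; the `read_postal_code`
function and its `coerce` flag are hypothetical names.

```python
def read_postal_code(value, coerce=False):
    """Return the postal code as a string, or raise ValueError.

    With coerce=False a non-string is rejected. With coerce=True an
    integer is converted, which cannot restore a leading zero that was
    already lost when the client sent the code as a number.
    """
    if isinstance(value, str):
        return value
    # bool is a subclass of int in Python, so exclude it explicitly.
    if coerce and isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    raise ValueError(
        f"postal_code must be a string, got {type(value).__name__}"
    )
```

Either behaviour is defensible; the point is that without a schema the
decision lives only in server code like this, where clients cannot see it.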

------
lnmx
So, EDN [1] is a formalization of Clojure data-literal syntax that includes
tagged types, has a text representation, and no built-in caching.

Fressian [2] supports the same types and extensibility as EDN, has a compact
binary encoding, and the serializer/writer can choose its own caching strategy
(so-called domain-specific caching[3]). I believe it was created to provide a
serialization format for Datomic.

Transit sounds like an evolution of EDN and Fressian: make the bottom layer
pluggable to support human-readable/browser-friendly JSON or use the well-
established msgpack for compactness. Caching is still there, but it can only
be used for keywords/strings/symbols/etc. instead of arbitrary values like
Fressian -- probably a good trade-off for simplicity.

[1]: [http://edn-format.org](http://edn-format.org)

[2]: [http://fressian.org](http://fressian.org)

[3]: [https://github.com/Datomic/fressian/wiki/Caching](https://github.com/Datomic/fressian/wiki/Caching)

~~~
devin
Nailed it.

------
nimish
1\. Protobuf

2\. Avro

3\. Thrift

4\. MsgPack

5\. CORBA

6\. ASN.1

7\. Cap'n Proto

8\. FlatBuffer

\+ whatever internal stuff big software companies have cooked up etc.

What was so special about your use-case that demanded a totally new standard?

I hate to bring up that xkcd but it's actually relevant here.

Is it the higher-level semantics on top that allow abstraction over the
underlying serialization format?

The "caching" doesn't seem to be that big of a win where network latency is
high and some of the other formats can be directly mmapped. It looks
intriguing, but it seems like something that could be added to the versioned
binary formats that some of the others provide.

~~~
_halgari
Protobuf - static schema, not self-describing

Avro - not self-describing

MsgPack - limited data types (no URLs, Dates, etc.)

Etc.

Go find a format that offers everything transit does, and when you don't find
a perfect match for all the goals, you'll understand why this library was
created.

Cross-platform (without writing in C), self describing, schema-less,
extensible, support for caching, etc.

~~~
fleitz
JSON does all the things Transit does, with fewer types. (And less stupidity)

Should rename it to Enterprise JSON, because it's JSON with more complexity
for those architects who don't realize you can easily store a date as an int,
or a URL as a string. (Or cache ANY document)

Seriously... why does a document format need support for caching? It would
seem to me that a document format should be agnostic to whether it has been
cached or not.

Also, why does a document care what language writes it? I don't understand how
a document couldn't be cross platform, like maybe if you're using 36-bit words
or some fuckery, but most people these days store documents using 8 bit words.
Does anyone seriously have issues with JSON on a PDP-10?

~~~
jlebar
> JSON does all the things Transit does, with fewer types. (And less
> stupidity)

I really hate comments like this.

These guys took the time to show the world this thing they created to fill a
need they had, and this comment takes a dump on it without its author first
getting any experience using the system. As though the author understands
Transit's purpose better than Transit's authors do.

Hey, I get it: Transit /does/ (at first glance) seem largely redundant with
all the other serialization libraries out there. But before we assume that its
authors spent all this time on their project because they're "stupid", it
behooves us to try to understand their motivations.

In the end they're not hurting anyone by releasing this thing they built. If
it's bad, you don't have to use it. There's no need to be mean or get upset.

~~~
fleitz
I'm not upset, I'm fine with other people using it. I still think _the format_
is stupid. For the same reasons I think XML is stupid, and no I don't have to
use XML, nor do I.

Also, my comment isn't hurting anyone, if you don't like it you don't have to
read it, there's no reason to hate :)

Nowhere in my post did I say Rich Hickey is stupid. He has some very great
ideas; I just don't think this is one of them.

~~~
eropple
Your comment is hurting people. It hurts the developers, based on at best
subjective and at worst ignorant evidence. And it lowers the quality of
discussion because people end up having to address your culturally poor
behavior rather than the topic at hand.

Being mindful isn't hard.

------
sgrove
One of the big issues we've been struggling with is getting large
ClojureScript data structures with tons and tons of structural sharing (think
application state history) small enough 1.) to transmit to the server and
2.) to store efficiently.

It sounds like Transit may help with this via its caching etc.? Can someone
from Cognitect comment on whether this is a suitable use?

~~~
swannodette
I don't think it can help here - caching only applies to map keys, transit
keywords, transit symbols, and transit tags and it's not as of yet
configurable.
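
A rough sketch of why key-only caching does not help with structural sharing.
This is a simplified illustration and not the Transit spec's algorithm: the
`^0`-style codes, the length threshold, and the flat `[key, value, ...]` row
layout are assumptions made for this example, and it assumes no real key
starts with `^`.

```python
def cache_code(index):
    """Short reference for a cache slot (this numbering is an assumption)."""
    return "^" + str(index)


def encode_keys(records):
    """Replace repeated map keys with short cache references."""
    cache = {}  # key string -> cache code
    rows = []
    for record in records:
        row = []
        for key, value in record.items():
            if key in cache:
                row.append(cache[key])
            else:
                if len(key) > 3:  # very short keys are not worth caching
                    cache[key] = cache_code(len(cache))
                row.append(key)
            row.append(value)  # values are written out in full every time
        rows.append(row)
    return rows


def decode_keys(rows):
    """Invert encode_keys by rebuilding the same cache while reading."""
    cache = {}  # cache code -> key string
    records = []
    for row in rows:
        record = {}
        for i in range(0, len(row), 2):
            key, value = row[i], row[i + 1]
            if key in cache:
                key = cache[key]
            elif len(key) > 3:
                cache[cache_code(len(cache))] = key
            record[key] = value
        records.append(record)
    return records
```

Repeated keys shrink to a few characters, but a large value that appears in
many records is still written out in full each time, which is the case that
matters for application state history.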

~~~
sgrove
Ah, bummer. The search goes on then - thanks for the quick reply!

------
fogus
A tour of the JS implementation is at
[http://cognitect.github.io/transit-tour/](http://cognitect.github.io/transit-tour/)

------
ciniglio
The actual spec is here:
[https://github.com/cognitect/transit-format](https://github.com/cognitect/transit-format)

------
tetsuoironman
This is likely a prelude to how Datomic will support multiple languages
outside of the JVM.

------
Ixiaus
Why re-invent the wheel when MessagePack already exists, supports a similar
set of types, and has far greater implementation reach?

~~~
richhickey
The biggest difference is that MessagePack extensibility (which is not yet
widely implemented) is based upon binary blobs, whereas Transit defines
extensions in terms of other Transit types. Also, Transit can reach the
browser via JSON. And Transit has caching...

~~~
Ixiaus
Okay, thanks for edifying. As someone else said (to the creators) it would be
nice to see a "why I would use this" blurb.

------
arosequist
They also released a podcast episode about Transit, which doesn't seem to be
mentioned on the blog post:

[http://blog.cognitect.com/cognicast/060-tim-ewald](http://blog.cognitect.com/cognicast/060-tim-ewald)

------
rubiquity
Literally about an hour ago I was browsing around Rich Hickey's Twitter
account and the Cognitect website because I thought, "Hey, I haven't heard
anything from him/them in a while", and voila! Just like that, this appears.

~~~
jeletonskelly
Just when you think Rich Hickey has retired to his hammock to play classic
guitar, he throws out something new and awesome.

------
anentropic
Is this just NIHism...? I find it frustrating the announcement didn't explain
why they felt the need to create an alternative to Avro, Cap'n Proto etc.
Transit doesn't seem to do anything new... maybe it's better, maybe not

~~~
tonsky
It is accessible from the browser (and is fast, on par with JSON), and it has
a rich set of basic types + extensibility built in

------
joeevans
Can anyone explain what this would mean for the day-to-day programmer?

~~~
ciniglio
It's basically a format that lets you transmit data somewhat more extensibly
than JSON would on its own (and doesn't take a huge performance hit).

This means that you can, for example, transmit an array of dates and not need
to worry about parsing the dates in the right location when you receive them.

Additionally, it's extensible, so you can define formats for any domain
specific data that you're dealing with, if you need to.
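
As a rough sketch of the idea (plain Python over the standard `json` module,
not the API of any Transit library): dates are written as tagged strings, and
the reader turns every tagged string back into a date wherever it appears.
The `~t` prefix is an assumption chosen for this example, in the style of the
`~:` strings quoted elsewhere in this thread, and the sketch skips the
escaping a real format needs for ordinary strings that start with `~`.

```python
import json
from datetime import datetime, timezone

TAG = "~t"  # assumed prefix for this sketch


def encode(value):
    """Recursively replace datetimes with tagged ISO 8601 strings."""
    if isinstance(value, datetime):
        return TAG + value.isoformat()
    if isinstance(value, list):
        return [encode(item) for item in value]
    if isinstance(value, dict):
        return {key: encode(item) for key, item in value.items()}
    return value


def decode(value):
    """Recursively turn tagged strings back into datetimes."""
    if isinstance(value, str) and value.startswith(TAG):
        return datetime.fromisoformat(value[len(TAG):])
    if isinstance(value, list):
        return [decode(item) for item in value]
    if isinstance(value, dict):
        return {key: decode(item) for key, item in value.items()}
    return value


dates = [
    datetime(2014, 7, 22, tzinfo=timezone.utc),
    datetime(2014, 7, 23, 12, 30, tzinfo=timezone.utc),
]
wire = json.dumps(encode({"dates": dates}))  # still ordinary JSON text
restored = decode(json.loads(wire))
```

The receiver never needs to know which fields hold dates; the tag on each
value carries that information.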

~~~
nogridbag
Manually marshalling/unmarshalling JSON has been a pain point for me on recent
projects. I took a quick look at Protocol buffers just now and instantly
understood how that will solve most of my issues with JSON. I define a simple
message format (.proto) and generate native classes. My service methods can
use the generated classes as parameters.

The only thing missing for me was native support for JS (but I quickly found
3rd party libraries).

I don't quite understand how transit, since it's schema-less, addresses that
problem. From transit-java docs:

    Object data = reader.read();

I might be missing something, but it seems I have to manually create the
native classes on both endpoints and cast to those classes. Either that or I
still have to manually extract values using the reader API.
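
For what it's worth, the manual step I mean looks roughly like this, sketched
in Python rather than Java for brevity. `District` and `to_district` are
hypothetical names for my own application code; nothing here comes from a
Transit library.

```python
from dataclasses import dataclass


@dataclass
class District:
    """Hypothetical application type; not part of any Transit library."""
    name: str
    region: str


def to_district(data):
    """Convert generic decoded data (a plain dict) into a District."""
    if not isinstance(data, dict):
        raise TypeError(f"expected a map, got {type(data).__name__}")
    name, region = data.get("name"), data.get("region")
    if not isinstance(name, str) or not isinstance(region, str):
        raise ValueError("name and region must both be strings")
    return District(name=name, region=region)
```

With a schema and a code generator, this class and its checks would be
produced for me on both endpoints; with a schema-less format I write and
maintain them by hand.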

------
mahmoudimus
This comment is not necessarily related to Transit, but any serialization
specification -- in XML, we had XSLT which could transform any well-defined
input to an XML output and vice versa.

What's the equivalent for JSON/Transit etc? When parsing and validating the
correctness of an input, what is the standard protocol for propagating error
messages laced with contextual domain information?

The two solutions I've found were: use XSLT, or use a domain-specific
language.

------
bellerocky
Reminds me of Thrift[1] which is an Apache foundation project started by
Facebook and supports more languages. It also is battle tested. I've seen it
used in production under heavy load. I don't know if Thrift does the caching
or needs to. Data on the wire is already compressible via gzip which should
handle repetitive values.

[1] [https://thrift.apache.org/](https://thrift.apache.org/)

~~~
lclarkmichalek
Thrift is non self describing, which seems to be incompatible with one of the
aims of Transit (be self describing)

------
mijoharas
I couldn't ask someone to explain the similarities and differences between
this and EDN could I? I realize that this seems more aimed at transferring
data whereas EDN may have been more targeted at serialization (is that
correct? please correct me if I'm misremembering) but I thought they covered
similar use cases? (with EDN obviously not including the performance
enhancements that transit seems to have).

~~~
grayrest
It's EDN but on top of JSON/MsgPack instead of having its own serialization
format.

------
saosebastiao
It wasn't exactly clear from the post...how is this different from msgpack? Is
it just an implementation of more complex data types on top of it?

------
wicknicks
I couldn't find any documentation about it, but is there any way to achieve
forward/backward compatibility with Transit?

------
iheart2code
As neat as this sounds, I would prefer to do a little extra parsing by hand in
exchange for the readability of JSON. Looking at some of Transit's examples,
it seems like it would be difficult to gain as complete an understanding of a
set of information at a glance.

~~~
rdtsc
JSON is readable in small quantities. A 100MB JSON file is just as readable as
a 100MB binary file, except it is slower and bigger to parse.

Also unless you can see electrons bouncing on the wire, JSON is readable
because there is a program that decodes and shows it to you. It would probably
take a couple of lines of code in python to cat a msgpack file.

~~~
icebraining
Displaying files is not the only way to read JSON. I've often read it on the
browser development tools, on Wireshark, on tcpdump, etc.

------
brianolson
For another binary JSON, see "CBOR" (Concise Binary Object Representation),
IETF RFC 7049: [http://cbor.io/](http://cbor.io/). I wrote Python and Go
implementations of CBOR. It packs smaller and parses faster than JSON.

------
EGreg
I think this is very helpful to keep in mind... consumers pushing their
demands to producers... and eliminating waste and inefficiency.

[http://en.wikipedia.org/wiki/Lean_manufacturing](http://en.wikipedia.org/wiki/Lean_manufacturing)

------
antihero
I think it would be nice if more serialization formats could at least support
timezone-aware date/times and deltas, as they are used really really
frequently and it's a total pain to have to do a second parse to deserialize
them.

------
bhouston
Can this be used for saving state to disk when you don't want to use a
database?

~~~
ciniglio
They're suggesting not using this for storage at the moment since the spec is
still in flux. But I think once the spec stabilizes, that seems reasonable.

------
peterkelly
It looks promising, but is there a mapping for XML? I would recommend adding
this (as an optional profile) in a future version of the spec. It would help
interoperability with legacy (non-Transit based) systems.

------
ciroduran
Obligatory XKCD standards post - [http://xkcd.com/927/](http://xkcd.com/927/)

------
squigs25
Cool, now just add a TransitSchema package for every scripting language and we
can use it in place of protocol buffers

------
c4pt0r
Something like BSON?

------
ape4
As easy as:

    [["^ ","~:district/region","~:region/e","~:db/id",["^ ","~:idx",-1000001,"~:part","~:db.part/user"],"~:district/name","East"],["^ ","^2",["^ ...

~~~
swannodette
Or with verbose output:

    
    
      [{"~:district/region": "~:region/e",
        "~:db/id": {"~:idx": -1000001, "~:part": "~:db.part/user"},
        "~:district/name": "East"}, ...]

------
zcam
Underwhelming... From the teasers it seemed like it could be something
actually novel.

~~~
fogus
You say that as if it's a bad thing. It may not be novel, but it'll definitely
be useful.

