
Show HN: Avsc – a pure JavaScript Avro implementation - mtth
https://github.com/mtth/avsc
======
cmpb
I poked around your code; it's quite clean and readable!

I've never heard of Avro, but it looks really interesting. Could you enlighten
me on some use cases, and situations where it really shines?

~~~
mtth
Thanks!

I like to think of Avro in two (related) parts:

\+ A very compact binary serialization, which lets you efficiently store or
transfer data over the wire. (Avro actually also defines a way to encode your
data as JSON in cases when you need a human-readable representation.)

\+ A way to define "data types". For example this schema [1] defines what we
expect a "Human" entry to look like, and we can now do things like check
whether we are missing any information, make sure all fields are valid, sort,
efficiently copy objects, etc.
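
To make the two parts a bit more concrete, here is a minimal sketch in plain Node.js. It does not use avsc itself: the `Person` schema and the helpers `encodeInt`, `encodeString`, `encodeRecord` and `isValid` are made up for illustration (this is not the Human schema linked below). It only relies on the encoding rules from the Avro specification: ints are zig-zag varints, strings are a length followed by UTF-8 bytes, and a record is just its fields written in schema order.

```javascript
// Hypothetical schema, for illustration only.
const schema = {
  type: 'record',
  name: 'Person',
  fields: [
    {name: 'name', type: 'string'},
    {name: 'age', type: 'int'},
  ],
};

// Avro ints use zig-zag + variable-length encoding,
// so values of small magnitude take a single byte.
function encodeInt(n) {
  let z = ((n << 1) ^ (n >> 31)) >>> 0; // zig-zag, as unsigned 32-bit
  const bytes = [];
  while (z > 0x7f) {
    bytes.push((z & 0x7f) | 0x80); // low 7 bits, continuation bit set
    z >>>= 7;
  }
  bytes.push(z);
  return Buffer.from(bytes);
}

// Avro strings are a length (encoded like an int) followed by UTF-8 bytes.
function encodeString(s) {
  const utf8 = Buffer.from(s, 'utf8');
  return Buffer.concat([encodeInt(utf8.length), utf8]);
}

const encoders = {int: encodeInt, string: encodeString};

// The schema doubles as a validator: every field must be present and well-typed.
function isValid(schema, obj) {
  return schema.fields.every((f) => {
    const v = obj[f.name];
    if (f.type === 'int') {
      return Number.isInteger(v) && v >= -2147483648 && v <= 2147483647;
    }
    return f.type === 'string' && typeof v === 'string';
  });
}

// A record is its fields concatenated in schema order: no field names, no delimiters.
function encodeRecord(schema, obj) {
  return Buffer.concat(schema.fields.map((f) => encoders[f.type](obj[f.name])));
}

const person = {name: 'Ann', age: 25};
const binary = encodeRecord(schema, person);
console.log(binary.length, JSON.stringify(person).length); // prints: 5 23
```

The field names live in the schema rather than in each message, which is where most of the size difference with JSON comes from in this toy case.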

This is glossing over many things (files, RPC, ...) but I hope this helps a
bit! You can find a lot more information in the official Avro documentation
[2] if you're curious.

[1]
[https://github.com/mtth/avsc/blob/master/etc/benchmarks/sche...](https://github.com/mtth/avsc/blob/master/etc/benchmarks/schemas/Human.avsc)

[2]
[https://avro.apache.org/docs/current/index.html](https://avro.apache.org/docs/current/index.html)

------
tristanz
This is terrific if the benchmarks are accurate. The vast majority of these
types of libraries end up slower than JSON when implemented in JS. What were
the key performance tricks?

The benchmarks show a fair number of -1s (not supported). What isn't supported
exactly? I don't see these limitations listed in the limitations section.

~~~
mtth
I'll start with the second question since it's easier to answer: some of the
other libraries don't support all Avro features (for example I don't think
`node-avro-io` supports encoding records defined inside arrays), so they're
unable to run some of the benchmarks. Both the reference Java implementation
and `avsc` are able to parse all these schemas, so they don't have any -1s in
their columns.

Performance-wise, the main benefits came from doing as much preprocessing as
possible. Here are a couple examples of what is done when a schema is parsed:

\+ All methods for encoding and decoding nested types get fully resolved,
removing the need for lookups (for example an array of integers would have a
reference to the method for encoding integers).

\+ A constructor and encoding/decoding methods get programmatically
generated for each record type. This avoids having to iterate over the fields
each time. Same for unions. (Having a "Class" for each record type also lets
us attach useful methods to its prototype, available for all record
instances.)

Then there are a few other general ideas like making sure v8 is able to
optimize all the methods on the critical path, or reusing buffers as much as
possible to avoid allocating too many objects.
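
A rough sketch of the preprocessing described above, assuming a much smaller type system than Avro's. This is not avsc's actual code: `compile`, `writeInt` and the `tap` object (a buffer plus a write position) are hypothetical names, and only `int`, arrays and records are handled. The point is that all the schema inspection happens once in `compile`, and the returned writer does no lookups or field iteration.

```javascript
// Write a zig-zag varint at the tap's current position.
function writeInt(tap, n) {
  let z = ((n << 1) ^ (n >> 31)) >>> 0;
  while (z > 0x7f) {
    tap.buf[tap.pos++] = (z & 0x7f) | 0x80;
    z >>>= 7;
  }
  tap.buf[tap.pos++] = z;
}

// Called once per schema; returns a function write(tap, value).
function compile(schema) {
  if (schema === 'int') return writeInt;
  if (schema.type === 'array') {
    const writeItem = compile(schema.items); // item writer resolved once, here
    return function writeArray(tap, arr) {
      if (arr.length) {
        writeInt(tap, arr.length); // a single block holding every item
        for (let i = 0; i < arr.length; i++) writeItem(tap, arr[i]);
      }
      writeInt(tap, 0); // a zero count marks the end of the array
    };
  }
  if (schema.type === 'record') {
    const writers = schema.fields.map((f) => compile(f.type));
    // Generate an unrolled body: w0(tap, val["id"]); w1(tap, val["coords"]); ...
    const body = schema.fields
      .map((f, i) => `w${i}(tap, val[${JSON.stringify(f.name)}]);`)
      .join('\n');
    const params = writers.map((_, i) => `w${i}`);
    const factory = new Function(
      ...params,
      `return function writeRecord(tap, val) {\n${body}\n};`
    );
    return factory(...writers);
  }
  throw new Error('schema not supported by this sketch');
}

// Hypothetical schema for the example.
const write = compile({
  type: 'record',
  name: 'Point',
  fields: [
    {name: 'id', type: 'int'},
    {name: 'coords', type: {type: 'array', items: 'int'}},
  ],
});

const tap = {buf: Buffer.alloc(64), pos: 0};
write(tap, {id: 1, coords: [3, -2]});
const encoded = tap.buf.subarray(0, tap.pos);
console.log(encoded.toString('hex')); // prints: 0204060300
```

Keeping one `tap` around and resetting `tap.pos = 0` between messages is one simple way to reuse a buffer instead of allocating a new one per write.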

------
tlb
Would you expect Chrome browser performance to be similar to node performance?
My use case involves shipping large arrays of doubles to the browser, and JSON
read/write is a bottleneck.

~~~
mtth
I haven't tested this, but my guess is that performance would be similar since
they both run on v8. The only difference is the buffer implementation, which
seems to be roughly as fast (benchmarks here [1]).

[1]
[https://github.com/feross/buffer#performance](https://github.com/feross/buffer#performance)
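
For the array-of-doubles case specifically, here is a minimal sketch of what the binary form looks like, written with `Uint8Array` and `DataView` so it runs in a browser as well as in node. It does not use avsc: `encodeDoubles` and `decodeDoubles` are hypothetical helpers that follow the Avro rules for an array of doubles (a block count, then 8 little-endian bytes per value, then a zero count). The decoder only handles what this encoder writes, and whether it actually beats JSON for a given workload would need measuring.

```javascript
function encodeDoubles(values) {
  if (values.length === 0) return new Uint8Array([0]); // only the end marker
  // Block count as a zig-zag varint; for a non-negative count that is 2 * n.
  // This sketch assumes fewer than 2^30 values.
  const header = [];
  let z = values.length * 2;
  while (z > 0x7f) {
    header.push((z & 0x7f) | 0x80);
    z >>>= 7;
  }
  header.push(z);
  const out = new Uint8Array(header.length + 8 * values.length + 1);
  out.set(header, 0);
  const view = new DataView(out.buffer);
  let pos = header.length;
  for (const v of values) {
    view.setFloat64(pos, v, true); // true = little-endian
    pos += 8;
  }
  // The last byte is left at 0, which is the end-of-array marker.
  return out;
}

function decodeDoubles(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const result = [];
  let pos = 0;
  for (;;) {
    // Read the varint block count.
    let z = 0;
    let scale = 1;
    let b;
    do {
      b = bytes[pos++];
      z += (b & 0x7f) * scale;
      scale *= 128;
    } while (b & 0x80);
    const count = z / 2; // undo zig-zag for a non-negative count
    if (count === 0) break; // a zero count ends the array
    for (let i = 0; i < count; i++) {
      result.push(view.getFloat64(pos, true));
      pos += 8;
    }
  }
  return result;
}

const samples = [0.1, 2.5, -1e-7, Math.PI];
const bytes = encodeDoubles(samples);
const roundTripped = decodeDoubles(bytes);
console.log(bytes.length); // prints: 34
```

Each value is a fixed 8 bytes with no number formatting or parsing involved, and the round trip is exact.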

------
mtth
Creator here. If you have any questions, please ask!

~~~
caust1c
Why did you name it avsc? Are you concerned that it might cause confusion with
the avsc file extension that avro uses for schema files?

[https://avro.apache.org/docs/1.7.7/gettingstartedjava.html](https://avro.apache.org/docs/1.7.7/gettingstartedjava.html)

~~~
mtth
Mostly because `avro` was already taken on NPM and I wanted a short name to
`require`. Since schemas (and schema files) seemed to me to be at the heart of
Avro, I thought it was a reasonable option.

------
zump
Why is Avro better than Thrift?

~~~
mring33621
Not that it's better, but Avro is a widely supported format in open source
'big data' tools.

