Hacker News
Amazon Ion Specification (amazon-ion.github.io)
70 points by rewmie on Sept 13, 2023 | 36 comments



Previous discussions on Ion:

https://news.ycombinator.com/item?id=29284428 (2 years ago, 229 comments)

https://news.ycombinator.com/item?id=23921610 (3 years ago, 110 comments)

https://news.ycombinator.com/item?id=11546098 (7 years ago, 163 comments)


On the one hand, this seems more real than a lot of promotion-ware I've seen: https://github.com/amazon-ion/ion-intellij-plugin#readme

On the other hand, they're not using this for the boto schemas, which seems like a natural place to show that it can capture real-world schemas. That makes it hard for me to believe this has any traction.


The SDKs use Smithy[1] which is tailored for defining+generating services and SDKs, Ion is more of a pure data serialization format. It's definitely niche but my org uses it in a few places and it has some nice properties that fit our case pretty well (rapidly evolving schema, most clients only care about a small subset of attributes, ability to apply multiple and different schemas based on regions or businesses).

It's the sort of thing where I'd advise exploring other options first and only using it if the whys[2] really resonate with you because it definitely comes with some overhead.

[1] https://smithy.io/2.0/index.html [2] https://amazon-ion.github.io/ion-docs/guides/why.html


Ion is heavily used on the retail side of Amazon, but it's only recently started to appear in AWS products.

AWS is starting to support PartiQL (https://partiql.org/) queries in some places, and PartiQL uses Ion's type system internally.


I would be interested to see how this compares to something like msgpack [1] in performance and final size of the binary. Msgpack has been my go-to for binary serialization for years due to how simple and fast it is, and how easy it is to make it work with native Clojure data structures.

[1] https://msgpack.org/index.html


That comparison would depend heavily on what you're storing.

Ion has the option of using symbol tables to replace strings (e.g. in struct/map keys or in values). So, if your benchmark had a large number of records with similar structures, I would expect Ion to pull ahead. On the other hand, if each record had nothing in common, I'd expect them to perform similarly.
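A rough illustration of the symbol-table idea, assuming a simplified scheme of my own (this is not Ion's actual binary encoding): repeated struct keys are replaced by small integer IDs, and the key text is sent only once in a shared table. The names `intern_keys` and `decode` are hypothetical.

```python
import json

def intern_keys(records):
    """Replace struct keys with integer symbol IDs (simplified, not Ion's real format)."""
    symbols = {}  # key text -> symbol ID, assigned in first-seen order
    encoded = []
    for record in records:
        # Each record becomes a list of [symbol_id, value] pairs.
        encoded.append([[symbols.setdefault(key, len(symbols)), value]
                        for key, value in record.items()])
    # The table is stored once: position i holds the text of symbol ID i.
    table = sorted(symbols, key=symbols.get)
    return table, encoded

def decode(table, encoded):
    """Rebuild the original dicts by looking each symbol ID up in the table."""
    return [{table[sid]: value for sid, value in row} for row in encoded]

records = [{"customer_identifier": i, "order_total_cents": i * 100} for i in range(100)]
table, encoded = intern_keys(records)
plain_size = len(json.dumps(records))
interned_size = len(json.dumps([table, encoded]))
```

With many records sharing the same keys, the cost of the table is amortized. With a single record, or records that have no keys in common, the table is pure overhead, which matches the intuition above.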

One feature of the Ion libraries that I've liked is the parser will take any of the formats and figure out what to do with it (text, binary, compressed binary). It's one less thing to worry about. You can switch encodings later without breaking consumers, you can write plain text Ion when you're testing, etc.
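A sketch of how that kind of format sniffing can work, assuming detection by leading bytes: Ion 1.0 binary streams start with the binary version marker `E0 01 00 EA`, and gzip data starts with `1F 8B`. The Ion libraries do this internally; `sniff_format` and `unwrap` below are hypothetical helpers, not their API.

```python
import gzip

ION_BVM = b"\xe0\x01\x00\xea"  # Ion 1.0 binary version marker
GZIP_MAGIC = b"\x1f\x8b"       # gzip header magic bytes

def sniff_format(data: bytes) -> str:
    """Guess how a payload is encoded from its leading bytes."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(ION_BVM):
        return "ion-binary"
    # Anything else is assumed to be Ion text (which includes plain JSON).
    return "text"

def unwrap(data: bytes) -> tuple[str, bytes]:
    """Strip any gzip layers, then report the underlying format and bytes."""
    while sniff_format(data) == "gzip":
        data = gzip.decompress(data)
    return sniff_format(data), data
```

Because the decision is made per payload, a producer can switch from text to binary (or start compressing) without consumers changing any code.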


Symbol tables, compression, etc seem one level of abstraction above what msgpack provides. Such features could be implemented on top of vanilla msgpack as long as all parties agree on the msgpack schema.


MessagePack goes to "extremes" to shrink message size; I'd suspect it to at least win out on that.


Saw this the other day, but the multiple types of null kind of turned me off - e.g. `null.int`, `null.float`, `null.null`. Is there a good justification for this? Seems like a kluge in any case.


Seems like the justification would be to keep the type information when going back and forth to Ion. More like "multiple nullable types" instead of "multiple types of null"

userBirthDay: null <-- ok, but what type is it? String? Int? Timestamp?

userBirthDay: null.timestamp <-- ok, it's a timestamp typed variable, but we don't know the value. Yay, happy programmer.
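A minimal sketch of the "nullable types" reading, assuming a hypothetical `TypedNull` class (not the ion-python API): a missing value that still carries its declared type, printed the way Ion text spells typed nulls.

```python
from dataclasses import dataclass

# Type names that can follow "null." in Ion text.
ION_NULL_TYPES = {
    "null", "bool", "int", "float", "decimal", "timestamp", "string",
    "symbol", "blob", "clob", "struct", "list", "sexp",
}

@dataclass(frozen=True)
class TypedNull:
    """A missing value that still remembers its declared Ion type."""
    ion_type: str

    def __post_init__(self):
        if self.ion_type not in ION_NULL_TYPES:
            raise ValueError(f"unknown Ion type: {self.ion_type}")

def to_ion_text(value) -> str:
    """Render a null as Ion text; this sketch only handles nulls."""
    if isinstance(value, TypedNull):
        return f"null.{value.ion_type}"
    if value is None:
        return "null"  # untyped null: the reader can't tell what type was meant
    raise TypeError("sketch only handles null values")

record = {"userBirthDay": TypedNull("timestamp")}
```

A consumer reading `userBirthDay` can still tell it is a timestamp field even when the value is absent, which a bare `null` can't convey.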


My guess is buffer size calculations.


typed nulls good


Sounds like the grug brained developer speaking... Hi! ;)


Am I reading this right that it's a binary format for real-time streaming data, similar to Avro, but can include arbitrarily deep nested structures unlike Avro?


Ion has a binary format but is not specifically about real-time streaming. It is a JSON replacement.

Ion originated 10+ years ago from the Amazon catalog team - the team that kept data about the hundreds of millions of items available on Amazon. Nearly every team in the company called the catalog to get information about items all the time - scanning the entire catalog, parts of the catalog, millions of individual item lookups every second, etc.

They did the math and some very large percentage of network traffic in Amazon Retail's data centers was catalog data. If that data, currently in XML or JSON format, was sent in a more compact format it would save some ridiculous millions of dollars every year. So Ion was born and eventually open sourced.


Why do you single out avro & not any of the hundreds of other ser/de systems? Is that what you know best? Is there something specific about avro that makes it feel particularly similar? https://github.com/maximveksler/awesome-serialization


People keep inventing new ones because the old ones suck or they think the old ones suck. Look at all the discontent around JSON (no comments!), and people react violently when someone tries to add a little extra like JSON-LD. Then there are all the things like YAML, TOML and such that try to be a little better but are widely thought to be a little worse. (And that's just the human-readable data formats.) Then there is always

https://en.wikipedia.org/wiki/ASN.1

which is forgotten but not gone.


Oh man ASN.1, makes me shiver, oh the memories…

I hadn’t thought about it in like a decade but yeah it’s still silently in the background…


In fairness, in the list you link to,

- Avro and Ion are the only two that are labeled Textual/Binary

- They are in the same Big Data grouping

- They both are schema-embedded, and support some rich nested datastructures, though they deviate on many of the specifics

So I think it's reasonable to pick out Avro as an especially similar point of comparison.


I work in a shop where Kafka and Avro are everywhere. If I worked somewhere else, I might reference something else that was front-of-mind there.


lol, the internal docs on this at Amazon were something very close to "we invented this before Avro and we think that's probably a better choice if you need binary serialization."

My 2 cents: don't use it.


My 2 cents: don't use avro or anything like it unless you can prove it's going to save you money


What would you suggest? Just JSON everywhere?


json, csv, text, html, binary blobs you don't create, whatever is easiest


I have never used Ion so I cannot speak to its use in practice, but I haven't really had too much of an issue with msgpack. It's faster than JSON and more compact than JSON, without being any more difficult than any JSON library I've used. It's an almost-universal good for me; the only thing you lose is the ability to easily introspect the messages if there's an issue.


cbor even.

Honestly, if you’re in a case where you absolutely know none of these work for you and you can absolutely prove you need another, you’re probably just going to write your own. And that’s a fleetingly rare case.


My $0.02 is "yes, JSON everywhere". Specifically, one object per line, newline delimited, sorted keys, compressed with zstd or gzip.
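A minimal sketch of that setup using only the Python standard library. zstd would need a third-party binding, so this assumes gzip; the compact separators and the helper names `write_jsonl_gz` / `read_jsonl_gz` are my own choices.

```python
import gzip
import json

def write_jsonl_gz(path, records):
    """Write one JSON object per line, keys sorted, gzip-compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            # sort_keys makes the output deterministic, so identical records
            # always produce identical lines (easier to diff, dedupe, compress).
            f.write(json.dumps(record, sort_keys=True, separators=(",", ":")))
            f.write("\n")

def read_jsonl_gz(path):
    """Yield one decoded object per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Newline delimiting means a reader can stream records one at a time, and `zcat file | head` still works for quick inspection.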


Why?


I’m interested in the answer as well. Also interested what’s wrong with Ion


Am I right in assuming that this is like Protobuf but just for JSON objects?


It’s a superset of JSON with an isomorphic binary encoding, additional data types (blobs, s-exps, timestamps, symbols, etc.), better number handling, annotations, and the ability to pre-share symbol metadata for more efficient binary encoding (similar to how protobufs encodes fields, but optional).

You can write Ion by hand (like JSON) and share it without a schema (unlike protobufs). There are fewer ways to express values than YAML, but more data types.

Having S-exps is convenient for writing DSLs in a data language that’s easily readable from other languages.
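On the "better number handling" point: in Ion text, `0.1` is an exact decimal (a float needs an exponent, e.g. `0.1e0`), while a typical JSON parser turns it into a binary float. This sketch uses Python's standard `json` module, not an Ion library, to show the difference that matters.

```python
import json
from decimal import Decimal

payload = '{"price": 0.1, "qty": 3}'

# Default JSON parsing: 0.1 becomes a binary float, which can't represent it exactly.
as_float = json.loads(payload)

# Parsing the number text as Decimal keeps the exact value, closer to Ion's decimal type.
as_decimal = json.loads(payload, parse_float=Decimal)

float_total = as_float["price"] * 3        # accumulates binary rounding error
decimal_total = as_decimal["price"] * 3    # exactly 0.3
```

With plain JSON, whether you get the exact value depends on each consumer's parser settings; a format with a distinct decimal type carries that intent in the data itself.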


What are the pros and cons of this versus CBOR, which we've had great success with in our system?

https://cbor.io


Pros of Ion vs CBOR:

- Wider range of data types - Ion has first-class decimals, symbols, and clobs. CBOR has no symbols or clobs and only offers decimals as a tagged extension (blobs map to CBOR byte strings).

- Optional schemas and annotations - Ion annotations can attach type information to values, and Ion Schema can be used for validation. CBOR has no built-in schema mechanism; schemas are described separately (e.g. with CDDL, RFC 8610).

- Text format - Ion provides a human-readable text format for data interchange; CBOR is binary only.

- Maturity - Ion has been used in production at Amazon since 2009; CBOR is a newer standard (RFC 7049 in 2013).

- Language support - More mature library ecosystem around Ion vs CBOR, which is still gaining adoption.

Pros of CBOR vs Ion:

- Standardized - CBOR is an IETF standard (now RFC 8949); Ion is an Amazon-defined format. Its spec and libraries are published openly, but it isn't standardized by any standards body.

- Simplicity - CBOR has a smaller set of basic data types, making it simpler to implement.

- Used in other standards - CBOR is used in data formats like COSE for crypto operations and CWT for web tokens.

- Efficiency - The CBOR binary format can have a smaller encoding size than Ion's.

- JSON interoperability - CBOR is designed to map cleanly onto JSON's data model. Ion text is a superset of JSON, but many Ion values (annotations, symbols, s-expressions) lose information when converted back to JSON.

In summary, Ion has richer data typing and schema capabilities and a long production history. But CBOR is simpler, standardized, and gaining momentum - especially in crypto and web standards using it as a binary encoding basis.

So Ion may be better for applications dealing with complex, annotated data. But CBOR has advantages for an efficient binary interchange format, particularly when standards compatibility is important.
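To make the "simpler to implement" point concrete, here is a sketch of a CBOR encoder for a small subset (null, booleans, 64-bit integers, byte/text strings, arrays, maps), following the major-type layout in RFC 8949. It deliberately skips floats, tags, and indefinite-length items; `encode` and `_head` are hypothetical names, and a maintained library is the right choice for real use.

```python
import struct

def _head(major: int, n: int) -> bytes:
    """Initial byte (3-bit major type + 5-bit info) plus the argument n."""
    if n < 24:
        return bytes([(major << 5) | n])                      # argument fits in the initial byte
    if n < 0x100:
        return bytes([(major << 5) | 24, n])                  # 1-byte argument follows
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)  # fails above 2**64 - 1

def encode(value) -> bytes:
    """Encode a subset of Python values as CBOR."""
    if value is None:
        return b"\xf6"                                    # simple value: null
    if isinstance(value, bool):                           # check before int: bool subclasses int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        if value >= 0:
            return _head(0, value)                        # major 0: unsigned int
        return _head(1, -1 - value)                       # major 1: negative int, stores -1 - n
    if isinstance(value, bytes):
        return _head(2, len(value)) + value               # major 2: byte string
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data                 # major 3: UTF-8 text string
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)  # major 4: array
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + body                # major 5: map of key/value pairs
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Every item is one header plus a payload, which is a big part of why CBOR encoders are small; Ion's binary format adds symbol tables, annotations, and more types on top of a similar type-length scheme.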


Can this be used similarly to GraphQL?


This is just the data serialization format, you have to build any other functionality yourself. We do have a pattern on a few of our APIs where there's a big fixed schema (i.e. it's just a struct and you can't do GraphQL things like following references and hydrating them into objects) and clients select the subset of attributes they want and we only return that. It's useful for reducing response sizes but the main benefit is we can pretty easily track which attributes are actually used over time. That helps us deprecate attributes with a lot less pain.
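A sketch of the select-a-subset pattern described above, using plain Python dicts rather than Ion. The schema, the attribute names, and the helpers `get_item` / `unused_attributes` are all hypothetical: the server returns only requested attributes and counts requests, so attributes nobody asks for become deprecation candidates.

```python
from collections import Counter

# Hypothetical big fixed schema: every item is one flat struct.
ITEM = {
    "item_id": "B000123",
    "title": "Example widget",
    "brand": "Acme",
    "weight_grams": 250,
    "legacy_color_code": "C-17",
}

attribute_usage = Counter()  # attribute name -> number of times requested

def get_item(requested: list[str]) -> dict:
    """Return only the requested attributes and record which ones were asked for."""
    unknown = [name for name in requested if name not in ITEM]
    if unknown:
        raise KeyError(f"unknown attributes: {unknown}")
    attribute_usage.update(requested)
    return {name: ITEM[name] for name in requested}

def unused_attributes() -> list[str]:
    """Attributes no client has requested so far (Counter returns 0 for unseen keys)."""
    return sorted(name for name in ITEM if attribute_usage[name] == 0)
```

Response size shrinks as a side effect, but the usage counter is what turns "can we remove this field?" into a query instead of a guess.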


(2016)



