
Amazon Ion - BerislavLopac
https://amzn.github.io/ion-docs/
======
galkk
After working (internally) with Ion and related tooling, I'd say that I was
the opposite of a fan. Protobuf's strength is in the good tooling/codegen
around it (especially inside of Google), and with Ion you just have a
"superset of JSON".

Ion never had nice code wrappers around serialized structures, and most of the
time, especially with rich structures, it was a frustrating experience.

~~~
nine_k
But do you see any reason why such tooling could not be built?

I suppose it's mostly an under-investment of time, not a shortcoming of the
format itself.

~~~
maxmcd
Might it be though? Protobuf's tooling seems like a byproduct of the fact that
you can't read protobuf and it's strict and type safe enough that you can
generate lots of things.

Ion is readable and (seemingly) not very strict about schema. Seems like that
would not readily incentivise additional tooling.

~~~
vii
Protobuf actually has a canonical text format. It's easy to produce in C++ or
Java [https://developers.google.com/protocol-
buffers/docs/referenc...](https://developers.google.com/protocol-
buffers/docs/reference/cpp/google.protobuf.text_format)

The format has much less extra syntactical noise than JSON.

For example,

    
    
        name: "vii"  # comments allowed!
        id: 23923373
    

Pretty nifty as it allows readable configuration files with structured data.

~~~
tandr
Being "easy to produce or consume in language X" does not make it canonical;
it means that language X has an extension that allows you to do so. Is there a
place in the protobuf spec or documentation that mentions this as part of the
protocol?

~~~
vii
Language X is at least C++ (as linked), Java
[https://developers.google.com/protocol-
buffers/docs/referenc...](https://developers.google.com/protocol-
buffers/docs/reference/java/com/google/protobuf/TextFormat) , and Python
[https://github.com/protocolbuffers/protobuf/blob/master/pyth...](https://github.com/protocolbuffers/protobuf/blob/master/python/google/protobuf/text_format.py)

I think the Protobuf spec focuses on the binary serialization - the text
format and JSON representations are not related to that at all, of course.

------
gavinray
I think that talking about Ion, without talking about PartiQL, is not setting
people up with proper context.

PartiQL is AWS's specification for a parser/query language that is compatible
with standard SQL, but can query semi-structured or unstructured data (think
JSON, Parquet, CSV/TSV etc)

[https://aws.amazon.com/blogs/opensource/announcing-
partiql-o...](https://aws.amazon.com/blogs/opensource/announcing-partiql-one-
query-language-for-all-your-data/)

PartiQL uses Ion as its backbone and data format:

[https://partiql.org/faqs.html#why-do-you-choose-ion-to-
exten...](https://partiql.org/faqs.html#why-do-you-choose-ion-to-extend-SQLs-
type-system-over)

[https://github.com/partiql/partiql-lang-
kotlin/blob/master/e...](https://github.com/partiql/partiql-lang-
kotlin/blob/master/examples/src/kotlin/org/partiql/examples/ParserExample.kt#L17-L18)

I looked pretty deeply into this, but fell a bit short of understanding what
they meant by "if your query engine supports PartiQL." Does that mean
writing a new DB that delegates incoming queries to PartiQL? Not sure.

Anyways, they use it in Quantum Ledger DB, and a few other internal projects:

[https://docs.aws.amazon.com/qldb/latest/developerguide/ql-
re...](https://docs.aws.amazon.com/qldb/latest/developerguide/ql-
reference.query.html)

So maybe that can give some more context around "what the hell is this, why
does it exist, how would you use it?"

~~~
ta1234567890
Given Postgres already supports document types, do you know what the main
advantage of using AWS's offering instead is?

~~~
nine_k
Does Postgres support column-oriented formats, like Parquet?

~~~
dragonwriter
> Does Postgres support column-oriented formats, like Parquet?

With an appropriate FDW, sure, and I'm pretty sure I've seen an FDW for
parquet specifically, as well as other columnar formats.

------
Scaevolus
Previously (2016, 165 comments):
[https://news.ycombinator.com/item?id=11546098](https://news.ycombinator.com/item?id=11546098)

~~~
k__
So it didn't really catch on?

------
willvarfar
How does this compare to protobuf, thrift, msgpack etc?

It’s roughly the same vintage as protobuf and thrift, from Google and Facebook
respectively, so perhaps it’s just Amazon’s equivalent, which they just never
released as quickly as the others did?

Obvious pros and cons, or yet another serialization format with no obvious
benefits over anything else?

~~~
b203
There are some pain points that are being addressed:

1) timestamp : I have had issues with round-tripping timestamp
representations quite a bit

2) decimal : currency is denoted in decimal rather than float, which shows the
Amazon retail heritage. This is very useful.

3) symbols : I've had cases where a symbol table/dictionary would have made a
big difference in serialized size
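
To make the symbol-table point concrete, here is a minimal Python sketch. It is not Ion's actual binary encoding; `encode_with_symbols` and `decode_with_symbols` are hypothetical helpers, and the field names are invented for illustration. The only idea shown is that each repeated field name is stored once and referenced by a small integer ID.

```python
import json


def encode_with_symbols(records):
    """Replace repeated field names with integer IDs (hypothetical helper).

    Returns (symbols, rows): symbols lists each field name once, and each row
    maps a symbol's index to the field's value.
    """
    symbols = []
    index = {}
    rows = []
    for record in records:
        row = {}
        for name, value in record.items():
            if name not in index:
                index[name] = len(symbols)
                symbols.append(name)
            row[index[name]] = value
        rows.append(row)
    return symbols, rows


def decode_with_symbols(symbols, rows):
    """Inverse of encode_with_symbols: look each ID up in the symbol list."""
    return [{symbols[i]: value for i, value in row.items()} for row in rows]


# Invented sample data with long, repeated field names.
records = [{"customer_identifier": n, "order_total_amount": n * 10} for n in range(100)]
symbols, rows = encode_with_symbols(records)

plain_size = len(json.dumps(records))
# JSON object keys must be strings, so the IDs are serialized as "0", "1", ...
compact_size = len(json.dumps({"symbols": symbols, "rows": rows}))
```

With many records sharing the same long field names, `compact_size` comes out well below `plain_size`, which is the effect described above.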

~~~
abiogenesis
I think using decimals (or arbitrary size integers) for currency is common
knowledge by now.
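
A short Python sketch of why this matters, using only the standard library. The amounts are made up; the point is that binary floats cannot represent most decimal fractions exactly, while `Decimal` and integer cents can.

```python
from decimal import Decimal

# Binary floating point: 0.10 + 0.20 is not exactly 0.30.
float_matches = (0.10 + 0.20) == 0.30

# Decimal arithmetic keeps base-10 fractions exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_matches = decimal_total == Decimal("0.30")

# Integer cents are the other common option.
cents_total = 10 + 20
```

Here `float_matches` is false while `decimal_matches` is true, and `cents_total` is exactly 30.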

~~~
gwright
I don't know. It was common knowledge for me in college (as in it was taught
as part of the curriculum) but as far as I can tell in the intervening 30+
years that knowledge seems to have been lost and relearned many times over.

~~~
boulos
Terrifyingly, I discovered recently that Plaid’s API uses _floats_ instead of
decimal. For example, security prices:

[https://plaid.com/docs/#security-schema](https://plaid.com/docs/#security-
schema)

~~~
hchz
Cash values should be represented in fixed precision to maintain the integrity
of the transaction and your book, while the prices for securities represent
something different.

In securities transactions, the quantity and quote are critical. You aren’t
buying securities from Plaid, right?

If you try to liquidate or resize based on the Plaid quote, your brokerage or
counterparty is going to provide a totally different quote, and one from a
system engineered to provide quotes aligned exactly to the market standards.

I don’t see the risk/terror.

------
danso
As others have already pointed out, this was released in 2016 and already
discussed on HN [0], and seemingly hasn't taken the world by storm since. But
just glancing at the amzn GitHub activity, it looks like the docs and the
tooling [1] are recently and frequently updated (including a new CLI in Rust
[2])?

Can anyone currently at Amazon shed some light on how prevalent Ion is
internally?

[0]
[https://news.ycombinator.com/item?id=23922278](https://news.ycombinator.com/item?id=23922278)

[1] [https://github.com/amzn/ion-docs/commits/](https://github.com/amzn/ion-
docs/commits/)
[https://github.com/amzn?q=ion&type=&language=](https://github.com/amzn?q=ion&type=&language=)

[2] [https://github.com/amzn/ion-cli](https://github.com/amzn/ion-cli)

~~~
8v-0q_6-drhjp9x
I left Amazon a bit over a year ago, after being there seven years. It always
struck me as a combination of "not invented here" syndrome and a solution in
search of a problem. It has no real world benefits over JSON, the tooling is
limited, but you inevitably have to deal with some other team that regrets
choosing it and now it's their API. I'm so happy I never have to look at it
ever again, and seeing this post today is a real throwback to wasted
engineering effort. Just let it go, Amazon.

~~~
xxpor
It has a decimal type. That alone is reason enough for Amazon to use it over
JSON.

~~~
goldenkey
You can easily encode a decimal type as binary data. Not a huge deal.

------
101008
Looks nice. I saw that there is no PHP implementation yet. Would doing it and
publishing it on GitHub get me something besides "kudos" from Amazon? I
am not asking for a position at Amazon, but maybe an interview?

~~~
urda
PHP is a banned language internally at Amazon and Amazon subsidiaries, so they
will not care.

... why am I getting downvoted for offering direct experience as an AMZN
engineer? Amazon InfoSec forbids PHP. See also:
[https://news.ycombinator.com/item?id=23030330](https://news.ycombinator.com/item?id=23030330)

~~~
bowmessage
untrue for client libraries, they would care.

[https://aws.amazon.com/sdk-for-php/](https://aws.amazon.com/sdk-for-php/)

~~~
urda
SDK != Corp Policy

You can't use PHP internally at Amazon. Downvotes and ignoring facts do not
suddenly make my factual comment "untrue".

See also:
[https://news.ycombinator.com/item?id=23030330](https://news.ycombinator.com/item?id=23030330)

~~~
trevor-e
Now I'm confused, are you saying Amazon uses Hack internally which compiles to
PHP? The Hack website doesn't have much info and I'm not familiar with it.
There's clearly an Amazon GitHub repo for an AWS SDK written in PHP, but
you're adamant that Amazon does not use PHP at all. So which is it?

~~~
bowmessage
They vend an AWS SDK in PHP for their customers, but they never run any
internal software on PHP.

The AWS SDK in PHP helps generate web requests to Amazon's services, most
written in Java.

~~~
senderista
They ran the internal wiki on PHP while I was there. Don’t know if it’s been
upgraded since.

------
novok
Interesting they don't have a Kotlin or Swift version. Do their iOS clients
just communicate with plain JSON? Are they all secretly written in JavaScript?

~~~
moltar
iOS shopping app is a web app and it uses plain JSON.

------
bob1029
From the document:

> The following timestamp encoded as a JSON string requires 26 bytes

> ...

> This timestamp requires just 11 bytes when encoded in Ion binary

So, we just use JSON, and our solution to this problem has been to pass 64 bit
unix timestamps around. It doesn't provide arbitrary precision, but for most
use cases it is more than enough practical range & precision to get the job
done. And of course we store & transmit everything as UTC, so there is no
weirdness around needing to store additional timezone information. To give you
an idea, our database columns are named things like CreatedUnixTimestamp.

It is also trivial to compare 64-bit timestamps without conversion, so any SQL
storage of these as integers should yield massive speedups to queries against
these types, assuming you are coming from some more complex datatype like a
string or byte array.
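
A minimal Python sketch of that approach, assuming whole-second precision. `to_unix_seconds` and the `UpdatedUnixTimestamp` field are hypothetical names added for illustration; only `CreatedUnixTimestamp` comes from the description above.

```python
import json
from datetime import datetime, timezone


def to_unix_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


created = datetime(2020, 7, 22, 12, 0, 0, tzinfo=timezone.utc)
updated = datetime(2020, 7, 23, 12, 0, 0, tzinfo=timezone.utc)

row = {
    "CreatedUnixTimestamp": to_unix_seconds(created),
    "UpdatedUnixTimestamp": to_unix_seconds(updated),  # hypothetical second column
}
payload = json.dumps(row)  # the timestamps travel as plain JSON numbers

# Ordering is a plain integer comparison; no string parsing is involved.
is_ordered = row["CreatedUnixTimestamp"] < row["UpdatedUnixTimestamp"]

# Converting back recovers the original UTC instant.
restored = datetime.fromtimestamp(row["CreatedUnixTimestamp"], tz=timezone.utc)
```

The trade-off is that sub-second precision and any original UTC offset are dropped unless they are stored separately.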

~~~
xyzzy_plugh
Pass 64-bit Unix timestamps around as JSON numbers? That's a bad idea, seeing
as they're 64-bit floats. You're better off formatting your 64-bit integers as
strings.

~~~
bob1029
53 bits of usable range is plenty for our purposes. Our serializer & database
are not hobbled by the limitations of JavaScript, so the representation is
only compromised as it is processed at the end client. This is not a concern
for us.

For reference, MAX_SAFE_INTEGER can represent something around the year
285428751.
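
A small Python sketch of both sides of this exchange. It assumes the timestamp counts seconds since 1970 and uses the average Gregorian year of 31,556,952 seconds, so the year it produces is only an approximation.

```python
# JavaScript's Number.MAX_SAFE_INTEGER: below this, every integer has its own
# exact 64-bit float representation.
MAX_SAFE_INTEGER = 2**53 - 1

# Above 2**53, neighbouring integers collapse to the same float.
collides = float(2**53) == float(2**53 + 1)

# Within the safe range, converting to float and back is lossless.
round_trips = int(float(MAX_SAFE_INTEGER)) == MAX_SAFE_INTEGER

# Approximate calendar year reached if the value counts seconds since 1970.
SECONDS_PER_YEAR = 31_556_952
approx_year = 1970 + MAX_SAFE_INTEGER // SECONDS_PER_YEAR
```

`approx_year` lands in the same neighbourhood as the figure quoted above, and `collides` shows the loss that starts once values exceed 53 bits.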

------
leetrout
It’s not listed but there is a Go library

[https://github.com/amzn/ion-go](https://github.com/amzn/ion-go)

------
whoevercares
I’ll start using it when AWS adopts it :) This is only used in retail orgs...
the ecosystem is the biggest issue.

Edit: in Public API

~~~
dodobirdlord
For the public API, customers want JSON, so they get JSON. Internally there's
Coral, and something like Coral/Protobuf is outright superior for the use case
of an API where a schema can be distributed in advance. The only real use case
for Ion is when you have data that's already JSON-formatted for whatever
reason and you want to compress it for storage or transit.

~~~
yftsui
Yep and Coral is also open sourced as AWS Smithy, it makes no sense to assume
AWS usage means anything or vice versa.

~~~
senderista
Coral being open sourced is huge! Why wasn’t this on HN?

------
erik_seaberg
I was hoping to see a UUID type, since so many people choose either unreadable
base64 or wasteful strings. It looks like
0x12341234_1234_1234_1234_123412341234 should convey the bits, but it won't
pprint or validate the way a dedicated type would. Ditto for IPv6 addresses.
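
A Python sketch of the size and validation trade-off, using the standard `uuid` and `base64` modules. It does not use any Ion library; the underscore-separated hex form is built by hand from the example value above.

```python
import base64
import uuid

u = uuid.UUID("12341234-1234-1234-1234-123412341234")

# The same 128 bits written as a hex integer with underscores at the UUID
# group boundaries.
h = u.hex
as_hex_int = "0x" + "_".join([h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]])

# Parsing the integer form back recovers the UUID, but nothing in the integer
# itself says it was meant to be a UUID.
restored = uuid.UUID(int=int(as_hex_int.replace("_", ""), 16))

canonical_len = len(str(u))                   # hyphenated string form
base64_len = len(base64.b64encode(u.bytes))   # padded base64 of the 16 raw bytes
raw_len = len(u.bytes)                        # the actual payload
```

The raw value is 16 bytes, the base64 form is shorter than the hyphenated string but unreadable, and the integer form round-trips only because the reader already knows to treat it as a UUID.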

------
atomicbeanie
I would say
[https://cognitect.com/blog/2014/7/22/transit](https://cognitect.com/blog/2014/7/22/transit)
is a better option, no?

------
roenxi
An interesting point - I browse with JavaScript disabled. The example at the
bottom of the page rendered for me without newlines, in a manner that meant
the thing rendered in a completely unparsable way due to comments like:

    
    
      // Field names
    

This experience has reminded me why JSON is such a great format.

And having a whinge while I'm writing, "superset of JSON" is basically false
advertising even though it is true; JSON's refusal to admit that line breaks
are a thing is a major feature. I don't care if it is technically correct
and useful to some customers, if line breaks matter it is inappropriate to
talk about a format's relation to JSON because people will get the wrong idea.
The JSON brand is so strong because it is nigh-impossible to get wrong. This
format gets screwed up - eg, for people who don't like JS.

~~~
skywhopper
I think "superset" is a clear relationship. It means "legal JSON is legal
Ion", just like "legal JSON is legal YAML". I don't think it's inappropriate
to point that out. In fact, it's an excellent feature.

------
m1sta_
I really like the type::value pattern. It provides some attractive options for
embedded languages. If python allowed

    
    
        fun x:
          query = sql :: 
             select * from table
        

I'd be pretty happy.

~~~
BerislavLopac
It kind of does:

    
    
        class SQL(str):
            ...
    
        query = SQL("""
            select * from table
        """)

------
schappim
The base64 encoded text in the example is: 'To infinity... and beyond!'
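
This is easy to check with Python's standard `base64` module. The sketch below encodes the quoted text and decodes it again, assuming the blob in the example uses standard padded base64.

```python
import base64

text = b"To infinity... and beyond!"

# Encode the text the way a blob would appear in a text document.
encoded = base64.b64encode(text)

# Decoding the base64 form gives the original bytes back.
decoded = base64.b64decode(encoded)
```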

------
awinter-py
> This binary format supports rapid skip-scanning of data to materialize only
> key values within Ion streams.

(1) awesome but (2) 'key values' is a confusing way to say this

------
user5994461
Lots of empty promises:

* int: arbitrary size integers

* decimal: arbitrary precision, base-10 encoded real numbers

* timestamp: arbitrary precision date / timestamps, with ISO 8601 format "2019-05-01T18:12:53.472-0800".

So exact same drawbacks as JSON basically:

* Large integers will be cast to 32 or 64 bits in most languages no matter what.

* Arbitrary decimal will be cast to float or double as well.

* ISO timestamps are not well specified when it comes to millisecond, microsecond and timezone.

~~~
lilyball
This is awfully negative. JSON explicitly does not declare the represented
range of floats or integers, and doesn’t have a distinct arbitrary-precision
decimal type. I haven’t read the Ion spec, only the description, but since
it’s advertising arbitrary precision, presumably any implementation that does
not support that is not a correct implementation at all.
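
For the JSON side of this, a Python sketch with the standard `json` module shows that what survives depends on the parser rather than on the format. The document and its field names are invented for illustration, and this says nothing about how any Ion implementation behaves.

```python
import json
from decimal import Decimal

doc = '{"big": 123456789012345678901234567890, "price": 0.1000000000000000000001}'

# Default parsing: Python keeps the large integer exactly, but the fractional
# number becomes a binary float and loses its trailing digits.
default = json.loads(doc)

# Asking the parser for Decimal keeps every digit of the fractional number.
exact = json.loads(doc, parse_float=Decimal)
```

A parser in a language without big integers or decimals would make different choices for the same bytes, which is the ambiguity being discussed.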

~~~
user5994461
In practice that means having (or adding) support for arbitrary numbers and
decimals in the languages/platforms they want to cover. I am skeptical they
would do that in C, for example.

~~~
lilyball
The C implementation bundles the ICU decNumber library for decimal numbers.

------
miohtama
What has bugged me a lot with JavaScript is that it lacked a standard
representation of dates and decimals (like money), making it feel inferior for
application development. Happy to finally see this addressed both in
JavaScript itself and in serialisation formats.

(Though it looks like Ion is not solely targeting JS, I assume it is nice to
consume Ion data in the frontend)

~~~
imglorp
Nope, backend if anything. For example, their new QLDB product uses it to get
consistent hashing of documents on account of Ion being a canonical format.
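
To see why a canonical form matters for hashing, here is a Python sketch that uses JSON as a stand-in. It is not QLDB's or Ion's hashing algorithm; `canonical_digest` is a hypothetical helper that only illustrates the general idea that equal data must serialize to equal bytes before it is hashed.

```python
import hashlib
import json


def canonical_digest(document):
    """Hash a document via a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"id": 1, "name": "vii"}
b = {"name": "vii", "id": 1}  # same data, different key order

# Without canonicalization the serialized bytes differ, so the hashes differ.
naive_a = hashlib.sha256(json.dumps(a).encode("utf-8")).hexdigest()
naive_b = hashlib.sha256(json.dumps(b).encode("utf-8")).hexdigest()

# With canonicalization, equal data gives equal digests.
same_digest = canonical_digest(a) == canonical_digest(b)
```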

------
dstaley
Fun fact: Ion is used heavily in KFX, the book format for Kindles.

------
paultopia
Why do we need more random serialization formats?

------
RedShift1
After switching as much as possible over to JSON or LZ4 compressed JSON, life
is good. Never going back to another serialization format.

------
monadic2
Why the name?

~~~
dodobirdlord
Ion is actually two formats, with Ion data having a canonical representation
both in binary and in human-readable text. The text format's file extension is
".ion" and the binary format's file extension is ".10n", and I think that's
the entire motivation.

------
hansdieter1337
The obligatory xkcd: [https://xkcd.com/927/](https://xkcd.com/927/)

~~~
galkk
If I recall correctly, Ion precedes even Google's protobuf, and is 20+ year old
technology. This isn't the result of "yet another standard" but parallel
evolution.

------
NegativeLatency
embrace extend extinguish

also

[https://xkcd.com/927/](https://xkcd.com/927/)

------
phre4k
Oh no, the xkcd.com/927 begins.

We had JSON5, now we have Ion. Google and Microsoft will probably run their
own, too, soon.

Why the IT community always forks its standards and never merges them has
baffled me for >20 years.

~~~
teej
Google has already had protobuf for a long time.

