chaokunyang's comments


See https://github.com/apache/fury for details about the Apache Fury serialization framework.


Apache Fury is a blazingly fast serialization framework for the JDK. For GraalVM native images, Fury uses automatic codegen for object graph serialization at image build time, which eliminates all reflection and introspection at runtime. Combined with its optimized serialization protocol, it runs up to 50x faster than GraalVM JDK serialization. Users don't have to specify a reflection config JSON file and no annotations are needed, so it's easy to use and keeps the mental overhead low.
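
A minimal usage sketch, based on my reading of the project README (package and builder method names such as org.apache.fury.Fury may differ across versions); for native images the Fury instance is typically created in a static initializer so serializer codegen can happen at image build time:

    import org.apache.fury.Fury;
    import org.apache.fury.config.Language;

    public class Example {
      // Built in static initialization so that, under --initialize-at-build-time,
      // serializer generation runs at native image build time instead of runtime.
      static final Fury FURY = Fury.builder()
          .withLanguage(Language.JAVA)
          .requireClassRegistration(true)
          .build();

      static {
        FURY.register(Point.class); // explicit registration replaces reflection config
      }

      static class Point {
        public long id;
        public String name;
      }

      public static void main(String[] args) {
        Point p = new Point();
        p.id = 1L;
        p.name = "fury";
        byte[] bytes = FURY.serialize(p);
        Point copy = (Point) FURY.deserialize(bytes);
        System.out.println(copy.id + " " + copy.name);
      }
    }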



JSON/Protobuf use a KV layout when serializing: they write field names/types multiple times for multiple objects of the same type. And that sparse layout is not friendly to the CPU cache or to compression.

We added a scoped meta share mode in Apache Fury 0.6.0, which greatly improves both performance and space usage.

With meta share, we write the field name & type metadata of a struct only once for multiple objects of the same type, which saves space and improves performance compared to protobuf. We can also encode the metadata into binary in advance and write it with a single memory copy, which is much faster.

In our test, for a list of numeric structs, Fury is 6x faster than protobuf and the payload is about half the size.
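
A rough sketch of the idea (not Fury's actual wire format; the names below are made up for illustration): within one serialization scope, each struct type's field name/type metadata is encoded once, and later objects of the same type only reference it by a small index:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.lang.reflect.Field;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: tracks which types already had their meta written in this scope.
    class ScopedMetaWriter {
      private final Map<Class<?>, Integer> metaIndex = new HashMap<>();

      // Writes the type meta the first time a type is seen; afterwards only the
      // returned index needs to precede each object's field values.
      int writeTypeMeta(Class<?> type, ByteArrayOutputStream out) throws IOException {
        Integer index = metaIndex.get(type);
        if (index != null) {
          return index; // meta already present in the stream for this scope
        }
        StringBuilder meta = new StringBuilder(type.getName());
        for (Field f : type.getDeclaredFields()) {
          meta.append(';').append(f.getName()).append(':').append(f.getType().getName());
        }
        // The encoded bytes could be cached per type and written with one memory copy.
        out.write(meta.toString().getBytes(StandardCharsets.UTF_8));
        index = metaIndex.size();
        metaIndex.put(type, index);
        return index;
      }
    }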


Pretty cool, the multi-level IR is great. Does KQIR support predicate pushdown?


We already support that kind of dictionary encoding: we let users register a class with an ID and write the class by that ID; the ID is encoded as a varint, which takes only 1~5 bytes. But not every user likes the registration step. The meta string encoding here is for cases where no such mapping is available.
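
For illustration, a minimal varint writer in Java (a generic LEB128-style sketch, not necessarily Fury's exact encoding), showing why a class ID fits in 1~5 bytes:

    import java.io.IOException;
    import java.io.OutputStream;

    final class VarInt {
      // Writes an unsigned 32-bit value in 1~5 bytes: 7 payload bits per byte,
      // with the high bit set on every byte except the last.
      static void writeVarUint32(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
          out.write((value & 0x7F) | 0x80);
          value >>>= 7;
        }
        out.write(value);
      }
    }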


Meta string is used only for ASCII chars, so every char fits in a single byte and we can random-access them. But your reminder is a good one: I just found out that we don't check that the passed string is actually ASCII, although our own callers always pass ASCII strings.
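
Something like the following check (a hypothetical sketch, not the actual fix) would catch non-ASCII input before each char is treated as a single byte:

    // Hypothetical validation: reject strings containing code units above 0x7F.
    static boolean isAscii(String s) {
      for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) > 0x7F) {
          return false;
        }
      }
      return true;
    }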


Glad to see the H2 database engine benefits from this. I use H2 too.


Yes, we only use this encoding when it brings benefits. `wire format` is a good term for such cases, where the context is constrained to the current message only. So the data available for compression is small, and many statistical methods won't work precisely because it's a `wire format`: the statistics themselves would have to be sent to peers, which costs more space and CPU.

