Hacker News new | past | comments | ask | show | jobs | submit login

In rpc/serialization systems, we often need to send namespace/path/filename/fieldName/packageName/moduleName/className/enumValue string between processes.

Those strings are mostly ascii strings. In order to transfer between processes, we encode such strings using utf-8 encodings. Such encoding will take one byte for every char, which is not space efficient actually.

If we take a deeper look, we will found that most chars are lowercase chars, ., $ and _, which can be expressed in a much smaller range 0~32. But one byte can represent range 0~255, the significant bits are wasted, and this cost is not ignorable. In a dynamic serialization framework, such meta will take considerable cost compared to actual data.

So we proposed a new string encoding which we called meta string encoding in Fury. It will encode most chars using 5 bits instead of 8 bits in utf-8 encoding, which can bring 37.5% space cost savings compared to utf-8 encoding.

For string can't be represented by 5 bits, we also proposed encoding using 6 bits which can bring 25% space cost savings




You don't include '/' in the list of special characters so how are you using this to encode paths? As an array of strings?


We can use `|` instead, `|` is not used in path. Currently we have only 2 chars can be extended. I haven't decided what to inlucde




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: