
There are tools for shrinking SVG that do some of what you say. One advantage of the text format is that you can reduce precision of the path data to a couple of significant figures. That would be hard to do with a binary format. You'd have to decide up front on 8 bytes or 16 bytes per coordinate pair.
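A minimal sketch of the precision reduction described above, not how any particular SVG optimizer does it. The function name `shrink_path` and the default `sig` are made up for illustration, and it assumes simple path data (no compact arc flags such as `01` standing for two flags).

```python
import re

# A single command letter, or a decimal number with optional sign and exponent.
TOKEN = re.compile(r"[A-Za-z]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def shrink_path(d, sig=2):
    """Round every number in SVG path data `d` to `sig` significant figures.

    Hypothetical helper: tokens are re-joined with single spaces so that
    rounded numbers can never run together.
    """
    out = []
    for tok in TOKEN.findall(d):
        if tok.isalpha():
            out.append(tok)                      # path command, kept as-is
        else:
            out.append(f"{float(tok):.{sig}g}")  # 'g' drops trailing zeros
    return " ".join(out)

print(shrink_path("M10.123456,20.987654L30.5-40.26", 3))
```

Note that the `g` format switches to exponent notation for large values (SVG path syntax accepts exponents, but the result is not always shorter).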



Binary formats need not use fixed-length fields. You could use a format that stores small integers in 8 bits, yet allows for larger ones. See for example https://developers.google.com/protocol-buffers/docs/encoding....
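For illustration, a minimal sketch of a base-128 varint for unsigned integers, in the style protobuf uses: 7 payload bits per byte, with the high bit set on every byte except the last. The function names are made up here and are not part of any library.

```python
def encode_varint(n):
    """Encode a non-negative int, least-significant 7-bit group first."""
    if n < 0:
        raise ValueError("this sketch handles unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_varint(buf, pos=0):
    """Return (value, next_pos); the reader finds the end from the data itself."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

print(encode_varint(1).hex(), encode_varint(300).hex())
```

Small values cost one byte, and the decoder needs no out-of-band length.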

You don’t even have to use byte-sized fields, but bit-granular fields make encoding and decoding harder, and the overhead of also storing the length in bits of each variable-sized field may outweigh the savings.


I think you missed my point. SVG coordinates are effectively arbitrary precision. 3.5 bits per byte of arbitrary precision, but arbitrary precision nonetheless. There are no small integers, only scalar or vector values.


Varint-style encoding is also arbitrary precision (although protobuf doesn't support arbitrary precision). It is trivial to get 7 bits per byte, and you can do even better for large values by encoding the length up front (this also makes decoding faster).
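A minimal sketch of the "length up front" idea, assuming a single length byte followed by a big-endian payload. That layout and the function names are assumptions for illustration, not any existing format.

```python
def encode_prefixed(n):
    """One length byte, then the value in that many big-endian bytes."""
    if n < 0:
        raise ValueError("this sketch handles unsigned values only")
    size = (n.bit_length() + 7) // 8
    if size > 255:
        raise ValueError("a one-byte length prefix caps the payload at 255 bytes")
    return bytes([size]) + n.to_bytes(size, "big")

def decode_prefixed(buf, pos=0):
    """Return (value, next_pos); the payload is read in one slice, no per-byte loop."""
    size = buf[pos]
    start = pos + 1
    end = start + size
    if end > len(buf):
        raise ValueError("truncated input")
    return int.from_bytes(buf[start:end], "big"), end

big = 2**1023
print(len(encode_prefixed(big)))  # 1 length byte + 128 payload bytes
```

With one length byte this caps out at 255-byte values; a truly unbounded variant could store the length itself as a varint. A 7-bits-per-byte varint would need 147 bytes for the same 1024-bit value.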


There are existing variable-length binary number formats [1]. SVG already has so many variable-length things in it that the downsides would probably be insignificant.

I don't know if any of the existing Protobuf-like things have these built in, though. I know some of them have variable integers in them but I don't know about variable floats.

This is kind of what I was getting at: we have a lot of prior art now. Someone putting variable ints or floats into SVG no longer has to create their own bespoke format, fall into the various traps themselves, and accidentally write the result into the spec forever; there are a lot more of these sorts of pieces that you can pick up off the shelf than there used to be.

[1] https://github.com/regexident/var_float


I don't see how var_float lets you detect the length of a number field from the input. It seems to be designed to take arbitrary input from another function (which has already decided how big the data is out-of-band).

Am I missing something, or can I not read 2 bytes from a file and detect whether that's the end of the number or whether I need to read the next 2 bytes?


I dunno if that specific library has that feature, but it is certainly something that can be implemented. It's just a matter of programming.
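One way this could be implemented, as a sketch only: a one-byte tag ahead of each float says whether a half, single, or double follows, so the reader knows the length from the input itself. The tag values and function names are invented here, and this says nothing about how var_float works.

```python
import struct

# Hypothetical tags: 0 = half (2 bytes), 1 = single (4 bytes), 2 = double (8 bytes).
FORMATS = {0: "<e", 1: "<f", 2: "<d"}

def encode_float(x):
    """Use the narrowest IEEE 754 width that round-trips `x` exactly."""
    for tag, fmt in FORMATS.items():
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # value too large for this width, try the next one
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([tag]) + packed
    # Only NaN gets here (NaN never compares equal); store it as a double.
    return bytes([2]) + struct.pack("<d", x)

def decode_float(buf, pos=0):
    """Return (value, next_pos); the tag byte tells the reader how much to read."""
    fmt = FORMATS[buf[pos]]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + struct.calcsize(fmt)

for x in (0.5, 100000.0, 0.1):
    print(x, len(encode_float(x)))
```

A lossy variant could instead pick the width from a requested precision, which is closer to what trimming digits in SVG text does.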


Yes, that is half of how UTF-8 works as an encoding: the first byte of a unit is either 0xxxxxxx or 11xxxxxx. In the first case it is a single ASCII byte; in the second case, all the following bytes that start with 10xxxxxx are part of the same unit.

For this purpose it would be enough to reverse the order of the stream.
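A minimal sketch of the marker scheme described above: any byte of the form 10xxxxxx continues the current unit, and any other byte starts a new one. The function name is made up for illustration.

```python
def split_units(data):
    """Split bytes into units: a lead byte plus its 10xxxxxx continuation bytes."""
    units = []
    start = 0
    for i in range(1, len(data)):
        if data[i] & 0xC0 != 0x80:   # not a continuation byte: a new unit starts
            units.append(data[start:i])
            start = i
    if data:
        units.append(data[start:])   # the last unit ends at end of input
    return units

print(split_units("aé€".encode("utf-8")))
```

In this form a reader only learns that a unit has ended when it sees the next lead byte or the end of input. Real UTF-8 also encodes the unit's length in the lead byte (110xxxxx for two bytes, 1110xxxx for three, 11110xxx for four), which is presumably the other half.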



