
Tny: A simple data serializer in C - luu
https://github.com/BobMarlon/Tny/
======
ChuckMcM
So I wonder if the author read the XDR code. XDR, which is a lame acronym for
eXternal Data Representation, was what Sun's remote procedure call library used
to encode (or 'marshal' in the vernacular) arbitrary data into something you
could send over a network and reliably unmarshal on the other side and get
back the same data structure.

The positives (like tny) were that the code was very straightforward and easy
to use. And one of the negatives was that it wasn't "self describing" (which
JSON is to some extent).
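
To make the "not self describing" point concrete, here is a minimal sketch of the wire layout XDR specifies (RFC 4506): 32-bit integers are big-endian, and a string is a 4-byte length followed by the bytes, zero-padded to a multiple of four. The `xdrish_*` names are hypothetical helpers written for this illustration; they are not the API of Sun's XDR library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit unsigned integer big-endian,
 * which is how XDR lays out integers. Returns bytes written (always 4). */
size_t xdrish_put_u32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
    return 4;
}

/* Hypothetical helper: read back a big-endian 32-bit unsigned integer. */
uint32_t xdrish_get_u32(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Hypothetical helper: 4-byte length, then the bytes, then zero padding
 * up to the next multiple of 4. The caller must provide a buffer of at
 * least 4 + strlen(s) rounded up to a multiple of 4 bytes.
 * Returns the total number of bytes written. */
size_t xdrish_put_string(uint8_t *out, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;

    assert(len <= UINT32_MAX);
    xdrish_put_u32(out, (uint32_t)len);
    memcpy(out + 4, s, len);
    memset(out + 4 + len, 0, padded - len);
    return 4 + padded;
}
```

Note that nothing in the output says "this is an integer" or "this is a string": the reader has to know the schema in advance, which is the trade-off mentioned above.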

Of course the ultimate in binary self describing data is ASN.1 syntax :-) but
let's not go there.

~~~
huhtenberg
Serialization formats are a dime a dozen. Cooking up your own variation of TLV
(+ TV for fixed-size T's) is dead easy and can typically be done on the spot by
anyone without any prior experience.

As such I doubt "the author read the XDR code", though perhaps you weren't
wondering _that_ at all, but rather wanted to mention XDR and ASN.1, in which
case you should've really just done that. As it's written, it makes "the
author" look like either an ignorant imbecile who doesn't know such basics as
XDR or as someone who piggy-backed on XDR without acknowledging it.

------
codehero
I think it's interesting and I get that it's a simple data parser, but if I am
using C I prefer performance over simplicity. The Tny structs and their string
values are individually malloc'ed, worsening the performance, so allocation
and deletion are slow. Of course he's not the only one to do this,
yajl2 also makes liberal use of malloc and free when building its node tree.
One good thing to say is at least his encoding format distinguishes floats
from integers and accounts for NaN and Infinity values.

A first step to improving this library would be to add a SAX style parser so I
could build my own representation.
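
As a rough sketch of what "SAX style" means here: the parser walks the bytes once and reports each value through callbacks, allocating nothing itself, so the caller decides how (or whether) to store anything. The byte format below is invented purely for this illustration (`'i'` plus a 4-byte big-endian integer, `'s'` plus a 1-byte length and that many bytes); it is not Tny's format, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callbacks supplied by the caller; either may be NULL to ignore that event. */
typedef struct sax_handlers {
    void (*on_u32)(void *user, uint32_t value);
    void (*on_string)(void *user, const char *bytes, size_t len);
} sax_handlers;

/* Walk the buffer and fire one callback per value. The parser itself
 * never allocates. Returns 0 on success, -1 on malformed input. */
int sax_parse(const uint8_t *buf, size_t size,
              const sax_handlers *h, void *user)
{
    size_t pos = 0;

    assert(buf != NULL && h != NULL);
    while (pos < size) {
        uint8_t tag = buf[pos++];
        if (tag == 'i') {
            if (size - pos < 4) return -1;          /* truncated integer */
            uint32_t v = ((uint32_t)buf[pos] << 24) |
                         ((uint32_t)buf[pos + 1] << 16) |
                         ((uint32_t)buf[pos + 2] << 8) |
                          (uint32_t)buf[pos + 3];
            pos += 4;
            if (h->on_u32) h->on_u32(user, v);
        } else if (tag == 's') {
            if (size - pos < 1) return -1;          /* missing length byte */
            size_t len = buf[pos++];
            if (size - pos < len) return -1;        /* truncated string */
            if (h->on_string)
                h->on_string(user, (const char *)(buf + pos), len);
            pos += len;
        } else {
            return -1;                              /* unknown tag */
        }
    }
    return 0;
}

/* Example consumer: builds its "own representation", here just totals. */
typedef struct sum_state {
    uint32_t total;
    size_t string_bytes;
} sum_state;

void sum_on_u32(void *user, uint32_t value)
{
    ((sum_state *)user)->total += value;
}

void sum_on_string(void *user, const char *bytes, size_t len)
{
    (void)bytes;                                    /* contents not needed */
    ((sum_state *)user)->string_bytes += len;
}
```

A caller would fill a `sax_handlers` with its own functions, pass a pointer to its own state as `user`, and call `sax_parse` once; a tree-building API can then be layered on top of the same callbacks for those who want one.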

~~~
Demiurge
How much difference does sporadic malloc make these days? What is the loss in
performance compared to the overhead of virtual memory, randomization and
whatnot, assuming the OS is also trying to optimize?

~~~
dxhdr
A large difference. Random mallocs in library calls completely defeat the
purpose of using C in the first place. Not only for performance but for
fragmentation, especially on embedded devices. It makes it impossible to use
my own memory arenas and instead takes the lazy route of allocating from the
global heap. Look at Jsmn for an example of how to do this sort of thing
beautifully and cleanly with no global side effects.

~~~
Demiurge
What do you mean by 'own memory arena' and what is the performance (or other)
difference between that and global heap?

~~~
vidarh
Consider that you _know_ that certain data is always de-allocated together. An
example is when a document is opened and then closed. In that case you can
make a dramatic amount of difference with arenas:

Often you can avoid almost any memory overhead. E.g. any structures that are
allocated on opening a document, and deallocated on closure, can be allocated
from larger buffers without keeping any information about the individual
allocations. That can save anything up to 16-20 bytes per allocation with many
malloc() implementations, and reduces typical allocation cost to incrementing
a pointer and checking whether or not you need to allocate a new buffer (and
the allocation cost for new buffers might be amortised over anything up to
thousands of small allocations).

For a practical example, there's a font library that calls malloc() to
allocate structures for every single glyph when opening a font. Most of the
(thousands of) allocations are 4-8 bytes. Switching to an arena in an
application I wrote cut memory usage per font to about 25%-30% and cut load
time for fonts to <10% by avoiding the malloc() calls.

You can achieve the same by throwing abstractions out the window and putting
stuff in arrays etc. But arenas are often a very effective way of keeping the
abstractions while effectively getting almost the same performance and memory
usage.
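
The "increment a pointer and check whether you need a new buffer" scheme described above can be sketched roughly as follows. This is a minimal illustration assuming C11 (for `max_align_t` and `_Alignof`); the `arena_*` names are hypothetical and it is not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One large buffer obtained from malloc(); small allocations are carved
 * out of `data` with no per-allocation header. */
typedef struct arena_chunk {
    struct arena_chunk *next;
    size_t used;            /* bytes handed out so far */
    size_t cap;             /* usable bytes in data[] */
    max_align_t data[];     /* payload, suitably aligned for any object */
} arena_chunk;

typedef struct arena {
    arena_chunk *head;      /* most recently allocated chunk */
    size_t chunk_size;      /* default payload size for new chunks */
} arena;

/* Bump-pointer allocation: round the size up, and only call malloc()
 * when the current chunk is exhausted. Returns NULL on failure. */
void *arena_alloc(arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    arena_chunk *c;

    assert(a != NULL);
    if (n > SIZE_MAX - align) return NULL;
    n = (n + align - 1) & ~(align - 1);     /* keep every result aligned */

    c = a->head;
    if (c == NULL || c->cap - c->used < n) {
        size_t cap = n > a->chunk_size ? n : a->chunk_size;
        if (cap > SIZE_MAX - sizeof *c) return NULL;
        c = malloc(sizeof *c + cap);
        if (c == NULL) return NULL;
        c->next = a->head;
        c->used = 0;
        c->cap = cap;
        a->head = c;
    }

    void *p = (unsigned char *)c->data + c->used;
    c->used += n;
    return p;
}

/* Everything allocated from the arena is released together. */
void arena_free_all(arena *a)
{
    arena_chunk *c;

    assert(a != NULL);
    c = a->head;
    while (c != NULL) {
        arena_chunk *next = c->next;
        free(c);
        c = next;
    }
    a->head = NULL;
}
```

In the document example, one would create an arena when the document is opened, route all per-document allocations through `arena_alloc`, and call `arena_free_all` on close. There is deliberately no way to free a single allocation; that is what removes the bookkeeping overhead.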

~~~
Demiurge
Ok, thanks for the example! I'm surprised it makes such a difference.

------
zenocon
I've had great success using protobuf-c
[https://code.google.com/p/protobuf-c/](https://code.google.com/p/protobuf-c/)

~~~
mpetrov
I also used Protocol Buffers in multiple projects and languages. One huge
advantage was that I could serialize data in Objective-C on the iPad, stream
it to a C++ server on a desktop, and then forward some of it to a Ruby plugin
in a different application. The performance was great and the libraries in
each language made it seamless to use the same data structures across the
board.

------
coherentpony
Have you considered const-correctness[1]?

[1] [https://en.wikipedia.org/wiki/Const-correctness](https://en.wikipedia.org/wiki/Const-correctness)

------
lsb
Also interesting: MessagePack. It's more concise than BSON.
[http://msgpack.org](http://msgpack.org)

~~~
stevelaz
Glad to see someone mention this! I've used msgpack in Lua apps and it's been
great so far.

------
deletes
I have no experience with Web programming, but I know C and low-level stuff
pretty well. Would anyone mind elaborating on what Tny does?

~~~
hellcow
JSON is a text-based key/value data format that originated in JavaScript, but
it's useful across languages due to the number of libraries. It typically looks
like:

    {
      "hello": "world"
    }

Tny converts this JSON into a (usually, but not always) smaller form that's
faster to traverse and encode/decode. Tny's alternative, BSON, is being used
by MongoDB [1], so that might be a good place to learn more.

[1] [http://www.mongodb.org/](http://www.mongodb.org/)

------
waffenklang
I didn't read all the comments, so sorry if this is a duplicate.

After a short review of how the add function handles complex structs, I found
a possible bug: it does not perform a deep copy of objects containing
references to other objects. It would only copy the pointer values of a struct
like struct { obj* ptr; } and not the content of ptr.
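
To illustrate the general difference being described (not Tny's actual code, which is not reproduced here), a minimal sketch with hypothetical `obj` and `holder` types:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types standing in for "a struct that points at another object". */
typedef struct obj { int value; } obj;
typedef struct holder { obj *ptr; } holder;

/* Shallow copy: only the pointer value is copied, so dst and src
 * end up sharing the same obj. */
void holder_copy_shallow(holder *dst, const holder *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Deep copy: the pointed-to obj is duplicated as well, so dst owns an
 * independent copy that the caller must free().
 * Returns 0 on success, -1 if allocation fails. */
int holder_copy_deep(holder *dst, const holder *src)
{
    assert(dst != NULL && src != NULL);
    dst->ptr = NULL;
    if (src->ptr != NULL) {
        dst->ptr = malloc(sizeof *dst->ptr);
        if (dst->ptr == NULL) return -1;
        *dst->ptr = *src->ptr;
    }
    return 0;
}
```

With the shallow version, changing or freeing the original object after the copy also changes or invalidates what the copy sees; with the deep version it does not.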

------
aidenn0
There seem to be a million things like this; what makes it better than BSON or
tnetstrings?

------
kernelcurry
This is cool! I can't wait to play with it in one of my own applications!
Thanks!

------
vikas0380
Does anybody know of a similar library or project for C++?

~~~
akandiah
Google's Protocol Buffers?
[http://code.google.com/p/protobuf/](http://code.google.com/p/protobuf/)

------
gpsarakis
Are there any speed & size comparisons between Tny and JSON?

~~~
huhtenberg
JSON is not a TLV format, so you can't skip over records of an unknown type
without reading them in full. So JSON is generally slower.
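
A minimal sketch of why a TLV layout allows this kind of skipping. The record layout here (1-byte type, 4-byte big-endian length, then the value) is an assumption made for the example, not Tny's actual wire format, and `tlv_find` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_HEADER_SIZE 5u  /* assumed layout: 1 type byte + 4 length bytes */

/* Find the first record of type `wanted`. Records of any other type,
 * including types this code has never heard of, are skipped using only
 * their length field; their values are never inspected.
 * Returns a pointer to the value and stores its length in *value_len,
 * or returns NULL if no such record exists or the buffer is truncated. */
const uint8_t *tlv_find(const uint8_t *buf, size_t size,
                        uint8_t wanted, uint32_t *value_len)
{
    size_t pos = 0;

    assert(buf != NULL && value_len != NULL);
    while (size - pos >= TLV_HEADER_SIZE) {
        uint8_t type = buf[pos];
        uint32_t len = ((uint32_t)buf[pos + 1] << 24) |
                       ((uint32_t)buf[pos + 2] << 16) |
                       ((uint32_t)buf[pos + 3] << 8) |
                        (uint32_t)buf[pos + 4];
        pos += TLV_HEADER_SIZE;
        if (len > size - pos) return NULL;  /* value runs past the buffer */
        if (type == wanted) {
            *value_len = len;
            return buf + pos;
        }
        pos += len;                         /* jump straight over the value */
    }
    return NULL;
}
```

A JSON reader looking for the same field has no length to jump by: it must scan every character of each intervening value to find where that value ends.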

