JSON for Modern C++ (github.com)



I don't understand the JSON obsession. JSON, like any other file format, should be a small detail in any application and should require very little plumbing code.

In every application, any dependency on JSON should be minimized, contained and preferably eliminated.


I tried to suppress the urge to release my frustration, but seeing that the only slightly critical comment in this thread is the downvoted one made me forget my good intentions. WTF. Tens of thousands of lines of code, >10K lines in include files (yay compile times!) for a task that should be only a side concern and should be straightforward to implement. JSON has how many data types? A handful? Writing super efficient parser primitives for it must be easily possible in <1K lines of implementation and <50 lines of headers. If there are special requirements on performance or type safety / structural integrity, JSON is far from the ideal choice. And even if one is required to use it under such circumstances due to political issues (in which case, sorry), that doesn't mean that all the other >99% of applications call for a massively oversized library like this. (Update: upon closer inspection I'm not even sure that speed is the main concern of this library. It really looks like it tries hard to be "Modern C++" by implementing all sorts of compile-time insanity, bells and whistles (which will never fit 100% with clients and will be impossible to adapt).)


Recently had a discussion with a colleague about JSON in C++. It's actually a big issue, because reflection isn't really supported in C++. It's not just a blocker for parsing objects as JSON, it's a fundamental limitation for being able to map objects to any kind of format in a general fashion without creating initializers for every object. I actually have no idea what kind of black magic they're doing to achieve this, but it sounds pretty admirable to me.


Reflection isn't required; keys in JSON are strings, and there are basic data types that are supported (strings, numbers, booleans, arrays, and dictionaries which are more of the same).

What's wrong with writing "initializers" which are serializers/deserializers? If you're looking for automatic mapping from a file format to a C++ class object, why settle for JSON (whether it's this library or JSONCpp)? Why not use Thrift or Protocol Buffers?

- Thrift: https://thrift.apache.org/
- Protocol Buffers: https://developers.google.com/protocol-buffers/
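
For illustration, a minimal sketch of such hand-written serializers with nlohmann::json (the Person type and its fields are invented; if I recall the library's customization points correctly, free to_json/from_json overloads like these are picked up via ADL, so get<Person>() works):

    #include <string>
    #include <vector>
    #include <nlohmann/json.hpp>

    using nlohmann::json;

    // Hypothetical application type.
    struct Person {
        std::string name;
        std::vector<int> scores;
    };

    // Hand-written "initializers": one serializer, one deserializer.
    void to_json(json& j, const Person& p) {
        j = json{{"name", p.name}, {"scores", p.scores}};
    }

    void from_json(const json& j, Person& p) {
        j.at("name").get_to(p.name);
        j.at("scores").get_to(p.scores);
    }

    // Usage: json j = Person{"Ada", {1, 2, 3}};
    //        Person p = j.get<Person>();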


>why settle for JSON

Because I'm not writing code in a vacuum, and JSON is what everyone else is using. JSON is chosen for simplicity & interoperability.


Exactly this. Sometimes you have to use something not because it is the best option, but because it is what your team is using, or because the ease of use is worth the performance drawbacks.


See also Mapry [0], a code generator we made specifically for JSON.

[0] https://github.com/Parquery/mapry


Why do you need reflection for a message format that only requires objects (use a hash map) and lists (use an array or linked list)?


I think because looking up in a hash map is going to be bad for performance. You want it in an object so you can use fixed offsets. And using individual hash maps for each object is wasteful if they all have the same layout.


If performance is a concern, why on earth are you using JSON? Just the absurd amount of string-integer conversion kills any hope of that.


> If performance is a concern, why on earth are you using JSON?

Maybe JSON is what you get sent and you've got no control over that? Or JSON is what you want to expose and you've got no control over that either? You can want to achieve reasonable performance while still meeting external requirements.


If you're receiving JSON from outside of your system then the performance talk about converting to objects with fixed offsets and hashmaps with the same layout goes out the window, as you don't control the format.


You can meet an expected format in a fast path, and have a slow-path fallback for where it doesn't meet the expected format.


You shouldn't make major compromises to improve a 90 to a 100, if you could get to 1000 by simply choosing the right approach (like using a binary format when performance really matters).


JSON is effectively a data interchange format. And for data interchange formats, there are reasons to emit property names (for example, because you want humans to be able to read, and maybe even write it). The fact that you want to use JSON as your serialization format shouldn't require you to make your internal data formats use hash tables instead of structs.


I think worries about reflection are overthinking the problem for the most part.

This has carried me for 99% of all use cases along with a try/catch that barfs and an array specialization that makes a vector:

    template<typename T>
    T get(const char* key) { return boost::lexical_cast<T>( json.GetObject(key) ); }

The reality is that if I see some value of a type I wasn't expecting, something probably went wrong or changed in the protocol anyways.


Parsing is simple, you just need to be explicit about what you expect to parse. Like, write a straightforward 10-line parse function (or maybe better, a data description + your own code that you use for all your types) that fills your object's fields. Alternatively, if you have a super structured approach to your data (like you're required to have with template metaprogramming) then you can easily generate this stuff. Like, using a usable scripting language or such.

The idea that types can replace code is just broken. The type system is not a programming language. Or at least, as proven by C++/Haskell/..., not a remotely usable one.
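
A rough sketch of what such an explicit, field-by-field parse function can look like, using nlohmann::json only as the example DOM (the Config type and its fields are made up):

    #include <cstdint>
    #include <string>
    #include <nlohmann/json.hpp>

    // Hypothetical application type.
    struct Config {
        std::string host;
        uint16_t    port = 0;
        bool        verbose = false;
    };

    // Explicit about what we expect; anything missing or mistyped throws.
    Config parse_config(const nlohmann::json& j) {
        Config c;
        c.host    = j.at("host").get<std::string>();
        c.port    = j.at("port").get<uint16_t>();
        c.verbose = j.value("verbose", false);   // optional field with a default
        return c;
    }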


> Parsing is simple

It's obviously not simple - no two parsers have the same behaviour!

http://seriot.ch/parsing_json.php


Tip, if you want to defeat an argument by quoting only three words, make sure you're talking about the same thing.

To be clear, I was not talking about lexing JSON, but about how to configure a "meta-parser" like this library to actually convert some JSON to concrete application data objects. The point of contention was, "should we really use such a massive library, or shouldn't it be sufficient to have a simple bag of parsing primitives (which we can use to actually build the right thing)".

And I'm not a fan of JSON either...


Parsing JSON is not simple - not even when you know the structure you're parsing against. For example that reference shows even parsing numbers is not the same everywhere, so that applies even when the structure is as expected.

> Tip, if you want to defeat an argument by quoting only three words, make sure you're talking about the same thing.

Please don't be snarky - if you disagree say so and why.


> if you disagree say so and why.

My apologies. Just did that - I'm habitually a post-then-improve type of commenter.


WTF indeed. The comment made me check out the source code... I could only find some header files which a) have the complete implementation in them and b) of course include all their dependencies.

This is the exact nonsense that is happening all over the place, and why we need 512 GB of RAM in workstations to compile something.

I can't help but think that some basic knowledge about libraries is missing.


I can tell this is a C++ thread, heaven forbid you touch anything new within the last 20 years...

The JSON obsession is that everything under the sun supports it - and it's a nice, easy structure which can be converted to a dynamically or statically typed object with no issues. Plus it's very human readable, so it's great for debugging.

If you sit there and ONLY use C++, OK, why bother? But many systems need to talk to different systems these days. I can set up a web API that can communicate with my Angular TypeScript, C#, and MATLAB, all through nice easy JSON. Each language just converts it to the object and I'm good to go.

The question is why are you so against such a widely accepted format of communication?


Developers obsess about JSON for a simple reason: JSON works natively with JavaScript. If you have a backend service written in C++, then its JSON output is easily consumed by Node.js and the browser.

Of course there are other data transport protocols, XML and ProtoBuf to name a couple off the top of my head, but those are extremely heavyweight.

If you want to use JSON for config files, again, they are easy to make human readable and parse. Grab an off the shelf parser and support whatever scenario users may need. Sure we have yaml and ini, or whatever key=value config format your project creates, but you'll be much more on your own with how deep you want to support various value types.

As far as your complaint or desire for JSON to be minimized or eliminated as a philosophy, I'm not sure why this is the top comment about a very well made and loved C++ library to parse JSON. Are you hoping your comment will change people's minds about what library to use or whether their system needs JSON at all? I don't really understand your goal.


JSON is easier for humans to read and modify, and it's more straightforward to work with in languages like Python where you don't have to declare types. The JSON C++ implementation is large probably because it tries to provide one library that does all the dynamic things JSON can do, but in a static language. It also has to be safe to parse, so there's that.


For config files at least, I've moved entirely to the ini format. Python's configparser is handy for machine-generating complicated configs, and the nesting/list support in JSON adds way more complexity than I need compared to just using comma-separated lists as values.

This is my entire ini parser; it fills a map of sections, each a std::map<std::string, std::string>. I cast/parse the values at the use site:

    // Assumed context, declared elsewhere: a FILE* `file`, a char* `buf` and
    // size_t `len` for POSIX getline(3), a trim(std::string&) helper, a
    // printf-style info() logger, and `config` as a map of sections to
    // key/value maps.
    std::regex section_test("\\[(.*?)\\]");                           // [section] headers
    std::regex value_test("([\\w\\.]+)\\s*=\\s*([^\\+]+(?!\\+{3}))"); // key = value pairs

    while(-1 != getline((char**)&buf, &len, file))
    {
        std::string line(buf);
        trim(line);
        if(!line.size() || line[0] == '#') continue;   // skip blanks and comments

        std::smatch match;
        if(std::regex_search(line, match, section_test))
            current_section = match[1].str();
        else if(std::regex_search(line, match, value_test))
        {
            std::string key(match[1].str());
            std::string value(match[2].str());
            info("INIReader: %s::%s = %s\n",
                    current_section.c_str(), key.c_str(), value.c_str());
            config[current_section][key] = value;
        }
    }


Our project is using something like that - it started nice and simple, but has accumulated lots of ugly hacks by now. We have some values which use two different sub-delimiters, multiple “mini-formats”, values which contain the name of other keys, magic strings to represent “null”, and so on.

We got used to this, but results are pretty ugly. We get people confused about delimiter order, broken copy-paste, and so on. For the next project, I’d recommend something which can support complex structures. A schema validation would be nice, too.


This has to pull in std::regex though, which is pretty heavy.


std::regex is part of C++11 anyway, so there's nothing to integrate with your build system.

I put this code in a standalone cpp file so that it compiles to its own object in parallel with the rest of the system. I've never had issues with it with regards to build times.


Yes, but JSON is what everybody knows. I have seen a presentation from a not-too-small company where the engineer explained how much of a hurdle it is to have JSON everywhere in their big-data platform. I asked him why they do not use something like Avro instead. He just gave me this weird look, like I was asking something extremely stupid, and could not come up with any reason. Some engineers just have no idea of the alternatives or the pros and cons of choosing different file/message formats.


Well I work on embedded systems that have embedded webservers (think your router's webpage). And what data format is easiest to work with when dealing with webpages and browsers? JSON.

Hence, our embedded C++ backend can now easily accommodate and return JSON to the front end using this library.

Plus, what data format do you propose for things like configuration files, etc? No matter what format you choose, you are going to need a parser of that format for C/C++.


I think TOML is better, maybe with some modifications. I dislike that it has a person's name in it though; maybe we retcon it to Text Object Markup Language.

TOML has inline and multiline forms for most objects.


But in this context TOML and JSON are almost isomorphic (OK, parsing JSON is more unreliable than many realize, apparently). As far as I can see the problem is that it is difficult to convert an arbitrary JSON object to C++ objects. TOML (I also like it, for different reasons) isn't gonna help with that.


Wait, everyone is complaining you can't do `object.member.submember`? I don't see the issue with object['member']['sub']. It is slightly more typing, but such is life.


I do not know about everyone, but I believe it is hard to convert a flexible JSON object into a C++ struct (or something with comparable performance); my understanding is that this library helps with that.


It's better to do object.at("member").at("sub") so that it throws an exception instead of segfaulting if one of the keys doesn't exist.
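
Roughly like this (keys invented; the catch clause could be narrowed to the library's more specific exception types):

    #include <iostream>
    #include <nlohmann/json.hpp>

    using nlohmann::json;

    int main() {
        json o = json::parse(R"({"member": {"sub": 42}})");
        try {
            std::cout << o.at("member").at("sub") << "\n";      // prints 42
            std::cout << o.at("member").at("missing") << "\n";  // throws
        } catch (const json::exception& e) {
            std::cerr << "bad key: " << e.what() << "\n";
        }
        return 0;
    }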


Sure, I was kind of envisioning overloading []. It did make me think though: perhaps it is best to have an embedded interpreter for accessing member items? It is what gcrypt does, actually -- embedded lisp syntax. I always wondered why and I think now I know.


> In every application, any dependency on JSON should be minimized, contained and preferably eliminated.

How do you suggest interacting with $standardised_protocol, which uses JSON, in that case?


JSON is just a transport. Get the payload out of the library's structures and into your own data structures ASAP, to remove dependencies and to profit from your own setup.

And how to make the transformation? Using your own setup. No point in expressing (duplicating!) your own data structure definitions in a random library's DDL (which, apart from the inflicted duplication, will not fit very well since it doesn't know your project).


But this advice only really applies to projects where you create every service and layer. If you want to incorporate a 3rd party service, or want to allow your service to be used by others, you need some sort of common format. JSON isn't the most efficient, but it gets a lot of exposure from being a first class citizen of the web. Msgpack is good if you want efficiency. But you need something if you want your program to communicate with others.


> Get the payload off from library structures to your own data structures ASAP,

I don't understand, doing this step is exactly why you need to have a library such as the one which is being discussed here. There's no in-language way in C++ to go from a string that looks like `{ "foo": [1,2,3] }` to `struct { int foo[3]; }`.


IMO we should not have a multi-dozens-KLOC library for something that should be almost invisible. And I don't think that being invisible is the purpose of this library.

I would just specify your case as a data item:

    ARRAY_ITEM("foo", foo, INT, 3)
And that's really all, specification-wise. Here ARRAY_ITEM is a simple macro that calculates the offset of the array within the struct (you need to give the struct a name!), so it results in something like { .srcfield="foo", .typekind=TYPE_ARRAY, .arraylength=3, .basetype=INT, .offset=offsetof(MyStruct, foo) }. One can compute the length of the C array at compile time and verify that the lengths match, so there will be absolutely no danger of writing a wrong data item.

The corresponding code that handles the data items of kind ARRAY_ITEM would be something like this:

    case TYPE_ARRAY: {
        char *store = add_offset(&myStruct, spec->offset);
        int elemSize = typeKindToElementSize[spec->typekind];
        JsonArray *jsonArray = get_json_array(jsonObject, spec->srcfield);
        for (int i = 0; i < spec->arraylength; i++) {
            JsonValue *jsonValue = get_json_array_element(jsonArray, i);
            /* This makes sure the dynamic type of jsonValue matches, then
            does the appropriate conversion and stores the thing. We also
            want to check errors, either now or at the end. */
            store_json_value(jsonValue, spec->basetype, store + (i * elemSize));
        }
    }
You'll need <10, maybe 5, of these cases, and basically it's all code that you will test once and then it just works without surprises. This is less than a day of work initially, will have very low maintenance, will work exactly with the type representations you choose, and will be easy to change. The other aspect is that you're now free from a library that you don't understand, that will give you Modern C++ headaches, and that will be slow to compile.

I'm likely to write a new version of this for every new project, to avoid dependencies and to allow for change.


Well, that's a possibility, but it only works in the simplest case of C-like structs. What if I have this API instead:

    struct MyWidget {
        void setFoos(const std::vector<int>& foos) { m_foos = foos; updateUI(); }
    };
Or this:

    struct MyWidget {
        void addFooInstance(int);
    };
Also, ugh, macros. What happens if you talk to 5 different network protocols, all with more-or-less-JSON-like semantics - do you now have ARRAY_ITEM_JSON, ARRAY_ITEM_BSON, ARRAY_ITEM_CBOR, ARRAY_ITEM_YAML? What about when the format changes so that `ARRAY_ITEM("foo", foo, INT, 3)` now wants floats? Woohoo, magic truncation instead of a compile error.

As they say: thanks, but no thanks. I'll stay with `for (auto& [key, value] : o.items())`, which ensures that everyone, including the fresh-out-of-school student, can understand what happens without needing to read obtuse macro definitions.


> What if I have this API instead

I say let data be data and write code where you need code. How does the library handle this in one step? How long does it take you to find out?

> Also, ugh. macros.

Data macros are the best. You don't have to debug them at runtime, and they let you get rid of a lot of boilerplate, such as compile-time computations (e.g. offsetof(), sizeof()).

> do you now have ARRAY_ITEM_JSON, ARRAY_ITEM_BSON, ARRAY_ITEM_CBOR, ARRAY_ITEM_YAML

I'm really sorry if you have to do that. I never had, and likely never will. But I'd probably just write the example I gave, in 5 flavours, in 5 separate implementation files. The alternative, dealing with 5 oversized libraries that approach this thing in a totally different way, is not appealing to me at all.

> What when the format changes so that `ARRAY_ITEM("foo", foo, INT, 3)` now wants floats ? woohoo, magic truncation instead of a compile error.

No, a runtime error message about a wrong type in a JSON payload. Or if you mean this: the type of the C array's elements changed from int to float. Then just check the type of the array elements against the specified type (INT), which should expect a specific C type. There are easy ways to get a representation of the array element type in C++ as well as in C (the macro can do it automatically).

> obtuse macro definitions

    #define ARRAY_ITEM(_srcfield, _dstfield, _basetype, _arraylength) \
        { .srcfield=_srcfield, .typekind=TYPE_ARRAY, .arraylength=_arraylength, \
          .basetype=_basetype, .offset=offsetof(MyStruct, _dstfield) }

If you think that is obtuse reconsider C++ templates.


When implementing that protocol, the dependency on JSON should be minimized, contained and preferably eliminated.


Yep, this is why I like JsonCpp: there's a single-file build, and I've had success using it with Repl.it: https://neverfriday.com/2013/07/26/learning-jsoncpp/

Since JSON only has basic data types, it's easy to just parse and retrieve objects, similar to using XML tree parsing libraries back in the day.
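
For example, tree-style retrieval with JsonCpp looks roughly like this (the file name and keys are made up):

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <json/json.h>

    int main() {
        std::ifstream in("settings.json");   // hypothetical config file
        Json::Value root;
        in >> root;                          // parse the whole document into a tree

        std::string name = root["name"].asString();
        int retries      = root["retries"].asInt();
        for (const auto& item : root["items"])   // arrays are iterable
            std::cout << item.asString() << "\n";

        std::cout << name << " " << retries << "\n";
        return 0;
    }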


Then you should be enlightened:

Of the myriads of existing transfer or serialization formats,

* JSON is still the only secure by default one (if you ignore the two later updates, which made it insecure),

* JSON is by far the easiest to parse (small and secure, no references),

* is natively supported by Javascript.

It has some minor design mistakes, and is not binary (unlike msgpack, which is therefore faster), but it is still a sane and proper default. In particular, it does not support arbitrary objects and other extensions (with their initializer problems under MITM attacks), and documents are properly terminated. Unlike XML, YAML, BJSON, BSON, ..., which have major problems, or msgpack, protobuf, ..., which have minor problems.

In every application, any dependency on XML should be minimized, contained and preferably eliminated.


> JSON is still the only secure by default one (if you ignore the two later updates, which made it insecure),

And enough people put it into eval that several companies started prepending while(1); to their JSON messages. Don't blindly trust user input no matter what format it comes in.

> JSON is by far the easiest to parse (small and secure, no references),

We all know its definition fits on a business card, which is the reason we have at least seven different specifications[1] on the details omitted from the card.

[1]http://seriot.ch/parsing_json.php


> And enough people put it into eval that several companies started prepending while(1); to their JSON messages.

while(1); existed to work around a browser vulnerability where the same-origin policy could be bypassed by including cross-origin JSON as a <script>. Nothing to do with eval.


> JSON is still the only secure by default one (if you ignore the two later updates, which made it

How can a data format be insecure?

> JSON is by far the easiest to parse (small and secure, no references),

It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be). But it also offers no structural integrity, which means that you need to augment the parsing code with terribly ugly validation code. (I don't really use JSON frequently, but the attempts at providing a principled validation framework on top of JSON that I've seen were... unsatisfying).
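
The kind of hand-rolled validation meant here, sketched against nlohmann::json (field names invented):

    #include <stdexcept>
    #include <nlohmann/json.hpp>

    using nlohmann::json;

    // Parsing succeeded, but the shape still has to be checked by hand.
    void validate_user(const json& j) {
        if (!j.is_object())
            throw std::runtime_error("expected an object");
        if (!j.contains("id") || !j["id"].is_number_integer())
            throw std::runtime_error("'id' missing or not an integer");
        if (!j.contains("tags") || !j["tags"].is_array())
            throw std::runtime_error("'tags' missing or not an array");
        for (const auto& t : j["tags"])
            if (!t.is_string())
                throw std::runtime_error("'tags' must contain only strings");
    }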

> In every application, any dependency on XML should be minimized, contained and preferably eliminated.

That's missing the point. It's not about JSON vs XML vs whatever. It's that application logic shouldn't be operating on transport data formats. And even if JSON has a simple representation as runtime data objects (in most scripting languages), that representation is pretty far from ideal compared to a representation tailored towards your specific application logic.


> How can a data format be insecure?

If it tries to be too featureful, then it requires parsers to be very flexible. Malicious input can trick a too-flexible parser into doing things that the developer of whatever function called the parser probably didn't want to happen.

For example, YAML:

https://arp242.net/yaml-config.html


I have this overview:

https://metacpan.org/pod/Cpanel::JSON::XS#SECURITY-CONSIDERA...

which just misses details on stack-overflows on overlarge nesting levels, or denial of service attacks on overlarge strings, arrays or maps.

Better formats which prepend sizes do have an advantage here, such as msgpack. But msgpack has no CRC or digest verification to detect missing or cut-off tails. JSON (its secure 1st RFC, 4627) must be properly nested, so it does not need this. From the 2nd RFC, 7159, on it became insecure, and the 3rd RFC, 8259, is merely a joke, as it didn't fix the known issues and only removed a harmless feature.


Could you elaborate on how the 2nd and 3rd versions are insecure and a joke? I've reread them and see no issues with either. Basically, apart from clarifications and fluff about limits, security, and interoperability, the only differences in the JSON spec itself are allowing any value at the top level and requiring UTF-8 for cross-system exchange.

Since UTF-8 is the only sensible encoding for JSON, it makes little sense to require UTF-16 and UTF-32 support. (Unless you have some special requirements on encoding, in which case you can just disregard that part and convert on both ends yourself.)

The only "issue" with non-object values I see is the one mentioned in the above link where naively concatenating JSON might lead to errors when you send two consecutive numbers but that's going to rarely happen so your system can just reject top-level numbers if it doesn't expect them. And even then the simple solution is to just add whitespace around it.


The 2nd version made it insecure, as scalars are not delimited anymore, and MITM or version mismatches can change the value.

schmorp wrote this:

> For example, imagine you have two banks communicating, and on one side, the JSON coder gets upgraded. Two messages, such as 10 and 1000 might then be confused to mean 101000, something that couldn't happen in the original JSON, because neither of these messages would be valid JSON.

> If one side accepts these messages, then an upgrade in the coder on either side could result in this becoming exploitable.

The 3rd version was a joke, because the outstanding problems were not addressed at all, and removing BOM support for the 4 other encodings is just a joke. First, you cannot remove a feature once you explicitly allowed it, esp. since it's a minor and an almost unused one.

And remember that http://seriot.ch/parsing_json.php was already published then, and the most egregious spec omissions had been known for years already, such as the undefined order of keys, or whether duplicate keys are allowed. Allowing unsorted keys is also a minor security risk, as it exposes the internal hash order, which can lead to recovering the hash seed.


> It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be).

Wrong. CSV is horrible to parse with its string quoting rules. JSON accepts only UTF-8 and is not misled by \ or ",". Having done both: JSON is much simpler and can represent nested structures, arrays or maps. CSV importers are mostly broken and should be left to Excel folks only.

> It's that application logic shouldn't be operating on transport data formats

Nobody is talking about application logic here but you. JSON is one of the best transport data formats, and this library makes it much easier to encode/decode from/to JSON and C++.

We are not talking about JavaScript's hack of preferring JSON over full objects. Of course an internal representation is always better than an external one, esp. such a simplified one. But for transports, simple formats are preferred over complicated serialized ones, because with the latter you can easily get MITM stack overflows or abuse of side effects when creating arbitrary objects.


Exactly this.

If you didn't do a compilers class and you want a simple language to play around with for lexing/parsing, JSON works great. Here's the core of a probably correct JSON lexer, albeit a super inefficient one, that I whipped up in <110 lines of Python a few years ago out of curiosity[1].

By comparison, check out the state machine transitions at the heart of the cpython implementation of the csv module[2]. It's not really a fair comparison (my JSON lexer is written in Python and uses regexes, the csv parser is written in C and does not use regexes) but even ignoring how nicely the csv parser handles different csv dialects, I still find it strictly more complex.

[1]: https://github.com/chucksmash/jsonish/blob/master/jsonish/to...

[2]: https://github.com/python/cpython/blob/41c57b335330ff48af098...


You misread me as well. I'm not saying that CSV is a good format. It's not, because it is ill-specified.

All I'm saying is that flat database tuples are even easier to parse than JSON (which is nested, so requires a runtime stack). It was a total side note (in parentheses!).

My main argument is that JSON is a mess to validate.


>> It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be).

> Wrong. CSV is horrible to parse with its string quoting rules.

I made a reasonable attempt at preventing exactly this misunderstanding, but I guess if people want to misread you they will.

> Nobody is talking about application logic here but you.

The application logic (I'm including internal data representation here) is the meat of all efforts, so you should absolutely be considering it, instead of needlessly putting isolated efforts in optimization of side actors. Parsing JSON should IMHO be a total side concern, and using an oversized library like the one we're talking about fails to acknowledge that. Such a library is either a wasted effort (if its data types aren't used throughout the application), or else (if they are) use of the library leads to massive problems down the road.


CSV as a format doesn't really exist. CSV is a family of similar but not always compatible data formats each with their own special rules and edge cases.

Note that the quote is talking about a well-specified CSV, not any CSV in general. A well-specified CSV would indeed be fairly easy to parse.


> CSV as a format doesn't really exist.

RFC-4180


That was created after how many years of CSV in the wild? Nobody disagrees here that parsing CSV in practice is a horrible minefield with lots of manual adjustments.


RFC-4180 is dated 2005 - so your statement that a standard "doesn't exist" has been out of date for 14 years.

Yes of course there was no recognised standard before that. Just like before Greenwich Mean Time there was no recognised standard for universal time coordination ...


It's not my statement, and also please let's not split hairs but look at the actual situation in practice. (Also, RFC-4180 sucks. It only codifies a subset of existing - bad - practice).


Apologies for the misattribution.

The situation in practice is that when people want a “standard” way to do CSV, there is in fact a standard they can use, that does cover most sensible things you’ll want to do with CSV, and addresses the most common corner cases (eg delimiter in field) in a fairly sensible way.

You are still free to make whatever proprietary extensions you like, at the risk of losing compatibility, just as you are with any other standard.


> How can a data format be insecure?

XML external entities allow for arbitrary file inclusion: https://en.wikipedia.org/wiki/XML_external_entity_attack

You can make a badly configured XML parser allocate memory until it crashes: https://en.wikipedia.org/wiki/Billion_laughs_attack

You cannot host user-generated XML files on a domain without making yourself vulnerable to cross-site scripting attacks. Browsers will happily execute any JavaScript you include: https://stackoverflow.com/questions/384639/how-to-include-ja...


> And even if JSON has a simple representation as runtime data objects (in most scripting languages), that representation is pretty far from ideal compared to a representation tailored towards your specific application logic.

Often you have control over the sending and receiving parts of an API and can design the transport format to fit your application logic.

For example, I am currently using a GraphQL API and the data structures I get from this API are exactly what I need in my application.


> In every application, any dependency on XML should be minimized, contained and preferably eliminated.

Of course, it's not limited to JSON. For applications, it should be a tiny insignificant detail what format is used for serialization. Be it JSON or some superior format, such as XML.


> In every application, any dependency on JSON should be minimized, contained and preferably eliminated.

It depends on the extent to which your program needs to communicate/interoperate with other programs... in particular ones not written in the same programming language (no Java RMI).

Under a "Unix" philosophy, this happens quite a bit.


How does this compare with RapidJSON, JSONCpp and JSON Spirit - other popular C++ JSON parser libraries?

Links:

http://rapidjson.org/ https://github.com/Tencent/rapidjson

https://github.com/open-source-parsers/jsoncpp

https://www.codeproject.com/Articles/20027/JSON-Spirit-A-C-J...


We used to use it. We first micro-benchmarked all of those and we found that Tencent/rapidjson was more than 100% faster than nlohmann/json :)

Our primary use is to parse http://cocodataset.org/ metadata files and RapidJson is 100% faster.


RapidJson is very efficient. I have used several JSON libraries for C++, and right now I'm really enjoying json_dto, which is built on RapidJson. It has great developer UX, including the most succinct syntax I've used, and support for std::optional.

https://github.com/Stiffstream/json_dto


> 100% faster.

So, twice as fast - it parses the dataset in half the time?


On the other hand, nlohmann/json has a cleaner and more Python-like API, so if you don't care about performance that much, I'd say it's the way to go


Does anyone write C++ and not care about performance?


We use C++ and care about real-time performance, but also use this JSON library; the thing is that we use JSON for loading and configuring data, which happens at load time, which is infrequent, and we don't need JSON during the critical update path.

Previously we used RapidJSON, which is faster, but the syntax of nlohmann json is nicer and it is much quicker to develop with. We also use Python a lot, so the familiarity between the two is also useful; some of the other C++ JSON libs can have quite convoluted APIs for dealing with values, objects and arrays.


On the other hand: I'm a bit surprised at how many people are writing JSON and care about the performance.

The overwhelming majority of our IO is reading / writing data files, but those are stored in an optimized binary format. The configuration / metadata accounts for a much smaller fraction of IO. For this tiny fraction we care about flexibility and readability, a slow JSON parser is fine.


While I agree that in many cases JSON parsing is not the largest consumer of resources, it really sucks when it is.

At some point it seems like a general mindset shifted from making things efficient at every level to assuming things don’t matter if you’re probably doing something worse anyway.


If you use C++ for CPU-intensive tasks, then yes, in this case. E.g. reading 1kb of input, churning numbers for 10 minutes and outputting an h5 file. If you use C++ for IO-intensive tasks, then your tradeoff is probably different and you would optimise your IO for speed and not readability.


If you know you only get small JSON files, e.g. for configuration and so on, then - yes, you might not care about the parser performance and be more interested in the elegance of its API.


This is not a Boolean thing. If I spend a "long" time processing the data in my business logic, the time spent loading the data or storing the result can be negligible.

If reading/writing is the hot path things are different, though.


First of all absolutely, but more importantly who cares about the performance of your json parsing? I can only think of very few applications where that would be relevant at all, even if you care about performance in general.


You don't necessarily need everything to be performant.


If that's a configuration file which is read once - yes


This is a very convenient library for getting started. It even has a mode that made it possible for me to parse numbers into a decimal class.

I would point out that its convenience in large part depends on extensive use of cast operators, which can lead to some tricky corner cases. It is also the only component in one of my projects that confuses xlclang++ on AIX.


I switched from rapidjson to this library (nlohmann's) because it is so simple to use and integrates into C++ very cleanly.

If I had enormous globs of JSON to read I might use something faster (I believe there's a very fast library that uses SIMD instructions) but my needs are quite limited: basically exchanging snippets of JSON with a network service. So clarity was far more important than performance.


My only mild complaint with this particular JSON library is that it's fairly easy to shoot yourself in the foot with the compile times if you don't explicitly ensure that you use the json_fwd.hpp header in your headers.
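
Roughly the idea, assuming the library's usual single-include layout (Widget is a made-up class):

    // widget.hpp -- only forward-declares nlohmann::json, so this header stays cheap
    #include <nlohmann/json_fwd.hpp>

    struct Widget {
        void load(const nlohmann::json& j);   // fine: parameter by reference
        nlohmann::json save() const;          // fine: return type in a declaration
    };

    // widget.cpp -- the one place that pays for the full header
    #include <nlohmann/json.hpp>
    #include "widget.hpp"

    void Widget::load(const nlohmann::json& j) { (void)j; }
    nlohmann::json Widget::save() const { return nlohmann::json::object(); }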


Could you elaborate on how to speed up compile times? We use this library extensively and I've never heard of "json_fwd.hpp" until now.


Kudos on the very thorough readme, complete with CMake instructions; most of the time people take it for granted that you know how to embed libraries and dependencies.

Writing documentation, and easing users into your project, is an underrated skill.


The fact that instructions are even required shows how poor the state of c++ packaging is.


No, it is not. The instructions are for users of the lib who are not using a package manager. If you are using one, in my case vcpkg, then no special instructions are required.


Actually, detailed documentation belongs in a wiki or a separate markdown document; that README.md is a bit much...


The wiki is a GitHub thing and will not be included in your git clone.

I personally prefer a doc/ folder with RST files in it. That's what the Linux kernel does.


Wikis on GitHub are themselves git repos and can be cloned separately: https://help.github.com/en/articles/adding-or-editing-wiki-p...


Yeah, but they have to be cloned separately. And not all projects have them.

A doc directory gets you the same thing, easier.


Makes me wonder whether people have tried making the wiki repository a git submodule/subtree of the code repository.


This is a solid project and I've been using it in many different projects for years. The developer team is responsive and cares about the quality of the project as well as people's needs. I cannot recommend it enough.


Yea, I replaced cJSON with this library last year on one of my personal projects and never looked back. Good library. I came to this thread thinking, "Oh, did someone find something nicer than nlohmann json? Now I'm going to have to take some time to investigate it!" Turns out it's the one I've been using.


Note that this library basically requires exceptions to be enabled, which may or may not be the case in your environment.

(It is actually possible to use without, but then exceptions become aborts, and you have to dance around that.)
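
One way to dance around it, if I read the parse API correctly: validate first with accept(), or parse with exceptions suppressed and test for the discarded value:

    #include <iostream>
    #include <string>
    #include <nlohmann/json.hpp>

    using nlohmann::json;

    int main() {
        std::string text = R"({"broken": )";   // invalid on purpose

        // Option 1: validate without building a value.
        if (!json::accept(text))
            std::cerr << "not valid JSON\n";

        // Option 2: parse with exceptions disabled; errors yield a discarded value.
        json j = json::parse(text, /*callback=*/nullptr, /*allow_exceptions=*/false);
        if (j.is_discarded())
            std::cerr << "parse failed\n";

        return 0;
    }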


I don't understand those single-header libraries. Anyone writing code in C or C++ knows how to link a library. It's really annoying when writing code for a platform with limited RAM.


Unfortunately, template-heavy libraries pretty much have to be header-only; otherwise the compiler can't see the template definitions at the points where the flavours of the template you want to use get instantiated.
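
The underlying rule in miniature (nothing library-specific here):

    #include <iostream>

    // A template must be visible (usually from a header) at every point of
    // instantiation: the compiler generates clamp01<int>, clamp01<double>, ...
    // on demand. If only a declaration were visible here and the definition
    // lived in some other .cpp, those instantiations could not be generated
    // (unless every needed flavour were explicitly instantiated there).
    template <typename T>
    T clamp01(T v) {
        if (v < T(0)) return T(0);
        if (v > T(1)) return T(1);
        return v;
    }

    int main() {
        std::cout << clamp01(3.7) << " " << clamp01(-2) << "\n";  // prints: 1 0
        return 0;
    }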


> In languages such as Python, JSON feels like a first class data type.

Is that because the Python dictionary happens to look so much like JSON, or what do they mean?


I lost my mind at this sentence. It feels like a first-class data type because the result of parsing is one of the built-in data types (which can be round-tripped to a similar JSON string). And as soon as you care about serialisation of types it starts feeling incredibly clunky.


Yeah exactly. The equivalent would be to have a JSON parser in C++ which turns a JSON file/object into a C++ object that is accessed in similar ways to other typical C++ objects. So: {"key": "value"} becomes json->key = "value" rather than json["key"]. In this library I feel as if they've used operator overloading to emulate the way JSON is used in other languages, rather than making the JSON feel like a first-class C++ citizen, so to speak.


I think, as far as other languages are concerned, the best JSON handling comes from Serde in Rust and from F#, from what I read. If only C++ metaprogramming were easier, without crazy complexity, a JSON library could be made much smaller and more efficient with less work.


If you want to convert it to an object, have you tried using dataclasses? I haven't used them much, but last time I tried them they felt much easier than trying to use the native dict/list/str/int/etc. that gets returned by default.


Yeah. It's the custom serialisation & schema/ctor arg validation that makes it clunky.


Yes, though more the other way around: part of JSON's appeal (on purpose) is that it covers the core types of most dynamically typed programming languages (strings, numbers, booleans, arrays and maps), so moving data to and from JSON is straightforward and convenient.


This is part of it. It is extremely easy to import json, write and read JSON files, and map them to Python dicts.


Yeah, that's probably what they meant.


I'm wondering what the difference between this and json-cpp https://github.com/open-source-parsers/jsoncpp is. They look like they provide the same functionality, albeit this looks more "modern C++ style"?


For starters nlohmann/json is fuzzed (look at the test directory). From my experience it is really stable. jsoncpp still has some quirks as it relies on unit tests only. Big no no for me.


Fuzzed? That's a rarity in tests, that's awesome!


nlohmann's is super nice. You can set a json object equal to a map or vector and it will automatically understand and do it. I doubt you can do that with jsoncpp.
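
For example (container contents invented):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>
    #include <nlohmann/json.hpp>

    using nlohmann::json;

    int main() {
        std::vector<int> v = {1, 2, 3};
        std::map<std::string, double> m = {{"pi", 3.14}, {"e", 2.72}};

        json j;
        j["values"] = v;     // becomes a JSON array
        j["constants"] = m;  // becomes a JSON object

        std::cout << j.dump(2) << "\n";
        return 0;
    }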


I have been using json11 for a few years now (https://github.com/dropbox/json11). It looks like it has been abandoned though. This seems like a great alternative.


I wouldn't say abandoned, it states in the README:

Maintenance note: This repo is stable but no longer actively maintained. No further development is planned, and no new feature PRs will be merged. Bug fixes may be merged on a volunteer basis.

With JSON being an unchanging spec this makes sense.


People are biased towards GitHub repos with recent commits, as opposed to projects that are stabilized and frozen. This is a loss, considering the man-hours spent on creating a stable library.


Still 13 open issues though. Most of them are marked as enhancement, but 3 as bugs.


It's a great library; however, 99% of its 'wow' moment can now be easily replicated with a few meta checks and can be adapted to basically any 3rd-party JSON system.


I used this library on bare metal arduino a year ago. It felt funny doing high level programming on such crappy hardware!


We just use the good old Qt JSON parser/generator.

It's basic and has no compilation speed issues.


Gource for this repo: https://youtu.be/cyaCZDffAbQ




