JSON for Classic C++ (github.com/jart)
117 points by davikr 21 days ago | 89 comments



I remember searching for a JSON library with minimal dependencies a while ago, and came across this:

https://rawgit.com/miloyip/nativejson-benchmark/master/sampl...

The variance in feature set, design and performance is huge across all of them. I ultimately landed on libjson, written in C: https://github.com/vincenthz/libjson

It does a lot for you, but it notably does not build a tree for you and does not try to interpret numbers, which I found perfect for adding to languages with C FFI that have their own collection and number types. It’s also great for partial parsing if you need to do any sort of streaming.

It looks like this one can’t currently do partial parsing, but it looks great if C++ maps/vectors are your target.


If you want to go extremely lightweight, there’s jsmn: https://github.com/zserge/jsmn

It does no dynamic memory allocation, which is a plus in constrained IoT/embedded applications. But it’s really only a tokenizer. For example, if you want to parse fields out of a map, you have to write your own wrappers to iterate over key/value pairs. Since no data is copied out of the original buffer, all the “tokens” are given as byte offsets and lengths, not null-terminated strings, so you can’t just do printf("%s").

If you can’t (or don’t want to) malloc, it gets the job done. Not sure I’d recommend it for other applications though.
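The kind of wrapper described above might look like this minimal sketch. The `Token` struct and the helper names are hypothetical: they are loosely modeled on a tokenizer that reports a type, byte offsets, and a child count without copying, but this is not jsmn's actual API, and the tokens would normally come from the tokenizer rather than being filled in by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical token: byte offsets into the original buffer, nothing copied.
enum TokType { TOK_OBJECT, TOK_ARRAY, TOK_STRING, TOK_PRIMITIVE };
struct Token {
  TokType type;
  int start;  // offset of first byte (inside the quotes for strings)
  int end;    // offset one past the last byte
  int size;   // direct children (for objects: number of key/value pairs)
};

// Compare a string token against a NUL-terminated key without copying.
bool token_equals(const char *json, const Token &t, const char *key) {
  std::size_t len = static_cast<std::size_t>(t.end - t.start);
  return t.type == TOK_STRING && std::strlen(key) == len &&
         std::strncmp(json + t.start, key, len) == 0;
}

// Find the value token for `key` in a flat object whose tokens follow
// `obj` in order: key, value, key, value...  Returns -1 if absent.
// Assumes scalar values only (no nested containers to skip over).
int object_get(const char *json, const Token *toks, int obj, const char *key) {
  int i = obj + 1;
  for (int pair = 0; pair < toks[obj].size; ++pair, i += 2) {
    if (token_equals(json, toks[i], key)) return i + 1;
  }
  return -1;
}

// Print a token's text with "%.*s", since it is not NUL-terminated.
void print_token(const char *json, const Token &t) {
  std::printf("%.*s\n", t.end - t.start, json + t.start);
}
```

Every lookup is a linear scan over the pairs, which is fine for small config-style objects; nested values would additionally need a helper that skips a token's whole subtree.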


I actually evaluated and used jsmn and almost mentioned it in my comment. It was really quite cool, but I believe I couldn’t use it due to the lack of UTF-8 validation. Because UTF-8 validation is in the state machine for libjson, I can actually ignore incomplete UTF-8 escape sequences in incomplete JSON strings when streaming.


Is there a reason you can't do printf("%.*s", strlen, strptr); ?


A non-allocating library would be forced to return to you the unparsed string literal, since returning a parsed string can require an allocation. It might tell you that the literal is valid.

E.g., take the JSON:

  "\"\uD83D\uDCA9\""
There's no (pointer, length) into that that you can then printf(…, ptr, len); you'd get the escapes, raw.

Of course, there might be situations like debugging where that's fine.
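To make the allocation point concrete: decoding even one escape changes the byte length, so the decoded result has to live somewhere other than the input buffer. A minimal sketch of just the surrogate-pair step, assuming the caller has already located the two `\uXXXX` escapes; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Parse exactly four hex digits (the XXXX of a \uXXXX escape).
// Returns -1 on any non-hex character, including a premature '\0'.
long parse_hex4(const char *p) {
  long v = 0;
  for (int i = 0; i < 4; ++i) {
    char c = p[i];
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else return -1;
    v = v * 16 + d;
  }
  return v;
}

// Combine a UTF-16 high/low surrogate pair into one code point.
// Returns 0 if the two values are not a valid high+low pair.
std::uint32_t combine_surrogates(long hi, long lo) {
  if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) return 0;
  return 0x10000u + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10) +
         (static_cast<std::uint32_t>(lo) - 0xDC00u);
}
```

Here the twelve bytes `\uD83D\uDCA9` collapse into the single code point U+1F4A9, which then encodes to four UTF-8 bytes, so a tokenizer that never writes anywhere can only hand back the raw, still-escaped span.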


The terminating character is still the closing double quote and not a null, since the library neither copies out nor alters the input. For example, tiny_json replaces the closing quotes to create C strings, but that needs the full file to be in a mutable buffer, which can be prohibitive for small controllers reading some config from flash only.


With the "%.*s" format you need no null at the end. It just counts out the characters:

    #include <stdio.h>

    int main(void)
    {
      char buff[10] = {'R', 'o', 'b', 'o', 't', 't', 'y', 'p', 'e', 's'};
      /* buff has no terminating null; the precision (4) caps how many bytes printf reads */
      printf("=>%.*s<=\n", 4, buff + 3);
      return 0;
    }
prints

    =>otty<=


Ah. Ok. Scanning the length before printing is mandatory then.


@msarnoff had already stipulated that the json lib is returning lengths:

    all the “tokens” are given as byte offsets and lengths


You have to do (a variant of) one or the other, no?


Yes, that’s exactly what I’ve done.


Any other issues with it? Sibling comments mentioned potential unicode issues.


I've only used it in an application where we ensure the data is ASCII-only. The only issue is that I've had to write a bunch of wrapper code around it (for looking up object properties by key, iterating over key/value pairs or array elements, etc.)


I settled on this one too.

Far better than nlohmann's monster build times.


Did you try cJSON? Works well for me. https://github.com/DaveGamble/cJSON


Somehow I didn't run across that one in my searching - I'll check it out. I've been working on a json C library myself:

https://github.com/nwpierce/jsb

My goal was to convert a stream of JSON to/from a binary stream that is easier to traverse and manipulate.


Compile time is largely a "developer problem", but so is the usability of a library. The main selling point of nlohmann/json is that its interface is usable. Whether a developer values usability at typing time or compile time is an interesting thing to ponder for sure.


Compile time is a collective problem and usability is an individual problem. I work with llama.cpp. The files in that codebase that were made using nlohmann json take about a minute to compile using g++ -O3 -g, all because one guy who originally wrote it wanted to type fewer keystrokes on his keyboard by using a more magical library, and the rest of us have to suffer for it every time we experiment with a 1 line of code change to those files.


> (...) and the rest of us have to suffer for it every time we experiment with a 1 line of code change to those files.

If you feel this is an issue then why don't you move it to an independent submodule that can be compiled independently? That means you can build it in parallel along with the whole project, and in the end you just link the resulting binaries.


> If you feel this is an issue then why don't you move it to an independent submodule that can be compiled independently?

If it’s a header, you necessarily can’t. A header gets included every time you compile code that depends on it.

Compilers may offer precompiled headers etc., but if the code you want to change has a direct dependency on a large header, you need to recompile everything that depends on it.

This is one of the pain points of C++.


It is a pain point of build management regardless of the language, even with a language having proper modules one can have a cascade build, if the public interface or module ABI is impacted.

C++ modules are here; unfortunately, outside the latest VC++ and Clang, plus MSBuild or CMake/Ninja, they are not an option.


Are they? According to some people (in a GitHub issue about supporting C++ modules in VSCode) the standard is a mess and is likely to go away. VSCode doesn't support modules at the moment.


Assuming that you're referring to https://github.com/microsoft/vscode-cpptools/issues/6302, I see two comments along these lines, neither of which is from actual implementers. That isn't evidence either that the standard is a mess or that it's likely to go away.

The reason why this is taking such a long time is because the entire approach is a rather drastic change to how C++ compilers usually work, and C++ compilers (or even frontends, such as the stuff used by IDEs) are complicated things that aren't trivial to make major changes to.


I don't understand why they don't just double down on these #include <__fwd/vector.h> etc. headers. They fix everything. The only downside to them is I can't use them on a header that defines a class with std::vector member variables. But if they could make a small tweak to the language so that I could, then I'd take that over the promised modules revolution any day.


Yeah, but the people implementing it are probably more interested in you moving to VS, so there's a slight conflict of interest. I appreciate their work, but I'm also a bit sceptical and think this is the main reason behind it. Six years is a long time for such an important feature. They don't need to support it on all compilers/frontends in order to release it.


> Yeah but the people implementing it are prob more interested on you moving to VS (...)

Wasn't support for modules first added to Clang and CMake?

Also, I don't think you fully grasp the scope of this change. Some projects are still stuck on C++1 to avoid having to pay for the technical debt of upgrading the compiler. This is a flag flip away.

For modules, you need to rework your whole build system and even rearchitect your projects.

Think about that for a second.


Visual Studio is what matters.

VSCode is never going to be as good; you are better off with CLion then.


This!

As a Windows/Mac user, Visual Studio is my preferred tool, but on macOS CLion (with VSCode for a few random workflow things not supported in CLion) is an adequate replacement (though Visual Studio remains king).

VSCode can be used as an industrial editor if one likes to, but if it does not feel right, it’s not a skill issue.


I just wrote a new server instead. There's nothing I won't do, no lengths I'm not willing to go, when it comes to cutting back on build latency.


I follow the same philosophy, to the point where I barely use the STL; most of that template-heavy junk has been replaced in most of my projects. For instance, most of what I typically used <iostream> for was replaced with a 150-line .h (plus a 50-line .cpp that uses explicit template instantiation and a <charconv> include). {fmt} was too heavy for me. And I'm locked into C++17 because C++20 seems to double down on the 20k-line header madness.
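A rough sketch of that kind of header/.cpp split, assuming C++17 for `std::to_chars`. The `to_str` name is hypothetical. In a real project the declaration would sit in the small header and the definition plus explicit instantiations in the one .cpp that includes <charconv>; both halves are shown in one block here only so it compiles on its own.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// ---- would live in the lightweight header (no <charconv> include) ----
template <class T>
std::string to_str(T value);  // declared only; defined in the .cpp

// ---- would live in the single .cpp ----
template <class T>
std::string to_str(T value) {
  char buf[32];  // enough for any 64-bit integer in base 10
  std::to_chars_result r = std::to_chars(buf, buf + sizeof(buf), value);
  if (r.ec != std::errc()) return std::string();
  return std::string(buf, r.ptr);
}

// Explicit instantiations: these are compiled once, in this file.
// Other translation units see only the declaration and link against them.
template std::string to_str<int>(int);
template std::string to_str<long>(long);
template std::string to_str<long long>(long long);
template std::string to_str<unsigned long long>(unsigned long long);
```

The trade-off is that callers can only use the listed types; anything else fails at link time rather than silently instantiating more code in every file.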

When I was stuck with C++ codebases that forced me to take a mandatory coffee break every time I needed to run a bit of new code, it made me a little bit insane! Never again.


> I just wrote a new server instead.

I'm sorry, this makes no sense at all. Why would anyone write a new server just because a small component was taking a minute to build?


It makes no sense at all that a person concerned with slow build times rewrote a slow component to compile faster?


As a prolific contributor to open source yourself, I’d have expected you to be a little more sympathetic to other open source developers giving up their time freely.

Some contributors have a day job, a family, and other personal commitments, so writing open source code is a luxury they don’t have a lot of time for. I know this because I fall exactly into that camp myself.


I'm defending open source developers. We can't freely modify open source code if it has glacial build times. It's specifically because people are volunteering that we should aim to be as conscientious as possible when it comes to build latency. Someone who volunteers to contribute code that compiles slowly is not being respectful of the time of all the other volunteers, which is like pumping the brakes on the open source movement. So I will make my views clear that development practices need to improve.


Those people can create their own projects then. If the library/FOSS project doesn’t make it because of popularity, then natural selection of the code worked; if people choose to accept the latency, then it succeeds because its utility is worth more than the time saved during a compile.


That's the same argument companies like Google make when they're in the exploit phase of their lifecycle: we can afford to let search results get worse, just so long as they're not so much worse that people turn to our competitors. However, they're doing it for money, so at least they have a good reason. Who can say why someone would make their software unpleasant with template metaprogramming.


Just because they give up their time freely it makes their decisions immune to criticism?


Constructive feedback is fine. jart's comment wasn’t that.


We will never know if jart's comment was constructive or not until we know the original developer's decision process.

If the original decision process was indeed “less keystrokes”, then how is that not constructive criticism?


The developer's motives don’t change the snarky way jart wrote their comment.

And if you felt their comment was acceptable then I question how much you’ve contributed to open source yourself. Snarky comments like jart's are all too common and really demotivate people from maintaining popular projects.

But don’t just take my word on it, there’s a plethora of other contributors who’ve talked about this topic as well.


Where's the snark though? jart's comment reads true literally.

Compile times are a big deal, and 'jart is right about individual vs. collective problems. And unlike most other critics on the Internet, 'jart actually provided a solution along with the criticism. If that kind of behavior "demotivates [some] people from maintaining popular projects", I still feel it's a net win.


What difference does it even make if it’s an open source project or not? Compile times are a big deal.


I’m not talking about compile times. I’m talking about the way people should communicate respectfully.


Believe me, his message is as respectful as it gets.

I have far, far, FAR stronger words for “people” who don’t respect other people’s time by not caring about compile times. Words that would make Linus blush.


That’s more a statement on yourself though. You don’t need to comment that way, you choose to.

So my point stands.


Well maybe you should be, since focusing on the technology is what allows our disagreements to be impersonal.


Exactly. Which, unfortunately, isn’t how you framed your original comment. Instead you talked about the author's motives and indirectly called them lazy.

If you had focused your comment purely on the technology rather than making ad hominem attacks on the contributor, then we wouldn’t need to be having this conversation to begin with.


I don't think a supposedly bad decision has to be answered with being snarky. A pull request, or a fork focused on reducing build times are actual net gains. From that poster's original name, seems like they went on and did just that, which is great I believe.

At the very least, giving the original developer the benefit of doubt, or assuming their decision made sense under the circumstances they were in at the time, is IMO a better start than just public criticism.


While the pursuit of faster build times is definitely a worthy cause, I feel like there's something I'm not quite seeing here. Does the JSON-code change frequently enough to incur build cache misses and the full minute penalty? Is there something inherent about the structure of the library that makes it unable to have its compilation be cached? Is the code structured in such a way that editing other code requires also invalidating the cache for the JSON-related code? I guess one way would be to break out the JSON parsing code to its own module and have it produce language-specific structs to be interacted with by the rest of the program.


Programming is the process of manipulating data structures, so if you're building a JSON server, then every piece of code in your server is going to be dealing with and operating on JSON data structures. It can't be neatly tucked away in a corner. Because it would be foolish to design a server that makes needless copies of all its inputs and outputs. This truth would be the same if you were using something like protobuf instead. Therefore it's important that your fundamental data structures be something that (a) you can control, and (b) doesn't make everything it touches take forever to build. Do you feel in control of someone else's 24000 line header full of template magic? If that thing is sitting between me and my data structures, then I will wipe it out of existence.


It seems like nlohmann/json is a header-only library, meaning the entire library has to be compiled once for every source file which uses it, any time that source file or its includes have been updated.

So I guess in a JSON-heavy code base or a code base where nlohmann/json has leaked into common headers, you may end up recompiling the library a few dozen times per build where a few dozen of your C++ source files must be recompiled (e.g due to common header changes)...

(But don't worry, the linker will then spend a bunch of time throwing away almost all of that work so you only get one copy of the library in your binary)


I missed that part. That is a pretty significant downside in that case.


It places a hard scaling limit on how big an open source project can become. Projects like the Linux kernel spend enormous amounts of political capital restraining decadent programming practices, since it's the only way a codebase like that can maintain the support of its developers and grow. For example, Linux had a rule until 2018 that everything had to be able to compile with GCC 3.2 from 2003. They're much more laid back these days, since it's difficult to imagine Linux growing bigger than it already has. But I think a newer project like llama.cpp would be well advised to follow the example of what projects like Linux did in their growth phase, rather than following their leadership today. It requires a lot of discipline, toil, and restraint to be a leader in open source, because you're essentially offering the world a pot of gold, and that only works if you keep very little for yourself.


Looking at the source code for llama.cpp, it's not strictly speaking set up for success as far as build times are concerned, is it? Having 20k+ lines is a bad idea for many reasons, and this is certainly one of them.

I'm sure the authors are brilliant in many ways, but I suspect there's room for improvement in this area.


> Does the JSON-code change frequently enough to incur build cache misses and the full minute penalty?

The moment you switch branches - it changes.

If you develop for Android, it generates a build directory named with a hash of some CMake/Gradle variables; the moment one of those changes (like the AGP version) you get a new build dir and essentially have to compile from scratch.


If you're on something reasonably smart like Bazel it will be able to determine whether the module itself has been changed and requires recompilation instead of running from cache.


Nice.

We, and the majority of Android projects, aren’t on Bazel, though.


This is true, and it's kind of a bummer to be honest. There's some serious time being wasted on recompilation that could be avoided with a really sharp build system.

Bazel comes with its own bag of sharp edges though so it's unfortunately not like you can just adopt it and be on your merry way.


> Compile time is largely a "developer problem", but so is the usability of a library.

Compile time is way more than a "developer problem". It's an operational problem that ends up permeating software architecture and development practices, and ultimately affects how the whole project is delivered and deployed.


Significantly faster compilation means less friction to iterate ideas, try things, which in the end lead to more polished results.

A nice interface is agreeable, but maybe there are diminishing returns when you pay for it with long compile times. I remember pondering that when working with the Eigen math library, which is very nice but such a resource hog when you compile a project using it.


Code in jart's version is refreshingly clean and easy to read compared to nlohmann's version.

As an aside, I wonder: what are the ThomPike* set of macros actually doing in jart's implementation?

Also, a speed comparison of this vs the other one would be very welcome: conformance and simplicity are certainly important criteria when picking a JSON parser, but speed is rather crucial.


Thompson Pike encoding. It predates the UTF-8 standard and was invented on a napkin in a New Jersey diner. It allows the full spectrum of 32-bit numbers to be encoded, rather than restricting characters to only those also present in UTF-16. The json.cpp library enforces UTF-8 restrictions on parsing, because we have no choice. But you're allowed to serialize anything you want, thanks to the ThomPike macros.
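For the curious, here is a sketch of the original Thompson and Pike multi-byte layout as it is generally described, not the json.cpp macros themselves: the lead byte's high bits give the sequence length and each continuation byte carries 6 payload bits. The function name is hypothetical. This sketch stops at 0x7FFFFFFF, the limit of the six-byte form; how json.cpp handles anything beyond that is not shown here. It deliberately accepts surrogates and values above U+10FFFF, which RFC 3629 UTF-8 forbids.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a value using the original (pre-RFC 3629) multi-byte scheme:
// 1 to 6 bytes, with no rejection of surrogates or values above U+10FFFF.
// Returns an empty string for values that need more than 31 bits.
std::string thompson_pike_encode(std::uint32_t c) {
  std::string out;
  if (c < 0x80) {  // single byte: 0xxxxxxx
    out += static_cast<char>(c);
    return out;
  }
  int n;          // number of continuation bytes
  unsigned lead;  // length marker bits for the first byte
  if (c < 0x800) {
    n = 1; lead = 0xC0;
  } else if (c < 0x10000) {
    n = 2; lead = 0xE0;
  } else if (c < 0x200000) {
    n = 3; lead = 0xF0;
  } else if (c < 0x4000000) {
    n = 4; lead = 0xF8;
  } else if (c < 0x80000000u) {
    n = 5; lead = 0xFC;
  } else {
    return out;  // not representable in six bytes
  }
  out += static_cast<char>(lead | (c >> (6 * n)));
  for (int i = n - 1; i >= 0; --i)  // continuation bytes: 10xxxxxx
    out += static_cast<char>(0x80 | ((c >> (6 * i)) & 0x3F));
  return out;
}
```

A strict UTF-8 decoder would reject the surrogate and 31-bit outputs, which matches the asymmetry described above: lenient when serializing, strict when parsing.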


Really interesting that nlohmann isn't fully compliant. What cases are these?

It seems to me though that if you're encountering the edges of json where nlohmann or simple parsing doesn't work properly, a binary format might be better. And if you're trying to serialize so much data that speed actually becomes an issue, then again, binary format might be what you really want.

The killer feature of nlohmann is the NLOHMANN_DEFINE_TYPE_INTRUSIVE and NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE macros that handle all of the ??? -> json -> ??? steps for you. That alone makes it my default go-to unless the above reasons force me to go another direction.


Is there an example where nlohmann/json is not compliant? Would like to know (and fix) this.


On the other end of the spectrum there is [1]. It's both performance and usability oriented, although compile times are probably higher.

Nlohmann is the slowest out of the popular libraries, AFAIK, and not particularly more usable than rapidjson, in my experience. So "better than nlohmann" is not very novel.

[1] https://github.com/beached/daw_json_link


The moment nlohmann's library came out, I switched to it and I never looked back.

I loved the interface and it's exactly how I would've designed a JSON library with modern C++.

Just maybe turn off the implicit conversion option, that can get a bit messy ;)


"This project is a reaction agains..." is such a punk move I can't do anything but appreciate.


jart is such a good programmer. A lot of people already know this but I just have to give props where it's due.


What does “Classic C++” mean?


This library is nicely concise, and the code is mostly readable (although there are some non-obvious tricks that could be better documented).

The Makefile could use some work:

  json_test.cpp:360:23: warning: missing terminating '"' character [-Winvalid-pp-token]
  { Json::success, R"({
                        ^
  fatal error: too many errors emitted, stopping now [-ferror-limit=]
  9 warnings and 20 errors generated.
  make: *** [json_test.o] Error 1
  % c++ --version                   
  Apple clang version 15.0.0 (clang-1500.1.0.2.5)
  Target: arm64-apple-darwin22.6.0
  Thread model: posix
  InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Compiling directly with

  c++ --std=c++11 -c json.cpp
works fine, though.
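One plausible reading of that log, offered as a guess rather than a diagnosis: `R"(...)"` raw string literals are a C++11 feature, so a compiler running in an older default dialect tokenizes `R"({` as an ordinary, unterminated string, which matches the `missing terminating '"'` warning. The working command passes `--std=c++11`. A small guard like the one below (the `kDoc` name is just for illustration) would turn the cascade into a single readable error.

```cpp
#include <cassert>
#include <cstring>

// Fail with one readable message instead of a cascade of tokenizer errors.
// Note: MSVC reports 199711L here unless /Zc:__cplusplus is passed.
#if __cplusplus < 201103L
#error "raw string literals need -std=c++11 or newer"
#endif

// A multi-line raw string: quotes and backslashes need no escaping,
// and the newlines between R"( and )" are kept verbatim.
const char *const kDoc = R"({
  "path": "C:\temp",
  "ok": true
})";
```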


There are approximately three major dialects of C++. They are distinguished by major changes in what idiomatic code looks like, enabled by the addition of core features to the language that made it more efficient and type-safe to express many things.

The era of so-called “modern” C++ started with C++11, which was a radical reworking of the language. All prior versions of C++ are “legacy” or “classic”. Idiomatic code in “modern” and “classic” dialects almost looks like different languages.

C++20 arguably marks a new dialect break but it doesn’t have a colloquial label to distinguish it from “legacy” and “modern” AFAIK. Idiomatic C++20 looks pretty foreign from a C++11 perspective (but is unambiguously an improvement).
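A small, illustrative contrast of the first two dialects doing the same job (building a list of words and summing their lengths). The function names are made up for the example; everything compiles as C++11, but only the "classic" pair would also compile as C++03.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// "Classic" (C++03): seed containers from arrays, spell out iterator types.
std::vector<std::string> make_words_classic() {
  const char *raw[] = {"json", "for", "classic", "c++"};
  return std::vector<std::string>(raw, raw + sizeof(raw) / sizeof(raw[0]));
}

std::size_t total_length_classic(const std::vector<std::string> &words) {
  std::size_t total = 0;
  for (std::vector<std::string>::const_iterator it = words.begin();
       it != words.end(); ++it)
    total += it->size();
  return total;
}

// "Modern" (C++11): initializer lists, auto, range-based for.
std::vector<std::string> make_words_modern() {
  return {"json", "for", "classic", "c++"};
}

std::size_t total_length_modern(const std::vector<std::string> &words) {
  std::size_t total = 0;
  for (const auto &w : words) total += w.size();
  return total;
}
```

Both versions produce identical results; the difference is purely in how much ceremony the language demands.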


This library supports building with C++11. I haven't tried compiling it with an older standard, but I imagine it might work. One thing I like about the C++11 compilers like GCC 4.9 is they build code magnificently faster than recent editions. See https://x.com/JustineTunney/status/1795427808631758936


> This library supports building with C++11. I haven't tried compiling it with an older standard, but I imagine it might work.

I believe it does require C++11, due to std::nullptr_t and r-value references (&&), but that might be it. It's not a show stopper though since everyone should have a c++11 compiler now (even Ubuntu 14.04 LTS, which still has paid support I believe).

> One thing I like about the C++11 compilers like GCC 4.9 is they build code magnificently faster than recent editions

Kind of reminds me of gcc 2.95 which people kept around for the compiler speed. They would use gcc 3.x for the warning support and then compile with gcc 2.95 after fixing the warnings :).


Yes they'd be very trivial to remove locally. It might also be nice to have #ifdef statements around them like we're already doing for std::string_view. If we consider that many big-name C projects like curl are still on C89, then there's surely got to be people still out there using 2000s-era C++.
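A sketch of what such guards might look like, with a hypothetical `Value` class standing in for the library's type (this is not json.cpp's actual code): the C++11-only `std::nullptr_t` and rvalue-reference constructors are compiled only when `__cplusplus` says the dialect supports them, so the same header still builds as C++03.

```cpp
#include <cassert>
#include <string>
#if __cplusplus >= 201103L
#include <cstddef>  // std::nullptr_t
#include <utility>  // std::move
#endif

// Hypothetical JSON-ish value holding either null or a string.
class Value {
 public:
  Value() : is_null_(true) {}
  Value(const std::string &s) : is_null_(false), str_(s) {}
// MSVC reports 199711L unless /Zc:__cplusplus is set, hiding these there.
#if __cplusplus >= 201103L
  // C++11 only: allows `Value v = nullptr;`
  Value(std::nullptr_t) : is_null_(true) {}
  // C++11 only: take over the string's buffer instead of copying it.
  Value(std::string &&s) : is_null_(false), str_(std::move(s)) {}
#endif
  bool is_null() const { return is_null_; }
  const std::string &str() const { return str_; }

 private:
  bool is_null_;
  std::string str_;
};
```

Under C++03 a temporary `std::string` simply binds to the `const std::string &` constructor and gets copied, so behavior stays the same and only the optimization disappears.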


> It's not a show stopper though since everyone should have a c++11 compiler now (...)

I think the point of pointing out it's C++11 is that it's not "classic C++" as it's using "modern C++" features. Thus it's a mystery why it would be referred to as classic C++.


Just because I included an rvalue constructor doesn't make it C++11. This library was originally written in C. It hasn't changed a whole lot since Gautham and I originally wrote it: https://github.com/jart/cosmopolitan/blob/master/tool/net/lj... I feel perfectly comfortable calling C++11 "classic" or even "baroque" compared to what people are doing with C++ in 2024. However if you disagree with me, and feel that classic means C++03, then I've made certain that your preferences are supported by this library too. Just remove the rvalue and nullptr_t constructors. I'll probably add #ifdefs soon to automate that too.


> Just because I included an rvalue constructor doesn't make it C++11.

Actually, it does. I mean, does it compile when you pass -std=c++98?

> This library was originally written in C.

Doesn't matter. If it uses C++11 features, it's C++11.

> I feel perfectly comfortable calling C++11 "classic" or even "baroque" compared to what people are doing with C++ in 2024.

Irrelevant. You can go the Humpty Dumpty way as far as you want to go and call anything any way. It doesn't matter. If you use C++11 features, it's C++11. If it's C++11 then you're discussing modern C++. You don't need to use all the bells and whistles to qualify.


I don't think it's entirely accurate. "Modern idiomatic C++" was a thing already before C++11 - that would be the kind of code that heavily used the standard library and especially STL containers, iterators etc (but also stuff like auto_ptr etc; and yes, for all its flaws, it was actually used).

And don't forget that C++03 TR1 also added a bunch of very useful stuff - most notably, std::tr1::shared_ptr and std::tr1::function. And, of course, Boost has been a thing long before C++11, filling many gaps for "modern C++" projects of the time.

"classic C++" from that perspective is C++ written more or less Java-style.


"Classic C++" and "Modern C++" refer to the language before and after C++11, respectively.

Some of the key differences are use of the standard library and its containers, smart pointers, and other language features that look less like C. In this specific library, "classic" refers to techniques like bit manipulation, manual memory management and string parsing, and using things like enums to improve speed and reduce complexity.

An example of a more robust (but still "classic") library would be something like https://github.com/Tencent/rapidjson.


This is a fine library, but I use nlohmann extensively and haven't experienced any considerable compilation slowdown once I added it to the project.

Overloading from_json to modularize parsing is really useful, I think that should be a part of every templated C++ json parser library.
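That pattern modularizes well because of argument-dependent lookup: the library's generic code makes an unqualified `from_json` call, and the compiler finds the overload declared next to each user type. Below is a library-free sketch of the same mechanism. The `toyjson::Object` type and `get<T>()` helper are hypothetical and far simpler than nlohmann/json's real machinery.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace toyjson {
// Toy stand-in for a JSON object: a flat string-to-string map.
typedef std::map<std::string, std::string> Object;

// Generic entry point. The call below is unqualified and depends on T,
// so lookup is deferred and ADL finds from_json in T's own namespace.
template <class T>
T get(const Object &j) {
  T value;
  from_json(j, value);
  return value;
}
}  // namespace toyjson

namespace app {
struct Person {
  std::string name;
  int age;
};

// Declared alongside Person, in Person's namespace; found via ADL.
void from_json(const toyjson::Object &j, Person &p) {
  p.name = j.at("name");
  p.age = std::stoi(j.at("age"));
}
}  // namespace app
```

Each module can ship its own `from_json` overloads without the parsing library, or any central file, needing to know about them.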

That said, I have seen these ThomPike* macros in cosmopolitan.h before, I wonder what the origin is.


What are "ThomPike* macros"?


Macros prefixed with ThomPike. I saw them used in json.cpp.


https://github.com/jart/json.cpp/blob/4f0a02dab1af7d81888cf5...

The response doesn't tell you the location of the problem in the input.
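Reporting where parsing failed mostly means keeping the byte offset a parser already tracks. A hypothetical helper (not part of json.cpp or any library mentioned here) that turns such an offset into the line/column pair users usually expect:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Location {
  std::size_t line;    // 1-based
  std::size_t column;  // 1-based, counted in bytes, not code points
};

// Convert a byte offset into a 1-based line/column by counting newlines.
// Offsets past the end are clamped to the end of the input.
Location locate(const std::string &input, std::size_t offset) {
  if (offset > input.size()) offset = input.size();
  Location loc = {1, 1};
  for (std::size_t i = 0; i < offset; ++i) {
    if (input[i] == '\n') {
      ++loc.line;
      loc.column = 1;
    } else {
      ++loc.column;
    }
  }
  return loc;
}
```

The scan only runs on the error path, so it costs nothing for valid input.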


That might actually be the explanation for why json.cpp benchmarks 39x faster than nlohmann's library if I include the failure test cases.


What are the performance numbers? nlohmann/json is no speed demon.


I've added benchmarks to the readme. https://github.com/jart/json.cpp?tab=readme-ov-file#benchmar... You're looking at a 2x or 3x performance advantage across the board. If you include invalid JSON handling, then 10x or more.


Sounds like there's a backlash to modern C++.


Interesting approach, but not providing a Conan/vcpkg package at (the end of) 2024 only creates friction.

We are not living in the 90s anymore.


Dunking on nlohmann for performance is pretty easy. I’m interested in what the value proposition is over one of rapidjson, glaze, or simdjson (all of which have some amount of SIMD or SWAR optimization, and more importantly SAX and the use of something other than std::map)



