Writing a simple JSON library from scratch: a tour through modern C++ (eatonphil.com)
161 points by eatonphil on Aug 26, 2021 | 113 comments



That

  struct JSONValue {
    std::optional<std::string> string;
    std::optional<double> number;
    std::optional<bool> boolean;
    std::optional<std::vector<JSONValue>> array;
    std::optional<std::map<std::string, JSONValue>> object;
    JSONValueType type;
  };
really ought to be a variant; that would simplify things a lot. It'd just be

  struct JSONValue;
  using variant_type =
    std::variant<
     std::monostate              // JSON null
   , std::string
   , double
   , bool
   , std::vector<JSONValue>
   , std::map<std::string, JSONValue>
  >;
  struct JSONValue : variant_type {
    using variant_type::variant_type;
  };
then instead of switches you'd do:

  std::string_view print_type(const variant_type& jtt) {
    using namespace std::literals;
    struct {
      auto operator()(const std::string&) const noexcept { return "String"sv; }
      auto operator()(double) const noexcept { return "Number"sv; }
      auto operator()(bool) const noexcept { return "Bool"sv; }
      auto operator()(const std::vector<JSONValue>&) const noexcept { return "Array"sv; }
      auto operator()(const std::map<std::string, JSONValue>&) const noexcept { return "Dict"sv; }
      auto operator()(std::monostate) const noexcept { return "Null"sv; }
    } vis;
    return std::visit(vis, jtt);
  }
which is much safer than switch/cases
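
For illustration, a quick hypothetical check of the variant-based JSONValue sketched above; it assumes the definitions above are in scope and needs <cassert>, <map>, <string>, <variant>, <vector>:

  #include <cassert>

  int main() {
    JSONValue s = std::string{"hello"};
    JSONValue n = 3.14;
    JSONValue arr = std::vector<JSONValue>{s, n};
    assert(print_type(s) == "String");
    assert(print_type(n) == "Number");
    assert(print_type(arr) == "Array");
  }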


C++ may get pattern matching with P1371, which may make visiting look more reasonable:

    std::string_view print_type(const variant_type& jtt) {
        using namespace std::literals;
        return inspect (jtt) {
            <double> __ => "Number"sv;
            <bool> __ => "Bool"sv;
            ...
        };
    }
In any case, `std::variant` should have been a language feature all along, instead of a library one; cf. Rust:

    enum Value {
        Number(f64),
        Bool(bool),
        ...
    }

    fn print_type(v: &Value) -> &str {
        match v {
            Value::Number(_) => "Number",
            Value::Bool(_) => "Bool",
            ...
        }
    }


Yes, pattern-matching like this would be the bee's knees


Since you're interested in readability, you may also like std::format:

    std::string format_parse_error(std::string base, JSONToken token) {
       std::ostringstream s;
       s << "Unexpected token '" << token.value << "', type '"
         << JSONTokenType_to_string(token.type) << "', index ";
       s << std::endl << base;
      return format_error(s.str(), *token.full_source, token.location);
    }
becomes

    std::string format_parse_error(std::string_view base, JSONToken token) {
      return format_error(
       std::format(
           "Unexpected token '{}', type '{}', index '{}'\n{}"
         , token.value
         , JSONTokenType_to_string(token.type)
         , index
         , base)
       , *token.full_source
       , token.location
      );
    }


If you really cared about readability
, you'd put your commas where normal human beings put them
. Instead of in a silly place like that
. Preferring structure over normal communication is not
, typically
, seen as a good thing
.


This is silly for several reasons that I will list:

* Programming is not prose, so we can temper our expectations.

* Even in prose, some symbols belong on the left, and some belong on the right.

* The example given is no less readable than the alternative.

* It's more important to be consistent in style than correct to one person's opinion, e.g. some people still batch edit source via macros and appreciate when curly braces are not in unexpected places.


Prefix separators are an excellent way to avoid diff noise in languages without terminators (/ trailing comma support).


https://stackoverflow.com/questions/10483635/why-do-lots-of-...

It just removes a lot of hassle when you are used to it; I'd never go back to trailing commas.


Trailing commas work just fine when the language supports, well, trailing commas.


Having them aligned looks much better when you're used to it


Anything seems better once you're used to it, even "not having toes".


Well, no. I was used to trailing commas, but this was creating more friction and was overall less pleasurable; that's why I switched to leading commas, where it feels like an improvement.


That is the (or at the very least a) normal place to put them in C++. Human beings use things differently in different contexts.


I've been a professional C++ programmer for 24 years now, and neither I, nor anyone I've worked with over this period, used this ridiculous style.


I've used it on larger teams where some enums for feature flags end up getting conflicts all the damn time when you add to the end because the last entry doesn't have a comma.

This might be fixed in various languages/parsers by allowing a trailing comma (e.g. python).

At the end of the day, clang-tidy, black, gofmt, rustfmt all the things and let's never talk about comma placement again.


What if C++ actually allowed that comma? Because... it does.


That's only from C++11. The story I was telling you was from before then.

    $ cat test.cc 
    enum Foo {
     FOO,
     BAR,
    };
    int main(int argc, char** argv) { return 0; }

    $ g++ -std=c++98 -Wpedantic test.cc
    test.cc:3:5: warning: comma at end of enumerator list [-Wpedantic]
        3 |  BAR,
          |     ^

    $ g++ -std=c++03 -Wpedantic test.cc
    test.cc:3:5: warning: comma at end of enumerator list [-Wpedantic]
        3 |  BAR,
          |     ^

    $ g++ -std=c++11 -Wpedantic test.cc


Not everywhere; I often need it in template arguments, for instance for type lists:

    std::conditional<
     some_property<
       T, 
       U,
     >,
     int, 
     float,
    >;
is a compile error; I really prefer

    std::conditional<
       some_property<
           T 
         , U
       >
     , int
     , float
    >;


How many code bases have you actually worked on? I can tell you it's very common in constructor initializer lists, to the point that both the Mozilla and WebKit styles of clang-format do it.


It’s a Haskell thing, and while I don’t use it in any other language, it does look very neat and readable in Haskell.


I thought it was an SQL thing.


It's actually used in a lot of places because it makes sense.


Happens very often in SQL code :(


I was excited to use it! But I don't think it's available in Clang 12 (noted in this post), which is the latest version that comes with the latest Fedora.

I didn't want to have to spring a newer compiler version than came with the distro.

Unless it's just a flag I did not turn on.


You can install it. dnf install fmt-devel


I do get that it's the type-safe approach but your resulting code looks more complicated than what I did (to me).

Still, thanks for sharing so that other people can see for themselves.

One of the biggest turnoffs about std::variant to me (which isn't relevant here) is that you can't name the types, so the hacks for having multiple of the same type in std::variant look even hairier.


It's not only type-safe; it uses much less memory, takes less time to construct, and does not duplicate the "null" state:

- Your JSONValue struct takes 160 bytes (on 64-bit systems with libstdc++); the variant one takes 56, almost three times less.

- It needs std::optional<T> for each member to avoid construction, while std::variant does that by default. But the compiler still has to initialize all these optionals to their default state.

- You need two branches on every access: one for the type switch, and one for the optional. The variant only has one level of indirection.
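
For reference, a minimal way to check numbers like these with sizeof (OptionalJSONValue and VariantJSONValue are hypothetical names for the two definitions above, compiled side by side; exact sizes are implementation-dependent):

    #include <cstdio>

    int main() {
      std::printf("optional-based: %zu bytes\n", sizeof(OptionalJSONValue));
      std::printf("variant-based:  %zu bytes\n", sizeof(VariantJSONValue));
    }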


That's fair. I assumed it would be clear to anyone who knows C++ better than I that the point of this post wasn't exactly performance.

Actually I was most interested in just how simple and readable I could make it. But I didn't say that in the post anywhere. Will add.


Performance aside, using variant has an added benefit of making illegal states unrepresentable. Optionals + an enum still allows users to either nullopt all the members, or have more than one member that is not nullopt.


> I assumed it would be clear to anyone who knows

It's just not possible to assume that at all on the internet. Every year I have students who learn from blog posts like this for instance.


As someone who has done quite a bit of C++ recently: for most uses readability is king.

If you did want to squeeze out the extra performance you could also do something where you store `std::variant<>` inside a struct and provide an API like this:

    if (auto int_value = value.i()) {
      // *int_value == your int
    }
and expose a `std::variant<> &variant();` for use cases where you want the more complex/faster stuff.


Using variant as the storage type while providing a nice public accessor API makes sense given the huge difference in semantics between a struct containing optional fields and a variant.


How do you get the size of the struct? Is there a compiler command to do that?



Rather than using std::visit, you can just use a chain of if statements with get_if [1]. You keep the most important benefits of std::variant but the code is much more readable (IMO).

[1] https://news.ycombinator.com/item?id=25316192
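
For readers who haven't seen it, a rough sketch of that style against the variant_type from the top comment (same includes as above plus <string_view>; this mirrors the print_type example, not the article's actual code):

    std::string_view print_type_getif(const variant_type& jtt) {
      using namespace std::literals;
      if (std::get_if<std::string>(&jtt))                      return "String"sv;
      if (std::get_if<double>(&jtt))                           return "Number"sv;
      if (std::get_if<bool>(&jtt))                             return "Bool"sv;
      if (std::get_if<std::vector<JSONValue>>(&jtt))           return "Array"sv;
      if (std::get_if<std::map<std::string, JSONValue>>(&jtt)) return "Dict"sv;
      return "Null"sv;
    }

One trade-off: unlike an exhaustive std::visit visitor, nothing warns you if you forget an alternative.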


Interesting, I find the list of ifs much less easy on the eyes. I wonder why that is.


Agreed that it's a personal preference. If you genuinely have a standalone function, like print_type shown here, then std::visit seems roughly even with std::get_if to me (but I'd tend to use the latter for consistency with my other code). std::visit would clearly come out ahead if you had some sort of complex nested variant and you called out to other visitors. get_if is clearly ahead IMO if you're using a variant in passing in a broader function, and especially when you want to use control flow (if you return in an if branch you return from the wider function).


Yes, this case (checking for a type and returning early) is pretty much my only usage of get_if (and in this case it makes a lot of sense)


A nice follow up would also be looking at std::string_view as this has the potential to save a lot of memory.


Since I'm unfamiliar with the underlying concept, could you give an example of how you could see that being used?


`std::string_view` is essentially something like this:

   struct string_view {
      char *data;
      size_t length;
   };
It is a pointer to a subset of an existing string. It has all of the existing features of `std::string` but passing it around is zero copy. As long as the original memory allocation exists your `string_view`s are still valid. Also, it has an implicit constructor from `const std::string` and `const char *` allowing you to define a single function like `Thing ParseThing(std::string_view line)` and accept `char *` and `std::string` as inputs.

More examples are here: https://abseil.io/tips/1

# -> *. HN formats bad


\* (backslash asterisk) will escape the asterisk.

>Also, it has an implicit constructor from `const std::string` and `const char *` allowing you to define a single function like `Thing ParseThing(std::string_view line)` and accept `char *` and `std::string` as inputs.


In every function where you are reading from a string without writing to it or saving it, you can use std::string_view. Same for vector / array: anything non-owning and read-only should use std::span instead.
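
A sketch of what that looks like in signatures (function names are made up; std::span needs C++20):

    #include <span>
    #include <string_view>

    // Binds to std::string, string literals and char* alike, without copying.
    bool starts_with_brace(std::string_view s) {
      return !s.empty() && s.front() == '{';
    }

    // Binds to std::vector<double>, std::array<double, N> and C arrays.
    double sum(std::span<const double> xs) {
      double total = 0;
      for (double x : xs) total += x;
      return total;
    }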


Your alternative is undefined behavior due to using an incomplete type in std::map. C++17 allows std::vector to use an incomplete type but std::map does not.

Reference this paper for further details:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n451...


Ah, I thought it came along with std::vector in the authorized bundle. Well, one can just use the Boost containers instead since they don't have that wording... but to be honest I used incomplete std::vector in a similar fashion for years without issues.
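
For anyone who wants to keep the variant approach without that particular bit of UB, one option (an untested sketch; Boost.Container documents support for recursive/incomplete value types) is to swap in its containers:

    #include <boost/container/map.hpp>
    #include <boost/container/vector.hpp>
    #include <string>
    #include <variant>

    struct JSONValue;
    using variant_type = std::variant<
        std::monostate,                                // null
        std::string,
        double,
        bool,
        boost::container::vector<JSONValue>,           // fine with incomplete JSONValue
        boost::container::map<std::string, JSONValue>
    >;
    struct JSONValue : variant_type {
      using variant_type::variant_type;
    };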


Why doesn't the compiler error when compiling if it is undefined behavior? It seems really simple to detect if a type is incomplete.


It's undefined because the standard says so explicitly for these containers, but if you copy-pasted all the code of libstdc++ or libc++'s std::map in a new namespace and gave it your name it would be perfectly defined.

It's a bit like naming a global variable _Weeeeeeeeee. It's technically UB because the standard says so, but in practice the compiled code will be fine if your stdlib does not use that variable. The compiler isn't actively looking for elements of the spec to check and enforce; it's just using UB where it can as a tool to optimize.


I was only really aware of undefined behavior that was relevant to runtime behavior. I wasn't aware using compiler reserved identifiers was undefined behavior; I assumed it was required to error.

I don't see the purpose in allowing trivially detectable UB to exist. In this case and in the case of reserved identifiers, I don't see how any optimization could be made using the UB. Noticing the UB allows you to do anything, so you might as well rewrite the program to do nothing.


C++ has a lot of these silly rules. For the case of reserved identifiers, they are reserved for the implementation of the standard library, but of course C++ uses a copy and paste system of #include's which takes place in the preprocessing phase, before the actual parsing/compiling phase. That means when you #include <vector> into your own main.cpp, you're copying and pasting "vector" into "main.cpp" and compiling "main.cpp", so the compiler would have to jump through all kinds of hoops to figure out that a certain section of "main.cpp" comes from "vector" and another section comes from this or that.

It's certainly possible to do this in principle, but the standard does not impose that burden on any implementation.

At any rate, the standard is full of this kind of compile time "undefined behavior". The formal term for this is "ill-formed, no diagnostic required" aka "NDR" which means that the program is invalid C++ but the compiler is not required to report an error, and running the subsequent translation is undefined behavior.


> I assumed it was required to error.

They are reserved because the compiler (and OS) writers need identifiers they can use in their own headers without breaking valid user code. Since there is no good way for the compiler to say which headers are user headers, erroring on those symbols would be counter-productive.


std::variant is an abomination. The idea is stellar. But actually using it is a frustrating nightmare. I prefer the simple way, unfortunately.


Good modern C++ uses shared_ptr very, very sparingly. Most programs do not use it at all. Overuse of shared_ptr is called Java Disease.

unique_ptr is, at base, very simple. A variable of this type owns whatever it points at. Since only one pointer can own the memory, ownership passes from one instance to the next "by move". It is easy to know, at any place in the code, which pointer owns the pointed-at thing.

It is generally a mistake to expose unique_ptr in an interface. It is an implementation primitive. Very often a class has just one data member, a unique_ptr to a private-typed object in storage. Users of the type treat it as an ordinary value, but don't need to know anything at all about the implementation object. All the code that operates on it can be off in a ".cc" file, and you can change it without needing to rebuild clients.
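
A bare-bones sketch of that shape, with invented names (the destructor and moves must be defined in the .cc, where Impl is complete):

    // widget.h
    #include <memory>

    class Widget {
    public:
      Widget();
      ~Widget();                       // defined in widget.cc, where Impl is complete
      Widget(Widget&&) noexcept;
      Widget& operator=(Widget&&) noexcept;
      void frob();
    private:
      struct Impl;                     // only widget.cc knows what's inside
      std::unique_ptr<Impl> impl_;
    };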

A vector of unique_ptr is another common class member, to hold ownership of a bunch of objects.

Once in a great while you want to take over control of how the memory is released when an owning unique_ptr goes out of scope. Maybe the memory was allocated with mmap and must be freed with munmap. This is easy to do with one-liner functions, or lambdas.
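
For example, something along these lines (a POSIX-specific sketch with minimal error handling):

    #include <cstddef>
    #include <memory>
    #include <sys/mman.h>

    struct MunmapDeleter {
      std::size_t length;
      void operator()(void* p) const { munmap(p, length); }
    };

    std::unique_ptr<void, MunmapDeleter> map_anonymous(std::size_t length) {
      void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED) return {nullptr, MunmapDeleter{0}};
      return {p, MunmapDeleter{length}};   // munmap runs when the pointer goes out of scope
    }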


std::unique_ptr still suffers from The RAII Syndrome, where freeing memory, instead of being a constant-time operation, becomes linear, when one std::unique_ptr owns a struct, which owns other std::unique_ptrs and so on. Another problem with The RAII Syndrome is encouraging large numbers of microallocations, over allocating memory in bulk, increasing the number of places in your program where something may fail.

And although not strictly RAII, a related issue with the C++ model of constructors, destructors and whatnot (The Rule of N, where N goes to inf as the version of C++ goes to inf), is that when you use std::vector::push_back, and the std::vector needs to be resized, instead of executing a simple memcpy (or equivalent), move constructors for every element need to be called.


> instead of executing a simple memcpy (or equivalent), move constructors for every element need to be called.

But does that actually result in different machine code? If the compiler ends up concluding that you want to move all this contiguous data and by the as-if rule it does memmove() then the fact that notionally a bunch of other C++ code ran doesn't matter.

I agree it's theoretically cleaner to be able to just say "You can move these, it's fine" about a type and know that optimisation falls out automatically, but the end result machine code may be identical.


For it to produce the same machine code, a couple of things need to happen:

1. The move constructors need to be equivalent to a simple memcpy. This is often not the case. E.g. std::unique_ptr, which clears the source std::unique_ptr to null.

2. The compiler needs to either inline the move constructor to notice the optimization, or it needs to do inter-procedural optimization. Either way, it's unreliable. Sometimes it will happen, sometimes it won't. Sometimes it will happen in your current compiler version, but an update will come and it's not going to happen anymore. Generally, blindly relying on the compiler to optimize that is another case of the Sufficiently Smart Compiler[1]. Even if it happens, it's just not reliable. Chandler Carruth held a talk at CppCon once, where he showed how std::vector::push_back is optimized by Clang to a simple assignment and size increment, when it's preceded by an appropriate call to std::vector::reserve. The only problem was that when there were more than three calls to std::vector::push_back the optimization was turned off, so now not only did you pay the price of a function call (doesn't matter all that much), but you also paid the price of increasing the number of branches that the CPU goes through (not that nice), because now on each std::vector::push_back it needed to check whether it has enough capacity for the new size. Optimizations like that are very unstable.

And I'm sure this is going to receive mixed reactions among HN readers (not Rust again!), but Rust, although it still suffers from The RAII Syndrome when it comes to its standard library, doesn't suffer from the problem with move constructors, because it doesn't have move constructors. It's move-by-default, as opposed to copy-by-default in C++, and the moves are provided by the compiler; they are all the same and can't be customized on a type-by-type basis. Which means that a Vec::push in Rust doesn't have the issue I described that std::vector::push_back has. It will always compile to some variant of memcpy.

[1]: <http://wiki.c2.com/?SufficientlySmartCompiler>


It is always irresistibly tempting to claim as proof of virtue the products of what is really just good luck.

Rust had the good luck to come along behind C++, but not too close behind, and so get benefit of hindsight to choose nicer defaults.

I have seen people (even on HN!) earnestly assert that Rust originated the notion of language recognition for object ownership and lifetime, and move semantics—and even insist C++ copied those from Rust, badly. Rust's custodians are equally as bound by their unfortunate early choices as C++'s; sufficiently-aware readers will be able to list numerous cases where early choices are already regretted.

Rust benefits from much more than hindsight: it had nothing to be backward-compatible with, beyond its C ABI subset, where C++ was bound by almost all of C's early-'70s choices. (Function prototypes (along with implicit struct copy, "//" comments, void, and const) were successfully back-ported from C++ to C, freeing C++ from need to parse K&R function declarations.)

To a working programmer, history is void. Rust coders can luxuriate in its lucky defaults while sneering at C++ coders saddled with clumsy evolved syntax and backward-compatibilities. In their turn, old Rust coders will, if they are lucky enough, be sneered at by users of future languages that lucked into lessons Rust has taught by unfortunate example.


The existence of people who mistakenly believe that Rust is innovating is not at all remarkable, this happens all the time for technologies, and if you're setting HN as a high bar you are certain to be disappointed.

Of course one thing you certainly can learn from other people's experience is that you're going to learn from experience :)

Once upon a time Rust didn't reserve keywords for async and await. Once upon a time Rust's arrays weren't IntoIterator.

But when students in 2022 pick up Rust it will seem as though Rust always had async and await and its arrays always implemented IntoIterator. Only if those students dig through old code will they discover that these things were once not true.

Contrast C++ where the C++ beginner soon learns that there is a built-in array but it's awful and they mustn't use it. They might reasonably wonder why C++ hasn't fixed that.

There are two related things going on here. First and most obvious, Rust Editions: the Editions get to change the standard prelude (a set of standard library components which are exported to your program by default without action, such as std::result::Result::Ok, which you can thus refer to in your program as just Ok); add new keywords (which may not yet have a definition when added); tweak the syntax of the language; and even make syntax that previously gave a warning into a hard error in the new edition.

Secondly, and enabling Editions to be more powerful, the Rust language syntax was chosen to reserve room to improve and expand. Implementations are not licensed to just carve off potential future syntax for their own purposes. In particular many types of Rust token can grow to accommodate future needs. Once upon a time Rust's identifiers didn't have a raw syntax, which would have made new reserved words painful - so they grew one.

This is furthered by the choice to have token-oriented macro processing. Even scary proc macros (which could in principle cause absolute havoc) in practice mostly just manipulate tokens, meaning they don't blow up mysteriously as a result of new library features. A fundamental change to token parsing is invisible to a macro that only wanted tokens anyway, while tweaks to C++ grammar will upset your pre-processor macros because those don't process tokens they just mangle raw text.

I have no doubt that Rust will run into unforeseen (and perhaps unforeseeable) obstacles that render it less suited for a future some day than it is today. But I believe it is much better equipped to adapt than C++ was, and as we can see C++ managed to stick around anyway for decades.


Recognizing a need for non-disruptive extensibility is yet another benefit of decades' hindsight. Invariably, though, dimensions not so well allowed for will sprout necessary extensions like thistles that need to be lived with indefinitely.

Of much greater value than smugness for bullets dodged is humility in recognition that not all will be (or, indeed, have been).


Sure, however, sometimes it turns out you did a good enough job and you're done. Subsequent work is academically interesting or has some narrow use cases, but there isn't going to be further revolutionary change.

In cryptography for example, AES-256 is probably enough. Permanently. When DES was designed its keys were purposefully too short and the block size was purposefully too small, and those are the only two elements that are actually broken decades later despite years of intense scrutiny, in AES the keys are long enough and the block size is big enough, and so we're done.

Or in audio. Some prototype work for Sony's Compact Disc assumed 14-bit PCM. 14-bit PCM is kinda marginal, some people are going to notice if the track is properly dithered, trivial transformations leave noticeable distortion, levels must be set correctly at all stages, and so you can make a reasonable argument it's not quite good enough. But they actually shipped 16-bit PCM. And 16-bit PCM is easily transparent. Bad dithering in the final mix? Can't hear it. The recording studio wants 20-bit PCM (for headroom), non-audio sampling applications may need 24-bit PCM, but for recorded music playback we are done.

It's also disingenuous to say bullets were merely dodged. Happening to not have two increment operators because you couldn't figure out how to do that in your parser would be a bullet dodged. Intentionally choosing not to provide either of them because they are footguns is something else. C++ is old, and C is even older, but neither of them is old enough to argue that all their mistakes were unrecognised at the time.


C++ had no choice about increment operators; they came to it directly from C, so anything "recognized" in the '80s would be wholly moot. If you expect to maintain that C's post-increment operator was recognized as some sort of problem in the mid-'70s, you will need documentary evidence.

As Bruce Schneier likes to say, cryptanalysis only ever gets better; it never gets worse. The best we can say about AES-256 is that it might be good enough for any chosen period, with probability declining monotonically as the period increases. But the key schedules originally published for AES-256 made it no stronger than AES-128. I don't know if that detail has been fixed in a revision, or if it is still no stronger than AES-128. For Wireguard, Jason chose ChaCha20.

It is many years too early to determine which parts of Rust are "done". What we can say with certainty is that some that were thought done will turn out not to be, some not provisioned for extensions will end up needing them, and some defaults will turn out to be regrettable but not fixable.


> C++ had no choice about increment operators

I don't buy this at all. Stroustrup was keen on this particular bad idea, it is even provisioned with operator overloading that has a special optional syntax so that early weak C++ compilers can overload it despite lacking a good way to distinguish the two cases. He spends two pages of the Second Edition defending this construction which, we may note, is wholly a C feature not something novel in C++:

  while (*p++ = *q++) ;
Stroustrup apparently admires this for being terse, admitting that this chosen example is neither commonly faster nor easier to understand than alternatives.

> cryptanalysis only ever gets better; it never gets worse

And it did. Eventually Linear cryptanalysis breaks DES, and it is theorised (but not demonstrated) that a refined multiple linear cryptanalysis breaks DES in just 2^41 known plaintexts less than fifty years after it was standardized... Completely useless of course, but that's the state of the art after fifty years of only getting better.

Work on AES has also proceeded as Bruce would suggest, and after twenty years, with just 2^254.3 operations you can do key recovery on AES-256.

AES-256 doesn't actually need to be stronger than AES-128 in the sense you're talking about, 128-bits is plenty, the trick is that the larger keys defeat Grover's algorithm and thus defang hypothetical inexpensive yet powerful quantum computers. With Grover's algorithm you get to do 2^128 quantum operations instead of 2^256 conventional operations to perform a brute force attack, but you can't do 2^128 operations either because that's far too many, so you lose.

> Jason chose ChaCha20

WireGuard practices zero-agility, it picks exactly one of each component and if there's a problem the entire protocol has to be refreshed. So, choosing ChaCha20 rather than AES means it's penalized slightly on fast modern general purpose hardware (e.g. your laptop) but much faster than it might be otherwise on cheap micro controllers and archaic gear, a very reasonable trade. [Modern TLS just can choose whichever both parties agree on].

> It is many years too early to determine which parts of Rust are "done".

I mostly agree with this. However, there's a spectrum. I'm pretty confident that in 2050 programmers will still find core::cmp::Ordering useful, I don't think a fundamental revolution in mathematics will have rendered the idea that 5 is Less than 10 useless. In contrast core::sync::atomic::Ordering might well have fallen entirely out of use in 2050 with a very different model of concurrent programming for very different hardware where Acquire-Release semantics seem silly. Or for example I'd be astonished if "constant by default" turned out to be a bad idea but not at all surprised if "narrowing as" goes away because it turns out that's a bad idea.


> I don't buy this at all. Stroustrup was keen on this...

Buy it or don't, it remains 100% fact. You may abandon fact, but at the cost of remaining credibility.

Where constrained to be backward compatible with C, as C++ was, there is no choice but to implement both increment operators. Period.

You may insist he could have avoided implementing overloading for increment operators. But then, generic code would not work. Crippling the language to conform to your personal preference would have killed its future, and we would be discussing neither C++ nor Rust, which depends utterly on C++ for most of its design. Furthermore, I daresay you had no such preference at the time, and probably could not have expressed it if you had.

Smugness about good luck is a great personal failing.


> Buy it or don't, it remains 100% fact.

That's not how facts work. It's a fact that C++ does have these operators, it is not a fact that this was unavoidable.

And remember Swift removed these operators. Earlier versions of Swift had them, current ones don't, they were a bad idea and they're gone now. That's a thing a serious language did, not a toy language used by six PhD students, a real practical language with a population of working programmers.

> Where constrained to be backward compatible with C, as C++ was

ie: conditionally. C++ chooses to be somewhat compatible, but not compatible enough that medium-sized C codebases work unaltered or that larger ones can reliably be converted at all. Which shouldn't be a huge surprise as its own "compatibility" story from C++ 98 onward is likewise a trail of misery and despair.

> Furthermore, I daresay you had no such preference at the time

In 1982 when Stroustrup named the language C++ and thus sealed this particular deal, no probably not. I believe I did not even purchase a macro assembler (my first way to write programs that weren't in either raw machine code or Basic) until slightly after that date. And I have no doubt that insufferable teenage me thought the pair of increment operators was a great idea when he got a primitive C++ compiler (having never had a C compiler before that) some years later, his poetry and protocol design were both awful too, we certainly shouldn't put him in charge of designing a modern programming language.


You have said something deeply stupid, and when called on it, doubled down. So, I will just leave it there.


How is that even relevant? What is the alternative? 'new' and 'delete'?


Usually std::unique_ptr or in a container like std::vector.

Being too afraid of std::shared_ptr can lead to using references or raw pointers in the wrong places. That way leads to the usual memory safety issues.


I was arguing for unique_ptr, not against. Parent was complaining about it.


> How is that even relevant?

What kind of a question is that? It's relevant, if you care about performance, reliability and code simplicity.

> What is the alternative? 'new' and 'delete'?

new and delete are part of the constructor-destructor business. "delete [] array" is a linear operation, when you have non-trivial destructors. Of course you may always just stick to trivial destructors, but the casual use of new and delete puts you into a mindset that it is entirely normal and nothing to wonder about day-to-day, and then you pay the price without being conscious of it.

There are multiple solutions:

1. Modern (concurrent?) garbage collectors are one solution.

2. Another solution is using arenas. Recently, in the compiler I'm writing in Rust I'm using the bumpalo crate, which provides an arena allocator. This allocator is not going to execute destructors of the objects stored in the arenas, when they are freed. I basically write code which doesn't rely on destructors of little objects. There is the destructor of the arena itself, but it's already a completely different picture than doing microallocations, with an unpredicted number of destructors. I can list off the top of my head places where in my code a destructor will be executed (and it's always just a single destructor, not N).

I group objects according to the time at which they should be freed. So there is one arena which holds things like source code of different files (I can't mmap because of functionality provided) and arrays of tokens. This is going to be freed when the frontend finishes its job, because at that point I'm not going to quote the source code or tokens to the user (all the error messages doing that would be generated earlier). String literals have a longer lifetime. They're going to be there throughout the whole compilation process, so they go to a separate arena. I'm not making separate allocations for each separate string literal. Similarly, I'm not making separate allocations for each token array in the frontend.

It also has the benefit that code is simpler, because I don't need to deal with different forms of the same data, only differing in "ownership". I have no issues like expecting in one place a String, but what I have is a &str. Or in C++ e.g. expecting a std::shared_ptr, when I have a normal inline value (whether it's on stack or part of another struct). Everything is a reference annotated with the lifetime of the arena it resides in. It's both faster and conceptually simple.

Another case: at work recently I needed to write an in-process sampling profiler to profile a subset of our code. Modern C++ would mandate that I should use a std::vector<StackTrace> type to store the collected stack traces and the StackTrace type would have a bunch of constructors defined and a destructor. What I did instead is I calculated the size of the buffer that I needed based on the sampling frequency, maximum stack trace depth, size of addresses on the target platform and the time the profiler was supposed to run for. Then I just allocated the whole thing in one go with mmap(2) with appropriate flags. Then I just appended subsequent stack traces in the buffer, holding a pointer to the end, and incrementing it each time an appropriate amount, terminating each stack trace with the 0 address, so that I know where one stack trace begins and another ends. No resizing, no destructors firing up out of nowhere, no microallocations for separate little objects, just one allocation in the beginning and one deallocation in the end. Only two points of failure, instead of N. The "modern C++" way would: be slower, consist of more code (The Rule of N), be slower to compile, have an incomparably bigger number of points of failure. Oh, using std::vector::push_back in a signal handler is also a terrible idea (dynamic allocation in a signal handler? good luck).
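
Very roughly, the shape described above looks like this (all names and details invented, error handling and bounds checks trimmed):

    #include <cstddef>
    #include <cstdint>
    #include <sys/mman.h>

    struct TraceBuffer {
      std::uintptr_t* begin;
      std::uintptr_t* end;        // append cursor
      std::size_t capacity;       // in entries
    };

    TraceBuffer make_buffer(std::size_t samples, std::size_t max_depth) {
      std::size_t entries = samples * (max_depth + 1);   // +1 for the 0 terminator
      void* p = mmap(nullptr, entries * sizeof(std::uintptr_t),
                     PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      auto* base = static_cast<std::uintptr_t*>(p);
      return {base, base, entries};
    }

    // Called from the sampling signal handler: plain stores and a cursor bump, no allocation.
    void append_trace(TraceBuffer& b, const std::uintptr_t* frames, std::size_t depth) {
      for (std::size_t i = 0; i < depth; ++i) *b.end++ = frames[i];
      *b.end++ = 0;                                      // 0 address separates stack traces
    }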

Another issue with destructors is that they are used for more things than just deallocation. E.g. they are used for sockets... So now you don't know whether your last packets reached the other end in TCP. For files, you don't know if an I/O error occurred. And, unlike the defer mechanism in various languages, they are invisible, so looking at some code it's not immediately obvious that some code that you do not see is going to be executed at a particular time.

Allocating things in large portions and forgoing that invisible destructor business brings a lot of benefits: performance, reliability, code simplicity.

Also, a big reason why manual memory management is so tricky and why people are so afraid of it is exactly because they were taught to allocate each little object separately, each little object having its separate lifetime. When your code is sprinkled with malloc(3) and free(3) all over the place, then no wonder you lose track of things. In my compiler, although it's written in Rust, I can easily point to places where memory allocation and deallocation happen. I could replace it with manual calls, because the objects are grouped together into larger buckets. It also helps avoid fighting the borrow-checker, when your lifetimes are so simple.

UPDATE: This also translates into efficient use of a garbage collector. If you allocate and free a lot of little objects frequently, then the garbage collector is going to be spending a lot of time on-CPU. And the developers are going to say "it's not me, it's the garbage collector!", but the truth is they are the ones that made all that garbage.


The overwhelming majority of programs have no particular performance requirement; whatever is most convenient is fast enough. That might mean writing in a scripting language, even bash.

Of the rest, the overwhelming majority of what most programs do has no particular performance requirement, so whatever is most convenient is fast enough. Usually only a small kernel of the program needs any attention to performance.

In such kernels, the overwhelming majority yield to very ordinary techniques, because (e.g.) they build their structures at startup, and just use them afterward, or only allocate occasionally.

The few remaining cases demand that you be smart, and make full use of the flexibility and power of the language and environment. If the language is not up to the job, you're stuck.

Going straight to mmapping hugetlb pages is a rookie mistake. Failing to mmap storage when the problem calls for it is another.

C++ and Rust are distinguished by supporting development at all of these levels. "Modern C++" does not imply confining yourself to any of them. A mature programmer uses the tools at hand as needed to meet requirements, without wasting time and without agonizing over ideology. Rust offers no advantage over C++ in this.


>> How is that even relevant?

> What kind of a question is that? It's relevant, if you care about performance, reliability and code simplicity.

The topic really was std::unique_ptr VS std::shared_ptr in the context of an introduction to modern C++... We are talking about standard best practices, not advanced high performance applications.


> std::shared_ptr means I don't have to manage any memory

You really should have left this out. It's very, very distracting, and makes me question if I even want to read the article.

shared_ptr is one tool to manage memory. It certainly doesn't mean you don't have to manage it, and it doesn't mean you should reach for it most of the time.

> I don't do a lot of C++ so I wanted to get a sense for what it can look like today.

I think this would be a great intro for the post.


>std::shared_ptr means I don't have to manage any memory; no more new/delete! (But try as I might to understand std::unique_ptr, I'm just not there yet.)

Eh? unique_ptr is just a scoped resource. You create a thing (or take ownership of said thing) and it gets deleted when it goes out of scope. Am I missing something?

It should be the first thing you reach for if you need to dynamically create an object (certainly way before shared_ptr).


A simple variable would be a scoped resource; unique_ptr can be transferred.

    {
       RAII file = ....;
       // do stuff
    } // file is no longer open
vs

    unique_ptr<RAII> outer_ptr;
    {
       unique_ptr<RAII> file = make_unique<RAII>(...);
       // do stuff
       outer_ptr = std::move(file);
    } // resource still open


Your “RAII” object can also be transferred if it supports move semantics (it really should). Both classes hold a unique resource (a file handle and a heap-allocated object, respectively) that can be moved but not copied, and the resource gets freed when the variable goes out of scope.


Yeah, I’m aware of that. I still don’t think it adds much to the complexity of unique_ptr though.

It’s certainly less complicated than its ancestor auto_ptr.


I know it's only a tutorial but recursive function calls are a bad idea for a general purpose JSON parser. At the very least, limit the recursion depth. Otherwise, it's trivial to make the parser crash with a stack overflow.
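
One common guard, sketched here with made-up names rather than the article's actual parser: thread a depth counter through the recursive calls and fail past a fixed limit instead of overflowing the stack.

    #include <string>

    // Hypothetical minimal parser state, just for the sketch.
    struct Parser {
      std::string error;
    };

    constexpr int kMaxDepth = 256;   // arbitrary limit for illustration

    bool parse_value(Parser& p, int depth) {
      if (depth > kMaxDepth) {
        p.error = "maximum nesting depth exceeded";
        return false;                // caller unwinds cleanly instead of crashing
      }
      // ... descend into nested arrays/objects with parse_value(p, depth + 1) ...
      return true;
    }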


No offence, but you should make it clear in the beginning that you are not too familiar with (modern) C++. This blog post might teach how to write a simple JSON parser, but it does not teach how to write good modern C++. There are many red flags which others have already pointed out.


Title: > a tour through modern C++

Then:

    #ifndef JSON_H
    #define JSON_H

    // ...

    #endif

Still, I don't get how #pragma once never ended up in the standard, when every major compiler supports it.


I feel more and more like Microsoft/Apple/GNU need to do what Apple/Mozilla/Google/Microsoft did when they turned their backs on the w3c and formed WHATWG.

If the standard committee can't even agree on concepts like the existence of a filesystem, we need a different committee.


The standards committee have too broad a scope and think they can make everyone happy. They want the same C++ to work on embedded and cloud servers while not making anyone update any of their old code and having full backwards compatibility. This results in tradeoffs that make everyone unhappy.


We all know what happens when backwards compatibility gets broken.

Plus there are several options already when backwards compatibility isn't relevant.


> do what Apple/Mozilla/Microsoft did when they turned their backs on the w3c and formed WHATWG

You mean when they gave Google the power to add whatever they want to the standards and get the hate of web developers for "being the new IE" whenever they refuse to implement Google's API of the day?

Not sure I'd want that infinitely growing scope for compilers I use for production systems.


Who do you think calls most shots at ISO meetings?


There is nothing intrinsically wrong with using implementation features. If you are writing code for servers or a desktop app... just use #pragma once. Even in the unlikely scenario that you eventually need to use a compiler which does not support this, transforming #pragma once into #ifdef guards is a bash one-liner.


I thought modules would be modern C++ now? Am I missing something?


It's so modern (C++20), it's not actually implemented fully by all compilers yet.

https://en.cppreference.com/w/cpp/compiler_support#cpp20


It is already good enough on Visual C++.

Such is the price for an ISO language with multiple implementations.


I tried but couldn't get modules working, personally. The various sites online I found differed in how they work in the first place.

I couldn't find what the actual state of modules are in Clang.



That lengthy blog post is a perfect example of everything that is wrong with C++. Using modules should be a trivial part of the language. Instead, we have to take a detour via historical Objective-C (a language already superseded) and there are enough caveats to make a lawyer blush.


Some languages care to keep 30 years of code compiling, regardless of the new shiny features.


For C, there is a paper (pending publication) about this. No promises, of course, but the committee will at least have a look.


All our code uses #pragma once instead of the old way. Zero issues.


Same here and our games run on every possible device with a screen.


Same can be said about ISO C and several common C extensions.


I am curious to know why every single function parameter in this project is passed by value. Too much RAM left unused in our modern desktops?


Why do you think pass-by-value takes up more memory? The data gets packed on the stack pretty densely, so as long as the data structures are tiny, that shouldn't be a problem. On the flip side heap allocation is typically prone to some level of fragmentation, which actually does waste memory.


I agree with your point for small data structures. But in general, and in the case of the OP's source code, there's no need to copy the arguments by passing them by value; he should pass them via a const reference and just pass the address.


Passing by value can be fine, but then the author should use std::move when passing it on to other functions.

But you are right that passing by (const) reference is standard practice and less error prone.


Passing by value can be fine for small size structs, but it is not the common practice.

I expected to see some of these functions use const references.

For general guidelines on parameter passing (and other guidelines), the C++ Core Guidelines are pretty useful for those who want to learn:

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines...


> Passing by value can be fine for small size structs, but it is not the common practice.

It can also be fine with larger objects (with heap memory) if you plan to make a copy anyway, because you can just use std::move. The advantage is that the function works the same with lvalue and rvalue references, so you don't have to write the same function twice. An alternative is to use "universal references" (T&& where T is a template parameter) + std::forward. This is often seen in constructors.

But yeah, usually you would just use const references.

BTW, instead of 'const std::string&' it is even better to use a std::string_view because it works both with std::string and C strings - without creating hidden copies for the latter. If you pass a C string as a 'const std::string&' argument, you actually create a new temporary std::string, whereas std::string_view will just store the pointer and length.
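
A sketch of the "pass by value, then std::move" pattern described above (the Token type here is illustrative, not the article's):

    #include <string>
    #include <utility>

    struct Token {
      std::string value;
      // One overload serves both cases: lvalue callers pay a single copy,
      // rvalue callers pay only moves.
      explicit Token(std::string v) : value(std::move(v)) {}
    };

    // Usage:
    //   std::string s = "null";
    //   Token a{s};                 // copies once into the parameter, then moves
    //   Token b{std::string("{")};  // no copy, only moves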


In the world of Electron applications eating RAM like nothing, this pales in comparison.


There is a good chance that Electron comes with a highly optimized JSON parser that is vastly faster than the one in the article.


If you are looking into learning Modern C++ through library development, I highly recommend the following:

https://pragprog.com/titles/lotdd/modern-c-programming-with-...

https://stroustrup.com/Tour.html


Arguably, a JSON type can be implemented with just the standard library in a less involved way: https://medium.com/@dennis.luxen/breaking-circular-dependenc...


If you want to have that in production, you should run a test suite on it. There is a popular test suite by nst.

Although some go too far. I just had to write my own JSON parser (or rather, rewrite one), because the parser I was using did not pass the test suite I had.


[flagged]


Can you give me an example? Not asking disingenuously. I'm a noob and would be keen to learn more.


Internet hygiene best practices tell you to ignore advice that starts by saying that some tech/tool/language is “literally” dog shit.

The argument might even be valid sometimes, but the extreme and one-sided view damages your learning.


There's a possibly apocryphal Dijkstra quote that "Object-oriented programming is an exceptionally bad idea which could only have originated in California."[1] Whether or not he actually said it, there is a pretty good discussion in that link.

[1] https://www.quora.com/Why-did-Dijkstra-say-that-%E2%80%9CObj...


I don’t know exactly what the troll you are responding to was getting at, but modern C++ generally agrees with them that OO isn’t the best way to do most things. For better and/or for worse, C++ can’t remove features, so classes aren’t going anywhere, but you don’t have to make big OO hierarchies.


While the C++ community is largely moving away from OO, the features added to support OO are still in heavy use. For example, std::function is typically implemented by holding a pointer to a templated subclass of a private abstract base class. There's still inheritance and virtual calls and such going on, and that's just not exposed as part of the API.


"I invented the term Object-Oriented and I can tell you I did not have C++ in mind." - Alan Kay


Why do people make websites that are so narrow that the code snippets have a horizontal scrollbar? I have a 4k monitor. Modern web design is almost as stupid as modern C++.



