Rust uses it in their standard library[0]. Haskell uses it[1]. Lisp seems to have originated the concept in the form of box diagrams[2]. F# uses it [3].
I think it might be a functional programming thing.
While slightly different, the first place I remember hearing the term was Java. It was solving a slightly different problem: primitive types were not objects. You’d have an int, but many data structures require the things you stored to be subclasses of Object. There were a bunch of objects with names like Integer that you could use instead, but in the early days there was a non-trivial performance hit associated with the “boxed” values.
I think Java, Haskell and C# are the main languages where you have both boxed and unboxed values (C++ if you start doing smart pointer stuff I suppose, tho "String" could be considered a very fancy boxed char*).
Python, Ruby, and JavaScript all have exclusively boxed values. "Everything is an object".
I'm not sure what languages you use but that pretty dramatically covers most of the mainstream (top 10 at least).
Transparently copyable heap-allocated object would are a recipe for introducing invisible performance issues, especially in generic code.
Rust requires types to explicitly opt-in to being implicitly copied, while C++ requires you to opt-out by deleting the copy-constructor.
Accidentally copying small structs on the stack is a minor performance problem. Copying an std::box<int> in a hot loop could cause heap fragmentation, lock contention and huge amounts of wasted memory due to heap alignment requirements (32 bytes on 64-bit arches).
You mean, as opposed to the other transparently copyable heap-allocated objects such as ... std::vector, std::list, std::unordered_map, std::string, ... basically most of the standard library other than the smart pointers?
The problem is already there, Box wouldn't change anything.
I think there's a bit of confusion here around "value semantics".
No C++ smart pointer has "value semantics", relative to its target T. You can see this because == performs address comparison, not deep comparison, and `const` methods on the smart pointer can be used to mutate the target (e.g. in C++, operator* on unique_ptr is always const, and yields a T&).
This is in contrast to Rust, where Box performs deep equality, and has deep const/mut. In Rust, Box is basically just a wrapper around a value to have it on the heap (enabling things like dynamic polymorphism, like in C++). In C++, the pointer is its own entity, with its own separate equality, and so on.
Const-ness of operations, operator==, and assignment/copying behavior all have to be consistent with each other. For example, if `box` was simply `unique_ptr` with a copy constructor (somehow, and as the table in the blog post basically implies), then you would have that after `auto a = b;`, `a != b`, which obviously doesn't work. This means that the hypothetical `std::box` would have to have its comparison and const-ness adjusted as well. In C++ terms, this isn't really a pointer at all. The closest thing to what the author is suggesting is actually `polymorphic_value`, I believe, which IIRC has been proposed formally (note that it does not have pointer in the name).
Also as an aside, smart pointers are not suitable a) for building data structures in general, and b) building recursive data structures in particular. The former is because meaningfully using smart pointers (i.e. letting them handle destruction) inside an allocator aware data structure (as many C++ data structures tend to be, and even data structures in Rust) would require duplicating the allocator over and over. The latter is because compilers do not perform TCO in many real world examples (and certainly not in debug mode); if you write a linked list using `std::unique_ptr` the destructor will blow your stack.
> The former is because meaningfully using smart pointers (i.e. letting them handle destruction) inside an allocator aware data structure (as many C++ data structures tend to be, and even data structures in Rust) would require duplicating the allocator over and over.
Not quite clear what duplicating means here, but in general most smart_pointers can be constructed with an optional "deleter". Well implmented this would results into the addition one reference (64 bits) field to each instance of the smart_pointer (remove that by using some static member kung-fu but this is hardly worth it).
> The latter is because compilers do not perform TCO in many real world examples (and certainly not in debug mode); if you write a linked list using `std::unique_ptr` the destructor will blow your stack.
This true for all deeply nested structure. Same thing happen in reference counted system like swift.
One can mitigate this by simply controlling the order of destruction.
Duplicating meaning that the smart pointer deleter is going to need a copy of the allocator (or a pointer to it if you prefer, but allocators are already typically pointers). If your container is storing N separately allocated elements, holding them by unique_ptr instead of raw pointer will waste N pointers worth of space with commensurate extra cache use. Fine for homework but not production data structures.
If you control the order of destruction, then you're just manually asking for things to be destroyed, and not actually making use of the smart pointers main functionality. Why use them at that point? That's why I also used the phrase "meaningfully" use them earlier.
Look inside the STL, boost, abseil, etc. You'll very rarely see smart pointers used to implement containers/data structures.
You changed your focus from general data structure to containers specifically which have indeed differents (eg more specialized) design constraints. But still...
> If your container is storing N separately allocated elements, holding them by unique_ptr instead of raw pointer will waste N pointers worth of space with commensurate extra cache use. Fine for homework but not production data structures.
Designing a data structure is an exercise in compromise. You lose space/cache efficiency and gain (exception) safety and ease of use. And of course the lost of space/cache efficiency is a function of both the usage pattern and the size of the store elements...
Calling it "fine for home work" is needlessly dismissive.
> If you control the order of destruction, then you're just manually asking for things to be destroyed, and not actually making use of the smart pointers main functionality.
I disagree with the premise that the main functionally of smart pointers is automatic destruction. Exception safety and ease of use seems more important to me. So using smart_pointer and still controlling the order of destruction is a perfectly valid use case.
But an even more important point is that smart_pointer are designed in way that the length of the management can be different from the life time of resource being managed; By using the reset/release/construct_from_pointer methods. And indeed there are multiple designs when a resource is created unmanaged, then attached to a smart_pointer for a while, then returned to an unmanaged state.
This reflects the fact that the same resource can have different management needs depending on where in the program one is.
> Look inside the STL, boost, abseil, etc. You'll very rarely see smart pointers used to implement containers/data structures.
It doesn't imply any fault in the design of smart pointers.
Smart pointers are resource lifetime management tools, STL containers in general have very simple lifetime contracts... There is simply no need to use them.
I'm not sure I see the benefit vs std::unique_ptr. In the rare case you do want to deep-copy a unique_ptr, you can always use std::make_unique() to invoke the copy constructor
Kind of, though because the language does a lot of things for you when you use the built in copy, make_unique gets complicated when you want to use std::box inside of another structure. You would need to override the default copy constructor to get this to work. For instance, vector<box<Foo>> wouldn't be possible to implement with unique_ptr because you can't override the copy constructor for a templated type. std::box would allow you to copy it. As for why you would need to do this (over vector<Foo>), consider Foo having subclasses. Complexity breeds complexity...
Regardless, I think lifetime annotations would solve far more problems than std::box. I really do like box as a suggestion as it would help clean up types, make things a bit more explicit in a few places, but there are bigger issues with C++ right now. This is a great suggestion (as is unique_resource for similar on the stack), but a relatively minor thing in the scheme of things. Still nice.
Huh, isn't struct B literally what an implementation of std::box would look like? You would need a nullptr check in the copy ctor and copy assignment operator to make it complete, but other than that, that is exactly how I would implement box. Anything that would be in std, you can implement yourself, so I would encourage people to try implementing box themselves and see how it works for them.
Yes, it's not too difficult (but thanks for the pointer, edited). The question was why this smart pointer would be useful at all, not just whether it belongs in the standard library. Implementing the helper class yourself is not too bad, but putting this in domain classes is a lot of cruft.
Personally, I've never needed to copy something that was both not trivially copiable and the component parts of which were individually trivially copiable. If thing just gets initialized immediately, why do you put it in a pointer? Usually you would put it in a pointer because you need to initialize it in a specific order in the constructor relative to the other members (so presumably also in the copy constructor), or because it's a polymorphic object (so you can't just use the copy constructor of the static type). If neither of those is true, why use a pointer at all? Just make the member a simple object.
Types that are self-referential in one way or another, e.g. a graph where nodes point to each other, or a "main object" containing subobjects which point back to the main object. There is no way to implement an efficient move for such types (it would need to adjust all pointers), so implementing move operations would be misleading. If you want to be able to pass around ownership of such a type, it needs to happen through pointer indirection.
But you might want to allow a copy operation that recreates the entire structure. The proposed box<T> offers exactly the desired copy/move semantics.
It can be argued that any types like those described by you are wrongly designed.
For a self-referential type like a graph, either a value of the graph is completely stored in a contiguous region of memory, including all nodes, when the self-references should not be pointers, but offsets from the start of the region, and a graph value can be moved or copied anywhere without problems, or else a graph value consists only of a list of pointers, which point to node contents without pointers, which are scattered through the memory. In the second case, a graph value, i.e. the list of pointers, can also be copied or moved anywhere in the memory without any problems, like you would do with any single pointer.
Mixing pointers with data makes sense only for objects with embedded pointers whose purpose is to allow them to be inserted in linked links or trees, where such objects will never be copied or moved, but only created and destroyed.
Those sorts of types fail the second requirement. The members are not trivially copyable on their own because they contain tremendous structure that may only be discernible at the top level. A generic box pointer doesn't help you there because the problem is not implementing the copy methods of the members, but implementing the copy method of the outermost object.
That was my thought as well; even Rust requires a .clone() call to deep-copy a Box<T>, since it doesn't allow implicit copies of types without the Copy trait. (Types with that trait must effectively be "plain old data" that can be copied byte-by-byte.) So I don't see the issue with requiring an explicit copy function for std::unique_ptr<T> instead of an implicit copy constructor.
First of all, Box is a terrible name because the term has been used for boxed pointers (putting tag bits into unused parts of the pointer) and, in some languages, immediates, for four decades at least. Also all C++ standard identifiers are thankfully lowercase snake case.
I don't understand why the author writes "raw value is straightforward and efficient... However, you can't allocate them dynamically and you can't build recursive data structure such as a linked list or a tree with them." There is clearly something I don't understand here. Consider an int -- you can dynamically allocate one, you can put it in a tree. Putting a box into a tree will still require other data applicable to the tree itself, same as an int), and so on. So I don't understand the point being made here.
And the deep copy behavior is rarely what I want in a mutable structure anyway (it's always safe, if usually wasteful, in a R/O structure).
>There is clearly something I don't understand here. Consider an int -- you can dynamically allocate one, you can put it in a tree putting a box into a tree will still require other data applicable to the tree itself, same as an int), and so on. So I don't understand the point being made here.
To make a tree, you need something allowing an object (a tree node) to own an object of the same type as itself (another tree node). There are various options for this: raw pointer, unique_ptr, shared_ptr, auto_ptr, box. Without using one of those options, the tree isn't possible. That's the author's point. std::box can be used in place of one of the other pointer types.
I understand that fine. My problem is the author calls out the “raw value” as distinct from “raw pointer”, implying a non pointer value which would be the content of a tree node. This makes the entire paragraph hard to make sense of.
Can't you use edge lists/arrays to represent your tree? And maybe even topologically sort your node list according to whatever kind of traversal you want to make? At least that's what I used to do in the qbasic days.
Yeah, good point. You could use std::vector or std::list. Under the hood those both use pointers, but kind of similar to how under the hood std::unique_ptr or std::box would use raw pointers.
In the specific case of a binary tree, where you only need left and right, std::vector/std::list would be overkill and slow down your program. But in a b-tree, std::vector would be good.
You can't use std::array or plain C arrays for the same reason you can't use raw values.
I hadn't even thought about it, I was like Box<T> is basically std::unique_ptr<T> anyway so what's the point -- but yes, Rust's types all either can't be copied at all, or they implement Clone and thus Clone::clone, which is what you'd call a "deep copy" if you're used to that nomenclature.
I think the underlying cause is that Rust's assignment semantic is a destructive move, not a copy†, which frees up the opportunity for an actual copy to be potentially expensive, matching reality. In a language where assignment is copy, that operation must be cheap and so we've obliged to make up an excuse for how although this is a "copy" it doesn't behave the way you want, it's just a "shallow copy".
† Although it will nearly always work to think of Rust's assignments as destructive move, as an optimisation types whose representation is their meaning can choose to implement Copy, a trait which says to the compiler that it's fine to actually just copy my bits, I have no deeper meaning - thus if the type you're using is Copy then assignments for that type are in fact performed just by copying and don't destroy anything. So a byte, a 64-bit floating point number, a 4CC, an IP address none of those have some larger significance, they're Copy, but a string, a HashMap, some custom object you made (unless it can and did opt in to Copy), those are not Copy.
Crucially, from an understanding point of view. Implementing Copy requires a trivial implementation of Clone. As a result it feels very natural.
Seems to me that the critical problem with this idea is "deep copy."
There is no builtin deep copy facility. Without the facility then a box pointer would be dangerous leading to weird effects when the copy is too shallow.
You could solve deep copy with a template that relies on each class providing a deep copy function if one is needed. But again, this will make bugs if someone forgets to provide the function.
Rather than make an error-prone feature in the standard library, I think it would be better to just explicitly roll this yourself. A sensible constructor copy should already do a deep copy -- or ensure copy-on-write to simulate a deep copy. So copying is as easy a calling make_shared (original) or make_unique (original).
> std::box<T> addresses these issues by offering deep copying and automatic garbage collection
This is pretty much impossible when holding a pointer of base class. However, this is a primary reason for having pointers in the first place (polymorphism, and having abstract base classes).
In all other cases, you're probably better off with either the raw value, std::variant or std::reference_wrapper.
You always know the actual dynamic type at construction time, how would you otherwise construct it?
> For example shared-ptr to base can correctly invoke the correct derived type
Invoke what exactly? Im sorry I don't understand what you're trying to say here.
I guess you can force all derivied types to implement a clone() function, such that box<T> can do the deep copy, but Id consider that a fairly big inconvenience for such a simple pointer type.
With polymorphism, you typically want base classes that provide a general interface, that many classes can derive. In places where you use this pointer-to-base, you don't need/want any knowledge of the derived type. It is an unneeded depedancy, which would only increase compile time, or worse, cause circular dependancies.
I'm not a big fan of RTTI, and not even sure if it would work here. But once you start keeping track of all derived types, you might as well use an std:: variant. It's more cache friendly too, so more performant in many cases.
> Inspired by Box<T> in Rust, the std::box<T> would be a heap-allocated smart pointer.
so, is it the pointer that is heap-allocated or the pointee? frankly, i find this article somewhat incoherent, and importantly, it lacks code examples illustrating what it is talking about.
It doesn't seem that complicated: they want unique_ptr that is copyable and copies the underlying pointee as well.
I think this would be way too big of a footgun: with implicit copy it would be too easy to pass box instead of box& and accidentally make copies when you didn't mean to. Box is not copy in Rust which avoids that problem, so really the equivalent in C++ would be just to add a .deepcopy() function on uniqueptr which is only implemented if the underlying type has a copy ctor.
but you can do that anyway - just copy the thing the unique_ptr points to, and make another unique_ptr point to it. but i really don't understand what copying a unique_ptr implicitly would mean.
also, c++ has (wisely, imho) rejected the concept of a "deep copy" - we just have copies, of varying depths.
Yeah; the difference between Box and a regular object would be the object is on the heap instead of the stack (just like it is with uniqueptr). I say its a footgun to have an implicitly copyable uniqueptr because a reasonable use of using the heap that way is for stuff too big to go on the stack, and too big would also mean you also don't want to treat copy as a trivial operation.
Often classes don't have a copy constructor if it's expensive to copy for the same reason though, which means they wouldn't be eligible for the proposed Box behavior either though so maybe its not a real problem.
I like the idea. I have wondered why not have a "garbage collected" smart ptr std::gc_ptr<T> that can allow cycles with other such ptrs to avoid the short coming of std::shared_ptr<T>. You would need to define which gc_ptr's are in your "root set" to initiate the "mark and sweep" of the graph of ptrs. This would be useful for heavily linked data structures with cycles.
It will never be called gc_ptr because C++ programmers have an allergy to the term GC. However, an attempt was made to implement a similar solution. Take a look at tracked_ptr: https://github.com/pebal/sgcl
There are some (a lot of) changes that C++ should have inspired by Rust, but as the other comments have said. I really don’t feel like a smart pointer that acts like unique but copies by value is all that necessary.
That doesn’t seem to fix a memory bug (cause doing this with a unique ptr, then the compiler would yell at you for using copy), it seems to just make it easier than having to write `std::make_unique(*otherptr);`
I have written and been using that same smart pointer type for years, under the pretty horrible name of holder_cloner_t<> (at least it's clear). It is indeed the right solution to a very common and important type of problem. Looking forward to something like this in the standard library one of these decades.
Low level-capable languages need GC lifecycle hooks and replacements similar to Rust alloc, but flexible enough to plug in BWS, ORCA, or DIY.
I also think the semantics of shared and unshared const and mutable state need to be made explicit. Pony is very good about this more so than Rust by bringing into the language.
I've seen this sort of pointer (assuming the author means it's nullable) be called "clone_ptr<T>" but it called T's clone() method. Because T might be a base class, invoking the pointee's copy constructor in C++ is not a great idea.
I'd rather have an optionally-owned pointer type so I can handle writing virtual methods that can return a value aliasing an already existing value or create a new one on demand. Otherwise you either have to roll your own or bloat your API:
It'd probably have shared_ptr semantics but you'd have to treat it as a const ref for lifetime purposes, which might make it distasteful to the std library folks.
The author could implement the deep-copying pointer and share the .h on their GitHub. You don’t need a language extension for this in C++ as types can implement operator overrides and copy and move constructors.
But I doubt many people would use it, and that’s probably why it doesn’t belong in std::.
In contrast, before C++ 11, developers would write their own RAII-style smart pointers. So it made sense to save them the labor. I don’t think a pointer that doesn’t allow shallow copies is usually found in codebases. It sounds like a specific use-case pointer.
It’s a neat type that people coming from other languages could like, but maybe not quite standard library-ready?
that is enforced at compile time would be far more valuable for me. That would mean some kind of destructive move where the compiler guarantees that you can not access a moved from object.
As many as mentioned before, i think the authors is mixing a lot of concepts and might be trying to write rust in C++ ( or vice-versa).
From what i understand the key here is that authors seem to be mixing references and pointers.
What the author wants here (std::box) is a garbage collected/RAII'ed reference to a heap allocated object.
smart_pointers are pointers... which is to say their identity is distinct from pointed object.
I don't think std::box is missing as much as it doesn't really fit well in C++ current memory model, and distinction between references, pointer and ownership.
C++ has lots of so-called "value types" and this would just be another. box<T> is basically a vector<T> that is constrained to always have a size of 1. Both internally store a pointer to the heap and implement value semantics for the pointed-to object(s).
The question is not about the utility of box<T>, i am sure there are cases where such a structure would be need.
But box<T> is not a "pointer" in the C++ sense same as an array/vector of 1 is not a pointer.
The authors is needlessly complicating things and mixing concept there are pretty orthogonal. And i think this is the result of trying to fit box<T> which is not a pointer inside a taxonomy of pointers...
Is the normal stack size for a thread still 8MB? Most of my c++ has always been simulation or signal processing where a buffer of objects to be simulated or samples/pixels to be processed always exceed sane stack sizes. The standard answer is to preallocate these buffers and reuse them. Smart pointers are pretty useful in the simulation case, but not really necessary for the signal processing.
std::vector<MyType> is a pretty good 'box' like container. Dynamically allocated and applied RAII semantics. If you only want one instance, then dynamic allocation shouldn't (?) be necessary.
> std::vector<MyType> is a pretty good 'box' like container.
C++ has a concept of not paying for what you don't use. So if you only need one of that type then the vector isn't the right thing to use because the vector will have dynamic array-sizing information: at the very least, two size_t's one to indicate allocated size and one to indicate used size.
Then, a vector won't deep-copy a pointer. So a Vector requires the concrete most-derived type, and not a pointer to it. That's not the same as pointing to a base class to avoid needing to know the most-derived type.
No, I don't agree that a vector<type> is a good 'box' like container.
https://hackernoon.com/value-ptr-the-missing-c-smart-pointer...
https://buckaroo.pm/blog/value-ptr-the-missing-smart-ptr
People have been writing pointer-like value semantic wrappers for type-erasure for decades.