
Perils of Constructors - jasonpeacock
https://matklad.github.io/2019/07/16/perils-of-constructors.html
======
khold_stare
I completely agree with most of the post. In C++ I use the static factory
method "trick" mentioned in the post, when constructing subobjects may throw
or result in some kind of error. Throwing from the middle of a constructor in
C++ is a quagmire, so it's best not to do it. Bonus, you can mark the
constructor and factory functions noexcept.

The part I disagree with relates to "relying on the optimizer for placement".
Even in C++ using the above factory pattern, you are returning the constructed
object from a function - and there is no problem if it is ultimately part of
some larger object. The C++ standard specifies copy-elision very precisely so
you don't have to hope the optimizer does it - it is required to. To
demonstrate you can do stuff like this even if your object contains non-
moveable members, like std::mutex:

    
    
      class Foo
      {
      private:
        std::mutex mutex_;
        SomeComplexSubObject sub_;
    
        Foo(SomeComplexSubObject sub) noexcept
          : sub_{std::move(sub)}
        { }
    
      public:
        static std::optional<Foo> make(SomeParams params) noexcept
        {
          try {
            return Foo{SomeComplexSubObject{params}};
          }
          catch (std::exception const& e) {
            return std::nullopt;
          }
        }
      };
    

I think Rust can also specify something like this (ie. "copy-elision") as part
of its unwritten spec. Anyway, great article! :)
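
For comparison, a minimal Rust sketch of the same static-factory idea. `Params`, `SubObject`, and `Foo` are hypothetical stand-ins for the names in the C++ snippet, and a failing subobject is reported through `Option` rather than an exception. Rust's `std::sync::Mutex` can be moved, so the non-moveable-member concern does not arise in this version.

```rust
use std::sync::Mutex;

/// Hypothetical parameters; stands in for `SomeParams` above.
pub struct Params {
    pub size: usize,
}

/// Hypothetical subobject whose construction can fail.
pub struct SubObject {
    buffer: Vec<u8>,
}

impl SubObject {
    /// Reports failure with `None` instead of throwing.
    pub fn new(params: &Params) -> Option<SubObject> {
        if params.size == 0 {
            return None;
        }
        Some(SubObject { buffer: vec![0; params.size] })
    }
}

pub struct Foo {
    lock: Mutex<()>,
    sub: SubObject,
}

impl Foo {
    /// Factory function: the fields are private, so outside this module
    /// a `Foo` can only be obtained through `make`.
    pub fn make(params: &Params) -> Option<Foo> {
        // `?` returns `None` early if the subobject cannot be built,
        // so a partially constructed `Foo` never exists.
        let sub = SubObject::new(params)?;
        Some(Foo { lock: Mutex::new(()), sub })
    }

    pub fn len(&self) -> usize {
        let _guard = self.lock.lock().unwrap();
        self.sub.buffer.len()
    }
}
```

Calling `Foo::make(&Params { size: 0 })` yields `None`, while a non-zero size yields a fully initialized `Foo`.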

~~~
stinos
_Throwing from the middle of a constructor in C++ is a quagmire, so it's best
not to do it_

[https://isocpp.org/wiki/faq/exceptions#ctors-can-throw](https://isocpp.org/wiki/faq/exceptions#ctors-can-throw)
for one
disagrees, but it might not cover what you mean exactly. So: also in modern
C++ (i.e. using RAII types)? Or do you mean other problems than leaking, do
you have an example?

~~~
UncleMeat
Even though C++11 has been around a while, there is a ton of code that isn't
using smart pointers everywhere. This makes exception handling messy,
especially with partially constructed objects.

------
klodolph
This is kind of a pattern with “design flaws in C++” if you’ll excuse the
phrasing. C++ conflates a lot of different concerns with each other, so you
end up with some reasonable objective (enforce invariants) and some reasonable
way of achieving that objective (constructors & destructors) and it works well
enough 90% of the time but the other 10% of the time it's a problem. Then
there's a ton of design patterns that crop up to manage the 10% of use cases
where the language doesn’t quite give you the right tools for the job.

For example, access control. In C++, access control boundaries are classes,
which is a reasonable choice except there are a ton of situations where this
is the wrong choice, so you have “friend”. Some C++ designers will tell you
“friend” is a code smell, which is true, but the fact is you can’t always
avoid it. In languages where access control is defined relative to modules
this is rarely a problem. So I say that C++ conflates access control
boundaries with class boundaries.

Another example is syntactical blocks and variable lifetime. Object lifetime
is an important concept in C++, but object lifetimes often don’t line up with
the extent of syntactical blocks, even if it doesn’t make sense for the
objects in question to have dynamic storage duration.

My hot take here is that C++ is a very opinionated language, in the same sense
that Go and Python are opinionated languages, it’s just that C++, Go, and
Python have strong opinions about _different aspects_ of the language. Another
difference is that people in school falsely equate traditional object-oriented
design with good design. This makes sense, because it’s much easier to teach
object-oriented design than it is to teach good design.

------
adrianmonk
The footgun argument is not applied very consistently here.

In other languages, it's an argument against constructors. You _might_ do
things that are bad like have members with default values or call methods on
an object that isn't ready. You could avoid doing those things by establishing
conventions.

For example, the default values thing can be avoided in C++ by using member
initialization lists. Or in Java, use final fields and definite assignment
([https://docs.oracle.com/javase/specs/jls/se10/html/jls-16.html](https://docs.oracle.com/javase/specs/jls/se10/html/jls-16.html))
will protect you from default values.

But the requirement to maintain those conventions is an unreasonable burden
when it comes to constructors.

However, when it comes to maintaining conventions to avoid other issues, Rust
then gets a free pass:

> _A perceived downside of this approach is that any code can create a struct,
> so there’s no the single place, like the constructor, to enforce invariants.
> In practice, this is easily solved by privacy: if struct’s fields are
> private it can only be created inside its declaring module. Within a single
> module, it’s not at all hard to maintain a convention like "all construction
> must go via the new method"._

And it even goes on to say "One can even imagine a language extension" for
Rust. Fair enough, but in languages that use constructors, one could imagine
language extensions too. Default values for fields could be an explicit opt-in
thing. Or calling methods on a not-fully-built-yet object could be banned or
could require an explicit syntax.
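
A minimal sketch of the privacy convention described in the quoted passage, assuming a hypothetical `temperature` module with one invariant. The names are illustrative and do not come from the article.

```rust
mod temperature {
    /// Invariant: `kelvin` is finite and non-negative.
    pub struct Temperature {
        kelvin: f64, // private: cannot be set from outside this module
    }

    impl Temperature {
        /// The single place where the invariant is checked.
        pub fn new(kelvin: f64) -> Option<Temperature> {
            if kelvin.is_finite() && kelvin >= 0.0 {
                Some(Temperature { kelvin })
            } else {
                None
            }
        }

        pub fn kelvin(&self) -> f64 {
            self.kelvin
        }
    }
}

// Outside the module, the literal `temperature::Temperature { kelvin: -1.0 }`
// is rejected by the compiler because the field is private, so `new` is the
// only entry point. Inside the module, using `new` remains a convention.
```

The compiler enforces the boundary only at the module level, which is the part of the trade-off being questioned here.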

~~~
matklad
Sorry if the post sounds like I am trying to argue that one approach is
inherently better than the other. It is not my goal. Rather, I want to show
the amount of language level machinery (extensive static checks, or runtime
nullability) that is required if the language has constructors.

~~~
adrianmonk
OK, that's very reasonable. I think you make some good points that
constructors can be problematic. Maybe they need to be reinvented or
something.

Just ditching them entirely as Rust does seems like an honest attempt at
moving things forward, though I don't really see it as the long term solution
because neither approach is all that great when you consider all the
weaknesses that have been pointed out with both.

------
ncmncm
Not buying this argument, not at all.

In the constructor, there is no object yet. You have a bunch of subobjects
with no relationship besides proximity. The constructor is already a confined
space analogous to the "module" recommended in the article. Its job is to tie
the subobjects up into an object, and it is a great good that there is a
specific language construct for this purpose.

It is true that you need to be careful, in the constructor, not to call
members that assume class invariants have already been established while you
are still establishing them. But nobody forgets they are coding construction,
when doing it. The article invents a non-problem, and then a solution that
solves nothing -- apparently just because there is no other choice in Rust.

Weak thesis, weak argument. Rust has strengths, but lacking constructors is
not one. To present failing arguments risks suggesting that no better
arguments for your favored language are available.

------
spankalee
Dart fixes this with initializer lists, which must run for every class in the
hierarchy before any constructor bodies. Initializer lists have restricted
access to `this` so you can't call methods or pass `this` to functions.

~~~
agumonkey
I often think that Dart is a failure.. but every time I read about it, they
have interesting ways to do things. Maybe it's yet another language that only
exists to die alone and spread its genes for the next decade to pick up?

~~~
thosakwe
Dart isn't dead, though. Its use has been increasing because of Flutter.
Beyond that, there are still new features coming out with every release, so I
don't really think dead is the best word to describe it at this point.

~~~
agumonkey
It was launched quite a long time ago and is still pretty ~private.

~~~
thosakwe
What do you mean by that? It’s completely open source.

~~~
agumonkey
Vocabulary fail on my part. I can't find the term. Say 'in the shadow'.

~~~
Igelau
"obscure"

------
kazinator
> _The easiest answer is to set all fields to default values: booleans to
> false, numbers to 0, and reference types to null. But this requires that
> every type has a default value, and forces the infamous null into the
> language. This is exactly the path that Java took: at the start of
> construction, all fields are zero or null._

I don't think this forces us to have a null.

Firstly, if an object has another one as a member, then we just recursively
default-construct that member. If the member is optional (e.g. next node of a
linked list that may not be there) that can be addressed with a sum type, like
a "maybe" type, whose default value for construction can be the "not there"
variant of the type. (If that is considered morally equivalent of a null, I
don't know what to say; but it's certainly not there because of default
construction, but because we wanted linked lists that somehow terminate, and
it makes sense for construction to produce a node that has no next node.)

The concept of a default value isn't problematic at all. Every type has a set
of values in its domain. If that domain is empty (like the _nil_ type in
Common Lisp at the bottom of the type spindle), then the default value of an
object of that type is that only possibility: to have no value. If the domain
has exactly one value, we have no choice but to establish that value as the
default. Otherwise, we can designate one of the two or more values as the
default.

> _In Rust, there’s only one way to create a struct: providing values for all
> the fields._

If that's the design, we could require the programmer to specify that default
literal when the type is defined. Then that literal is used by default
construction. Problem solved.

If we can have literals, we can always have default construction, if we
inconvenience the programmer to supply us the literal that is to be used for
it.
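
One way to picture both ideas in Rust, as a rough sketch: a "maybe" type supplies the "not there" default for an optional member, and a programmer-chosen value supplies the default where zero would be wrong. `Node` and `Port` are hypothetical types, and 8080 is an arbitrary value picked for illustration.

```rust
/// A singly linked list node. `next` is a sum type: a node or nothing.
#[derive(Debug, Default)]
pub struct Node {
    pub value: i32,
    pub next: Option<Box<Node>>,
}

/// A type whose default is supplied by the programmer at definition time.
#[derive(Debug, PartialEq)]
pub struct Port(pub u16);

impl Default for Port {
    fn default() -> Self {
        Port(8080) // hypothetical default, chosen for illustration only
    }
}
```

`Node::default()` produces a node with value 0 and no successor, without any null reference being involved.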

~~~
hobofan
I agree that it doesn't really force you to have a null. I found that line of
reasoning especially surprising, as Rust also gets around this with sum types
and does have a `Default` trait in the standard library.

I do however think that beyond the empty/one element domains you mentioned,
default values can be a bit problematic. An acceptable default value for a
type is not really determined by the type itself but in what context it is
used.

Imagine a config struct with multiple boolean flags:

    
    
        struct SomeConfig {
          feature_a: bool,
          feature_b: bool,
          feature_c: bool,
        }
    

Would it be a good idea to use the default value `false` for all of them? I
think in a lot of cases like these it is very much preferable to force the
programmer to provide values for all the fields.

> If that's the design, we could require the programmer to specify that
> default literal when the type is defined. Then that literal is used by
> default construction. Problem solved.

Rust does kind of have a way to do it, though it's a bit more explicit. If you
have e.g. `std::default::Default` (which would be the conventional trait for
that) implemented, you can easily create a struct without needing to specify
the default fields:

    
    
        let instance = SomeStruct {
          foo: 1,
          ..SomeStruct::default()
        };
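
A fuller sketch of that pattern, applied to the `SomeConfig` struct from earlier in this comment. The hand-written `Default` impl and the `custom_config` helper are assumptions for illustration; turning `feature_b` on by default is an arbitrary choice meant to show that the default is context-dependent.

```rust
#[derive(Debug, PartialEq)]
pub struct SomeConfig {
    pub feature_a: bool,
    pub feature_b: bool,
    pub feature_c: bool,
}

impl Default for SomeConfig {
    fn default() -> Self {
        // Hypothetical choice: feature_b is on unless explicitly disabled.
        SomeConfig { feature_a: false, feature_b: true, feature_c: false }
    }
}

/// Overrides one field and takes the rest from the default value.
pub fn custom_config() -> SomeConfig {
    SomeConfig {
        feature_a: true,
        // The base expression must come last, with no trailing comma.
        ..SomeConfig::default()
    }
}
```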

------
scubbo
I don't have a CS background, so apologies in advance if these are basic
questions, but:

> For this layout to work though, constructor needs to allocate memory for the
> whole object at once. It can’t allocate just enough space for base, and than
> append derived fields afterwards. But such piece-wise allocation is required
> if we want a record syntax were we can just specify a value for a base
> class.

1\. Why can the constructor not allocate memory "progressively" for the object
as the construction chain descends the class hierarchy? Is it because, in a
multi-threaded program, something else might allocate some of the "extended"
memory, causing a clash when the constructor attempts to append fields? I
assume that an approach of "if the memory that a constructor is trying to
append into is already allocated, first move the occupying object to a free
memory location, and then continue with allocation" is inefficient because it
relies on an "overseer" that can coordinate and resolve these clashes?

1a. Sub-question - why does the memory for an object need to be contiguous? Is
this purely an efficiency concern ("read a whole object sequentially" being
more efficient than "read an object by reading a bunch of pointers and then
reading the locations they point to"), or are there other considerations? I
was under the impression that RAM (unlike hard drives) has no (or, negligible)
performance penalty to random access - but maybe those intermediate "jumps"
add up to a measurable impact?

2\. "But such piece-wise allocation is required if we want a record syntax
were we can just specify a value for a base class." I don't understand this
claim at all. Why is this required? What does it mean to "specify a value for
a base class" \- is this shorthand for "specify a value for a field of a base
class"?

Recommendations for further reading are received just as gratefully as direct
explanations - I'm sure these concepts are already covered in the literature
or a course syllabus, but I have no idea where to start!

~~~
brianberns
An object has to occupy a contiguous region in memory in order for
dereferencing to work efficiently. We need both "this.baseField" and
"this.derivedField" to directly access the desired bytes. If either of those
has to chase through a chain of memory regions, performance will be much
worse.

~~~
morelisp
Statically-known field offsets are at least as important as contiguous memory;
you can have either without the other, though a major "achievement" of
classical OO is that a single-inheritance class gives you both without really
thinking about it.
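
To make the layout point concrete, here is a small Rust sketch that emulates single inheritance by embedding a base struct at the start of a derived one. `Base`, `Derived`, and the helper functions are hypothetical; `#[repr(C)]` is used so that the field order is fixed.

```rust
/// `#[repr(C)]` keeps fields in declaration order.
#[repr(C)]
pub struct Base {
    pub base_field: u32,
}

#[repr(C)]
pub struct Derived {
    pub base: Base, // the "inherited" part, embedded by value at the start
    pub derived_field: u32,
}

/// Byte offset of the base field from the start of a `Derived`.
pub fn base_field_offset(d: &Derived) -> usize {
    let start = d as *const Derived as usize;
    let field = &d.base.base_field as *const u32 as usize;
    field - start
}

/// Byte offset of the derived field from the start of a `Derived`.
pub fn derived_field_offset(d: &Derived) -> usize {
    let start = d as *const Derived as usize;
    let field = &d.derived_field as *const u32 as usize;
    field - start
}
```

With this layout the base field sits at offset 0 and the derived field at offset 4, so both are reached by adding a constant to the object's address, and the whole object is one contiguous 8-byte region.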

------
meddlepal
I don't really feel like this is an argument against constructors but perhaps
an argument that some language designs are better than others...

For example Go does not have constructors and it's very easy to get a nil
deref error if you aren't careful.

~~~
stinos
I don't think the author's intent was to argue completely against constructors.
At least to me this reads more like: here are some pros, here are some cons.
But you're right: in some languages some aspects are better than in others.
Take the vector constructor example for instance: that's not really just a
case against constructors. The exact same, i.e. not knowing which argument
does what, can be a problem in any language with no named arguments, and not
just for constructors but for anything which takes arguments. But is that a
problem? Do I really need to know the order of arguments of everything? Nope,
it's 2019, my editor and internet know the answer.

~~~
Sharlin
The point is that with (unnamed) constructors the problem is exacerbated.
There’s a reason functions are named descriptively.
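
A short Rust illustration of the naming point, using only standard-library calls. The wrapper functions `reserved` and `repeated` are hypothetical and exist only to make the snippet testable.

```rust
/// Empty vector with room for at least 92 elements.
/// The name `with_capacity` says what the number means.
pub fn reserved() -> Vec<i32> {
    Vec::with_capacity(92)
}

/// Vector of 92 elements, each equal to 2.
/// The `vec![value; count]` form keeps value and count visually distinct.
pub fn repeated() -> Vec<i32> {
    vec![2; 92]
}
```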

------
rzwitserloot
I haven't written it at all, but the Pony language showed up here on
hackernews a while ago, and it has the kotlin solution except the notion that
the variable isn't yet initialized is type-carried there. You couldn't 'fake
out' the process and observe a null on a non-null variable there.

In other words, with some type system antics you can have your cake and eat it
too, and that makes claiming 'rust did this right', without also tacking on a
deep dive on why pony's approach also has downsides (which I'm sure it does),
disingenuous or ill informed. My point is, this article doesn't cover it.

~~~
matklad
Sorry if the post reads like “rust did this right”: my intention was to show
different trade offs around static/dynamic checking of constructors. I didn’t
mean to argue that certain approaches are right, and others are wrong.

In particular, the section about Swift shows that you can have fully statically
checked constructors if you dedicate enough language machinery to it.

I don’t know a lot about Pony, but it seems like it uses a strict subset of
Swift’s rules? There’s no inheritance so two phased initialization is
condensed to “don’t call methods until all fields set”. It’s also not possible
to call one constructor from another, so designated/convenience constructor
split is also absent.

------
broth
Any programming language or scripting language can be misused — especially
when it comes to misusing constructors or some other mechanic. I think it
boils down to knowing these language pitfalls and establishing standards to
follow and help guard against them.

------
mothsonasloth
Nowadays I use builders with validation and defaults. It makes life so much easier
when creating instances or testing them.

This is for c and Java. Not sure how the builder pattern would work for Rust.

~~~
kibwen
Many Rust libraries actually employ the builder pattern extensively (sometimes
as a reaction to the lack of default/keyword arguments); it's so widely-
accepted that the necessity of error-handling while constructing items via
method-chaining (as featured by the builder pattern) was a key argument for
the postfix `?` error-propagation operator.
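
A rough sketch of what such a builder can look like, with one fallible step in the middle of the chain. `Server`, `ServerBuilder`, and `example` are hypothetical names, and the default port of 80 is an arbitrary choice; this is not taken from any particular library.

```rust
#[derive(Debug, PartialEq)]
pub struct Server {
    pub host: String,
    pub port: u16,
}

#[derive(Default)]
pub struct ServerBuilder {
    host: Option<String>,
    port: Option<u16>,
}

impl ServerBuilder {
    pub fn new() -> Self {
        ServerBuilder::default()
    }

    /// Infallible step: returns the builder for chaining.
    pub fn host(mut self, host: &str) -> Self {
        self.host = Some(host.to_string());
        self
    }

    /// Fallible step: parsing may fail, so it returns a `Result`.
    pub fn port_str(mut self, port: &str) -> Result<Self, String> {
        let parsed = port.parse::<u16>().map_err(|e| e.to_string())?;
        self.port = Some(parsed);
        Ok(self)
    }

    /// Validation and defaults live in one place.
    pub fn build(self) -> Result<Server, String> {
        let host = self.host.ok_or("host is required")?;
        Ok(Server { host, port: self.port.unwrap_or(80) })
    }
}

pub fn example(port: &str) -> Result<Server, String> {
    // Postfix `?` keeps the chain readable despite the fallible step.
    ServerBuilder::new().host("localhost").port_str(port)?.build()
}
```

Here `example("8080")` returns a `Server` on port 8080, and `example("not a port")` returns an `Err` from the parsing step without ever reaching `build`.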

------
TazeTSchnitzel
PHP 7.4 adds optional type declarations for properties on an object, and so
there was the problem of how to avoid an inconsistent state (type not matching
before initialisation). The route that was eventually taken was to have the
properties not be null, but be unset. If you try to read from them at runtime
without having written to them first, an error is thrown. Notably, there's no
requirement everything be defined when the constructor exits.

------
ape4
Constructors can definitely be misused. But they can be handy too. What if the
initial values of the members are just constants. Perhaps it takes a bit of
computing (not too much) to set them up.

~~~
davesmith1983
C# has `readonly` variables. The values of these cannot be changed after the
constructor has finished running.

~~~
mattnewport
I don't find this that useful in C#, since the majority of types in C# tend to
be reference types and you can still change the values unless they are
immutable types, which, though considered good practice, aren't that common in
most C# code in my experience.

