Modern C++ Features – std::optional (arne-mertz.de)
86 points by tempodox on June 9, 2018 | 113 comments

The worst part about this is that Apple has decided to once again violate the standard. They purposefully eliminated the `value()` accessor due to ABI issues, as far as I understand from the Stack Overflow comments: https://stackoverflow.com/questions/44217316/how-do-i-use-st...

My recommendation would be to just use https://github.com/martinmoene/optional-lite for cross-platform stuff.

"Violate the standard" might be a bit harsh. The fact that you are using <experimental/optional> instead of <optional> as your header should indicate that something else is going on.

Practically speaking, it takes a while for features implemented upstream to make it into your OS of choice. If these are really critical features, you can get third-party implementations of <optional> or you can compile your own toolchain; neither of these is a particularly difficult option. It can be a bit frustrating, yes, when a language feature arrives on a different schedule in the library and the compiler (which happens).

Mind you, #include <optional> doesn't work on my Linux box either, with GCC. And if you look at the GCC web page, https://gcc.gnu.org/projects/cxx-status.html#cxx17, it says that GCC's support for C++17 is "experimental". I know that GCC != Clang, but let's give them some time to test things before a million projects get the code compiled in and we're stuck with the implementation.

GCC moved it to <optional> with the release of GCC7. Note that we're now at GCC8, clang 4 added it - I believe we're now at clang 6.

Therefore, it's by far no longer something that's experimental IMHO. (standardised in C++17 after all)

It looks like this is just temporary until the feature is no longer experimental so they can modify the standard library.

A possible std::optional footgun is that an optional object can implicitly be converted to bool, returning optional::has_value(). For optional objects that hold value types that can themselves be used in a boolean context (like bool or a pointer value), a programmer may confuse the implicit has_value() check with returning the real value().

  std::optional<bool> opt(false);
  assert(opt); // true even though *opt is false!

Is this really different than a pointer (or smart pointer) pointing to a false-y object? An optional isn’t implicitly convertible to its contained type - you have to use * or .value(), so it seems like there wouldn’t be much confusion.

I suppose the difference would be that pointers-to-bool don't make a ton of sense in the first place, but using optional to wrap a boolean is a very reasonable thing to do.

I don’t see it being reasonable at all. If you need more than two values, use an enum?

That's a good point. If you need a bool variable, there is a decent chance you will need more values later. In that case, an enum is more extensible and the names are more descriptive. Like the classic "Daily WTF?":

    enum Bool { True, False, FileNotFound };

Well there you go: your 'True' resolves to 0 (and 'False' to 1). So, please don't do this. bool exists for a reason, and so does std::optional.

For future reference, The Daily WTF provides examples of bad code. People submit examples of bad code they've found to The Daily WTF for discussion.

So, I missed the joke, I guess.

I think you misunderstood my point (or exaggerated it a bit to make that reference). Regardless, let me add my suggestion for those wanting to use std::optional<bool>:

If a developer wants a yes/no/unknown type, an enum class is better suited for the job. This forces them to explicitly compare values instead of relying on an implicit boolean conversion.
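
A minimal sketch of that suggestion, assuming a hypothetical `Answer` enum and `describe` helper (both names are made up for illustration):

```cpp
#include <cassert>
#include <string>

// Hypothetical three-state type; the names are illustrative only.
enum class Answer { Yes, No, Unknown };

// A scoped enum has no conversion to bool, so `if (a)` does not compile
// and every state has to be compared (or switched on) explicitly.
std::string describe(Answer a) {
    switch (a) {
        case Answer::Yes:     return "yes";
        case Answer::No:      return "no";
        case Answer::Unknown: return "unknown";
    }
    return "unreachable";  // keeps compilers quiet about falling off the end
}
```

With this shape, `describe(Answer::Unknown)` yields "unknown", and there is no shortcut spelling that silently treats Unknown as true or false.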

Is the first number in a string even or odd?

The function computing that is probably served really well by returning an optional bool.
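
A rough sketch of such a function, assuming "first number" means the first run of decimal digits in the string; `firstNumberIsEven` is a hypothetical name, not something from the article:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: is the first run of digits in `text` an even number?
// Returns std::nullopt when the string contains no digits at all.
std::optional<bool> firstNumberIsEven(const std::string& text) {
    std::size_t i = 0;
    // Skip everything up to the first digit.
    while (i < text.size() && !std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
    if (i == text.size()) return std::nullopt;
    // Walk to the last digit of that run; parity depends only on it.
    while (i + 1 < text.size() && std::isdigit(static_cast<unsigned char>(text[i + 1]))) ++i;
    return (text[i] - '0') % 2 == 0;
}
```

A caller has to write something like `r.has_value() && *r` to ask "is it even?"; a bare `if (r)` only asks whether a number was found at all, which is the confusion discussed upthread.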

Ternary logic (Yes/No/idk) is one of the most common use-cases for std::optional I've seen.

The vastly more common use-case I've seen is when you want to have an optional value, but don't want the heap allocation that comes with std::unique_ptr.

std::optional is a useful wrapper because a simple "bool + object" would call constructors and all when that might be undesirable.

IMO, ternary logic belongs as an enum class, to require developers to be explicit about what branches are taken for what values.

> heap allocation that comes with std::unique_ptr

I think it may be worth mentioning that std::unique_ptr can be used to "manage" non-heap pointers as well.
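
One way this can look, as a sketch only: a std::unique_ptr with a custom deleter that releases an object in static storage instead of calling delete. `Slot`, `g_slot`, `SlotReleaser` and `acquireSlot` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource living in static storage, not on the heap.
struct Slot {
    bool in_use = false;
};
Slot g_slot;

// Custom deleter: marks the slot free instead of calling delete.
struct SlotReleaser {
    void operator()(Slot* s) const noexcept { s->in_use = false; }
};
using SlotHandle = std::unique_ptr<Slot, SlotReleaser>;

// Hands out the single slot; no dynamic allocation happens here.
SlotHandle acquireSlot() {
    g_slot.in_use = true;
    return SlotHandle(&g_slot);
}
```

The handle still has pointer semantics (it can be null and copying is disallowed), so this does not replace std::optional's value semantics; it only shows that unique_ptr itself does not force a heap allocation.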

An optional containing a pointer seems pretty dubious. You would usually either just use a pointer and have null mean empty or use an optional containing the pointed-to type.

An optional pointer is a weird tri state thing where you can have empty or a null pointer or a non-null pointer.

It is generally dubious but you could easily end up with an optional containing a pointer if you're using templates. The type would predictably leak into parts of the code base calling the templated code.

There are a bunch of other parts of the C++ code base that are the same way. The most notorious one I can think of is std::declval, which just makes no sense whatsoever unless it appears inside another template.

Semantically though, not having a pointer value and having a value - even null - are two different things. For example, doing pointer arithmetic in the former case would be meaningless.

Null pointer arithmetic is undefined, so just as pointless as arithmetic on an empty optional pointer. Your example fails to prove that optional pointer has any valid use.

Sure, my point might seem theoretical; to explain further, the difference is that when std::optional pointer is used correctly its behavior in the absence of a value is perfectly defined! Which is a good thing.

> A possible std::optional footgun is that an optional object can implicitly be converted to bool,

That isn't true: std::optional's conversion to bool is explicit, and it signals whether the object stores a value.


Does that prevent the usage shown in the blog post?

    std::optional<unsigned> opt = firstEvenNumberIn(text);
    if (opt)

That usage means "was the optional value set?" That's the whole point of an optional type, and there is absolutely no reason to prevent it.

I've already explained why that argument makes no sense


Could you elaborate more on why having the compiler tell us that we forgot to change something wouldn't be a good thing?

Now imagine you are using type inference using `auto` and you are refactoring some functions from returning `bool` to `optional<bool>`.

Good luck with that.

> returning `bool` to `optional<bool>`.

In that case you need to pay attention to what you're doing. Semantically you'd be altering your code to convert a function that always returns a value into a function that maybe won't return it. How do you expect to replace a type with a wrapper that maybe won't wrap that type and still assume you won't need to check if a return value exists?

That would be like replacing a reference with a pointer and then forgetting to check if the pointer is null. Making this mistake has nothing to do with luck and has everything to do with incompetence, and no language spec saves you from that.

Of course you have to check if the value exists. But you did not get my point.

If the optional weren't implicitly usable as a boolean, you'd see compile errors at the locations you forgot to change (even if you use type inference).

> The pointer dereferencing operators * and -> are implemented, but without the std::bad_optional_access – accessing an empty std::optional this way is undefined behavior.

Oh, please. Raise an exception or panic. But "undefined behavior" on dereferencing this new form of "null"? That's no good.

Edit: Rewritten.

You say, "Oh, please." But this feature is consistent with the design mandate of the C++ language... the very foundation of its design is that you don't pay for safety if you don't want to. So I am not sure what you are trying to say.

Honestly... why aren't you just using a different language? Practically every other language invented in the past 30 years has the safety features it sounds like you want. So, why come into a discussion about some new C++ feature and complain about the fact that a safety feature is optional? This is the way it always has been for C++. This is the way pointer dereferencing works for all the new smart pointers. Dereferencing std::shared_ptr and std::unique_ptr... both unsafe, equally unsafe as std::optional. This is the way std::vector works. They're all unsafe unless you specifically use certain safer accessors like .at(). The fact that dereferencing a std::optional is unsafe is consistent with decades of changes to C++.

Let's keep things civil please. Saying "I can only imagine that you're trying to start a fight" is rarely helpful.

Okay, rewritten. I honestly thought the parent comment was baiting, since complaining about C++ safety is... well, rarely helpful and usually just starts arguments.

Thank you for the rewrite, much appreciated.

If I may offer a tip, when someone takes a position that seems unreasonable or baiting or argument-prone, I try to apply the principle of charity: I assume that their intentions are good.

Of course I don't always succeed at this! ;-)

Sometimes the principle of charity is quite harsh, because in some cases if you assume that someone has good intentions, you come across as condescending and make it sound like you think they're ignorant. So I don't think that the principle of charity is always the right approach. The tone of the original comment starting with "Oh please" doesn't incline me towards a charitable interpretation, and it still doesn't.

You are certainly right about that. Perhaps this is where it becomes an art instead of a science. How can we create a positive conversation that we can all learn from, even when people say such obviously provocative things as "Oh please"?

If someone takes that kind of attitude, so what? You'll never lose by taking the high road; I've found it to be a very rewarding exercise to try to be chill regardless. Of course as I mentioned, I don't always succeed.

At least I try to be like the old Shadio Rack:

"You've got answers? We've got questions."

Emphatically disagree. You can lose by taking the high road, because it costs time and energy. Being chill regardless is an unhealthy habit; I've found it better to allow myself to feel whatever I happen to be feeling (mindfulness), and share my feelings with others when appropriate. This will inevitably be "not chill" from time to time, but it's healthier.

I can see that the standard committee is taking some steps to make things safer, and I would really appreciate a C++ safety profile where one can choose whether they want the STL UB or not. E.g., GCC 7.x has a macro which makes all these unsafe member functions abort instead of exhibiting UB.

Even better would be a "safe" block like D, where everything is checked & well defined.

The standard way of dereferencing std::optional is with .value() and that does provide a check and an exception. Using * and -> is equivalent to invoking "unsafe" in other languages. You get the semantics you ask for. If you don't want unsafe behaviour it's really as simple as using the safe operator. I don't honestly see what the problem is here.
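
To make the two access styles concrete, here is a small sketch; `checkedOr` and `uncheckedOr` are hypothetical helpers written for this comment:

```cpp
#include <cassert>
#include <optional>

// Checked access: value() throws std::bad_optional_access when empty.
int checkedOr(const std::optional<int>& opt, int fallback) {
    try {
        return opt.value();
    } catch (const std::bad_optional_access&) {
        return fallback;
    }
}

// Unchecked access: *opt is only defined when the optional holds a value,
// so the caller (here, the conditional) has to test first.
int uncheckedOr(const std::optional<int>& opt, int fallback) {
    return opt ? *opt : fallback;
}
```

Both helpers return the fallback for an empty optional; the difference is that the first pays for a check inside value(), while the second relies on the programmer having placed the test correctly.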

(Disclaimer: not a C++ programmer)

I think the difference is that * and -> are normal things to do on a value. Other smart pointers also overload those operators, so it isn't an obvious red flag. For example, look at this code:

Is this safe? Depends on whether foo() returns a non-nullable smart pointer (or a raw pointer) or an std::optional. It doesn't raise any red flags when you're reading the code, and you can mess it up if you forget the signature. Sure, you could look at the signature of foo, but at that point you're looking at foo anyway and a doc comment telling you it was nullable would work just as well. (OK, it's slightly worse because you have to look at the comment. But you still have to look at the function to know what you're doing is safe.)
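
A sketch of why the call site gives no hint, assuming a hypothetical `Widget` type with a `bar()` member: the same `->` expression compiles unchanged for a smart pointer and for an optional.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical type standing in for whatever foo() hands back.
struct Widget {
    int bar() const { return 7; }
};

// The same expression compiles for std::unique_ptr<Widget> and for
// std::optional<Widget>. It is only well defined when `p` actually
// holds a Widget; nothing in the syntax says which kind of `p` this is.
template <class P>
int callBar(const P& p) {
    return p->bar();
}
```

Both `callBar(std::make_unique<Widget>())` and `callBar(std::optional<Widget>{Widget{}})` compile and return 7; passing an empty optional or a null pointer would compile just as happily and be undefined.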

On the other hand, in most other languages with optionals, you have something like this (Rust):

unwrap() is only used with optionals, so anyone reading the code knows this is something that might be a bug. If you forget that foo() returns an optional, you'd write foo().bar() instead, and you'd get a compiler error.

The point of optionals, to me, is to have the compiler remind you to check nulls. If you can deref an optional in the same way that you deref a normal pointer, you don't get any of the benefits of optionals. (I guess it gives you a way to indicate null in places where you previously couldn't, which is useful but not why most people want optionals.)

This is just standard C++isms.

You can get a reference to an item in a std::vector with the [] operator as you would with an array in many other languages. It does not do bounds checking, as specified in documentation. Trying to access an out of range element is undefined behavior.

Or you can use the .at() method, which will throw an exception if you go out of bounds. It is the programmer's choice.
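
For illustration, a tiny sketch of the checked accessor; `atThrows` is a hypothetical helper, and the unchecked `v[i]` case is deliberately not exercised because an out-of-range index there is undefined:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(i) throws std::out_of_range, false otherwise.
bool atThrows(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);  // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a three-element vector, index 2 is fine and index 3 throws; `v[3]` would compile too, but nothing is promised about what happens.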

Bounds checking is not free, same with checking for a null pointer. C++ is a performance oriented language so it gives you the choice to play it a little unsafe in the name of having full control.

I understand the idea to want the language to make it more obvious that you are doing something potentially unsafe, but it has never been the case in C or C++ that dereferencing a pointer is considered 'safe'.

You have to know what you are doing, else you might end up blowing your whole leg off [1].

[1] https://www.goodreads.com/quotes/226222-c-makes-it-easy-to-s...

I guess I don't know why you'd even want an optional if it wasn't for the safety. Without compiler-enforced null checking, I don't see why you'd want one.

C++ already has compiler-enforced null checking by virtue of not allowing non-pointer variables to be null. But sometimes we want a variable to be null and not be a pointer.


    struct stuff {
        int required_value;
        int optional_value;
    };

What would you do if you had no value for optional_value? Probably use 0 or -1 but that can be problematic, and not even an option if optional_value is not a primitive type.

You could use int* optional_value instead and just assign NULL, but then you have to malloc/new and make sure to free/delete later.

You could use std::unique_ptr to handle memory management but you are still invoking dynamic memory.

std::optional allows you to have nullable variables without using pointers, even when hidden behind the scenes.
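
Applied to the struct above, a minimal sketch would be:

```cpp
#include <cassert>
#include <optional>

struct stuff {
    int required_value;
    // Stored inline in the struct, no pointer or dynamic allocation;
    // starts out empty unless a value is given.
    std::optional<int> optional_value;
};
```

With this, `stuff s{42, std::nullopt};` has no optional value, and after `s.optional_value = -1;` it holds -1 as an ordinary value rather than as a "missing" marker.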

You could ask, what is the point of optional if you don't want to check if it is null? Well...I guess it's the same as no bounds checking, sometimes the programmer just knows. Most of the time, hopefully, a check will be done at some point, and subsequent accesses can skip the check.

So I guess C++'s optional type is for a completely different purpose from other languages, which is why I think I was confused.

As much as I dislike most of "modern C++", the one thing they have consistently kept intact is that all the safety guarantees, at least insofar as they have performance implications, are optional. That's how it should be, by design.

There are plenty of languages that make implicit, opaque tradeoffs in order to give you some measure of protection from programmer errors. C++ was never meant to be among them.

The whole point of this new feature is to add safety. If you want the old semantics, use null pointers. Adding a feature which creates the illusion of safety without providing it is not helpful.

No, the point of this new feature is to add safety where you want it. That's not the same as mandatory safety. You can use the safe semantics throughout your codebase while retaining the option of direct ("unsafe") dereferencing wherever you prefer not to pay the costs of the safety checks. That's a basic requirement of how any new feature should work in a performance-oriented language.

C++ is and always has been about giving the programmer tools and leaving the decision up to the programmer. If you want a language that does the choosing for you, just use something else.

This is consistent with e.g. std::vector::at vs operator[]. The former throws an out-of-bounds exception, the latter has UB on a bad index.

Like it, don't like it, that's cool. But it is consistent.

If you want exceptions, use 'value()'. If you want minimum runtime checks you use operator*/->. This is perfectly consistent with the spirit of C++ to give you maximum control over your code.

constexpr, I think, is the big difference. Also the name of the empty value used to reset the optional: Boost has boost::none, whereas the standard has std::nullopt (of type std::nullopt_t). I prefer the Boost name none, as nullopt sounds and looks too much like null.

With all the modern features being used, I wonder how one would go about learning modern C++.

There are several good books about C++11 and C++14, though I'm not sure about C++17.

"Effective Modern C++: 42 Specific Ways to Improve Your Use of C++11 and C++14" https://www.amazon.com/Effective-Modern-Specific-Ways-Improv...

"Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers (C++ In-Depth Series)" https://www.amazon.com/Discovering-Modern-Scientists-Program...

If you know the basics, cppreference.com is your best bet.

And reading real code. It’s important to see how these pieces all go together. Just reading cppreference is like familiarizing yourself with legos without seeing how to build something with them.

That's a good point, and it's not just a problem with C++. As any project matures, the voices of current users drown out the voices of new users, and as a result discoverability gets thrown out the window. How would you engineer a project so as to avoid this fate?

Can’t you still return null from a function? So even if your return type was optional, you’d have to check if it was null in some cases before unwrapping it.

(Edit: Limited c++ experience, thanks for explaining)

No. In Java every object variable is a reference type. They're all implicitly pointers. Hence why all objects can be null. Java just hides all the dereferencing away. "Object myobject;" creates a pointer/reference rather than an actual object.

In C++ you can have pointers to objects, but they're explicitly denoted by an asterisk. The C++ equivalent would be "Object* myobject;"

However, you can also directly have object variables in C++. That's not something that exists in Java. A variable declared like, "Object myobject;" can never be null. The myobject variable is a value, just like i in "int i = 5;" is a value.

To expand on this:

In C++ a useful heuristic could be to think of values like Java primitives.

A C++ integer takes up some amount of space. Maybe it's four bytes on your platform. Integers on your platform take up four bytes. Not "four bytes plus a pointer", but four bytes and that's it.

In those four bytes you can store 2^32 values. There's nowhere else to store whether the integer is null -- if you wanted to signal to someone that the value was "missing", you would either have to,

- Come up with your own "in-band" signal, like "-(2^31) means 'missing'", or

- Have an out-of-band signal, like a separate boolean variable, or keep a pointer to "integer or null". (Pointers being nullable is arguably kinda weird still.)

Functions that return an integer in C++ on your platform will put four bytes "in the right place" on the stack. (Or in a register? I don't really know what calling conventions look like...)

Again, there's nowhere to signal that the value is missing -- if you don't fill in that memory location or that register, well, there will just be some other (undefined) value sitting in its place that the caller will read. Ditto functions that expect integer arguments etc.

Now, this has some interesting consequences: for stack-allocated values in C++, "object identity" tends to be preserved less often than in Java. When you put an object into a `vector`, you'll probably put a copy there. When you mutate that copy or call some function on it, you're not mutating the original. You can get back to "Java semantics" of object identity and nullability etc by using pointers and explicit heap storage.
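
A short sketch of the copy behaviour described here, using a hypothetical `Counter` type and helper:

```cpp
#include <cassert>
#include <vector>

// Hypothetical value type for illustration.
struct Counter {
    int n = 0;
};

// Puts a copy of `original` into a vector, mutates that copy, and returns
// the original's field to show it was not affected.
int mutateCopyInVector(const Counter& original) {
    std::vector<Counter> v;
    v.push_back(original);  // stores a copy, not a reference
    v[0].n = 99;            // changes only the element inside the vector
    return original.n;
}
```

In Java the equivalent list would hold a reference to the same object, so the change would be visible through both names; here the original keeps its old value.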

> Functions that return an integer in C++ on your platform will put four bytes "in the right place" on the stack. (Or in a register? I don't really know what calling conventions look like...)

On a register. I don't know of any mainstream platforms which return simple values like an integer through the stack.

For larger values (larger than a word or a couple of words), on the other hand, usually the calling function passes the called function a pointer to a location on the stack where it should write the result.

Either way, like you said "there's nowhere to signal that the value is missing".

Nitpicky detail: a lot of the time, the pointer variable passed to be written to is not on the stack, e.g. using strcpy with a malloc'd buffer.

I believe he is talking about the native returning semantics, which is in a real register for simple types or organised to be on the stack by the caller or register frame for a larger value type. Point being, it's a compact primitive reflection of the machine, not much more.

C++ also has references, which are like pointers in some ways, except that references cannot be null:

  Object& obj = other_obj;
For this reason, when passing objects by reference, references should be preferred to pointers where possible. Consequently, I would translate Java code like:

  void f(Object obj) { ... }
  Object obj = ...
Into C++ code like:

  void f(Object& obj) { ... }
  Object& obj = ...
(Setting aside various other concerns like move semantics and const-correctness.)

A nice thing about references (to classes) is that you can use the dot and not the silly arrow which I don't know why exists in the first place.

No, this isn't Java/C#. In C++ only a pointer can be set to nullptr or NULL. std::optional is a value that can store a type T (T is a placeholder for a type name). If T is a pointer (such as T * or int *...) then yes, you could return null. But that is rare and looks weird.

So the value of an optional is either a valid value or it has no value. The default-constructed std::optional<T> has no value.

C# has value types, you can't set structs to null.

True. Most things are reference types though

Not in the case of Nullable, which behaves just like std::optional.

I see what you are saying, yes.

Also, in C++ if you try to read an empty optional<T> through .value(), you get an exception rather than the undefined behavior of dereferencing a null pointer (operator* and -> on an empty optional are still undefined). Plus, with optional you get value semantics, so things like copy are deep instead of copying the pointer. Not sure about C# nullables, but I guess if you copy the .value member it is deep.

Uh? Unlike languages such as Java or C#, objects aren't nullable in C++, only pointers to objects are.

You can't do for instance

    struct foo {
      // whatever
    };

    foo my_function() {
      return nullptr;  // error: foo is not a pointer type
    }

and thus,

    std::optional<int> my_function() {
      return nullptr;  // error: nullptr is not an empty optional
    }

does not compile (rightly).
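
For completeness, a sketch of the spelling that does compile. The function name `maybe_answer` is made up; the static_assert is just the compile-time form of "nullptr is not accepted here".

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <type_traits>

// nullptr cannot be converted to std::optional<int>.
static_assert(!std::is_convertible_v<std::nullptr_t, std::optional<int>>);

// Hypothetical function: the empty state is spelled std::nullopt.
std::optional<int> maybe_answer(bool present) {
    if (!present) return std::nullopt;  // `return {};` also works
    return 42;
}
```

So "no value" in an optional is its own thing, unrelated to null pointers.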

Wouldn't it be more correct to say that Java and C# don't have nullable objects either, it's just that they're always used via pointers, so any object reference can be nullable, unlike C++, which allows objects to be passed by value and thus not be nullable?

C# is not like Java.

    struct Vector {
        int x, y;
    }

    public class Program {
        public static void Main() {
            Vector v = null;
        }
    }

Also does not compile:

"Cannot convert null to 'Vector' because it is a non-nullable value type"

well, change that struct into class and it does AFAIK :p

Basically, just given `public foo myFunction() { ... }` there's no easy way to know

Sure there is, you will see foo is a struct by simple mouse hovering, no one sane uses C# with Notepad.

Structs are value types and cannot be heap allocated like in C++.

Doing new foo() will only call the constructor while stack allocating, whereas in C++ it will put the object on the heap.

Also, the only way structs end up on the heap is by implicit boxing when converting to an interface.

Finally, yes, structs can be nullable, but only when they are declared as such: Typename?.

> Sure there is, you will see foo is a struct by simple mouse hovering, no one sane uses C# with Notepad.

Does mouse hovering work when you're reviewing a diff on gitlab?

In that case, you do just like in C++ and look at the type definition.

Unless you can magically understand it is a struct just by looking at the name, instead of a typedef, using definition or even a macro.

Nope. class and struct are almost the same thing in C++.

yes, I was talking about C#.

sorry, got confused given that the discussion was originally regarding C++

No you can't. "null" isn't a specific thing in c++. There's nullptr, NULL, and 0. NULL is just a fancy name for 0.

If for example you have a function with a return type of string, it must return a string. It cannot return nullptr, NULL, or 0. It could return an empty string. But in some cases you might want to distinguish between an empty string and no result; in that case you would want the return type to be std::optional<std::string>.
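
As a sketch of that distinction, assuming a hypothetical `lookup` over a std::map:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical lookup: the key may be absent, or present with an empty value.
std::optional<std::string> lookup(const std::map<std::string, std::string>& env,
                                  const std::string& key) {
    auto it = env.find(key);
    if (it == env.end()) return std::nullopt;  // no result
    return it->second;                         // may legitimately be ""
}
```

A missing key gives an empty optional, while a key mapped to "" gives an optional that has a value, and that value happens to be the empty string.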

std::optional is designed for value semantics. It wouldn't make sense to use it to return a pointer type.

But in C and C++ a pointer is a value.

Sounds like a useless feature in that case

Java has Optional<T> which is actually nullable, unlike the C++ std::optional (which is a value type, not a pointer). So even though it is nullable, it seems that people find value in expressing "This may not return a valid object, you need to check" as an actual object type.


I wish and hope that, in some future version of the Java language, they will change the behavior of Java references so that they are explicitly non-null by default, and require an annotation or signifier of some kind to mark them as nullable.

The Kotlin language (JVM-based) got this right: https://kotlinlang.org/docs/reference/null-safety.html

In the meantime, Java coding best practices dictate treating all references as non-null in first-party code (defensively checking references received from third-party code, such as with Objects.requireNonNull), and avoiding nullable references in favor of Optional.

This won't happen. But Java is likely to get value types, which would be non-null just as in other languages with value types.

The issue of Optional<T> being nullable is an extremely uncommon one - I have seen it happen once ever, and there were way bigger issues going on that happened to manifest that way. It's one of those "that's dumb but ultimately makes no difference" problems that many languages have.

They plan to sort out that problem when value types are finally part of the JVM.

In that case, what do you say about returning a class instance from a function in Java? Joking aside, the feature gives you both a value type and a way to document that the function may intentionally not return a value.

This will be C++ std::expected:


Returns a value (as in std::optional) that can be null. If so you will get an error explaining what happened.

I wonder if at some point there will be cpp that's kinda like kotlin to java. Source code compatible but a clean cut.

I find this useful for older paradigms where you want to return an error as well as pass in something to fill in as the output. Instead of doing `bool foo(OutFoo* out_foo)` or overloading the return value, as in `Id getId(); // returns -1 if not present`, you could use base::Optional<OutFoo> and base::Optional<Id> respectively.

You may find std::variant also very useful. You can do something like std::variant<Error, Foo>.
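
Roughly like this, where `Error`, `Foo` and `makeFoo` are placeholder definitions invented for the sketch:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical types standing in for the Error and Foo named above.
struct Error { std::string message; };
struct Foo   { int id; };

// Returns either a Foo or an Error describing why there is no Foo.
std::variant<Error, Foo> makeFoo(int id) {
    if (id < 0) return Error{"negative id"};
    return Foo{id};
}
```

The caller can then branch with std::holds_alternative<Foo>(result) or std::get_if<Error>(&result), so unlike a plain optional the failure case carries a reason.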

Optional is a typical feature that should be baked into the syntax. If used properly, it will be used throughout codebases, and when it is used, it adds simple, replaceable syntax.

I disagree. If a perfectly good optional type can be implemented without making new syntax, why bother making new syntax? It obscures what's going on under the hood, makes implementation needlessly complex, and increases mental overhead for users.

To be fair, not all languages make this easy.

I fail to see how forcing syntax changes (thus breaking all IDEs) is preferable to simply adding a class, or how using a custom specialized keyword brings anything but problems when trying to replace a component.

> In programming, we often come across the situation that there is not always a concrete value for something

In database terms, the need for null values means that the data schema is denormalized. With a simpler approach there is seldom any need for these "not applicable" values.

And that's the problem with all languages with special support for "nullable" values: They glorify bad data structure design. I've seen codebases where 30-50% of the lines is just handling of null values that SHOULD NOT BE THERE and where nothing meaningful (or even functional) happens when the value is actually null.

Goes well together with OOP: In the name of isolation, each object is abstracted from all context necessary to construct a straightforward and correct program. The result is more and more meaningless boilerplate and burnt out developers.

Personally I'm fine with not assigning any value to "not applicable" variables. Or putting a sentinel there, like -1 or NULL. But adding a physical case (i.e. changing the type) to support "not applicable" cases is just not a good idea.

I think it's more helpful to think of C++ std::optional like Maybe in a functional language rather than as like a nullable type. Its primary use case is as a return value from functions, not as a data member. There are many functions where this can make good sense: parsing a string as a number, searching for an element in a collection, trying to load something from a file, taking a square root, etc.
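
The square root case might be sketched like this (`safeSqrt` is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical helper: no real square root exists for negative input,
// so the function returns an empty optional instead of a sentinel.
std::optional<double> safeSqrt(double x) {
    if (x < 0.0) return std::nullopt;
    return std::sqrt(x);
}
```

Callers can then pick their own policy, for example `safeSqrt(x).value_or(0.0)` for a default or `.value()` if an exception is preferred.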

> In database terms, the need for null values means that the data schema is denormalized.

And nobody ever uses a fully normalized schema in practice, because it's cumbersome and gets in the way of solving the problem.

And that's for databases, which are optimized for stable storage and data coherence, not for performance. A fully normalized schema is a performance disaster if applied to running applications.

What I was saying is that first-class nullable types encourage bad practice. I know that from lots of real world experiences.

Flat tables work surprisingly well. Parent-pointers instead of child-lists work surprisingly well and bring a lot of benefits including modularity/coherence and cache efficiency. All it takes is rethinking the control flow. It's usually possible to iterate tables separately instead of walking trees (the hierarchy mindset).

Doesn’t that also require that you implement indexing? Otherwise you’re iterating over the entire table for every access, which I’ve never found to work well in practice.

There isn't really anything to implement. Basic "entities" get integer ids starting from 0. You just point into the flat array.

Child lists or other structures that don't have natural ids can for example get "indexed" with first- and last-indices like this:

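   struct Parent {
      /* some data */
      int childFirst;  /* index of the first child in child[] */
      int childLast;   /* one past the index of the last child */
   };

   struct Child {
      /* some child data */
      int parent;      /* index into parent[] */
   };

   struct Parent *parent;
   struct Child *child;

   int parentCnt;
   int childCnt;
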
These indices are trivial to set up:

   SORT(child, childCnt, compare_child_by_parent);
   for (int i = 0; i < parentCnt; i++) {
      parent[i].childFirst = childCnt;
      parent[i].childLast = 0;
   }
   for (int i = 0; i < childCnt; i++) {
      if (parent[child[i].parent].childFirst > i)
         parent[child[i].parent].childFirst = i;
      if (parent[child[i].parent].childLast < i + 1)
         parent[child[i].parent].childLast = i + 1;
   }

The important thing to see is that most operations are inside whole-table loops. Not in the O(n) sense - each step does something meaningful. Nested loops or complicated data manipulations are drastically decreased compared to OOP approaches.
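
For anyone who wants to try this, here is a self-contained C++ sketch of the same idea. It assumes std::vector for the tables and std::stable_sort in place of the SORT placeholder. The function names `build_child_ranges`, `sum_children` and `sum_range` are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Parent {
    int data;
    int childFirst;  // index of the first child in the child table
    int childLast;   // one past the index of the last child
};

struct Child {
    int childdata;
    int parent;      // index into the parent table
};

// Sort children by parent id, then record each parent's [childFirst, childLast) range.
// A parent without children ends up with childFirst > childLast, i.e. an empty range.
void build_child_ranges(std::vector<Parent>& parents, std::vector<Child>& children) {
    std::stable_sort(children.begin(), children.end(),
                     [](const Child& a, const Child& b) { return a.parent < b.parent; });
    const int childCnt = static_cast<int>(children.size());
    for (Parent& p : parents) {
        p.childFirst = childCnt;
        p.childLast = 0;
    }
    for (int i = 0; i < childCnt; i++) {
        Parent& p = parents[children[i].parent];
        p.childFirst = std::min(p.childFirst, i);
        p.childLast = std::max(p.childLast, i + 1);
    }
}

// Whole-table pass: accumulate child data per parent without walking a tree.
std::vector<int> sum_children(const std::vector<Parent>& parents,
                              const std::vector<Child>& children) {
    std::vector<int> sums(parents.size(), 0);
    for (const Child& c : children) {
        sums[c.parent] += c.childdata;
    }
    return sums;
}

// Per-parent access through the index range, for the cases that need it.
int sum_range(const Parent& p, const std::vector<Child>& children) {
    int total = 0;
    for (int i = p.childFirst; i < p.childLast; i++) {
        total += children[i].childdata;
    }
    return total;
}
```

`sum_children` is the whole-table style, and `sum_range` shows that looking up one parent's children is still just a contiguous slice of the array.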

This is not really true of any mature system. There are many reasons to design a schema in normal form, including performance, maintainability and space optimisation. Database design outside start-ups is a fairly mature area and normal form is well proven good practice.

I will call bullshit on that. I have never seen a database in 6th normal form in the real world, and I don't believe any exist outside of a lab.

Anything beyond 3rd normal form brings very sporadic gains to maintainability and space optimization. The usual case is that you lose those. About performance, I can imagine there exist situations where you'll gain some by going further than the 3rd normal form, but I don't believe anybody sees that as a normal situation.

Yes, these are good reasons to use normal form. And it's also true that normal form is proven good practice. But in some cases, performance is important enough that adding a denormalized table or two makes sense. This isn't really "proven good practice" in any sense, but it's something that should be evaluated on a case by case basis when there's a compelling reason for it.

It's kind of like the dream of having a sufficiently smart compiler. In theory, a relational database could operate in normal form and give you all the performance you need. In practice, they don't always do that.

In our application we have a form that has at least 50 fields which are individually optional. This is in addition to the required fields, and currently it's all one table.

Are you saying it would be better for performance and space to have all of those optional fields in separate tables, and that it wouldn't be a huge PITA to write queries against them for, say, reporting and debugging?

Are these 50 fields "logically" relevant? In other words, do you do significant joining at these fields? Or in business logic, do you switch much on the presence of these fields?

I'll assume "no" since there are so many of them and they're individually optional fields. Maybe most of them are strings and you could simply use empty strings. Or whatever, it doesn't matter. I'm not saying these cases are trivial, but they're leaf concerns and not really relevant from a "schema" point of view.

But if the answer is really "yes", then I'll say that's at least a maintainability problem (bad coherence).

They're fields in a digitized form, essentially. Multiple name, address etc sections representing people or companies, for example, as well as domain specific fields which shouldn't be used, are optional or are required depending on the value of a previous field. Most are strings of various lengths, but also plenty of numeric (weight etc) fields as well as dates. Expected arrival date for example, which is optional in most cases but required in some.

Our UI would have to join in all the fields, as it displays them all in one window, and our customers would absolutely hate having to fill them in across multiple windows.

Also, once the form is sent and approved by officials, the data has to be saved without change for 10 years, like saving the paper version of the form for 10 years in a filing cabinet. It also needs to be easily accessible for the users up to 3 years after approval in case one needs to make a correction, which is essentially done using a special copy of the form.

I'm no db expert by any stretch, I've learned by doing. So not gonna claim this couldn't have been handled better.

To expand slightly. Take one of the name and address sets. If the company has a valid EORI number, one should _only_ fill in the EORI number in the organization number field (filling in name etc would lead to error). Otherwise one should supply name, address, country etc. Depending on the value of another field, the name of a contact person is required, otherwise it's optional.

I think these kinds of constraints are much easier implemented as code in a business logic layer. Conventional database schemas with their limited expressiveness are probably not well suited here.
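
As a rough sketch of what I mean, assuming C++17 and std::optional for the individually optional fields: the struct layout, the field names and the `contactRequired` flag are guesses based on the description above, not the real schema, and "has an EORI number" stands in for "has a valid EORI number".

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical shape of one name-and-address section of the form.
struct PartySection {
    std::optional<std::string> eori;           // organization number field
    std::optional<std::string> name;
    std::optional<std::string> address;
    std::optional<std::string> country;
    std::optional<std::string> contactPerson;
};

// Returns the list of rule violations; an empty list means the section passes.
// contactRequired is assumed to be derived from another field of the form.
std::vector<std::string> validate(const PartySection& s, bool contactRequired) {
    std::vector<std::string> errors;
    const bool hasDetails =
        s.name.has_value() || s.address.has_value() || s.country.has_value();
    if (s.eori.has_value()) {
        // With an EORI number, only the number itself may be filled in.
        if (hasDetails) {
            errors.push_back("EORI given: name, address and country must be empty");
        }
    } else {
        // Without an EORI number, the full details are required.
        if (!s.name.has_value()) errors.push_back("name is required");
        if (!s.address.has_value()) errors.push_back("address is required");
        if (!s.country.has_value()) errors.push_back("country is required");
    }
    if (contactRequired && !s.contactPerson.has_value()) {
        errors.push_back("contact person is required");
    }
    return errors;
}
```

The table can then stay as one wide row of nullable columns, with the cross-field rules living in one function that is easy to test.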

Sure, but that wasn't really the issue. The issue was how to store such data in an optimal way.
