I've never needed such in 3 decades of systems design. I'd like to hear the deta...

gregmac · on Feb 3, 2020

A common situation I've run into is with "default" values and overrides, especially for configuration-type settings. NULL indicates use parent record value, while non-NULL, including empty string, means to use that value. By allowing empty string, you explicitly allow a user to basically say "don't inherit". Think along the lines of `COALESCE(userValue, tenantDefaultValue, "Default Value")`.

One way of implementing the UI for this is to have a checkbox labelled "Inherit" or "Use default", and if it's checked, disable the actual textbox (and put NULL in the field).

I've also run into similar patterns with data telemetry. I want to accurately represent the data I received, and so if some older version doesn't include a particular value, I want to store that as NULL, because empty string is a potentially valid value. If you "normalize" NULLs to empty string, and then it makes it impossible to answer a question like "What % of systems have a value for this data item?" since it includes versions that can't get it at all.

SigmundA · on Feb 3, 2020

Did the user leave the field blank in the form or did they never even see it (multi step processes etc.)

Did you want to update the field to null or not update it at all is another one.

tabtab · on Feb 3, 2020

What exactly does "not see" mean? Some use tab, others use the mouse. That doesn't tell us much and there are better ways to track user hand/mouse movements if you need such telemetry. UI api's often handle nulls/blanks different such that you don't want to over-rely on how one of them does it. Multi-screen tasks should track when each sub-screen is finished, not just the final transaction. I'd like to see a more specific use-case.

Re: Did you want to update the field to null or not update it at all is another one.

Who, the user or DBA? And why? Users don't know a Null from a horse; you shouldn't make them think about that, that's an internal guts thing.

kaitai · on Feb 4, 2020

I have to say I'm a bit amused at the insistence on tabs/mice/telemetry :)

In at least one project I worked on, there was a lot of survey data entered from paper surveys used in the 90s and early 2000s. The structure of the survey included pass-through questions: "If you have ever smoked a joint, please check yes and answer questions 42 and 43. If you have not ever smoked a joint, please check no and TURN THE PAGE."

One can certainly build a logic to process these replies (check answer to question 41, tally blanks in questions 42, 43 according to answer to question 41) but since these questions and answers were also entered into the computer in the olden days, NULL was used if questions 42, 43 were passed through, while blank was used if questions 42, 43 were left blank (and 41 was yes).

How times have changed.

SigmundA · on Feb 3, 2020

Was the user prompted to enter the value but left it blank or where they never prompted. Yes you could always have more data, but this extra two value distinction is common and practical in my experience.

On updating its related to prompting typically, again communicating user intent, did the user update a field from having a value to not having a value (they purposely blanked it out) or they didn't touch the field and maybe it wasn't even shown so don't modify it. Basically am I going to generate a update statement with set field = null or no set at all for that field. This is trivial in json to send through application due to null vs undefined.

tabtab · on Feb 3, 2020

Re: but this extra two value distinction is common and practical in my experience.

I'm still not following. What exact hand or keyboard/mouse movements constitute "left it blank"? "Leave" means to "move away from". What is moving away from what?

Usually one stores a time-stamp for input prompts (screens). If the prompt never appears, you then have no time-stamp and/or record for it. Why would a broken prompt create a record to begin with? I'm missing some context you seem to be assuming. You may be assuming behavior of a specific UI product or stack.

SigmundA · on Feb 3, 2020

Not sure I can explain more simply, just think multi part wizard interfaces or forms with button that open up sub-forms.

I think you get it since you talk about time stamps. Just as you could record all mouse movements and keyboards to get higher fidelity you could break a single record into multiple with times stamps to record each step in a form, then the lack of record would be distinct from a record with all nulls along with time stamps. You could also do a record per field (which I have seen more than once) with metadata per field as other columns.

But without all that a system that supports null and some empty value gives you more fidelity than just null that again in my experience is practical.

Empty string and null work fine with strings for this purpose but for other datatypes you start needing to pick special values like the minimum value for a 32 bit int or 0 date etc.

tabtab · on Feb 3, 2020

Usually you don't get the entire sub-screen's data such that a blank versus null distinction wouldn't help. You can't "half submit" a dialog or sub-screen on the vast majority of systems I ever worked with it, and one adds time-stamps to critical sub-screens if they can.

Maybe you encountered a specific product's edge-case where it helps, but I'd rather trade away 9 other headaches to get your edge case headache. It may be the case that Product X doesn't allow time-stamps, but should we gum up all strings to make just Product X happy? I don't see how the tradeoff math favors that. Plus, it encourages bad designs by plugging problems with funny non-printable codes. Let's not enable slop-heads.

Re: but for other datatypes

For the purpose of this discussion I'm limiting it to strings, where the majority of null-related headaches are found in my experience.

stronglikedan · on Feb 3, 2020

I just want to know if a boolean has been explicitly set, without having to check the value of another more complex data type that would never be used for anything else.

tabtab · on Feb 3, 2020

To clarify, my context is strings. If that was not clear, my apologies.

pushpop · on Feb 3, 2020

I’d go further than you and say they should be removed by default on all fields.

Want to know if a Boolean field is unset? Well it’s no longer Boolean because you now have 3 states for that field. So why not use a char, U/Y/N with the default being U?

NULL might have made more sense 30+ years ago when systems were more resource constrained but it doesn’t make sense now for most of the problems people are trying to solve day to day. If anything, it creates more problems.

Just to be clear, I’m not saying they should be removed entirely but rather that they shouldn’t have to be explicitly disabled on every CREATE.

I will say one use case for NULL that is hugely helpful is outer joins and nested queries. However these don’t generate high performance queries so if you’re having to rely on them then you might need to rethink your database schema anyway.

So essentially I don’t disagree with you, I just think you’re being too nice limiting your complaint to string fields.

williamdclt · on Feb 3, 2020

> Well it’s no longer Boolean because you now have 3 states for that field. So why not use a char, U/Y/N with the default being U?

Well because instead of using a type that exactly encodes the concept of "yes/no/unset" (nullable boolean), you'd be using a type that encodes "any 1-character text, with arbitrary meaning and most of the values being nonsensical"

pushpop · on Feb 4, 2020

The problem is you need a boat load of additional code to cover unset. Not just in SQL (syntax is different for NULL than it is for comparing any other type) but often also in your importing language too (eg some languages will cast NULL to a nil value that can actually raise exceptions or even crash your application if not handled correctly).

Capturing those edge cases is non-trivial compared checking the value of a char.

In an idea world your unit tests and CI pipelines would catch all of those but that depends on well written tests. Thus in my experience having fewer hidden traps from the outset is automatically a better design than one that perfectly fits an academic theory but is harder to support in practice.

setr · on Feb 4, 2020

It'd probably be more sane than trying to stuff a 3VL into bunch of 2VL operations, because you refuse to acknowledge that you don't actually have a 2VL type