

Notset: A Do-Not-Care value for Python - shmeedogg
https://github.com/rconradharris/notset

======
forgotusername
Everyone bumps into this at some time on their journey through Python, however
personally it's not something I've contended with in years and there is good
reason for that:

> Suppose you have a function which takes a person and allows you to update
> the person's name, age, both, or neither

The problem is not the lack of some fundamental feature, it is one of
obviousness in interface design. A trixy interface as given by the example
leads exactly to the kind of problems the library hopes to eliminate. Instead
how about:

    
    
        def update(person, attrs):
            pass
    
        def update_with_email(person, attrs):
            update(person, attrs)
            send_email(person)
    

Not only is the problem avoided, but a problem of namespace pollution has been
fixed too. Overusing keyword arguments in a hyper-generic manner forces
extension of the code to require definition of a new function in order to
avoid potential breakage.

For example, how does one add a 'use_html=True' parameter to
update_with_email()? Perhaps by adding a 'use_html' kwarg that hopefully
doesn't conflict with Person's attribute namespace, or perhaps by adding
'_use_html', hoping to skirt the problem by introduction of ugliness. For a
'clean' backwards-compatible solution, we're forced into something like:

    
    
        def _real_update_with_email(person, use_html, kwargs):
            update(person, **kwargs)
            send_email(person, use_html=use_html)
    
        def update_with_email(person, **kwargs):
            _real_update_with_email(person, False, kwargs)
    
        def update_with_html_email(person, **kwargs):
            _real_update_with_email(person, True, kwargs)
    

How can the caller dynamically form the attribute names if they need to?

    
    
        # TODO: something seems terribly wrong here, I can't quite put my finger on it.
        update(person, **{'previous_' + attr: value})
    

etc.

I realize abuse of ** is very much a religious issue, and at first sight,
one of the superficial attractions to Python (at least for me, way back in
time), however with experience it seems to regularly introduce more problems
than it solves outside a few niche uses. The idea of adding an 'undefined'
value has been discussed going back years (try grepping python-ideas and
python-dev) and it's never made it in for good reason.

There is one place where an 'undefined' might seem useful at first, for
example in the implementation of `dict.pop()` where a missing second argument
signals the need to raise KeyError. The problem is that no published, public
value including 'undefined' can be used as placeholder without introducing
another ugly rule to the language: the ability to use `pop(..., default)` with
any default value except 'undefined'! (net simplicity gain: zero)
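As a rough illustration of why the placeholder has to stay private, here is a minimal sketch of a `pop`-like helper. The names `_MISSING` and `pop` are made up for this example and are not taken from CPython or from the notset library.

```python
_MISSING = object()  # hypothetical private sentinel, never exported


def pop(mapping, key, default=_MISSING):
    """Remove key and return its value, mimicking dict.pop semantics."""
    if key in mapping:
        value = mapping[key]
        del mapping[key]
        return value
    if default is _MISSING:
        # No default was supplied, so a missing key is an error.
        raise KeyError(key)
    return default
```

Because callers cannot name `_MISSING`, every value they can actually pass, including `None`, remains a usable default.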

~~~
to3m
As a general rule of thumb (standard disclaimers apply), I've found it better
to have several functions than one function, if that one function is going to
do different operations according to the parameters passed in. ("Operation" is
a vague term, but I think setting something vs not setting something would
count.) It's all too common for the operation to end up being fixed at the
call point, for every call point, and therefore for the path through each call
to be the same each time. The parameter/argument system is the wrong mechanism
for that.

This line of thinking was inspired by C's `fopen` (no doubt now that I've said
that it's going to turn out that I'm the only person ever to have ended up
using a string literal for the mode parameter 100% of the time). But I suspect
it would be the case for this function too.

------
reinhardt
The problem with this idea, or rather its implementation, is that it's just a
matter of time until someone uses NotSet as a legitimate value assigned to a
variable/attribute, just like None is today. At this point someone will
introduce a new singleton, "NotSetIReallyMeanItThisTime", and so on and so
forth. It never ends.

The only way this might work is if NotSet (or whatever it's called) is a
keyword and it is only allowed in (a) function signatures and (b) comparison
with `is` and `is not`; everything else throws a syntax error.

~~~
sltkr
> it's just a matter of time until someone uses NotSet as a legitimate value
> assigned to a variable/attribute, just like None is today.

No, because the NotSet value (unlike None!) isn't global; it's private to the
package (or even class) that uses it. Callers never need to reference it, and
never should.

(Python doesn't actually enforce access restrictions but using undocumented
variables/attributes is frowned upon; if you do that, your code deserves to
break!)

If you really want to hide it (to prevent mistakes), you could write something
like this:

    
    
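        NotSet = object()
        
        def isset(val, magic=NotSet):
            # The sentinel is captured as a default, so it survives `del NotSet`.
            return val is not magic
        
        def update(person, name=NotSet, age=NotSet):
            if isset(name):
                person.name = name
            if isset(age):
                person.age = age
        
        del NotSet
        
        class Person(object):
            pass
        
        person = Person()
        update(person, name='foo')
        update(person, age=27)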
    

(I'm sure there are still ways to get to the NotSet value if you really want
to, but not by accident, and if you abuse this, you deserve all the problems
you'll receive.)

------
FuzzyDunlop
My first impression is that this isn't a problem in need of solving; it just
needs a change in approach.

The first is the conflation of classes and functions that work with classes.
The update function in the example isn't reusable at all, implies you can
update something other than a Person, and 'NotSet' doesn't fix that. So have
it as a method on Person, and pass in a list of attributes to change as
opposed to enumerating each field as a named parameter. You have the fields on
the class for more fine-grained control, and functions like this don't
necessarily make the code clearer.

Given that, I don't think the example presents a valid use-case for
implementing 'NotSet' or whatever you want to call it. The problem is in the
implementation, not Python, and the solution is a hack to enable you to
continue with this approach.

~~~
gizmo686
There is no way to tell that the example function is not a method of a Person
class.

The code: "def update(person, name=None, age=None):"

is equivalent to the code: "def update(self, name=None, age=None):"

although if you chose to name the 'self' variable something other than 'self',
you probably deserve to have problems.

~~~
shmeedogg
Agreed, I think the distinction between a bound method or a function is
irrelevant here.

If we want explicit kwargs (a big 'if' since many of the suggestions in this
thread are to give up on that idea), then we need some value to distinguish it
from None.

------
sehrope
This is one of the features of Scala I like; explicit types for
Option/Some/None in the core language and standard API. It's generally used
for return types (ex: a hashmap 'get' will have a return type of
Option[ValueType] and return either Some[ValueType] or None) but you can use
it for function parameters as well.

    
    
      scala> def foo(name:Option[String] = None, age:Option[Int] = None) = {
           |     println("==========")
           |     if( name.isDefined )
           |         println("Name: " + name.get)
           |     if( age.isDefined )
           |         println("Age: " + age.get)
           |     println("==========")
           | }
      foo: (name: Option[String], age: Option[Int])Unit
      
      scala> foo()
      ==========
      ==========
    
      scala> foo(Some("Alice"))
      ==========
      Name: Alice
      ==========
      
      scala> foo(age = Some(10))
      ==========
      Age: 10
      ==========
    

Plus with some implicit syntactic sugar...

    
    
      scala> implicit def strToSome(s:String) = Some(s)
      strToSome: (s: String)Some[String]
    
      scala> implicit def intToSome(i:Int) = Some(i)
      intToSome: (i: Int)Some[Int]
     
      scala> foo("Alice")
      ==========
      Name: Alice
      ==========
      
      scala> foo(name = "Alice")
      ==========
      Name: Alice
      ==========
      
      scala> foo(age = 10)
      ==========
      Age: 10
      ==========
      
      scala> foo(name = "Alice", age = 10)
      ==========
      Name: Alice
      Age: 10
      ==========

~~~
Evbn
Yeah, not sure why there is so much confusion in pythonland over an issue that
has a trivial solution that was discovered ages ago, moments after the problem
first appeared.

------
timClicks
Sorry for the bikeshedding, but double negatives are a real pain. Not so much
for the code itself, but for talking to colleagues and giving talks. Off the
top of my head, Empty could work.

~~~
aroberge
Personally I would use "assigned" 1) to better reflect the meaning and 2) to
follow PEP-8 and not use an uppercase.

~~~
shmeedogg
Uppercase was chosen to match Python's existing singletons, `True`, `False`,
and `None`. Lowercase might look a little too much like a variable, and all-
upper might look too much like a module-level const, IMHO.

------
sowhatquestion
I can't remember the specifics offhand, but I used the built-in value
"NotImplemented" to address a problem like this once.

~~~
rdtsc
That's a good hack!

I personally prefer Ellipsis. Its meaning makes more sense to me (its Greek
origin means "omission"), i.e. "Not Set", so it almost perfectly matches.

------
csense
The **kwargs form is a more obvious API. When your users grow more fields --
email, registration date, avatar -- this code will be reusable, whereas a
function with a signature will need its arguments changed whenever the
database schema changes.

The three-valued logic of set-to-something, set-to-None, don't-set is
perfectly adequately captured by a dictionary. You're introducing an
application-specific concept (the NotSet value) when a built-in concept (a
dictionary) works just fine.
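A minimal sketch of that dictionary-based approach, assuming a plain `Person` class invented here for illustration: a key that is absent means "don't set", while a key mapped to `None` means "set to None".

```python
class Person:
    """Hypothetical stand-in for the thread's person object."""

    def __init__(self, name=None, age=None):
        self.name = name
        self.age = age


def update(person, attrs):
    """Apply only the keys present in attrs; absent keys are left alone."""
    for field, value in attrs.items():
        setattr(person, field, value)


alice = Person(name="Alice", age=30)
update(alice, {"age": None})  # set age to None, leave name untouched
update(alice, {})             # change nothing
```

Note that this version accepts any key at all, which is the trade-off discussed in the replies.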

~~~
shmeedogg
> whereas a function with a signature will need its arguments changed whenever
> the database schema changes.

True there is some work in keeping the two synchronized, but there are
benefits.

First, unexpected arguments are immediately caught since they throw a
TypeError.

Without this, you either have to manually check for unexpected keys (probably
doing a set difference with `allowed_keys` or something) or you just silently
pass through unrecognized attributes, probably causing strange behavior later
on.
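The manual check might look roughly like this. It is only a sketch: `ALLOWED_KEYS` and the use of `SimpleNamespace` as a person object are assumptions made for the example.

```python
from types import SimpleNamespace

ALLOWED_KEYS = frozenset({"name", "age"})  # hypothetical list of modifiable fields


def update(person, attrs):
    # The set difference finds every key the caller should not have passed.
    unexpected = set(attrs) - ALLOWED_KEYS
    if unexpected:
        raise TypeError(
            "unexpected attributes: " + ", ".join(sorted(unexpected))
        )
    for field, value in attrs.items():
        setattr(person, field, value)


person = SimpleNamespace(name="Alice", age=30, admin=False)
update(person, {"name": "Bob"})  # fine
try:
    update(person, {"admin": True})  # rejected before anything is changed
except TypeError as exc:
    error_message = str(exc)
```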

Second, you are forced to say explicitly which attributes are modifiable. To
draw from the 'person' example, `name` and `age` might be modifiable, but
`admin` might be protected. That would be made abundantly clear by
`update(person, name=NotSet, age=NotSet)`, but less so, by `update(person,
attrs)` or `update(person, **kwargs)`.

A clear docstring would help, but I'd prefer to have the code just fail-fast
on this unexpected input.

~~~
j-kidd
> First, unexpected arguments are immediately caught since they throw a
> TypeError.

> Without this, you either have to manually check for unexpected keys
> (probably doing a set difference with `allowed_keys` or something) or you
> just silently pass through unrecognized attributes, probably causing strange
> behavior later on.

The default constructor for SQLAlchemy declarative base does a simple check
for unexpected keys, and it has served me well:

[https://bitbucket.org/sqlalchemy/sqlalchemy/src/acbaeb1acb7d...](https://bitbucket.org/sqlalchemy/sqlalchemy/src/acbaeb1acb7d/lib/sqlalchemy/ext/declarative/base.py?at=default#cl-406)

> Second, you are forced to say explicitly which attributes are modifiable. To
> draw from the 'person' example, `name` and `age` might be modifiable, but
> `admin` might be protected. That would be made abundantly clear by
> `update(person, name=NotSet, age=NotSet)`, but less so, by `update(person,
> attrs)` or `update(person, **kwargs)`.

Whether a field is modifiable is often determined by the current user's access
level and the current state of the object. So, putting such restriction at the
function definition may have made things too rigid.

~~~
shmeedogg
> Whether a field is modifiable is often determined by the current user's
> access level and the current state of the object. So, putting such
> restriction at the function definition may have made things too rigid.

True, let me try to clarify. There might be some attributes like `admin` that
you don't want twiddled via the `update` method but rather mutated via a
setter function. In this case, the signature of the `update` function would be
helping to indicate that.

(A better example might be the attribute `active` with two methods called
`activate` and `deactivate` that send emails and what-not.)

------
notdonspaulding
The sentinel value you've used here is the right solution. It keeps the
keyword arguments explicit and gives None its value back. However, it doesn't
deserve a module global-to-python to live in. The cases where you use this
pattern are broader than just "NotSet".

Good pattern, and it needs to be more widely-known, but it doesn't need to
occupy the space of a module on PyPI. ;-)

~~~
shmeedogg
I'd love to see this integrated into the language and live alongside
`None` (I can dream, can't I?)

But until then, how can two different libraries agree to use the exact same
instance of `NotSet` without some standard package defining it?

The goal is really to be able to 'proxy' this do-not-care condition from
applications into libraries in a standardized way.

~~~
notdonspaulding
I'll repeat my advice to take this to the python-ideas mailing list. You'll
get real reasons why this shouldn't (or should) be included in __builtins__.

As far as two libraries agreeing on using the same instance, one can define it
the way you've done with `NotSet = object()`, and they can both use it just
like they share any other named object (classes, functions, constants,
sentinels). Better yet, it can carry a more meaningful name when "NotSet"
doesn't really describe how they're using the sentinel.

------
yuvadam
Without commenting on the suggestion itself, this really should be submitted
as a PEP [1] and discussed in that context.

[1] - <http://www.python.org/dev/peps/>

~~~
notdonspaulding
PEPs often begin life as simple emails (in the general form of the original
post) to the python-ideas mailing list.

<http://mail.python.org/mailman/listinfo/python-ideas>

It's a list that's generally friendly to new ideas and you'll get some
feedback from some old-and-crusty language designers. Which is neat.

------
xnxn
This problem also happens in the context of HTTP PATCH, and you can solve it
in the same way. Make update accept a single argument (a dict), and don't
supply any keys you don't want to overwrite.

------
rdtsc
Why not just use Ellipsis? That is a good "not set value". It is unique so
others' Ellipsis is also your Ellipsis. It is already there, don't need to
import anything, do any git pull or such.
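For illustration, a sketch of what that might look like, with `SimpleNamespace` standing in for a person object (an assumption of this example):

```python
from types import SimpleNamespace


def update(person, name=Ellipsis, age=Ellipsis):
    # Ellipsis means "argument omitted"; None stays a legitimate value.
    if name is not Ellipsis:
        person.name = name
    if age is not Ellipsis:
        person.age = age


person = SimpleNamespace(name="Alice", age=30)
update(person, age=None)  # clears age, leaves name untouched
```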

~~~
shmeedogg
Interesting. I love the idea of not having to import an external library, but
much like the `NotImplemented` suggestion, it overloads a value that
already has a specific meaning, in this case related to slice-notation.

I'd worry that this approach could end up being even more confusing.

~~~
rdtsc
Yeah it is a hack but in 6 or 7 years of using Python I haven't once needed to
use Ellipsis for its intended purpose. I always end up using it though for
"not set" where None is one of the valid set values.

------
buster
What's the difference to using "if name is not None:" in the first example?

~~~
shmeedogg
That example could probably use some clarification, but the idea is that
`name` and `age` in that example are both nullable, so passing in `None`
(setting the column to NULL) is different than omitting the value (leaving the
column unchanged).
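A small sketch of what goes wrong with the `is not None` version. The function name and the `SimpleNamespace` person are made up for this example.

```python
from types import SimpleNamespace


def update_with_none_default(person, name=None, age=None):
    # None doubles as "not supplied", so a caller cannot clear a field.
    if name is not None:
        person.name = name
    if age is not None:
        person.age = age


person = SimpleNamespace(name="Alice", age=30)
update_with_none_default(person, age=None)  # intended: set age to NULL
# person.age is still 30: the explicit None was indistinguishable from omission
```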

~~~
eurleif
I think this could be handled more elegantly if Python supported an Option type,
by making 'name' an Option[Option[String]]

------
cmccabe
better solution: don't use default parameters. They're a misfeature in every
language they appear in.

------
DasIch
I'm impressed by how much effort one can put into something that is
effectively nothing else than:

    
    
        NotSet = type("NotSet", (object, ), dict(__repr__=lambda self: "NotSet"))()
    

There is no amazing concept, no new idea or even anything remotely interesting
at all here. This is but a side-effect of a type system that encourages the
usage of None in this way and the fact that None is generally used to mean
undefined in keyword arguments.

Why would anyone consider this notable, useful or even important enough to
warrant an upvote?

~~~
shmeedogg
The implementation here is just a suggestion, and trust me, it wasn't much
effort ;-)

The aspect that interests me is that this is a problem that crops up
occasionally in different Python projects and there doesn't seem to be a
recognized best-practice for addressing it.

(The `**kwargs` approach seems like the most common but has all the downsides
I mentioned.)

