
Weird Python Integers - luu
https://kate.io/blog/2017/08/22/weird-python-integers/
======
adtac
[https://github.com/adtac/exterminate](https://github.com/adtac/exterminate)

Plugging my near-useless Python library that does this and a lot of other
subtle, annoying things to break programs. The library is essentially a
demonstration of how much Python actually exposes to the user and how
modifiable it is.

~~~
rosstex
Why does my inner self find it immensely fun to modify programs in subtle ways
to watch how they break? Especially considering there are infinitely many ways
to write junk code, and far fewer ways to make it work.

~~~
hobolord
It's probably what got me interested in programming in the first place:
writing C code, learning about pointers to memory, and playing around with
memory locations to see how undefined behavior could be invoked.

------
std_throwaway
Summary: integers in Python are full-blown objects. Small numbers are stored
in a central preallocated table where each entry represents one number.
Setting a variable to a small integer makes it point to an entry in that
table, so multiple variables may point to the same small integer object.
Fooling around with the table leads to funny results.
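That sharing is easy to see with `is` (a CPython implementation detail; the
`int(str)` trick below is just there to sidestep the compiler's constant
sharing):

```python
# Small integers come from CPython's preallocated table, so equal
# small ints are the very same object; larger ints are allocated fresh.
a = 256
b = 256
print(a is b)   # True: both names point at the table entry for 256

c = int("257")  # build 257 at runtime to dodge compile-time constant sharing
d = int("257")
print(c is d)   # False: two distinct objects with equal values
```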

~~~
concede_pluto
It's just like Fortran, where everything _including a constant_ is passed by
reference, so if you assign to your arguments you might corrupt your program's
only copy of 7 unless the linker put it in a read-only page.

~~~
andars
Could you be more specific? Fortran passes arguments by reference, but writing
to an argument won't mess up the "program's only copy of 7".

For example, the first output of the following program is 4, but the second is
still 7.

    
    
           program main
           integer m,n
    
           m = 7
           call corrupt(m)
           write (*,*) m
    
           n = 7
           write (*,*) n
           stop
           end program main
    
           subroutine corrupt (a)
           integer a
           a = 4
           return
           end

~~~
mastax
I think he means something like

    
    
       call corrupt(7)
    

which causes a segfault for me. I suppose if you were using a
compiler/platform that didn't store constants in read only memory, this might
actually work.

~~~
alephnil
All newer versions of Fortran will segfault, but back in the day they would
not. Back in the nineties I fixed a bug we found when porting a Fortran 77
program from HPUX to Linux. The program would segfault on Linux, but worked on
HPUX.

The reason was that in one subroutine, a parameter value was stored in a local
variable, then used for computation and restored at the end. Since constants
were stored in read-only memory by g77 on Linux, but not by the f77 compiler
on HPUX, the Linux port would crash while the original HPUX version did not.
On HPUX, the code you have above would actually have worked.

~~~
emmelaich
Yeah, a "fun" thing to do was to change the value of built-in constants such
as pi.

~~~
DonHopkins
That feature was required by the Indiana General Assembly.

[https://en.wikipedia.org/wiki/Indiana_Pi_Bill](https://en.wikipedia.org/wiki/Indiana_Pi_Bill)

~~~
emmelaich
More like ridiculed than required.

I was going to mention Indiana but was more hoping that it wouldn't be
mentioned at all.

------
woodrowbarlow
the python documentation [1] says the following:

> The current implementation keeps an array of integer objects for all
> integers between -5 and 256, when you create an int in that range you
> actually just get back a reference to the existing object. So it should be
> possible to change the value of 1. I suspect the behaviour of Python in this
> case is undefined. :-)

does anyone have any idea how they chose that range? it's a 262-wide block
starting at -5, which seems incredibly arbitrary.

[1]
[https://docs.python.org/2/c-api/int.html](https://docs.python.org/2/c-api/int.html)

~~~
masklinn
You could probably find out from the code's commit log. I'd guess the small
positive integers are common due to e.g. iteration or len() of small
collections and the like, and the very small negatives are due to things like
error values.

~~~
jwilk
I did some git archaeology. Here are relevant commits and bug reports:

* Initial commit with -1...99 cached:

[https://github.com/python/cpython/commit/842d2ccdcd540399501...](https://github.com/python/cpython/commit/842d2ccdcd540399501a918b9724d2eaf5599f39)

* Cache more negative numbers, -5...99:

[https://github.com/python/cpython/commit/c91ed400e053dc9f11d...](https://github.com/python/cpython/commit/c91ed400e053dc9f11dd30c84e2bb611999dce50)

[https://bugs.python.org/issue561244](https://bugs.python.org/issue561244)

* Cache more positive numbers, -5...256:

[https://github.com/python/cpython/commit/418a1ef0895e826c65d...](https://github.com/python/cpython/commit/418a1ef0895e826c65d4113be5d86891c199e15d)

[https://bugs.python.org/issue1436243](https://bugs.python.org/issue1436243)

~~~
woodrowbarlow
good find! here's a summary:

it looks like they searched the standard library for negative integer literals
and that's how they settled on -5 for the low end.

the high end came after introducing the `bytes` object. it went up to 256
instead of just 255 for size/len checks.
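the boundaries are easy to poke at, by the way (CPython-specific; the `fresh`
helper below just builds ints at runtime so the constant pool doesn't
interfere):

```python
# Probe the edges of the small-int cache by constructing values at runtime.
def fresh(n):
    return int(str(n))  # yields a new object unless n is in the cached range

for n in (-6, -5, 256, 257):
    print(n, fresh(n) is fresh(n))
# On CPython: -6 False, -5 True, 256 True, 257 False
```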

------
boramalper
> We can use the Python built-in function id which returns a value you can
> think of as a memory address to investigate.

> [...]

> It looks like there is a table of tiny integers and each integer takes up
> 32 bytes.

It _is_ the memory address, but only as a "CPython implementation detail: This
[return value of the id() function] is the address of the object in
memory."[1]

Though you _cannot_ use this to determine the size of an object -- or rather
you _shouldn't_, because that assumes a very specific implementation detail
which isn't guaranteed.

If you'd like to get the size of an object, use sys.getsizeof().[2] Also keep
in mind that containers in Python do not contain the objects themselves but
references to them, so the returned size is the size of the object itself
only, non-recursively. Read "Is Python call-by-value or call-by-reference?
Neither."[3] for some more details.
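For instance, the shallow-size behaviour is easy to demonstrate (exact byte
counts vary across builds, so only the comparison is shown):

```python
import sys

# A list's size covers its reference array, not the objects it points to.
big = 10 ** 100            # a bignum, well outside any cache
xs = [big] * 1000          # 1000 references to the same object
print(sys.getsizeof(xs) < 1000 * sys.getsizeof(big))  # True: shallow size only
```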

[1]:
[https://docs.python.org/3.6/library/functions.html#id](https://docs.python.org/3.6/library/functions.html#id)

[2]:
[http://docs.python.org/3.6/library/sys.html#sys.getsizeof](http://docs.python.org/3.6/library/sys.html#sys.getsizeof)

[3]: [https://jeffknupp.com/blog/2012/11/13/is-python-
callbyvalue-...](https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-
or-callbyreference-neither/)

~~~
int_19h
Python's memory model is amazingly simple and consistent: everything is passed
and stored by value, but all values are references to objects.
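A short illustration of that model (the `mutate` helper is just for the
example): assignment and argument passing copy the reference, never the
object:

```python
def mutate(lst):
    # 'lst' is a copy of the caller's reference to the same list object.
    lst.append(4)   # mutation is visible to the caller
    lst = [0]       # rebinding only changes the local name

xs = [1, 2, 3]
mutate(xs)
print(xs)  # [1, 2, 3, 4]
```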

------
DonHopkins
"Is", "is." "is"—the idiocy of the word haunts me. If it were abolished, human
thought might begin to make sense. I don't know what anything "is"; I only
know how it seems to me at this moment.

— Robert Anton Wilson, The Historical Illuminatus Chronicles, as spoken by
Sigismundo Celine.

[https://en.wikipedia.org/wiki/E-Prime](https://en.wikipedia.org/wiki/E-Prime)

Kellogg and Bourland describe misuse of the verb to be as creating a "deity
mode of speech", allowing "even the most ignorant to transform their opinions
magically into god-like pronouncements on the nature of things".

Bourland and other advocates also suggest that use of E-Prime leads to a less
dogmatic style of language that reduces the possibility of misunderstanding or
conflict.

Alfred Korzybski justified the expression he coined — "the map is not the
territory" — by saying that "the denial of identification (as in 'is not') has
opposite neuro-linguistic effects on the brain from the assertion of identity
(as in 'is')."

~~~
logicchains
This is part of the reason why I prefer structural types (like in Typescript,
Go and OCaml) over nominal types (like in most languages), as any object with
the required methods is automatically an instance of such a type, instead of
having to explicitly declare that it "is"/"extends"/"implements" that type.

I suspect that with dependent types, nominal types are actually a degenerate
type of dependent structural type: a dependent pair of a type and a proof that
a string 'name' field has some particular value, or that a list<string>
'implements' field contains some particular string (the interface name).

~~~
DonHopkins
Which is a great approach if you're slinging JSON between different languages
and implementations.

Python is fine with duck typing.

~~~
logicchains
It's also really good for testing: if you want to mock some opaque third-party
object, you can just create an interface with the same methods as that object
and use it in your code. Nominal types often don't allow this (in Java, for
example, you can't make a class whose source you don't control implement your
own interface), or make it painful, like the orphan-instance restrictions in
Rust and Haskell.

------
mbell
Ruby does something similar, but all Fixnum (native sized) values are 'fixed
objects':

    
    
        a = 2**62 - 1
        b = 2**62 - 1
    
        a.object_id == b.object_id # true
    
        a = 2**62
        b = 2**62
    
        a.object_id == b.object_id # false
    

Ruby does automatic promotion from Fixnum (native size) to Bignum (arbitrarily
large) and uses one bit of the native word as a flag to identify this, which
is why 2^62 - 1 is the max instead of 2^63 - 1. Though I think this is only
true of MRI; other implementations handle it without the flag bit.

Perhaps one difference from Python is that in MRI a Fixnum doesn't really
allocate an 'object' at all; the object_id is the value in disguise. In fact
all 'real' objects have even object_ids and all odd object_ids are integers:

    
    
        a = 123456789
        (a.object_id - 1) / 2 # 123456789

------
squeaky-clean
I wrote a blog post about this in the past. It's really fun going through the
oddities of the language like this.

It caches small integers, but also literals used in the same interpreter
context (I'm probably getting that last term wrong). You'll get different
results if you run these from the shell as opposed to executing a script --
try it out!

Here's a fun example

    
    
        >>> x = 256; x is 256
        True
        >>> x = 257; x is 257
        True
        >>> x = 257
        >>> x is 257
        False
        >>> def wat():
        ...   x = 500
        ...   return x is 500
        ...
        >>> wat()
        True

~~~
kmill
The Python interpreter compiles programs into bytecode first, and the bytecode
includes instructions that load a constant from the constant pool. As an
example,

    
    
        >>> def f():
        ...     x = 2222; y=2222
        ...     return x is y
        ...
        >>> f()
        True
        >>> f.func_code.co_consts
        (None, 2222)
    

This last line is showing the constant pool, which is just a standard Python
tuple.

I believe the REPL compiles each input in a new toplevel module context, so
each input gets its own constant pool.

Functions get their own constant pools, which explains the following behavior:

    
    
        >>> if True:
        ...     def f():return 2222
        ...     def g():return 2222
        ...
        >>> f() is g()
        False
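(On Python 3 the code object is reached via `__code__` instead of `func_code`;
the same experiment, as a script:)

```python
def f():
    x = 2222; y = 2222
    return x is y

# Both 2222 literals resolve to a single entry in f's constant pool.
print(f())                            # True on CPython
print(2222 in f.__code__.co_consts)   # True: the shared constant is in the pool
```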

~~~
katee
I'm glad you and squeaky-clean wrote these comments. When I was experimenting
in the Python REPL, I was confused by the last line here:

    
    
        >>> 100 is 100
        True
        >>> (10 ** 2) is (10 ** 2)
        True
        >>> (10 ** 3) is (10 ** 3)
        False
        >>> 1000 is 1000
        True
    

I used the disassembler, but I completely missed that although `1000 is 1000`
and `(10 ** 3) is (10 ** 3)` both get optimized to nearly identical bytecode,
they load different constants. I wrote it up in a new post and thanked you
both: [https://kate.io/blog/2017/08/24/python-constants-in-bytecode/](https://kate.io/blog/2017/08/24/python-constants-in-bytecode/)

------
lispm
Lisp systems also have fixnums and bignums. For example a 64bit Common Lisp:

    
    
      MOST-NEGATIVE-FIXNUM, value: -1152921504606846976                                      
      MOST-POSITIVE-FIXNUM, value: 1152921504606846975        
    

Fixnums are typically stored inline in data structures (like lists, arrays and
CLOS objects). Bignums are stored as a pointer to a heap-allocated large
number. Data carries tags, so in a 64-bit Lisp fixnums are slightly smaller
than 64 bits. Bignums can be arbitrarily large, and there is automatic
switching between fixnums and bignums in numeric operations.

~~~
junke
The range varies across implementations: the above values correspond to 60
bits; most-positive-fixnum evaluates to 4611686018427387903 in the version of
SBCL I am currently running, which corresponds to 62 bits of data.

------
ghewgill
Lots of good discussion at this Stack Overflow question (2008):
[https://stackoverflow.com/q/306313/893](https://stackoverflow.com/q/306313/893)
(Python “is” operator behaves unexpectedly with integers)

------
jedberg

        In [7]: a = "foo"
    
        In [8]: b = "foo"
    
        In [9]: a is b
        Out[9]: True
    
        In [10]: b = "foobaljlajdfsklfjds l;kjsl;dfj ls;dfj l;skdj flsdjluejsklnm "
    
        In [11]: a = "foobaljlajdfsklfjds l;kjsl;dfj ls;dfj l;skdj flsdjluejsklnm "
    
        In [12]: a is b
        Out[12]: False
    

Seems to work with small and big strings too.

~~~
nneonneo
This would be due to _string interning_, which automatically caches small
strings; these are very common because attribute names are just strings.
~~~
Dylan16807
To be clear, the question of _which_ strings to intern is up to the
implementation of string interning. Only small strings, all strings, or even
only large strings are all options that are good for different reasons.

------
rcthompson
Is there any practical reason to use "is" to compare two ints (other than
demonstrating integer interning)? Should doing so produce a warning?

~~~
yen223
I'll take it one step further - I don't think I've ever used `is` for anything
other than comparisons to `None`.

~~~
bobbyi_settv
It's useful when you want a sentinel object and you don't want to use None as
the sentinel because it is a valid value. Some discussion here:

[http://www.ianbicking.org/blog/2008/12/the-magic-
sentinel.ht...](http://www.ianbicking.org/blog/2008/12/the-magic-
sentinel.html)
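The usual shape of the pattern, sketched with a hypothetical get_config: a
private object() has an identity no caller-supplied value can collide with, so
None stays available as real data:

```python
_MISSING = object()  # unique sentinel; no other object can be 'is' this one

def get_config(settings, key, default=_MISSING):
    if key in settings:
        return settings[key]
    if default is _MISSING:   # identity test: only our sentinel matches
        raise KeyError(key)
    return default

print(get_config({"debug": None}, "debug"))    # None is a real stored value
print(get_config({}, "debug", default=False))  # False
```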

------
avyfain
The case of strings is also pretty interesting: [http://guilload.com/python-
string-interning/](http://guilload.com/python-string-interning/)

------
timonoko
I remember implementing this on a Nova 1200 too. When the address space is
bigger than the memory, you can place those integers outside the memory; the
objects do not actually exist, in other words. It saves memory cycles too,
because you can calculate the numeric value from the address.

------
Aliyekta
[https://stackoverflow.com/questions/132988/is-there-a-
differ...](https://stackoverflow.com/questions/132988/is-there-a-difference-
between-and-is-in-python/1085656#1085656)

------
tomsmeding
You can also do a similar thing in Java, as illustrated in this answer on
CodeGolf stackexchange:
[https://codegolf.stackexchange.com/a/28818](https://codegolf.stackexchange.com/a/28818)

~~~
leibnitz27
Indeed - the integer boxing cache is even required by the language spec!

[http://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#...](http://docs.oracle.com/javase/specs/jls/se8/html/jls-5.html#jls-5.1.7)

(here == means reference not value equality)

"If the value p being boxed is an integer literal of type int between -128 and
127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a
character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let
a and b be the results of any two boxing conversions of p. It is always the
case that a == b."

------
throwaway613834
Does anyone know why Python refcounts _everything_ -- even small integers,
True, False, None...? Why not avoid it?

~~~
chrisseaton
I don't know the internals of Python, but maybe checking whether you need to
refcount something basically takes as long as just going ahead and doing it
all the time anyway.

~~~
throwaway613834
Yeah, you might be right; it's hard to say. Generally I view writing as more
expensive than reading, so I don't know.

~~~
gvx
Note that it's not just write vs read, it's write vs read + branch.

~~~
throwaway613834
Right, if it was just 1 read vs. 1 write then I _would_ know!

------
wyldfire
> That is surprising! It turns out that all “small integers” with the same
> value point to the same memory. We can use the Python built-in function id
> which returns a value you can think of as a memory address to investigate.

Unfortunately this blog post seems to miss a great opportunity to show you how
you _should_ compare integers for equality -- using the equality operator `==`
and not the identity comparison `is`.

EDIT: odd, this post attracted a lot of downvotes. Please help me learn how
this post could be improved.
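To spell it out: == compares values and works regardless of object identity,
while `is` only happens to agree inside the cached range:

```python
a = int("1000")   # built at runtime: no cache, no shared constant
b = int("1000")
print(a == b)     # True: value equality, the comparison you almost always want
print(a is b)     # False on CPython: two distinct objects
```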

~~~
int_19h
It never ceases to amaze me that Python seems to be the only language that got
this right -- namely, that == should have value comparison semantics for all
types, and that some other operator (preferably something distinctive enough,
rather than something like ===) should always compare references.

Then you look at something like C#, where == compares values for value types
and references for reference types, except that == is overloaded for some
"value-like" reference types like String... it's a mess.

OTOH, Java is consistent in a sense that == never dereferences, but awkward in
practice because of the need to use equals() for strings, which is too verbose
for an operation that's far more common than object identity comparison.

~~~
lmm
It's still confusing in Java because there are primitive types which ==
compares by value. Scala gets it right: == does the right thing and there's
some operator I never use (I think it's "eq"?) if you really want reference
comparison.

~~~
int_19h
In Java it's not confusing when you realize that == always compares values
(i.e. things that can be stored in variables); it's just that for reference
types, the values are references. It'd be more obvious if Java required
explicit dereferencing like C++ does, but it's still consistent. Just not
convenient.

~~~
lmm
> In Java it's not confusing when you realize that == always compares values
> (i.e. things that can be stored in variables); it's just that for reference
> types, the values are references.

It's still confusing. Plenty of "reference types" behave like values in all
obvious senses. Why should "foo" be a reference but 45L a value?

~~~
pjmlp
Because "foo" is an instance of String, aka subclass of Object, while 45L is a
value for the primitive type long.

I fail to see the confusion, other than for newbies.

~~~
simon_o
That's pretty much saying "it is the way it is", which

a) is not even true, see floating point numbers

and

b) will be obsolete with value types.

Needless to say, there is pointless confusion created by Java's design, and
there are better approaches available.

All you have to do is to adapt the semantic model from reference equality vs.
value equality to identity vs. equality.

Identity checks whether the "bits" are identical, regardless of whether the
bits are "references" or values, and equality is a user-defined operation.

    
    
        "Foo" equality "Foo" // True
        "Foo" identity "Foo" // Only true if they point to the same "object"
        123 equality 123 // True
        123 identity 123 // True
        Double.NaN equality Double.NaN // False
        Double.NaN identity Double.NaN // True
    

Which symbols you pick for "equality" and "identity" is largely arbitrary.

~~~
int_19h
Better yet, allow identity checks for reference types only. Value types don't
have identity, per se, so the operation should be meaningless for them.

~~~
simon_o
I don't think you can get away with that for theoretical and practical
reasons:

1.

There is a ton of code out there which does something like

    
    
        def contains(that: Thing): Boolean =
          this.value identity that || this.value equality that
    

Pretty much every single collection implementation would be broken if this
stopped working with value types. Additionally, you would run into issues with
floating point numbers which would not be found/retrieved anymore if identity
were removed.

2.

The idea to define a _sane_ definition of identity/equality across all types
is there to avoid the "next-best" option: boxing primitives to wrapper classes
which is both slow and has terrible semantics.

3.

I don't really think restricting identity to e.g. reference types makes sense
given that equality is defined for every type. Either none of them should be
available by default, or both should be.

There _are_ multiple valid ways to compare two things (consider floating point
numbers for a second) and making one more privileged than the other feels
wrong.

~~~
lmm
> Pretty much every single collection implementation would be broken if this
> stopped working with value types.

Maybe collections of values _should_ be different from collections of
references. The sensible use cases for the two are quite different.

> Additionally, you would run into issues with floating point numbers which
> would not be found/retrieved anymore if identity were removed.

Meh, just allow NaN to compare equal to itself. Equality is supposed to be
reflexive.

> The idea to define a _sane_ definition of identity/equality across all types
> is there to avoid the "next-best" option: boxing primitives to wrapper
> classes which is both slow and has terrible semantics.

Unboxed primitives don't have identity, only value equality. They align well
with what's being proposed.

> I don't really think restricting identity to e.g. reference types makes
> sense given that equality is defined for every type. Either none of them
> should be available by default, or both should be.

We could build the distinction into the language, so for every type you define
you explicitly choose whether it's value or reference. Scala's already halfway
there with the class/case class distinction.

> There _are_ multiple valid ways to compare to things (consider floating
> point numbers for a second)

Disagree; comparison is so fundamental to most types that it's worth
privileging. Using the wrong kind of comparison is a very common source of
bugs.

~~~
simon_o
> Maybe collections of values should be different from collections of
> references. The sensible use cases for the two are quite different.

I think all existing code disagrees with that. There has been great value
derived from being able to abstract over element types.

What you are proposing would double the required number of collection classes
and all of its traits, because it would require separate ones for Collection[E
<: AnyRef] and for Collection[E <: AnyVal].

There is literally no reason for introducing this complexity. Go has
demonstrated how poorly this idea has worked out in practice.

Additionally, this approach would make it nearly impossible to migrate
reference types to value types, because it would break all users of the code.

> Meh, just allow NaN to compare equal to itself. Equality is supposed to be
> reflexive.

That's a complete non-option. You might not like the IEEE's definition of
equality, but it is what it is. Messing with it would break all existing code
using floating point numbers.

> Unboxed primitives don't have identity, only value equality.

Their identity is the bits they consist of, just like identity on references
is the bits of the reference.

> They align well with what's being proposed.

What is being proposed?

> We could build the distinction into the language, so for every type you
> define you explicitly choose whether it's value or reference.

We already have that: AnyRef and AnyVal.

> Scala's already halfway there with the class/case class distinction.

That doesn't make any sense. The case keyword is basically just a compiler
built-in macro that generates some code. It is already doing way too much, and
overloading it with even more semantics is not the way to go.

> Disagree; comparison is so fundamental to most types that it's worth
> privileging. Using the wrong kind of comparison is a very common source of
> bugs.

What I'm proposing improves the consistency across value and reference types
so that it's always obvious which kind of comparison happens:

- identity: Low-level comparison of the bits at hand. Built into the JVM and
not overridable.

- equality: High-level comparison defined by the author of the type.

~~~
lmm
> What you are proposing would double the required number of collection
> classes and all of its traits, because it would require separate ones for
> Collection[E <: AnyRef] and for Collection[E <: AnyVal].

Less than double, because not all collections make sense for both - e.g. a
sorted set or sorted map only makes sense if the keys are values.

> There is literally no reason for introducing this complexity. Go has
> demonstrated how poorly this idea has worked out in practice.

It eliminates a common class of errors. All type-level distinctions add a bit
of complexity, but we often consider them worthwhile to make.

> Additionally, this approach would make it nearly impossible to migrate
> reference types to value types, because it would break all users of the
> code.

Changing from one to the other is a radical change that should force the user
to reexamine code that deals with them.

> That's a complete non-option. You might not like the IEEEs definition of
> equality, but this is what it is. Messing with it would break all existing
> code using floating point numbers.

Java already deviated from the IEEE definition with Float and Double. The sky
didn't fall. Maybe strict IEEE semantics could be offered in their own type
where needed, and that type would neither be value or identity-is-meaningful.
(This would mean the type system wouldn't allow you to use the strict-IEEE
type in any standard collection, which I think is correct behaviour; compare
e.g. Haskell where for a long time you could corrupt the standard sorted set
structure by inserting two NaNs).

> Their identity is the bits they consist of, just like identity on references
> is the bits of the reference.

That's a low-level implementation detail that may not even be true on all
platforms. The _language semantics_ should make sense.

> We already have that: AnyRef and AnyVal.

No, those are just implementation details of how they're passed around. Many
AnyRef types have value semantics.

> That doesn't make any sense. The case keyword is basically just a compiler
> built-in macro to generate some code. It is already doing way to much, and
> overloading it with even more semantics is not the way to go.

Well, what I'd like in an ideal language is: no universal equality, opt-in
value equality with derivation for product/coproduct types. As for
references... I'm not really convinced there's a legitimate use case for
comparing references, especially the implicit invisible references that the
language uses to implement user classes. If we need reference comparison at
all I'd rather something a bit more explicit - either an opt-in "the identity
of this class is meaningful", or a notion of explicit references that were
much more visible in the code (something a bit like ActorRef), or both.

> What I'm proposing improves the consistency across value and reference types
> so that it's always obvious which kind of comparison happens:
>
> - identity: Low-level comparison of the bits at hand. Built into the JVM and
> not overridable.
>
> - equality: High-level comparison defined by the author of the type.

That's very inconsistent at the language-semantics level; which things are
"the bits at hand" is a low-level implementation detail that should probably
be left up to the runtime to represent as best suits a particular code path.
At the language level, "does 2L + 2L equal 4L?" is the same kind of question
as "does "a" + "b" equal "ab"?", and both those questions are quite different
from any question to which reference comparison would be the answer.

~~~
simon_o
This hardly makes any sense, is not practical to implement and theoretically
questionable.

It makes decisions that break existing code and introduce pointless
complexity, while failing to offer any tangible benefits in return.

------
jonbarker
Cool examples, but I'm not super concerned about the problems arising from the
ability to 'use ctypes to directly edit memory'. It's actually pointers to
memory blocks, not the memory contents itself:
[https://docs.python.org/3/library/ctypes.html](https://docs.python.org/3/library/ctypes.html)
If you're advanced enough to need to handle pointers to memory blocks in your
Python program, you are probably good enough to know not to create problems
with the behavior of iterators over ranges.

------
santiagobasulto
Nice article. I wrote a similar piece some time ago related to booleans, in
case anybody is interested: [https://blog.rmotr.com/those-tricky-python-
booleans-2100d5df...](https://blog.rmotr.com/those-tricky-python-
booleans-2100d5df92c)

And to avoid issues with is/==, we recommend our students to always use ==
(except for `is None`). Also related piece:[https://blog.rmotr.com/avoiding-
being-bitten-by-python-161b0...](https://blog.rmotr.com/avoiding-being-bitten-
by-python-161b063e7da2)

~~~
sevensor
That's unnecessarily restrictive -- I've gotten great use out of "is". You
wouldn't want to use it on an integer, but I've been using namedtuples a lot
lately, and "LastKnownConfiguration is CurrentConfiguration" works great to
check whether anything has changed without comparing all of the fields for
equality.

~~~
santiagobasulto
You're very much right and it's a great suggestion. We just suggest that to
our students when they're starting in order to help them avoid issues. But
there are exceptions.

~~~
sevensor
Fair point -- == versus "is" is a pretty subtle distinction and best avoided
by novices.

------
knutae
For comparison, integers in clisp:

    
    
      [1]> (eq (expt 2 47) (expt 2 47))
      T
      [2]> (eq (expt 2 48) (expt 2 48))
      NIL
    

Explanation here:
[https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node17.html](https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node17.html)

~~~
rurban
But nobody would use eq in Lisp for this. For numbers you would use equal,
comparing the values. You would only use eq for addresses, like cons cells or
vectors. E.g. in PHP there is == vs ===.

Re Python: this is of course superlame. The compiler could easily detect "is
<int>" and forbid the const int table optimization. Only "is None" would make
sense.

------
d0mine

      >>> 42 == 0
      True
    

[https://lameiro.wordpress.com/2010/07/18/deixando-o-
interpre...](https://lameiro.wordpress.com/2010/07/18/deixando-o-
interpretador-python-maluco/)

------
wgrover
You can use sys.getrefcount() to explore these "weird integers":
[https://news.ycombinator.com/item?id=15093897](https://news.ycombinator.com/item?id=15093897)
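For example, the shared small integers carry enormous reference counts, since
the whole interpreter points at them (in CPython 3.12+ they are immortal and
report a fixed, huge count):

```python
import sys

print(sys.getrefcount(1))       # very large: every live 1 is this one object
print(sys.getrefcount(10**50))  # small: a freshly built bignum
print(sys.getrefcount(1) > sys.getrefcount(10**50))  # True
```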

------
mattbillenstein
Stumbled across this when debugging a reference leak with a C-extension once
-- small integers didn't exercise the problem, but larger ones did...

------
sl4i6j3o4i98g
Brilliant! Very interesting and an insight into the inner workings of Python.
Thank you for sharing, Kate!

------
tosh
related read:

Equal Rights for Functional Objects or, The More Things Change, The More They
Are the Same (1993) by Henry Baker

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.9...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.9999)

------
supertramp_sid
    
    
        a = 1000
        b = 1000
        a is b
    

The output is False only if you use the REPL; run the above code as a .py
script and it returns True. I read about it on SO but can't find the link to
it!
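One way to see both behaviours in a single script (CPython-specific):
compile() lets you control what counts as one compilation unit, and equal
literals are shared only within a unit:

```python
# One code object: equal literals collapse into one constant-pool entry.
ns1 = {}
exec(compile("a = 1000\nb = 1000", "<one unit>", "exec"), ns1)
print(ns1["a"] is ns1["b"])   # True on CPython

# Two code objects (like two REPL inputs): each carries its own 1000.
ns2 = {}
exec(compile("a = 1000", "<first>", "exec"), ns2)
exec(compile("b = 1000", "<second>", "exec"), ns2)
print(ns2["a"] is ns2["b"])   # False
```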

------
anentropic
"things you should never do"

------
gre
It's not weird, it's pythonic.

------
Veedrac
Now change it back!

------
dsfyu404ed
Yup, 0day hunters have fun with this behavior from time to time.

~~~
jwilk
Why is it interesting for "0day hunters" in particular?

~~~
syncsynchalt
I believe it's because you can find code where it does the right thing when
e.g. a username is short enough, but fails in a useful way if you increase an
input string length.

------
zde
That's not weird but pretty common knowledge. IIRC 1-char strings are interned
too.

