
My Favorite Programming Problem to Teach: Digit Length - jstrieb
https://jstrieb.github.io/posts/digit-length/
======
svnpenn
> As a result, solutions using strings are disallowed on problem sets and
> quizzes until they are taught. However, the few students who have prior
> Python programming experience may be tempted to find digit length without
> loops using a variant of the following (for our purposes) invalid solution.

Wow. This is one of the reasons I hated school. No programmatic reason was
given for why a string solution couldn't be used, only an arbitrary one.
Here students may have knowledge from self-teaching or whatever, but they are
not allowed to use that knowledge because "reasons".

To any teacher who thinks it's a good idea to punish students for thinking
outside the box: shame on you. All you're going to end up doing is crushing
enthusiasm and/or creating drones. Please don't.

~~~
userbinator
Strings are disallowed because they are not necessary for this problem and,
although the string solution is shorter, it is far less efficient; it also
doesn't demonstrate the algorithmic thinking that the course is obviously
trying to teach.

I've taught CS courses before, and have seen plenty of self-proclaimed self-
taught know-it-alls who seem to be more stackoverflow-copy-pasters than
anything else.

~~~
svnpenn
> Strings are disallowed because they are not necessary for this problem

The division examples are not necessary either, that's the point. You can
solve it different ways; that doesn't mean one way is not necessary, it just
means it's different. One may be faster, one may be more readable. If you
don't allow different solutions you can't explore the tradeoffs between them.

~~~
tzs
Generally, the purpose of the exercises in a class is to reinforce the
material that has been taught up to that point and to demonstrate that the
student can use it.

For example, if early in an elementary number theory class the student is
asked to prove that there are an infinite number of primes of the form 4n+3, a
solution that just invokes Dirichlet's theorem on primes in arithmetic
progressions would probably not be acceptable. That approach does work to show
that there are an infinite number of 4n+3 primes, but completely fails to show
that the student understood the material actually taught in class.

It's the exact same thing with the digit counting problem. Solving it by just
invoking the built in string length function does little to demonstrate that
the student understands the material taught so far.

------
jepler
But the math.log10 solution is unfortunately "wrong" too, at least in my
python3 implementation.

    
    
        import math
    
        def digitLengthCorrect(n):
            return len(str(n))
    
        def digitLengthClever2(n):
            return 1 if n == 0 else (math.floor(math.log10(n)) + 1)
    
        testcases = (
            [10 ** i for i in range(300)] +
            [(10 ** i) - 1 for i in range(300)]
        )
    
        for t in testcases:
            a = digitLengthCorrect(t)
            b = digitLengthClever2(t)
            assert a == b, (t, a, b)

~~~
gspetr
Good point. To protect yourself against this kind of precision loss, change
math.log10() to np.log10() and use Decimal() for tasks involving precise
number crunching.

EDIT: Setting Decimal module's precision manually via getcontext() may be
required if you work with "long" numbers.

    
    
        import math
        import numpy as np
        from decimal import getcontext, Decimal
    
        def digit_length_correct(n):
            return len(str(n))
        
        def digit_length_clever_2(n):
            if n == 0:
                return 1
            # You can use np.floor() and eschew the math module altogether,
            # but I left it intact to show the minimal necessary modifications.
            return math.floor(np.log10(Decimal(n))) + 1

        def generate_cases(n):
            # Decimal precision has to cover the longest test number.
            getcontext().prec = n + 3
            return (
                [10 ** i for i in range(n)] +
                [(10 ** i) - 1 for i in range(n)]
            )
        
        cases = generate_cases(300)
        
        for t in cases:
            a = digit_length_correct(t)
            b = digit_length_clever_2(t)
            assert a == b, (t, a, b)
        
        for t in cases[-2:]:
            print(t)
            print(digit_length_correct(t))
            print(digit_length_clever_2(t))
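
A smaller variant of the same idea, as a sketch using only the standard library: `Decimal` has its own `log10()` method, so numpy is not strictly needed. The function name and the precision margin (`bit_length() + 3`, a deliberately generous upper bound on the digit count) are illustrative assumptions, not something from the post.

```python
import math
from decimal import Decimal, localcontext

def digit_length_decimal(n):
    """Digit length of a non-negative int via Decimal.log10()."""
    if n == 0:
        return 1
    with localcontext() as ctx:
        # Assumption: the bit length is always >= the decimal digit count,
        # so this precision keeps log10(10**k - 1) strictly below k.
        ctx.prec = n.bit_length() + 3
        return math.floor(Decimal(n).log10()) + 1

if __name__ == "__main__":
    for value in (0, 9, 10, 999999999999999, 10 ** 15):
        print(value, digit_length_decimal(value))
```

This still leans on arbitrary-precision arithmetic, so it trades speed for exactness.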

~~~
contravariant
Using arbitrary precision arithmetic kind of defeats the point of not using
loops.

------
matheusmoreira
Another fun calculation: the maximum number of decimal digits a binary number
could have.

    
    
      digits = ceil(bits * log10(2))
    

A 10 KiB picture can be thought of as one 24 661 digit number. A 2 GiB video
file can be thought of as one 5 171 655 946 digit number. I find this really
puts everything in perspective. Every number already exists, digital content
creators are just trying to find them. Certain numbers are actually illegal.

A practical application of this: calculating buffer sizes for a function that
converts numbers to text.

    
    
      /* digits = ceil(bits * log10(2))
       *        = ceil(64   * log10(2))
       *        = 20
       */
      #define DECIMAL_DIGITS_64_BITS 20
    
      /* digits + sign + NUL */
      char digits[DECIMAL_DIGITS_64_BITS + 1 + 1];
    

It's possible to handle arbitrary precision numbers by counting the number of
bits and calculating the amount of memory that must be allocated for the
resulting string.
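
As a rough Python sketch of the same formula (the function name `max_decimal_digits` is made up for illustration), the bound can be checked against the largest unsigned value for a few common widths:

```python
import math

def max_decimal_digits(bits):
    """Upper bound on the decimal digits of an unsigned integer with `bits` bits."""
    return math.ceil(bits * math.log10(2))

if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        largest = 2 ** bits - 1
        # Compare the formula with the actual digit count of the largest value.
        print(bits, max_decimal_digits(bits), len(str(largest)))
    # The 10 KiB picture from above: 10 * 1024 bytes * 8 bits.
    print(max_decimal_digits(10 * 1024 * 8))
```

Because the formula goes through floating point, treat it as an estimate for sizing buffers rather than an exact digit count.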

~~~
MauranKilom
As shown in this thread, you should be very uneasy about relying on this for
arbitrary inputs. Non-trivial floating point math (such as log10) + rounding
almost surely ends up with off-by-one errors.
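
One way to hedge against that, sketched here under the assumption that the float estimate is never off by more than one: take the log10 estimate, then confirm it with exact integer comparisons. The function name is hypothetical.

```python
import math

def digit_length_checked(n):
    """Digit length of a non-negative int: log10 estimate plus an exact check."""
    if n == 0:
        return 1
    estimate = math.floor(math.log10(n)) + 1
    # The float estimate may be off by one near powers of ten,
    # so correct it using exact integer arithmetic.
    if 10 ** (estimate - 1) > n:
        estimate -= 1
    elif 10 ** estimate <= n:
        estimate += 1
    return estimate

if __name__ == "__main__":
    print(digit_length_checked(999999999999999))  # fifteen nines
    print(digit_length_checked(10 ** 15))
```

The correction costs a big-integer power, so it is no longer a pure constant-time trick.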

------
Gunax
Are these the typical steps students go through? I would guess that most
students know to convert it to a string and count the length, with a smaller
group knowing about base 10 logs.

I like the general idea of it--improving the algorithm incrementally, finding
exception cases, but I wonder if there is a better example that could be used.

It seems to me that one either knows how to use logarithms or not, and thus
students would either skip to the final solution, or be stuck until having the
answer given to them.

~~~
jstrieb
I was writing less about the steps students would go through when solving the
problem on their own, and more about points I try to hit when using the
problem to teach. At this point in the course, strings haven't been taught, so
using them would not be a natural solution for those without prior programming
experience. I will also review logarithms if necessary.

------
keithnz
Quite funny reading HN articles sometimes. So...

I read [https://www.netmeister.org/blog/cs-falsehoods.html](https://www.netmeister.org/blog/cs-falsehoods.html), which
came from
[https://news.ycombinator.com/item?id=21500672](https://news.ycombinator.com/item?id=21500672),
one of the top HN articles at the moment.

Item 27 of that list made me laugh when I read this article :)

~~~
jstrieb
I definitely designed my site with the stereotypical 90s hacker terminal in
mind :)

------
ainiriand
So in the end the solution proposed by the TA is wrong and he invalidated the
good solutions proposed by people with actual python experience.

I hope he learnt more than the students...

------
netmonk
From my point of view, this isn't programming. This is high-level language
tricks. It would be much more fun doing it in assembly language, without any
use of any kind of library, except for input and output. This way students
would learn so much more about how integers are manipulated in the CPU
(just bits in base 2), doing smart math conversions to represent them in base
10. And why not generalise the problem to also compute the size in base 8, or
any base N?
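
The base-N generalisation is a small change to the repeated-division approach. A minimal Python sketch, with a made-up function name and assuming non-negative integer input:

```python
def digit_length_base(n, base=10):
    """Number of digits of a non-negative int n when written in the given base."""
    if base < 2:
        raise ValueError("base must be at least 2")
    digits = 1
    # Strip one base-`base` digit per iteration using integer division.
    while n >= base:
        n //= base
        digits += 1
    return digits

if __name__ == "__main__":
    print(digit_length_base(255, 2))    # binary
    print(digit_length_base(255, 16))   # hexadecimal
    print(digit_length_base(15322))     # decimal
```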

I hate programming classes that only teach surface-level Python use, when a
programming class has the time to go deeper and learn how Python works,
because Python itself uses string-to-binary conversion and loops to do all
the work requested by the teacher, without the student even being aware of how
it does it!

~~~
coldtea
> _It would be much more fun doing it in assembly language, without any use of
> any kind of library, except for input and output. This way students would
> learn so much more about how integers are manipulated in the CPU (just
> bits in base 2), doing smart math conversions to represent them in base 10.
> And why not generalise the problem to also compute the size in base 8, or
> any base N?_

One could say as well: this isn't programming, this is low level language
tricks.

You don't need to know "how integers are manipulated into cpu" when learning
to program, and at an introductory class like that described in the article
you shouldn't either. There's a reason SICP at MIT was in Lisp and then
Python.

I also very much doubt it would be "much more fun doing it in Assembly
language".

The author went to great lengths to explain how this is a useful exercise for
programming, as it introduces edge cases, alternative implementations,
testing, etc.

~~~
netmonk
I understand what you're saying, but I think you are wrong. Assembly is
closest to what a CPU actually does for REAL, so you are programming. When
using Python or any other high-level language, you are programming the CPU
less and relying more on tons of layers of other people's work, which are
basically created so that you never have to understand how to program a CPU.
So people feel they are programming, while they are just integrating tons of
libraries and setting them up to do something useful (which is good too).

But if you want to teach programming, I would follow my path and provide a
deeper understanding of how to control a CPU and how it really works, in order
to demystify the computer and give students real experience of all the hidden
work that is done by four lines of Python code.

And for the fun side, guess why assembly is the second most searched language
on Stack Overflow during weekends:
[https://stackoverflow.blog/2017/02/07/what-programming-langu...](https://stackoverflow.blog/2017/02/07/what-programming-languages-weekends/)

I guess people are trying to have more fun on the weekend than on boring
office projects during work days :)

And those numbers show a real interest in assembly, which is usually
dismissed in common CS teaching everywhere. So teachers decide it's not
interesting, while in fact most people search for it on weekends... I mean, it
illustrates a real issue here. Maybe understanding how to program a CPU at a
low level is something natural that only academics cannot understand,
thereby neglecting the natural tendency of ordinary people to try to
understand how things really work...

And for those who really want to dig deeper and understand what a CPU is,
I strongly suggest looking at "From Nand to Tetris"
[https://www.nand2tetris.org/](https://www.nand2tetris.org/), which basically
starts at the NAND logic gate and goes as far as creating a fully working CPU
and programming it to play Tetris.

~~~
SamReidHughes
You can always learn assembly language later. Putting it in the intro class is
a bad idea.

~~~
netmonk
Well, can you elaborate a little on your answer? A one-sentence opinion is
just nothing.

------
tromp
digitLength n = if n < 10 then 1 else 1 + digitLength (n `div` 10)

Avoiding loops by recursion is no worse than avoiding loops by calling a
looping library function...

------
jodrellblank
You ever get the feeling that companies and programmers who become massively
productive and profitable are either finding a narrow path where there are no
concerns like the other discussions here, or simply ignoring them and
ploughing on with a pile of faulty code that mostly works?

Look how much discussion there is over a few thousand nanoseconds here, edge
cases around 16 digit numbers (if my bank account gets that high, and you
accidentally round it up to too many digits, we can deal with that when it
happens).

It seems inconceivable that the most 'productive' can spend this much effort
on everything they do, and 'pick your battles' only goes so far towards
explaining it. 'Move fast and break things' goes farther but only in the space
of possibilities where correct vs incorrect has approximately no consequences.

"Companies with unicorn valuations are most probably doing things which don't
have to be done very well"?

------
pechay
Must be a good teaching problem seeing as it's generated such a large amount
of interesting discussion.

------
woadwarrior01
This reminds me of something I enjoyed writing a couple of years ago. I’d
decided to use the Damm checksum algorithm[1] for order numbers at work. Every
implementation that I could find was turning the number into a string and
then checksumming it one character at a time. That approach felt rather
suboptimal, so I decided to write a numeric implementation[2].

[1]:
[https://en.m.wikipedia.org/wiki/Damm_algorithm](https://en.m.wikipedia.org/wiki/Damm_algorithm)

[2]: [https://github.com/jeethu/damm](https://github.com/jeethu/damm)

------
gnuvince
By using a do-while loop, you don't have to handle the special case of zero
(although one could argue that the do-while loop _is_ the special handling for
zero). Python does not have such a loop, so you need to do an infinite loop
with an explicit break and I know some people recoil in horror at the sight of
such code.

    
    
        def digitlength(n):
            digits = 0
            while True:
                digits += 1
                n //= 10
                if n == 0:
                    return digits

~~~
a-nikolaev

        def digitlength(n):
            digits = 1
            while n > 9:
                digits += 1
                n //= 10
            return digits
    

Also may want to set n = abs(n) in the beginning, in case n is negative.

~~~
gnuvince
The problem statement said that the input would be natural numbers.

~~~
lifthrasiir
Wait, do natural numbers include zero? _evil grin_

~~~
chopin
I looked it up on Wikipedia, which says there's no conclusive definition.

~~~
lifthrasiir
The joke is that, if you don't need to check if the input is negative because
it is not a natural number, you can also choose a definition of the natural
number without zero to avoid special casing it at all.

------
_pastel
Why not:

    
    
      def digit_len(n): 
        return len(str(n))

~~~
gnuvince
For people who, like me, thought that this would be slower than repeated
division: my crappy benchmark indicates that allocating a new string and
taking its length is faster by a factor of 2–2.5x.

Edit: In C, the version that loops is 15 times faster than the version that
allocates a new string. Python is weird.
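
For anyone who wants to repeat the comparison, here is a minimal sketch using the standard `timeit` module. The function names and the iteration count are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import timeit

def digit_length_loop(n):
    """Repeated integer division."""
    digits = 1
    while n > 9:
        n //= 10
        digits += 1
    return digits

def digit_length_str(n):
    """Allocate a string and take its length."""
    return len(str(n))

def compare(n, number=100_000):
    """Return (loop_seconds, str_seconds) for `number` calls of each version."""
    loop_s = timeit.timeit(lambda: digit_length_loop(n), number=number)
    str_s = timeit.timeit(lambda: digit_length_str(n), number=number)
    return loop_s, str_s

if __name__ == "__main__":
    loop_s, str_s = compare(15322)
    print(f"loop: {loop_s:.4f}s  str: {str_s:.4f}s")
```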

~~~
kragen
It's a common feature of interpreters that they impose a large slowdown on
whatever is done in interpreted code; Python's is about 40×, although better
interpreters like GhostScript and Lua are more like 10×. It's surprising the
difference isn't bigger in this case, but I can mostly confirm your results,
getting 3× except for small numbers:

    
    
        In [1]: dl = lambda n: 1 if n < 10 else 1 + dl(n // 10)
    
        In [3]: map(dl, [0, 1, 9, 10, 99, 1000, 15432, 32, 801])
        Out[3]: [1, 1, 1, 2, 2, 4, 5, 2, 3]
    
        In [4]: %timeit dl(15322)
        100000 loops, best of 3: 4.64 µs per loop
    
        In [5]: %timeit len(str(15322))
        1000000 loops, best of 3: 1.41 µs per loop
    
        In [6]: %timeit dl(0)
        1000000 loops, best of 3: 721 ns per loop
    
        In [7]: %timeit len(str(0))
        1000000 loops, best of 3: 1.24 µs per loop

~~~
kragen
Upon thinking about this further, I guess it's sort of obvious why this
happens. dl(0) does a constant, a function invocation, a comparison, a
conditional, and a return of a constant (although if you look at the bytecode
you'll see a few more steps in there). len(str(0)) does a constant and two
built-in function invocations, which (as it happens) involve lookups in the
global namespace at runtime in case you have redefined len or str.

So, basically, it's only interpreting a very small number of interpreter
bytecodes either way, so the small number it has to interpret to use the
builtins is comparable to the small number it has to interpret to run the
recursive definition, and so the interpretive overhead is comparable (and it
swamps the overhead of things like allocating a new string).
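
A rough way to see this for yourself, assuming CPython and its standard `dis` module, is to list the bytecode instructions compiled for each version. The helper names below are made up for illustration, and the exact opcodes differ between Python versions.

```python
import dis

def dl(n):
    # Recursive definition from the benchmark above.
    return 1 if n < 10 else 1 + dl(n // 10)

def via_str(n):
    # Builtin-based version: len and str are looked up by name at runtime.
    return len(str(n))

def opnames(func):
    """Names of the bytecode instructions CPython compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

if __name__ == "__main__":
    for f in (dl, via_str):
        names = opnames(f)
        print(f.__name__, len(names), names)
```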

This machine is kind of slow. This took 54 CPU seconds in LuaJIT:

    
    
        > function dl(n) if n < 10 then return 1 else return 1 + dl(n/10) end end
        > for i = 1, 1000*1000*100 do dl(15322) end
    

That means that approach took 540 ns per invocation rather than Python's 4640
ns, so Python is only about 9× slower here instead of the usual 40×. Or maybe
this is a case where LuaJIT isn't really coming through the way it usually
does.

------
tzs

      def digitLength(n):
          dlen = 1
          high = 9
          while n > high:
              dlen += 1
              high = 10*high + 9
          return dlen

~~~
SamReidHughes
One fun enhancement to this is to avoid the n^2 cost it faces with large
integers.

    
    
        def digit_length(n):
            totlen = 0
            n += not n
            while n > 0:
                klen = 1
                k = 10
                while n > k * k:
                    k = k * k
                    klen *= 2
                n = n // k
                totlen += klen
            return totlen

------
08-15
The log10 method is incorrect: math.log10(999999999999999)==15.0. (That's 15
nines.)

Good job teaching unreliable algorithms.

------
hprotagonist

      def what_a_hack(n: int):
          return len(str(n).strip('-'))

------
onesmallcoin
What about len(str(12345))? If you're teaching Python you may as well take
advantage of the standard library.

~~~
woat
If you read the article, you will see that this solution was covered at the
end.

