Hacker News new | past | comments | ask | show | jobs | submit login
My Favorite Programming Problem to Teach: Digit Length (jstrieb.github.io)
124 points by jstrieb on Nov 10, 2019 | hide | past | favorite | 106 comments



> As a result, solutions using strings are disallowed on problem sets and quizzes until they are taught. However, the few students who have prior Python programming experience may be tempted to find digit length without loops using a variant of the following (for our purposes) invalid solution.

Wow. this is one of the reasons I hated school. No programmatic reason what given for why a string solution couldnt be used, only an arbitrary reason. Here students may have knowledge from self teaching or whatever, but they are unallowed to use that knowledge because "reasons".

To any teacher that thinks its a good idea to punish students for thinking outside the box: shame on you. All youre going to end up doing is crushing enthusiasm and/or creating drones. Please dont.


I think this is a communications issue. If you ask them to solve it and then they clearly did, but you then say: “bUt yOu dIDn’T SoLvE iT tHe RiGht wAy”, then it is clearly your fault.

I think if you tell stuff like this upfront it is totally acceptable: “Reduce the number of digits in a given float without converting it to a String”.

Even students with previous knowledge could take this as a challenge where they learn something.


> No programmatic reason what given for why a string solution couldnt be used, only an arbitrary reason

The entire problem is arbitrary. It's not a real-world problem looking for a solution. It's a programming exercise.


I think it is more of a math exercise solved via programming. Which is somehing I am not particularly fond of.

These crop up all the time in those online programming challenge things.

They are usually maths problems (e.g. Find the combined area of two partially intersecting squares based on their coordinates of a,b,c,d and w,x,y,z) where the solution is done via programming but most of the work is, in my opinion, figuring out the maths which is testing for the wrong thing.


This particular problem (number of digits in a number) is actually monumentally important for performance reasons for e.g. json generation. See https://m.facebook.com/notes/facebook-engineering/three-opti... for details.

That said, I do agree that as stated the problem is a toy. The problem statement could at least motivate the lack of string operations - i.e. pretend you’re the language designer and you’re tasked with implementing str(int) in C. Just saying “don’t do that” isn’t helpful. Gaining an understanding that nothing is magical is useful though.


That link has a very insightful discussion (and incidentally provides a nice fast correct solution, and use case, for the initial problem!).


Probably the students do not know about big O notation at this point, but requiring a solution that uses at most O(1) additional space would be enough. Then again, automatic bignums in python make it hard to evaluate the space complexity.


It's not a math exercise. The problem is stringly - find the length of the string that conventionally represents the given number. The problem is that 0 is a special case, because we write something rather than nothing. Using len(str(n)) is the correct, idiomatic way to solve this. Consider what happens when the problem domain expands to include negative numbers...

Notice that all the straightforward solutions the author rejects, they reject because it fails on 0. Then, with the oh-so-clever log10 solution... surprise! They special-case 0. I'm not sure what the lesson here is supposed to be.


Many things people consider "programming" but not mathematics are things we actually study in mathematics classes. They're just unaware.


Actually, this problem does occur in practice. When one needs to output some numbers in a column it can be important how wide the widest one is. Ironically, in that case the numbers are going to be converted to text anyway so you might as well use the string size....


Except string size isn't co-determinate with output size either. Number of code points is sometimes an approximation of number of glyphs after text rendering, more usefully approximate with ASCII, and a lot less usefully approximate with a large amount of Unicode.


We're talking about converting numbers to strings, for which none of that matters.


Because there are no numbers other than Arabic numerals? Numbers are never localized?

Even within just Arabic numbers, it possible for localization systems to add digit separators such as commas, underscores, or periods depending on region.

Sure, str() of an out-of-the-box number in Python won't do localization by default, but Python is a highly dynamic language and I've seen __str__() handlers that do localization, even on things that otherwise act like numbers.

"Simple" string conversion-based "math" rarely is simple in the real world, and ignoring localization and the complexities of Unicode in situations where you are assuming they can't possibly happen is a jungle of mistakes that programmers continue to make every day. The world doesn't run on EBCDIC or ASCII alone anymore, and localization is a thing that matters to a lot of people. The correspondence between code points reported by len(str()) and either input to str() and output to a display device is tenuous at best in anything but the best ASCII circumstances, and handwaving it away is an exercise in your particular bias.

[ETA: And the parent discussion was explicitly about table formatting the output, and that itself specifically is a scenario where you should want to localize the output first and not just rely on str(yourNumber).]


This isn't thinking outside the box, it's an incredibly obvious way to avoid using your brain. There's nothing wrong with creating constraints on a problem, as long as those are clearly defined. Presumably the OP didn't neglect to mention that strings were disallowed, and then just fail anyone who happened to use them.


I don't know about this. This feels like cheating rather than thinking outside of the box. Like when you ask a student to implement a vector in C++, instead of writing the code the student makes a wrapper around std::vector. Internally, `str` probably also use some equivalent of the naive solution implemented somewhere to convert a integer to its decimal string.


Using math.log10 seems at least as cheating to me. (though to be clear, I'd accept both if it were me)


Strings are disallowed because they are not necessary for this problem and although the solution is shorter, is far more inefficient; it also doesn't demonstrate the algorithmic thinking that the course is obviously trying to teach.

I've taught CS courses before, and have seen plenty of self-proclaimed self-taught know-it-alls who seem to be more stackoverflow-copy-pasters than anything else.


There are other HN comments pointing out that the proposed solution still doesn't work correctly because of floating point operations. If the goal is to learn something about programming, yes, I agree with you. But if the goal is to write code with as few bugs as possible then I would take the "create a string and count chars" solution over every solution involving floating point math.


> Strings are disallowed because they are not necessary for this problem

the division examples are not necessary either, thats the point. you can solve it different ways, that doesnt mean one way is not necessary, it just means its different. one may be faster, one may be more readable. If you dont allow different solutions you cant explore the tradeoffs between them.


Generally, the purposes of the exercises in a class is to reinforce the material that has been taught up to that point and to demonstrate that the student can use it.

For example, if early in an elementary number theory class the student is asked to prove that there are in infinite number of primes of the form 4n+3, a solution that just invokes Dirichlet's theorem on primes in arithmetic progressions would probably not be acceptable. That approach does work to show that there are an infinite number of 4n+3 primes, but completely fails to show that that the student understood the material actually taught in class.

It's the exact same thing with the digit counting problem. Solving it by just invoking the built in string length function does little to demonstrate that the student understands the material taught so far.


I note that the solution is invalid at the bottom of my post, but it is worth mentioning that I know all of the students in the group session when I teach this problem, and I am familiar with their background. If I know that a student has prior Python experience, I will discuss the string solution with them, but let them know not to use it in their homework until strings have been covered in class.

In this case, even though the solution is disallowed for grading purposes, there is an opportunity to discuss it during the lesson.


Funny fact: the python round implementation actually uses strings. https://github.com/python/cpython/blob/master/Objects/floato...


> is far more inefficient

Is it?


Probably not, at least not compared to the naive solution using Euclidean division.

for larger numbers it is probably faster.


It's Python. Efficiency has gone out the window long ago.


Why do you need to know the decimal length of a number if not for display purposes? If you have one then make that the problem.

I wonder what would happen if you implemented itoa and then counted digits. It’s pretty much the same thing.


We're lacking a bit of context here, but this is a good way of understanding the relationship between number of digits, closest power of ten, math.log and decimal representation. Once you understand it, it works the same in any base; in particular in base 2. This would be a necessary step should you want to implement some data structures (eg. quadtrees, octrees, etc. which heavily rely on powers of two). You wouldn't expect the implementation to measure the length of the binary string representation!

Regarding itoa(), the first implementation provided in the article actually match it's implementation, without the unnecessary bits (building the output).


I wish they didn't phrase it as being about removing a loop. If math.log10 supported arbitrary integer inputs, there would be a loop inside.

They should consider the upper bound of their algorithm. Because it uses floating point, it doesn't work for all natural numbers, as required by their specification.

To address your point, counting digits in base 2 is much simpler than in base 10, because the internal representation of the number on the computer is already base 2. You can use numeric calculations, but you can also just look at your digits.

(Edited for fewer tangents and to be more positive.)


Once you compute your log2, computing log10 is just a single multiplication though, so, it is in fact as most as easy.


It's a single multiplication - with an irrational (transcendental, actually) number. That is most certainly not "easy" for a computer, and elsewhere in the comments you can see how much of a problem that actually poses.

Also, counting digits in base 2 is not the log2. The former gives you the latter but not the reverse. Finding the number of decimal digits in a number given in base 2 is not a simplification.


Oh, agree that for arbitrary large or non integer inputs it might not work. But elsethread jepler has shown a solution off Hacker's Delight which which simply multiplies the index of the most significant bit with a magic constant.


> But elsethread jepler has shown a solution off Hacker's Delight which which simply multiplies the index of the most significant bit with a magic constant.

...and then checks for and corrects the potential off-by-one thus incurred.


When N is fixed, all algorithms become O(1). That's why they appear the same.

If you do these calculations by hand, the complexity will be more obvious because each operation will be smaller.


Agreed. I think you could even argue that string manipulation is even more common than uses of math.ceil, floor, the floor division operator (//), and so on. I love math, but this is a programming class.

This also fails my personal rule to always need to have a reason to do something mean - like failing someone on a problem, and no reason to do something nice - like saying it’s fine since it works.


> I love math, but this is a programming class.

Sometimes you actually have to do arithmetic when programming. It's useful to teach people these sort of things so they don't cook up half baked solutions in the future.


It's a programming class where "algorithmic thinking" (and therefore, in my opinion, some amount of math) is one of the course objectives.

Since the discussion takes place in a low-pressure, guided group lesson, students will certainly not be penalized for suggesting the string method in that setting. Also, students trying to use strings on homework will fail tests in the linter and autograder before they finally submit, so they will have plenty of opportunity to fix their mistake before receiving a grade on their work.


Drastically depends on the kind of problems you're working on.


In school the goal isn’t to solve the problem, but to learn something in the process of solving that problem.

Thinking outside the box is of course valuable, but the teachers have specific concept they want to teach and constraints like that help them accomplish that.

When it comes to tests, it’s not as easy. At tests it’s is more about solving the actual problem, but one could still argue that the test is meant to test how well one has learned what was thought, not to solve the problem.

I think this is totally fair in this case.


Because I am a smart-ass, I did something like this in college. There was an intro course in MATLAB my freshman year, but I'd been programming for several years at that point so I found it too basic to hold my interest. In one of our early labs, we were told to implement a function that calculated a Fibonacci number. I figured that sounded like something MATLAB probably had in its stdlib, and it did, so I wrote a 1 liner and called over a TA to approve my work.

Thankfully this TA agreed with you (as do I). He said it looked good, and I think that's the shortest lab I ever had.


if i was the TA i would have said: "congratulations, for you have not learned anything new"

personally, when something is to basic to hold my interest, i try to find ways to make it more challenging.


I personally learned quite a bit in a similar situation.

In an operating systems class we had a little project to create a command line calculator in C, with the added hoop of using x86 ASM for all control flow and calculations. As this was not a programming class we had a very brief overview of the very basics needed to get this done. I assumed using floating point arithmetic would make this easier, but knew that was not part of the early spec, so I asked the professor in class what version of x86 and he said pentium 1.

I then found and read intel’s documentation for the first generation Pentium. Which completely changed my mental model of CPU’s. Honestly, it was probably the closest collage assignment to how real world coding works and much more useful long term than just implementing some simple algorithms by hand.


I was quite comfortable with writing the Fibonacci function. Writing that would have hardly taken me any longer and would not have taught me anything either.

The thing I did learn, and which I think has paid more dividends in my career than knowing the definition of the Fibonacci sequence, is that knowing your tools well can save you time and effort and reduce the amount of your own code you need to maintain and put effort into.


working with contraints (don't use loops, or don't use the string function) is a learning tool. don't dismiss it just because it prevents your clever trick from working.


Constraints are fine.

Blocking off a generic "anything you haven't been taught yet" is a hellfire of ambiguous egg shells.

Strictly speaking you might not have been taught to use division with loops. Maybe you haven't been taught about the log10 function, or you have but you haven't been taught about combining the log10 function with the round/ceil functions, are you sure you were taught about putting if's before these exact functions?

The issue is every programming question requires by its very nature that you use something you haven't been taught to solve it. So you have to know what the teacher counts as acceptable vs unacceptable, and this comes into weird edge cases where the logic behind why something would be considered part of the "you must be taught first" category vs the "you are to assume figuring this out is part of the question" category requires knowing an overview of how to teach computer science, something the students would not have.

I once got points taken off for using a recursive function as part of my solution because that wasn't taught until the next quarter. I didn't know the language before, nor did I have much experience with recursive functions as a formal concept. We knew how to make functions, call functions, do loops, and even do gotos.

I didn't even know at that time that recursive functions were a special thing that had to be taught to me, it just came to me as the way to solve the problem.


i agree, constraints and "anything you haven't been taught yet" are two different things.

constraints should be spelled out explicitly.

The issue is every programming question requires by its very nature that you use something you haven't been taught to solve it

that, i don't agree with. it's entirely possible to state programming questions in such a way that they only require things you have already been taught. (in its simplest form "being taught" means "being told about". most (if not all) freecodecamp exercises work like that.


It's also not great advice to be so ridgid in this case... Having written numerical grading code, I learned the hard way that log10 based methods of counting SF is a very bad idea (fp errors occur toward but not at word precision limit) counting string length with some solid regex for extracting integer fraction and exponent parts on the other hand can handle any input precision.


This problem is a funny one for me and gives me real nostalgia, back to about ten years ago when I was learning to program.

I'd just picked up python and was having so much fun going through the project Euler problems.

I can't remember which question it was for but we had to know the length of an integer. At that point I hadn't learnt about type conversion so had to implement it the way the author wants his students to do it.

I remember fondly as well that I hadn't learnt about the modulo operator and had to implement that by myself as well.

I couldn't believe it when I got the answer and progressed to the forum where others shared their answers and they were doing it on one line with %!

Good times!


There is also the issue that they may not know that the thing isn't "taught" yet because they don't know its a "thing".

I once got points taken off for using a recursive function as part of my solution. I didn't know the language before, nor did I have much experience with recursive functions as a formal concept.

I didn't even know at that time that recursive functions were a special thing that had to be taught to me.


Precisely. It was the first solution I came up with after reading the title. It's perfect in that all the subtleties are already handled by the language. Digit length of base10 numbers isn't really a logic/math/computer problem, it's a string and writing convention problem.


This problem is designed more in the spirit of teaching the algorithmic thinking side of the course objectives than the Python side.

If you prefer, you can view the problem restrictions as a means to encourage students to develop a more general algorithm for solving the problem, rather than a programmatic implementation specific to Python. Hopefully this makes them seem less arbitrary.


But the math.log10 solution is unfortunately "wrong" too, at least in my python3 implementation.

    import math

    def digitLengthCorrect(n):
        return len(str(n))

    def digitLengthClever2(n):
        return 1 if n == 0 else (math.floor(math.log10(n)) + 1)

    testcases = (
        [10 ** i for i in range(300)] +
        [(10 ** i) - 1 for i in range(300)]
    )

    for t in testcases:
        a = digitLengthCorrect(t)
        b = digitLengthClever2(t)
        assert a == b, (t, a, b)


Sure enough. This may help show what happens with float precision:

    >>> import math
    >>>
    >>> for i in range(1, 20):
    ...     n = int('9' * i)
    ...     digitLengthClever2 = math.floor(math.log10(n) + 1)
    ...     float_value = math.log10(n)
    ...     print(f'{n:<20} {digitLengthClever2:>2}  {float_value}')
    ...
    9                     1  0.9542425094393249
    99                    2  1.99563519459755
    999                   3  2.9995654882259823
    9999                  4  3.9999565683801923
    99999                 5  4.999995657033466
    999999                6  5.999999565705301
    9999999               7  6.99999995657055
    99999999              8  7.999999995657055
    999999999             9  8.999999999565706
    9999999999           10  9.99999999995657
    99999999999          11  10.999999999995657
    999999999999         12  11.999999999999567
    9999999999999        13  12.999999999999957
    99999999999999       14  13.999999999999996  # <-
    999999999999999      16  15.0                # <- 
    9999999999999999     17  16.0
    99999999999999999    18  17.0
    999999999999999999   19  18.0
    9999999999999999999  20  19.0


Good point. To indemnify youself from this type of data loss, change math.log10() to np.log10() and use Decimal() for tasks involving precise number crunching.

EDIT: Setting Decimal module's precision manually via getcontext() may be required if you work with "long" numbers.

    import math
    import numpy as np
    from decimal import getcontext, Decimal

    def digit_length_correct(n):
        return len(str(n))
    
    def digit_length_clever_2(n):
        if n ==0:
            return 1
        else:
            return math.floor((np.log10(Decimal(n)))) + 1 # You can use np.floor() and eschew the use of math. module altogether, but I left it intact to show the minimal necessary modifications
    
    def generate_cases(n):
        getcontext().prec = n + 3
        return (
        [10 ** i for i in range(n)] +
        [(10 ** i) - 1 for i in range(n)]
    )
    
    cases = generate_cases(300)
    
    for t in cases:
        a = digit_length_correct(t)
        b = digit_length_clever_2(t)
        assert a == b, (t, a, b)
    
    for t in cases[-2:]:
        print(t)
        print(digit_length_correct(t))
        print(digit_length_clever_2(t))


Using arbitrary precision arithmetic kind of defeats the point of not using loops.


Yes, the so-called "naive" solution is the best one, in my view, insofar as it is correct. Of course, choose the "clever" solution if speed is more important than correctness (in other words, if the approximate number of digits is acceptable, eg. if a +1 safety buffer is used) - though then benchmark to show that it is actually faster, for the distribution of numbers you actually encounter.

So, the discussion of this problem can be extended further into a discussion of specs, tradeoffs, benchmarking, etc.


Interesting point, could be worth adding a note to students about the finite precision of floating point numbers in Python. Thanks for pointing this out!


Incidentally, while it focuses on values that fit in machine registers, you might check out Hacker's Delight's section on computing "ilog10". In the second edition, section 11-4 is on integer logarithms.

If your target language is Python3, you have the "bit_length()" method of integer objects. Multiply by c ~= 0.30103 and then correct for being too low; superficially, it seems like this algorithm should work for extremely large numbers, at least to millions of digits.

As far as test cases, n10, (n10)+1 and (n10)-1 seem like good choices and as long as the underlying mathematical function is monotonic should ferret out failures. I could easily run my implementation for numbers up to 30,000 digits, but going to 300,000 digits didn't finish while writing this message.

    log10_2 = math.log10(2)
    def digit_length_clever3(n):
        if not n: return 1
        est = math.floor(n.bit_length() * log10_2)
        low = n >= 10**est
        return est + low


Yeah it was bound to go wrong somewhere when you try to find the length of an arbitrary length integer using floating point arithmetic.


Another fun calculation: maximum amount of decimal digits a binary number could have.

  digits = ceil(bits * log10(2))
A 10 KiB picture can be thought of as one 24 661 digit number. A 2 GiB video file can be thought of as one 5 171 655 946 digit number. I find this really puts everything in perspective. Every number already exists, digital content creators are just trying to find them. Certain numbers are actually illegal.

A practical application of this: calculating buffer sizes for a function that converts numbers to text.

  /* digits = ceil(bits * log10(2))
   *        = ceil(64   * log10(2))
   *        = 20
   */
  #define DECIMAL_DIGITS_64_BITS 20

  /* digits + sign + NUL */
  char digits[DECIMAL_DIGITS_64_BITS + 1 + 1];
It's possible to handle arbitrary precision numbers by counting the number of bits and calculating the amount of memory that must be allocated for the resulting string.


As shown in this thread, you should be very uneasy about relying on this for arbitrary inputs. Non-trivial floating point math (such as log10) + rounding almost surely ends up with off-by-one errors.


Are these the typical steps students go through? I would guess that most students know to convert it to a string and count the length, with a smaller group knowing about base 10 logs.

I like the general idea of it--improving the algorithm incrementally, finding exception cases, but I wonder if there is a better example that could be used.

It seems to me that one either knows how to use logarithms or not, and thus students would either skip to the final solution, or be stuck until having the answer given to them.


I was writing less about the steps students would go through when solving the problem on their own, and more about points I try to hit when using the problem to teach. At this point in the course, strings haven't been taught, so using them would not be a natural solution for those without prior programming experience. I will also review logarithms if necessary.


quite funny reading HN articles sometimes. So...

I read https://www.netmeister.org/blog/cs-falsehoods.html which came from https://news.ycombinator.com/item?id=21500672 in the top HN articles at the moment

Item 27 of that list made me laugh when I read this article :)


I definitely designed my site with the stereotypical 90s hacker terminal in-mind :)


So in the end the solution proposed by the TA is wrong and he invalidated the good solutions proposed by people with actual python experience.

I hope he learnt more than the students...


From my point of view, this isn't programming. This is high level language tricks. It would be much more fun doing it in Assembly language, without any use of any kind of library, expect for input and output. This way student would have learnt so much more, about how integer are manipulated into cpu (just bits in base 2) doing smart math conversion to represent it in base 10. And why not generalise the problem to also compute the size in base 8, or any base N.

I hate those programming class just trying to teach python surface use, while in a programming class you have time to go deeper and learn about how python works, cause basically python use all str to binary and loop for doing all the work requested by the teacher, without student even being aware of how it does it !


>It would be much more fun doing it in Assembly language, without any use of any kind of library, expect for input and output. This way student would have learnt so much more, about how integer are manipulated into cpu (just bits in base 2) doing smart math conversion to represent it in base 10. And why not generalise the problem to also compute the size in base 8, or any base N.

One could say as well: this isn't programming, this is low level language tricks.

You don't need to know "how integers are manipulated into cpu" when learning to program, and at an introductory class like that described in the article you shouldn't either. There's a reason SICP at MIT was in Lisp and then Python.

I also very much doubt it would be "much more fun doing it in Assembly language".

The author went at length to explain how this is a useful exercize for programming, as it introduces edge cases, alternative implementations, testing, etc.


I understand what you say, but i think you are wrong. Assembly is closest to what a cpu actually does for REAL, so you are programming. When using Python or whatever any advanced language, you are less programming CPU and more relying and tons of layers from other's works which are basically created to never let you understand how to program a CPU. So peoples feels they are programming, while they are just integrating tons of library and set them to do something usefull (which is good too).

But if you want to teach programming, i would follow my path, and provide a deeper understanding at how to control a cpu and how it really works. In order to demystify computer and gives student a real experience of all the hidden works that are done with a 4 lines python code.

and for the fun side, guess why assembly is the second searched language on Stack overflow during weekend : https://stackoverflow.blog/2017/02/07/what-programming-langu...

i guess people are trying to have more fun on weekend than on boring office project during work days :)

And those numbers show a real interest about Assembly which is usually greatly discarded in any common CS teaching anywere. So teachers decides it's not interesting, while in fact most peoples search about it on weekend...I mean it illustrates a real issue here. May be understanding how to program a cpu at low level is something natural, that only scholar peoples cannot understand, therefore neglecting natural tendency of normal peoples to try to understand how things really work...

And for those really wanting to even dig deeper and understand what is a CPU, i strongly suggest looking for "from nand to tetris" https://www.nand2tetris.org/ wich basically start at nand logical gate, to the extent to create a full working cpu and programming it to play tetris.


Students are taught to manipulate real numbers long before they're taught about Dedekind cuts or equivalence classes of Cauchy sequences.

For the purposes of teaching, the best approach is not always to start from the foundations and build upwards.


You can always learn assembly language later. Putting it in the intro class is a bad idea.


well can you develop a little bit your answer cause a one sentence opinion is just nothing.


"The author went at length to explain how this is a useful exercize for programming, as it introduces edge cases, alternative implementations, testing, etc. "

Well, as long as he disallowed the obvious solution (last paragraph of the OP), and then he still got the solution wrong... (see other comments up threat)


Exactly this and programming screening interviews are full of this!

> Evaluating different solutions leads to natural questions about the definitions implicit in the problem statement: mathematically, what is digit length?

OF COURSE! This is a MATHEMATICAL problem that is posed as a PROGRAMMING question. This has always confused me and I could never justify until reading this article why I felt them to be irrelevant to showcasing my programming knowledge. At the very least just tell people, this is the mathematical reasoning behind it, and watch the person implement it.


You think students should be writing assembly before they even know what a string is?


Basic assembly is easy. (adding numbers, counting, basic comparisons, even basic branching, all you need is 12 or so instructions ).

In our collage, one of the first semester classes we were doing assembly on old Motorola 8 bit, the other one was Java.

people that never programmed before had more trouble with java, than assembler.


This is exactly what i mean ! :)


digitLength n = if n < 10 then 1 else 1 + digitLength (n / 10)

Avoiding loops by recursion is no worse than avoiding loops by calling a looping library function...


You ever get the feeling that companies and programmers who become massively productive and profitable are either finding a narrow path where there are no concerns like the other discussions here, or simply ignoring them and ploughing on with a pile of faulty code that mostly works?

Look how much discussion there is over a few thousand nanoseconds here, edge cases around 16 digit numbers (if my bank account gets that high, and you accidentally round it up to too many digits, we can deal with that when it happens).

It seems inconcievable that the most 'productive' can spend this much effort on everything they do, and 'pick your battles' only goes so far towards explaining it. 'Move fast and break things' goes farther but only in the space of possibilities where correct vs incorrect has approximately no consequences.

"Companies with unicorn valuations are most probably doing things which don't have to be done very well"?


Must be a good teaching problem seeing as it's generated such a large amount of interesting discussion.


This reminds me of something I enjoyed writing a couple of years ago. I’d decided to use the Damm checksum algorithm[1] for order numbers at work. Every implementation that I could find was turning the number into a string than then check summing it one character at a time. And that approach felt rather suboptimal. So, I decided to write a numeric implementation[2].

[1]: https://en.m.wikipedia.org/wiki/Damm_algorithm

[2]: https://github.com/jeethu/damm


By using a do-while loop, you don't have to handle the special case of zero (although one could argue that the do-while loop is the special handling for zero). Python does not have such a loop, so you need to do an infinite loop with an explicit break and I know some people recoil in horror at the sight of such code.

    def digitlength(n):
        digits = 0
        while True:
            digits += 1
            n /= 10
            if n == 0:
                return digits


    def digitlength(n):
        digits = 1
        while (n > 9):
            digits += 1
            n /= 10
        return digits
Also may want to set n = abs(n) in the beginning, in case n is negative.


The problem statement said that the input would be natural numbers.


Wait, do natural numbers include zero? evil grin


No support for that in Sloane's. https://oeis.org/A000027


I looked up wikipedia which says there's no conclusive definition.


The joke is that, if you don't need to check if the input is negative because it is not a natural number, you can also choose a definition of the natural number without zero to avoid special casing it at all.


Why not:

  def digit_len(n): 
    return len(str(n))


For people who, like me, thought that this would be slower than repeated division: my crappy benchmark indicates that allocating a new string and taking its length is faster by a factor of 2–2.5x.

Edit: In C, the version that loops is 15 times faster than the version that allocates a new string. Python is weird.


It's a common feature of interpreters that they impose a large slowdown on whatever is done in interpreted code; Python's is about 40×, although better interpreters like GhostScript and Lua are more like 10×. It's surprising the difference isn't bigger in this case, but I can mostly confirm your results, getting 3× except for small numbers:

    In [1]: dl = lambda n: 1 if n < 10 else 1 + dl(n // 10)

    In [3]: map(dl, [0, 1, 9, 10, 99, 1000, 15432, 32, 801])
    Out[3]: [1, 1, 1, 2, 2, 4, 5, 2, 3]

    In [4]: %timeit dl(15322)
    100000 loops, best of 3: 4.64 µs per loop

    In [5]: %timeit len(str(15322))
    1000000 loops, best of 3: 1.41 µs per loop

    In [6]: %timeit dl(0)
    1000000 loops, best of 3: 721 ns per loop

    In [7]: %timeit len(str(0))
    1000000 loops, best of 3: 1.24 µs per loop


Upon thinking about this further, I guess it's sort of obvious why this happens. dl(0) does a constant, a function invocation, a comparison, a conditional, and a return of a constant (although if you look at the bytecode you'll see a few more steps in there). len(str(0)) does a constant and two built-in function invocations, which (as it happens) involve lookups in the global namespace at runtime in case you have redefined len or str.

So, basically, it's only interpreting a very small number of interpreter bytecodes either way, so the small number it has to interpret to use the builtins is comparable to the small number it has to interpret to run the recursive definition, and so the interpretive overhead is comparable (and it swamps the overhead of things like allocating a new string).

This machine is kind of slow. This took 54 CPU seconds in LuaJIT:

    > function dl(n) if n < 10 then return 1 else return 1 + dl(n/10) end end
    > for i = 1, 1000*1000*100 do dl(15322) end
That means that approach took 540 ns per invocation rather than Python's 4640 ns --- only about 9× slower instead of the usual 40×. Or maybe this is a case where LuaJIT isn't really coming through the way it usually does.


Having done a fair amount of python, this is exactly the result I'd expect. `len`, `str` are essentially implemented in C, thus avoiding the massive interpreter overhead compared to a simple loop in python.


Python uses a String conversion to implement rounding, so why not? :)

https://github.com/python/cpython/blob/master/Objects/floato...


Yes but performance was not part of the question. Any correct solution would do. I see this a lot that people go an extra mile trying to optimize the solution and introducing bugs along the way instead having a simple certainly correct solution and optimize it later if necessary. It is funny that the string version is faster in Python.


That's mentioned in the article, at the bottom, under the topic "Alternative, invalid solution":

> As a result, solutions using strings are disallowed on problem sets and quizzes until they are taught.


Also as a result, even the proposed solution is wrong for certain cases, takes longer, and doesn't convey what it's achieving as elegantly as `len(str(n))`. Because the length of the string representation is - exactly - what is being asked here.

I get that sometimes the simplest solution to a problem is not teaching the right stuff, but in almost all cases that is a fault in the problem, not the solution.


You need to minus 1 if the number is < 0


  def digitLength(n):
      dlen = 1
      high = 9
      while n > high:
          dlen += 1
          high = 10*high + 9
      return dlen


One fun enhancement to this is to avoid the n^2 cost it faces with large integers.

    def digit_length(n):
        totlen = 0
        n += not n
        while n > 0:
            klen = 1
            k = 10
            while n > k * k:
                k = k * k
                klen *= 2
            n = n // k
            totlen += klen
        return totlen


When I learned computing, on the machines of the time division was very slow and multiplication was slow (early microprocessors didn't even have instructions for them), so I would certainly choose this solution over all the other ones that involve division. Modern PCs are much faster but the operations still have the same relative ranking of speeds.


That works in Python but this idea would fail in languages with fixed with integers.


No loops, and at least for a 64-bit type, you won't run out of stack space.

  fn digit_length_recursive(input: usize) -> usize {
      if input == 0 {
          0
      } else {
          1 + digit_length_recursive(input / 10)
      }
  }

  fn digit_length(input: usize) -> usize {
      std::cmp::max(digit_length_recursive(input), 1)
  }


The log10 method is incorrect: math.log10(999999999999999)==15.0. (That's 15 nines.)

Good job teaching unreliable algorithms.


  def what_a_hack(n:int):
      return len(str(n).strip(‘-‘))


What about len(str('12345')) If your teaching python you may as well take advantage of the standard library


If you read the article, you will see that this solution was covered at the end.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: