Might not work (I think Python supports bignums, not too sure).
I did this, which will work with any length of number[1], appears to work for all edge cases (including numbers that start with zero), and doesn't use loops:
    (defun num-digits (n)
      (if (< n 10)
          1
          (+ 1 (num-digits (floor n 10)))))
[1] Millions of digits, if your computer has the RAM for it.
All of the itertools functions are implemented using loops, but they heavily abstract over them so that users can think in terms of "streams" (or officially "iterators") without writing loop-oriented code themselves.
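For instance (a hedged illustration of the "stream" style, not code from the article): you can count digits by streaming powers of ten, with the loop hidden inside takewhile:

    import itertools

    def num_digits(n):
        # stream 10, 100, 1000, ... and count how many are <= n
        powers = (10 ** k for k in itertools.count(1))
        return 1 + sum(1 for _ in itertools.takewhile(lambda p: p <= n, powers))

    print(num_digits(0), num_digits(9), num_digits(10), num_digits(12345))  # 1 1 2 5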
Unfortunately in Python, integers have no fixed maximum size and can grow indefinitely, so your algorithm would need to be long enough to cover all the numbers that can conceivably fit in a computer's memory. I guess there's still a finite size there, but I'm not sure how the memory usage scales with digit length, so I don't know what the largest number would be.
I would prefer a solution based on multiplication rather than division.
If the number is less than 10, return one; otherwise, is it less than 100? And so on.
You should prefer the simplest tool that can do the job. Multiplication is simpler than division. You could digress about floating point errors (Python has infinite-precision integers, but not infinite-precision floats) or performance (division can be meaningfully slower in some contexts), but those are just digressions; the important point is that division is the more complicated operation.
The edge cases also seem completely clear to me in this approach, you won't accidentally write <= 100, and 0 is handled automatically.
Of course, the division and log and string methods are all fine in practice, but I'm surprised not to see this alternative mentioned.
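A minimal sketch of that multiplication-based approach (my own wording of it; the abs() for negative inputs is an extra assumption):

    def num_digits(n):
        # grow a power of ten until it exceeds n; no division, no floats
        n = abs(n)
        digits, threshold = 1, 10
        while n >= threshold:
            threshold *= 10
            digits += 1
        return digits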
Nice insight. Similarly, dynamic programming solutions that "build up" from the bottom can often be clearer than those that "go down" recursively from the top, even though they're equivalent.
It's very strange to me that the teacher would push the students from the correct solution using a loop, towards an incorrect solution using a logarithm. A logarithm could work in a language like C where ints can't get too large, but Python has arbitrary precision integers so any solution using floating point numbers is doomed. For example, the code given in the post returns 16 instead of 15 for 999_999_999_999_999.
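For example, assuming the post's formula is essentially floor(log10(n)) + 1, the rounding error is easy to reproduce:

    import math

    n = 999_999_999_999_999               # 15 digits
    print(math.floor(math.log10(n)) + 1)  # 16: floating-point log10(n) rounds to exactly 15.0
    print(len(str(n)))                    # 15: the correct answer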
I've come to the retrospective conclusion, decades after the educational abuse, that professors like this are more interested in showboating their mathematics knowledge than in driving students to find good pragmatic solutions.
i.e. "I know this super complex answer, which happens to be the only purely accurate one. Can you guess it?" vs. what I would like to see, which is: "here are the foundational concepts; bring them all together and arrive at the naturally correct conclusion, demonstrating your knowledge and understanding."
>It's very strange to me that the teacher would push the students from the correct solution using a loop, towards an incorrect solution using a logarithm
That's because the teacher here is an undergraduate student who has yet to learn from painful experience :)
I'm belatedly warming up to the idea that the digit length of the number zero is actually zero in various senses. (A footnote in the article points out that we could define it this way and then some students' simpler solutions would actually be considered correct.)
Yes, we obviously normally use a single digit to write zero, but we could logically write it as the empty string "" if we had a way to indicate that a specific number was meant. (Relative to that, writing 0 is just as unnecessary as writing 00 or 000, because we can have a rule that all leading zeroes need not be written.)
As another example of this intuition, suppose that we wrote all integers with a trailing decimal point. In that case the numbers from 1 to 10 would be "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9.", and "10.", while zero could quite logically just be "." with no digits (as it has only leading zeroes, or we could say "nothing in the ones' place").
Quite a lot of arithmetic algorithms would probably work just fine with these conventions! In fact, they might have fewer exceptions or special cases than they do when they expect an explicit 0.
For instance, using the "zero is written as empty string" convention, Python int() (here simplified to assume input strings of zero or more decimal digits) and str() could look like
    def int(s):
        n = 0
        for c in s:
            n *= 10
            n += "0123456789".find(c)
        return n

    def str(n):
        s = ""
        while n:
            n, d = divmod(n, 10)
            s = "0123456789"[d] + s
        return s
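With these definitions (shadowing the builtins purely for illustration), zero round-trips as the empty string:

    print(repr(str(0)))      # ''  (zero has no digits under this convention)
    print(int(""))           # 0
    print(int(str(12345)))   # 12345 (nonzero values round-trip as usual)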
Seems pretty general! Yes, it's annoying or confusing if the output is going to be viewed by a human user in a larger string context without delimiters, as we probably don't want to see things like "I ate tomatoes" instead of "I ate 0 tomatoes".
Aren’t you mixing up 0 the digit and 0 the number here?
Certainly we can use the empty string to denote the number zero, but that doesn't give us a straightforward convention for denoting a number that has a zero digit in some place below its most significant one.
Not that it's impossible to come up with some convention that ditches 0 as an intermediary digit. For example, 302009==3e5+2e3+9 will evaluate to true in many languages out there. In Ruby we even have `302009 === 3e5+2e3+9+''.to_i`, which evaluates to true.
But such a notation loses the conveniences that a positional fixed-base numeral system provides.
The only change is the empty string being a valid name for a number (the number zero).
I don't mean to suggest getting rid of place value or the digit 0! This is more like making the place value system more consistent with respect to a corner case.
And the practical reason that we can't make this change is that we don't have a way to distinguish in writing between the absence of any string and the presence of the empty string.
For base 2/16 this is pretty easy if you use assembly, because most ISAs have instructions to count leading zeros. So, e.g., on aarch64, to get the length of a hex integer (assuming you're not trying to do anything weird like signed hex - negative numbers will always be max width this way):
    hlen:
        // int hlen(uint64_t x)
        // takes an integer, returns length of hex string
        mov  x9, #16        // max width
        clz  x0, x0         // count binary leading zeros
        cmp  x0, #64
        lsr  x0, x0, #2     // 4 bits per hex digit
        sub  x0, x9, x0
        cinc x0, x0, eq     // if num = 0, add 1
        ret
Decimal requires looping (well, only ~20 comparisons are needed for 64-bit, so maybe that could be unrolled, but it's the same thing either way), so it's simpler to just use a high-level language.
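As a hedged sketch of that unrolled-comparison idea in Python (the table and names are mine, not from the comment):

    # At most 19 comparisons against precomputed powers of ten cover any
    # unsigned 64-bit value; a compiler could unroll this loop entirely.
    POWERS_OF_TEN = [10 ** k for k in range(1, 20)]  # 10 .. 10**19

    def dec_len_u64(n):
        length = 1
        for p in POWERS_OF_TEN:
            if n < p:
                break
            length += 1
        return length

    print(dec_len_u64(0), dec_len_u64(999), dec_len_u64(2**64 - 1))  # 1 3 20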
This is a mathy solution to a CS problem. I would’ve preferred the route to a more CS-ey solution using bit shifts and recursion. Teaches an understanding of low level representations.
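One possible reading of "bit shifts and recursion" (my own sketch, not the commenter's): count binary digits by shifting, which makes the low-level representation explicit:

    def num_bits(n):
        # binary digit count via shifting; the length of zero is taken as 1,
        # mirroring the decimal version of the problem
        if n < 2:
            return 1
        return 1 + num_bits(n >> 1)

    print(num_bits(0), num_bits(1), num_bits(255), num_bits(256))  # 1 1 8 9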
i feel like using the log operator to avoid a loop is also cheating, because under the hood there is likely to be a loop. i would have expected the non-looping solution to use either recursion or some abuse of itertools, which is really just using a loop as well.
    import itertools

    def digitLength(n):
        if n == 0:
            return 1
        *_, last = itertools.takewhile(
            lambda acc: acc[0] != 0,
            itertools.accumulate(
                itertools.repeat(None),
                lambda acc, x: (acc[0] // 10, acc[1] + 1),
                initial=(n, 0),
            ),
        )
        return last[1] + 1  # last[1] is one less than the digit count
the problem is interesting in python because i think the looping solution is not optimal: it performs N divisions for an arbitrarily large integer, whereas I think there should be a solution that performs O(log(N)) multiplications for an arbitrarily large integer. in other languages with fixed integers it's not really an issue how many operations you do, since it's effectively constant.
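A hedged sketch of one such approach (my own, assuming N above means the digit count): gallop with repeated squaring, then binary-search the exponent, so the number of big-integer multiplications grows logarithmically in the digit count:

    def num_digits_fast(n):
        n = abs(n)
        if n < 10:
            return 1
        # gallop: powers[k] == 10 ** (2 ** k), stop once squaring would overshoot
        powers = [10]
        while powers[-1] ** 2 <= n:
            powers.append(powers[-1] ** 2)
        # binary-search the largest exponent with 10 ** exp <= n
        exp, cur = 0, 1
        for k in range(len(powers) - 1, -1, -1):
            candidate = cur * powers[k]
            if candidate <= n:
                cur, exp = candidate, exp + 2 ** k
        return exp + 1

    print(num_digits_fast(999_999_999_999_999), num_digits_fast(10 ** 100))  # 15 101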
That's really my beef with languages like Python - typing is an afterthought. I literally get more safety from C than I do from Python:
    def numDigits(num):
        return len(str(num))

    print(numDigits(1))     # Returns the correct answer
    print(numDigits("1"))   # Returns the correct answer
    print(numDigits("01"))  # Returns the incorrect answer
The "returns the wrong answer" is the problem. If the input is the wrong type, I expect there to be an error raised, not silently give me wrong answers.
Fair enough I suppose. Though I should mention that python is much better at these than a lot of dynamically typed languages. All of these give errors in python but return nonsense in e.g. javascript:
    1+"1"
    []+{}
    ({}).foo
    []/2
Of course there are type checkers for python like mypy and pyright, but in my experience the type systems these provide are wildly underpowered for statically typing even slightly more complex idiomatic python.
Re the num digits example: One way to go about this is to add asserts for checking the input types, though generally in python it's considered more idiomatic to check how an object behaves than what type it is.
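A minimal sketch of the assert option (purely illustrative):

    def numDigits(num):
        # fail loudly on non-int input instead of silently measuring a string
        assert isinstance(num, int), f"expected int, got {type(num).__name__}"
        return len(str(num))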
In any case in practice I find that by far most type systems, bolted on or built in or otherwise, are more of a hassle than they are worth. One of the few exceptions I've found so far is D. When you rewrite the num digits function in D such that it accepts any type like the python example does, it should be of no surprise that it has the exact same bug as the python version:
    import std.conv : to;

    int numDigits(T)(T num) {
        return cast(int) num.to!string.length;
    }
I'm not entirely sure what my point is, other than to point out that typing in python isn't an afterthought. It just uses a different type system from the one you're used to, which allows for bugs you're not used to.
str(x) converts the argument to a string. Since the argument is an integer, there won't be any leading zeroes. More problematic are negative values though.
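For example:

    print(len(str(-42)))        # 3: the minus sign is counted
    print(len(str(abs(-42))))   # 2: one possible fix is to take abs() first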
Wouldn't len(str(num)) be adequate here? This is a quite literal translation of what the code should be doing: measuring the length of the text representation of a number. The mathematical approach seems a little convoluted, although it serves the purpose of teaching a lesson.
Be careful: doing str(num) requires Python to convert num from the binary representation it is stored in into decimal. (C)Python implements a quadratic-time change-of-basis algorithm. This is slow enough that now, for very large inputs, Python will raise "ValueError: Exceeds the limit (4300 digits) for integer string conversion".
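On CPython 3.11+ this looks roughly like the following (the 5000-digit value and the new limit are arbitrary choices of mine):

    import sys

    big = 10 ** 5000
    try:
        len(str(big))                    # str() triggers the conversion limit
    except ValueError as e:
        print(e)                         # Exceeds the limit (4300 digits) ...

    sys.set_int_max_str_digits(6000)     # raise the limit if you really need str()
    print(len(str(big)))                 # 5001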
The correct solution here is to give credit for the problem to acknowledge genuine clever problem solving, and then offer extra credit for doing it the pedagogical way.
There is no correct solution here. A classroom is not a test environment.
The goal is to learn, and the point of the exercises is to teach a specific concept. If a student finds a different way around the problem, that may show that they're already proficient in other skills, but they haven't necessarily learned the concept being taught in this class yet. A good instructor would probably acknowledge the solution, but add extra boundaries to the task to get the student to explore the problem in a way that lets them encounter the testing difficulties discussed here.
It's like smuggling a calculator into a class about mental maths strategies: you'll probably do very well in the final test, but you won't have learned anything!
> a clever, unintuitive solution to a difficult problem
If you approach from the mathematics side of things, building on log₁₀ is the completely obvious approach to use. If it seems unintuitive, that’s just because you don’t understand logarithms.
> the autograding test cases did not include a test using a power of 10.
That’s a pretty glaring oversight. Boundary cases are probably the most important things to cover in such tests. I’d be wanting to test things like 0, 1, 9, 10, 11, 99, 100, 101, and so forth. (Also negative numbers, unless the problem explicitly constrained it to {0, 1, 2, …}.)
We often talk about edge cases in testing, but this reveals that the edges can be not just at the extremes, but also at intermediate value changes. Put another way (still fuzzy!), these are the interesting values. You test interesting values.
This would also be a good place to use property testing; instead of hard-coding a bunch of tests (or perhaps as well as), compare against a large number of generated values, using a known-good implementation as the oracle, probably just len(str(number)).
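A hedged sketch using the Hypothesis library (my choice of tool, not mentioned above), with len(str(n)) as the oracle:

    from hypothesis import given, strategies as st

    def num_digits(n):
        # implementation under test: the loop-based version
        count = 1
        while n >= 10:
            n //= 10
            count += 1
        return count

    @given(st.integers(min_value=0))
    def test_matches_known_good(n):
        assert num_digits(n) == len(str(n))

    test_matches_known_good()  # runs the property directly, or let pytest collect it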
I'm not sure about the property-based testing thing. In this example, yes, you can easily write a test case that compares against the string-based algorithm. But in general, this touches one of the biggest weaknesses of PBT: you need to find a good property to test.
In a lot of cases, the most useful property is "for all inputs, the result is the right answer", but we don't know the right answer until we've written the algorithm, and there's not usually much point writing two algorithms unless, like here, one of them is trivially correct but inappropriate for some reason.
More generally, I feel like the more trivial a property is to check, the simpler the code to implement it is, and the less useful the test is in general. In the extreme case, a getter/setter pair is very easy to test with PBT, but you rarely need to be testing your getters and setters.
Counting digits is very closely adjacent to the general category of parsing and printing (or serialization/deserialization), which are both tricky to get right and have simple correctness tests. They are perfect candidates for property-based testing.
You’re misunderstanding me. I said that if you approach from the mathematics side of things, building on log₁₀ is the completely obvious approach to use. That is, if you’re already a mathematician, of course you’ll see if you can use logarithms, because that will be the way you’ll think—because you’ll view it as a mathematical and algorithmic task.
As for the handling of zero, well, that’s not about mathematics or about how a computer stores or represents data—that’s about how humans represent data. We choose to special-case zero, allowing it to have a leading zero which we never allow for anything else. All solutions in software will need to special-case zero.
This is a fantastic example of how seemingly simple programming tasks can teach deep concepts. His methodical approach to uncovering edge cases and alternative solutions demonstrates the importance of critical thinking and comprehensive testing in software development, a valuable lesson for all developers.
We don't want generated comments on HN, or even human-generated summaries. They are too generic to produce interesting conversation, besides which we want people to actually look at, and perhaps even read, articles.
Fortunately your earlier comments look fine so this should be easy to fix!