Hacker News new | past | comments | ask | show | jobs | submit login
Divisibility by 7 is a Walk on a Graph (2009) (tanyakhovanova.com)
335 points by znpy on Nov 9, 2015 | hide | past | web | favorite | 71 comments

This "walk on a graph" is known formally as a Deterministic Finite Automata. One of the interesting results of automata theory is that every FA has an equivalent regular expression (a subset of the "regexps" in most modern programming languages).

Where is this going? Behold, a 12,701 character regex for divisibility by 7: http://codegolf.stackexchange.com/questions/3503/hard-code-g...

This doesn't use any recursive features (it is a real Regular Expression in the theoretic sense) so it can be checked in linear time.

Note that the 12,701 character regex is WRONG. It doesn't even match "70". Even the binary regex is wrong.

The original 32K regex pointed to in the stackexchange problem statement however is perfectly correct.

Test code:

    open(F, '< /tmp/q.in');
    $myre = <F>;
    $myrecomp = qr/^(?:$myre)$/;
    while (<>) {
      if (m/$myrecomp/) {
        print "$_\n";
Examples of multiples of 7 not matched by the given regex: 70, 707, 7007, 11095

(I spent all morning trying to find the bug in my regex compiler based on the assumption that the answer there was correct)

> a 12,701 character regex for divisibility by 7:

I wonder if that's the optimal regex or can be reduced further using some rules[1] like the ones which we(humans) generally use to do it manually.

[1]. https://www.math.hmc.edu/funfacts/ffiles/10005.5.shtml

I wrote a 11811-character regex for divisibility by 7 back in August 2008: http://web.mit.edu/andersk/Public/regex

And unlike the one on stackexchange, it's actually correct!

(I have a regex -> DFA compiler here that I just threw your regex through. Seven states, connected as expected)

Do you have the source to the program you wrote the regex with? (Or did you write it by hand?)

It's definitely not the optimal regex. Finding the smallest regex is a hard problem (at least in theory): https://cs.stackexchange.com/questions/19686/is-regex-golf-n...

(I think I've read that finding the smallest one is not only NP-hard, but P-Space complete.)

Don't regular expressions have minimal, canonical forms?


The minimum DFA does not correspond in any particular sense to a minimal regular expression.

Of course not - my bad!

Yes they do. For reference, check Sipser.

Could you give a page number? I skimmed the regular expression section and couldn't find it. Also, while it is self-evident to me that there is a minimal form for each regex, the construction for finding one isn't immediately obvious to me (I can surely enumerate several ways of eliminating redundancy). Beyond that, I'm not sure a minimal form would minimize character length.

Of (a \star b \star ) \star and ( a ∪ b ) \star, which is the canonical form? The transformation from the former to the latter is nontrivial as a \star b \star is not equivalent with a ∪ b.

Of course, there is a shortest equivalent regular expression, but in general it's PSPACE-complete to find it: https://en.wikipedia.org/wiki/PSPACE-complete#Regular_expres...

It remains PSPACE-complete if you're given a corresponding DFA: "Jiang and Ravikumar [7] show moreover that the minimization problem for ... regular expressions remains PSPACE-complete, even when specifying the regular language by a dfa" from http://citeseerx.ist.psu.edu/viewdoc/download?doi=

In that sense, the grandparent is incorrect. It is unknown, in both practice and theory, how to find the shortest equivalent regex.

It's not unknown, it just takes a long time.

I think you should think deeper on the idea of "unknown"

I don't understand what you mean. Regex minimization is in PSPACE, so take any PSPACE complete problem, eg Q-Sat, do the reduction and apply the algorithm. Q-Sat can be solved by recursively enumerating all possibilities. So there is a method for minimizing regular expressions. It just takes a long time to run. If you have time you can probably think of an algorithm that works directly on the regex problem without doing the Q-Sat thing. It might even be faster.

Heck, there might be an algorithm that works really well in practice. People who do model checking and verification laugh PSPACE completeness in the face and are not even really concerned with undecidability.

I would be very interested in finding an approach that works well in practice.

Okay, fine, but in this case N=7. What are the constant factors?

The weird part is that the regex is much bigger than the DFA (but the DFA is not accepting the symbols directly, it's transforming them first

It also seems that regexes can't express accepting expressions like AB where A is accepted by itself (considering AB is accepted), which in the DFA is implemented by sending it to the initial state

> It also seems that regexes can't express accepting expressions like AB where A is accepted by itself (considering AB is accepted), which in the DFA is implemented by sending it to the initial state

B* does that.

This is a DFA over the alphabet {0, 1} where the number is first transformed by writing out each digit in unary with 1 and delimiting digits with 0.

Wow, I'm famous.

Here's another human-friendly scheme to test divisibility by 7 that I don't think is popularly known: Write down the number, then cross off the least significant digit, double it, and subtract that from what's left of the number. Repeat until you get zero or an obviously divisible-by-7 number, or not.

Example: 325 becomes 32 - 10 = 22, which is obviously not divisibile by 7.

Better example: 8638 -> 863 - 16 = 847 -> 84 - 14 = 70 -> 7 - 0 -> 7, so we have a winner.

Learned this as a kid (forget where), but it took decades for me to realize why it obviously works.

There's a similar test for divisiblity by 13: cross off the least significant digit, quadruple, and add to what's left.

Example: 325 becomes 32 + 20 = 52 which is divisible by 13. Example: 8632 -> 863 + 8 = 871 -> 87 + 4 = 91 which is divisible by 13.

It works because 39 = 40-1 is divisible by 13 (similarly to why the 7 test works because 21 = 20+1 is divisible by 7).

An equivalent test would be to use a factor of -9 instead of 4.

325 => 32 - 9 * 5 = 32 - 45 = -13

The method itself comes from the fact that what we're doing is really just modular arithmetic. Which has the wonderful property that, as long as all manipulations mod N don't fundamentally change the base we can keep repeating them.

Similar to my mod 7 breakdown:

  AB = 0 (mod 13)
  10*A + B = 0 (mod 13)
  40*A + 4*B = 0 (mod 13)
  A + 4*B = 0 (mod 13)
  A - 9*B = 0 (mod 13) // because 4 = -9 (mod 13)
This is neat. A similar construction should be doable for any number.

EDIT: I have a rule of not questioning down votes, but many of my comments that get down voted have something disagreeable in them (well, I agree with them, but it's typically political/management style/economics/etc.). But how does a break down of modular arithmetic become worth down voting?

Oops. That makes more sense now that you point it out. I thought they were just little scrolling arrows, and never understood why they didn't work right. Sorry about that.

(While it's a good general practice not to question downvotes, there are some cases that just leave you wondering what the other person was thinking. This is one of the explanations I've imagined, although usually I just assume big fingers on a tiny touchscreen.)

Doing it with -9 instead of +4 will terminate faster. But on the other hand one has to think about negative numbers. It's an interesting tradeoff.

"A similar construction should be doable for any number." - yes, except 2 and 5. You just need to find a multiple of the number that's 1 more or less than a multiple of 10.

In fact, we can even come up with a test for divisibility by 3 (or 9) this way: just add the last digit to the rest of the number repeatedly until you can tell if the result is divisible by 3 (or 9):

4752 -> 475+2 = 477 -> 47+7 = 54 -> 5+4 = 9 is divisible by 3 and 9 3752 -> 375+2 = 377 -> 37+7 = 44 -> 4+4 = 8 isn't divisible by 3 or 9 4755 -> 475+5 = 480 -> 48+0 = 48 -> 4+8 = 12 -> 1+2 = 3 is divisible by 3 but not 9.

Of course this ends up being equivalent to the usual "add all the digits" test.

Using the same approach I used to construct the formulation, divisibility by 2 and 5 actually just break down to the normal "is the last digit even" or "is the last digit 5 or 0" tests. We end up with:

  10*A + B = 0 (mod 2)
  B = 0 (mod 2)

  10*A + B = 0 (mod 5)
  B = 0 (mod 5)
Of course, these are trivially small cases and we've all learned the patterns. An interesting case with N = 4, the quick method I learned in grade school was to look at the last two digits. But using the approach I've adopted here, you get something more complex:

  10*A + B = 0 (mod 4)
  2*A + B = 0 or 2*A - 3*B = 0 (mod 4)
If we instead break N into ABC (A * 100 + B * 10 + C = N):

  100*A + BC = 0 (mod 4)
  BC = 0 (mod 4)
And we've got the method I learned in school.

EDIT: Formatting.

So why does this work?

...There was another answer here that seemed close.

I'm not sure the best way to express it, but here's what I've got:

Allow me to use a shorthand, N = AB where AB is shorthand for 10 * A + B, and B is the least significant digit. IF N is divisible by 7, then AB = 0 (mod 7). This allows us to do the following manipulations:

  AB = 0 (mod 7)
  10*A + B = 0 (mod 7)
  3*A + B = 0 (mod 7)
  5*(3*A + B) = 5*0 (mod 7) // EDIT: had -2 on the RHS, that was correct but confusing
  A + 5*B = 0 (mod 7)
  A - 2*B = 0 (mod 7)
So we've arrived at drfuchs' method. You only need to repeat the process until you've proven to yourself that A - 2 * B is divisible by 7 or not.

That's all I've got. I ran a half-marathon yesterday, and I've been up for 16 hours today. My brain might work better tomorrow.

EDIT: changed the steps a bit, had some unneeded ones. Also, this shows that you can use addition instead of subtraction, but you won't be moving towards 0, but you still have to be able to recognize that the new number is divisible by 7.

Thanks for this explanation! I was about to post something similar after working the math out, but I'm sure my answer would have been twice as long and less clear.

50 years from now this will be called "drfuchs last theorem"

with the scribble "but it took decades for me to realize why it obviously works" somewhere in the margins and stating that there wasn't enough room to write that down though..

You are subtracting enough 21s (ie: X * 21) to make the last digit zero. Since 21 is obviously a factor of 7, subtracting those should yield a factor of 7.

You can then divide by zero because if X is a factor of 7 and ends in zero, X / 10 is a factor of 7 as well.


you are subtracting multiples of 21, then dividing by 10, which is coprime with 7

The examples subtracted 10, 16, etc... Those aren't multiples of 21.

Since you scratched off the last digit, 10 is really 100, plus 5 for the digit you scratched off = 105.

105 is a multiple of 21.

Well.. Why does it, then?!

Another interesting feature of dividing by 7 is that the decimal repeats as:

0.142857 = 1/7 0.285714 = 2/7 0.428571 = 3/7 0.571428 = 4/7 0.714285 = 5/7 0.857142 = 6/7

you can see the first digit after the decimal point increases in numerical order through the available numbers in the pattern, the proceeds to repeat in the same order

Some other prime denominators behave the same way. For example, 1/17th is 0.0588235294117647 and the first digit after the decimal point increases like 0,1,1,2,2,3,4,4,5,5,6,7,7,8,8,9 ... Where there are 2 of the same number in the sequence, the first time you encounter that number you start from it's first ocurrence so for example 3/17 is 0.2941176470588235

This makes for a pretty spooky party trick but I can tell you from experience it doesn't attract the opposite sex.

I had some fun reverse engineering that. Here is my take at how it works.

Label all of the nodes 0 through 6, starting with making the white node 0, and following the black arrows. We can consider each of these nodes to represent a congruence mod 7.

Let n be the first digit of the number, and nXXX be the entire number. After we walk n nodes by following the black arrows, we arrive at the node n' = n MOD 7. Notice that n'XXX = nXXX MOD 7. More specifically, nXXX is divisible mod 7 if and only if n'XXX is divisible by 7.

Let m be the second digit. At this point, we reduced the problem to testing n'mXX MOD 7. Notice that n'mXXX = (10n' + m)XX. [0] Notice that following the white arrow at an arbitrary node, k, is equivilent to following 10k black arrows. [1] This means that after following the white arrow, we have reduced the problem to mXX. At this point, we simply recurse to the first step until we are out of digits.

[0] This construction is slightly improper, in that it involves a single "digit" having a value greater than 10.

[1] This can be verified by considering the value of 10k MOD 7, and comparing it with the destination node.

I wrote a post on reddit with some code to generate diagrams with generalized divisibility testing the last time I saw this idea (about two years ago): https://www.reddit.com/r/math/comments/1p9g8t/start_at_the_b...

Copied here for the lazy:

I wrote a small python script to generate DOT code for graphviz so that you can just input the base and the number to check division and automatically draw the graph. (The resulting graphs aren't as pretty as one made by a human, but it's kind of fun to poke around with it.)

Here's the script:

    base  = 10
    divby = 7

    print "digraph {"
    print "    0 [shape=doublecircle]"

    for i in xrange(0, divby):
        print "   ", i, "->", (i+1)%divby
        print "   ", i, "->", (base*i)%divby, "[arrowhead=empty]"
    print "}"
Save it to a file such as `make_graph.py`. Then, as long as you have the graphviz binaries on path, you can render an image like:

    python make_graph.py > graph.dot
    dot -Tpng graph.dot -o out.png
Here are some sample outputs:

- [base = 3, divby = 11](http://imgur.com/cwUmnME)

- [base = 10, divby = 7](http://imgur.com/O7GYw17)

- [base = 10, divby = 10](http://imgur.com/k6xC0CQ)

- [base = 10, divby = 11](http://imgur.com/1SSXKcK)

- [base = 16, divby = 7](http://imgur.com/QZzSyva)

Not all the edges are actually reachable -- e.g. base 10, divby 10 has a black edge from 9 to 0 even though you can't ever wrap around to 0 since `(10 * anything)%10` is zero -- so the biggest you can get to is 9, then when you follow the white edge for 9x you start over at zero... There may also be other edge cases I haven't thought of ;)

Whenever I'm bored I either doodle or randomly recall and work out random tidbits of CS theory. One that I did recently was to work out the DFAs that determine the divisibility by various numbers in base 2. You end up with some neat patterns emerging with primes versus non-primes and other details. You also get the modulus for free if you number your states.

WHY would you pick an example number that's not divisible by 7?! Academics facing the real world are the strangest creatures. Also this is really cool.

I agree the OA would be improved by two examples, one of each.

An answer: Counter-examples are important in Maths.

Anecdote: Oliver Sacks wrote about autistic twins [1] who could not speak or communicate well to people but who could perform prodigious feats of calculation, including factorising large pimes. Sacks sat with them with a table of prime numbers, and swapped some really large ones after which they took him seriously. Sacks took their reaction to his large primes as proof of their arithmetical ability.

The mathematician John Conway (the game of life chap) was quoted on an online forum decades later as saying that Sacks should have suggested some large non-prime numbers as a test of the twins. I'm searching for a reference!

[1] http://math-frolic.blogspot.co.uk/2013/02/prime-savants-fasc...

I liked your anecdote. It led to this more detailed analysis of the twins abilities to detect primes:


This didn't bother me. I assumed they picked that number _because_ it didn't work, to emphasize that only some numbers would end up back at the white node.

Clear, simple communication is a skill that takes time and practice to develop! Pointing out that you're not bothered by this strange example will hopefully be a good point for you to reflect on later.

Similarly, it can be difficult to recognize when you may come across as snarky and passive-aggressive. Perhaps it will be useful for you to reflect on this comment later.

This is the first example of self-recursive snark I've seen on HN. Congrats.

Thank you, I was thinking the same thing. I followed the graph twice and thought I was doing it wrong before I mentally divided 325 by 7 to find that it's not a multiple of 7

Why not? The point is to be able to distinguish the cases.

This exact state machine -- divisibility by 7 -- is one of my favorite state machines ever! It was a homework problem in college, and one of the few that stuck with me over the years.

The implication of a DFA for divisibility by 7 is that it's easy (with practice) to figure divisibility by 7 in your head for arbitrarily long numbers as fast as someone can say them. Nice party trick! ;) I've also been having debates with people about whether divisibility testing in your head is a simpler operation than long division in your head. When you break it down carefully, long division in your head for a single digit divisor is very similar, and also surprisingly easy to do, the main difference might be simply having to remember quotient digits, and nothing more.

I'm curious where the DFA for divisibility by 7 might have originally come from, but I'm pretty sure it was a homework problem in the text I used, "Elements of the theory of Computation" by Lewis and Papdimitriou, Prentice Hall, 1981.

Wow that's awesome. I decided to re-implement divisiblity by 7 in this manner, since integrating with <horrible system> could wait a few minutes. :)


Edit: s/division by 7/divisibility by 7

Thanks for pointing that out, omaranto!

That is only a test for divisibility by 7, not 'division by 7', which is typically understood to compute the quotient.

For an explanation on the construction: http://math.stackexchange.com/questions/413296/how-does-the-...

For a bit of an explanation, see David's follow-up blog post: http://blog.tanyakhovanova.com/2010/08/divisibility-by-7-is-...

Another quick way to check divisibility by 7 is to multiply each digit by 3^x where x is the place (right to left). Good way to check divisibility in your head and useful up to 4-5 digits.

As an example, consider 413: 4x9 + 1x3 + 3 = 36 + 3 + 3 = 42

42 is divisible by 7 therefore 413 is too.

  10 = 3 (mod 7)
So this works because you're substituting the powers of 10 with powers of 3, and we're operating under mod 7 arithmetic. Conveniently, this can also be repeated like drfuchs' method if your numbers are too large to recognize as a multiple of 7 after the first pass.

  413 => 4*9+1*3+3 = 42 => 4*3+2 = 14 => 1*3+4=7

We used to call this casting out nines.

Oh I still have nightmares of this. As others point out, you can use DFAs to determine divisibility by various numbers (base, well, anything you like). I remember spending a few hours banging my head against the desk, trying to figure out how it works with only a bunch of crappy net articles. But it is somewhat fun after you understand it.

Can we build such state diagrams to check divisibility by other digits too?

Yes, it's easy: to test divisibility by n make a graph with n verities labeled 0 through n-1; put black arrows from each vertex k to (k+1) mod n; and put white arrows from each k to 10k mod n.

(That's for testing divisibility by n of numbers written in base 10; for others bases just replace the 10 above.)


If we're trying to compute 3256 % 7 and we know that 325 % 7 = 3, then 3256 % 7 = (325 * 10 + 6) % 7 = (3 * 10 + 6)

Computing 10 % 7, 20 % 7 etc generates the white arrows.

I think unless we're checking divisibility by primes, though, the white and black arrows won't share edges

My variation of this requires the input number (I) to be in binary, and read from leftmost digit to the right (most significant to least significant). Here's the table version of N = 2 through N = 4.

  N = 2

  | state | 0 | 1 |
  |     0 | 0 | 1 |
  |     1 | 0 | 1 |

  N = 3

  | state | 0 | 1 |
  |     0 | 0 | 1 |
  |     1 | 2 | 0 |
  |     2 | 1 | 2 |

  N = 4

  | state | 0 | 1 |
  |     0 | 0 | 1 |
  |     1 | 2 | 3 |
  |     2 | 0 | 1 |
  |     3 | 2 | 3 |
Each state represents a potential result of I mod N. Constructing it becomes mechanical, as in each state you can either double it (input digit D = 0) or double it + 1 (input digit D = 1). So in N = 4, determining the outputs of state 2: You can double it, and 4 mod 4 = 0, therefore the next state is 0; You can double it + 1, and 5 mod 4 = 1, therefore teh next state is 1.

As to making this particular style graph, I'm currently playing around with it.

EDIT: Hmm, left something out. You can convert this to any base, you just end up with more transitions out of each state (for a base B, you need B transitions).

EDIT Again: omaranto has it correct for generating the article's style of graphs. Guess I should've refreshed before posting, but working it out myself was fun and successfully kept me from sleeping.

Seems like something Euler should have already done :-) I jest, of course.

This gave me an unreasonable amount of pleasure doing.

help me, i can't get it to work for 1603.

It works. Follow:

(1) 1 black then 1 white.

(6) 6 blacks then 1 white.

(0) 1 white.

(3) 3 blacks then 1 white.

And you end up at the bottom node, indicating it is divisible by 7.

You should be following the 1 white path before the n black paths, not afterwards. In particular, following "3 blacks then 1 white" in the last step, when the final digit is three, is an error; it will produce the wrong modulus for any number that is not divisible by 7.

    (1) 1 white then 1 black,  ending up in state 1
    (6) 1 white then 6 blacks, ending up in state 2
    (0) 1 white then 0 blacks, ending up in state 6
    (3) 1 white then 3 blacks, ending up in state 0
The graph keeps track of ((the number so far) mod 7); following a white arrow multiplies your total by 10, and following a black arrow increases your total by 1.

I can see how you'd like to hold the order of operations correct for readability purposes (especially if the states were labeled with digits), but I don't see where psyklic's method could produce an error for the divisibility test.

Your initial white doesn't change anything, since it brings you back to the starting node. Any whites after the the final digit either keep you on the starting node or keep you in the graph where.

The graph isn't just a divisibility test, it reports the remainder after division by seven. I specifically acknowledged that this error won't mutate a "divisible by 7" result into "not divisible by 7" (because any number that is divisible by seven will still be divisible by seven after multiplying it by ten), but in every other case it will give you the wrong answer. (e.g. "remainder 2" will mutate into "remainder 6".)

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact