Hacker News new | comments | show | ask | jobs | submit login
Regularly divisible (alexbowe.com)
18 points by michaeltwofish on Nov 24, 2010 | hide | past | web | favorite | 9 comments

A simpler solution is to use modular arithmetic (this works for any base).

If we consume a binary number from left to right, let N be the binary value already expressed. If we then consume a 0, then the new value is now 2N. If we consume a 1, then the new value is 2N+1.

Create a 3-state machine where state 0 represents a running value of N mod 3. Starting state is 0. Follow all the transitions. e.g. from state 2 (N = 2 mod 3), if we consume a 0, we are now at 2N + 1 = 1 mod 3, so f(2,0) = 1. Full transition function is:

    f(0,0) = 0, f(0,1) = 1
    f(1,0) = 2, f(1,1) = 0
    f(2,0) = 1, f(2,1) = 2
This is trivial to turn into the regex (0|1(01* 0)* 1)* (had to add spaces to stop italicisation). If leading zeroes are a problem you have to modify the FSM a little which makes the resulting regex a little more complicated.

I should probably clarify that it only becomes trivial to turn that FSM into a regex when you've drawn it out! I encourage HNers to do this and see what I mean. In my experience, just inspecting the transition function visually isn't much use at all.


If the regex was (0|1(01* 0)* 0)*, then that would accept "10", which is not divisible by 3.

You can construct a regular expression / state machine for these fairly easily.

Let X be the number we have seen so far.

State A: X mod 3 = 0 State B: X mod 3 = 1 State C: X mod 3 = 2

We'll need a separate start state if you swing that way.

Now, observe what happens if we have interpreted a binary number and then suddenly stick a 0 at the end of it. It gets multiplied by 2. Not shocking. If we stick a 1 at the end of it, it gets multiplied by 2 and one gets added. Still not that hard.

OK, I'm going to rename our states now. Instead of X, we're going to describe them in terms of k. k is some positive integer which is a multiple of 3 -- we don't care which.

A: k + 0 B: k + 1 C: k + 2

Now, if we multiple k by 2, we get 2k, which we know is also going to be divisible by 3 because k is. So a 0 transitions A to A.

If we multiply k by 2 and add 1, we get 2k + 1. 2k + 1 % 3 = 1, because 2k is evenly divisible by 3. So a 1 transitions A to B.

The next four transitions are left as an exercise for the reader -- they're pretty trivial.

You can use this same general pattern to construct any "divides by N" state machine.

Going from a state machine to a regular expression (particularly the smallest possible regular expression) is a bit trickier.

Just to link the whole thing up (in case someone finds this thread via Google, for example), here's my demonstration of the general algorithm that can find a regular expression for testing divisibility by any number, in any base: http://s3.boskent.com/divisibility-regex/divisibility-regex....

I wrote up something similar a while back: http://bmm6o.blogspot.com/2008/03/divisibility-testing-and-p... .

The author constructs his regular expression ad hoc, by trying examples and spotting patterns. His proof that it's correct consists of trying it for the first 5000 non-negative integers. Here's a more principled approach for those who prefer to know that, why, and how something works.

I'll use "@" instead of the usual asterisk to denote "repeated zero or more times" to avoid HN formatting issues.

First, let's construct a finite state machine that does the job. It needs three states, one for each value the number-so-far might have mod 3. You can pretty much write this down immediately. State 0 goes to itself on seeing a 0, and to state 1 on seeing a 1. State 1 goes to state 2 on seeing a 0, and to state 0 on seeing a 1. State 2 goes to state 1 on seeing a 0, and to itself on seeing a 1.

Now, there's a general technique for converting an FSM to a regular expression. The idea is that you label edges with regular expressions and not just single symbols. Then you eliminate states one by one, until you're left with only the start state and the accepting end state. To eliminate a state y, consider all x,z such that there are transitions x -> y -> z; replace y and those transitions with a bunch of transitions straight from x to z, of the form [what it takes to go x -> y] [what it takes to go y -> y]@ [what it takes to go y -> z]. Repeat. (When you get multiple edges from one state to another, use the | operator.) Eventually you just have a start state and an end state, and then the label on the edge from one to the other is the RE you need. Unless the start and end states are the same, in which case you need to allow arbitrary repetitions of that RE, so add another star.

OK. So now we eliminate state 2. It happens that the only edge we need to add goes 1 -> 1, and it's labelled 01@0. In other words: if you have something that's 1 mod 3, then the bit-sequences that take you back to 1 mod 3 while visiting only {1 mod 3, 2 mod 3} are those of the form: 0, maybe some 1s, 0.

So now we have states 0 and 1, with transitions 0 -> 0 (labelled 0), 0 -> 1 and 1 -> 0 (both labelled 1), and 1 -> 1 (labelled 01@0). State 0 is both the start state and the accepting end state.

And now we eliminate state 1. In just the same way, the only edge we need to add goes 0 -> 0, and it's labelled ... well, we have 0 -> 1 [1], then zero or more of 1 -> 1 [01@0], then 1 -> 0 [1]. In other words: 1(01@0)@1.

So now we have only one state, which has the 0 -> 0 transition labelled 0 that it always had, plus a new one, labelled 1(01@0)@1. That's equivalent to a single transition 0 -> 0, labelled 0|1(01@0)@1. So, finally, our RE is obtained by starring that: (0|1(01@0)@1)@. Where, to repeat, "@" means what's usually denoted by an asterisk.

This RE is actually simpler than Alex's: fewer characters and fewer stars. I don't know which will execute faster in a typical RE engine.

[EDITED to fix asterisk screwage -- I had no idea that HN italicization works across paragraphs -- and to add: while I was writing this, qntm made a similar comment. Except that I think constructing the FSM is trivial and converting it to an RE is nontrivial, whereas qntm thinks constructing the FSM is nontrivial and converting it to an RE is trivial. Hopefully the combination of qntm's comments and mine will make everything clear.]

has anyone considered that all numbers where the sum of digits are divisible by 3, the whole number is divisible by 3?

Interesting problem. Well explained.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact