
A regular expression to test for divisibility by 7 - eru
http://slexy.org/view/s2QFWeNxZo
======
btilly
The link is not loading for me. So I can only guess how it was done. But here
is how _I_ would do it.

First, as you proceed through the digits of a number, you know that after each
digit you will have some remainder mod 7, and each digit changes that
remainder. Now let's invent a piece of notation. Let's say that (n...m) is a
regular expression that matches any string of digits that carries you from a
remainder of n to a remainder of m without any intermediate remainder being m
or lower, or n or lower.

So, for instance, (3...1) would match strings like '6' or '23' or '200', but
not '242' because after adding the digits '24' your remainder is 2, which is
too low for the middle.
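
The rule underneath this notation is that appending a digit d to a prefix with
remainder r gives the remainder (10*r + d) mod 7. A minimal Python sketch of a
membership check for (n...m) under that reading follows; the name `in_segment`
is made up for illustration and is not from the thread.

```python
def in_segment(digits, n, m, mod=7):
    """Return True if `digits` belongs to (n...m): it carries remainder n to
    remainder m, and every intermediate remainder exceeds both n and m."""
    if not digits:
        return False
    r = n
    for i, ch in enumerate(digits):
        r = (10 * r + int(ch)) % mod
        is_last = i == len(digits) - 1
        if not is_last and r <= max(n, m):
            return False  # dipped too low in the middle
    return r == m


# The four strings from the (3...1) example above.
for s in ["6", "23", "200", "242"]:
    print(s, in_segment(s, 3, 1))
```

The first three strings are accepted and '242' is rejected, which agrees with
the example.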

Now with this piece of notation, the desired regular expression is simply:

    
    
      (0...0)+
    

And (0...0) is not too hard to figure out. It is

    
    
      (
        0 |
        7 |
       (0...1)(1...1)*(1...0) |
       (0...2)(2...2)*(2...0) |
       (0...3)(3...3)*(3...0) |
       (0...4)(4...4)*(4...0) |
       (0...5)(5...5)*(5...0) |
       (0...6)(6...6)*(6...0)
      )
    

Taking just one of those pieces, what is (0...3)? It is just

    
    
      (
         3 |
         (0...4)(4...4)*(4...3) |
         (0...5)(5...5)*(5...3) |
         (0...6)(6...6)*(6...3)
      )
    

And so on. If you expand all of the pieces out eventually you'll come up with
a horrible regexp without my invented notation that will test for divisibility
by 7. (And nothing stops you from repeating that for divisibility by anything
else.)
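
The expansion can be done mechanically. Below is a rough Python sketch that
turns (n...m) into a plain regex by recursion on the lowest intermediate
remainder k. The names `seg`, `DIV7` and `divisible_by_7` are hypothetical, and
the sketch assumes a modulus of at most 10, so that every pair of remainders is
connected by at least one single digit.

```python
import re
from functools import lru_cache

MOD = 7  # assumed to be at most 10, see the note above


@lru_cache(maxsize=None)
def seg(n, m):
    """Expand (n...m) into a plain regex string (non-capturing groups only)."""
    # One-digit steps that go straight from remainder n to remainder m.
    alts = [str(d) for d in range(10) if (10 * n + d) % MOD == m]
    # Longer paths, grouped by their lowest intermediate remainder k.
    for k in range(max(n, m) + 1, MOD):
        alts.append(f"{seg(n, k)}{seg(k, k)}*{seg(k, m)}")
    return "(?:" + "|".join(alts) + ")"


# The desired expression is (0...0)+.
DIV7 = re.compile(seg(0, 0) + "+")


def divisible_by_7(text):
    return DIV7.fullmatch(text) is not None


print(len(DIV7.pattern))  # size of the fully expanded pattern
print([n for n in range(60) if divisible_by_7(str(n))])
```

Each level of nesting roughly quadruples the pattern, so the result runs to
tens of thousands of characters, which fits the "horrible regexp" prediction.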

~~~
eru
My approach was more mechanistic, but may boil down to the same idea.

I built up the finite automaton for testing for divisibility by 7, with one
state for each possible remainder from 0 to 6. Then I compiled this description
of an automaton into a regular expression using the textbook algorithm.
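
For readers who want to try this, here is a sketch of one textbook
construction, state elimination, in Python. The thread does not say which
algorithm was actually used, so treat this as an assumption;
`div_regex_by_elimination` is a made-up name. Remainder 0 serves as both start
and accepting state, so the result also accepts the empty string.

```python
import re


def div_regex_by_elimination(mod=7):
    """Regex for decimal strings whose value is 0 modulo `mod`, obtained by
    eliminating the states of the remainder automaton one at a time."""
    # edge[i][j]: regex for getting from remainder i to remainder j,
    # or None if no such path has been recorded yet.
    edge = [[None] * mod for _ in range(mod)]
    for i in range(mod):
        for d in range(10):
            j = (10 * i + d) % mod
            edge[i][j] = str(d) if edge[i][j] is None else f"{edge[i][j]}|{d}"

    # Eliminate remainders mod-1, ..., 1; remainder 0 stays as start/accept.
    for k in range(mod - 1, 0, -1):
        loop = f"(?:{edge[k][k]})*" if edge[k][k] is not None else ""
        for i in range(k):
            for j in range(k):
                if edge[i][k] is None or edge[k][j] is None:
                    continue
                via_k = f"(?:{edge[i][k]}){loop}(?:{edge[k][j]})"
                edge[i][j] = via_k if edge[i][j] is None else f"{edge[i][j]}|{via_k}"

    return f"(?:{edge[0][0]})*"


pattern = re.compile(div_regex_by_elimination(7))
print(bool(pattern.fullmatch("343")), bool(pattern.fullmatch("344")))
```

The order in which states are eliminated changes the shape and size of the
output, but not the language it accepts.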

~~~
btilly
Of course, from your description, both of us have incorrect algorithms, because
we'd match things like '0777'. Instead you need to match 0, or match 1-9
followed by a sequence that leads to a remainder of 0, followed by any number
of (0...0) loops. (I still can't download the link, so I can't test whether you
actually have this bug.)

If you add to my notation (n.>.m) for a match for any sequence descending from
n that finally reaches m, then the answer is something like this:

    
    
      ^(
        0 |
        (
          (
            7 |
            (1|8) (1.>.0) |
            (2|9) (2.>.0) |
            3 (3.>.0) |
            4 (4.>.0) |
            5 (5.>.0) |
            6 (6.>.0)
          ) (0...0)*
        )
      )$
    

And, of course, you can expand something like (4.>.0) out into

    
    
      (4...4)* (
        (4...0) |
        (4...1) (1.>.0) |
        (4...2) (2.>.0) |
        (4...3) (3.>.0)
      )
    

And now we'll get an even bigger heinous mess. But a more correct one.
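
When testing an expanded pattern like this, it helps to have a plain reference
check for the language it is meant to accept: the string "0", or a digit string
with no leading zero whose value is divisible by 7. A small Python sketch
follows. The function name is hypothetical, and it is only a specification to
compare against, not a regex.

```python
def is_canonical_multiple_of_7(text):
    """Reference check: "0", or digits without a leading zero whose value
    is divisible by 7."""
    if text == "0":
        return True
    if not text or text[0] == "0" or not all(c in "0123456789" for c in text):
        return False
    remainder = 0
    for ch in text:
        # Appending a digit maps remainder r to (10*r + digit) mod 7.
        remainder = (10 * remainder + int(ch)) % 7
    return remainder == 0


for s in ["0", "777", "0777", "", "1001", "10"]:
    print(repr(s), is_canonical_multiple_of_7(s))
```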

~~~
eru
I do match 0777, and consider it the correct thing to do. I don't care here
whether some programming languages consider leading zeros to indicate octal.
Also I treat the empty string as equal to zero.

It would be easy to extend the regex so that it would not match numbers with
leading zeroes or the empty string, if one wanted to.

If you give me your email address, I can send you the regex and the program
that generates it. (My address is in my profile, or you can just post yours here.)

~~~
eru
P.S. Try this link <http://pastebin.com/q2AXes8u>

~~~
btilly
Thanks, that link works.

It looks like it did something similar to the strategy I described, but it
ordered the states differently. It also inserted a lot of unnecessary
parentheses. I also dislike the need for \ everywhere. With Perl compatible
REs you don't need them, and that gets rid of a lot of line noise.

I'm tempted to put together a Python program to generate these, with options
to control how much they expand out.

~~~
eru
I removed most of the unnecessary parentheses now. I also put my program on
<http://github.com/matthiasgoergens/Div7>

I targeted grep, because I know that grep handles regular expressions
properly. Perl does a very bad job, according to
<http://swtch.com/~rsc/regexp/regexp1.html>

Perl can take exponential time to match (or reject). grep always finishes in
linear time, because grep does regular expressions according to theory, while
Perl does something strange.

To quote: "This is a tale of two approaches to regular expression matching.
One of them is in widespread use in the standard interpreters for many
languages, including Perl. The other is used only in a few places, notably
most implementations of awk and grep. The two approaches have wildly different
performance characteristics ... The trends shown in the graph continue: the
Thompson NFA handles a 100-character string in under 200 microseconds, while
Perl would require over 10^15 years."
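
To illustrate the idea, here is a small Python sketch of the set-of-states
simulation for the benchmark pattern as I recall it from that article: n copies
of `a?` followed by n copies of `a`. The function name is made up, and this is
not how grep is implemented; it only shows why tracking all reachable positions
at once avoids backtracking.

```python
def matches_optional_then_required(text, n):
    """Simulate the NFA for n copies of 'a?' followed by n copies of 'a'
    by tracking the set of reachable pattern positions (0 .. 2n)."""

    def close(positions):
        # Each of the first n positions is an optional 'a' and may be skipped.
        result = set(positions)
        for p in range(n):
            if p in result:
                result.add(p + 1)
        return result

    current = close({0})
    for ch in text:
        if ch != "a":
            return False
        # Every one of the 2n pattern positions consumes exactly one 'a'.
        current = close({p + 1 for p in current if p < 2 * n})
    return 2 * n in current


print(matches_optional_then_required("a" * 100, 100))
print(matches_optional_then_required("a" * 99, 100))
```

The work per input character is bounded by the pattern size, so the running
time grows linearly with the length of the text.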

P.S. Good news: I found out that grep -E makes the backslash unnecessary. At
<http://pastebin.com/Dr7xk8in> you will find the new and shorter version.

~~~
btilly
There are disaster regular expressions where Perl will be slow. But most of
the common disasters disappeared a number of years ago. And your regular
expression is one that is unlikely to have performance problems due to
backtracking. (Though the regular expression engine may hate the size.)

That said, Russ Cox is generally well worth listening to.

------
fleitz
An arithmetic test for divisibility by 7. (x % 7) == 0

~~~
eru
What my program did was compile this expression down to the weaker language
of regular expressions.

------
petrilli
"Some people, when confronted with a problem, think 'I know, I'll use regular
expressions.' Now they have two problems." Jamie Zawinski on
alt.religion.emacs

------
eru
Just pop it into "grep -ex". I know the regexp is horrible. But I have not yet
had time to make the program that created the regexp clean it up as well.

------
Natsu
You can make a division regex for arbitrary numbers. Turn the number into
unary, then see if it matches the divisor (also in unary) repeated N times
with no characters left over.
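
A minimal Python sketch of that idea, with the decimal-to-unary conversion done
in ordinary code outside the regex. The function name is hypothetical, and the
approach is only practical for small numbers because the unary string is as
long as the number itself.

```python
import re


def divisible_via_unary(number, divisor):
    """Write `number` in unary, then ask whether the unary string is a whole
    number of copies of the unary divisor."""
    unary = "1" * number  # conversion happens outside the regex
    pattern = f"(?:{'1' * divisor})*"
    return re.fullmatch(pattern, unary) is not None


print(divisible_via_unary(21, 7), divisible_via_unary(22, 7))
```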

I know someone posted something like that to HN a while ago.

~~~
eru
Yes, but how do you convert from decimal to unary in a regular expression?

~~~
Natsu
In Perl, you can use /e which is probably cheating as far as some are
concerned.

You could also probably do it by repeatedly applying a fairly complex regex
that would continually decrement by 1.

~~~
eru
Oh, I was talking about regular expressions in terms of
(<http://en.wikipedia.org/wiki/Regular_expression#Formal_language_theory>).
You can't do decrementing in this sense.

~~~
Natsu
At least for Perl folks, 'regex' acts as disambiguation, informing people that
they're talking about expressions that aren't truly "regular" in the sense
you're talking about.

But I guess that not everyone uses the term that way.

------
sgdfdfyhdfh
Well done. You get a cookie.

~~~
eru
Would you ship it?

