
Regex Golf - subbz
http://regex.alf.nu/
======
josephlord

      ^(?!(..+)(\1)+$)
    

Why does that work on primes? I got it by mistake when fiddling with the
parenthesis locations but I was expecting to have to deal with xx separately.

~~~
surreal
Nice find. It works because it rejects "2 or more x's" repeated "2 or more
times". So xx doesn't get rejected, but any multiple of that (xxxx, xxxxxx,
...) will be. The same way xxx doesn't get rejected, but any multiple of that
(xxxxxx, xxxxxxxxx, ...) will be.

You've solved it using the actual definition of prime numbers, no trickery
needed. Well played.

FYI, you don't need brackets around the \1, so can score 286.

~~~
Aissen
More interesting than the definition of primes, it's almost the definition of
multiplication that is embedded in this regex. We have two numbers (of
occurrences) being multiplied:

- the first one is represented by the group (..+); it is the number of
occurrences n, between 2 and +∞

- the second one is represented by (\1)+. We repeat the first number m
times, with m between 1 and +∞.

So the result of the multiplication is n*(m+1), which cannot be a prime. We
just have to take the opposite with negative lookahead. It's very beautiful
indeed.

See [http://regex101.com/r/qN2fQ8](http://regex101.com/r/qN2fQ8) or
[http://www.regexper.com/#^%28%3F!%28..%2B%29\1%2B%24%29](http://www.regexper.com/#^%28%3F!%28..%2B%29\\1%2B%24%29)
to follow the above explanations.
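
For anyone who wants to check the n*(m+1) reading outside the browser, here is a minimal Python sketch. It assumes Python's `re` engine treats this pattern the same way the site does, and the helper names (`is_prime`, `regex_says_prime`) are made up for the illustration. It compares the regex against plain trial division on strings of x's.

```python
import re

# josephlord's pattern, with the redundant group around \1 removed.
PRIME = re.compile(r'^(?!(..+)\1+$)')

def is_prime(n: int) -> bool:
    """Plain trial division, used only as a reference."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def regex_says_prime(n: int) -> bool:
    # The lookahead body matches when n = a * b with a >= 2 (the group)
    # and b >= 2 (the group plus at least one repeat), i.e. n is composite.
    # The negative lookahead then inverts that.
    return PRIME.search('x' * n) is not None

for n in range(2, 60):
    assert regex_says_prime(n) == is_prime(n)

# Caveat: nothing rejects lengths 0 and 1, so the pattern accepts them too.
print(regex_says_prime(0), regex_says_prime(1))
```

The agreement only holds from 2 upwards; whether the puzzle's word list ever exercises lengths 0 or 1 is not something this thread states.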

~~~
vijucat
Thanks for the web site links! Both are pretty interesting and I actually
learned something from the detailed description(s) that regex101 provides.

(I learned that for (\1)+, "Note: A repeated capturing group will only capture
the last iteration. Put a capturing group around the repeated group to capture
all iterations or use a non-capturing group instead if you're not interested
in the data")

~~~
vijucat
Later, I realized that in a regular program / on the command line, the
negative lookahead can be avoided by using the !~ (doesn't match)
operator, i.e., we just check that the number is not a non-prime using a
simpler regex:

    
    
      DB<63> print "Matches!" if (("x" x 31) !~ /^(..+)\1+$/)

Matches!

    
    
      DB<64> print "Matches!" if (("x" x 18) !~ /^(..+)\1+$/)
    

The Regex Golf site only asserts matches, i.e., it's using =~. That's why the
negation using negative lookahead was needed.

(The simpler regex merely looks for non-primes by matching any number of
characters which are a multiple of two numbers, n x m, i.e., those which can
be factorized. n comes from (..+), m comes from \1+).
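
A rough Python equivalent of the same idea, as a sketch only: Python has no `!~`, so the negation is an ordinary `is None` test, and the function names are invented for the example. A side effect of dropping the lookahead is that the captured group is available, and its length is one of the factors.

```python
import re

# Matches only when the length can be written as n * m with n, m >= 2.
COMPOSITE = re.compile(r'^(..+)\1+$')

def looks_prime(s: str) -> bool:
    # Negate in the host language instead of inside the regex.
    return COMPOSITE.search(s) is None

def found_factor(s: str):
    """Length of the captured group if s has composite length, else None."""
    m = COMPOSITE.search(s)
    return len(m.group(1)) if m else None

print(looks_prime('x' * 31))   # True
print(looks_prime('x' * 18))   # False
print(found_factor('x' * 18))  # 9, because (..+) is greedy and tries long groups first
```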

------
chaz
Shouldn't the objective be to get the lowest score if it's called "golf?"

~~~
danceonfire
The goal in code golf is lower character count, not lower score - which is
also the case here.

~~~
talmand
Therefore, the score it provides should be a running tally of how many
characters you've used, to match the scoring system of golf, in which the
lowest number of strokes wins.

The scoring system for this is incremental which is the opposite of golf.

A proper scoring system with this would provide a character limit (par) for
each section and the goal would be to write a shorter regex formula to
complete the task. Final score would be how many characters under or over the
total character limits (course par) you scored.

Seems this is more like Regex Darts or something like that.

But it's fun nonetheless.

~~~
danceonfire
I don't get what this is all about.

The title was (most probably) derived from Code Golf, which is a competition
in coding something with as few characters as possible. Code Golf was derived
from Golf, where you want to use as few turns as possible.

The score going up and not down, which is done because you get more points the
more objectives you fulfill, does not change the objective of this game or
what it is based on.

~~~
rschmitty
> The score going up and not down, which is done because you get more points
> the more objectives you fulfill, does not change the objective of this game
> or what it is based on.

Golf scoring penalizes you with more "points" by how many strokes you take.

If you are playing a Par 4 and it takes you 6 swings to get in the cup you
just got _penalized_ +2

If your partner gets in the cup in 2 swings he is awarded -2

Therefore "Regex Golf" has its scoring reversed

------
davidbrent
I have the most trouble with regex, and something about seeing my matches as I
type made this incredibly useful for me!

~~~
tokenizerrr
[http://regexpal.com/](http://regexpal.com/) or for a paid desktop program
[http://www.regexbuddy.com/](http://www.regexbuddy.com/), which is most
excellent

~~~
rschmitty
Regexbuddy is great. Well worth the coin I paid way back when

------
danielweber
Many times I would match the exact opposite of what I wanted. Is
there a general rule for inverting regexps? ^ and ?! don't seem general
purpose.

~~~
josephlord
You need to pin the match to the start and end of the string with ^ and $
respectively otherwise the negation just matches an empty string or other
irrelevant string.

~~~
danielweber
Just to follow my thought pattern.

I start with

(.)(.)\2\1

Now to invert I change it to

(?!(.)(.)\3\2)

As you say, the negation matches an empty string. So I put on anchors:

^(?!(.)(.)\3\2)$

But now it matches nothing. I'm not even sure what that regexp says. The
entire string is a negation? Would anything match that regexp?

I see elsewhere on this page that the answer involves putting in an extra
dummy character, putting that new negation-and-dummy-character in parens, and
then requiring that, between the anchors, there be 0-or-more of negation-and-
dummy-character.

^((?!(.)(.)\3\2).)∗$

Two questions:

1. Why are my backrefs still \3 and \2? I added another pair of parens. (I
thought ?! might not count, but it counted in my second example above.)

2. Why does abba no longer match? It has no matches to the
negation-and-dummy-character construct, which ∗ should match, right?

NB: I used ∗ as my asterisk to avoid bb-code.
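
One way to poke at both questions is to ask a regex engine directly. The sketch below uses Python's `re` module, which is an assumption (the site's engine may differ in details), with an ordinary asterisk in place of ∗. Capturing groups are numbered by where their opening parenthesis sits, and `(?!` opens a lookahead rather than a capturing group.

```python
import re

bare = re.compile(r'(.)(.)\2\1')
tempered = re.compile(r'^((?!(.)(.)\3\2).)*$')

print(bare.groups)      # 2: the two (.) are groups 1 and 2
print(tempered.groups)  # 3: the outer wrapper is group 1, the (.) are 2 and 3

def passes(word: str) -> bool:
    return tempered.search(word) is not None

# At index 0 of "abba" the lookahead sees a-b-b-a, so the "." is not allowed
# to consume anything there. Zero repetitions are still legal, but then $
# has to match at index 0 of a non-empty string, which it cannot.
print(passes('abba'))   # False
print(passes('abab'))   # True
```

So the wrapper repeats "one character that does not start an xyyx run", and a single position that does start one blocks the whole match.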

~~~
josephlord

      ^(?!.*(.)(.)\2\1)
    

You may also need to fill in the places where it could be anything. The above
worked on abba for me.

BTW a double space indent then formats as code on HN I think.

~~~
Procrastes
Your approach scores higher, but it only matches the (imaginary) space before
the "good" words.

I went with:

^(?!. _(.)(.)\2\1)._ $

with the thought that if I really wanted those matches I would want the whole
strings.

Fun game!

~~~
Procrastes
oops,

    
    
      ^(?!.*(.)(.)\2\1).*$
    

With proper formatting hopefully.
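
The difference between the two versions is only in what text gets matched, not in which words pass. A small Python sketch, assuming Python's `re` behaves like the site's engine here; "noisefully" is borrowed from another comment in this thread as a word that should pass.

```python
import re

prefix_only = re.compile(r'^(?!.*(.)(.)\2\1)')      # josephlord's version
whole_word = re.compile(r'^(?!.*(.)(.)\2\1).*$')    # version with .*$ appended

def matched_text(pattern, word):
    """Return the matched substring, or None if the pattern fails."""
    m = pattern.search(word)
    return m.group(0) if m else None

for word in ('abba', 'noisefully'):
    print(word, repr(matched_text(prefix_only, word)),
          repr(matched_text(whole_word, word)))
# abba None None
# noisefully '' 'noisefully'
```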

------
elwell
I figured out Ranges!

    
    
      abac|accede|adead|babe|bead|bebed|bedad|bedded|bedead|bedeaf|caba|caffa|dace|dade|daff|dead|deed|deface|faded|faff|feed
    

Edit: /s

~~~
abus
Prime:

    
    
        ^x{2,3}$|^x{5}$|^x{7}$|^x{11}$|^x{13}$|^x{17}$|^x{19}$|^x{23}$|^x{29}$|^x{31}$|x{33}

~~~
jonahx
josephlord's solution is much nicer:

^(?!(..+)(\1)+$)

~~~
abus
Look at the parent.

------
hadrel
Glob (333) without cheating (replace ⁕s with asterisks, they get turned into
italics):

^(\⁕?)(\w⁕)(\⁕?)(\w⁕)(\⁕?)(\w⁕) .⁕
((.(?!\1))+|\1)\2((.(?!\3))+|\3)\4((.(?!\5))+|\5)\6$

Edit: ((.(?!\1))+|\1) is used to conditionally match .+ iff a * has been
found. .(?!\1) matches any character that is not followed by \1. When * has
been found it matches every character that is not followed by a literal *;
when * is not found, \1 is empty, the lookahead always fails, and it matches
no character.

Edit 2: Formatting to avoid the *s becoming italics :/

~~~
josephlord

      le[^*]|co|ito|dr|^p|su|gi|nr|hw|fa|[eo]b|ide
    

Glob 376 although it isn't pretty and better may be possible.

Indent two spaces with a blank line above to avoid code mangling.

~~~
falsedan
Glob 380

    
    
      ^([wlpb]|c[hor]|do|re|mi|\*[pifvt]|\*er)

------
sbirch
An interesting bit on the computational complexity of solving this problem
(with a slightly different scoring function):

[http://cstheory.stackexchange.com/questions/1854/is-finding-...](http://cstheory.stackexchange.com/questions/1854/is-finding-the-minimum-regular-expression-an-np-complete-problem)

------
ColinDabritz
Hrm, on number 8 "Four" using:

    
    
        (.)(.*\1){3,}
    

I got all but the "do not match" for "Ternstroemiaceae"

The challenge appeared to be to match words with four instances of the same
letter. "Ternstroemiaceae" contains four 'e's, and thus should be in the
"match" column, instead of the "don't match" column, no? Did I miss something?

~~~
denkfaul
Look closer, there's something different about the matches and this word.

------
amix
JavaScript solver:
[https://gist.github.com/amix/8063003](https://gist.github.com/amix/8063003)
;-)

------
daGrevis
I like the idea, but words there seem to be pretty random. I can't figure out
the pattern, not even talking about writing regex... :(

~~~
rplnt
Read a) name of the puzzle b) the little help below the score. It gives out
the pattern you should (not) look for.

edit: Not in Glob. What is Glob about?

~~~
schoen
It's about implementing * as a wildcard character:

[https://en.wikipedia.org/wiki/Globbing](https://en.wikipedia.org/wiki/Globbing)

I haven't figured out how to solve this with the parts of ERE and PCRE that I
know. (I definitely don't know the entirety of PCRE.)

It's straightforward for me to write a substitution using regular expressions
to create a pattern-matcher for a given glob (just anchor the ends and replace
literal ? with . and literal * with .*) but here we have to do it inside a
single regular expression.

I don't think there's a BRE solution if the number of stars is unbounded
because I don't think this is a regular language.
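
The substitution approach described above (the easy version, not the single-regex puzzle) could look roughly like this in Python. The function names are invented for the example, and only * and ? are treated as wildcards.

```python
import re

def glob_to_regex(glob: str) -> str:
    """Translate a glob with * and ? wildcards into an anchored regex string."""
    parts = []
    for ch in glob:
        if ch == '*':
            parts.append('.*')           # any run of characters, possibly empty
        elif ch == '?':
            parts.append('.')            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    return '^' + ''.join(parts) + '$'

def glob_matches(glob: str, text: str) -> bool:
    return re.search(glob_to_regex(glob), text) is not None

print(glob_to_regex('*ab?'))             # ^.*ab.$
print(glob_matches('*ab*', 'cabbage'))   # True
print(glob_matches('ab*', 'cabbage'))    # False
```

The puzzle is harder because the glob and the text arrive in the same string, so the translation step has to be emulated with captures and backreferences instead.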

------
jpsim
Gist with my answers:
[https://gist.github.com/jpsim/8057500](https://gist.github.com/jpsim/8057500)

If you look at the revisions, you'll see my 1st iteration was mostly
identifying patterns, then with more and more cheating (and looking at this
thread) to squeeze every point possible.

~~~
galen_tyrol
[https://gist.github.com/jonathanmorley/8058871](https://gist.github.com/jonathanmorley/8058871)

3121 points

------
JadeNB
What are the rules? That is, are these Perl regexes, POSIX regexes, …? (Come
to that, what _is_ this site? Going up one level to alf.nu gives me a lot of
suggestions for what I can do by modifying the address, but no clue of who's
doing it on my behalf.)

------
hyp0
challenge: use machine learning to find the best solutions.

They might improve on those intended by exploiting accidental regularity in
the corpus - though charmingly, the golf-cost of regex length helps combat
this overfitting. They might also find genuinely cleverer solutions.

------
hadem
Reminds me of Vim Golf.

[http://vimgolf.com/](http://vimgolf.com/)

------
The_Double
Does anybody know how to do math with regex? (triples)

And conditionals don't seem to work?

~~~
danielweber
My guess is that 147 must appear the same number of times as 258, but I'm not
sure if that's even expressible in regexp.

~~~
johnlbevan2
Refined: [0369] can appear any number of times. [147] and [258] must appear an
equal number of times, or, for whatever remains after pairing them off, [147]
must appear a multiple of 3 times and [258] must appear a multiple of 3 times.
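
Assuming "Triples" is about multiples of 3 (the thread implies it but never says so), the counting rule can be sanity-checked without any regex. A sketch in Python with a made-up function name:

```python
def rule_says_multiple_of_3(digits: str) -> bool:
    ones = sum(d in '147' for d in digits)   # digits that are 1 mod 3
    twos = sum(d in '258' for d in digits)   # digits that are 2 mod 3
    # Pair each [147] digit with a [258] digit (each pair sums to 0 mod 3);
    # whatever is left over must come in a multiple of three.
    # Digits in [0369] contribute nothing and are ignored.
    return (ones - twos) % 3 == 0

# Compare against ordinary arithmetic on a range of numbers.
assert all(rule_says_multiple_of_3(str(n)) == (n % 3 == 0)
           for n in range(100000))
```

This only confirms the rule itself. Tracking "count difference mod 3" needs just three states, so the language is regular, but writing that out as a short regex is a separate problem.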

------
johnlbevan2
Ternstroemiaceae contains four es; anyone else hit that issue / know why
that's not in the valid results for Four? I'm guessing there's a pun in there
that I didn't get :/

------
lsv1
Fun learning tool for regex.

~~~
chrismorgan
Practise, maybe. Learning, probably not so much. Certainly fun, though, if you
can cope with them!

~~~
zedadex
Learning, kinda. I've often been motivated to learn something after being
given a problem that can only be solved (or be solved much more easily) using
it. Every other Excel trick I know is a result of that.

------
Trufa
I like the concept but the word choice doesn't seem too "regular", it is more
about catching all the particulars rather than finding a pattern as far as I
can tell.

~~~
ZirconCode
There are good patterns, the first regex for me was /foo/, the second /ick$/,
for example. They're there.

------
cpeterso
What purpose do the "plain strings" serve?

------
tareqak
I got Four for 196 with

    
    
        (.).*\1.\1.*\1
    

and Order for 156 with

    
    
        ^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$

~~~
gorhill
Order for 198:

    
    
       ^[^o].....?$
    

Probably not what was wanted, but it works (or maybe it was to trick people
onto a false path)

~~~
zedadex
This right here is why test cases are always blackboxed...

------
scott_karana
For Abba...

Why doesn't (.)(.)\2[^\1] work?

I thought backreferences matched the captured literal, so negating it would
match? But this looks the same as (.)(.)\2\1...

~~~
goldenkey
You cannot negate a capture, only a character literal. A capture might only be
a character but it is NOT a literal.

~~~
scott_karana
Gotcha. That makes perfect sense. Thank you! :)

~~~
goldenkey
You got it :-)
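
To see what `[^\1]` actually does, here is a small sketch in Python's `re` module. That choice is an assumption: other engines may read `\1` inside brackets differently, though generally not as a backreference. The usual workaround is a negative lookahead in front of a plain dot.

```python
import re

# Inside [...] the sequence \1 is not a backreference. Python's re module
# reads it as an octal escape for the control character U+0001, so this
# class means "anything except U+0001".
in_class = re.compile(r'(.)(.)\2[^\1]')

# "One character that is not whatever group 1 captured."
negated = re.compile(r'(.)(.)\2(?!\1).')

print(bool(in_class.search('abba')))  # True: the final "a" is not U+0001
print(bool(negated.search('abba')))   # False
print(bool(negated.search('abbc')))   # True
```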

------
onaclov2000
Glob: 277. It doesn't get every match and non-match right, but it's reasonably
high: ^([bcdlmpwr]|\*[efptv])

~~~
moron4hire
378:

    
    
      ^(\*(er|[fiptv])|b|c(?!a)|do|le|mi|p|re|w)

~~~
chingjun
379:

    
    
      ^\*(er|[fiptv])|^([blpw]|c[hor]|do|re|mi)
    

and it has "do re mi" in it!

~~~
moron4hire
380:

    
    
      ^.[^bds].*[^e-kjotz] .* [^eiz].+[^lx]..$

~~~
ekke
390:

    
    
      ^[lwp]|fa|r[ro]|[isytd]l|c$|de

~~~
3rd3
390:

    
    
        de|eat|fa|rr|ow|[rl]o|^p|[cd]$

~~~
jmallard
Glob 392: ^p|c$|[wrbc][npbro]|ai|fa|il

~~~
nwellnhof
Glob 396: ai|c$|ep|[bcnprw][bnopr]

------
ZirconCode
For "6. A man, a plan", I thought it was impossible to match palindromes with
regex. Am I wrong?

~~~
quarterto
It gives you hints below the score, for 6 it's:

    
    
      You're allowed to cheat a little, since this one is technically impossible.
    

No idea how you cheat...

EDIT, SPOILERS: I get 170 with ^(.?)(.)(.).?\3\2\1$

~~~
mryingster
176 with ^(.)(.).*\2\1$
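
A quick way to see why this counts as cheating: a fixed number of backreferences can only mirror a fixed number of outer characters. The Python sketch below, with test strings made up for the illustration, shows the pattern accepting a non-palindrome whose first and last two letters happen to mirror.

```python
import re

outer_two = re.compile(r'^(.)(.).*\2\1$')

def is_palindrome(s: str) -> bool:
    return s == s[::-1]

for s in ('abxxba', 'abxyba'):
    print(s, bool(outer_two.search(s)), is_palindrome(s))
# abxxba True True
# abxyba True False
```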

~~~
hyp0

      ^(.)[^p].*\1$           # 177, "cheat a little"

------
endophage
Seems very very very broken. The regex "[a]" apparently matches "crenel"

~~~
pomfpomfpomf3
It doesn't. The green ✔ indicates that you've completed the task successfully
— that is, your regex does not match "crenel".

------
Pxtl
I hit enter and nothing happens.

~~~
danielweber
Use the yellow box, not the name box that gets your focus when you land on the
page.

This confused me for a few minutes.

------
bencoder
what's the pattern on "Abba"? I thought it was just to exclude doubled letters
but I have doubles on two words on the left hand side as well (noisefully and
effusive, in case the word lists are the same)

~~~
shdon
Excluding doubled consonants that have the same vowel on both sides of the
pair:

    
    
      ^((?!([aeiou])([^aeiou])\3\2).)*$
    

The following also works for the testcases and is shorter:

    
    
      ^((?!(.)(.)\3\2).)*$

~~~
surreal
Can get 2 more points with:

    
    
      ef|^((?!(.)\2).)*$
    

But I reckon yours wins for having a pair of breasts in the middle

------
joelanman
am I being a bit slow? Why doesn't [^g-z] work on 'ranges'?

~~~
martinml
Because for example "beam" matches _[^g-z]_. That is, it has a letter
somewhere that is not between _g_ and _z_ (namely _e_ and _a_ ). I came up
with _^[a-f]+$_ but I'm guessing it could be shorter :)

Edit: ah, every word in left column has 4 a-f letters. So [a-f]{4} is a
shorter match.
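
The three candidates side by side, as a Python sketch. "beam" is the word from this comment; "faded" is borrowed from elwell's list further up and assumed to be one of the words that should match.

```python
import re

negated_class = re.compile(r'[^g-z]')    # "contains a character outside g-z"
only_a_to_f = re.compile(r'^[a-f]+$')    # "consists solely of a-f"
four_in_a_row = re.compile(r'[a-f]{4}')  # "has four consecutive a-f letters"

def verdicts(word: str):
    return (bool(negated_class.search(word)),
            bool(only_a_to_f.search(word)),
            bool(four_in_a_row.search(word)))

for word in ('beam', 'faded'):
    print(word, *verdicts(word))
# beam True False False
# faded True True True
```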

~~~
joelanman
ah you're right :) I was being slow, what I thought I wrote was 'words
consisting only of letters that aren't g-z'

------
dsschnau
Me and a coworker totaled 3079 points. Anyone beat it?

~~~
ekke
Got hooked and to 3202, but ca 10-20 more points could be gained according to
answers here and there:
[https://gist.github.com/jonathanmorley/8058871](https://gist.github.com/jonathanmorley/8058871)

Kudos to the author of the game, good job.

PS. [http://regexcrossword.com/](http://regexcrossword.com/)

------
ddebernardy
Doesn't seem to do anything on an iPad...

------
xarien
Why is Kesha an optimal answer? (#2 k$) ;)

------
jhight
Spoiler alert (201 points): f[ao][no]

~~~
elwell
You can just write: foo

------
easy_rider
(.+|)foo(.+|)

Lol so awesome this. I oblige. Much love!

~~~
ryanthejuggler
Powers:

    
    
        ^((((((((((x)\10?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?$
    

I feel like there's gotta be a sneakier way of doing this.

~~~
rplnt
Mine is a bit shorter, though a bit more "meh" as well.

    
    
        ^x{32}$|^(x{2}){1,8}$|^(x{64})+$|^x$

~~~
galen_tyrol
improved, gives 80

    
    
        ^(x|(xx){1,9}|x{32}|(x{64})+)$

~~~
rplnt
I tried to get rid of those redundant ^ and $ but it somehow didn't work. I
probably forgot to put it all inside one group.

~~~
alanh
They are not redundant. Any string of one or more exes is going to contain a
substring of exes of length 2^n (consider n=0 for a trivial proof), so you do
need those anchors!
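
The same point in runnable form, using ryanthejuggler's pattern from upthread as the guinea pig. This is a sketch in Python's `re` module, on the assumption that it handles the nested backreferences the way the site does.

```python
import re

CORE = r'((((((((((x)\10?)\9?)\8?)\7?)\6?)\5?)\4?)\3?)\2?)\1?'
anchored = re.compile('^' + CORE + '$')
unanchored = re.compile(CORE)

def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

for n in range(1, 130):
    s = 'x' * n
    # With anchors, only lengths 1, 2, 4, 8, ... get through.
    assert (anchored.search(s) is not None) == is_power_of_two(n)
    # Without anchors, every non-empty run of x's contains a match.
    assert unanchored.search(s) is not None
```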

~~~
rplnt
I meant my previous post where I had them in every single possibility.

------
sriharis
^ answers all questions.

