
Regex Puzzle - mboto
http://www.bbc.co.uk/programmes/articles/5LCB3rN2dWLqsmGMy5KYtBf/puzzle-for-today
======
bluesmoon
Well, there is [https://regexcrossword.com/](https://regexcrossword.com/)

~~~
samjs
The OP converted into this format:
[https://regexcrossword.com/playerpuzzles/595e5542d2433](https://regexcrossword.com/playerpuzzles/595e5542d2433)

~~~
angry_albatross
There's an error in the second "Across" expression.
[^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2} should be
[^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}

~~~
ColinDabritz
This threw me off for a bit.

The \1 is from the original puzzle, and refers to the value of the first
(capture group). I ran into this issue while filling out the puzzle. The \1 is
required to 'propagate' the value in that square to other places. The puzzle
is ambiguous without this fix.
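
To illustrate the difference, here is a minimal Python sketch. It assumes Python's `re` module behaves like the puzzle's regex flavor for these two patterns; the two sample rows and the `row_matches` helper are made up for illustration and are not taken from the puzzle.

```python
import re

# The pattern as transcribed with the error: a second capture group, so the
# two single-character squares are independent of each other.
broken = re.compile(r"[^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2}")
# The corrected pattern: \1 must repeat whatever the first group captured.
fixed = re.compile(r"[^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}")

# Hypothetical 14-character rows, made up for illustration only.
same = "ABCDXEFUGEFXAB"    # squares 5 and 12 both hold "X"
differ = "ABCDXEFUGEFYAB"  # square 12 holds "Y" instead


def row_matches(pattern, row):
    """Return True if the whole row satisfies the pattern."""
    return pattern.fullmatch(row) is not None
```

The broken pattern accepts both rows, while the fixed one accepts only the row where the two squares agree, which is how the backreference removes the ambiguity.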

Thanks for the pointer, I've added the note in the comments. Hopefully the
puzzle is editable.

Lovely puzzle, and it's a great quote. :)

------
bshimmin
Brilliant. My dad is 71, loves puzzles (like cryptic crosswords and Sudoku),
is a huge technophobe, and has just retired. This should keep him busy until
about 2022.

~~~
baron816
Technophobe or technophile?

~~~
bshimmin
Technophobe. Just acquiring the necessary Google skills to find out what the
proper regex rules are will probably take him till Christmas. But hey, the man
loves a challenge.

------
canada_dry
Regex is one of those tools that I use a couple times a year - usually for
cleaning up lousy input data.

I always end up spending a fair amount of time using tools like:

[http://regex.inginf.units.it/](http://regex.inginf.units.it/)

[https://regex101.com/](https://regex101.com/)

[http://www.regexr.com/](http://www.regexr.com/)

And of course stackoverflow.

~~~
squeaky-clean
I will always keep a Windows install at the ready just for RegexBuddy. I use
it mostly to take a regex and generate the code I need for it (e.g. find the
first numbered group in match in javascript), without having to remember
language specific details.

~~~
rgb122
Why don't you just learn Linux and pcre? What use are 2-bit windows
tools? Don't fill your head with windows - a dead os walking

~~~
dahart
I think parent said why: there's no RegexBuddy on Linux. I can see you're new,
so I won't be harsh but HN isn't the place for this kind of commentary.
Everyone here understands the difference between Windows and Linux. Judging
and/or trolling over Windows is boring, take it over to Reddit or something.

~~~
squeaky-clean
Yes, thank you. I of course use Linux, but I really don't care to remember the
specifics of regex libraries across PHP, Python, JS or Java. So I just work
out my regex, and from the drop downs choose "Use->Javascript->Chrome->Get
Text From Numbered Group". And it spits out like 6 lines of JS that will
handle cases of it either being found or not. You can choose the names of the
ingoing and outgoing variables.

You don't always get to pick the flavor of regex engine you're using and I've
sort of become (partly because of RegexBuddy) the "regex expert" at the
office. Aside from `re` in Python I don't even remember the names of the regex
libraries. Why should I?

~~~
dahart
The funny part (to me) is that it was already obvious you use Linux or mac --
some flavor of (star)nix -- because you said you keep a Windows install at the
ready. That implies to me that Windows isn't your primary OS. I keep a Windows
install at the ready too, for a whole bunch of reasons that have nothing to do
with how much I like or dislike Windows.

I would _love_ to be able to remember regex specifics from lib to lib and app
to app, but try as I might, I can't. I never know if I have lookaheads or
backrefs or named captures available and what the syntax is, I can't remember
if there are named character classes. I end up reading the docs, again, almost
every time I dig into a regex problem. Same reason for me- I use too many
flavors of regex libs. If I could stick to one language, I'd have some hope.

I haven't tried RegexBuddy, but now I'm going to because of your comment,
thanks for sharing!

~~~
squeaky-clean
I highly recommend it if you have to deal with regex a lot. I really wish it
was open-source, or there was some OSS alternative as good as it, but oh well.
The tools linked above are great for simpler usage.

The built in regex step-debugger is also great, though I've learned that if I
have to rely on that, it's probably not a task well suited to regex.

------
hokkos
I've worked on a project where some XSD files defined fields with regex
restrictions, and rules over those fields added other, stricter regexps or
negative regexps depending on context, in a format called Schematron. I
had to generate XML files conforming to those XSDs, so I used some tools built
around the Z3 solver and Microsoft.Automata to generate strings conforming to
multiple regexps. They would convert the regexps to finite automata, intersect
them, and walk the result from the starting state to a final one over a
charset.

Links :

[https://www.microsoft.com/en-us/research/publication/symboli...](https://www.microsoft.com/en-us/research/publication/symbolic-automata-the-toolkit/)

[https://www.microsoft.com/en-us/download/details.aspx?id=523...](https://www.microsoft.com/en-us/download/details.aspx?id=52302)

It now seems to be Open Source (MIT):

[https://github.com/AutomataDotNet/Automata](https://github.com/AutomataDotNet/Automata)
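
The intersect-and-walk idea can be sketched in a few lines of Python. This is not the Microsoft.Automata API: the two DFAs below are written by hand rather than compiled from regexps, and the `shortest_common_string` name and the tuple layout are assumptions made for this example.

```python
from collections import deque


def shortest_common_string(dfa_a, dfa_b, alphabet):
    """Breadth-first walk of the product automaton of two DFAs.

    Each DFA is a (start, accepting_states, transitions) tuple, where
    transitions maps (state, symbol) -> next state. Returns the shortest
    string accepted by both, or None if the intersection is empty.
    """
    start_a, accept_a, delta_a = dfa_a
    start_b, accept_b, delta_b = dfa_b
    start = (start_a, start_b)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (state_a, state_b), text = queue.popleft()
        if state_a in accept_a and state_b in accept_b:
            return text
        for symbol in alphabet:
            nxt = (delta_a[(state_a, symbol)], delta_b[(state_b, symbol)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, text + symbol))
    return None


# Shared transitions for counting a's modulo 2.
parity = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
even_as = (0, {0}, parity)  # even number of a's, i.e. (b*ab*a)*b*
odd_as = (0, {1}, parity)   # odd number of a's
# Hand-built DFA for "ends with ab", i.e. [ab]*ab.
ends_ab = (0, {2}, {(0, "a"): 1, (0, "b"): 0,
                    (1, "a"): 1, (1, "b"): 2,
                    (2, "a"): 1, (2, "b"): 0})
```

Here `shortest_common_string(even_as, ends_ab, "ab")` walks pairs of states until both components are accepting. A real tool would also need the regexp-to-automaton step, which is left out of this sketch.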

~~~
eru
There's also redgrep
([https://github.com/google/redgrep](https://github.com/google/redgrep)) that
supports intersection and complements of regular expressions.

I am toying with the idea of writing a little game where player A thinks of a
regular expression, and player B tries to guess. If B guesses right, they win.
If B guesses wrong, A has to provide a false positive and a false negative (if
they exist), and B gets to guess again.

Can you think of ways to automate the roles of A and/or B?
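
As a rough starting point for automating player A, here is a bounded brute-force sketch in Python. It only searches strings up to a fixed length over a small alphabet, so it can miss counterexamples and cannot prove two expressions equivalent; the `counterexamples` name, the default alphabet, and the length limit are all assumptions for this example.

```python
import re
from itertools import product


def counterexamples(secret, guess, alphabet="ab", max_len=6):
    """Search short strings for disagreements between two regexes.

    Returns (false_positive, false_negative): the first string the guess
    accepts but the secret rejects, and the first string the secret accepts
    but the guess rejects. Either is None if nothing is found within max_len.
    """
    secret_re, guess_re = re.compile(secret), re.compile(guess)
    false_positive = false_negative = None
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            in_secret = secret_re.fullmatch(text) is not None
            in_guess = guess_re.fullmatch(text) is not None
            if in_guess and not in_secret and false_positive is None:
                false_positive = text
            if in_secret and not in_guess and false_negative is None:
                false_negative = text
    return false_positive, false_negative
```

An exact version would presumably compare the two automata directly (for example via intersection and complement) instead of enumerating strings.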

~~~
long
In computer science academia, this kind of game is called grammar induction
(of which inferring regular expressions is a special case).

A classic algorithm for inferring regular expressions was given by Angluin:
[https://people.eecs.berkeley.edu/~dawnsong/teaching/s10/pape...](https://people.eecs.berkeley.edu/~dawnsong/teaching/s10/papers/angluin87.pdf)

(This isn't quite the same setup as you're thinking of but there are a ton of
variations on the basic idea)

~~~
eru
Thanks. I had figured out that grammar induction was the right word to look
for a while ago. (But took me a bit to find it.) I know the paper you linked
to, but yes, it's not quite the right setup.

~~~
long
There's a conference on grammar induction called ICGI; might wanna browse
through the proceedings to see if there's anything closer.

~~~
eru
Thanks!

I'm basically interested in the equivalent of the "guessing game" for regular
expressions. (See eg
[https://stackoverflow.com/questions/5440688/the-guess-the-nu...](https://stackoverflow.com/questions/5440688/the-guess-the-number-game-for-arbitrary-rational-numbers)
for a rational number solution.)

With a fixed guesser, that would encode all regular expressions / finite
automata as sequences of binary digits. (But in an interestingly different way
from just serializing the table for a DFA, or writing down the regular
expression in ASCII characters.)

~~~
long
So I do AI research on something pretty related to the guessing game -- I'll
shoot you an email.

------
jgrahamc
Worth doing this by hand to exercise your knowledge of regular expressions. My
solution (SPOILER): [http://imgur.com/a/9iK9J](http://imgur.com/a/9iK9J)

~~~
rootlocus
I'm assuming the solution isn't unique because I found some positions that are
under-constrained.

~~~
vhold
I only found one column to have multiple solutions, and saved it for last, at
which point only one option made sense.

------
KineticLensman
This BBC report refers to a puzzle released by the UK's National Cyber
Security Centre [1], as part of an online recruitment effort.

[1] [https://www.ncsc.gov.uk/news/take-our-regex-crossword-challe...](https://www.ncsc.gov.uk/news/take-our-regex-crossword-challenge)

~~~
arien
So I suppose it's a one-time thing only? A shame, it was quite fun to solve!

~~~
simlevesque
There you go: [https://regexcrossword.com/](https://regexcrossword.com/)

------
Cephlin
Wow, finally a crossword I have a chance at!

------
dbrgn
If you want a challenge, try this one:
[http://twiki.org/p/pub/Codev/TWikiPresentation2013x03x07/reg...](http://twiki.org/p/pub/Codev/TWikiPresentation2013x03x07/regex-crossword-puzzle.png)

~~~
gregable
I also created an HTML-based version of this one some time ago that allows
rotation and color codes the rows as matching or not:
[https://gregable.com/p/regexp-puzzle.html](https://gregable.com/p/regexp-puzzle.html)

~~~
proactivesvcs
Thanks for the gregex in HTML format! (gets coat)

------
andyjohnson0
I know that there are problems to do with regex matching that are NP-hard. So
I'm wondering if it is possible to attack this puzzle using an algorithm that
simplifies the individual regexes using knowledge of the regexes that
they interact with?

~~~
eutectic
This problem is NP-hard by reduction from SAT. Treat each column as a truth
variable and use the rows to encode CNF clauses. For example, `(A | ^C)`
becomes `(1..)|(..0)`. Then set all the column regexes to `( 0* )|( 1* )` to
enforce a consistent truth value for each variable.

~~~
jonahx
Could you elaborate on the encoding? What are valid mappings?

~~~
eutectic
A variable becomes '1' in the corresponding position and '.' everywhere else,
and similarly with negations of variables and '0'. The regex is then just an
alternation of these sub-regexes. This incurs just a linear blow-up in the
number of variables, for the '.'s.

For an n x m grid, you can encode any CNF formula with n clauses on m
variables. See here if you are unfamiliar with CNF:
[https://en.wikipedia.org/wiki/Conjunctive_normal_form](https://en.wikipedia.org/wiki/Conjunctive_normal_form)
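
A small Python sketch of this encoding, for concreteness. The integer clause format (k for variable k, -k for its negation, numbered from 1) and the function names are assumptions for this example, and the brute-force `solve` is only there to check the encoding on tiny inputs.

```python
import re
from itertools import product


def clause_to_regex(clause, num_vars):
    """Encode one CNF clause as a row regex.

    clause is a list of non-zero ints: k means variable k, -k its negation
    (variables are numbered from 1; this numbering is an assumption here).
    """
    parts = []
    for literal in clause:
        cells = ["."] * num_vars
        cells[abs(literal) - 1] = "1" if literal > 0 else "0"
        parts.append("(" + "".join(cells) + ")")
    return "|".join(parts)


def solve(cnf, num_vars):
    """Brute-force the crossword: since every column is all 0s or all 1s,
    every row holds the same string, namely the truth assignment."""
    rows = [re.compile(clause_to_regex(c, num_vars)) for c in cnf]
    for bits in product("01", repeat=num_vars):
        assignment = "".join(bits)
        if all(r.fullmatch(assignment) for r in rows):
            return assignment
    return None
```

With three variables A, B, C, `clause_to_regex([1, -3], 3)` reproduces the `(1..)|(..0)` row from the example above.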

------
mcbobbington
I love regexes. In addition to doing cool things and saving time, I feel like
I'm a "real programmer" whenever I write a good one.

~~~
gargarplex
This comic artistically renders that feeling. I, too, know it well.

[https://xkcd.com/208/](https://xkcd.com/208/)

------
Already__Taken
Anyone know a decent android app for these? the MIT one has the most insane
and broken scrolling functionality it's shocking.

------
Xophmeister
That wasn't as hard as I thought it would be. I was worried that, without
start/end of string anchors, things could get quite hairy, but the biggest
stretch of logic was just, "There are five spaces for me to fit a character,
an optional character and two two-character sequences. Therefore that
optional character must not appear."

------
Emyr42
Column H pattern starts [MVFU]{2}, and 3 of those options don't match the Row
0 pattern, leaving "U"

The published solution says H0 should be "S".

~~~
Emyr42
The version at
[https://www.ncsc.gov.uk/content/files/regex_cross_hard_v3.pn...](https://www.ncsc.gov.uk/content/files/regex_cross_hard_v3.png)
has S[MVU]... for column H.

Guess nobody tested it.

------
shabble
Does any common regex format/dialect require '\\-' for a literal hyphen? AFAIK
it's only special inside character classes, and escaping it doesn't
necessarily work there if it would form a valid range identifier.

~~~
okdana
I don't know of any that _require_ it. But it's common to see punctuation
characters escaped like that because Perl (and PCRE and its various
cousins/descendants) allows you to escape any non-meta-character and have it
treated as a literal.

I suppose the two main benefits are

(a) neither the writer nor the reader has to remember which punctuation
characters are meta-characters (you just have to remember that it's always a
literal if it's escaped), and

(b) in implementations like PHP's which try to replicate the Perl-style
'delimited' syntax (e.g., /foo/), it prevents characters in the pattern from
conflicting with the delimiters.

Maybe there's some other advantage but I can't think of what.
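
For one concrete data point, this is how Python's `re` module treats the hyphen; other flavors may differ, so take it as a sketch of a single dialect. The `matches` helper is just a convenience defined for this example.

```python
import re


def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


outside_class = matches(r"a\-b", "a-b")        # escaped hyphen outside a class
range_hits_m = matches(r"[a-z]", "m")          # unescaped: a range from a to z
range_hits_hyphen = matches(r"[a-z]", "-")     # the range does not include "-"
escaped_hits_hyphen = matches(r"[a\-z]", "-")  # escaped: the set of a, -, z
escaped_hits_m = matches(r"[a\-z]", "m")       # no longer a range
```

In this dialect the escape is optional outside a class and does turn a would-be range into a literal hyphen inside one.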

------
jwilk
Direct link to the crossword:

[https://ichef.bbci.co.uk/images/ic/976xn/p057t19t.jpg](https://ichef.bbci.co.uk/images/ic/976xn/p057t19t.jpg)

------
ape4
Since the clues are machine parsable it should be machine solvable.

~~~
hermanschaaf
It is indeed machine-solvable; I wrote a solver for regexcrossword.com puzzles
a while back ([https://github.com/hermanschaaf/regex-crossword-solver](https://github.com/hermanschaaf/regex-crossword-solver)). It was great
fun, maybe even more than solving the puzzles by hand!

~~~
mtharrison
Will your tool work on this puzzle though? I don't think so because it has
backreferences.
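
For tiny grids, a brute-force check with a backtracking engine handles backreferences without extra work. The sketch below is not the linked solver; it assumes Python's `re` module, and the 2x2 puzzle and the `solve_crossword` name are made up for illustration. It tries every grid, so it is hopeless for a puzzle of the BBC one's size.

```python
import re
from itertools import product


def solve_crossword(row_patterns, col_patterns, alphabet):
    """Return every grid (as a list of row strings) satisfying all clues."""
    rows = [re.compile(p) for p in row_patterns]
    cols = [re.compile(p) for p in col_patterns]
    height, width = len(rows), len(cols)
    solutions = []
    for cells in product(alphabet, repeat=height * width):
        grid = ["".join(cells[r * width:(r + 1) * width]) for r in range(height)]
        if not all(rows[r].fullmatch(grid[r]) for r in range(height)):
            continue
        columns = ["".join(grid[r][c] for r in range(height)) for c in range(width)]
        if all(cols[c].fullmatch(columns[c]) for c in range(width)):
            solutions.append(grid)
    return solutions


# A made-up 2x2 puzzle; the first row clue uses a backreference.
rows = [r"(.)\1", r"AB|BA"]
cols = [r"A.", r"[AB]B"]
```

Calling `solve_crossword(rows, cols, "AB")` returns the list of all grids that satisfy every clue, so it also shows whether a puzzle's solution is unique.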

------
gumby
Nice! At Kepler's in Mountain View you can buy a version of Scrabble that uses
regexes. The designer used to sell it in front of the shop -- he is obviously
a programmer.

~~~
tzakrajs
Could I stop by this afternoon and expect it to be in stock or was this a
temporary offering?

~~~
gumby
It wasn't a short-term item, but poor Kepler's has shrunk so much who knows if
it's in stock or not. I would call them. It wasn't described as using regexes
of course, so you'll have to say something like that special version of
scrabble.

The designer is local so if they no longer stock it you could look
online...but it's better to get it from the shop if you can.

------
timdierks
I believe column E is under-constrained; a solution with column E = "YYYY " or
"OOOO " passes the tests, but is clearly not what's intended.

~~~
timdierks
Never mind, this was an error in the
[https://regexcrossword.com/playerpuzzles/595e5542d2433](https://regexcrossword.com/playerpuzzles/595e5542d2433)
version, which has a (.) where it should have a \1 in row 2 (thanks to
@angry_albatross).

------
IanCal
Fun! I made a few mistakes by writing letters sideways which was then
confusing (C vs U, for example), but this was a nice puzzle.

------
gozur88
That's a very odd thing to see in a mainstream publication.

