
Given a list of regexes, generate all possible strings that match all of them - draegtun
https://github.com/audreyt/regex-genex
======
Jun8
This is really interesting, but wait: won't the number of strings matching a
given set of regexes be infinite for many kinds of regexes? It's like the
number of phrases in a language that fit a certain grammatical rule.

~~~
bartonfink
The github page mentions that * and + quantifiers are redefined to have limits
of {0,3} and {1,4} respectively. That wasn't what you were thinking of, was
it?
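
To make the effect of those limits concrete, here is a minimal Python sketch
(not genex's actual implementation, which is written in Haskell). The helper
names repeat, star and plus are made up for illustration; only the {0,3} and
{1,4} bounds come from the README as quoted above. With the bounds in place,
a*b+ expands to a finite list.

```python
from itertools import product

def repeat(chars, lo, hi):
    """Yield every string of length lo..hi built from the characters in chars."""
    for n in range(lo, hi + 1):
        for combo in product(chars, repeat=n):
            yield "".join(combo)

def star(chars):
    # '*' treated as {0,3}, per the limits quoted from the README
    return repeat(chars, 0, 3)

def plus(chars):
    # '+' treated as {1,4}, per the limits quoted from the README
    return repeat(chars, 1, 4)

# Bounded expansion of the regex a*b+
matches = [x + y for x in star("a") for y in plus("b")]
print(len(matches))  # 4 choices for a* times 4 choices for b+ = 16
```

Without the bounds the same regex would match infinitely many strings, which
is exactly the concern raised in the parent comment.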

------
andrewcooke
nice idea. it doesn't say whether it can filter/flag duplicates, but that
seems like it would be useful functionality to add if not present (it can
indicate that you've made a mistake, or that your regexp is inefficient[* ]).

it's such a sweet idea, with an obvious implementation and use, that you'd
think it would have been done before. but i've not heard of anything and am
having a hard time finding a good search term that isn't swamped by regexp
howtos.

[* ] having said that, if it's compiled to a dfa then it might not be. a lot
depends on the implementation... [edit] oh, and since a dfa-related package is
a dependency my guess is that he can't display this info, because duplication
will be lost in the transformation. interesting.

~~~
audreyt
Nice suggestion and thanks for the compliment!

The inspiration for this work is the Regexp::Genex module from CPAN:
<http://search.cpan.org/dist/Regexp-Genex/> -- though it uses a random-walk
approach for character classes, instead of enumerating all possibilities.

regex-tdfa was only really used for parsing regexes, so it's certainly
possible to find duplicates. That said, piping the output to "|perl -pe
's/.*\t//' | sort | uniq -d" is quite usable too. :-)
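
For anyone without perl handy, the same filter can be sketched in Python. This
only mirrors the shell pipeline above: strip everything up to the last tab on
each line, then report each string that occurs more than once. The function
name duplicated and the sample lines are made up for illustration; the real
genex output format is not reproduced here.

```python
from collections import Counter

def duplicated(lines):
    """Return, in sorted order, each string that appears on more than one line.

    Everything up to and including the last tab is dropped first,
    like perl -pe 's/.*\t//' followed by sort | uniq -d.
    """
    counts = Counter(line.rstrip("\n").rsplit("\t", 1)[-1] for line in lines)
    return sorted(s for s, n in counts.items() if n > 1)

# Hypothetical input: some prefix field, a tab, then the generated string
sample = ["x\tfoo\n", "y\tbar\n", "z\tfoo\n", "baz\n"]
print(duplicated(sample))
```

In practice you would pass sys.stdin instead of the sample list.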

------
groks
Anyone aware of something which does the exact opposite: given a list of
strings generate a regex which matches all, and only those strings? The
shorter the regex the better.

~~~
audreyt
Unfortunately, the problem of finding a minimal equivalent regex from an
/alt1|alt2|alt3|.../ form is known to be PSPACE-complete (
[http://www.computer.org/portal/web/csdl/doi/10.1109/SWAT.197...](http://www.computer.org/portal/web/csdl/doi/10.1109/SWAT.1972.29)
), as well as not finitely axiomatizable (
[http://www.sciencedirect.com/science/article/pii/S0304397597...](http://www.sciencedirect.com/science/article/pii/S0304397597001047)
).

That means the naïve SMT-solver-based approach in genex will not apply to this
problem... Links/suggestions to relevant research welcome! :-)
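
If the shortest possible regex is not required, an exact one is easy to
build. The sketch below is one common heuristic (merging shared prefixes with
a trie), not something genex does and not a minimization algorithm; the names
build_trie and trie_to_regex are made up for illustration. The result matches
all of the input strings and only those, but is generally not minimal.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_regex(node):
    """Turn a trie node into a regex fragment matching exactly its suffixes."""
    ends_here = "" in node
    alts = [re.escape(ch) + trie_to_regex(node[ch]) for ch in sorted(node) if ch]
    if not alts:
        return ""
    if len(alts) == 1 and not ends_here:
        return alts[0]
    group = "(?:" + "|".join(alts) + ")"
    # If a word also ends at this node, the rest is optional
    return group + "?" if ends_here else group

pattern = trie_to_regex(build_trie(["foo", "foobar", "fox"]))
print(pattern)  # fo(?:o(?:bar)?|x)
```

Use it with re.fullmatch so that longer strings containing a listed word are
rejected.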

~~~
groks
Well goddamn. Thanks for the links :-)

------
kordless
genex '(o|O|0)(_)(o|O|0)'
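
That regex has three choices for each eye and one for the mouth, so there are
3 * 1 * 3 = 9 matching strings. A small Python sketch of the same enumeration
(the variable names are made up, and the exact ordering and formatting of
genex's own output is not reproduced here):

```python
from itertools import product

eyes = ["o", "O", "0"]   # the alternation (o|O|0)
mouth = ["_"]            # the single-choice group (_)

# Cartesian product of the three groups, joined back into strings
faces = ["".join(parts) for parts in product(eyes, mouth, eyes)]
print(len(faces))  # 9
print(" ".join(faces))
```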

