

Ask HN: List of words – List of regexes tool? - e7mac

Does anyone know of a tool that takes a list of strings as input and outputs a list of regexes that match those strings? An intelligent regex pattern matcher? If not, what do you think about building something like this?
======
_jomo
There is this insane email validating RegEx [0]. The page says:

> I did not write this regular expression by hand. It is generated by the Perl
> module by concatenating a simpler set of regular expressions that relate
> directly to the grammar defined in the RFC.

There's also the famous xkcd Regex Golf [1]. Peter Norvig writes:

> So that got me thinking: can I come up with an algorithm to find a short
> regex that matches the winners and not the losers?

He then describes his steps for creating a regex from a list of words that
must be matched and a list of words that must not be matched [2].

[0]: [http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html](http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html)

[1]: [https://xkcd.com/1313/](https://xkcd.com/1313/)

[2]: [http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb](http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb)
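For a rough feel of what "matches the winners and not the losers" can look like in code, here is a minimal Python sketch. It is a simplified greedy cover written for illustration, not the code from the notebook in [2]: the candidate set (anchored whole words plus substrings of up to 4 characters), the scoring weights, and the names `candidates`, `matches`, and `find_regex` are all assumptions.

```python
import re

def candidates(winners, max_len=4):
    """Candidate parts: each whole word anchored, plus every short substring."""
    parts = {"^" + re.escape(w) + "$" for w in winners}
    for w in winners:
        for n in range(1, max_len + 1):
            for i in range(len(w) - n + 1):
                parts.add(re.escape(w[i:i + n]))
    return parts

def matches(part, words):
    """Return the subset of words that the regex part matches somewhere."""
    return {w for w in words if re.search(part, w)}

def find_regex(winners, losers):
    """Greedily pick parts that cover all winners and match no loser."""
    winners, losers = set(winners), set(losers)
    if winners & losers:
        raise ValueError("a word cannot be both a winner and a loser")
    # Keep only parts that are safe, i.e. match none of the losers.
    pool = {p for p in candidates(winners) if not matches(p, losers)}
    chosen, remaining = [], set(winners)
    while remaining:
        # Only parts that still cover something are useful; the anchored
        # whole-word parts guarantee this list is never empty.
        useful = [p for p in pool if matches(p, remaining)]
        # Hypothetical score: reward coverage, penalize length; ties are
        # broken by the part's text so the result is deterministic.
        best = max(useful, key=lambda p: (4 * len(matches(p, remaining)) - len(p), p))
        chosen.append(best)
        remaining -= matches(best, remaining)
    return "|".join(chosen)

winners = ["washington", "adams", "jefferson"]
losers = ["clinton", "pinckney", "king"]
pattern = find_regex(winners, losers)
print(pattern)
```

The result is correct by construction (every winner is covered, no loser matches) but not necessarily the shortest possible regex, since a greedy cover can miss better combinations.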

~~~
e3pi
If you are interested in how these giant matchstick regexes are created, there
is a wonderful (famous?) Perl script that generates a regex 6,598 characters
long, more optimized and faster than an earlier attempt at 4,724 bytes, in
Jeffrey Friedl's book Mastering Regular Expressions, 1st edition, O'Reilly,
pp. 312-316, Appendix B: Email Regex Program.

------
alansammarone
Well, this is somewhat vague. There's more than one way of matching any string
- you'd have to be more specific about what exact form you want your regexes
to have.

~~~
e7mac
True. I was thinking about a tool that would give you a list of regexes,
ranked by some factor that aims to surface the pattern most likely to be
useful. For example, if I gave it [name@email.com, name2@something.com], it
would produce all kinds of regex patterns, but ideally point to something like
[a-z0-9]+@[a-z]+\.com

~~~
logn
I think you need to consider more specifically what you want. Another possible
regex would be /name2?@(?:som)?e(mail|thing)\.com/ or even
/(name@email\.com|name2@something\.com)/

I was able to find this:
[https://github.com/noprompt/frak](https://github.com/noprompt/frak)
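
One common way to turn a list of strings into a single regex is to merge shared prefixes through a trie. The Python sketch below illustrates that general idea only; it is not frak's API or implementation, and the names `build_trie`, `trie_to_regex`, and `words_to_regex` are made up for this example. It assumes a non-empty list of words.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_regex(node):
    """Convert a trie node into a regex fragment matching its suffixes."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word can also end at this node, the rest is optional.
    return group + "?" if ends_here else group

def words_to_regex(words):
    """Return an anchored regex matching exactly the given words."""
    return "^" + trie_to_regex(build_trie(words)) + "$"

pattern = words_to_regex(["name@email.com", "name2@something.com"])
print(pattern)
print(bool(re.fullmatch(pattern, "name2@something.com")))
```

This produces the exact-match end of the spectrum: it accepts only the input strings and does not generalize to character classes, which is the part that needs the more specific requirements discussed above.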

