
RegExr 2.0 - _kushagra
http://www.regexr.com
======
bane
Regex testing is cool, but there are dozens of these kinds of tools and I'd
really love to see some other kinds of regex tools

\- A list generator. Enter a regex, set repetition operator constraints
(e.g. *->{0,3}, +->{1,3}, .->[A-Z0-9 ], etc.) and have it exhaustively
generate a list of matching strings. This is helpful when you have a regex
that matches your test strings and you want to know what _else_ it'll match. The
constraints are to keep it from generating infinite lists. Even if it jams out
tens or hundreds of thousands of produced strings, it's still useful. I've
found that most people just build up the first regex that will "match" their
input text, and move on without thinking about all the edge cases they've just
introduced.
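
A minimal Python sketch of the list-generator idea, assuming a brute-force approach rather than rewriting the repetition operators: it tries every string over a small alphabet up to a maximum length and keeps the ones that fully match. The name `enumerate_matches` and its parameters are made up for illustration.

```python
import itertools
import re


def enumerate_matches(pattern, alphabet, max_len):
    """Return every string over `alphabet`, up to `max_len` characters long,
    that the pattern matches in full (hypothetical helper)."""
    compiled = re.compile(pattern)
    results = []
    for length in range(max_len + 1):
        # itertools.product yields every combination of `length` characters.
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                results.append(candidate)
    return results


# The unescaped dot was probably meant to be a literal ".", but it is not.
matches = enumerate_matches(r"1.5", alphabet="15.x", max_len=3)
print(matches)  # ['115', '155', '1.5', '1x5']
```

The alphabet and length cap play the role of the constraints: they keep the list finite. The cost grows exponentially with `max_len`, so this only works for small bounds.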

\- A regex assembler optimizer. Give it a few regexes, have it assemble them
into one large regex and optimize it. It's got to do better than just | or'ing
all the regexes together. I've seen some work done on using trie variants to
do this, but have no idea how far along the work is on this.
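
One way to picture the trie approach, as a rough Python sketch for literal strings only: shared prefixes are stored once in a trie, and the trie is then written back out as a single regex. This is an illustration of the general idea, not a description of how any particular library works; `build_trie` and `trie_to_regex` are hypothetical names.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Turn a trie node into a regex fragment that shares common prefixes."""
    end_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not end_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word can also end at this node, everything after it is optional.
    return group + "?" if end_here else group


words = ["foo", "foobar", "fool", "bar"]
pattern = trie_to_regex(build_trie(words))
print(pattern)  # (?:bar|foo(?:bar|l)?)
```

Compared with `foo|foobar|fool|bar`, the engine only has to read the shared prefix `foo` once before branching.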

\- A regex list generator. Give it a list of strings you want to match and
have it generate a regex. A sliding "fuzziness" control could tell it to take
alternates in the same character position and substitute either

1\. Just the characters in the given list - a, t and q in the same position
generates a|t|q

2\. A representative narrow character range - if I give it a|t|q it knows to
use [A-Z] while a|t|q|4 might generate [A-Z0-9]

3\. A larger character range, a|t|q might just go ahead and produce [A-Z0-9]

4\. An even larger character range, whatever it is, just use .

And maybe another slider for repetitions, so if I end up with [A-Z][A-Z][A-Z],
should it just produce [A-Z]{3} or can I go ahead and have it [A-Z]+

Jam the result through an optimizer (see previous idea above) to clean up the
regex and maybe even run it through the list generator to check if it produces
only what you want.
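
The fuzziness slider could be sketched in Python like this, assuming all example strings have the same length so they can be compared position by position. The four levels follow the list above; the function names and the exact ranges chosen at each level are assumptions, and the repetition slider is not covered.

```python
import re

# Ranges considered at fuzziness level 2, each paired with a membership test.
RANGES = [("A-Z", str.isupper), ("a-z", str.islower), ("0-9", str.isdigit)]


def generalize(chars, fuzziness):
    """Return a regex fragment for the characters seen at one position."""
    if fuzziness >= 4:
        return "."  # level 4: anything goes
    ascii_alnum = all(c.isascii() and c.isalnum() for c in chars)
    if fuzziness == 3 and ascii_alnum:
        return "[A-Za-z0-9]"  # level 3: one broad class
    if fuzziness == 2 and ascii_alnum:
        # level 2: only the ranges that the seen characters fall into
        used = [rng for rng, test in RANGES if any(test(c) for c in chars)]
        return "[" + "".join(used) + "]"
    # level 1 (or characters outside the ranges above): keep exactly what was seen
    options = [re.escape(c) for c in sorted(chars)]
    return options[0] if len(options) == 1 else "(?:" + "|".join(options) + ")"


def regex_from_examples(examples, fuzziness=1):
    """Build a position-by-position regex from equal-length examples."""
    if len({len(e) for e in examples}) != 1:
        raise ValueError("this sketch only handles equal-length examples")
    return "".join(generalize(set(col), fuzziness) for col in zip(*examples))


for level in (1, 2, 3, 4):
    print(level, regex_from_examples(["cat", "cot", "cut"], level))
```

At level 1 this prints `c(?:a|o|u)t`; higher levels trade precision for generality, ending at `...`.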

~~~
lespea
I actually do the combining idea all the time. As long as the language is
roughly PCRE-compatible, you can use this to spit out your regex and, if
necessary for your alternate language, tweak it a bit so it fits.

I've generated some very massive regexes that are quite speedy.

Merger

    
    
      https://metacpan.org/pod/Regexp::Assemble
    

These are also super handy

    
    
      https://metacpan.org/pod/Number::Range::Regex
      https://metacpan.org/pod/Regexp::Common

~~~
bane
Yeah, Regexp::Assemble was what I had in mind. There's a few that try to
generate a list of matching strings from the expression, but I've never been
satisfied with their output. Either they're slow, or they don't let you
constrain the regex, and none of them generate comprehensive lists for some reason.

------
lelf
People just cannot do unicode even remotely properly. Just cannot.

𝄞 is one char, not two. привет is matched by \w+.

PS there's some advanced stuff, but where are the basic [[:posix:]] char classes?

~~~
eurg
Just to make it clear: It does not even support the basic Latin-1 charset
correctly. Matching my family-name requires manual intervention. This is sad.

It seems a very nice regex page otherwise.

~~~
gskinner
Creator here - can you elaborate? What is your family name? The example in
this thread ("Grüneis") matches and displays correctly in all the browsers
I've tested.

Are you perhaps trying to use a RegEx feature that is not supported by JS?
Currently, RegExr only supports the JS flavour of RegEx.

~~~
eurg
Forget it, I was not used to JavaScript RegEx. I just looked it up on MDN, and
it really defines `\w` to be very limited. Doesn't really make it any better,
but whatever.
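
For comparison, a small Python sketch of the two behaviours discussed in this subthread. Python's `\w` is Unicode-aware by default, and the `re.ASCII` flag restricts it to `[a-zA-Z0-9_]`, which is how JavaScript defines `\w`. The sample strings come from the comments above; using Python as the stand-in is an assumption made for illustration.

```python
import re

word = "привет"
name = "Grüneis"
clef = "\U0001D11E"  # the musical G clef character, outside the Basic Multilingual Plane

# Unicode-aware \w (Python's default for str patterns).
unicode_word = re.fullmatch(r"\w+", word) is not None
unicode_name = re.fullmatch(r"\w+", name) is not None

# ASCII-only \w, equivalent to [a-zA-Z0-9_].
ascii_word = re.fullmatch(r"\w+", word, flags=re.ASCII) is not None
ascii_name = re.fullmatch(r"\w+", name, flags=re.ASCII) is not None

print(unicode_word, unicode_name)  # True True
print(ascii_word, ascii_name)      # False False

# One code point, but two UTF-16 code units (a surrogate pair).
code_points = len(clef)
utf16_units = len(clef.encode("utf-16-le")) // 2
print(code_points, utf16_units)    # 1 2
```

The last two lines show why an engine that works on UTF-16 code units can count that character as two.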

------
shock
> Uh-oh, it looks like your browser is not supported.

> RegExr only supports modern desktop browsers.

I'm using Firefox 30 on Ubuntu. I think it's plenty modern :)

~~~
LukeB_UK
I get the same message with Chrome 34 on Android 4.4.2

~~~
rguldener
Pretty sure Android is not commonly considered a desktop system ;) Though
mobile (or at least tablet) support would be cool

------
markbnj
Very nicely done. As someone else pointed out there are quite a few of these
tools, but I think you've done a really nice job with this one. One
suggestion: make the reference easier to scan at a top level as opposed to
drilling down.

------
nlh
I'm guessing the following is either near-impossible or pure-impossible, but:

Is there a tool that allows you to highlight portions of a string and generate
a corresponding regex? (i.e. the inverse of RegExr)

~~~
gamegoblin
Here is the problem with that:

Consider the string abcdefgh

Guess what!? I have the perfect regex to match your string.

    
    
      "abcdefgh"
    
    

So given a string literal, there is always a regex to match that literal.
Namely, the literal itself.

Really, what you _want_ is a tool that, given several examples, will generate
a regex that matches all of them.

So you'd give it:

    
    
      aaaaabaa
      aabaaa
      aba
      abaaaaa
    

And it'd generate "a+ba+"

The _problem_ with that is, given a corpus with a set of tokens { T0, T1, T2
... }, I can give you a regex that will match the corpus!

    
    
      "[T0 T1 T2 ... ]*"
    

or even

    
    
      ".*"
    

So it will match everything in your corpus! But unfortunately, it will match a
whole lot you don't want, too.

So ideally you want a regex that matches everything in your corpus, but
nothing outside the language you are trying to describe. This requires both
positive and negative learning examples. The problem is that for most
applications, you'd need a _lot_ of negative examples.

Source: Working on this exact problem for graduate research

~~~
nmrm
T0 | T1 | T2 | ... would match exactly the correct thing with all positive
examples, and (T0 | T1 | T2) & !(CE1 | CE2 | CE3) would match exactly the
correct thing with positive and negative examples.

But that's pretty stupid, because you don't generalize beyond your examples.

What's your approach?

_edit: removed random conjecture_

~~~
gamegoblin
You have to have some sort of heuristic that determines what a "good" regex
is, since there are undoubtedly multiple regexes that describe a corpus.

A simple heuristic is the smallest regex.

So in your example, given the training examples:

    
    
      aba
      abaa
      aaaaba
    

and the counter examples:

    
    
      abba
      ba
      ab
    

It's clear to a human I probably want to match "a+ba+". That's clearly much
smaller than ("aba" | "abaa" | "aaaaba") & !("abba" | "ba" | "ab"), so it
would be a "better" regex.
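
The "smallest regex" heuristic can be sketched in Python as follows. It assumes a hand-written list of candidate patterns (a real tool would have to search for them) and uses pattern length in characters as a stand-in for "smallest". The examples and counter examples are the ones given above; `is_consistent` is a hypothetical helper.

```python
import re

positives = ["aba", "abaa", "aaaaba"]
negatives = ["abba", "ba", "ab"]

# Hand-picked candidates, from far too general to fully memorized.
candidates = [r".*", r"[ab]*", r"aba|abaa|aaaaba", r"a+ba+"]


def is_consistent(pattern, positives, negatives):
    """True if the pattern matches every positive and no negative example."""
    compiled = re.compile(pattern)
    return (all(compiled.fullmatch(s) for s in positives)
            and not any(compiled.fullmatch(s) for s in negatives))


consistent = [p for p in candidates if is_consistent(p, positives, negatives)]
best = min(consistent, key=len)

print(consistent)  # ['aba|abaa|aaaaba', 'a+ba+']
print(best)        # a+ba+
```

Both `.*` and `[ab]*` are rejected because they match the counter example `abba`. The memorized alternation survives, and only the size heuristic prefers `a+ba+` over it.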

------
ChrisGaudreau
Reminds me of [http://rubular.com/](http://rubular.com/), except it isn't
Ruby-focused and is more community-based. Seems pretty cool.

~~~
dehrmann
Or [http://www.regexplanet.com/](http://www.regexplanet.com/), but regex
planet supports a lot more flavors.

------
mck-
There is one for Javascript that I use pretty often:
[http://scriptular.com/](http://scriptular.com/) (based on the Ruby one:
[http://www.rubular.com/](http://www.rubular.com/))

------
cygni
[https://www.debuggex.com/](https://www.debuggex.com/) is a nice alternative
that is a little different from other regex sites I've seen.

------
strictfp
Why can I not match against "\w*" for instance? It just says "infinite" and
does not seem to attempt to match.

~~~
gskinner
Creator here - this is because \w* matches 0 characters, and thus matches
infinitely. You can roll over the "infinite" error for details, or look in the
help.

Try \w+ instead.

~~~
strictfp
But \w* matches "" and "abc" but not "!a". How can I test this with your tool
if \w* always says "infinite"?
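
To make the disagreement concrete, here is a Python sketch (Python is used only for illustration; RegExr itself runs the JavaScript engine). Matching the whole string behaves as strictfp describes, while scanning for all matches produces zero-length matches, which is presumably what the "infinite" message refers to.

```python
import re

pattern = re.compile(r"\w*")

for text in ["", "abc", "!a"]:
    # Whole-string test: does \w* describe the entire input?
    whole = pattern.fullmatch(text) is not None
    # Global scan: every match found while walking the string.
    spans = [m.span() for m in pattern.finditer(text)]
    print(repr(text), whole, spans)

# The first match in "!a" is empty: \w* succeeds at position 0 without
# consuming "!".
first_span = next(pattern.finditer("!a")).span()
print(first_span)  # (0, 0)
```

Python's `finditer` advances past an empty match on its own. A hand-written scanning loop that does not do so would keep finding the same empty match at position 0.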

------
hardwaresofton
one of the best regular expression testers online just got better. Great site,
love it

------
tuananh
a side note: i found Patterns app on OS X very useful for regex.

