
Regexper: Beautiful regexp visualizations - trevmex
http://www.regexper.com/
======
javallone
Hello, I'm the creator of this (trevmex is one of my co-workers).

I just want to thank everyone for the feedback so far. I am looking into the
issues that have been brought up (they'll have to wait until this evening to
be fixed, though... I have a day job).

~~~
Tycho
You should build a tool that is the inverse of this. Let people build regular
expressions by creating the visual diagram (some sort of click and drag GUI).

~~~
jbrennan
I've always wondered why there doesn't already exist such a tool. Regex
strings are so opaque and hard to read. We can't really fix that for code
(short of building objects), but what we could fix is a better way to write
them.

A graphical tool to create them, built of units instead of just strings, and
the output could still be a regex string.

~~~
mpyne
KDE 3 had KRegExpEditor (<http://www.blackie.dk/KDE/KRegExpEditor/>) but it
only recently found someone interested in porting it to KDE 4.

------
tibbon
Just a thought: it would be really useful if the author showed a few example
regexes on the page. Rubular does a great job of having a 'key' at the
bottom, so I don't have to remember (or look up) anything.

~~~
marbu
This is a good point. Also mentioning which flavour of regexp is supported
would be nice.

~~~
rtkwe
The input bar says "Javascript style" before you start typing something in.

~~~
tibbon
Since I do mostly back-end stuff, I'm quite bad at Javascript regexes off the
top of my head :)

------
Twisol
This is very cool! If you enter this regexp, you get a visualization of a
Telnet protocol lexer.

    
    
      (?:[^\xFF\x0D]+|\x0D(?:\x0A|\x00)|\xFF(?:[\xFA\xFB\xFC\xFD\xFE].|\xFF|.))*
    

The graphic makes it obvious why Telnet requires NUL to follow any bare CR's:
if it didn't, you would need a byte of lookahead!
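
A minimal sketch of that lexer, in Python as a stand-in (the tool itself takes JavaScript-style regexes). It drops the outer `(?:...)*` so that each alternative yields one token. `TOKEN` and `lex` are names made up for this illustration, and `re.DOTALL` is an assumption so that `.` can match any byte:

```python
import re

# One alternative per token, taken from the regex above.
TOKEN = re.compile(
    rb"[^\xFF\x0D]+"                              # run of ordinary data bytes
    rb"|\x0D(?:\x0A|\x00)"                        # CR followed by LF or NUL
    rb"|\xFF(?:[\xFA\xFB\xFC\xFD\xFE].|\xFF|.)",  # 0xFF + command with one argument, escaped 0xFF, or bare command
    re.DOTALL,
)

def lex(data: bytes) -> list[bytes]:
    """Split a Telnet byte stream into tokens (hypothetical helper)."""
    return TOKEN.findall(data)

print(lex(b"hi\r\n\xff\xfb\x01"))
```

Because a CR is always followed by exactly one of two bytes, the second alternative can decide on the spot without peeking further ahead.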

------
kentwistle
This website is awesome! Instantly bookmarked.

Suggestions

1) It would be really nice if you could allow the user to share a regex with a
friend, i.e. <http://www.regexper.com/shared/some_unique_slug> will display my
saved regex

2) Add a Favicon

Great work, thanks for creating this.

~~~
Too
Why not simply put the regex itself in the URL instead of some unique slug? No
need to store state on the server.

~~~
Goopplesoft
I'd avoid it to prevent url encoding problems.
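
For what it's worth, a rough sketch of how percent-encoding could carry a regex in a query string without mangling it. This is Python for illustration only; the `?regex=` parameter and both helper names are hypothetical, not something the site is known to support:

```python
from urllib.parse import quote, unquote

BASE = "http://www.regexper.com/?regex="  # hypothetical query parameter

def share_url(regex: str) -> str:
    """Percent-encode every reserved character so the regex survives the URL."""
    return BASE + quote(regex, safe="")

def regex_from_url(url: str) -> str:
    """Recover the original regex from a URL built by share_url."""
    return unquote(url[len(BASE):])

original = r"<a.*?>([^<]*)<\/a>"
url = share_url(original)
print(url)
assert regex_from_url(url) == original
```

With `safe=""`, characters such as `?`, `/`, `<` and `\` are all escaped, so the round trip does not depend on how a browser treats them.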

~~~
SquareWheel
You could base64 it first.

~~~
RyanMcGreal
With an effective maximum URL length of 2000 characters, you'd be stuck with a
regex limit of around 1300 characters once you take into account the overhead
of base64 encoding.
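
A quick sketch of the arithmetic, in Python. Base64 emits 4 characters for every 3 input bytes, so the usable size depends on how much of the URL budget the scheme, host, and path already consume. The `/shared/` prefix below is only an assumption borrowed from the suggestion upthread:

```python
import base64

def encoded_len(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

def max_payload(url_limit: int, prefix_len: int) -> int:
    """Largest input size whose Base64 form fits in the remaining URL budget."""
    return (url_limit - prefix_len) // 4 * 3

regex = rb"^([^\W]+)@((?:[a-zA-Z0-9-]+\.)+(?:org|com|net|gov))$"
token = base64.urlsafe_b64encode(regex)  # URL-safe alphabet avoids '+' and '/'
assert len(token) == encoded_len(len(regex))
assert base64.urlsafe_b64decode(token) == regex

prefix = "http://www.regexper.com/shared/"  # hypothetical URL prefix
print(max_payload(2000, 0), max_payload(2000, len(prefix)))
```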

~~~
SquareWheel
I'm pretty sure this webapp isn't capable of rendering any regex with 1300
characters anyway (or if it could, the client couldn't handle that much SVG). I
don't expect URL length would be the limiting factor here.

------
robomartin
Interesting concept. As with all regex related online tools, it is important
to test exhaustively before trusting it.

Real-time character-by-character output would be important. If you are working
on a long regex, real-time output can help you think like the FSA
and see how it behaves.

Here's an example of this tool failing:

    
    
        <a.*?>([^<]*)<\/a>
    

<http://gskinner.com/RegExr/?339m7>

or

<http://rubular.com/r/pCXZs0DnSS>

take your pick, same results.

If you remove the "?" from the regex you'll get different results.

One version selects all anchor tags individually and makes a selection group of
their contents, and the other version ends up selecting everything between the
first opening anchor tag and the last closing anchor tag into the selection
group, except for "<" characters.
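
A small sketch of the lazy versus greedy difference, in Python as a stand-in for the JavaScript-style engine; the sample HTML is made up for illustration:

```python
import re

html = '<a href="/one">One</a> and <a href="/two">Two</a>'

# .*? stops at the first '>' that lets the rest of the pattern match.
lazy = re.findall(r"<a.*?>([^<]*)</a>", html)
# .* runs to the last '>' that still lets the rest of the pattern match.
greedy = re.findall(r"<a.*>([^<]*)</a>", html)

print(lazy)    # ['One', 'Two']
print(greedy)  # ['Two']

# The single greedy match spans from the first '<a' to the last '</a>'.
print(re.search(r"<a.*>([^<]*)</a>", html).group(0) == html)
```

The two patterns differ by one character, which is exactly why a diagram that draws them identically is easy to misread.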

The graph in Regexper does not show any difference between the two expressions
even when any one of these forms is used:

    
    
        #<a.*>([^<]*)</a>#msi
        
        #<a.*>([^<]*)</a>#gsi
        
        /<a.*>([^<]*)<\/a>/msi
        
        /<a.*>([^<]*)</a>/gsi
    

It looks like the flags are not being processed.

Also, please change the textbox to a fixed width font.

In addition to that, regex authoring tools, ultimately, are only useful if you
can also enter some input text and see the result, preferably in real-time.

Other than that, it's an interesting concept.

~~~
mlacitation
With regards to the real-time analysis, you might find perl's Regexp::Debugger
fun to use:

<http://search.cpan.org/~dconway/Regexp-Debugger-0.001011/lib/Regexp/Debugger.pm>

That's the documentation for using the module in your code. If you're
interested in the standalone tool, you'll want rxrx:

<http://search.cpan.org/~dconway/Regexp-Debugger-0.001011/bin/rxrx>

------
chiph
Nice, but would be nicer if it had a couple of samples to show what the site
can do for you.

------
cookingrobot
This is great. Has this approach to visualizing loops and structures been
tried with general programs? Indentation and color coding seem to be the
limit to the visual expressiveness in most IDEs because they tend to stay
within the format of plain text files.

I'd really like to see some more creative visuals like this for making program
structure visible.

~~~
spiralganglion
I don't know if there's a name for this style of display, but it's the same as
the way JSON is depicted on its official site. <http://www.json.org>

Edit: It's called a Railroad Diagram.

~~~
plq
No, it's called a finite state machine ;)

~~~
spiralganglion
The FSM is the model, the Railroad Diagram is the view, to borrow terms.

~~~
beala
Well, as long as we're picking nits, an FSM accepts only regular languages,
while a railroad diagram accepts context-free languages[1]. So a railroad
diagram isn't really the "view" for an FSM. A state diagram or state-transition
table is (although the whole MVC analogy is itself kind of weird.
Where's the controller?).

[1] <http://en.wikipedia.org/wiki/Syntax_diagram>

~~~
modarts
>Where's the controller?

The MV* family of patterns seems to be okay without the notion of a
controller.

------
viggity
looks neat. can you please make the regex textbox use a fixed width font,
reading a regex is already hard enough, let alone when it is in Verdana

------
Yoni1
Pretty! I had wanted to make something like this, but one feature I thought of
and you didn't do: For the entered regexp, generate 20 random matches and show
them to me.

Use case for this feature: You encounter an undocumented regexp in someone
else's code and want to get a few quick examples of what it matches.

~~~
jkestner
Since I'm usually starting with a bunch of data I want to parse/match/chop, I
like going the other way - pasting a couple of lines into Patterns (
<http://krillapps.com/patterns/> ) and tweaking my expression until it looks
right. This would be a useful addition to help debug in place.

------
program
I tried the most complicated regular expression in the world:

<http://ex-parrot.com/~pdw/Mail-RFC822-Address.html>

and the server 500'd. Just joking, it's a very good tool.

P.S. it crashed with three nested groups (((a*)))

~~~
robomartin
As an aside, I quit using regex to validate email addresses a while ago. Part
of it is that a regex that large is simply incomprehensible and very difficult
to maintain and fix if something is broken. The best solution, in my opinion,
is a state machine based analyzer that checks the email address character-by-
character and confirms compliance with RFC5322. This would also include
checking DNS for MX records (which might require following CNAME to find it).

~~~
4ad
No, stop validating emails. Not only is it error-prone and frustrating for
legit users, it's utterly pointless.

~~~
dkokelley
The thing is, email input validation is only one use case for regex and
emails. Here's a more sinister one: you want to build a web scraper looking
for emails to add to your spam list. Put a bit more generically, you need a
script that can import email addresses from a broad and unknown host of
formats, and it is impractical to condition the data beforehand.

I agree that the value derived from email validation for something like a new
account registration is almost nil. Just do something like an email
verification round trip, which not only validates the email _could_ be real,
but also provides assurance that the user has control of the email address
used.

Note: My last name is O'Kelley, which is a bit of a pain when it comes to
poorly designed computer services. Some sites will strip the ' and others will
escape it so that I become Mr. O\'Kelley. The worst I've seen is that my
school used my complete last name in my email address, so most sign up forms
refuse to even try to accept it, even though it is a valid email address. It
can be a pain if I need to use my .edu address for academic discounts or
verification.

~~~
robomartin
Landing page. Visitor accidentally enters invalid email address and clicks
send.

No validation = Gone. You lost them. You can't email them for a correction.

With validation = You catch the issue before the visitor leaves and you ask
them to fix it.

Sure, it doesn't verify the 1 to 1 relationship between the email and the
person. That requires a round-trip verification. I get it. At least you ensure
that it isn't all garbage-in to begin with.

The other aspect of email verification is that you don't have to choose to bug
the user with the results. Depending on what it is, if someone enters an
obviously junky address you can simply tag that email as potential crud in
your database. Someone would then manually look at these every so often for
cleanup or re-categorization.

I don't like the idea of losing signups or customers in a transaction where
both parties are interested in transferring the information accurately. That's
a use-case where validation works well.

Now, regarding your last name. The issue is caused by programmers who simply go
around grabbing code off the internet without vetting it in any way. There are
email "validation" regexes out there that are horribly wrong, yet
people post them on blogs and others use them without question. It's
unfortunate.

------
daGrevis
Useful. Thanks to this, I found a bug in a regex I wrote just today.

`/[^0-9^\\+]+/` -> `/[^0-9\\+]+/` # Thought that I needed to negate "+" as
well.

~~~
morsch
You typically don't need to escape a + inside a character class, either, so
/[^0-9+]+/ should work.
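
A short sketch of both points, in Python for illustration. It assumes the doubled backslashes in the post are escaping residue and that a single `\+` was meant:

```python
import re

buggy = re.compile(r"[^0-9^\+]+")  # the second '^' is a literal caret, so carets are excluded too
fixed = re.compile(r"[^0-9+]+")    # '+' needs no escape inside a character class

print(buggy.fullmatch("a^b"))        # None: '^' is in the excluded set
print(bool(fixed.fullmatch("a^b")))  # True: only digits and '+' are excluded
print(fixed.fullmatch("a+b"))        # None: '+' is still excluded without the backslash
```

Only the first caret after `[` negates the class; any later one is an ordinary character.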

~~~
daGrevis
Thanks! :)

------
cnlwsu
Be nice to be able to include the regex in the url so I could link it in
documentation or have some sort of rest interface.

------
pyre
Breaks for:

    
    
      ^([^\W]+)@((?:[a-zA-Z0-9-]+\.)+(?:org|com|net|gov))$
    

Though this works:

    
    
      ^([^\W]+)@([a-zA-Z0-9-]+\.)+(?:org|com|net|gov)$
    

And it doesn't seem to be the nested groups that are screwing it up because
this works:

    
    
      ((?:tinker|tailor)+(?:soldier|spy))

~~~
yuchi
This is the minimal breaking input:

    
    
        ^(([a-z]+)+)$

------
alexchamberlain
Truly brilliant. Please add some examples, so I don't have to find some to
impress people with.

------
lysium
Nice! Small glitch: shows the visualization twice if enter is pressed twice
quickly enough.

------
sixbrx
That's really neat.

My only suggestion is that it would be nice to have each regex diagram be
separately addressable, by either putting the regex as a query param and
yielding a raw image/svg/whatever, or by using a shortened url like a gist
after saving.

Very nice work.

------
ricardobeat
The parser is failing for regexps containing the caret or dollar chars:

<http://cl.ly/image/3q0B2X1F3i1H>

<http://cl.ly/image/171f172x0P1M>

~~~
lolindrath
You need to remove the slashes from that example

~~~
ricardobeat
Ha, thanks. Old habits die hard.

------
emiliobumachar
To try it out, I made up the regex "car[pet] cleaner", but what I really meant
was "car(pet)? cleaner". When I visualized it, I instantly noticed the bug.
It's an indicator that this can turn out to be very useful.
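
The difference between the two is easy to check, here in Python as a stand-in; the sample strings are made up:

```python
import re

char_class = re.compile(r"car[pet] cleaner")  # exactly one of 'p', 'e', 't' after "car"
optional = re.compile(r"car(pet)? cleaner")   # the whole word "pet" is optional

samples = ["car cleaner", "carpet cleaner", "carp cleaner", "cart cleaner"]
for s in samples:
    print(f"{s!r}: class={bool(char_class.fullmatch(s))} optional={bool(optional.fullmatch(s))}")
```

The character class accepts "carp cleaner" and "cart cleaner" but neither of the strings that were actually intended.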

------
kolinko
it breaks with this: (?:(?:\r\n)?[ \t]) _(?:(?:(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t]))_"(?:(?: \r\n)?[ \t]) _)(?:\\.(?:(?:\r\n)?[ \t])_
(?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:( ?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t])) _"(?:(?:\r\n)?[ \t])_ )) _@(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\\[\\]
\000-\0 31]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.) _\
](?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+ (?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:
(?:\r\n)?[ \t]) _))_ |(?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z |(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t])) _"(?:(?:\r\n) ?[ \t])_ ) _\ <(?:(?:\r\n)?[
\t])_(?:@(?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\ r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)
_\\](?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n) ?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[
\t] ) _))_ (?:,@(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[
\t])* )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[
\t]) _))_ ) _:(?:(?:\r\n)?[ \t])_ )?(?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[ \t]))
_"(?:(?:\r \n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?: \r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[ \t
]))_"(?:(?:\r\n)?[ \t]) _))_ @(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031 ]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](
?:(?:\r\n)?[ \t]) _)(?:\\.(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)
_\\](?:(?:\r\n)?[ \t])_ )) _\ >(?:(?:\r\n)?[ \t])_)|(?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)? [
\t])) _"(?:(?:\r\n)?[ \t])_ ) _:(?:(?:\r\n)?[ \t])_
(?:(?:(?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]| \\\\.|(?:(?:\r\n)?[
\t])) _"(?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>
@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|" (?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t]))_"(?:(?:\r\n)?[ \t]) _))_ @(?:(?:\r\n)?[ \t] ) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["()<>@,;:\\\
".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[ \t])
_)(?:\\.(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["()<>@,;:\\\".\\[
\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.) _\\](?:(?:\r\n)?[ \t])_ )) _|(?:[^()
<>@,;:\\\".\\[\\] \000- \031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|( ?:(?:\r\n)?[
\t]))_"(?:(?:\r\n)?[ \t]) _)_ \<(?:(?:\r\n)?[ \t]) _(?:@(?:[^() <>@,;
:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([
^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[ \t]) _)(?:\\.(?:(?:\r\n)?[ \t])_
(?:[^()<>@,;:\\\" .\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\ ]\r\\\\]|\\\\.)
_\\](?:(?:\r\n)?[ \t])_ )) _(?:,@(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\ [\\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\ r\\\\]|\\\\.)
_\\](?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]
|\\\\.)_\\](?:(?:\r\n)?[ \t]) _))_ ) _:(?:(?:\r\n)?[ \t])_
)?(?:[^()<>@,;:\\\".\\[\\] \0 00-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\ .|(?:(?:\r\n)?[
\t])) _"(?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t]) _(?:[^() <>@,
;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t]))_"(?:(?:\r\n)?[ \t]) _))_ @(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["()<>@,;:\\\".
\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.) _\\](?:(?:\r\n)?[ \t])_
)(?:\\.(?:(?:\r\n)?[ \t]) _(?:[ ^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]
]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[ \t]) _))_ \>(?:(?:\r\n)?[ \t])
_)(?:,\s_ ( ?:(?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\ ".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t])) _"(?:(?:\r\n)?[ \t])_ )(?:\\.(?:( ?:\r\n)?[ \t]) _(?:[^()
<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\\["()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t]))_"(?:(?:\r\n)?[ \t ]) _))_ @(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[
\t]) _)(?:\\.(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.) _\\](?:(?:\r\n)?[
\t])_ )) _|(?: [^() <>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\".\\[\ ]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t]))_"(?:(?:\r\n)?[ \t]) _)_ \<(?:(?:\r\n) ?[ \t]) _(?:@(?:[^()
<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["
()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[ \t])
_)(?:\\.(?:(?:\r\n) ?[ \t])_ (?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["()<>
@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.) _\\](?:(?:\r\n)?[ \t])_ ))
_(?:,@(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@, ;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)
_\\](?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?:\r\n)?[ \t] ) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\["()<>@,;:\\\
".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[ \t]) _))_ )
_:(?:(?:\r\n)?[ \t])_ )? (?:[^()<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\\["()<>@,;:\\\". \\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t])) _"(?:(?:\r\n)?[ \t])_ )(?:\\.(?:(?: \r\n)?[ \t]) _(?:[^()
<>@,;:\\\".\\[\\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\\[
"()<>@,;:\\\".\\[\\]]))|"(?:[^\"\r\\\\]|\\\\.|(?:(?:\r\n)?[
\t]))_"(?:(?:\r\n)?[ \t]) _))_ @(?:(?:\r\n)?[ \t]) _(?:[^() <>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.)_\\](?:(?:\r\n)?[
\t]) _)(?:\ .(?:(?:\r\n)?[ \t])_ (?:[^()<>@,;:\\\".\\[\\]
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\\["()<>@,;:\\\".\\[\\]]))|\\[([^\\[\\]\r\\\\]|\\\\.) _\\](?:(?:\r\n)?[
\t])_ )) _\ >(?:( ?:\r\n)?[ \t])_)) _)?;\s_ )

~~~
godDLL
To anyone that sees this email validation regex, DO NOT USE IT. Hope that was
clear, if not obvious.

Use something like `^([^\s]*)@([^\s]*\.[^\s]*)$` which will do most of the
work for you, then check second group for common domain typos, and what have
you.

~~~
robomartin
>Use something like `^([^\s]*)@([^\s]*\.[^\s]*)$` which will do most of the
work for you

I don't understand. How does this expression do anything even remotely close
to email validation?

For example, how does it tell you that:

    
    
        These are valid:
          test@nasa.gov
          ~~~~@nasa.gov
          joe+sometext@nasa.gov
          test@bbc.co.uk
    

and that:

    
    
        These are NOT valid
          test@example.com    (no MX RR)
          test@-nasa.gov
          test"@nasa.gov
          test@nasa.gov-
          test         
          test@nasa.rockets
          test@bbc.co..uk
          test@bbc.com.uk
          test@bbc.co.eu.uk
    

You'd have to write all the validation logic yourself all over again. And
that's just a few examples.
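
A rough check of that claim, in Python as a stand-in. It runs the loose pattern quoted above (with its anchors replaced by `fullmatch`) over the syntactically suspect addresses from the list; the variable names are made up for this sketch:

```python
import re

# The loose pattern: anything without whitespace, '@', then something containing a dot.
LOOSE = re.compile(r"([^\s]*)@([^\s]*\.[^\s]*)")

should_be_invalid = [
    "test@-nasa.gov",
    'test"@nasa.gov',
    "test@nasa.gov-",
    "test",
    "test@bbc.co..uk",
]

accepted = [addr for addr in should_be_invalid if LOOSE.fullmatch(addr)]
print(accepted)
```

Only the bare "test" is turned away; everything else in the list passes, so the remaining checks would indeed have to be written separately.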

Barring anything else, the RFC822 expression isn't so bad that someone should
replace it with the kind of thing you are suggesting.

Sorry if I don't see it.

~~~
rawb92
Honestly I don't understand why people get all flustered over email
validation, I would probably use something along the lines of this just to
check that the email address is along the lines of name@domain.com, obviously
this could do with a little tweaking.

The best way to validate an email address is to send an email to whatever
address is supplied to you, if it is a true email address the user will
receive an email and it will be validated, if not then their account or query
will go unused/unanswered and that will be down to them.

~~~
robomartin
> Honestly I don't understand why people get all flustered over email
> validation

Multiple reasons, and, yes, context is important.

Landing Page: You have one, and ONLY ONE, opportunity to capture a potential
new customer's contact info. If they make a mistake entering their email and
you didn't catch it you'll lose them forever. You can't send an email to let
them know they entered two periods by mistake, can you? They are gone and you
screwed up.

Every single potential customer is sacred. Thou shalt not lose them by being
careless.

Forum signup: In general terms, if someone is visiting a forum it probably
means that they want to sign-up. In this case, it is OK to make them enter
their address twice, make sure they match and send them a confirmation email.
They'll probably try to log-on later on and discover something went wrong and
re-register.

While I said "that's OK", I also think it is bad form not to at least do
enough validation of all input data, including email, to catch innocent
mistakes. I think people who are against email validation might have that
position because they don't understand it or got bitten by a crappy regex
and that is that.

Now your forum sign-up user is angry because they have to enter all of their
information again and go through the process one more time. Who knows, they
might make a mistake once again. While I don't have any data to back this up I
would venture to guess that the drop-off rate for making a visitor enter all
of their data multiple times is significant.

Payment Confirmation: Must check as much as you can.

From my vantage point, taking ANY action that might lose or annoy a visitor is
simply --to be kind-- programming. There's no excuse for that in my book.

------
DigitalBison
Unfortunately it seems to be broken now, but reAnimator was an awesome regex
visualization tool (<http://osteele.com/tools/reanimator/>). It let you enter
a regex and would display the state machine similar to this, but then it would
also let you enter a string and it would animate the progress of the string
matching (or not) against the state machine as you enter each character.

------
nathell
Nice! Like Edi Weitz's "The Regex Coach", but Web-based. It might be worth
looking at how TRC illustrates the regex matching a specific string.

------
hamburglar
Pretty neat, although I have a nitpick with your choice of a block labeled
"None of:" in for example this regex: /foo(.*)[^0-9]+bar/

It ends up saying "none of [0-9]" when that's not actually a good english
description of what should go there. It should say something equivalent to
"one or more things that are not [0-9]". The empty string fits the description
"none of [0-9]" but does not match that part of the regex.

------
mratzloff
Hey, this is really cool, and a great visual language for regular expressions!
Must have been fun to build.

I found a couple of bugs: it doesn't seem to handle `?>` and inline modifiers
(<http://www.regular-expressions.info/modifiers.html>), and with a couple of
expressions it simply gave me a server error with no further information.

------
jgalt212
Doug Crockford would love this tool (despite disliking regexes in general). He
uses a railroad diagram of a regex on page 67 of _JavaScript: The Good Parts_
to walk through a regex that validates URLs.

<http://books.google.com/books?id=PXa2bby0oQ0C&pg=PA67&lpg=PA65&ots=HIqon2w7iK&dq=doug+crockford+regex>

------
amenghra
Nice. I wrote something similar a while back (with the goal to show lint
errors): regexp.quaxio.com

Note: if you type [b-a], you fail to say it's an invalid range.

~~~
amenghra
(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)\10 incorrectly gets parsed as ... + back
reference to group 1 + "0".
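
Both edge cases can be checked against another engine. This sketch uses Python's `re` as a stand-in, so it shows what that engine does and not necessarily what a JavaScript engine would report:

```python
import re

# A reversed range is rejected when the pattern is compiled.
try:
    re.compile(r"[b-a]")
    range_error = None
except re.error as exc:
    range_error = str(exc)
print(range_error)

# After ten groups, \10 refers to group 10, not to group 1 followed by "0".
ten_groups = re.compile(r"(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)\10")
print(bool(ten_groups.fullmatch("a" * 11)))        # True
print(bool(ten_groups.fullmatch("a" * 11 + "0")))  # False
```

If `\10` meant "group 1, then a literal 0", the second string would match and the first would not.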

------
itry
It's beautiful to see the "check if number is a prime" regex as a diagram:

    
    
        ^1?$|^(11+?)\1+$
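
For anyone who hasn't seen the trick, a minimal sketch in Python: the number is written in unary as a string of `1`s, and the regex matches exactly when that number is not prime. The names `NOT_PRIME` and `is_prime` are made up for this illustration:

```python
import re

# Matches "" and "1", or any string that is some block of two or more 1s repeated at least twice.
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n: int) -> bool:
    """Write n in unary; n is prime when the regex fails to match."""
    return NOT_PRIME.match("1" * n) is None

print([n for n in range(30) if is_prime(n)])
```

The group `(11+?)` plays the role of a candidate divisor and `\1+` checks whether repeating it fills the whole string, so this is trial division done by backtracking.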
    

I noticed two glitches:

1) It gives me the same output image for these two different regexes:

    
    
        ^[a-x]*yo$
        ^[a-x]+yo$
    

2) It gives me a server error for this:

    
    
        a(b*(c*(d*)*)*)
    

Edit: I was wrong about glitch 1. See below.

~~~
Falling3
>Gives me the same output image for these two different regexes:

> ^[a-x]*yo$

> ^[a-x]+yo$

No, it does not.

When you do (something)* , pay attention to the additional path that is
created that circumvents the (something) box. With (something)+, there is no
such path.
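
The bypass path corresponds to matching zero repetitions, which a quick test makes visible (Python used as a stand-in; the sample strings are made up):

```python
import re

star = re.compile(r"^[a-x]*yo$")  # zero or more letters: the diagram's bypass path
plus = re.compile(r"^[a-x]+yo$")  # one or more letters: no bypass

for s in ["yo", "ayo", "abcyo"]:
    print(f"{s!r}: star={bool(star.match(s))} plus={bool(plus.match(s))}")
```

The two patterns disagree only on "yo", the one input that takes the bypass.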

~~~
itry
Aaaahh! Yes! Awesome!

------
lectrick
Chokes with a "Server error" on the following URL validator:

\b(?:((?:https?|mailto|s?ftp): )(?:\/{1,3} |[a-z0-9%])
|w{2,3}\d{0,3}[.]|[a-z0-9\\.\\-]{1,40}[.][a-z]{2,4}\/)(?: \&[a-z]{2,8}\; |
[\w\\(\\)\\.\/\:\@\\#\?\=\&\\-\\!\~\;\'\\[\\]] | \%[0-9]{2})+

This is after changing ?> to ?: since it doesn't understand the non-
backtracking syntax.

------
jontro
The following valid js regexp gives a server error:
^/([^/]+)[/]?(_((site)?(edit|show|manager))(/|$))?([\s\S]*)

~~~
k3n

      [\s\S]*
    

Doesn't this match every possible character, 0 to infinity times?

~~~
dgreensp
It's an idiom for `.*` when you want `.` to mean any character, including
newlines. `.` never matches a newline in JS regexes.
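
A small sketch of the idiom, in Python as a stand-in; its `.` also skips newlines unless a flag is set, so the comparison carries over:

```python
import re

text = "first line\nsecond line"

dot = re.match(r".*", text).group(0)            # '.' stops at the newline by default
any_char = re.match(r"[\s\S]*", text).group(0)  # \s plus \S together cover every character

print(repr(dot))       # 'first line'
print(repr(any_char))  # 'first line\nsecond line'
```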

------
jrajav
+1 for indicating what type of regular expression it is.

Also, this would be a very useful tool in a Formal Automata class.

------
mbq
Fails on non-ASCII characters.

------
amix
Great tool! I also find the title funny (which refers to the following quote by
Jamie Zawinski: Some people, when confronted with a problem, think “I know,
I'll use regular expressions.” Now they have two problems.)

------
evolve2k
It would be great if it started with a Regex example in the text box to
instantly see what the tool is capable of, that or include a simple 'show me a
demo' button, which adds an example regex and queries it.

------
the_cat_kittles
very nice! makes me wonder, would it be helpful to have a natural english
style language to express regular expressions in? Since the diagrams say "one
of" and things like that, it makes it so much clearer. Maybe a library that
let you say the same things would help too, since I think the ultra-
hieroglyphics are part of what makes regex so confusing. Nothing very
ambitious, just changing dots and stars and weird parentheticals to english
words so when you read a regular expression, it reads more like the output of
this site, which is nice.

------
athesyn
Definitely bookmarked, this is the easiest method i've found so far that
actually _shows_ the exact rules of regex. It's infinitely more useful than a
book.

------
misleading_name
but it doesn't show what is matched - as in an example matched string - which
is the most important thing

also how do you put in the case insensitive flag?

~~~
Cyranix
I think this would be quite valuable -- provide a second input for sample text
and highlight the path taken. Bonus points for figuring out a way to handle
multiple sample inputs, showing capturing group output, etc.

~~~
pacaro
Extending this idea, a regex (code)coverage tool would be great...

------
manish_gill
Heh, I was actually thinking about creating something that teaches Finite
State Machines using something like this! Very cool. :)

------
danso
OK I'm missing something...if this post wasn't labeled "Beautiful regexp
visualizations" I don't know if I would've poked around for more than a couple
minutes (and I _love_ regexes)...perhaps the opening screen should by default
have one of the more appealing examples? I hadn't known that I had to hit
"Enter" for anything to happen...I still don't know what I'm supposed to be
seeing, though I'll support anything that makes regexes more accessible.

------
fatjokes
Really cool. Any chance you could add an export-as-<some vector graphics
format> option?

~~~
tripzilch
Well the image already is a SVG element. But it's dynamically generated so you
need a slight bit of trickery to grab it, enter this in your JS console:
(tested with Opera and Chrome)

    
    
        document.location.href = 'data:image/svg+xml,'+encodeURIComponent(document.getElementById('paper-container').innerHTML)
    

it'll open just that element as a SVG document, where you can save it with
ctrl-S.

I'd like to have used window.open instead of document.location so it wouldn't
replace the RegExper tab, but that gets caught in my browsers' pop-up blockers
:) Inserting it as a regular link in the original document would've been the
best solution, in a pinch, I am offering this quick hack :)

Also, poking at this puzzle, I just noticed the _badass_ colour green he's
using for the diagrams and header.

------
ankneo
Cool! Though it would be great if you could have a shareable link to the regex
visualized.

------
PaulCapestany
This visualization looked particularly good to me:

\b((regexs)[=]+([AWESOME])(am|i|right|\??))?

------
nathan_long
I wonder if the code behind this uses regexes on the entered regexes...

------
esbwhat
((((a))))

-> server error. Looks like 3 groups within each other are the limit

------
jachwe
nice tool. bookmarked and upvoted. :-) thank you.

Next level would be to turn this around and build a (nice) visual
regexbuilder. That would be f****ing nice.

------
gosukiwi
Wow that's awesome.

------
hodgesmr
Requires www.

------
annapowellsmith
Really nice.

------
smilekzs
Can't seem to handle UTF-8. WTF?

------
ilovekhym
this statement triggers server error: ^(\d(\\.(\d(\\.(\d)?)?)?)?)?$

------
martinced
It would be gorgeous if upon clicking enter you'd be redirected to an URL that
you could share with someone.

Even better: make it directly a "tiny" URL.

I realize that then some kind of a database would be needed but it would be
really sweet.

~~~
jmilloy
Or, you know, a good old query string.

------
maglio
bookmarked

------
knowtheory
Neither beautiful nor a visualization really.

For the uninitiated, there's an isomorphic relationship between regular
expressions (not PCREs which are way more complicated) and finite state
automata proving that if you have a regexp you can generate an FSA for it, and
vice versa.

What we have here is a system that generates a graph of the finite state
automaton for any given regular expression.
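
To make the correspondence concrete, here is a hand-built automaton for `^[a-x]*yo$` (a regex posted elsewhere in this thread), checked against the regex engine. It is a Python sketch; the state numbering and function names are made up for illustration:

```python
import re

def step(state: int, ch: str) -> int:
    """Transition function; -1 is the dead state."""
    if state == 0 and "a" <= ch <= "x":
        return 0  # loop on the character class
    if state == 0 and ch == "y":
        return 1
    if state == 1 and ch == "o":
        return 2  # accepting state
    return -1

def accepts(s: str) -> bool:
    """Run the automaton over s and report whether it ends in the accepting state."""
    state = 0
    for ch in s:
        state = step(state, ch)
        if state == -1:
            return False
    return state == 2

pattern = re.compile(r"[a-x]*yo")
for s in ["yo", "abcyo", "yoyo", "ay", "zyo", ""]:
    assert accepts(s) == bool(pattern.fullmatch(s))
    print(repr(s), accepts(s))
```

Three states and three transitions are all this regex needs, which is roughly the picture the site draws for it.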

Neat project, misleading headline.

edit: let me add more constructive comments. What is useful about things like
rubular (<http://rubular.com/> ) is the ability to see how a Regexp behaves
with a given input. Where a finite state chart can be useful is giving users a
better impression of the internal workings of a particular regular expression.
Where rubular can indicate where a match can be found, it would be cool to
have a tool like this if you can show the path a particular input would take
(and possibly fail out on) through a given regular expression.

~~~
RegEx
This type of useless comment is why I tend to avoid HN for periods at a time.
Congrats, you've shown the world you know what "isomorphic" and "FSA" means,
but did you initially bring anything constructive? No. You had to edit in
things _barely_ constructive because your impulse was to tear down rather than
build up.

> Neither beautiful nor a visualization really.

This is the least helpful thing I can imagine. "Here's a beautiful regex
visualizer" "LOL NO IT'S NOT". Geez.

~~~
tripzilch
> This type of useless comment is why I tend to avoid HN for periods at a
> time.

There's useless pedants like this all over the Internet. You should probably
wear a hat.

I'd agree if this were the top two comments without many disagreeing replies
or down-votes.

I see this comment in down-voted grey, somewhere way down the page and assume
a lack of social skills, bad-morning-before-coffee-grumps.

Or simply Internet Asshat Background Radiation, it's everywhere. Legends say
you can trace its echos back to the First Flame from which the Internet was
born.

~~~
RegEx
It wasn't grey when I commented. I wouldn't take the time to get on a soapbox
if the community had already buried it.

