
How "junior" developers can become regex wizards - joshuakemp1
http://joshuakemp.blogspot.com/2013/11/how-junior-developers-can-become-regex.html
======
tghw
A couple things:

1\. Be careful what you use regexes for. Email addresses are very
difficult[0]. HTML is impossible[1].

2\. There are a number of tools that make it easier to understand, including
[2].

[0] [http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html](http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html)

[1]
[http://stackoverflow.com/a/1732454/2363](http://stackoverflow.com/a/1732454/2363)

[2]
[http://ivanzuzak.info/noam/webapps/fsm_simulator/](http://ivanzuzak.info/noam/webapps/fsm_simulator/)

~~~
thedufer
> HTML is impossible.

This comes up a lot. Most languages' "regular expressions" aren't, in fact,
regular. A true regular expression wouldn't be able to match HTML, but Perl
regular expressions (the de facto standard) can because of backreferences.

Edit: I'm not saying this is a good idea; it most certainly isn't. I'm just
saying it's possible.

~~~
ygra
Perl's regex support can match HTML because of recursive matching, not only
because of backreferences (the latter of which is widely implemented, the
former not quite so).

That being said, there surely are some fun languages that can be matched by
what's commonly called regular expressions. Notepad++ was notable (before
switching to PCRE) in that its "regular expressions" could not even match every
regular (or even finite) language
([http://stackoverflow.com/a/4815422/73070](http://stackoverflow.com/a/4815422/73070)).
Many regex engines allow matching languages that are context-sensitive, while
at the same time not accepting all context-free languages.
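
A minimal Python sketch of the backreference point, for illustration only. The language "n a's, one b, then exactly n a's" is not regular, yet a single backreference matches it. The names `SAME_RUN` and `same_run_on_both_sides` are made up for this example, and it uses only the standard `re` module, which has backreferences but (as far as I know) no recursive patterns.

```python
import re

# (a*) captures a run of 'a's; \1 then demands the identical run again.
# Counting like this is beyond what a true finite automaton can do.
SAME_RUN = re.compile(r"(a*)b\1")


def same_run_on_both_sides(text: str) -> bool:
    """Return True when text is n 'a's, one 'b', then exactly n 'a's."""
    return SAME_RUN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("b", "aabaa", "aabaaa", "aaba"):
        print(repr(sample), same_run_on_both_sides(sample))
```

Using `fullmatch` rather than `search` matters here: with `search`, the engine could pick a shorter run and report a match inside an unbalanced string.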

------
wonnage
This might be overkill but I found I never "got" regular expressions until a
class made me think about them as state machines. The additional bashing over
the head of having to implement a parser/matcher made it really stick. The
quirks and syntax make much more sense when you know why and how a regex
engine works.

That said, anything involving extended/perl regex I wind up googling.
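
To make the state-machine view concrete, here is a small Python sketch, assuming the toy pattern `a(b|c)*d`. The transition table, the state numbering, and the name `dfa_matches` are all invented for this illustration; a real engine would build the table from the pattern instead of having it written by hand.

```python
import re

# Hand-written DFA for the pattern a(b|c)*d.
# States: 0 = start, 1 = seen 'a' (loops on 'b'/'c'), 2 = seen the final 'd'.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
ACCEPTING = {2}


def dfa_matches(text: str) -> bool:
    """Run the DFA over text; accept only if we end in an accepting state."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: the input is rejected
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    pattern = re.compile(r"a(b|c)*d")
    for sample in ("ad", "abcbd", "abc", "add", ""):
        # The DFA and the library regex should agree on every sample.
        print(repr(sample), dfa_matches(sample), pattern.fullmatch(sample) is not None)
```

Each character is looked at exactly once, which is why this style of matching never backtracks.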

~~~
sliverstorm
Huh, to me I "got" regexp the first time I used it (although it took a while
to learn the details). To me if you understand the idea of wildcards, you
understand regexp.

~~~
omnibrain
I began to understand Regex after reading the awk chapter in Masterminds of
Programming. After that I understood how "the machine" inside might work. I
really understood how to apply Regex to single strings after I started to see
a regex as a "mask", so very similar to your wildcard approach.

------
pjscott
How to become a regular expression wizard:

1\. Write a bunch of regular expressions.

2\. Fix them when they break.

Anybody can do this, however junior they may be. (And yes, it does grant you a
superpower.)

~~~
chaz
3\. Write tests.

Don't change your existing regular expressions without tests, or Bad Stuff
happens.
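
A minimal sketch of what such tests might look like in Python's `unittest`. The pattern (a US-style ZIP code with an optional +4 suffix) and the names `ZIP_CODE` and `ZipCodeTest` are hypothetical, picked only to have something to test.

```python
import re
import unittest

# Hypothetical pattern under test: five digits, optionally "-" plus four digits.
ZIP_CODE = re.compile(r"\d{5}(?:-\d{4})?")


class ZipCodeTest(unittest.TestCase):
    def test_accepts_valid_codes(self):
        for text in ("12345", "12345-6789"):
            with self.subTest(text=text):
                self.assertIsNotNone(ZIP_CODE.fullmatch(text))

    def test_rejects_invalid_codes(self):
        # Keep the near-misses: they are the cases a later edit tends to break.
        for text in ("1234", "123456", "12345-", "12345-678", "abcde", ""):
            with self.subTest(text=text):
                self.assertIsNone(ZIP_CODE.fullmatch(text))
```

Saved in a file, this runs with `python -m unittest`. The reject list is usually the more valuable half, since loosening a pattern by accident is easier than tightening it.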

~~~
mtdewcmu
I don't usually write regular expressions that are complex enough to need
maintenance. If you are writing one that is enough effort that it isn't
disposable, then you might want to reconsider whether you're using the right
tool.

------
dsymonds
[http://swtch.com/~rsc/regexp/regexp1.html](http://swtch.com/~rsc/regexp/regexp1.html)
should be required reading before _any_ developer tries to become a "regex
wizard".

------
colbyolson
Is it worth mastering? No. Worth understanding? Yes.

Regex is used in so many applications and commands, it would be silly not to
learn it.

You don't need to be a wizard, but do understand the basics and it will get
you far.

~~~
sliverstorm
Somewhere in between "the basics" and "wizard" is probably best. Regexp is
_powerful_ , extremely valuable in the right places, and knowing more than the
basics will be useful. On the other hand wizard-level expertise is not
necessary to net 99% of the value of regexp.

------
erichurkman
Also, check for your language's options for white space and comments within
regular expressions [0]. Regexps don't have to be blobs of characters -- you
can use white space to make them more readable and use comments embedded
within a multi-line regexp to describe what/why you are doing.

Bonus: it makes them easier to diff, too!

We don't write our code on one line with no comments, writing regexps should
be no different.

[0] Python example:
[http://docs.python.org/2/library/re.html#re.VERBOSE](http://docs.python.org/2/library/re.html#re.VERBOSE)
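
A short sketch of the `re.VERBOSE` style in Python, assuming a made-up ISO-style date pattern. The name `ISO_DATE` and the group names are illustrative, and the pattern only checks the shape of the date, not whether the day exists in that month.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a comment,
# so each piece of the expression can sit on its own documented line.
ISO_DATE = re.compile(
    r"""
    (?P<year>  \d{4} )                    # four-digit year
    -
    (?P<month> 0[1-9] | 1[0-2] )          # month, 01 through 12
    -
    (?P<day>   0[1-9] | [12]\d | 3[01] )  # day, 01 through 31 (no per-month check)
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    match = ISO_DATE.fullmatch("2013-11-05")
    if match:
        print(match.group("year"), match.group("month"), match.group("day"))
```

The one-line equivalent is `(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])`, which is the same expression but much harder to review in a diff.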

------
pekk
Making "junior" developers solve problems with regex sounds like a recipe for
terrible maintainability, unless it is necessary.

~~~
rallison
I do actually somewhat agree here. If one doesn't know what they are doing, it
is very easy to do regular expressions incorrectly.

That said, regex is sometimes necessary, so it is important for developers to
be competent in this realm. In my opinion, ideally, junior developers would
start with using them in non-production environments to become familiar, then
go from there. It is also important to be able to distinguish which problems
should be solved by regular expressions, and which shouldn't. A good mentor
here can be great.

------
DonGateley
If you really want to master Regular Expressions in a way you are not likely
to forget do what I did in the '80s. Write a regex interpreter and then a
compiler. I was doing a contract for an embedded controller and found a great
use for compiled regular expressions within it (I remember every nuance of
regular expressions but can't for the life of me remember what my use case
was.)

Debugging it was the really fun part but my earlier career with IBM had taught
me how to test software effectively and some embedded work with PALs (an
early form of programmable logic) taught me how to really use state machines.
I was surprised how little time it took to write and debug. I don't think you
can really appreciate the elegant logic of the regular expression language
without implementing it.
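
For a sense of how small a first interpreter can be, here is a Python sketch modeled loosely on the well-known tiny backtracking matcher by Rob Pike. It is not the interpreter or compiler described above, only an illustration; it handles literals, `.`, `*`, `^`, and `$`, and the function names are my own.

```python
def match(pattern: str, text: str) -> bool:
    """Search for pattern anywhere in text (supports . * ^ $ and literals)."""
    if pattern.startswith("^"):
        return _match_here(pattern[1:], text)
    # Unanchored: try every starting offset, including the empty tail.
    return any(_match_here(pattern, text[i:]) for i in range(len(text) + 1))


def _match_here(pattern: str, text: str) -> bool:
    """Match pattern at the very beginning of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return _match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return _match_here(pattern[1:], text[1:])
    return False


def _match_star(ch: str, pattern: str, text: str) -> bool:
    """Match zero or more of ch, then the rest of the pattern."""
    i = 0
    while True:
        if _match_here(pattern, text[i:]):
            return True
        if i < len(text) and ch in (".", text[i]):
            i += 1  # consume one more repetition and try again
        else:
            return False


if __name__ == "__main__":
    print(match("a*b", "aaab"), match("^a.c$", "abcd"))
```

Character classes, alternation, and grouping are deliberately missing; adding them one at a time is where most of the learning happens.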

------
ygra
I found [http://www.regular-expressions.info/](http://www.regular-expressions.info/) to be an invaluable resource when learning or understanding
regular expressions. Basically every page that explains a feature in the
reference _also_ explains what's going on within the engine, how it works,
when it backtracks, etc. Those things are sometimes hard to see in
applications that just give you the matches from a text (as rubular.com seems
to do).

------
rmrfrmrf
You don't need to be a developer _or_ a wizard to be proficient in writing
regular expressions. Regular expressions are used by all kinds of professions;
that's why regular expression capability is included in almost every text
editor.

The best way to learn to love regular expressions is to use them outside of a
programming context, where you can get real-time feedback with actual test
data. Some text editors will even highlight matches as you type the expression
out.

------
duey
During University one of the projects we were given was to write a regular
expression parser and evaluator. From the moment that project was complete I
have never had trouble understanding regular expressions. I thought it was an
excellent way to learn them.

------
webmonkeyuk
I'm torn a little on my opinion of this post. On the one side I applaud anyone
who's willing to self-learn and isn't an "I'll just Google it" developer. On
the flip side I tire of hearing people tell me that they've written a CMS of
their own; in some cases piggybacking on someone who's solved an identical
problem pays dividends, and their solution will be more mature than yours.

Perhaps for a self-learner a nice approach is to write it yourself and then go
searching for a solution. That way you'll likely: validate what you did,
figure out that you missed a detail or perhaps learn some new regexp foo or
different way of writing the same thing.

------
gavinlynch
I love these types of things. Kudos to the author.

This one in particular reminds me of a tool I've always found useful. It's an
interactive Regex builder just like the one linked in the OP. I would say it's
got some additional compelling features: like mouse-over breakdowns of each
expression as you build it, a handy reference list as well, but also a
community concept and saved expressions. Really, I've never found anything
better. My only complaint is that it's Flash-based, but it's an amazing tool,
so can't really complain too much.

[http://gskinner.com/RegExr/](http://gskinner.com/RegExr/)

------
mtdewcmu
Regular expressions were simple and straightforward tools until Perl. They
were originally intended to be equivalent to finite state machines, but,
thanks to Perl, you can write regexes that may or may not halt, and no one
can ever prove one way or the other. If you want to take the best parts of
regexes and leave the rest, don't bother with all of the Perl extensions and
just study the basics.

------
adestefan
Go read Mastering Regular Expressions.

~~~
tmallen
This is the bible of regex. Its explanation of "unrolling the loop" will
change how you write regular expressions if you don't already use that
technique. Its discussion of the operation of NFA and DFA engines is great
too.
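
A sketch of the technique for anyone who hasn't seen it, using a double-quoted string with backslash escapes as the example. The unrolled shape is `normal* (special normal*)*`; the names below are illustrative, and the two patterns are meant to accept the same strings.

```python
import re

# Alternation form: every iteration chooses between an escape and a normal char.
QUOTED_ALTERNATION = re.compile(r'"(?:\\.|[^"\\])*"')

# Unrolled form: a run of normal chars, then (one escape + another run) repeated.
# normal  = [^"\\]   any char except a quote or backslash
# special = \\.      a backslash followed by any char
QUOTED_UNROLLED = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')


def is_quoted(text: str, pattern: "re.Pattern[str]" = QUOTED_UNROLLED) -> bool:
    """Return True when the whole text is one well-formed quoted string."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ['"hello"', r'"say \"hi\""', '"unterminated', '""']
    for sample in samples:
        print(sample, is_quoted(sample), is_quoted(sample, QUOTED_ALTERNATION))
```

The unrolled form tends to do less work in a backtracking engine because long runs of ordinary characters are consumed by a single character class instead of one alternation per character.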

------
gaving
Can't remember the last time I had to actually hand craft a regex.

Any serious developer these days will be using standardised libraries for this
sort of validation and not reinventing the wheel with some half-baked
do-it-yourself regex.

------
jheriko
ugh.

do they really need to be good at regex? we don't all work on the internet you
know... regex is basically pointless for most application development. most
programmers i consider to be exceptionally talented can not write a regex
without reference (although they will do it when necessary by using reference
- and very well too).

on the other hand the general approach to problem solving advocated here is
quite sound. "find good tools" "don't rush pointlessly" "measure don't guess"
"google is good, copy-paste blindly is bad"

~~~
UK-AL
RegEx probably has most use in data validation, which has nothing to do with
internet.

~~~
jheriko
i generally backlash against this. i've seen articles where people are stupid
enough to say "if you don't know regex, you aren't good" my experience tells
me the converse is true. good programmers are so good they end up working in
environments where regex is an irrelevant curiosity and almost never a tool.

------
PavlovsCat
also:

[http://www.rexv.org](http://www.rexv.org)

[http://www.debuggex.com](http://www.debuggex.com)

[http://refiddle.com](http://refiddle.com)

~~~
itsybitsycoder
I like [http://gskinner.com/RegExr/](http://gskinner.com/RegExr/)

------
thomasknowles
You don't need to be a developer to understand or use regex. I use regex on a
daily basis as I have to deal with a lot of text manipulation.

--edit

I am not a developer.

------
alayne
Please don't use regexes.

~~~
kbrowne
Ever? For any purpose whatsoever? That seems overly prescriptive.

~~~
alayne
Sure, for scripts, adhoc analysis, command line hacking and whatnot, go ahead.

Regexes should be seen more like a last resort than a good software
engineering choice. They're a code smell. I have seen so many incorrect, slow
regexes from people who don't know what they are doing that I have to
recommend as a best practice that you don't use them unless you are going to
study automata, read Friedl's book, the Dragon book, and study with Tibetan
regex monks for years, and mentor everyone who has to maintain your code until
you die.

------
yOutely
a regex tester online?!??!? what "hacker news"!!! This has never been done!

