

Is it a must for every programmer to learn regular expressions? - riyadparvez
http://programmers.stackexchange.com/questions/133968/is-it-a-must-for-every-programmer-to-learn-regular-expressions

======
bermanoid
Yes, it is, without a doubt. It's one of the most universal tricks of the
trade that you'll literally never regret learning, mainly because just about
any environment you'll ever work in will by necessity support regexes, and in
many, it will be the primary way you interact with text.

But it's also a must that you realize that despite the fact that they're
exceptionally useful and widely supported, regexes are a disgusting
abomination, one that we should be absolutely mortified to be associated with.
It's one of the worst syntaxes to ever be invented, and every one of us should
feel the cold stink of UX failure wash over us every time we write a regex. If
we ever catch ourselves writing a DSL that in any way, shape, or form
resembles regular expressions, we should stop _immediately_ and ask what the
fuck is wrong with us and why we're being so opaque and random. Regular
expressions are quite literally one of the worst syntaxes to ever be
introduced in our field.

I worry a lot when someone doesn't know regular expressions at all. But I
worry far more when someone thinks they're beautiful. That person has far too
high a tolerance for unintuitive syntax and code, and will cause vastly more
damage to my codebase than even the rank amateur that still uses "goto" on a
regular basis.

Which is not to say that we shouldn't lean on regexes heavily anyways when
they're appropriate - as programmers our primary job description is that we're
paid good money to work with shitty interfaces in order to express simple
ideas and algorithms.

~~~
padolsey
How would you design a regular expression syntax more intuitively? Personally,
I find beauty and simplicity in regular expressions. Sure, they can grow to
hideous atrocities, but you can achieve such disastrous feats with any
language/syntax. Maybe you could back up your claim of regexes being a
disgusting abomination with, at the very least, anecdotal evidence.

~~~
yen223
A typical regex looks like this:

    
    
      \b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
    

Which is also what happens when a cat walks across the keyboard.

~~~
Auguste
I find that perfectly readable, except for the \b which I hadn't seen before.
It's matching an all-uppercase email address.

~~~
subsystem
There's plenty of upper case e-mail addresses that won't match that
expression.

~~~
InclinedPlane
/i
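
For anyone unfamiliar with the shorthand: `/i` is the case-insensitive suffix on Perl-style regex literals. A rough Python equivalent is the `re.IGNORECASE` flag. This is only a sketch using the pattern quoted upthread and a made-up address:

```python
import re

# Pattern quoted earlier in the thread, unchanged.
PATTERN = r"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"

# Without a flag, only upper-case input can match.
strict = re.compile(PATTERN)
# re.IGNORECASE plays the role of the /i suffix.
relaxed = re.compile(PATTERN, re.IGNORECASE)

address = "someone@example.com"  # made-up address for illustration
print(bool(strict.fullmatch(address)))   # False
print(bool(relaxed.fullmatch(address)))  # True
```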

~~~
subsystem
There's characters missing[1] and the tld is too short[2]. And even if that's
fixed, we still don't match internationalized addresses or actually validate
that the e-mail address exists. You're probably better off with something
like...

    
    
       if "@" in email and "." in email.split("@")[1]:
           send_verification(email)
    

...but you should probably also check for common misspellings like "gmial.com"
etc.

[1] <http://en.wikipedia.org/wiki/Email_address#Syntax>

[2] <http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains>
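
A slightly fuller sketch of the loose check plus the misspelling idea might look like the following. The function names and the typo table are hypothetical, and unlike the snippet above it splits on the last "@" and requires a non-empty local part. Sending the verification mail is still the only thing that proves the address exists.

```python
# Hypothetical typo table; a real list would be much longer.
COMMON_TYPOS = {
    "gmial.com": "gmail.com",
    "gmai.com": "gmail.com",
    "hotmial.com": "hotmail.com",
}


def looks_like_email(email: str) -> bool:
    """Loose sanity check: something before the last '@', a dotted domain after it."""
    local, sep, domain = email.rpartition("@")
    return bool(local) and bool(sep) and "." in domain


def suggest_domain(email: str) -> str:
    """Return the address with a known-misspelled domain corrected, else unchanged."""
    local, sep, domain = email.rpartition("@")
    fixed = COMMON_TYPOS.get(domain.lower())
    return f"{local}@{fixed}" if sep and fixed else email


for candidate in ["user@gmial.com", "first+tag@example.museum", "not-an-address"]:
    if looks_like_email(candidate):
        print(candidate, "->", suggest_domain(candidate))
```

Note that the second candidate, with a "+" and a six-letter TLD, passes here but would be rejected by the pattern quoted upthread.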

------
peter_l_downs
I really liked eykanal's answer [1].

    
    
        > Regular expressions are a tool. It happens to be a
        > very useful tool, so many people choose to learn how
        > to use it. However, there's no "requirement" for 
        > you to learn how to use this particular tool, any
        > more than there is a "requirement" for you to learn
        > anything else.
    

Nails it. I do think most programmers will eventually run across a problem to
which the solution is 'Use Regex', but it's not an absolute "must" like
boolean logic.

[1] <http://programmers.stackexchange.com/questions/133968/is-it-a-must-for-every-programmer-to-learn-regular-expressions#answer-133970>

~~~
stinos
Good point indeed. I spent my first years programming DSP algorithms on
embedded systems. Hardly any strings were used, let alone regular
expressions. Learning their ins and outs back then would have been a
waste of time, like a bricklayer keeping a fork in his toolbag.

------
wwweston
Probably worth revisiting this bit of commentary from Rob Pike:

"Regular expressions are hard to write, hard to write well, and can be
expensive relative to other technologies... Standard lexing and parsing
techniques are so easy to write, so general, and so adaptable there's no
reason to use regular expressions.

"Another way to look at it is that lexers and parsing are matching statically-
defined patterns, but regular expressions' strength is that they provide a way
to express patterns dynamically. They're great in text editors and search
tools, but when you know at compile time what all the things are you're looking
for, regular expressions bring far more generality and flexibility than you
need.

"Encouraging regular expressions as a panacea for all text processing problems
is not only lazy and poor engineering, it also reinforces their use by people
who shouldn't be using them at all."

<http://commandcenter.blogspot.com/2011/08/regular-expressions-in-lexing-and.html>

<http://news.ycombinator.com/item?id=2915137>

Personally, I think they're pretty darn useful and too powerful to _not_
learn, but Pike's comment makes me think that maybe they're also a crutch that
I've relied on too much rather than learning enough about lexing/parsing.
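
To get a feel for what Pike means by a hand-written lexer, here is a minimal sketch. The token kinds and the tiny input language are made up for illustration and are not taken from his post:

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); the kinds are invented for this example


def tokenize(source: str) -> Iterator[Token]:
    """Split input into NUMBER, NAME and OP tokens, skipping whitespace."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            # Consume a run of digits as one NUMBER token.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            yield ("NUMBER", source[start:i])
        elif ch.isalpha() or ch == "_":
            # Consume letters, digits and underscores as one NAME token.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            yield ("NAME", source[start:i])
        elif ch in "+-*/()=":
            i += 1
            yield ("OP", ch)
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")


print(list(tokenize("x1 = 42 + y")))
```

Every pattern here is fixed when the code is written, which is the "statically-defined" case Pike is talking about.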

------
anatoly
Yes. Next question?

------
ambirex
From time to time I re-read Larry Wall's "Apocalypse 5" -
<http://www.perl.com/pub/2002/06/04/apo5.html?page=6>. It is still an
interesting look into why regexes are the way they are.

Also if you are familiar with this quote:

    
    
      "Some people, when confronted with a problem, think 
      'I know, I'll use regular expressions.'   Now they have two problems."
    

Read Jeffrey Friedl's research into the quote -
<http://regex.info/blog/2006-09-15/247>

------
xyzzyz
While knowing how to use regular expressions is invaluable, I also recommend
learning how they actually work under the hood. It really gives a good lesson
in when regular expressions are applicable, and when they're not. In my
experience, while many programmers are adept with tools like regexps or
grammar->parser generators, they very rarely know how they actually work, which
results in people trying to parse HTML with regexps or similar things. It is
also a good starting point for some very interesting theoretical stuff like the
theory of computation.
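
As a small taste of "under the hood": in the textbook model, a regular expression is equivalent to a finite automaton. Below is a sketch of a hand-built DFA for the regex `ab*c`, with state names that are just labels picked for this example. Real engines are often backtracking matchers rather than table-driven DFAs, so treat it as an illustration of the theory, not of any particular implementation.

```python
import re

# Transition table for a DFA equivalent to the regex ab*c.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",
    ("seen_a", "c"): "done",
}
ACCEPTING = {"done"}


def dfa_matches(text: str) -> bool:
    """Run the DFA over the whole input; a missing transition rejects."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING


# Cross-check against the regex engine on a few inputs.
for sample in ["ac", "abbbc", "a", "abca", ""]:
    assert dfa_matches(sample) == bool(re.fullmatch(r"ab*c", sample))
```

The machine has a fixed, finite set of states and no stack or counter, which is why arbitrarily deep nesting of the kind HTML allows is out of reach for a true regular expression.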

~~~
InclinedPlane
Also, you see people doing simplistic string comparisons using regexes. Which
is ok sometimes but is an easy target if your system has performance issues.
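
For example, these simple checks can be written with plain string operations instead of a regex. The log line and function names are made up, and whether the difference matters for performance is something to measure in your own system:

```python
import re


def is_disk_error_regex(line: str) -> bool:
    """Regex version: starts with ERROR and mentions 'disk'."""
    return re.match(r"ERROR", line) is not None and re.search(r"disk", line) is not None


def is_disk_error_plain(line: str) -> bool:
    """Same check with ordinary string methods."""
    return line.startswith("ERROR") and "disk" in line


line = "ERROR 2012-01-05 disk full"  # made-up log line
print(is_disk_error_regex(line), is_disk_error_plain(line))  # True True
```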

------
johnwatson11218
I find one of the hardest aspects of using a regex is all the subtle differences
between languages/environments. I use them about four times a year but I feel
like I have to go read a mini tutorial each time. I know the concepts but I can't
remember the special char for whitespace or digits in the particular language
I'm working in. Also, the Java implementation is so bad. Having to encode the
pattern as a string with its own escaping rules is not easy. Then the three-line
API usage is a real hassle.

------
yen223
The ability to parse text quickly is invaluable when writing code.

I would say it's a must to take your coding skills to the next level.

------
acqq
There's no "a must." But I don't consider a programmer to be a good one if he
doesn't know regexps.

If you ask "do you personally really need regexps" I'll tell you: don't learn
them. As you're asking that question at all, I understand that you're not
interested in learning and that you are looking for an excuse not to learn, so do
something that interests you.

------
richardw
I regularly forget regex intricacies, so now have an app called "regexbuddy"
for Windows. Very very useful, has a killer help file, has the ability to
adjust and test regexes for many languages.

Surprisingly, here: <http://www.regexbuddy.com/>

The help file alone will make you want to buy it.

------
scott_w
I think they're valuable as a simple tool, e.g. using :s/^#// in vim.

However, I try to avoid using them in my code unless they improve readability.
Using re.VERBOSE can help in Python.

If you find a regex online, you should definitely reference it in your code,
to help provide background understanding, such as validating a UK Post Code.
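
A minimal sketch of both suggestions, using the e-mail pattern quoted earlier in this thread rather than a postcode pattern. It keeps the limitations pointed out upthread (missing characters, short TLD), so read it as a formatting example only:

```python
import re

# Source: the e-mail pattern quoted earlier in this thread.
# With re.VERBOSE, whitespace and "#" comments in the pattern are ignored.
EMAIL = re.compile(
    r"""
    \b
    [A-Z0-9._%-]+   # local part
    @
    [A-Z0-9.-]+     # domain labels
    \.
    [A-Z]{2,4}      # top-level domain, 2 to 4 letters
    \b
    """,
    re.VERBOSE | re.IGNORECASE,
)

print(bool(EMAIL.fullmatch("someone@example.com")))  # True
```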

------
aidos
I taught my friend how to use regular expressions for search and replace. He's
found them invaluable and he's NOT a developer. He uses them to clean up and
filter lists of keywords from Adwords.

I reach for regex frequently but almost never as something I add to the code
I'm writing. They're just amazing for filtering and bulk editing text.
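
The same kind of cleanup can be scripted. Here is a rough sketch with an invented keyword list and a hypothetical `clean` helper. It is not his actual workflow, which was search and replace in an editor:

```python
import re

# Made-up keyword list for illustration.
keywords = ["  Cheap  Flights ", "cheap flights!!", "FREE hotel", "hotel   deals"]


def clean(keyword: str) -> str:
    """Drop punctuation, collapse runs of whitespace, and lower-case."""
    keyword = re.sub(r"[^\w\s]", "", keyword)
    keyword = re.sub(r"\s+", " ", keyword).strip()
    return keyword.lower()


cleaned = [clean(k) for k in keywords]
# Filter out anything containing the whole word "free".
without_free = [k for k in cleaned if not re.search(r"\bfree\b", k)]
print(without_free)  # ['cheap flights', 'cheap flights', 'hotel deals']
```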

------
einhverfr
I was thinking about this question and it occurred to me. The programmers who
probably don't need to know regexes[1] almost certainly already learned them.
So I guess that's a yes.

[1] Thinking of embedded systems developers creating code for, say, automotive
entertainment systems, or control code for scientific or medical hardware.

~~~
DavidSJ
_[1] Thinking of embedded systems developers creating code for, say,
automotive entertainment systems, or control code for scientific or medical
hardware._

Don't they have log files to read?

~~~
einhverfr
That's what grep is for ;-)

But the point is I can't imagine someone getting to that point without
previously having learned it.

~~~
DavidSJ
Err, grep uses regular expressions.

~~~
einhverfr
You don't need to know regular expressions to use grep for reviewing log files
though.

------
Auguste
I think every programmer should have an understanding of what Regular
Expressions are, how they can be used, and where to find a cheat sheet or
reference (like www.regular-expressions.info).

It helps to know the basic syntax by memory, but you can just as easily look
it up if you understand how they work.

------
lparry
I think every programmer should know them, but should try to avoid using them
unless they really are the best solution to your problem. Too many people
reach for them too soon, leaving hard to understand and hard to maintain code
that could have been better expressed some other way.

------
Axsuul
Yes! They're not that hard and there's only so much syntax to them (vs. an
actual programming language). I essentially learned from this site by just
entering in random strings and trying to match them. <http://rubular.com/>

------
ExpiredLink
Regex is a good example of a bad interface. When half of the input usually
needs to be escaped, something must have gone fundamentally wrong. I regularly
forget regular expressions and look them up again when I really need them.

------
mootothemax
I think it's essential to know of their existence, but not necessarily to know
their syntax inside-out. Basically, know enough that you're using a regex where
applicable instead of writing looping and parsing code yourself.

------
subsystem
The problem is that regexp is often misused for things like HTML parsers and
e-mail validators. Learning regexp syntax without learning when and how to use
it makes you a worse programmer, not a better one.

------
brunnsbe
At least it's good to know what can be solved with regular expressions, then if
you need to solve something you can always look it up in a book or use some
software for it (I use RegexBuddy).

------
spaghetti
Only if you're doing a lot of string parsing. Just like learning about
algorithms on graphs is a must only if you're working with graphs of a non-
trivial size.

~~~
pif
I don't agree. Regular expressions come in handy as well during your day to
day work: you can save a lot of time if you know how to use them in your
editor "search & replace" function, or just when a simple search in a
directory gives you too many results and you need to switch to power tools
(read: egrep).

------
smegel
Learn the 20% that you will use 80% of the time. If you're not willing to do
that you're probably not serious about programming in the first place.

~~~
michaelcampbell
Amen to that. For reasons I can only guess at, I'm the "regex guy" at work.
When people have a regex question they come to me. And the interesting thing
is that the answers are not hard, well within the 20% area. I've found that
most people don't care to learn how to use character classes, or even what
they're for. For example, to match the single letter 'i', I've seen people
write:

    
    
        [i]
    

Many don't know the difference between .*? and .*, or the implications.

And on and on. It amazes me that these people actually code for a living.

~~~
itmag
Learning about the nongreedy modifier is IMHO a must when it comes to regexps.
It makes them much more convenient to write.
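
A quick illustration of the greedy versus non-greedy difference discussed above, using a made-up input string:

```python
import re

text = 'say "one" and "two"'  # made-up input

# Greedy: .* grabs as much as it can, so the match runs to the last quote.
greedy = re.findall(r'".*"', text)
# Non-greedy: .*? stops at the first closing quote it can.
lazy = re.findall(r'".*?"', text)

print(greedy)  # ['"one" and "two"']
print(lazy)    # ['"one"', '"two"']
```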

------
DannoHung
If you want to do it seriously, at the very least you really ought to learn how
grammars work (which are strictly more powerful than regexes).
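
One classic example of that extra power is balanced parentheses, which a context-free grammar describes in one line but a regular expression in the formal sense cannot. Here is a sketch of a recursive-descent recognizer. The function name is made up, and very deep nesting would hit Python's recursion limit:

```python
def balanced(text: str) -> bool:
    """Recognize the grammar  S -> "(" S ")" S | empty  by recursive descent."""

    def parse_s(pos: int) -> int:
        # Returns the position just after the S that starts at pos,
        # or -1 if an opening parenthesis is never closed.
        while pos < len(text) and text[pos] == "(":
            pos = parse_s(pos + 1)
            if pos == -1 or pos >= len(text) or text[pos] != ")":
                return -1
            pos += 1
        return pos

    # The whole input must be consumed by a single S.
    return parse_s(0) == len(text)


for sample in ["", "()", "(()())", "(()", ")(", "())"]:
    print(repr(sample), balanced(sample))
```

Many practical regex engines add extensions that go beyond the formal definition, so the line is blurrier in practice than in theory.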

------
malsme
They are useful but non-intuitive, so I imagine most people relearn them each
time they meet them - unless you are stuck doing it all the time.

~~~
einhverfr
I wonder if Perl programmers are resented for our regex knowledge ;-)

------
danso
I regularly tell non-programmers that it is the most useful concept that they
can learn (if they deal with data).

------
InclinedPlane
You can get by as a programmer without knowing regexes. You can also get by
without knowing about pointers. Heck, maybe you can even get by without
knowing SQL or OOP.

But you will be limited. You will run into a lot of situations where your
ignorance will create friction and limitations. Something as simple as editing
an nginx configuration file, for example.

You _should_ learn regexes, you should be able to use a regex as necessary and
be able to understand regexes you come across, you should understand when they
are most useful, and when they are not. You should learn their limits and
_your_ limits in using them as well. When used appropriately they can be a
potent addition to your technical knowledge. When they should be used but are
avoided the result is typically a massive inflation of the effort to solve a
simple problem. And when used incorrectly they can result in an inflation of
intractable complexity (much like any technology).

