
Please, let's try to keep the jokester replies to Reddit. I'm trying to understand why people who are ignorant of what regexp can do for them are so dead-set against learning regexp.



> Please, let's try to keep the jokester replies to Reddit.

You realize the linked article is entirely about a joke about regular expressions? I can't imagine a thread where a joke about regular expressions would be more appropriate.


Nobody is against learning regexp (except lazy people, perhaps). But a lot of people are against using regexp as a default solution. Here's a pretty common case: see if the string you have contains a delimiter.

First you think, okay, I can use a function like strchr() or index(). It'll immediately return the location of the delimiter. Can't get much simpler or more efficient than that!

  $loc = index($_,$delimiter)
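For reference, a minimal sketch of what that call returns in the found and not-found cases. The sample strings and variable names are made up for illustration, not taken from the comment above.

```perl
use strict;
use warnings;

# Hypothetical sample data, only to show index()'s two outcomes.
my $delimiter = "=";
my $with      = "my key = something";
my $without   = "no delimiter here";

my $loc_with    = index($with, $delimiter);     # zero-based offset of "="
my $loc_without = index($without, $delimiter);  # -1 when not found

print "$loc_with $loc_without\n";   # prints "7 -1"
```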
But wait. What if my string has quotes or spaces before the key or around the delimiter? I don't want any of that crap. Now I need to write a bunch more parsing code - or, I can use a regex!

  $delimiter = "=";
  $_ = q| " my key = something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ok, looks good. Let's try a couple different delimiters.

  $delimiter = "_";
  $_ = q| " my key = something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  
  
Hmm... no output at all. Weird. Oh! index() returns -1 on failure, but @- isn't set by a failed match, so there's nothing to print. And we forgot to change the delimiter in the string. Ok, try again:

  $delimiter = "_";
  $_ = q| " my key _ something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
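As an aside, the silent failure from the previous attempt can be made to look like index()'s -1 by checking the result of the match before reading @-. This is only a sketch; the -1 defaults and the variable names are an assumption for illustration.

```perl
use strict;
use warnings;

my $delimiter = "_";
# Same string as the failing attempt: it still contains "=", not "_".
my $str = q| " my key = something in the string " |;

# Default to -1, mimicking what index() reports on failure.
my ($keyloc, $delimloc) = (-1, -1);

# Only read @- when the match actually succeeded.
if ($str =~ /^\s*"?\s*(.+?)\s*($delimiter)/) {
    ($keyloc, $delimloc) = ($-[1], $-[2]);
}

print "$keyloc $delimloc\n";   # prints "-1 -1"
```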
Ah, that's better. Let's try another delimiter.

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  Quantifier follows nothing in regex; marked by <-- HERE in m/^\s*"?\s*(.+?)\s*(+ <-- HERE )/ at -e line 1.
Holy shit, a fatal error? Hmm, it was just a delimiter change.... welp, looks like the regex parser thinks '+' is part of the pattern. Need to escape it so it's treated as a literal character after interpolation:

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  /^\s*"?\s*(.+?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
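The '+' case at least fails loudly. A sketch of a quieter variant, assuming a hypothetical "." delimiter: unescaped, it matches any character and reports the wrong offset with no error at all. quotemeta() is the function form of \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical input: "." is a metacharacter that compiles without complaint.
my $delimiter = ".";
my $str = q| " my key . something in the string " |;

# Unescaped: "." matches any character, so group 2 lands on the "y" of "my".
my $loc_unescaped = -1;
if ($str =~ /^\s*"?\s*(.+?)\s*($delimiter)/) {
    $loc_unescaped = $-[2];
}

# Escaped with quotemeta(): the pattern now looks for a literal ".".
my $quoted = quotemeta($delimiter);
my $loc_escaped = -1;
if ($str =~ /^\s*"?\s*(.+?)\s*($quoted)/) {
    $loc_escaped = $-[2];
}

print "$loc_unescaped $loc_escaped\n";   # prints "4 10"
```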
Ok, it's working again. But what if there was no key entered at all - just the delimiter and the rest of the string, like if a filesystem path was entered?

  $delimiter = "/";
  $_ = q|/path_to_a_file.txt|;
  /^\s*"?\s*(.+?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  
  
Crap. The delimiter is there, but my regex is broken again, because (.+?) requires at least one character before the delimiter. Time to fix it again:

  $delimiter = "/";
  $_ = q|/path_to_a_file.txt|;
  /^\s*"?\s*(.*?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  0
  0
There! Whew. That didn't take too long. Let's just hope nothing else unexpected happens, huh?

index() and rindex() would not have had all these issues: they return a location if the delimiter exists at all, or -1 if it doesn't, and they don't run into metacharacter issues, etc. All of these bugs (AND MORE!) can be avoided by just writing a parser, or using a couple of index() and rindex() calls, or restricting the format of the string to more rigid rules. But by using regexes, we've doomed ourselves to more unexpected issues in the future.
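
A rough sketch of what the index()-based version might look like. find_key_and_delim is a hypothetical helper, and it assumes the only junk to skip before the key is spaces, tabs, and double quotes; unlike the regex, it skips any number of quotes, not just one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (key offset, delimiter offset),
# or (-1, -1) when the delimiter is absent, like index() itself.
sub find_key_and_delim {
    my ($str, $delimiter) = @_;

    # Literal substring search: no metacharacters to escape.
    my $delimloc = index($str, $delimiter);
    return (-1, -1) if $delimloc < 0;

    # Assumption: leading junk is only spaces, tabs, and double quotes.
    my $keyloc = 0;
    while ($keyloc < $delimloc) {
        my $ch = substr($str, $keyloc, 1);
        last unless $ch eq ' ' || $ch eq "\t" || $ch eq '"';
        $keyloc++;
    }
    return ($keyloc, $delimloc);
}

my @kv   = find_key_and_delim(q| " my key + something in the string " |, "+");
my @path = find_key_and_delim(q|/path_to_a_file.txt|, "/");
my @none = find_key_and_delim(q| " my key = something in the string " |, "_");

print "@kv | @path | @none\n";   # prints "3 10 | 0 0 | -1 -1"
```

The same three inputs that tripped up the regex (a metacharacter delimiter, an empty key, a missing delimiter) go through without special handling.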


This is a ludicrously contrived strawman.


Not as ludicrously contrived as an html parser using regexps.


Dude, code however you feel like. I'm not getting into a troll fest about why it's stupid to use regexes for everything.


I have never seen anyone argue for using regexp for "everything", but I see on a very regular basis people arguing they should apparently never be used for anything. Even the simplest of questions on StackOverflow get answered with the ever-condescending "what are you trying to do?", followed by "now you have two problems", followed by "use a parser", followed by silence on how that specifically applies.

I called it a ludicrously contrived strawman because your proposed remedy to "just write a parser" is not any simpler of a task than the one you mocked up for regexp. There are still plenty of bugs you get to write and miss for several hours when you write any nontrivial software.


1. Nobody on HN is saying to never use a regexp.

2. It's not a ludicrous example.

3. A parser is not a ludicrous way to solve the above problem.

4. It's not a straw man, because it's not an irrelevant argument set up to be defeated; it is specifically an example of how EITHER using some simple functions OR a parser would be less problematic in practice than the gradual bit rot of erroneous use of the extremely powerful and unnecessary regular expression.

5. The software becomes nontrivial when you complicate it with regular expressions.

6. Where are these examples of people telling you never to use regular expressions?

7. How is it you run into this on a very regular basis?

8. If it's a simple question, it probably has a simple answer, and regular expressions are not simple, as my example has shown.

9. There's a reason this phrase is a truism, and it doesn't need a mathematical proof to be accepted as a truism.

10. Code however you want, dude; it doesn't matter what a bunch of people on StackOverflow or any other website say, except that

11. If a lot of people keep saying the same thing, there might, just might, be some merit to it.



