
Regex You Can Read: Why This Is Different and You Are Not Your Users - savolai
https://medium.com/@savolai/regex-you-can-read-why-this-matters-1414a4409ae5#.6t6qwhifk
======
brudgers
Discussion of the previous article:
[https://news.ycombinator.com/item?id=11700725](https://news.ycombinator.com/item?id=11700725)

My feeling about regex's is that the difficulties lie mostly in building a
mental model of automata and not in syntax. The syntax beyond the ordering of
arguments between the /'s doesn't matter much. People find regular expressions
difficult because they are sophisticated mathematics.

~~~
savolai
The thing is, people don't find regexes difficult for any single reason. It
depends entirely on context, the goals you're trying to achieve, and your
skill level. The whole point of this article is that the real issue is
regex experts making assumptions about novices' experience based on their own
experience.

It depends a lot on what level of learning you are at. My hypothesis is that
beginners are learning the individual commands and stumble on the escaping and
the context. You don't need the mathematics for the mundane, everyday matching
that goes just a tiny bit beyond replacing "a" with "b". The popularity
of the original article seems to support the hypothesis that, for the bulk of
novice users (the vast majority of whom are probably not HN users), the visual
syntax approach would be helpful.

-OP/author

~~~
brudgers
A few years ago, when VerbalExpressions hit Hacker News, I became interested in
regexes [enough so that I did a Racket port and put in some time building
a more complete model]. VerbalExpressions is still popular enough to hit Hacker
News from time to time.

Anyway, I bought the O'Reilly _Mastering Regular Expressions_, looked at
some StackOverflow questions, bought Ullman's _Introduction to Automata_,
took his _Coursera_ course, and in general thought about the nature of regex's
[and regular expressions] off and on.

What really highlighted the nature of regular expressions was thinking about
regex generators. They pop up from time to time, and their purpose is to take a
set of inputs and produce a regex that matches them. The problem is that given
strings [x,y,z], the two best regex's are .* and x|y|z. Any system that produces
some other regex is by definition more difficult to understand than regex's
themselves.
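
To make the generator point concrete, here is a minimal Python sketch. The
helper name `alternation_regex` and the example strings are hypothetical, not
taken from any real generator. It builds the x|y|z form and compares it with
the match-everything form:

```python
import re

def alternation_regex(examples):
    """Hypothetical helper: build an 'x|y|z' style regex from example strings."""
    # re.escape keeps metacharacters inside the examples literal.
    return "|".join(re.escape(s) for s in examples)

examples = ["cat", "dog", "a.b"]
exact = re.compile(alternation_regex(examples))
anything = re.compile(r".*")

# The alternation accepts exactly the examples and nothing else...
exact_ok = all(exact.fullmatch(s) for s in examples)
exact_rejects_other = exact.fullmatch("bird") is None
# ...while '.*' accepts the examples and any other single-line string too.
anything_ok = all(anything.fullmatch(s) for s in examples + ["bird"])
```

Anything between those two extremes requires the generator to guess which
generalization the user had in mind.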

Now circling back to novice users, simple code that iterates over strings with
a loop or comprehension is probably going to be less prone to bugs than using
regex's because regex's are so powerful.

It's also going to be simpler to understand because the tools are familiar and
purpose built and the abstractions of the code can be matched to the
abstractions of the business logic. What makes regex's hard is that the
abstractions are mathematical and intuitions about regex's rely on
mathematical intuitions about automata [and in the case of Perl type regex's
not simply finite automata due to backtracking and capture].
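
As a rough illustration of the loop-versus-regex comparison, the Python sketch
below filters a made-up list of file names for the rule "starts with 'report'
and ends with '.csv'". The file names and variable names are assumptions chosen
for the example:

```python
import re

filenames = ["report_2016.csv", "report_final.txt", "notes.csv", "report.csv.bak"]

# Plain string methods: each condition maps directly onto the stated rule.
by_loop = [name for name in filenames
           if name.startswith("report") and name.endswith(".csv")]

# Regex equivalent: the dot must be escaped and the whole string matched,
# two details that are easy to miss.
pattern = re.compile(r"report.*\.csv")
by_regex = [name for name in filenames if pattern.fullmatch(name)]

# A near-miss regex: unescaped dot and no anchoring at the end.
careless = re.compile(r"report.*.csv")
by_careless = [name for name in filenames if careless.match(name)]
```

Here `by_loop` and `by_regex` agree, while `by_careless` also lets
`report.csv.bak` through.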

Please don't misunderstand me, I think the article is a neat piece of work.
Even though I think that simplifying regex's isn't really possible because of
mathematics, the article may introduce someone to the idea of using them.
Because regex's are immensely powerful, that's a good thing.

Good luck.

~~~
savolai
Thanks for your kind words.

You still don't see the point though. I am _not_ simplifying regexes one
single bit. The regex syntax is quite simply broken. It is fragmented as hell,
and its escaping and context rules stem from a time when all we had was ASCII.
This might have been acceptable in the eighties, yes?

These problems have _nothing_ to do with whether the underlying regex theory is
mathematically deep. I am not touching the theory. It's simply the
syntax that is broken. You can go exactly as deep with Regex UCR as you can
with traditional regexes. It's the same regex. You just don't have to torture
yourself while doing it.

We need to move forward from the one-eyed view that everything must be done
in text, even if it means wasting countless man-hours and loads of
cognitive processing. We need that energy for fruitful work. You are currently
just advocating bureaucracy.

Your looping example is yet another example of a theorist making enormous
assumptions about the context of usage. How do you write a loop in the
search/replace field of a text editor or an IDE? How does proneness to bugs
matter one single bit if you're just using the tool to avoid a bit of manual
work? Which [warning, hypothesis:] seems to be mostly what regexes are
actually used for anyway.

~~~
Tiksi
> _and its escaping and context rules stem from a time when all we had was
> ASCII._

My keyboard still only has ~ASCII on it, so having to input anything else is a
huge barrier to entry.

Regex syntax sucks, I'll be the first to agree with that, but it's more of an
annoyance. If I write some regex to match IPs (obviously not properly, but a
quick rough match):

    
    
      ([0-9]{1,3}\.){3}[0-9]{1,3}
    

My thought process isn't "open paren, open bracket, zero" etc, it's more of
"Ok, I have 4 groups of 1 to 3 digits separated by dots, which can be
written as 3 groups each followed by a dot, followed by a fourth group" and my
hands translate it the rest of the way into text.
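
A minimal Python sketch of that prose description, spelled out literally (three
groups of one to three digits each followed by a dot, then a fourth group). The
names `ROUGH_IPV4` and `looks_like_ipv4` are hypothetical, and this is a rough
match only, not a validator:

```python
import re

# Three "1-3 digits then a dot" groups, followed by a fourth 1-3 digit group.
# Rough match only: it does not check that each number is at most 255.
ROUGH_IPV4 = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")

def looks_like_ipv4(text):
    """Hypothetical helper: True if the whole string fits the rough pattern."""
    return ROUGH_IPV4.fullmatch(text) is not None
```

With this sketch, `looks_like_ipv4("192.168.0.1")` is true and
`looks_like_ipv4("1.2.3")` is false, but `999.999.999.999` is also accepted,
which is the "not properly" part.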

Knowing exactly which character to use may have been a hindrance the first few
times I ran into regex, but the real difficulty is in knowing what to match
and how to match it. Providing that info to your computer once you know it is
the easy part, even if the interface is horrible.

It's like driving a car or a bike. You don't think "Ok, time to move my hand
to move the steering wheel to the right and that will turn the wheels which
will then turn the car to the right", you just need to turn right and your
subconscious takes care of the rest.

Improving the syntax would be like trying to improve how well/safely you drive
by changing the steering wheel. Sure, it might help a little bit, but that's
not really the difficult part of driving well, just an interface to it.

~~~
savolai
Ok. Please actually read my posts, make an honest attempt to understand what I
am actually saying, and come back. If this comment of yours still seems
relevant in any sense of the word, please do tell me, and I will try to
rephrase or answer any questions you may have. Would that be ok? Thanks.

~~~
Tiksi
Ok. I did read your posts, especially the one I replied to, and it does in
fact seem relevant. However with that level of snark, I'm just gonna drop out
of the conversation here as it's unlikely to be productive. Thanks.

~~~
savolai
Sorry. Perceived snark was completely unintended.

What I thought I had been repeating here, starting from the actual blog post,
is that it is completely natural for you to perceive the escaping as just
a nuisance. You are an expert user who has already overcome the initial pain,
so you can't feel it. That initial pain is exactly what I am trying to help
users overcome.

The pain wasn't a prohibitive barrier for you, but you can't really deduce
anything from this fact. _This is why your perception about your own
experience of using regexes is not very useful to this discussion_ about
whether Regex UCR is useful to its main target group.

The title of the post was You Are Not Your Users, remember? You are an expert
on the tool - that does _not_ mean you are an expert on every use of the tool
by all the people on this planet who would use it, if only it wasn't
broken _for them_. It also doesn't mean that the difficulty of the use cases
you find relevant, like matching parentheses, is the only valid criterion for
these users to evaluate whether regexes are useful for them.

Your bias, as you mention, seems to make it difficult to perceive this. There's
nothing wrong with that, of course - that's what most of reddit was doing too.

As more and more people get accustomed to modern user experiences, more and
more of them, even developers, will not want to go through stone walls just to
get a simple matching task done. They require quality from their tools. Also,
as many people will never become experts, the initial pain will just repeat on,
and on, and on, and on.

Correction: brudgers and Tiksi - As you may have noticed by now, I did not
notice I was talking to two different people, and my comments reflect this.
Apologies for the confusion.

~~~
brudgers
For what it's worth, I am not an expert on regex's. A few times a year I might
write a very simple regex for something that would otherwise require a lot of
tedious repetition.

They're just another thing I find intellectually interesting and my comments
are based on my struggles to learn them as a non-developer.

