
Show HN: I wrote a book on Python regular expressions - asicsp
My book titled "Python re(gex)?" is free to download through this weekend [1][2]

The book covers both the 're' and 'regex' modules and has plenty of examples. Chapters also have cheatsheets and exercises.

Code snippets, exercises, sample chapters, etc. are available in the GitHub repo [3]

I used pandoc+xelatex [4] to generate the pdf.

[1] https://gumroad.com/l/py_regex

[2] https://leanpub.com/py_regex

[3] https://github.com/learnbyexample/py_regular_expressions

[4] https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/
======
Maro
You might like:

[https://github.com/mtrencseni/rxe](https://github.com/mtrencseni/rxe)

So you can write:

    
    
      username = rxe.one_or_more(rxe.set([rxe.alphanumeric(), '.', '%', '+', '-']))
      domain = rxe.one_or_more(rxe.set([rxe.alphanumeric(), '.', '-']))
      tld = rxe.at_least_at_most(2, 6, rxe.set([rxe.range('a', 'z'), rxe.range('A', 'Z')]))
      email = (rxe
        .exactly(username)
        .literal('@')
        .exactly(domain)
        .literal('.')
        .exactly(tld)
      )

~~~
sweeneyrod
This seems cool, but I think that's mostly because the regex you think of as a
point of comparison is something like

    
    
        r'^[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}$'
    

which does look awful. But there's no reason you can't do

    
    
        username = r'[\w.%+-]+'
        domain = r'[\w.-]+'
        tld = r'[a-zA-Z]{2,6}'
        email = username + r'@' + domain + r'\.' + tld
    

which is arguably easier to read than the rxe version for someone familiar
with regex.
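
For anyone who wants to try the concatenated version, here is a minimal sketch that compiles it with the standard `re` module. The helper name `is_email` and the use of `fullmatch` in place of the `^...$` anchors are my own choices for illustration, and the pattern is the simplified one from this thread, not a complete email validator.

```python
import re

# Named fragments, exactly as in the comment above
username = r'[\w.%+-]+'
domain = r'[\w.-]+'
tld = r'[a-zA-Z]{2,6}'

# Plain string concatenation, then a single compile step
email = re.compile(username + r'@' + domain + r'\.' + tld)


def is_email(text):
    """Return True if the whole string matches the simplified pattern."""
    # fullmatch stands in for the ^ and $ anchors of the one-line version
    return email.fullmatch(text) is not None


if __name__ == '__main__':
    for candidate in ['user.name+tag@example.com', 'no-at-sign.example.com']:
        print(candidate, is_email(candidate))
```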

~~~
jazzyjackson
wow I didn't know you could concatenate regexes, thanks

~~~
gnulinux
Why not? They're just strings until they're compiled. "Code is data"
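
A quick sketch of that point: the `r` prefix only changes how backslashes in the literal are read, and the result is an ordinary `str`. The variable names here are made up for the example.

```python
import re

raw = r'\d+'      # raw string literal: the backslash is kept as typed
plain = '\\d+'    # ordinary literal with the backslash escaped by hand

# Both literals produce the same str object contents
print(type(raw) is str, raw == plain)

# So + is ordinary string concatenation; compiling happens afterwards
combined = raw + r'-' + plain
pattern = re.compile(combined)
print(pattern.fullmatch('12-345') is not None)
```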

~~~
jazzyjackson
Well I suppose the r in front of them in python made them look just enough
"not a normal string" for me to forget how the + operator might act.

My day-to-day experience is in nodejs, where adding one regex object to
another coerces them to normal strings first

[edit: hey neat, the hackernews form strips emoji from comments, I wonder if
that's just ranges of unicode or if there's some crazy regex going on :D]

~~~
baudehlo
In JavaScript you just need to wrap the pattern strings in `new RegExp(r1 + r2 + r3)`. If they are regex objects, concatenate their `.source` properties instead.

------
natpalmer1776
I personally learned RegEx with PowerShell for a work initiative that required
heavy usage of it. Most of the documentation regarding RegEx was pretty
language agnostic, so it's interesting when I run across guides that are
specific to a particular language.

Is there anything about Regular Expressions in Python that creates a unique
need for its own domain specific guide?

~~~
just_myles
None that I know of. My thought process has always been that regex was
language agnostic. I think Perl has its own version that is widely used. But
other than that, I do not know.

~~~
Lordarminius
> My thought process has always been that regex was language agnostic.

+1

I was surprised to find that regex in Python was not much different from the
language I use, Ruby. Would anyone with sufficient knowledge care to eli15 why
this is and how it is implemented?

~~~
cutler
Ruby's regex engine is Onigmo which is very similar to Perl 5.10. Perl was one
of Ruby's main influences, along with Smalltalk and Lisp, whereas Python's
BDFL was never a fan of Perl. However, Perl's regular expression syntax has
been widely adopted, notably through the PCRE library that imitates it, hence
the similarity you refer to. If you compare with JavaScript and PHP you'll
find they're all similar.

------
danso
Thanks for sharing! I only skimmed it but I liked how you include usage of the
external regex module, which I hadn't realized allowed for the use of
variable-length look-behinds.

~~~
ErikCorry
The engine has support but the language doesn't expose it?

~~~
danso
I don't think I understand your question (nor am I an expert on Python
regex!)...but just to be clear, Python's regular expression standard library
is named `re`. But there is an external lib – ostensibly a drop-in replacement
– that goes by the name of `regex`. It is the `regex` library that supports
variable lookbehind, _not_ Python's standard library `re`

[https://pypi.org/project/regex/](https://pypi.org/project/regex/)
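
A small sketch of the difference, for anyone curious. The helper `compiles_with_re` and the two sample patterns are made up for illustration. The standard `re` module rejects a look-behind whose width can vary, and the last few lines only run if the third-party `regex` package happens to be installed.

```python
import re


def compiles_with_re(pattern):
    """Return True if the standard library `re` accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


fixed = r'(?<=ab)c'      # look-behind of fixed width (2 characters)
variable = r'(?<=a+)c'   # look-behind of variable width

print(compiles_with_re(fixed))     # accepted by re
print(compiles_with_re(variable))  # rejected by re

# Optional: try the same pattern with the external `regex` module
try:
    import regex
except ImportError:
    regex = None

if regex is not None:
    print(regex.search(variable, 'aaac'))
```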

------
spazzy81
This is super helpful! I've always felt Python regex was one of the more
complicated parts of the language.

------
nathanbarry
Congrats on writing and publishing this book!

------
smitshah0014
Thank you very much! This is really helpful.

------
dhairya
This is a great resource. Thank you for taking the time to write this and
making it available to us!

------
AlexanderDhoore
There are two kinds of text search/parse problems:

- The really easy ones. A simple string search/split will do and a regex
  would be overkill.
- The really hard ones. You'll need to fully parse this and using a regex
  will result in fragile/hard to understand code.

Please don't use regexes in production software. Learn how to write simple
parsing code.
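
To make the "easy" case concrete, here is a minimal sketch that splits a `key=value` line with plain string methods and no regex. The function name `parse_setting` and the input format are assumptions chosen for the example.

```python
def parse_setting(line):
    """Split 'key = value' into a (key, value) pair without a regex."""
    # partition splits on the first '=' only, so values may contain '='
    key, sep, value = line.partition('=')
    if not sep:
        raise ValueError(f'no "=" found in {line!r}')
    return key.strip(), value.strip()


if __name__ == '__main__':
    print(parse_setting('name = Ada'))
    print(parse_setting('query=a=b'))
```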

~~~
madelyn
Can you elaborate a bit on this please? I'd be interested in resources on
writing better parsers!

Quickly looking at the python standard lib (urlparse, shlex, etc) and Python
packages (NLTK Treebank tokenizer), a lot of packages related to slicing,
dicing and parsing strings use a mashup of regex and rule based code.

~~~
cben
No experience with them in python, but look also for PEG grammars. They are
significantly simpler than the traditional tower of lexer + ambiguous limited
lookahead grammar.

Plus the memoized Packrat algorithm allows throwing in functions with custom
conditions. (Somewhat like parser combinators, they also support custom
logic.)
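
As a rough illustration of the idea, here is a hand-written, packrat-style sketch in Python. It is not taken from any PEG library: the toy grammar (sums of integers with parentheses) and the names `make_parser` and `evaluate` are assumptions for the example. Each rule maps a position to a `(value, next_position)` pair or `None`, and `lru_cache` memoizes results per position.

```python
from functools import lru_cache

DIGITS = '0123456789'


def make_parser(text):
    """Build memoized rule functions for one input string."""

    @lru_cache(maxsize=None)
    def number(pos):
        # Number <- [0-9]+
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)
    def term(pos):
        # Term <- Number / '(' Expr ')'   (ordered choice: first match wins)
        result = number(pos)
        if result is not None:
            return result
        if pos < len(text) and text[pos] == '(':
            inner = expr(pos + 1)
            if inner is not None:
                value, after = inner
                if after < len(text) and text[after] == ')':
                    return value, after + 1
        return None

    @lru_cache(maxsize=None)
    def expr(pos):
        # Expr <- Term ('+' Term)*
        result = term(pos)
        if result is None:
            return None
        total, pos = result
        while pos < len(text) and text[pos] == '+':
            following = term(pos + 1)
            if following is None:
                break  # leave the dangling '+' unconsumed
            value, pos = following
            total += value
        return total, pos

    return expr


def evaluate(text):
    """Parse the whole string and return the sum, or raise ValueError."""
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise ValueError(f'cannot parse {text!r}')
    return result[0]


if __name__ == '__main__':
    print(evaluate('1+(2+3)+4'))
```

Because the rules are ordinary functions, a custom condition (for example, rejecting numbers above some limit) can be added inside a rule without changing the rest.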

