

Common Regular Expressions Made Simple - madisonmay
https://github.com/madisonmay/CommonRegex

======
brudgers
I am reminded of VerbalExpressions
[https://github.com/VerbalExpressions](https://github.com/VerbalExpressions)
which hit HN in early August. Interested in learning more about regexes, I did
some work on a Racket port.

That project shows there are two tracks by which to tackle the problem that
regex syntax is incoherent gobbledygook. The first is creating a regex version
of PHP with 5000 functions and mix-and-match names, piece by piece. The second
is to rename regex symbols to something that is easier for humans to parse.
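
To make the two tracks concrete, here is a minimal Python sketch. Everything in it is hypothetical: `PatternBuilder` and its method names only loosely imitate the VerbalExpressions style and are not that project's actual API, and the helpers `lit`, `optional`, and `run_except` are invented names for the second track.

```python
import re


class PatternBuilder:
    """Track 1: many small named methods, each appending one regex fragment."""

    def __init__(self):
        self._parts = []

    def _add(self, fragment):
        self._parts.append(fragment)
        return self  # returning self is what allows chaining

    def start_of_line(self):
        return self._add("^")

    def then(self, text):
        return self._add("(?:" + re.escape(text) + ")")

    def maybe(self, text):
        return self._add("(?:" + re.escape(text) + ")?")

    def anything_but(self, chars):
        return self._add("[^" + re.escape(chars) + "]*")

    def end_of_line(self):
        return self._add("$")

    def compile(self):
        return re.compile("".join(self._parts))


# Track 2: keep plain string concatenation, but give the symbols readable names.
START, END = "^", "$"


def lit(text):
    return re.escape(text)


def optional(pattern):
    return "(?:" + pattern + ")?"


def run_except(chars):
    return "[^" + re.escape(chars) + "]*"


built = (
    PatternBuilder()
    .start_of_line()
    .then("http")
    .maybe("s")
    .then("://")
    .maybe("www.")
    .anything_but(" ")
    .end_of_line()
    .compile()
)

named = re.compile(
    START + lit("http") + optional(lit("s")) + lit("://")
    + optional(lit("www.")) + run_except(" ") + END
)

for url in ("https://www.example.com", "ftp://example.com"):
    print(url, bool(built.match(url)), bool(named.match(url)))
```

Both variants compile to essentially the same pattern; the difference is only in how the pieces are spelled when a human reads or writes them.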

~~~
yconst
Very good point! It's not the reasoning and logic behind regex that would turn
a beginner away from it; it's the syntax, which often results in expressions
turning into impenetrable goo!

(thanks for the verbal expressions link btw, seems really promising)

------
JadeNB
Unless I'm not understanding my way around the project, this seems to be a
very small library of regular expressions. As one would expect, Perl's
offerings blow anything else out of the water:
[http://search.cpan.org/~abigail/Regexp-Common-2013031301/lib...](http://search.cpan.org/~abigail/Regexp-Common-2013031301/lib/Regexp/Common.pm).

------
JazCE
I'm gonna be _That Guy_ and say that the e-mail regex isn't up to scratch and
probably should not be included.

Generally, very good.

~~~
abolibibelot
I'm gonna be That Other Guy and say that the date and phone regexes are
respectively English-language and US-specific. So it's common for a narrow
definition of common.

~~~
phorese
The time regex is, too. In German you can expect to encounter the text
fragment

> um 6:00 am 05.12. (at 6:00 on 12/05/..)

If I read it correctly, the time regex would extract "6:00 am" as time, but
the "am" is wrong (German uses 24h format).
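
A small Python sketch of that failure mode. The pattern `TIME_12H` is an assumption written for illustration (a typical 12-hour-style time pattern), not a copy of the one in CommonRegex; `TIME_24H` is likewise a hypothetical stricter alternative.

```python
import re

# Hypothetical 12-hour-style pattern: an optional am/pm suffix after HH:MM.
TIME_12H = re.compile(r"\d{1,2}:\d{2}(?: ?[ap]\.?m\.?)?", re.IGNORECASE)

# Hypothetical 24-hour pattern with no am/pm suffix at all.
TIME_24H = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")

text = "um 6:00 am 05.12."

# The German preposition "am" is swallowed as if it were a meridiem marker.
print(TIME_12H.findall(text))  # ['6:00 am']

# Without the suffix, only the clock time itself is extracted.
print(TIME_24H.findall(text))  # ['6:00']
```

Neither pattern can tell from the string alone which language it is looking at, so the choice has to come from outside the regex.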

~~~
jrabone
Haha that's excellent. Reminds me of a normalisation rule I wrote as part of a
larger system to convert "Joe Bloggs Md." into "Dr. Joe Bloggs MD" (where MD is
Medical Doctor). TIL that "Md." is a common abbreviation for "Mohammed" in
large parts of the world...

~~~
petepete
The text-to-speech system in use at my local GP's surgery (that announces to
patients which rooms they need to go to) pronounces 'Dr' as 'Drive',
rather than 'Doctor'. I thought someone would have tested that!

------
camus2
Isn't there an i18n extension built on libICU in Python, like the Intl
extension in PHP?

[http://site.icu-project.org/](http://site.icu-project.org/)

I'm sure there are better alternatives than just regexp to validate numbers,
emails, etc.

~~~
ygra
It's not validation in this case, it's finding them in a body of text. For
validation you generally have it easier because you often only deal with a
single datum in a field (and thus string).
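
The distinction can be sketched in Python with the standard `re` module. The `EMAIL` pattern below is a deliberately simplistic assumption for illustration and is not meant as a correct e-mail grammar; `is_valid` and `find_all` are hypothetical helper names.

```python
import re

# Simplistic e-mail-like pattern, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_valid(field):
    """Validation: the whole field must be exactly one match."""
    return EMAIL.fullmatch(field) is not None


def find_all(text):
    """Extraction: collect every match anywhere in free text."""
    return EMAIL.findall(text)


print(is_valid("alice@example.com"))           # True
print(is_valid("Write to alice@example.com"))  # False
print(find_all("Write to alice@example.com, or bob@example.org."))
```

In the extraction case the pattern also has to decide where a match ends, for example whether the final full stop of the sentence belongs to the address, which is a question validation never has to answer.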

~~~
masklinn
The CLDR provides locale-specific formats for some datum types (IIRC numbers,
dates, and durations). These formats can probably be reversed into the
corresponding regular expression in order to perform locale-aware data
detection.
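
A rough Python sketch of that reversal, under heavy assumptions: it handles only a tiny subset of CLDR-style numeric date fields (`d`, `M`, `y`), ignores quoted literals and month names, and the pattern string `dd.MM.y` is assumed here to resemble a German-style date format rather than looked up from CLDR data. `FIELD_TO_REGEX` and `pattern_to_regex` are hypothetical names, and reading `y` as one to four digits is a simplification.

```python
import re

# Hypothetical mapping from a few CLDR-style date fields to regex fragments.
FIELD_TO_REGEX = {
    "d": r"(?P<day>\d{1,2})",
    "dd": r"(?P<day>\d{2})",
    "M": r"(?P<month>\d{1,2})",
    "MM": r"(?P<month>\d{2})",
    "y": r"(?P<year>\d{1,4})",
    "yy": r"(?P<year>\d{2})",
    "yyyy": r"(?P<year>\d{4})",
}


def pattern_to_regex(pattern):
    """Turn a pattern such as 'dd.MM.y' into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "dMy":
            # Consume the whole run of the same field letter, e.g. "dd".
            j = i
            while j < len(pattern) and pattern[j] == ch:
                j += 1
            out.append(FIELD_TO_REGEX[pattern[i:j]])  # KeyError if unsupported
            i = j
        else:
            # Everything else is treated as a literal separator.
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


date_rx = pattern_to_regex("dd.MM.y")

m = date_rx.search("Termin am 05.12.2013 um 18:00")
print(m.group("day"), m.group("month"), m.group("year"))  # 05 12 2013

# A date written without its year does not match the full pattern.
print(date_rx.search("um 6:00 am 05.12."))  # None
```

A real implementation would read the patterns from CLDR data for the target locale instead of hard-coding one string.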

Of course, further complexity is added by people being able to provide partial
and context-dependent information, which is much harder to detect, e.g.
"December 3rd".

