
You can't parse [X]HTML with regex - xparadigm
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
======
raiph
I've always loved that post, but the truth is that while you can _not_ parse
[X]HTML with what I'll call a "regular expression", by which I mean the formal
CS definition [1], you can with a suitable "regex", by which I mean what most
folk mean by the term "regex" [2].

[1] https://en.wikipedia.org/wiki/Regular_expression#Formal_definition

[2] PCRE engines support recursive matching etc. but perhaps the most
illuminating example is regex in Perl 6 such as this JSON grammar (a Perl 6
grammar is a class containing Perl 6 named regexes):
https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Grammar.pm
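
To make the distinction concrete, here is a rough Python sketch of the "grammar as a class of named regexes" idea. It is not the linked Perl 6 grammar and not real HTML parsing. The class name `TinyTagGrammar`, its rule names, and the toy tag syntax (no attributes, no self-closing tags) are all made up for illustration. Each rule uses an ordinary regex for a flat token, and nesting is handled by one rule calling another recursively, which is the part a formal regular expression cannot express.

```python
import re


class TinyTagGrammar:
    """Hypothetical toy grammar: flat tokens are regexes, nesting is recursion."""

    OPEN = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>")     # e.g. <div>
    CLOSE = re.compile(r"</([A-Za-z][A-Za-z0-9]*)>")   # e.g. </div>
    TEXT = re.compile(r"[^<]+")                        # anything up to the next tag

    def __init__(self, source):
        self.source = source
        self.pos = 0

    def parse(self):
        """Top rule: the whole input must be consumed as content."""
        nodes = self.content()
        if self.pos != len(self.source):
            raise ValueError(f"unexpected input at offset {self.pos}")
        return nodes

    def content(self):
        """content := (TEXT | element)*"""
        nodes = []
        while True:
            m = self.TEXT.match(self.source, self.pos)
            if m:
                nodes.append(m.group(0))
                self.pos = m.end()
                continue
            node = self.element()
            if node is None:
                return nodes
            nodes.append(node)

    def element(self):
        """element := OPEN content CLOSE, with matching tag names."""
        m = self.OPEN.match(self.source, self.pos)
        if not m:
            return None
        name = m.group(1)
        self.pos = m.end()
        children = self.content()  # the recursive step
        m = self.CLOSE.match(self.source, self.pos)
        if not m or m.group(1) != name:
            raise ValueError(f"expected </{name}> at offset {self.pos}")
        self.pos = m.end()
        return (name, children)
```

Calling `TinyTagGrammar("<a>hi<b></b></a>").parse()` yields a nested structure of `(name, children)` tuples, while mis-nested input such as `<a><b></a></b>` raises `ValueError`. Engines with recursive patterns fold that recursion into the pattern language itself instead of into host-language methods.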

