
RegEx match open tags except XHTML self-contained tags - kornish
https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
======
SimonPStevens
For some context, this is a very old Stack Overflow answer. At the time there
was a bit of a meme on SO that any question about regex and HTML would be
answered with "you can't parse HTML with regex". That is of course true, but
it was probably applied a bit overzealously: while you can't parse the full
HTML language with a regex, you can extract data in limited scenarios or
subsets.

This post was the culmination of that meme.

------
CJefferson
I don't understand the replies on Stack Overflow.

The request seems perfectly sensible -- write a regexp to parse a single HTML
tag, without a closing slash. Then all the replies are from smart-alecs trying
to look clever about how regexps can't do generic nested parsing. Am I missing
something?
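
For the narrow request described here, a minimal sketch in Python might look
like the following. The pattern and the names OPEN_TAG and open_tags are
illustrative assumptions, not the regex from the original question, and the
sketch assumes the input is a small, well-behaved snippet with quoted
attribute values and no comments or script blocks.

```python
import re

# Hypothetical pattern: "<", a tag name, optional attributes, then ">"
# that is NOT immediately preceded by "/". Quoted attribute values are
# consumed as whole units so a "/" or ">" inside quotes is tolerated.
OPEN_TAG = re.compile(
    r"""<([A-Za-z][A-Za-z0-9]*)\b(?:"[^"]*"|'[^']*'|[^'">])*(?<!/)>"""
)


def open_tags(html):
    """Return the names of opening tags that are not self-closed."""
    return [m.group(1) for m in OPEN_TAG.finditer(html)]


sample = '<p><a href="foo">x</a><br /><hr class="foo" /></p>'
print(open_tags(sample))  # ['p', 'a']
```

This only matches tags one at a time; it makes no attempt to pair opening and
closing tags, and it would also match tag-like text inside comments or
scripts.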

~~~
kornish
I mean, the replies are all jokes - that's why I submitted it. It's funny.
Clearly the responders aren't trying to be helpful.

~~~
ggggtez
I'd argue they are trying to be helpful. The top post highlights that this
question is asked on Stack Overflow frequently, and that it begins with
incorrect assumptions about what the technology is capable of doing.

Parsing arbitrary HTML requires a stack, and you don't get that with regex.
It's the wrong tool for the job.
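
To illustrate the point about the stack, here is a toy sketch in Python: a
regex finds the individual tags, but an explicit list is needed to track
nesting. TAG and is_balanced are hypothetical names, and the sketch assumes
XHTML-style input where every non-self-closed tag has a matching closing tag
(so a bare <br> would count as unclosed).

```python
import re

# Hypothetical tokenizer: captures an optional leading "/", the tag name,
# and an optional trailing "/" just before ">".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")


def is_balanced(html):
    """Check that opening and closing tags nest properly."""
    stack = []
    for closing, name, self_closing in TAG.findall(html):
        if self_closing:
            continue  # <br /> opens and closes at once, nothing to track
        if closing:
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # leftover entries are tags that were never closed


print(is_balanced("<div><p>hi</p></div>"))  # True
print(is_balanced("<div><p>hi</div></p>"))  # False
```

The regex only does the tokenizing; the list does the part a classical
regular expression cannot, which is remembering arbitrarily deep nesting.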

Edit: Additionally, the regex proposed doesn't even come close to a solution.

~~~
CJefferson
The top post is just a nonsense meme, and also completely ignores the
question, which is about finding tags of a particular kind. In many HTML
parsers, information such as whether a tag is self-closed isn't easy to find
(or may not even be stored), so they could at least show how to do what the
questioner asks, rather than just mock them.
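
As one example of a parser that does expose this, Python's standard
html.parser calls handle_startendtag for XHTML-style tags such as <br /> and
handle_starttag for ordinary opening tags. A minimal sketch, where
OpenTagCollector and collect are hypothetical names chosen for illustration:

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Sort tags into ordinary opening tags and self-closed ones."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.self_closed = []

    def handle_starttag(self, tag, attrs):
        # Called for tags like <p> or <a href="foo">.
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for tags written with a trailing slash, like <br />.
        self.self_closed.append(tag)


def collect(html):
    parser = OpenTagCollector()
    parser.feed(html)
    parser.close()
    return parser.open_tags, parser.self_closed


sample = '<p><a href="foo">x</a><br /><hr class="foo" /></p>'
print(collect(sample))  # (['p', 'a'], ['br', 'hr'])
```

Other parsers may normalize this distinction away, so whether it is available
depends on the library in use.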

------
chx
This is so much a part of programmer lore that I am wondering: is HN playing
https://xkcd.com/1053/ today?

