

Why you can't parse HTML with regex - happy_wanderer
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

======
kazinator
> _HTML is not a regular language and hence cannot be parsed by regular
> expressions._

But, but! On Stack Overflow, "regex" doesn't mean "specification for a finite
state automaton which recognizes whether strings belong to a given regular
language", but rather "a character matching tool implemented in whatever
language or its library that a question is about".

Some so-called regex engines will match nested tags.
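
To make the nesting point concrete, here is a minimal sketch using Python's
standard `re` module, which has no recursion support. The pattern name
`FLAT_DIV` and the helper `flat_div_contents` are made up for illustration.
A lazy, non-recursive pattern pairs the first opening tag with the first
closing tag, so a nested element gets cut short:

```python
import re

# Non-recursive pattern: "<div>", then as little text as possible, then "</div>".
FLAT_DIV = re.compile(r"<div>(.*?)</div>", re.DOTALL)


def flat_div_contents(html: str) -> list[str]:
    """Return what the flat pattern captures; it cannot track nesting depth."""
    return FLAT_DIV.findall(html)


nested = "<div>outer <div>inner</div> tail</div>"

# The lazy match stops at the first "</div>", which closes the inner element,
# so the capture is a truncated piece of the outer element.
print(flat_div_contents(nested))  # ['outer <div>inner']
```

Engines that add recursion on top of classic regular expressions (PCRE's
`(?R)`, for example) can balance the tags, which is exactly why it matters
which tool the question is about.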

The question has a problem because it makes no mention of what concrete tools
OP is using to solve a concrete problem.

------
wodenokoto
The second answer is correct, not the top answer. If you have some knowledge
about the form of your HTML, then regex might very well be the best tool for
the job.

I've spent hours trying to get something to work in BeautifulSoup that could
have easily been done in minutes with regex and a few lines of Python.
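
A rough sketch of what that can look like, assuming machine-generated markup
whose shape is known in advance. The `price` class, the sample rows, and the
`extract_prices` helper are all hypothetical, not taken from any real page:

```python
import re

# Assumed form: every price sits in <td class="price">$N.NN</td>,
# possibly padded with whitespace. Anything else is ignored.
PRICE_CELL = re.compile(r'<td class="price">\s*\$([0-9]+\.[0-9]{2})\s*</td>')


def extract_prices(html: str) -> list[float]:
    """Return every price found in cells of the assumed form."""
    return [float(text) for text in PRICE_CELL.findall(html)]


sample = """
<tr><td class="name">Widget</td><td class="price">$4.50</td></tr>
<tr><td class="name">Gadget</td><td class="price"> $12.00 </td></tr>
"""

print(extract_prices(sample))  # [4.5, 12.0]
```

This only holds up as long as the markup keeps that exact form. If the
attribute order, quoting, or nesting changes, the pattern silently returns
nothing, which is the trade-off against a real parser.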

