RegEx is powerful, yes. But HTML, even though it's a standard, can be buggy, inaccurate, and difficult to predict in practice. Some people, encouraged by older books that endorse the practice, don't close tags like `<p>`. Others omit the self-closing / on certain tags because they can (e.g. hr, br, img).
So can you use regular expressions to parse a known subset of expected HTML? Probably. Could you use them to parse arbitrary, unfiltered, potentially broken HTML? No. And you wouldn't expect to use RegEx to parse any other broken text document that fails to follow a defined, knowable schema either.
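To make the distinction concrete, here is a minimal Python sketch. The pattern `LINK_RE`, the helper names, and both sample strings are made up for illustration; the only library pieces used are the standard `re` and `html.parser` modules. It assumes the "known subset" means every link is written exactly as `<a href="...">`.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: only matches links written exactly as <a href="...">.
LINK_RE = re.compile(r'<a href="([^"]*)">')


def links_by_regex(html):
    """Return href values found by the fixed-shape regex."""
    return LINK_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags, however they are written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_by_parser(html):
    """Return href values found by the tolerant standard-library parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Markup you control and know the shape of.
known = '<p><a href="/one">one</a> <a href="/two">two</a></p>'

# Markup from the wild: unclosed tags, upper case, single and missing quotes.
messy = "<p>unclosed <A class=x HREF='/one'>one</a><br><a href=/two>two"

print(links_by_regex(known))   # ['/one', '/two']
print(links_by_regex(messy))   # []
print(links_by_parser(messy))  # ['/one', '/two']
```

On the known input the regex is enough. On the messy input it silently finds nothing, while the parser still recovers both links. You could keep widening the pattern to cover each new variation, but that is the point: with arbitrary HTML you can't know the variations in advance.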
If you want to extract information from a text document that has no defined, respected format, a RegEx will often do better than any other common tool.