I think the worst thing is to parse HTML with regexes.
I did some research on this in the past. The catch is that a large number of websites have broken HTML, which leads to unexpected results when you parse it with regexes.
The entire internet is a bit broken, and it's interesting that ALL browsers do a lot of extra work, outside of the RFCs, to "fix it" and deliver the content to the user without issues.
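To make that concrete, here is a small sketch in Python. The markup sample, the regex, and the names `BROKEN_HTML`, `NAIVE_HREF`, `links_with_regex`, `LinkCollector` and `links_with_parser` are all made up for illustration. It only assumes the standard library's `html.parser`, which is tolerant of sloppy markup but is not a full browser-grade parser.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample of sloppy markup: an uppercase tag with an unquoted
# attribute, a single-quoted href after another attribute, spaces around "=",
# and unclosed <a> and <p> elements.
BROKEN_HTML = """
<p>Links:
<A HREF=https://example.com/one>one
<a class="x" href='https://example.com/two'>two</a>
<a href = "https://example.com/three" >three
"""

# Naive pattern: assumes a lowercase tag, href as the first attribute,
# double quotes, and no stray whitespace.
NAIVE_HREF = re.compile(r'<a href="([^"]*)">')


def links_with_regex(html):
    """Return href values found by the naive regex."""
    return NAIVE_HREF.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_with_parser(html):
    """Return href values found by the tolerant stdlib parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex: ", links_with_regex(BROKEN_HTML))
    print("parser:", links_with_parser(BROKEN_HTML))
```

You can keep patching the regex for each new variation, but real pages keep producing new ones, which is pretty much the point above.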