Hacker News
Parsing HTML with regexes (stackoverflow.com)
2 points by louis-paul on July 23, 2015 | 1 comment



I think parsing HTML with regexes is about the worst thing you can do.

I did some research related to this in the past. The catch is that a large number of websites serve broken HTML, which leads to unexpected results when you parse it with regexes.

The entire internet is a bit broken, and it's interesting that ALL browsers do a lot of extra work, outside of the RFCs, to "fix it" and deliver content to the user without issues.
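
To make the point concrete, here is a minimal sketch in Python. The markup in BROKEN_HTML is made up for illustration (unquoted attribute, uppercase tag, single quotes, no closing tags), and the names NAIVE_LINK, LinkCollector, regex_links and parser_links are hypothetical. It compares a naive regex with the standard library's html.parser, which is tolerant of sloppy markup but is not a full browser-grade parser.

```python
import re
from html.parser import HTMLParser

# Made-up sample of "broken" markup: an unquoted attribute value, an
# uppercase tag with single quotes, and no closing tags at all.
BROKEN_HTML = "<p>See <a href=a.html>docs<p>and <A HREF='b.html'>faq"

# Naive pattern: assumes lowercase tags and double-quoted attribute values.
NAIVE_LINK = re.compile(r'<a href="([^"]*)">')


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and strips quotes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def regex_links(html):
    return NAIVE_LINK.findall(html)


def parser_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(regex_links(BROKEN_HTML))   # [] (both links are missed)
    print(parser_links(BROKEN_HTML))  # ['a.html', 'b.html']
```

You can keep patching the regex for each new quirk, but every site breaks its HTML in a different way, so the pattern never stops growing. A tolerant parser handles those cases in one place.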



