It’s much harder to build good search engines on top of plain text though, because it’s not possible to deterministically detect hyperlinks within text.

If the word starts with "http" and contains a period, it's probably a hyperlink. HN uses this assumption to automatically convert plain-text hyperlinks, and it works well enough.
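
Roughly, in Python (a toy sketch of that heuristic, not HN's actual code; the function names are made up):

    import html

    def looks_like_link(token: str) -> bool:
        # The heuristic above: starts with "http" and contains a period.
        return token.startswith("http") and "." in token

    def linkify(text: str) -> str:
        """Wrap probable plain-text URLs in HTML anchors."""
        out = []
        for token in text.split(" "):
            if looks_like_link(token):
                # Trailing sentence punctuation is a common false
                # positive, so keep it outside the anchor.
                url = token.rstrip(".,;:!?)")
                tail = token[len(url):]
                out.append(f'<a href="{html.escape(url)}">{html.escape(url)}</a>{tail}')
            else:
                out.append(html.escape(token))
        return " ".join(out)

    print(linkify("Docs at https://example.com/guide, worth a read."))
    # Docs at <a href="https://example.com/guide">...</a>, worth a read.

Splitting on whitespace keeps it cheap and deterministic; the cost is the occasional false positive.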

The real problem is dealing with SEO spam, not occasional false positives with hyperlink detection.


Sometimes auto-conversion (especially with all the new TLDs and gTLDs) has weird outcomes: >> "Rudy Giuliani accuses Twitter of bias for hyperlinking text" | https://www.theverge.com/2018/12/5/18127063/rudy-giuliani-tw...
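
A toy version of why that happens (hypothetical, not Twitter's actual implementation; the names are made up): bare-domain linkifiers check the last dot-separated segment of a token against the ever-growing TLD list, so a missing space like "G-20.In July" produces a valid-looking hostname on the .in ccTLD.

    # Tiny sample of the TLD list, which has grown past 1000 entries.
    KNOWN_TLDS = {"com", "org", "net", "in", "app", "dev"}

    def looks_like_bare_domain(token: str) -> bool:
        parts = token.rstrip(".,;:!?").split(".")
        return len(parts) >= 2 and all(parts) and parts[-1].lower() in KNOWN_TLDS

    print(looks_like_bare_domain("G-20.In"))  # True -> gets hyperlinked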

> The real problem is dealing with SEO spam, not occasional false positives with hyperlink detection.

I mean, that's easy to say, but then the obvious question is why it's apparently too hard for Google: they don't do it, even though it would clearly be valuable given their algorithm.


Why would it be of value? Google isn't in the business of providing the best-quality search results; it's in the business of providing passable results while exposing users to ads and collecting data for its advertising machine.


