
I work for Google but not on search.

I think in this case the semantic web would not work unless there were some way to weed out spam. There are currently multiple competing microdata formats out there that enable you to specify any kind of metadata, but they still won't help if spammers fill those in too.

Maybe some sort of webring of trust, where trusted people can endorse other sites and the chain breaks if somebody is found endorsing crap? (As in, you lose trust, and so does everybody under you.)
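Roughly something like this, as a toy sketch (all site names invented):

    # Minimal sketch of cascading trust revocation. Each site lists the
    # sites it has endorsed; revoking one site also revokes everything
    # downstream of it, transitively.
    endorsements = {
        "root": ["site-a", "site-b"],
        "site-a": ["site-c"],
        "site-b": [],
        "site-c": ["site-d"],
        "site-d": [],
    }

    def revoke(site, graph, revoked=None):
        """Revoke `site` and every site it endorsed, recursively."""
        if revoked is None:
            revoked = set()
        revoked.add(site)
        for child in graph.get(site, []):
            revoke(child, graph, revoked)
        return revoked

    # If site-a is caught endorsing spam, site-c and site-d fall with it:
    print(revoke("site-a", endorsements))  # {'site-a', 'site-c', 'site-d'}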




> I think in this case semantic web would not work, unless there was some way to weed out spam.

That's not so hard. It's one of the first problems Google solved.

PageRank, web of trust, pubkey signing articles... I'd much rather tackle this problem in isolation than the search problem we have now.
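For the PageRank piece, the core of it fits in a few lines of power iteration (toy graph, names invented for illustration):

    # Rough sketch of PageRank by power iteration on a three-page graph.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
    }
    damping = 0.85
    rank = {page: 1 / len(links) for page in links}

    for _ in range(50):  # iterate until roughly converged
        new_rank = {}
        for page in links:
            # Sum the rank flowing in from every page that links here.
            incoming = sum(rank[p] / len(links[p]) for p in links if page in links[p])
            new_rank[page] = (1 - damping) / len(links) + damping * incoming
        rank = new_rank

    print(rank)  # "c" ends up highest: it collects links from both others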

The trust graph is a different problem from the core one of extracting meaning from documents. Semantic tags would make meaning easy to derive from structure, which is a hard problem we're currently trying to solve with ML and NLP.


>Semantic tags make it easy to derive this from structure

HTML has a lot of structure already (all levels of heading are easy to pick out, as are lists), and Google does encourage the use of semantic tags (for review scores, author details, or hotel details, for example). For most searches I don't think the problem lies with being able to read meaning - the problem is that you can't trust the page author to tell you what the page is about, or to link to the right pages, because spammers lie. Semantic tags don't help with that at all, and differentiating spam from good content for a given reader is a hard problem - the reader might not even know the difference.
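To make the "structure is already easy to pick out" point concrete, pulling headings out of arbitrary HTML takes only the standard library, no semantic markup required (the sample HTML is made up):

    # Sketch: extracting document structure from plain HTML.
    from html.parser import HTMLParser

    class HeadingExtractor(HTMLParser):
        HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")

        def __init__(self):
            super().__init__()
            self.in_heading = False
            self.headings = []

        def handle_starttag(self, tag, attrs):
            if tag in self.HEADINGS:
                self.in_heading = True
                self.headings.append((tag, ""))

        def handle_endtag(self, tag):
            if tag in self.HEADINGS:
                self.in_heading = False

        def handle_data(self, data):
            if self.in_heading:
                tag, text = self.headings[-1]
                self.headings[-1] = (tag, text + data)

    parser = HeadingExtractor()
    parser.feed("<h1>Hotel Review</h1><p>...</p><h2>Location</h2>")
    print(parser.headings)  # [('h1', 'Hotel Review'), ('h2', 'Location')]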


> PageRank, web of trust, pubkey signing articles...

What prevents spammers from signing articles? How do you implement this without driving authors to throw their hands in the air and give up?
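Nothing, as far as I can tell - anyone can mint a keypair and produce a perfectly valid signature. A sketch, assuming the third-party cryptography package:

    # A signature proves who signed, not whether the content is worth
    # trusting; a spammer can sign just as easily as anyone else.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    spammer_key = Ed25519PrivateKey.generate()
    article = b"Top 10 pills your doctor won't tell you about"
    signature = spammer_key.sign(article)

    # Verification succeeds (no exception raised): the key is real,
    # the signature is valid, and the content is still spam.
    spammer_key.public_key().verify(signature, article)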


In the interest of not causing a crisis when the Top Level Trust Domain endorses the wrong site and the algorithm goes "uh uh" (or the endorsement is falsely labeled spam by malicious actors, or whatever), maybe the effect decreases the closer you are to that top level.
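Concretely, something like scaling the penalty for a bad endorsement by depth, so a mistake near the top is dampened rather than catastrophic (constants invented for illustration):

    # Sketch of depth-dependent penalties: a bad endorsement near the top
    # of the hierarchy is softened, so one mistake there doesn't trigger
    # a systemic collapse of everything beneath it.
    def mistake_penalty(depth, damping=0.5):
        """Penalty for a node `depth` hops below the top level that
        endorses spam. Approaches 1.0 (full revocation) as depth grows."""
        return 1.0 - damping ** (depth + 1)

    for d in range(4):
        print(d, mistake_penalty(d))  # 0.5, 0.75, 0.875, 0.9375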

But that's hierarchical in a very un-web-y way... Hm.


The internet is still kind of a hierarchy though, having "changed" "ownership" from a government agency (DARPA) to the non-profit ICANN.

And that has worked... quite fine. I have no objections (maybe they're a bit too liberal with the new TLDs).

Most of the stuff that makes hierarchies seem bad is actually the fault of for-profit organizations (or other unsuited people/entities) being at the top, not of someone being at the top per se. In fact, in my experience, and contrary to popular expectation, when a hierarchy works well, an outsider shouldn't be able to immediately recognize it as such.



