Introducing Structured Snippets, now a part of Google Web Search (googleresearch.blogspot.com)
86 points by peeyek on Sept 22, 2014 | 22 comments

I always assumed that personal-agent style work (i.e. a bot that is there to help you wade through the morass of information) would be a personal computation: local to you, probably configured and augmented by you.

Yet Google is trying to do it for everyone - and honestly does not seem to be doing a bad job.

So what are the advantages of the old idea of an almost-AI assistant working for me when BFG (Big Friendly Google) does it?

> So what are the advantages of the old idea of an almost-AI assistant working for me when BFG (Big Friendly Google) does it?

Privacy. Yacy[1] looks like a good start but it has miles to go before it can reach the same usability as popular services.

[1] http://yacy.net/en/

> honestly does not seem to be doing a bad job.

Is there any difference between the goals of an individual (person, company) vs. the average? Is information a competitive advantage? Today we provide unpaid feedback to improve private algorithms, tomorrow ..?

Edit: downvoters, what is a rhetorical question?

One is derived from algorithmic matching, the other is explicit opt-in by the user.

> local to you, probably configured and augmented by you.

Even if you opt in to a centralized black-box algo, it's not easy to audit the resulting advice.

There is a big difference between explicit goals and the reverse-engineering of intent, e.g. from search history.

We need both explicit goals and auditable algos.

At what point will all these snippets stop being "fair use" and start being copyright violations? What happens if your website aggregates schedules of some obscure (but interesting to a niche audience) event and makes money off of advertising, and Google decides to show a snippet of the next 5 upcoming events on the SERP itself, thus killing your traffic?

It's a relevant question. Regardless of what one thinks about copyright law, it is what it is and all players should abide by it. IANAL, so I can't say :(

However, I think there is a need for a more granular robots.txt such as:

    User-agent: *
    Allow-crawl: /
    Limit-display: /events 10words
    Limit-display: /blog 30words

This way, websites can state how big of a search result they are comfortable with and everyone abiding by these limits is safer legally. Websites will have to be a bit lenient in order to have prominent results and gain visibility, but they will have to limit that so users click the search result.

Right now, Google has its algorithms decide how much copying is legal and if you disagree you can either disallow Google Bot or sue them, so this would provide a middle ground.
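
To make the proposal above concrete, here is a rough sketch of how a crawler could honor it. Note that `Limit-display` and `Allow-crawl` are the made-up directives from this comment, not part of any real robots.txt convention, and the helper names `parse_display_limits` and `limit_snippet` are hypothetical too.

```python
# Hypothetical: "Limit-display" is the directive proposed in this comment,
# not something real crawlers understand.
def parse_display_limits(robots_txt):
    """Return a list of (path_prefix, max_words) from Limit-display lines."""
    limits = []
    for line in robots_txt.splitlines():
        name, _, value = line.strip().partition(":")
        if name.strip().lower() != "limit-display":
            continue  # ignore every other directive
        parts = value.split()
        if len(parts) == 2 and parts[1].endswith("words"):
            count = parts[1][:-len("words")]
            if count.isdigit():
                limits.append((parts[0], int(count)))
    return limits


def limit_snippet(path, text, limits):
    """Truncate text to the word limit of the longest matching path prefix."""
    matching = [(prefix, n) for prefix, n in limits if path.startswith(prefix)]
    if not matching:
        return text  # no limit declared for this path
    _, max_words = max(matching, key=lambda item: len(item[0]))
    return " ".join(text.split()[:max_words])


ROBOTS = """\
User-agent: *
Allow-crawl: /
Limit-display: /events 10words
Limit-display: /blog 30words
"""
limits = parse_display_limits(ROBOTS)
```

A search engine that played along would call something like `limit_snippet("/events/2014", page_text, limits)` before rendering a result, and pages with no declared limit would be left to its own judgment.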

Maybe showing the schedule for an event shouldn't be worth anything and it is an anomaly that it currently is? My guess is that most of the time sites like this are just out-SEOing the actual event page.

  User-agent: *
  Disallow: /

If the value of your site can be diminished by showing 5 lines of text, your site probably isn't worth that much.
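
For anyone unsure what those two lines do: a quick check with Python's standard `urllib.robotparser` shows that a compliant crawler treats them as "fetch nothing". The bot names and the example.com URLs are just placeholders.

```python
from urllib.robotparser import RobotFileParser

# Feed the two-line robots.txt from above straight into the parser.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# Every path is off limits for every user agent that honors robots.txt.
events_allowed = parser.can_fetch("Googlebot", "http://example.com/events")
home_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/")
```

So it is an all-or-nothing switch: there is no way to say "crawl me, but show less of me".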

> algorithms to determine quality and relevance that we use to display up to four highly ranked facts from those data tables.

Are the selected facts related to the user's query or search history?

Looks exactly like DuckDuckGo's topic summary boxes, only I bet google spins way more CPUs to autogenerate them...


That looks more like google's oneboxes/knowledge graph box, which already had that kind of data (unless you're referring to something else on that page, correct me if I'm wrong).

This looks like it's per-page extracted data, if google has been able to extract data from that page, so basically a structured-data result snippet.

Google finally launched Yahoo's SearchMonkey.


It looks like the main difference is that SearchMonkey reads structured data, whereas Google infers structured data: even if the relevant data isn't marked on the page or in any particular format, Google will still find and surface it.

It's not so difficult to "infer" structured data from sources like Wikipedia. Dpreview is also an easy example. I don't see any proof the feature is actually about automatic inference, and not about hard-coded rules for finding structured data on some popular sites.
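
To illustrate how little it takes when a page already keeps its facts in a plain two-column table, here is a minimal sketch using only Python's standard `html.parser`. This is not a claim about how Google does it; the `TableFacts` class and the sample spec table are invented for the example.

```python
from html.parser import HTMLParser


class TableFacts(HTMLParser):
    """Collect (label, value) pairs from table rows with exactly two cells."""

    def __init__(self):
        super().__init__()
        self.facts = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 2:
                self.facts.append(tuple(self._row))
            self._row = None
            self._cell = None


# Invented sample markup, standing in for a camera spec page.
PAGE = (
    "<table>"
    "<tr><th>Sensor</th><td>24 MP</td></tr>"
    "<tr><th>Weight</th><td>500 g</td></tr>"
    "</table>"
)
extractor = TableFacts()
extractor.feed(PAGE)
facts = extractor.facts[:4]  # the announcement talks about up to four facts
```

The hard part, and the part the announcement does not demonstrate, is doing this on pages whose tables are messy or whose facts are not in tables at all.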

Come on Google, when are you going to impress us again?

Things like this feel like the 90s and shouldn't even be something worth talking or caring about.

Introducing "we are the world's largest scraper, round 202, and no one seems to notice or care."

Google should share their income from SERPs with webmasters.

... and let the spamming begin

Google once again takes information from your web page and puts it on their search results page. How will this affect click-throughs? I can see how it might help with traffic sent from SERPs as your site might seem more authoritative.

I really hate the way they market these changes as improvements for the users. This is meant to improve just one thing: the time you spend on a SERP, where Google's directly owned advertising slots are. Authorship was actually good for the user, but it was not good for their ads, because it actually improved CTR. No surprise they decided to kill it. Plus, they're gathering information that people (or people-made systems) spent time organizing and writing. Don't tell me that you can just opt out of Google (as I see mentioned in one other comment) because that's ridiculous. If you're not there you don't exist. Unless you're selling shady products/services on the deep web, you need to be indexed by Google. That's the biggest monopoly nobody's really enquiring about, and I keep wondering why.
