Hacker News
Google Directly Embedding Stack Overflow Responses in SERPs (ma.ttias.be)
96 points by Mojah on Jan 27, 2015 | 63 comments



Stack Overflow utterly destroyed the other answer sites for just this reason: they make choices based on what is best for the user, not Stack Overflow. Check the traffic of Expert Sexchange, or Quora, vs. the entire SO network. If all you care about is short term gains, you lose to the company that wants to build a trusted brand.

At least that's the way I think it should work out, and in this case I think it has. There is such a clear delineation between good and evil in this market, and such a clear leadership position held by the "good", that it's interesting that folks are wringing their hands worrying about whether SO gets enough impressions out of this. They don't need/want merely impressions. They want your trust and your participation. I know where I go when I have a question...how about you?


I started seeing this feature a few weeks ago, and I love it. They even bold exactly the segment of the answer you're interested in. Amazing stuff.

I basically treat it like any other link in Google that just happens to have the text I want formatted differently than other links. In most cases I end up clicking the link anyway to verify the context is correct.

VERY FRICKIN' COOL.


I go to Google (or Bing), and the search engine sends me to stack overflow.

To agree with you: occasionally, Quora pops up. But I actively avoid it.


Also, Quora forces you to sign up even to view all answers/comments (or you have to do a silly URL hack, appending ?share=1). Second, the quality of answers on the Stack Overflow network is much better than on Quora. Lastly, I have seen some answers deleted by Quora mods for their personal reasons.* There is no way such thing can happen on Stack Overflow.

* This [0] is the question I remember. Directi is an Indian company, and mods who work for Directi deleted the answers that showed Directi in a bad light. It doesn't matter whether the answers were correct or not (let the downvotes decide that), but this kind of censorship is not cool.

[0] - http://www.quora.com/Is-Directi-banned-from-IIT-placements


Yes - I deleted my Quora account after catching them doing search-engine cloaking. Their policies make it clear that their priority is juicing numbers for eventual resale rather than building a useful resource.


Lastly, I have seen some answers deleted by Quora mods for their personal reasons. There is no way such thing can happen on Stack Overflow.

First, there is no way that there is "no way such a thing can happen" on SO. SO has moderators who are human. They have guarded themselves against becoming paywalled by putting a CC license on everything they make, but there's no technical solution that can solve moderators being human.

Second, there are people who complained specifically about the pettiness of the mods. Google for complaints about Stack Overflow and you'll find them, including on HN.

EDIT: it now occurs to me your specific complaint may have been "mods deleting answers." But mods on SO can delete questions, answers, and comments. You can find people talking about this in meta.stackoverflow.com, among other places.


As much as I love the Stack Exchange sites, I also have occasional complaints about moderation. Our software (Webmin and Virtualmin, which, whether or not one likes GUIs for sysadmin tasks, are downloaded 3 million and several hundred thousand times a year, respectively, and so are a very real part of sysadmin infrastructure in a lot of places) gets asked about occasionally, and no matter where it is asked, including on Server Fault (which is about system administration), there will often be a mod saying it is off-topic. I'm not sure what their bone to pick there is, but I don't really hold a grudge. I'd just as soon folks come to our sites or mailing lists to ask questions, anyway. It's just an odd thing some mods do.


Note that high-reputation users can see deleted questions and answers on SO, and petition for them to be undeleted. So there is at least a check on the ability of the moderators to delete things for personal reasons.

Most of the complaints about deletion or closing on SO are not about it for petty personal reasons, but instead a general community mindset that is a bit too quick to jump to closing imperfect questions, questions that are close to duplicates but not quite, questions that are somewhat subjective but still have a lot of users interested in them, and so on.


>> Note that high-reputation users can see deleted questions and answers on SO, and petition for them to be undeleted.

Overall, SO is a great place to get questions answered. The problem, in my mind, is that there might be a disconnect between the high-reputation users and the newbies: newbies ask the questions that get flagged or deleted, and newbies are also the ones who come to SO looking for answers to those very questions.

While I don't think that any of the high rep users are newbie haters, I do think it can be challenging for them to see things from a newbie's point of view.


In my experience, the standard newbie questions have all been answered. A quick search is all it takes to find the answer. The problem, of course, is that newbies rarely know the right questions to ask (read: search), which is to be expected because they don't possess the right amount of domain knowledge.


>> The problem, of course, is that newbies rarely know the right questions to ask (read: search)

When I'm "learning a new tech" (let's use switching from Rails to Node as an example), I usually go to Google first.

Ask a question like "why would I choose Framework X over Y" in Google's searchbox, and your top results will have the same question on SO, except that it has been closed because the answer is opinion-based. The few answers to the closed question are usually fairly objective and insightful and leave me wishing that more people were able to add their 2 cents.

Sometimes newbies ask a broad question to get a big picture perspective (from multiple people answering) that can't be provided by a couple of blog entries found in Google results. And yes, those questions might elicit opinionated responses, but I would argue that the responses add more value to SO than they take away.


Ah, if your problem is about opinion questions like that, I recommend you not try to get StackOverflow to support that. StackOverflow is specifically geared towards objectively answerable questions; solving an actual problem that can be solved by code. "What's the best framework" or "why is framework X better than Y" will never fit well into its model.

I recommend starting another site, tailored for that kind of question, if that's something you're interested in. I imagine it would look pretty different than SO; instead of simply sorting by votes, you'd want to group answers by "reasons to prefer X" and "reasons to prefer Y", each of which could be voted on, and with optional discussion threads to clarify each one of them, or something of the sort.


>Ask a question like "why would I choose Framework X over Y" in Google's searchbox, and your top results will have the same question on SO, except that it has been closed because the answer is opinion-based. The few answers to the closed question are usually fairly objective and insightful and leave me wishing that more people were able to add their 2 cents.

They are often very outdated too - some were closed 3, 4, 5 years ago and never updated.


>> They are often very outdated too - some were closed 3, 4, 5 years ago and never updated.

This is true, but closing a question pretty much guarantees that it won't ever be updated.


I don't really think that you're right that high-rep users can't see things from a newbie's point of view. Most high-rep users got there by being good at seeing things from a newbie's point of view, and thus being able to explain things well to newbies.

The answers that get the most upvotes, and give you the most reputation, are the basic problems that everyone has, since then everyone finds that question, and upvotes the answer that solves the problem for them.

I'll note that as a high-rep user (top 0.1% overall), I more often see the moderate-rep users as the ones closing a lot of questions. The high-rep users are busy getting more reputation by actually answering questions; it's the moderate-rep users, with just enough reputation to vote to close, who wind up spending all of their time policing the site.

The biggest problem that I have with SO is the review queue, which queues up questions that have close votes or flags, in order to get a quicker resolution, and rewards people with badges for spending a lot of time reviewing questions that they don't really know much about. I've seen so many slightly vague questions closed because of that, when if you had actually just spent a few minutes getting the user to clarify via comments, you could have figured out what their actual problem was and answered it. But because there's a review queue, and badges for reviewing lots of questions, answers, edits, etc, some people just spend all their time in there rather than actually trying to answer or clarify.

Anyhow, I very frequently hear vague complaints about SO, but it's much more informative to look at particular examples. Do you have examples of questions that have been closed that you don't think should have been? As a high-rep user, I can vote for them to be re-opened, and as an experienced user I can help reword them and make them more clear so that they will be more likely to be reopened and less likely to be closed.

By the way, one thing that can be challenging is figuring out, from a question that doesn't provide enough information, what the user is asking. This is hard for anyone to figure out, newbie or experienced user. The best way to avoid this is to ask questions clearly. State the exact problem you're trying to solve, show a small code example that demonstrates the problem, state what you've tried in order to solve it. And state a question that can be solved objectively; there's no way to provide a good, unambiguously correct answer to an opinion question, and the whole reason for StackOverflow to exist is to provide good, unambiguously correct answers, not to prompt endless discussion threads in which everyone discusses their favorite X.

If you follow these guidelines, you should generally be able to get a good answer on SO, and not have your question closed (unless as a duplicate, which is another way of solving your problem by redirecting you to existing answers).


> First, there is no way that there is "no way such a thing can happen" on SO. SO has moderators who are human.

I agree. It may be possible, but it's highly unlikely. However, I have yet to witness such a thing happen on SO or to read about someone's experience of it. I agree that SO has different problems with moderation, but they are mostly at the community level, not the personal level.

Regarding your edit: yes, mods also delete questions/answers on SO. But they are still visible to high-rep users, and you can still discuss them on Meta to get them reinstated. There's no such system on Quora, AFAIK.


There's an extension to get rid of the log-in requirement:

https://chrome.google.com/webstore/detail/quora-unblocker/pc...


Not to mention that SO's content is released under a Creative Commons license, and for exactly this reason. Google doesn't really even need their permission to do something like this.


Extremely ironic that you post this in response to Google search results, which have become a prime example of choosing what's best for Google over what's best for the user after first decimating the opposition.

And so far it still pays off nicely for Google, even though the results are getting shittier and more biased year after year.

The only lesson here seems to be "destroy your opposition first, before you start focusing on ruthless exploitation". Let's just hold off on drawing conclusions about SO for another decade.


I agree. I was a Google cheerleader for years (and still hold some GOOG, though one of these days I'll do some research and find a new company to move it to). But I have lost a lot of trust in them. I use DDG for search and Firefox as my browser, though I still have a lot of data at Google. But in this case I don't think Google's behavior is misaligned with user interests or with those of SO.


Who's the "user"? I sometimes find useful information on SO, but I would never think to ask a question there ("closed because ..."), much less play their "gamification" of Usenet. They're less terrible than the "?share=1" people, but that's not saying much.


Their traffic and user engagement says they have a lot of "users", and vastly more than their competition (despite the huge hype about Quora, they got absolutely wrecked by SO/SE).

I don't go in for the gamification stuff, but I like finding correct answers without the bullshit of forced logins, SEO trickery, ghosting of answers, etc. I log in to SO/SE because I get value from it (I ask questions, and logging in lets me get notifications when someone answers). I suspect the gamification is the least of what they got right. The core things they got right are that they respect their users, they make it easy/quick, they give you the best answer where you expect to find it (including in search engine results and without logging in), and they treat the content as though it belongs to the community rather than the company. It's kinda like what reddit (and HN) got right. The upvotes are ancillary, but fun sometimes.


> So it appears this is happening with Stack Overflow knowing about it and approving it, after all — they implemented schema.org.

Not really. As far as I know, implementing schema.org means your data is structured, not that anyone can do whatever they want with it. The real reason Google can do it is because Stack Overflow user submitted content is licensed under CC BY-SA 3.0.

Unfortunately Google is not complying with the license since it requires providing a link to it and indicating which changes were made[0].

There is a mechanism in schema.org to specify licenses[1]. I couldn't find it in SO answers' attributes, but Google shouldn't assume that means they can do whatever they want with the content! In fact, wouldn't that mean the content is under copyright (unless specified otherwise) and therefore not remixable/shareable? As far as I can tell, even if those are excerpts, Google's use does not fall under fair use.

Anyways, props to SO for choosing CC BY-SA for their user submitted content. I think it's fair (after all SO feeds from their users) and, even if detrimental to their interests in the short term, in the long term builds trust among their users.

[0] https://creativecommons.org/licenses/by-sa/3.0/

[1] https://schema.org/license


CC BY-SA is not the license that Google is using.

Stack Overflow added metadata to their HTML to enable Google to use the answers in their response box. This is the primary (and probably sole) reason for this metadata, therefore it constitutes an implicit license grant to Google.

IANAL.


> CC BY-SA is not the license that Google is using.

Exactly, so their license is not compatible with SO's (which requires Share-Alike) and therefore they're violating SO's license until they explicitly comply with CC BY-SA in those excerpts.

What matters here is which license the source content is distributed under.

> Stack Overflow added metadata to their HTML to enable Google to use the answers in their response box. This is the primary (and probably sole) reason for this metadata

SO added metadata to enable anyone to use their data in any way (as long as they comply with the license).

I'm very grateful for having structured data. Even if schema.org is a Google (+ Microsoft + Yahoo + Yandex) initiative, the idea is to structure the content, not to give it for free.

> therefore it constitutes an implicit license grant to Google.

I don't think it works that way. I'm not sure if "implicit licensing" will hold in court (after all, there is an explicit license contradicting it), but even if it would, the only thing schema.org implies is that the data is structured.

Here are schema.org's terms and conditions: https://schema.org/docs/terms.html (they never say you give them your content for free).

> IANAL.

IANAL either :)


As the owner of the content, doesn't SO reserve the right to also release it under whatever other license they deem useful? Is there some source stating they didn't give Google express permission to use the content in the exact way it's currently being used?

Edit: My thanks to the people who spelled out the terms. It appears it is CC licensed.


> As the owner of the content doesn't SO reserve the right

SO isn't the copyright holder of the content; the users are.

http://stackexchange.com/legal/terms-of-service#3SubscriberC...


http://stackexchange.com/legal/terms-of-service

> 3. Subscriber Content

> You agree that all Subscriber Content that You contribute to the Network is perpetually and irrevocably licensed to Stack Exchange under the Creative Commons Attribution Share Alike license.


> under the Creative Commons Attribution Share Alike license.


Did you read the words before those?

> irrevocably licensed to Stack Exchange

> You agree that all Subscriber Content that You contribute to the Network is perpetually and irrevocably licensed to Stack Exchange under the Creative Commons Attribution Share Alike license.

This bit seems like SE owns the content:

> You grant Stack Exchange the perpetual and irrevocable right and license to use, copy, cache, publish, display, distribute, modify, create derivative works and store such Subscriber Content and to allow others to do so in any medium now known or hereinafter developed (“Content License”) in order to provide the Services, even if such Subscriber Content has been contributed and subsequently removed by You.


I find it hard to imagine that a court would interpret "licensed under the Creative Commons Attribution Share Alike license", followed by a summary of the rights that are granted by said license, as anything other than a grant of rights under that specific license. It certainly doesn't say anything about transferring copyright ownership.


> This bit seems like SE owns the content

Nope, they own a perpetual and irrevocable license. That's distinct from owning the content, because you still own the content, still can use it yourself, and still can issue any license you want to others to use it.

Were the perpetual and irrevocable license also exclusive, this analysis would be different.


Yes, it's irrevocably licensed to SE under the CC-BY-SA license. Seems clear to me.

> This bit seems like SE owns the content:

Yes, you grant those rights. That's exactly what granting them the content under the CC-BY-SA license does. You don't transfer the copyright, so they don't "own" the content.


But the user no longer owns the content. The user can't control what SE does with the content.


> But the user no longer owns the content.

Yes, they do.

> The user can't control what SE does with the content.

To the extent that is true (because they have granted a generous license) that doesn't change the fact that the user still owns the content, can use it themselves without permission from SE, and further can license it to others without permission from SE.

The original owner having granted a generous but non-exclusive license to one party does not make the party benefiting from the license the owner of the content, nor does it stop the original owner from being the owner of the content, with all the privileges and rights that go with that.


This is why using the word "own" and the concept of "property" in general when applied to these things is folly.

The user is still the copyright holder, which is what's usually known as "owning" the content.


Not sure why you got downvoted - here, have +1.

I agree that they probably don't comply with the license, since they don't link to the license itself. However, I doubt anyone will take them to court for this.

Also, this could be seen as quoting, which is allowed in some jurisdictions (with due restrictions). So maybe they don't care about the license... Haven't they been doing the same for ages, when they show the relevant portion of the page with the keywords?


> I doubt anyone will take them to court for this.

Probably not worth the hassle for SO, but it's a matter of courtesy. "Since you're so kind to add metadata and have a permissive license, I'll respect your license so you keep attaching metadata to your content."

Everyone wins.

> Haven't they been doing the same for ages, when they show the relevant portion of the page with the keywords?

Good point there.

I'm not knowledgeable enough on what exactly constitutes fair use for search engines. Is remixing the content actually a snippet, or is it a derived work? I'll research it ASAP, out of curiosity.


I agree with your first paragraph, but I don't agree that google is violating CC BY-SA. They do provide a link back to the original source, very clearly shown in the picture in the article. This seems sufficient for attribution, though perhaps they should go further and actually cite the answerer's username.

You could argue that combining two answers counts as modification, but that seems more of a nit-pick than enforcing the spirit of the license. Otherwise the text is unchanged.


From the license:

> Attribution — You must give appropriate credit, _provide a link to the license_ [...]

There is no link to the license. There isn't even a mention of the license! For me this is the most important point, therefore Google is not complying with the license.

---

> You could argue that combining two answers counts as modification, but that seems more of a nit-pick than enforcing the spirit of the license.

This is a minor point for me, but I still disagree.

Google's work is definitely a remix and the license is very explicit about it. We're not the ones meant to interpret what's the spirit of the license or when it's fine to violate its terms. I guess (IANAL) even non-remixed excerpts (i.e. removing content) should be marked as excerpts to fully comply.

Here's an example of why it might not be just a nitpick: answers are sometimes conflicting, and their meaning might change radically when automated excerpts are amalgamated. Imagine a right answer turned wrong by the automated merge. This could damage SO's reputation, driving away visits which would otherwise come back to SO as a reliable source.

If I were to remix them by hand, I would definitely mark it as a derived work.
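To make the two license requirements discussed above concrete (credit plus a link to the license, and an indication that excerpts were changed), here is a minimal Python sketch of how an answer box could render attribution for merged excerpts. The `Excerpt` type, the `attribute` function, the example authors and URLs, and the wording of the notices are all hypothetical, chosen for illustration; this is not how Google actually renders its answer box.

```python
from dataclasses import dataclass

# Deed URL for the license version cited in this thread.
CC_BY_SA_3 = "https://creativecommons.org/licenses/by-sa/3.0/"


@dataclass
class Excerpt:
    text: str        # the quoted answer text
    author: str      # answerer's display name (hypothetical field)
    source_url: str  # link back to the original answer


def attribute(excerpts: list[Excerpt], modified: bool) -> str:
    """Render excerpts with per-excerpt credit, a license link, and a change notice."""
    lines = [f'"{ex.text}" by {ex.author} ({ex.source_url})' for ex in excerpts]
    lines.append(f"Licensed under CC BY-SA 3.0: {CC_BY_SA_3}")
    if modified:
        # The license asks you to indicate if changes were made.
        lines.append("Changes: excerpted and combined from the answers above.")
    return "\n".join(lines)


# Hypothetical answers; example.com stands in for real answer URLs.
box = attribute(
    [Excerpt("Run df -h.", "alice", "https://example.com/a/1"),
     Excerpt("du -sh shows one directory.", "bob", "https://example.com/a/2")],
    modified=True,
)
print(box)
```

The change notice is only emitted when excerpts were actually cut or combined, which is the case the thread argues about.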


If this were a concern, then search engines would be worthless!

They couldn't embed any snippets from any webpage ever, as all materials are by default copyright their author with no license for reproduction.

I'm very sure there have been several legal cases involving search engines extracting information from the website. Spain's news organizations come to mind - the end result being that not licensing Google was bad for business.



I suspect this actually increases exposure and traffic to SO. If you look at the quoted data it is definitely not enough to decide if that's the right answer. It's going to get you to click through as you really need context, like upvotes, compare/contrast with other answers, look at context of question, etc. I think it's a bright move on SO's part and they really do control what they share with Google.

It's basically a free ad for SO. Smart.

Also, it could be that Google is paying SO to do this. You never know.


They are not paying us :) They did work with us to develop a new schema.org type for Questions (http://schema.org/Question) and Answers (http://schema.org/Answer) which we implemented.
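For readers unfamiliar with this kind of markup: schema.org types can be attached to a page as microdata, RDFa, or JSON-LD. Below is a minimal Python sketch that builds the JSON-LD form of a `Question` with its `Answer`s, including the `license` property mentioned earlier in the thread. It is an illustration under assumptions: the helper name `question_jsonld`, the question, answers, vote counts, and URLs are made up, and Stack Overflow's actual markup may use a different syntax or different properties.

```python
import json


def question_jsonld(title, body, answers, license_url):
    """Build a schema.org Question (with Answer children) as a JSON-LD dict.

    `answers` is a list of (text, upvotes, url, accepted) tuples; this
    tuple layout is a hypothetical choice for the sketch.
    """
    def answer(text, upvotes, url):
        return {"@type": "Answer", "text": text, "upvoteCount": upvotes, "url": url}

    accepted = [answer(t, v, u) for t, v, u, acc in answers if acc]
    others = [answer(t, v, u) for t, v, u, acc in answers if not acc]
    data = {
        "@context": "https://schema.org",
        "@type": "Question",
        "name": title,
        "text": body,
        "answerCount": len(answers),
        "license": license_url,  # the schema.org/license property
        "suggestedAnswer": others,
    }
    if accepted:
        data["acceptedAnswer"] = accepted[0]
    return data


doc = question_jsonld(
    "How do I check free disk space on Linux?",
    "I need the free space for each mounted filesystem.",
    [("Run df -h.", 42, "https://example.com/a/1", True),
     ("Try du -sh for a single directory.", 7, "https://example.com/a/2", False)],
    "https://creativecommons.org/licenses/by-sa/3.0/",
)
# The serialized string would go inside a <script type="application/ld+json"> element.
print(json.dumps(doc, indent=2))
```

Structured data like this tells a crawler which text is the question and which is the accepted answer; as the thread points out, it says nothing by itself about what the consumer is allowed to do with that text.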


So do you like what they are doing here or would you prefer they didn't?


> So it appears this is happening with Stack Overflow knowing about it and approving it, after all -- they implemented schema.org. But at the cost of pageviews?

Having a useful answer appear at the top of a Google search with an obvious link to your site can only help you. It builds your brand as an authority and provides the first link for people to click for problems. I think this is a clear win for StackOverflow.


Duckduckgo has had this feature for a while... is Google playing catch up?


Yeah. This isn't anything new or unique. DDG does it better.


Given the difference in size and user base, if DuckDuckGo were the one playing catch-up on small new features like this, they would really be doing it wrong...


I'm not quite sure why I'm being downvoted here. I'm not criticizing, just pointing out the obvious.

Or do you think the smaller/newer players aren't supposed to be the ones bringing new ideas and executions, but are meant to merely follow the giants from a distance and copy them? In search or in other fields... Because that's what downvoting what I said implies.


While it is a good point about maybe affecting page views, the use of schema.org markup is a tide that lifts all boats: great technology. While other embedded semantic markup schemes are also very useful, it makes sense (to me) to have schema.org be the standard that gets used.


Is there a search engine tuned to programming or CS search? I.e. rank programming related results higher?


I've been working on such a thing here:

http://gotoanswer.stanford.edu

The hypothesis behind the search engine is that correct answers will share some of the same tokens in common. For example a search for "mysql error 1045" brings up posts from stackoverflow, linuxquestions, ubuntuforums, and serverfault that all mention an incorrect password. The "best" answer is the one that instructs you on how to reset your password.
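A toy version of that hypothesis fits in a few lines of Python: score each post by how many of its tokens also appear in other posts, so answers that agree with the crowd rise to the top. This is only a sketch of the idea as described, not gotoanswer's implementation; the tokenizer, the tiny stopword list, the example posts, and the scoring rule (for each token, the number of other posts containing it, summed) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "is", "it", "of", "that", "the", "to", "with"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def rank_by_consensus(posts: list[str]) -> list[tuple[int, str]]:
    """Rank posts by how strongly their tokens are echoed by the other posts."""
    token_sets = [tokens(p) for p in posts]
    # Document frequency: how many posts contain each token.
    doc_freq = Counter(t for ts in token_sets for t in ts)
    scored = []
    for post, ts in zip(posts, token_sets):
        # For each token, count the *other* posts that also contain it.
        score = sum(doc_freq[t] - 1 for t in ts)
        scored.append((score, post))
    # Highest consensus first; Python's sort is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


posts = [
    "Error 1045 means access denied: the password is wrong. Reset the root password.",
    "Check that the mysql service is running.",
    "Access denied usually means a wrong password; reset it with mysqladmin.",
]
ranked = rank_by_consensus(posts)
for score, post in ranked:
    print(score, post)
```

With these made-up posts, the two that mention a wrong password and a reset outscore the one about the service not running. A real ranker would also need to normalize for post length, since this score grows with the number of distinct tokens in a post.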

Other queries you can try are:

"linux check hard drive space" "openssl aes" "reset safari 8 on yosemite" "how to remove spilled wine from macbook?"

I had concerns as well about whether this type of search engine "steals" content from other sites. I consulted with a few lawyers and their legal consensus was that this fell under fair use since I was a) Only showing a portion (specific posts) from each site and b) Transforming the posts in a novel way (i.e. re-ranking them). The transformative requirement is the #1 factor for fair use:

http://fairuse.stanford.edu/overview/fair-use/four-factors/#...


Google. With time, it will prioritize sites you tend to click.


Seems like a short-sighted strategy, unless they have a plan to pay for the content they're using. Impressive though.


Surely relying on ad revenue from people redirected from search engine results is the short-sighted strategy? It makes your revenue stream rely on the whims of uninterested third-parties.

The comments on the article even imply that Wikipedia "suffers" from having their content embedded in Google search results in this way; as if Wikipedia relies on ads, or has free bandwidth to waste, or cares more about getting visitors than spreading knowledge.

Unlike Wikipedia, StackOverflow is a commercial venture with ads, but are we really so jaded that we expect them to sabotage those who want to use their content? Have CC licenses turned into nothing more than trendy logos, to which we only pay lip-service?

SO's informal opinion-of/relationship-with Google is well-documented, for example: http://www.joelonsoftware.com/items/2008/04/16.html http://blog.codinghorror.com/trouble-in-the-house-of-google/ http://blog.codinghorror.com/the-importance-of-sitemaps/


> Surely relying on ad revenue from people redirected from search engine results is the short-sighted strategy?

What's the long-sighted alternative?

> It makes your revenue stream rely on the whims of uninterested third-parties.

Involvement with search engines makes your traffic rely on the whims of uninterested third-parties. Ad networks will predictably go wherever the users are. Search engines are rather unpredictable.


"What's the long-sighted alternative?"

StackOverflow's primary income source is Stack Overflow Careers.


Yes. That's why they took more investment this year: http://joelonsoftware.com/items/2015/01/20.html


If they let Google cut off their supply of eyeballs, Careers isn't going to do too well, either, though. Google is adding a tall hurdle to the front of their user-onboarding funnel.

It's a fair point that ads are not the best way for them to monetize, but practically any other form of monetization also involves getting users from a search engine onto their site.


> Involvement with search engines makes your traffic rely on the whims of uninterested third-parties.

That might be the case, but I was interested in revenue, not traffic.


As detailed in TFA, this only happens for sites that have been intentionally set up for it to happen, via microformats. Perhaps there is a risk that a particular site will cease to operate in that fashion, but it would just be replaced by another, and Google are pretty good at choosing the most appropriate links to highlight.


If you google "html anchor element" you get an embedded result from w3schools.com

Needs more testing for quality control.



