Hacker News

Can you provide a query where that is still the case? It hasn't happened for a few weeks for me, since Stack Overflow changed their title SEO.

Happened to me not ten minutes ago with the search string "pass json body to spring mvc"

The efreedom answer at the 5th position is actually the most relevant - the stackoverflow question from which it was copied doesn't even show up on the first page. There is one stackoverflow result on the first page, but it deals with a more complex related issue, not the simple question I was looking for.

In Google's most recent cache, the efreedom result has the word "pass" on the page due to some related links content near the bottom, whereas the stackoverflow page does not. If you modify your query to [parse json body to spring mvc], stackoverflow is at position #1, and efreedom is at position #4. This still has room for improvement, but it would seem like the simplest explanation is just the better match on your query terms.
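To make the term-match explanation above concrete, here is a minimal sketch. It assumes a toy scorer that only counts how many query terms appear verbatim on a page; this is not how Google actually ranks. The page snippets and the `term_coverage` helper are made up for illustration and are not the real page contents.

```python
import re

def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_coverage(query, page_text):
    """Fraction of query terms that appear verbatim in the page text."""
    terms = tokens(query)
    words = set(tokens(page_text))
    return sum(t in words for t in terms) / len(terms)

# Hypothetical snippets: the copy carries a "related links" block that
# happens to contain the word "pass"; the original page does not.
copy_page = "how to parse json body in spring mvc. related: pass parameters to controller"
original_page = "how to parse json body in spring mvc"

for query in ("pass json body to spring mvc", "parse json body to spring mvc"):
    print(query, term_coverage(query, copy_page), term_coverage(query, original_page))
```

With [pass ...] only the copy covers every term; with [parse ...] both pages do, so other signals would have to decide the order.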

Didn't notice that - that's good to know. That's actually exactly how I'd expect a good search engine to behave. As annoyed as I am when I get a junk result, I'd be even more pissed if Google dropped terms from my query just so it can return a more popular site.

Of course, then all the content-copy farms will respond by copying valid content plus word lists - hopefully Google knows how to detect that.

To be completely fair, the very first link (from the official springsource blog) also appears to answer your question.

True, it does... I just noticed this because I've actually gotten into the habit of scanning for stackoverflow results first - they are almost always right on the money, and it's less cognitive overhead to read a site format I'm familiar with, with extraneous discussion well tucked away.

It almost feels like a cache miss when I have to drop down to the official site/documentation, since that typically requires a greater time investment to read through to find the relevant sections.

I guess that's a tribute to how well stackoverflow works, most of the time. And also to how lazy I am.

Thanks for the concrete query--I'm happy to ping the indexing team to make sure it's not tickling any unusual bugs or coverage issues. Jeff's original blog post helped us uncover a few of those things to improve.

If you're looking for SO results why not use 'site:stackoverflow.com'? That would clear out everything else.
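For anyone who wants to script that, here is a small sketch that builds such a search URL. It assumes the `http://www.google.com/search?q=...` URL form that appears elsewhere in this thread; `site_search_url` is just an illustrative helper name.

```python
from urllib.parse import urlencode

def site_search_url(query, site):
    """Build a search URL that restricts results to one site via the site: operator."""
    # urlencode takes care of escaping spaces and the colon in "site:".
    return "http://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})

print(site_search_url("pass json body to spring mvc", "stackoverflow.com"))
```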


Stackoverflow comes in at number 8 while clones are 6 and 10

Assuming we are looking at the same results, the pages at position #6 and #10 are not copies of the stackoverflow content at position #8. They are copies of http://stackoverflow.com/questions/1399293/test-priorities-o.... Unfortunately, the only place that the word "delay" (which is in your query) sometimes appears on that stackoverflow page is in the "related" links in the right column. At the time Google last crawled that stackoverflow page (see the cache), "delay" wasn't on the page, only "delayed". Whereas, the last time Google crawled the other two pages you mentioned, they did have "delay" on the page. Google should still be able to do better, but this little complication certainly makes things more difficult.
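The "delay" versus "delayed" complication can be sketched as follows. This is a toy illustration, assuming a deliberately crude suffix-stripping stemmer; it does not describe Google's actual matching. The page text and the helper names are made up.

```python
import re

def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(word):
    """Crude stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def exact_match(term, page_text):
    """True only if the term appears on the page exactly as typed."""
    return term in tokens(page_text)

def stemmed_match(term, page_text):
    """True if the term and some word on the page share the same crude stem."""
    return naive_stem(term) in {naive_stem(w) for w in tokens(page_text)}

# Hypothetical page text that contains "delayed" but never "delay".
page = "the test is delayed until its priority is set"

print(exact_match("delay", page))    # exact term is absent from the page
print(stemmed_match("delay", page))  # "delayed" and "delay" share a stem
```

A page that only matches through a stemmed variant is a weaker match than one containing the exact term, which is one way a copy with "delay" in a sidebar could edge out the original.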

Yeah... it might not be the best example, as what I was searching for is not currently possible, so there are no correct results for it.

One UI issue we've struggled with is how to tell the user that there isn't a good result for their query. This comes up when we evaluate changes that remove crap pages all the time. For nearly any search you do, something will come up, just because our index is enormous. If the only thing in the result set that remotely matches the query intent is a nearly empty page on a scummy site, is that better or worse than having no remotely relevant results at all? I definitely lean towards it being worse, but many people disagree.

I also have one spam site example: http://www.google.com/search?q=internet+phone+service Look at the 3rd result, for internetphoneguide.org.

What I find seriously bad is that even a huge site like stackoverflow has to optimize its search engine strategy to fight the problem. Little web sites are doomed.

SEO is supposed to be StackOverflow's core competency. They are completely aware most people end up on their site via Google. The search on their own site sucks.

The reason Q&A sites are so visible is that people tend to type questions in their search engines, so Q&A sites are a good match to those.

Not really. A big website cannot cover all keywords in its niche, no matter how big it is. The strategy for small sites is to focus on long-tail keywords (3-4 terms) and outrank the big guys.

Yes, but it sucks that Stack Overflow had to add that to the title to fight the spammers, because it is often distracting to see it in the search results.

Agreed. My ideal search engine wouldn't require real websites to play in the SEO arms-race to beat out the junk sites.

That ideal search engine would find itself quickly the target of people that would try to gain an advantage by figuring out how it works.

And then another SEO cycle would start. Don't forget that before Google came along nobody was trying to 'game the system' with backlinks and other trickery; the fact that Google is successful is what caused people to start gaming Google.

If it were "ideal", it wouldn't be game-able. I'm not going to claim that this ideal is possible!

Any real-world search engine is going to be analyzed until enough of its internal mechanisms are laid bare to allow gaming to some extent.

Typically you pretend the search engine is a black box: you observe what goes into it (web pages, links between them, and queries) and you try to infer its internal operations based on what comes out (results ranked in order of what the engine considers to be important).

Careful analysis will then reveal, to a greater or lesser extent, which elements matter most to it, and then the gaming will commence. Only by drastically changing the algorithm faster than the gamers can reverse-engineer the inner workings would a search engine be able to keep ahead, but there are only so many ways in which you can, realistically speaking, build a search engine with present technology.

Your ideal, I'm afraid, is not going to be built any time soon. If you have any ideas on how to go about this, then I'm all ears.

I think the solution is a diversity of search engines. Maybe even vertical search engines. These days I get such shitty results from google for programming related searches that I've started going straight to SO and searching there. If I don't find it there I then try google, then try google groups search.
