
Google is getting so incredibly fast. As I'm reading this, your post is 16 minutes old. I copy/pasted your calculation into Google search just to see how well the Google calculator would interpret it. Your post here shows up as the first result. Now, I might expect it to show up first, but within 16 minutes, and assuming nothing else links to this content yet?


Google indexes this site very fast. I think it probably happens within 5 minutes.

Let's give it a try. It's 10:26PM EST.

This is a very obscure phrase that no one else ever uses.


There are at least tens of millions of web sites. This site ranks 3500 on Alexa and 12k on Compete. Google knows that and prioritizes its crawlers accordingly.

Maintaining a fresh index of the top 100k sites is where search engines get their best bang for the buck.
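To make the prioritization idea concrete, here is a minimal sketch of a rank-based recrawl schedule. The tiers, intervals, and host names are made up for illustration (the 5-minute figure and the 100k cutoff are just the numbers mentioned in this thread); nothing here reflects how Google's crawler actually works.

```python
import heapq

def recrawl_interval_minutes(rank):
    """Hypothetical tiers: better-ranked sites are revisited more often."""
    if rank <= 100_000:
        return 5            # assumed interval for the top 100k sites
    if rank <= 1_000_000:
        return 60           # assumed interval for the next tier
    return 24 * 60          # assumed: everything else once a day

def build_schedule(sites, now=0):
    """Build a min-heap of (next_crawl_time_in_minutes, host) entries."""
    heap = [(now + recrawl_interval_minutes(rank), host)
            for host, rank in sites.items()]
    heapq.heapify(heap)
    return heap

def next_due(heap):
    """Pop the host whose next crawl time is earliest."""
    due, host = heapq.heappop(heap)
    return host, due

# Illustrative ranks only; the host names are placeholders.
sites = {"popular.example": 3500, "obscure.example": 5_000_000}
schedule = build_schedule(sites)
first_host, first_due = next_due(schedule)
```

With these assumed tiers, the site ranked 3500 comes up for recrawl first, long before the obscure one.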


Doesn't Google use PageRank for crawling policy/priority?


2 minutes later, it's there. This is a very obscure phrase that no one else ever uses. ... hackerne.ws/item?id=973578 - 1 minute ago


Going even further off-topic, the sheer speed of Google's updates is one reason why I have felt that the pain-point of "real-time search" is overrated.


There's a difference between searching for a unique phrase (the document is the only exemplar, therefore it ranks #1) and searching for the best page on a breaking topic.

That said, it's not clear to me that humanity actually needs real-time search. Maybe there's some advertising value in winning the race to be #1 for "hudson plane crash" but is there really a business here?


If you were able to sort the results by "last updated" then I might agree, but AFAIK you can't.

With Twitter Search you can easily see the most recent results first.


You can sort by date in a Google search. Click on "Show Options" just below the search field, and about 15 lines down in the left bar, "Sorted By Date."

I think it's a fairly recent feature, but it's there.


Damn.


That the hackerne.ws domain is what gets indexed is so fucking ridiculous:

  1) Some cretin camped on a domain and didn't have the courtesy to do a 301
  2) PG is an asshole and ignores the Host header. HTTP/1.1 was a decade ago!
  3) Google is a jackass and prefers a low-pagerank domain.
     Maybe some asshat used Feedburner to proxy RSS via that domain?
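For what points 1 and 2 are asking for, here is a rough sketch of a server that reads the Host header and answers any non-canonical hostname with a 301 to the preferred one. The canonical hostname, the plain `http://` scheme, the port, and all function names are assumptions for illustration; this is not how HN is actually served.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the hostname the site owner wants indexed.
CANONICAL_HOST = "news.ycombinator.com"

def redirect_target(host_header, path, canonical=CANONICAL_HOST):
    """Return the URL to 301 to, or None if the request is already canonical."""
    # Drop any ":port" suffix and normalize case before comparing.
    host = (host_header or "").split(":")[0].strip().lower()
    if host == canonical:
        return None
    return "http://" + canonical + path

class CanonicalHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.headers.get("Host"), self.path)
        if target is not None:
            # Permanent redirect tells crawlers which hostname to index.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"served from the canonical host\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    """Run the sketch locally; the port is arbitrary."""
    HTTPServer(("127.0.0.1", port), CanonicalHostHandler).serve_forever()
```

Calling `serve()` starts it locally. A request with `Host: hackerne.ws` for `/item?id=973578` would then get a 301 pointing at the same path on the canonical host.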


That domain is no longer the one indexed. It knows.


Google offers a "push" mechanism for sites. You'll have to google for the specifics, but if HN implements it then Google knows when a page is updated and re-reads just that page.


I wonder if this site uses it, though. I would guess it probably does.



