Likely explains  from last year (see  for HN thread).
I remember some thread from Safari's early days where they hid the user-agent to avoid attracting attention.
So if I set in my robots.txt to disallow all bots except Googlebot, Applebot will index anyway? I don't think I like that precedent.
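For concreteness, the kind of robots.txt being described looks something like this (a hypothetical example, not from any particular site):

```
# Block all crawlers...
User-agent: *
Disallow: /

# ...except Googlebot, which may crawl everything
User-agent: Googlebot
Disallow:
```

Under the behaviour described in the article, a crawler with no section of its own that falls back to Googlebot's rules would crawl this site, despite the wildcard block.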
For those of us whose resources are especially tight, blocking Yandex, Baidu and MSN may be very helpful, your ideals notwithstanding.
The whole idea of versioning content based on who accesses it is broken and fundamentally at odds with the idea of the open web. Same goes for user-agent string madness, by the way. Yes, we should be able to tell robots from humans, but otherwise, it's supposed to be the Web.
Incidentally, this hits close to a pain point: I find it extremely annoying when publishers (like Elsevier) hide content behind a paywall, but still expose it to Googlebot for indexing. The result is that you are able to find a scientific article, which is not accessible (but Googlebot has cached snippets). This goes against Google's own guidelines (they used to tell people that Googlebot must not see different content from browsers). And it goes against the whole idea of the Web: if you want to hide stuff behind a paywall, do so — but then it is no longer accessible.
Going back to Applebot, I love the fact that they will now follow Googlebot instructions. Hopefully people will stop distinguishing who accesses content.
The idea was that instead of having to go to all sorts of places to find content, it would show you what was available at a given time based on what you were looking for.
...and then all of the network sites and the free Hulu stuff got put behind a user agent filter and essentially wiped out a huge part of GTV's reason for existence. The goal was to bring all of the free content into one place but the networks didn't want you watching a free stream in lieu of a cable broadcast. They wanted you to watch cable on your living room TV and only use the free streaming episodes from the computer in your office as a backup.
Same goes for services where web viewing/listening is free but if you try to access it from a mobile web browser, you have to either fool the site or subscribe to some mobile version.
I interpreted the sentence less literally, more like "in the absence of a rule, default to the GoogleBot ones".
So if you put a wildcard rule forbidding access and a specific one allowing access to GoogleBot, AppleBot will honour the wildcard one.
That's how I would have coded it anyway: parse the rules for the current agent string; if no rule applies, run it again with the GoogleBot one before assuming that the website has no restrictions.
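A minimal sketch of that fallback logic in Python (a guess at the behaviour, not how Applebot is actually implemented; the `rules` structure and `is_allowed` helper are invented for illustration):

```python
def is_allowed(rules, agent, path):
    """Decide whether `agent` may fetch `path`.

    `rules` maps a user-agent name to its list of Disallow prefixes.
    Try the agent's own section first; if there is none, fall back to
    Googlebot's section; if neither exists, assume no restrictions.
    """
    for ua in (agent, "Googlebot"):
        if ua in rules:
            # Simplified matching: any Disallow prefix that matches
            # blocks the path (real parsers use longest-match rules).
            if any(path.startswith(prefix) for prefix in rules[ua]):
                return False
            return True  # a section exists and nothing disallows the path
    return True  # no applicable section at all: no restrictions


# Example: the site only has a section for Googlebot.
rules = {"Googlebot": ["/private/"]}
```

Real robots.txt parsing has more subtleties (Allow lines, wildcards, longest-match precedence); this only illustrates the agent-fallback idea under discussion.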
I would assume this means that it will follow GoogleBot unless you specifically mention AppleBot by name and not by using a wildcard.
So a User-agent: * would be ignored if a User-agent: GoogleBot is found.
Getting crawled by the major search engines typically isn't that bad, they tend to know what they're doing. Getting hammered by some crappy local search engine is what's annoying.
We don't limit any bots, except once when we completely blocked Eniro in our firewall. Google, Bing and a ton of others could index at the same time with no issue. Eniro for some reason decided to index way too much at once, with no reaction to robots.txt and no reply to the email address they so kindly included in the headers.
But I see your point, it's just a bit sad when Google has become "The Internet".
I would assume that was more to keep the tech press from seeing it.
I may also be completely wrong :)
Results may suck in the beginning, but well, competing at hard stuff is hard; this is another Apple Maps. Hopefully they won't get bashed so hard, since this one is not as user-facing.
That's one way of looking at it. A different one would be that the wealthiest company in the world could work with practically anyone and get a better product than they could build themselves more quickly with more features. The idea that they have to do everything in-house to get the best is paranoid and stupid.
For example, rather than build AppleBot, why couldn't they pump a few billion into DuckDuckGo to get use of DuckDuckBot? Or fund archive.org to access their index? Or buy in to commoncrawl.org?
It would be possible for Apple to use its fortune to benefit both their customers and the world. Google and Bing are not the only options.
I'm pretty sure they're not interested in "quickly" or "more features". Their users can still use other products (and let's be honest, most products WANT to be on the Apple platforms), but at least this way they make sure their users aren't left stranded if those products cease to exist and/or are not updated. Remember the Maps situation?
They are making sure their users have core features without having to depend on others' good will.
They could buy another company, sure, but I wouldn't count on DDG being willing to sell, and besides, getting new people to work on their new thing is probably easier/better organisation-wise than onboarding a different company/organisation with a lot of baggage.
A while back, I think either Cook or Jobs mentioned that Apple makes PRODUCTS and doesn't sell ADS.
If that's true (and stays true) AND this is the beginning of a search engine for them, it's going to be VERY interesting to see what it looks like.
Thing is, if you asked the sender for permission to post pieces of their email, they'd probably say no. It seems a bit gauche to say posting is okay because "nobody told me not to."
I don't know why people don't treat statements like that with enough cynicism.
Shutting down Freebase was a big hit for many projects in the AI space. Freebase had 2,903,361,537 facts; in comparison, Wikimedia's Wikidata has just 13,924,224 facts. That's still a huge difference.
http://en.wikipedia.org/wiki/Freebase , http://en.wikipedia.org/wiki/Knowledge_Graph , http://en.wikipedia.org/wiki/Blekko , http://en.wikipedia.org/wiki/Powerset_(company) , http://en.wikipedia.org/wiki/Knowledge_Vault , http://venturebeat.com/2015/03/27/ibm-acquires-web-crawling-...
"Our business model is very straightforward: We sell great products. We don’t build a profile based on your email content or web browsing habits to sell to advertisers."
This obviously is not saying Apple doesn't sell ads. It's saying their core business model is selling products. The users are not the product.
And while I would love a privacy focused search engine, I just don't think you can build a good one without private data.
But truth be told, DDG is so handy with its bang keywords that I search Google through it: I keep only DDG in my Firefox search bar and specialize with !g, !gv, !gi as needed.
and there are tons of them!!
EDIT: on that point, it would be very neat but probably very difficult to build a search index based on an agreed-upon algorithm, like Bitcoin. There would need to be some sort of voting system to update the algorithm and moderate people gaming it.
Apple's business model is based on selling real products.
Google makes 90% of their revenue from advertising.
Apple makes <5% of their revenue from App Store sales.
Apple knows my web browsing behavior because Safari is my default mobile browser app syncing my history, reading list and bookmarks with desktop Safari.
Apple knows where I work, live and travel via Apple Maps and Passbook.
Apple knows a few of my purchases via Apple Pay, both in real-world transactions and via Apple Pay in apps.
I used MobileMe mail before switching to GMail and have considered switching to iCloud mail.
Apple knows a lot more interesting things about me than Google.
I could be wrong, but I think Apple might have an ideological problem with a lot of content. It's their call if they decide to filter that stuff out, but it's censorship, and I struggle to see how that would result in a "better" search engine.
EDIT: Applebot still has uses beyond a search engine though. I think Apple is being straightforward in its explanation. It's for Siri and Spotlight.
On the other hand: it is kind of crazy that when I read about "indexing the web" it starts ringing all kinds of privacy bells, while Apple's incentive might not even be to violate people's privacy.
If a company manages to upset Google, then that company ceases to exist online, regardless of the validity of Google's reason for blacklisting it. Ideally no search engine would be above 20% market share (20% being a random low number I just made up).
Right now websites and marketing material/money is directed at Google exclusively, making it continually harder for new search companies to succeed.
Google works well today, but what about 5 years from now, or 10 years from now? What if you need to be signed into Google+ to Google search? What if your Google+ account needs to be tied to a mobile phone? What if Google has limited search, and you need to own a Chromebook or Android device for the full search experience and viewing more than 10 results?
Competition keeps them in check from doing anything outrageous, and then if they decide to anyway, we have other choices we can migrate towards.
Can't Apple build itself some spam protection??!? Search is harder than this.
This is nice to see for a change. There are a lot of search engine bots out there, and forgetting a lot of them is easy to do.
In other words: Both Apple and Google in the mobile OS and device space. Both Apple and Google in the (mobile) search space.
Risks are inherent in the use of the Internet
That is to say, not very close at all.