
Google: "We're Not Doing a Good Job with Structured Data" - Anon84
http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php
======
zandorg
Brewster Kahle (Archive.org) made a small fortune by selling WAIS (Wide Area
Information Server), which made each machine a search engine, and used a
protocol to request from all engines. Developed in the 80s.

Unfortunately, the Web has gone down this road of having centralised search
engines which somehow know their way through a maze.

It would be better to contact several sites in turn (NYT, FT, Archive, etc)
and pull information. If you have to pay, you go through the credit card
paywall. Then you don't have to worry about NYT's dark kilobytes.

Why aren't we using it? I guess that's what happens when you sell something
like that to AOL. ;-)

I should also point out that Google could have a stream for the NYT, where the
NYT feeds all its stories to Google on creation, and Google doesn't enable
cache for the stuff people pay for. But for all I know, that's already being
done.

Either way, having the service push its content to the search engine is better
than having Google pull it by crawling.

~~~
Anon84
_which made each machine a search engine, and used a protocol to request from
all engines. Developed in the 80s._

Peer-to-peer search is alive and well, though:

<http://sixearch.org/>

------
uberc
It's a useful reminder of how distinctly unsolved the search problem is.
Google has taken stabs at this area from different directions with Google Base
and Product Search, but there's still a whole world of information "out there"
which is inaccessible or not usefully organized.

------
leoc
The long-delayed epiphany.

Google's weakness with structured data and its weakness in cultivating
third-party developers are mutually reinforcing and seem to have arisen from
the same hubris. In other words, bring back the search API already!

~~~
litewulf
(What do you mean by bringing back the search API? Isn't
<http://code.google.com/apis/ajaxsearch/> what you want?)

~~~
wildwood
From that page: "The Google AJAX Search API lets you put Google Search in your
web pages with JavaScript."

I've never understood why they call that an API. It's not. It's a Web 2.0
widget.

Google used to have a SOAP-based API for natural search results, and it was
sweet. I miss those days...

~~~
leoc
It seems the AJAX Search API now lets you get a machine-readable list of
search results in a relatively straightforward fashion; I'm not sure that was
true back when the SOAP API was canned.
<http://code.google.com/apis/ajaxsearch/documentation/#fonje> But the terms
and conditions still seem to prevent you from using structured data to do
anything useful to the search results.
<http://code.google.com/apis/ajaxsearch/terms.html> (see especially the start
of 1.3)

------
th0ma5
If only there were a machine-readable, semantic web everyone could use ;p

~~~
jdrock
Actually, we're creating a way for developers to access the web really easily
for the purposes of different kinds of analysis, including building semantic
frameworks. The idea is that we give you really cheap, really fast access to
millions of pages, and you use our platform to analyze Internet content however
you want. Pricing is $2 per 1 million pages crawled and $0.03 per CPU-hour for
any computing you want to do. We're not yet in beta, but you can check our site:
<http://www.80legs.com>.
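
To make the pricing concrete, here is a rough back-of-the-envelope estimate using only the two rates quoted above. The function name and the example workload (50 million pages, 1,000 CPU-hours) are made up for illustration, and any minimums, discounts, or other fees are not modelled:

```python
PRICE_PER_MILLION_PAGES = 2.00  # USD per 1 million pages crawled, as quoted
PRICE_PER_CPU_HOUR = 0.03       # USD per CPU-hour, as quoted


def estimate_cost(pages: int, cpu_hours: float) -> float:
    """Crawl cost plus compute cost, rounded to cents."""
    crawl = pages / 1_000_000 * PRICE_PER_MILLION_PAGES
    compute = cpu_hours * PRICE_PER_CPU_HOUR
    return round(crawl + compute, 2)


# Hypothetical job: 50 million pages and 1,000 CPU-hours of analysis.
# Crawl: 50 * $2 = $100; compute: 1,000 * $0.03 = $30.
print(estimate_cost(50_000_000, 1_000))
```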

~~~
ntoshev
Looks cool. How would it compare to the Amazon/Alexa search service? In theory
they allow you to build your own search engine, but in practice you can't
really amend their ranking formula and don't get access to the raw inverted
index (with tf-idf statistics and such). Yahoo BOSS is in the same league.

Your service would be cheaper, though.
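
For anyone unsure what access to the raw inverted index with tf-idf statistics would buy you, here is a toy sketch, assuming whitespace tokenization and the plain `tf * log(N / df)` weighting. It does not reflect how Alexa, BOSS, or any real engine ranks; the point is only that with the raw postings in hand, the scoring formula is yours to change:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Inverted index: term -> {doc_id: term frequency in that doc}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query: str, index: Dict[str, Dict[str, int]],
         n_docs: int) -> List[Tuple[str, float]]:
    """Score docs by summed tf * idf over query terms, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": "structured data search",
    "b": "search engine ranking",
    "c": "structured data data",
}
index = build_index(docs)
ranked = rank("structured data", index, len(docs))
print(ranked)
```

Swapping in a different ranking formula means editing `rank` only; a hosted service that returns pre-ranked results gives you no equivalent hook.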

~~~
jdrock
Yes, in theory you could do something similar with AWS. However, you'd have to
put in the work to handle all the complexities of parallel computing and web
crawling. We do that for you. And yes, our service is cheaper.

We'd love to see developers using our platform to build some very interesting
indexes based on innovative concepts.

------
zandorg
Maybe Mechanical Turk could help with bulletin boards, etc.

