

Google Squared - secret
http://www.techcrunch.com/2009/05/12/what-is-google-squared-it-is-how-google-will-crush-wolfram-alpha-exclusive-video/

======
robotrout
There are those who say that Google is a parasite that profits off the rest of
the internet, displaying copyrighted content of others and managing to sell
ads for the privilege. Of course, the people who say these things are quickly
slapped down, by folks citing how, without Google, nobody would find those web
pages anyway, and Google is doing them a great service.

I have to note that this sort of data scraping shown here, by Google, with no
incentive left for anybody to actually travel to the original source web
pages, seems to make the first group's case much, much stronger.

~~~
swombat
I'd like to take the opposite side:

I believe that once you publish data (particularly data, but even other kinds
of information) on a public, indexed web page, you automatically relinquish
control over how it will be used.

That's just reality, and it's also the most profitable way to view online
published data, from a global perspective. It is better for all of us if the
act of publishing data on the web grants an automatic licence to the
downloader to mash it up any way he sees fit. The alternative scenario, where
you have to ask for permission for every bit of data, is frightening.

Just because it's "big Google" who is doing the mash-up doesn't make it less
ethical than if it was some start-up coming out with a new product (or, say,
Wolfram Alpha).

~~~
robotrout
I would submit that just because it's 'big Google' who is doing the mash-up
doesn't make it MORE ethical, either.

Mash-ups are ethical only as long as they provide more money in the pockets of
the people you are stealing content from. If they do that, fine. If they
don't, your mash-ups are not ethical, and neither are Google's.

~~~
swombat
No, you're missing my point completely. I'm saying that it doesn't matter
whether or not you line the pockets of the people who produced and published
the initial data on the web. If it's published on the web on a public site, it
is available for anyone to use.

Particularly when it's factual data, rather than, say, an article. You might
recall that the copyright acts do _not_ protect factual data.

~~~
robotrout
Sure. That's true. Copyright doesn't protect factual data.

But there is what's legal, and there is what's ethical. If I traveled around
the country, measuring the heights of roller coasters for my website
rollercoasterheights.com, I did it to get people to come to my site. If that
information is harvested from my site and displayed elsewhere, I've done a lot
of work for nothing.

~~~
swolchok
Perhaps next time you'll think harder about the likely reward for your work.

------
martythemaniak
Seems Eric has kinda missed the point. From what I've read, Wolfram Alpha is:

- get a bunch of structured, verified, curated data
- use mathematica to understand and reason about the data
- use NLP to expose mathematica to the web surfer.

What google wants to do is get a bunch of structured data from unstructured
data. Great, but competition for Wolfram Alpha? Doesn't look like it right
now.

But I guess we'll have a better view of this in a few weeks.

~~~
smhinsey
he didn't miss the point, that is contemporary "journalism." in other words,
the story is skewed towards the most controversial possible angle.

it beggars belief that, after dozens of stories on the topic, each with its
own universe of commentary, most of which points out basically what you've
said, they are still unaware that their premise is flawed.

------
calambrac
_Turning the Web into a giant database will crush any attempt to segregate the
“best” information into a separate database so that it can be processed and
searched more deeply._

This is pure speculation, and I personally doubt it's true. First, much of the
content on the web amounts to echoing, filtering, and distorting high-quality
source data. If you can tap that source data, are the gains you
realize from what's been derived from it worth the extra cost of going out and
doing the work to acquire it and then separate out the noise?

Second, there is a lot of real or perceived value in knowing where your data
is coming from. Tons of companies pay tons of money to get data and statistics
from sources that they can trust. Until Google's willing to vouch for these
results beyond "we tried", the bespoke curatorial approach is going to keep
capturing these dollars.

~~~
robryan
I think it can rival a structured dataset if it is done correctly. The big
advantage that google has, and I know there was a paper they put out about it
a couple of months ago, is that they have access to such a large source of
data that they can use a great deal of sources to weed out the noise by only
using the stuff that appears over many sources.

Then you could also assign sources a reliability rating based on how accurate
the information they provide is compared with other reliable sources.

Kind of like a new PageRank but for data integrity.
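
To make that concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not anything Google has described: each source's claim counts as one vote, the most common value wins, and a source's reliability is the fraction of its claims that match the winners. The source names, attribute keys, and function names are all hypothetical.

```python
from collections import Counter, defaultdict

def consensus(claims):
    """Majority vote per key. `claims` is a list of (source, key, value)."""
    votes = defaultdict(Counter)
    for _source, key, value in claims:
        votes[key][value] += 1
    # Ties are broken by first-seen order; a real system would need a better rule.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

def reliability(claims, consensus_values):
    """Fraction of each source's claims that agree with the consensus."""
    agree, total = Counter(), Counter()
    for source, key, value in claims:
        total[source] += 1
        if consensus_values[key] == value:
            agree[source] += 1
    return {source: agree[source] / total[source] for source in total}

def weighted_consensus(claims, weights):
    """Like consensus(), but each vote counts for its source's reliability."""
    votes = defaultdict(lambda: defaultdict(float))
    for source, key, value in claims:
        votes[key][value] += weights.get(source, 0.0)
    return {key: max(vals, key=vals.get) for key, vals in votes.items()}

# Made-up example data: three hypothetical sites reporting camera specs.
claims = [
    ("site_a", "camera_x.megapixels", 12),
    ("site_b", "camera_x.megapixels", 12),
    ("site_c", "camera_x.megapixels", 10),
    ("site_a", "camera_y.megapixels", 8),
    ("site_b", "camera_y.megapixels", 8),
    ("site_c", "camera_y.megapixels", 8),
]
agreed = consensus(claims)
scores = reliability(claims, agreed)
print(agreed)
print(scores)
print(weighted_consensus(claims, scores))
```

You could repeat the last two steps a few times so the ratings and the agreed values refine each other, which is roughly where the PageRank comparison comes from.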

~~~
calambrac
Don't get me wrong, I think it's very cool that they're doing this kind of
work, and I hope it takes off. What I was really responding to was the idea
that this approach would "crush" a more traditionally curated database. At
least in the short term, I don't see that happening, because the number of
really high-quality data sources is still pretty tractable, because a lot of
what's on the web is just going to be echoing/distorting those primary sources
anyways, and because trust is such a huge part of that market.

~~~
ErrantX
The issue is though that Wolfram is not going to provide the mundane sort of
data that people want in the instant way Google are going to claim to do it.

Search camera on Wolfram and I expect you get lots of data on the history of
cameras and other such stuff. Google seem to be offering to provide a list of
camera models with some pertinent data for each... I suspect the latter will
be more "useful" (especially as Wikipedia would probably be fairly reliable
for the camera background...)

I think your point is valid: a curated database _will_ be the #1 source for
info on camera history and facts. Unfortunately that means Wolfram is
competing with Wikipedia not Google... and that is probably even worse for
them :(

~~~
calambrac
You make a really good point. For more consumer-oriented data, a more casual
level of accuracy is probably acceptable, at least acceptable enough to hang
ads off of. I had my head stuck in the world of data that people are actually
willing to pay for directly.

------
snprbob86
1) This looks awesome. I can't wait to try it!

2) The interviewers are surprisingly immature and unprofessional. They are
downright rude to the demonstrator and are clearly ignorant about the
amazingly cool technology that they are privileged to be seeing.

~~~
skorgu
They seemed more concerned about user interface decisions and whether it would
be launching at IO than the actual technology behind it. I suppose that's a
valid angle, it's just not one that appeals to me. And yes, they were quite
rude.

------
JacobAldridge
At a first glance, this seems to help take 'Google-fu' to the unwashed masses.

If I ever do a search as generic as 'camera', it's very closely followed by a
more detailed search (eg, camera olympus "flash time"). Usually, however, my
first search is the most detailed, and I become more vague if I require more
results.

As I'm sure most of us are aware, this helps quickly deliver the outcome /
answer we seek from searching, but I'm continually surprised at how many
people bang a series of vague search phrases into a Search Engine and spend
time sorting through the chaff.

Google Squared seems to prompt the sort of thinking I either do beforehand,
or as an immediate result of seeing 8.6M results. Google-fu enhanced.

(My favourite evidence of this is from Allyn Gibson's blog, which is routinely
located by people Googling the phrase "things that happen on my birthday".
Think about it, or read #5 on this list <http://www.allyngibson.net/?p=1686> )

------
huhtenberg
I am getting a strong feeling that Google is genuinely concerned about Wolfram
Alpha, which makes the latter that much more interesting. In the end Google's
hyperventilation may create an exact opposite of the effect they are going
for.

~~~
snprbob86
I get the strong feeling that this Google vs Alpha silliness is a
manifestation of the internet media.

Google is likely carefully studying their competitors continuously and taking
prudent actions as necessary. This product doesn't seem like an answer to
Alpha as much as it does an interesting 20% project growing up.

~~~
huhtenberg
I was not referring to the media coverage. I was referring to Google's
practice of piggybacking on others' PR efforts -
<http://www.marksonland.com/2009/04/google_likes_to_steal_others_t.html>.
In Wolfram's case they did it twice in rapid succession. _That_ is what's
interesting.

~~~
ErrantX
Perhaps that is not a case of worrying about WA: but using clever tactics to
get their new products into media focus.

It matters not whether the "New Stuff" from Google is a competitor or reaction
to Wolfram or completely unrelated. What matters is the media _thinks_ it is
competition and generates all this Google vs. Wolfram media hype :) Wolfram
get some sympathy as the underdogs but ultimately it just pushes Google brand
(they still just hold the public opinion of not being evil: so if you see a
Google vs X discussion you imagine it is Good vs. Good battle and that both
products are OK).

It's a tactic Google have always used - often to superb effect :)

------
aston
The video demonstrates some decidedly non-Googly slowness. If it takes 10+
seconds to load, I don't think very many people are going to be using it...

~~~
derefr
Hypothesizing: it's doing things which require decidedly non-BigTable-like
architecture. Most other Google apps in their alpha stages can just take
advantage of running on, basically, App Engine; this seems like one of the
ones that can't, so it has to run on a separate architecture that hasn't been
scaled yet. That would be alright for some _other_ company's demo—if they were
anyone else, they'd just use limited, cached, pre-defined data sets. But
they're Google, they have the raw data, and they wanted to show off the
possibilities of working "from scratch" on anything they like in realtime.
Thus, it was similar to an app without any scaling consideration as-of-yet
(consider a normal startup webapp) drinking from the entire search index
firehose on each query. Ten seconds is _impressive_.

Once it's deployed across Google's entire cloud it'll speed up nicely.

------
rms
Like Google isn't going to buy/license Wolfram Alpha...

~~~
dbul
You would have thunk Google would have bought loopt, too.

~~~
rms
They bought a startup in that area a while ago, Dodgeball.

~~~
dbul
What's your point? That they must buy _one_ startup in each area in which they
are interested? They don't have to buy anything.

------
eggnet
I wonder when google will publish the energy requirements of a google squared
search.

