

Greplin (YC W10) Search Engine Tackles Facebook, Twitter - danicgross
http://blogs.wsj.com/digits/2011/02/16/greplin-search-engine-tackles-facebook-twitter/

======
vladgur
This whole indexing business worries me. They actually store my data
internally to provide these "instant search" capabilities. I'm pretty sure this
goes against the API terms of use of many data providers. LinkedIn, for
instance, says the following at <http://developer.linkedin.com/docs/DOC-1013>:

"3.4 Data Storage and Conversion. You may not store or cache any Content
returned or received through the APIs, including data about users, longer than
the current usage session of the user for which it was obtained, except for
the alphanumeric user IDs we provide you for identifying users, unless and to
the extent that such storage or caching is expressly allowed in the Platform
Guidelines. You may store the alphanumeric user IDs we provide you
indefinitely unless we terminate your use of the APIs for breach of these
Terms. The restrictions of this Section do not apply to “Independent Data,”
which means data that users provide directly to you, provided that you cannot
convert data received from the APIs to Independent Data (e.g., by obtaining it
from the APIs and asking the user for permission); Independent Data must have
been separately entered, uploaded, or presented to you by the user of your
Application."

Basically, LinkedIn disallows storing data directly, or
converting/hashing/indexing it and then storing the result. It only allows
storage of user IDs.

Yet Greplin is getting away with it. I suppose they pay for special data licensing.

~~~
ntoshev
I don't know what Greplin is doing, but indexing alone doesn't imply they
cache your data somewhere. Although the index contains pieces of your text,
you can't reconstruct it from the inverted index alone.
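To make this point concrete, here is a minimal Python sketch of a simple non-positional inverted index. It is an illustration under assumptions only: nothing in the article says how Greplin structures its index, and the documents and function names are hypothetical. This kind of index maps each word to the documents that contain it but drops word order and repetition, so two texts with the same words in a different order look identical. A positional index, which also records where each word occurs, would keep enough to rebuild the text, so the claim holds for the simpler kind.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it.

    Non-positional: records which documents contain a term,
    but not where it occurs or how many times.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the intersection never mutates the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def recoverable_terms(index, doc_id):
    """Everything the index 'knows' about one document: its distinct terms, unordered."""
    return {term for term, ids in index.items() if doc_id in ids}

# Hypothetical example documents: same words, different order.
docs = {
    1: "dinner with alice on friday",
    2: "friday on alice with dinner",
}
index = build_index(docs)
print(sorted(search(index, "alice friday")))  # [1, 2]
print(recoverable_terms(index, 1) == recoverable_terms(index, 2))  # True
```

The two documents end up with identical entries, so the index alone cannot say which sentence was originally written, although it still holds every distinct word.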

~~~
mychacho
From a usability standpoint, they have this Google-Instant-style thing going
where they show search results with text as soon as you start typing. So unless
they are hitting the API of every provider you authorized them to access and
searching through it in near real time, they are storing the data.
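This inference rests on how as-you-type search usually works: each keystroke triggers a lookup against data that is already held locally. The Python sketch below is a hypothetical illustration, not Greplin's implementation; the term list and the `prefix_matches` helper are made up. It keeps the indexed terms sorted and uses binary search (`bisect.bisect_left`) to find every term starting with the typed prefix, so each keystroke costs a local in-memory lookup rather than a network call to each provider.

```python
import bisect

def prefix_matches(sorted_terms, prefix):
    """Return all terms in sorted_terms that start with prefix.

    Binary search finds the first term >= prefix; because the list is
    sorted, all matches follow contiguously, so scan until one fails.
    """
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches

# Hypothetical term list, assumed to be built from already-fetched data.
terms = sorted(["alice", "alicia", "allen", "bob", "dinner", "friday"])

# Simulate a user typing, one keystroke at a time.
print(prefix_matches(terms, "a"))     # ['alice', 'alicia', 'allen']
print(prefix_matches(terms, "ali"))   # ['alice', 'alicia']
print(prefix_matches(terms, "alic"))  # ['alice', 'alicia']
```

Serving results this way requires the terms (and whatever they point to) to already sit on the search provider's side, which is the crux of the objection above.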

------
zackola
This is what Google should have been building instead of Buzz or Wave.

------
iterationx
>>For Google, developing such a service could be a challenge, in part because
it likely wouldn’t get the same access to users’ Facebook accounts that a
non-competitor startup has

If Facebook would block Google from this, then it should also block Greplin,
unless we are naive enough to think Google won't buy Greplin.

"The best thing that would happen is for Facebook to open up its data," Mr.
Schmidt said. "Failing that, there are other ways to get that information."

~~~
justin
Facebook could block Greplin upon an acquisition by Google.

Allowing Greplin to get Facebook data by using the Facebook API (which it
does) is much different than allowing Google to get EVERYONE'S data by
spidering. One involves consent by the user, and provides search only for that
user. The other involves no consent and provides access to user data over a
publicly available interface.

