

Python Library for Google Sets - pkrumins
http://www.catonmat.net/blog/python-library-for-google-sets/

======
rarestblog
Just a heads up for guys that are new to this: since this is scraping, you
can get your IP _blocked_ by Google.

~~~
pkrumins
That's true. But you can always use anonymous proxies.

------
andreyf
Sorry for nitpicking, and I'm not sure about Latvia (where Peter lives), but
in the US "no license whatsoever" certainly != "you are free to do
absolutely anything you wish with the code".

~~~
pkrumins
Thanks for the comment.

What I have in mind is that there is no license, none, it's public, free code.
Doesn't belong to anyone. Anyone can take it and use it any way they wish.

What do you suggest I write instead to make this point correctly?

~~~
cdibona
Honestly, pick a proper open source license like Apache to protect yourself
(no-warranty clause) and your users (explicit patent and copyright grant).

Also, as you are Latvian, under WIPO, you are essentially unable to give
something away without a license due to the moral rights clause. I mean you
can, but no one has a right to use it unless you explicitly grant it. You
cannot disclaim rights and place it into the true public domain.

Also, scraping is bad, m'kay.

~~~
andreyf
_scraping is bad, m'kay_

On the contrary, I think scraping is part of the original Smalltalk OOP
vision, where objects are entities universally accessible, without limitations
on who/what/how they are being seen/edited.

~~~
seiji
I think he outranks you when talking about scraping Google content:
<http://en.wikipedia.org/wiki/Chris_DiBona>

------
jamesk2
This is cool, but is there a function for finding hidden topics? E.g., given
red, yellow, blue -> colors

~~~
pkrumins
It doesn't have one, and I can't think of an easy way to group them this way.
Perhaps I could cook something up, like given "red, yellow, blue", do a Google
Search (with my library) for each "red", "yellow", "blue" and find the
surrounding words. Then count the frequencies of surrounding words, do another
search at Google Sets for "red, yellow, blue", remove all those items from
frequency list, and then output the most frequent word. It might be "color" or
"colors". Perhaps also de-pluralize the words so that "colors == color". Just
a quick idea.

Hmm... I might actually try it out; if it works, I'll write an article about it.
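
A rough sketch of the counting step described above, not the library's actual
API. It assumes the search-result text and the Google Sets items have already
been fetched elsewhere and are passed in as plain strings; the names
`guess_topic`, `singularize` and `STOP_WORDS` are made up for illustration,
and the de-pluralization is deliberately naive (it just drops a trailing "s").

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real run needs far more).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "is", "are",
              "in", "to", "with", "for", "on", "it"}


def singularize(word):
    """Naive de-pluralization: drop one trailing 's' (but not 'ss')."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def guess_topic(snippets, set_items, top_n=3):
    """Return the most frequent surrounding words as topic candidates.

    snippets  -- strings of text mentioning the seed words (fetched elsewhere)
    set_items -- seed words plus related set items, removed from the count
    """
    excluded = {singularize(item.lower()) for item in set_items}
    counts = Counter()
    for text in snippets:
        for word in re.findall(r"[a-z]+", text.lower()):
            word = singularize(word)
            if word in STOP_WORDS or word in excluded:
                continue
            counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]


# Hand-written sample snippets standing in for real search results.
snippets = [
    "Red is one of the primary colors",
    "Yellow is a bright color",
    "Blue and green are cool colors",
]
print(guess_topic(snippets, ["red", "yellow", "blue", "green"], top_n=1))
# prints ['color']
```

With real search results the counts would be far noisier than in this toy
input, so the stop-word list and the filtering would need a lot more work.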

~~~
jamesk2
That may work, but there's going to be so much noise in the population. How do
you filter it to get "colors"?

There's a Google video on this subject. I never did get through it, but it
might make more sense to you: <http://www.youtube.com/watch?v=vgqWMGT9haY>

~~~
pkrumins
Thanks, will watch that one!

