
Performing Google Search Using Python - mkalygin
http://www.geeksforgeeks.org/performing-google-search-using-python-code/
======
tarikozket
I still can't understand how this made it to the front page and received this
many upvotes. How was it difficult to perform a Google search, and what golden
key did this tutorial share?

~~~
shaklee3
Previously, one of the problems I've seen is that Google obfuscates the source
of the results page. You can't just grep for what you want, since it's not
there. They did this in a push to get you to use their search API, but for
some odd reason, their search API returns different results than a browser
does. I find this very useful in that regard.

~~~
gh1
I tried this a couple of months ago. I could easily extract the URL of each
result using a regex. However, this URL was a redirect URL on the google.com
domain. At this point, I made one more request to the redirect URL, followed
the redirect chain and easily obtained the actual URL of the result.
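gh1's approach can be sketched with the Python standard library. The sketch assumes result links appear in the page as `/url?q=<target>&...` redirect hrefs; that is an assumption about Google's markup, which changes without notice. Decoding the `q` parameter recovers the target without an extra request, while `follow_redirects` shows the request-based route gh1 describes. Both helper names are hypothetical.

```python
import html
import re
import urllib.parse
import urllib.request

# Assumed markup: result anchors whose href is a Google redirect such as
# /url?q=https://example.com/&amp;sa=U (Google changes this without notice).
REDIRECT_HREF = re.compile(r'href="(/url\?[^"]+)"')


def extract_targets(page):
    """Decode destination URLs from redirect hrefs without any extra request."""
    targets = []
    for href in REDIRECT_HREF.findall(page):
        # Undo HTML escaping (&amp; -> &) before parsing the query string.
        query = urllib.parse.urlsplit(html.unescape(href)).query
        params = urllib.parse.parse_qs(query)
        if "q" in params:
            targets.append(params["q"][0])
    return targets


def follow_redirects(url, timeout=10):
    """Request the redirect URL; urlopen follows HTTP redirects and
    geturl() returns the final address."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

For the request-based route, a relative href has to be joined with the Google host first, for example `follow_redirects(urllib.parse.urljoin("https://www.google.com", href))`. Whether Google answers with a plain HTTP redirect rather than an intermediate page is not guaranteed.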

------
nurettin
And Google marks your IP as a bot. I did this three years ago from one of my
DO proxies, and Google still asks me to solve a captcha whenever I search.

~~~
bake
Indeed. If you're looking for programmatic web search, I'd suggest you go the
Bing API route.

~~~
charred_toast
I commented earlier on this post that it's impossible for a bot to search
Google. Turns out I ended up turning to the same resource you're recommending.
I'd also recommend sending a Python bot to your favorite news sites once a day
for updates instead. Same deal: Beautiful Soup. On the other hand, Google has
Custom Search, which is $100 a year for 20k queries. I use that as well.
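A daily news scraper of the kind suggested here might look like the following. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup so it runs without extra installs, and it assumes headlines are links inside `<h1>` to `<h3>` tags, which varies from site to site. `HeadlineParser`, `headlines_from_html` and `fetch_headlines` are hypothetical names.

```python
import urllib.request
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}


class HeadlineParser(HTMLParser):
    """Collect (text, href) for links that sit inside h1-h3 headings."""

    def __init__(self):
        super().__init__()
        self.heading_depth = 0
        self.current_href = None
        self.current_text = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.heading_depth += 1
        elif tag == "a" and self.heading_depth:
            # Only links inside a heading are tracked.
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.heading_depth:
            self.heading_depth -= 1
        elif tag == "a" and self.current_href is not None:
            # Collapse runs of whitespace left over from nested markup.
            text = " ".join("".join(self.current_text).split())
            if text:
                self.headlines.append((text, self.current_href))
            self.current_href = None

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)


def headlines_from_html(page):
    parser = HeadlineParser()
    parser.feed(page)
    parser.close()
    return parser.headlines


def fetch_headlines(url, timeout=10):
    """Download one page and return its headline links."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return headlines_from_html(resp.read().decode(charset, errors="replace"))
```

Running it once a day is left to cron or a similar scheduler.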

~~~
bottled_poe
Google CSE seems like the most fitting solution. It appears to do pretty much
the same thing as the article, but without all the hackery nonsense:

Step 1 - Set up a CSE to search the entire web:
[https://support.google.com/customsearch/answer/2631040?hl=en](https://support.google.com/customsearch/answer/2631040?hl=en)

Step 2 - Use the CSE API:
[https://developers.google.com/custom-search/json-api/v1/overview](https://developers.google.com/custom-search/json-api/v1/overview)
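Step 2 might look like this standard-library sketch of a Custom Search JSON API call. The endpoint and the `key`, `cx`, `q`, `start` and `num` parameters follow Google's JSON API reference as understood here; the helper names and the choice to keep only `title` and `link` from each result are assumptions, and quotas, pricing and engine options may have changed since this thread.

```python
import json
import urllib.parse
import urllib.request

# Custom Search JSON API endpoint (see the overview linked in Step 2).
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, engine_id, query, start=1, num=10):
    """Build a request URL. api_key is an API key from the Google Cloud
    console; engine_id is the "cx" ID of the engine set up in Step 1.
    num is at most 10 per request, so deeper pages use start=11, 21, ..."""
    params = {"key": api_key, "cx": engine_id, "q": query,
              "start": start, "num": num}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Keep (title, link) from each result; "items" may be absent when
    nothing matched."""
    return [(item.get("title", ""), item.get("link", ""))
            for item in payload.get("items", [])]


def cse_search(api_key, engine_id, query, **kwargs):
    """Run one query against the API and return (title, link) pairs."""
    url = build_cse_url(api_key, engine_id, query, **kwargs)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_results(json.load(resp))
```

Splitting URL building and response parsing out of `cse_search` keeps both testable without an API key or network access.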

------
happy-go-lucky

        Traceback (most recent call last):
          File "0a51cb60301828feeaf66cbc908297ae.py", line 9, in <module>
            for j in search(query, tld="co.in", num=10, stop=1, pause=2):
        NameError: name 'search' is not defined
    

Edit: Maybe imports don't work.

~~~
justinsaccount
You missed the

    Output:
      No module named 'google' found

below.

If they cut out this nonsense:

    try:
      from google import search
    except ImportError:
      print("No module named 'google' found")

and just used

    from google import search

one would have gotten a sane error message:

    Traceback (most recent call last):
      File "00359ada2bfcd6dbabd1fa0207e683b8.py", line 1, in <module>
        from google import search
    ImportError: No module named google

Catching ImportError when you have no fallback is a pointless thing to do.

~~~
ehsankia
The first thing that jumped out at me when I was reading the code.

Why would you wrap an import in a try/except unless you want to import
something else when the first one fails?

And if you don't, you should exit once the import fails rather than let it
continue and hit inevitable failure.
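The distinction drawn in the last two comments can be sketched in Python: a try/except around an import earns its place when there is a fallback module, and for a hard dependency the script should stop immediately with a clear message. `simplejson` is only an illustrative optional module, and `require` is a hypothetical helper name.

```python
import importlib

# Fallback import: try an optional third-party module first and use the
# standard library when it is missing. simplejson is only an example.
try:
    import simplejson as json
except ImportError:
    import json


def require(module_name):
    """Import a hard dependency, or stop right away with a clear message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Exiting here avoids running on and failing later with a confusing
        # NameError, as in the traceback earlier in the thread.
        raise SystemExit(f"missing required module {module_name!r}: {err}") from err
```

Run as a script, a missing hard dependency then ends with a one-line message and a non-zero exit status instead of a later `NameError`.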

~~~
philipov
Maybe it was written by programmers trained on compiled languages with no
interpreter to automatically catch all exceptions and print a traceback for
you, so now they're cargo culting onto "Every exception must get caught to
avoid undefined behavior."

------
whoami_nr
There is also this[1]. Used it around a year ago. Not sure if it still works
though.

[1] [https://github.com/rnikhil275/pygoogle](https://github.com/rnikhil275/pygoogle)

------
charred_toast
This is good work. I thought there was no way to do this since Google Search
blocks bots.

~~~
slackingoff2017
They let some obvious bots through if you go slowly enough. A lot of SEO and
AdWords tracking companies still run unofficially tolerated automated Google
searches.

I assume allowing a small amount of obvious botting prevents people from
developing really sneaky bots that are much harder to block. It's probably to
prevent an arms race that Google may not be able to win.
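"Going slowly" amounts, in code, to a minimum delay between requests. The tutorial's `search()` call takes a `pause=2` argument (visible in the traceback earlier in the thread), which in that package is the wait between HTTP requests. Below is a generic sketch of such a throttle with random jitter; the class name, the 2-second default and the jitter range are illustrative assumptions, not a rate Google is known to tolerate.

```python
import random
import time


class Throttle:
    """Enforce a minimum delay, plus random jitter, between successive calls."""

    def __init__(self, min_interval=2.0, jitter=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous call, if any

    def wait(self):
        """Sleep until the interval since the previous call has elapsed.
        Returns the number of seconds slept (0.0 on the first call)."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            remaining = target - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

Calling `throttle.wait()` before each request spaces them out; the jitter keeps the timing from being perfectly regular.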

~~~
i336_
What an incredibly good point.

That gives me hope, as someone who wants to do low-frequency data analysis
that's only possible via multiple queries.

~~~
charred_toast
Referring back to this comment, I (redundantly) made the following one further
down the thread:

Try Google Custom Search as well: $100 a year for 20k queries. I'm a client.

~~~
slackingoff2017
Good evening, I regret to inform you that Google custom search is dead.

[http://fortune.com/2017/02/21/google-site-search-discontinued/](http://fortune.com/2017/02/21/google-site-search-discontinued/)

The good news is that it's been reborn as an ad-supported version. Google
doesn't want your dirty cash when your competitors are willing to pay much more
for ad placement on your site.

~~~
primozk
Google Site Search is dead, but Google Custom Search is not.

------
PokemonNoGo
If this were a post on best practices for botting Google, I would understand it
being front-page material. Does anyone have any insight into this kind of
botting?

