What people want (according to Google Suggest) (lkozma.net)
126 points by lkozma on July 13, 2010 | 66 comments

Google says they "apply a narrow set of removal policies for pornography, violence, and hate speech."

What seems incongruous with this to me is that they also filter many (non-explicit) gay-related searches.

For example, typing "is my son" yields:

is my son autistic? is my son smoking pot? is my son ready for kindergarten? is my son gifted?

Doing a Google Trends search for these four phrases along with "is my son gay?" shows that the gay query is dramatically more popular: http://bit.ly/aPNHxO

It seems like in some of those cases, knowing that others are searching the same thing could provide some comfort. I'm not sure why Google filters these queries.

Yeah, but the removal policies are regarding the suggest functionality, not the actual search engine, so I don't think this can be considered censorship to any extent.

Like even if a lot of people were searching it, you might not want to be inadvertently suggesting morally questionable queries to people who don't want to see them.

As such, it seems like they err on the side of caution with it. Like there are no suggestions for the word "fuck" despite the fact that it is a wildly popular query.

Their algorithm probably just ends up tagging "is my son gay" as a query that people can enter on their own if they want to search it.

I think they filter results that aren't themselves hate speech but that have hate speech pointing at them or contained within the SERP. I could see anti-gay sites that are considered hate speech having content about "Is my son gay" and that contaminating the suggestion results.

If that's the case, it should be fixed by a tweak to the ranking algorithm (rank informative sites higher than junk filled with hate), not by filtering out query suggestions.

That's a very dangerous line to cross: ranking search results by, essentially, ideology?

I think it would be appropriate to rank objective(ish) information about a topic higher than extreme opinions about the topic.

But what's objective? The best way to marginalize an opposing viewpoint is to paint them as extremists who shouldn't be taken seriously.

A second best method is to think hard of what problems other people might have that you can solve. Let's do [this] by seeing what people are searching on the internet

That's an interesting idea and I found the results intriguing. However, the search results themselves also need to be taken into account.

For example on the search term "How can I have a blog". The author's opinion is... "These suggest that many people are still left out of technology because they find services too complicated. Need for even simpler blogging, site creation, etc. platforms?"

I disagree with this view since it presumes people already know the major blogging platforms before they search. The search itself brings up Blogger and Wordpress as the first two hits which may have satisfied users.

Very good point, I agree. However, I also meant that even the way the questions are formulated "how can I do a blog", "how can I blog on facebook" betray a lack of basic understanding of these technologies. Perhaps not only simpler platforms are needed, but better offline education about these topics as well (via school, traditional media, etc.)

I'd say that language barriers also play a role. I've met plenty of educated, non-native English speakers who would say "How can I do a blog".

That queries display a lack of understanding shouldn't be too surprising. I suspect people often search Google when they don't know about a topic. If my first search were "How can I do a blog", my second may well be "What are the differences between Blogger and Wordpress".

If people are asking about blogging on Facebook, I'd be interested to know what they learned from the results of the search, i.e., has their understanding increased? If so, then I'd argue we don't need further education.

Overall, I'm suggesting that perhaps Google is already solving these people's problems by connecting them with answers (to some extent).

These terms are ranked by # of results, not search frequency! Nothing can be gleaned except the total popularity of the individual words and phrases as they are published on the web.

http://www.google.com/search?q=how+can+i+be+on+made About 25,280,000,000 results

http://www.google.com/search?q=how+can+i+make+a+site About 2,210,000,000 results

Both match up with the suggest data.

The words at the top of the list are generic: (will, free, help, made, etc.)

As you go down the list, the words become less generic: (dream, market, people, speed, password, etc.)

You would be better off looking at Google AdWords data or http://google.com/trends

I added the following correction to the article:

CORRECTION: as pointed out by users Adrian and Bones in the comments below and by Mark Fulton and apollo on the HN thread, I made an error in interpreting the data. The numbers given by Google for each query are not telling how frequently a query was asked, but rather, it seems, they are related to the popularity of the words and combinations of words (from the query) in a large corpus of website data. This way they are still related to how popular these words and groups of words are, however, not as queries, but as the text of websites. According to the Google Suggest help page, the suggested queries are all real queries, i.e. they have been issued in this exact form by someone, which in my opinion still makes the list below an interesting read, however my interpretation of the order as pure query frequency was wrong. I thank everyone for the feedback, and apologize for the misunderstanding. I leave the text below unedited, but consider this correction as you continue.

You are absolutely right.

What confused me was this paragraph from the Google Suggest page:

" Google Suggest returns search queries based on other users' search activities. These searches are algorithmically determined based on a number of purely objective factors (including popularity of search terms) without human intervention. All of the queries shown in Suggest have been typed previously by other Google users. "

I put a correction on the page, please let me know if you wouldn't want your name to appear there.

However, I disagree that nothing can be learned from the data, because all these are real queries that have been issued multiple times. At the same time, I admit that the ordering is totally different from what I understood it to be.

It seems that when you use Google Suggest, the 10 suggestions are ordered differently than this number (the # of results) would indicate. Perhaps that ordering reflects the true popularity of a query. However, those partial orderings are insufficient to reconstruct the full ranked list of all queries.
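To illustrate why partial orderings are not enough, here is a minimal Python sketch. The two suggestion lists and their orderings are hypothetical (they are not captured Suggest output), and `consistent_rankings` is a name made up for this example. It brute-forces every full ranking that agrees with the observed lists:

```python
from itertools import permutations

# Hypothetical top-of-list orderings for two different prefixes.
# These are illustrative only, not real Google Suggest output.
OBSERVED_LISTS = [
    ["how can i lose weight fast", "how can i lose 10 pounds in a week"],
    ["how can i get pregnant", "how can i get some money"],
]


def consistent_rankings(observed_lists):
    """Return every total ordering that respects each observed list's order."""
    queries = sorted({q for lst in observed_lists for q in lst})
    results = []
    for candidate in permutations(queries):
        position = {q: i for i, q in enumerate(candidate)}
        # Each adjacent pair in an observed list must keep its relative order.
        if all(
            position[a] < position[b]
            for lst in observed_lists
            for a, b in zip(lst, lst[1:])
        ):
            results.append(candidate)
    return results


RANKINGS = consistent_rankings(OBSERVED_LISTS)
print(f"{len(RANKINGS)} full rankings are consistent with the observed lists")
```

Because the two lists share no query, nothing fixes how the "lose" queries rank against the "get" queries, so several full rankings fit the same observations.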

Thanks again for pointing out the (major) error in the article.

>People who search in "how can i ..." type of questions are probably not a representative sample of the whole population. My guess is that they are less tech-savvy users on average.

Hmm... I dunno. I search this way all the time. Mainly because I assume a lot of other people do as well and, more importantly, it works. Am I the only one?

I'll normally word a query differently until I find answers (e.g., I'll try "how to skin a cat," and if that gives no good result I'll try "skinning a cat," and then "how do I skin a cat," etc., until I find the answer).

If that fails, and the question is related to software and/or business, I will do site-specific searches at HN and lifehacker (e.g., site:ycombinator.com skinning a cat).

Sometimes I'll do this in reverse.

I have found that in most business settings I'm the only one who will search for things with this effort. Most people around me give up after a query or two, and then think I'm a genius when I find the information we wanted.

From mobile device, excuse typos etc
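The routine described above (reword the query, then fall back to site: searches) can be sketched in a few lines of Python. This is only an illustration: `query_plan` is a made-up helper, and the lifehacker.com domain is an assumption, since the comment only names the site.

```python
def query_plan(phrasings, sites):
    """List plain rewordings first, then the same rewordings per site."""
    plan = list(phrasings)
    for site in sites:
        # The site: operator restricts results to a single domain.
        plan.extend(f"site:{site} {phrasing}" for phrasing in phrasings)
    return plan


PHRASINGS = ["how to skin a cat", "skinning a cat", "how do I skin a cat"]
SITES = ["ycombinator.com", "lifehacker.com"]  # second domain is assumed

PLAN = query_plan(PHRASINGS, SITES)
for query in PLAN:
    print(query)
```

Running the plan "in reverse" would just mean trying the site-restricted queries before the general ones.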

Tech-savvy people know that Google eliminates stop-words and uses stemming anyhow, although their algorithms for doing that aren't public, so ...

    lose weight
... is a lot more efficient to write than ...

    how can I lose weight
And while not yielding the same results, I think the first yields better results (in this case).

Then you start adding words for refinement ...

    lose weight safe
See a pattern? ... It's a lot like adding tags, instead of formulating real phrases.

But... the results for "lose weight" and "how to lose weight" are not the same. Nor are the results for "earthquake" vs "what is an earthquake". In both cases if I'm after the latter then it is the more verbose query that yields the better results.

Stop word/phrase removal can't be done in too simplistic a manner. Words like <how to> signify a different kind of intent. Also, if stop words were always removed, queries like <who's who in america> wouldn't work properly.
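A small Python sketch of the problem with simplistic removal. The stop-word list here is hypothetical and chosen only for this example; Google's actual list and algorithm are not public.

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"how", "to", "can", "i", "what", "is", "an", "a",
              "who's", "who", "in"}


def strip_stop_words(query):
    """Naively drop every stop word, ignoring what the user meant."""
    kept = [word for word in query.lower().split() if word not in STOP_WORDS]
    return " ".join(kept)


QUERIES = [
    "how can I lose weight",
    "what is an earthquake",
    "who's who in america",
]

for query in QUERIES:
    print(f"{query!r} -> {strip_stop_words(query)!r}")
```

With this naive filter, "what is an earthquake" collapses to the same query as "earthquake", and "who's who in america" loses nearly all of its meaning, which is the point made above.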

I am curious also, I was just guessing, I hope I didn't insult anyone. For topics where you can expect to find the exact same question on Q&A sites and forums this is a good approach. It seems though that people who are not familiar with how search engines work often type in questions, as if they are asking an oracle.

WOW... I believe this is highly inaccurate. Maybe in the old days it was naive to type a question into a search engine (until Ask Jeeves came along), but with the rise of Q&A sites this is a very valid way to find a question that has been asked before. Certainly it depends on the question you are asking, but I think it's quite a false generalization at this point.

I don't search this way normally, but if I'm struggling to find what I want I will resort to this sometimes (in the hopes that a suitably titled blog post will appear somewhere).

154 how can i get more facebook friend

180 how can i get fans on facebook

Maybe someone should create lots of fake profiles to 1) sell friend requests to losers who want to pad their friend list 2) sell fans for people's pages

EDIT: Actually, maybe you can sign up other Facebook users to sell themselves as friends/fans and take a cut. I'm not sure how you would track it though.

There was a company who did that, I think months or years ago. Not sure if they still operate.

Wow. I have a new appreciation for disambiguation as a competitive advantage. I know search queries are inexact, but I didn't realize that they might be entered billions and trillions of times

    130 how can i work with google 1150000000
Do they want to be employed by Google, learn about SEO, or be a supplier of Google's?

    166 how can i get some money 1020000000
Do they want to earn money, win money, or receive a wire transfer?

They just want to get some money!

This is a little scary ...

Here are the current Google Suggest phrases on Google Canada for 'how can I':

how can i kill my baby

how can i make my breasts bigger

how can i lose weight fast

how can i lose 10 pounds in a week

how can i watch hulu in canada

how can i get my boyfriend hard

how can i get pregnant

how can i keep from singing lyrics

how can i download youtube videos

how can i tell if i am pregnant

I often use the search term 'how can i ...', and I've noticed the 'how can i kill my baby' search phrase many times over about the past 6 months. It comes and goes from the Suggest list. Very strange.

EDIT: formatting

Odd that killing the baby is first on the list, yet further down there are at least 4 other suggestions directly or indirectly leading to having that baby.

If the Canucks are smart, that "how can I kill my baby?" search term is a honey pot.

Alternative explanation: The search is performed by new parents looking for tips on what NOT to do when caring for their baby.

Another alternative explanation: Pregnant women/teens who don't want to carry the baby to term.

Many of the questions that confused the author are down to language or cultural issues I think.

A "C Form" is a common name for a tax form in several countries, including India. Also short for concessional form.

CA is likely Chartered Accountant, an important accountancy qualification in several countries.

How can I get office/windows/etc for free: If you're new on the internet, it's going to be difficult to understand how such expensive software is ubiquitous. Office in particular costs 6 months salary in many places of the world (G8 countries: can you imagine spending $20k on an office suite?) Plus of course there are both valid, and copyright-infringing methods of getting these products for free.

How can I view a street? Google Street View - a valid question if you've seen/heard of it but don't have the name.

how can i copyright a website: It seems almost no-one understands copyright well, and copyright notices are everywhere on websites (despite copyright being automatic)

how can i get the new facebook 2010: related to phased rollouts of new features, or some scam or hoax, or both.

what version of windows 7/ office 2007 do I have? They mean which edition, Basic/Ultimate etc. Not a stupid question at all.

how can i start share business?: How can I start investing in stocks/shares.

how can i tell how tall my son will be: It's actually easy to calculate a fairly accurate prediction from the parents' heights, and height is pretty important in many cultures.

I wonder if mining it regionally and seeing the unique searches will tell us more about a culture than years of on the ground anthropological observations. Of course, it assumes that people are kinda well off in the country and a sizable majority uses the internet.

Further, this is something that networks would kill for. Most people google the news they want to read. So, they could manage their editorial content better with this.

This data can have so many awesome uses...

P.S. - This comparison between Google USA and Google India is kinda right on the dot: http://i.imgur.com/FPk2M.jpg Everyone I know uses a computer for one thing: porn and more porn. I think that this is a symptom of an extremely sexually repressed society. [edit: I am sorry if that sounded like a generalization, but what I intended to say is that, from first-hand experience, an abnormally large number of people are fixated on porn. It's kinda understandable with teenagers, but grown men and women? I know this stuff since I am sorta an unofficial tech support, so I come upon GBs of stuff that rattles the hell out of me. It might be personal bias, but I genuinely think that porn is more prevalent here, due to social norms, than in a more liberal country; kinda like Victorian England.]

Nah, you're completely right. Me and my friends have a joke: nobody has sex in India; we're all children of God. I find it unbelievable that kids at my college never discuss sex.

Edit: formatting.

It is so hard to explain this to people who haven't grown up down here, but I was still wrong. We can't make generalizations at all.

Yes, India is a society where menstruation is a taboo. It is a "dirty thing" that girls do, or something, to most men. Live-in relationships are against God. Women have to be "pure" until marriage. Men can do whatever the fuck they want as long as they aren't gay or effeminate, of course.

Ah yes, the joys of Indian "society", but do you know something?

I cherish people who do not live by these standards.

When you meet someone over here who treats women as human beings and considers live-in relationships to be normal, then you have found an outlier, and they tend to be pretty amazing folks.

I doubt filtering is set the same for both search results. Or else there is some other lurking variable I'm not thinking of. Just doesn't seem true that Indians are any more into porn than any other people.

It is like that statistic about prostitution in the Victorian era. Most men lost their virginity to prostitutes.

I am not saying that Indians are more into sex than other populations in the world. I am remarking on how it manifests itself. I am talking about direct observation over here. I walk down the street and there is this guy who barely knows English who is watching porn on his Chinese phone in the middle of the street. And this is not a one-off incident.

I might have a skewed view point but I honestly believe that cultural impact works out in some pretty fascinating ways.

Did you make sure personalization in Google India was disabled for you? Just Joking ;)

I do think your statement about porn is an extreme generalization but the differences in suggestions are very interesting.

I am sorry if it sounded like an extreme generalization. It was wrong on my part, but you're from India and you know how this society is set up. It is just interesting that it spills over like this.

P.S. - I might be the only person I know who doesn't like porn. I know it's weird, but I just find it to be undignified in a lot of ways. Oh, and I didn't take the screenshots; I found them online a while ago.

This is a really clever way to look for business ideas. I've tried doing something similar with twitter, searching for phrases like "I'm looking to buy" and "I wish there was". There's a lot of noise though, and the publicly searchable data set is way too small. This seems like a much better data set. If you pair this with trends or some of the ads traffic data, you could probably figure out what sort of search volume these searches are getting.

Seems like there's a real opportunity to take the tactics that Demand Media uses to generate ideas for articles/videos and apply it more broadly to products (see http://www.wired.com/magazine/2009/10/ff_demandmedia/ for a good article about Demand Media's approach). Instead of trying to generate ideas on what products people might like and test if they actually do (using an MVP or dry testing), instead figure out what they're looking for that they're not finding, and offer that.

16 how can i search in google 2690000000

This one is just curious.

They are probably trying to improve their search keywords

Probably meaning 'how can I better search in Google'. Can the users be trained into making better, more specific queries? At school, maybe?

Believe it or not I learned some tricks I use daily with Google from an otherwise computer illiterate English teacher in high school: using quotes and - to modify how the search works.

I found

46 how can i find my son

55 how can i get my son back

very sad.

What google is actually showing me is much more sad: http://i.imgur.com/sjGqf.png

It matches what is shown to me on http://google.com/complete/search?output=toolbar&q=how%2... (note that it is NOT sorted by the number of queries)

I didn't search or visit anything related to these searches, and I have even blocked the Google Analytics tracker. Yet it's all the same on all my browsers, even those which were never logged in to my Google account.

I'm getting the exact same results. Thus, I doubt it's something that's personalized for you. More like generic entries when they don't have anything better to show you.

Check some of the (old) AOL search logs released.


I wonder if at least some of the time #55 refers more to getting him back mentally, as in back to the state before puberty hit.

#46 is indeed very sad.

No, in all likelihood the parent-son bond has been broken and the parent would like to fix that. They are both very sad, and no reference to daughters in the list? Sons may stray further or abandon their parents more than daughters.

> No, in all likelihood the parent-son bond has been broken and the parent would like to fix that

I'd have thought of a divorce gone bad, and a father barred from seeing his son.

I hadn't thought of that but that could easily be the case as well. Very sad also.

I too was struck by the absence of "daughter." I like your explanation better than my knee-jerk assumption that people care more about sons than daughters.

I suppose 46 could also be someone looking for (e.g.) a cell-phone tracking service for parents.

I wish I had found that sad. Thanks to the Jim Rome show all I could think of was http://www.youtube.com/watch?v=XGNW5ltWowA

I found

94 how can i get my hearing back

incredibly sad.

Not that this isn't a cool analysis, but seems as if there's selection bias to these data. Lots of psychographic segments will never start a search with "how can I...".

I think I mentioned this in the bullet points.

This post reminded me of how I always used to kid my coworkers and friends. When we were working together on the same PC and needed to download some utility, I would go to the Google homepage and type: "Dear Google, please help me. I am looking for some information regarding this utility program, called XYZ. Do you, by any chance, know where I can download it from? Thank you very much in advance!"

At first I thought the numbers were number of searches, but they're not. For example, according to AdWords, "how can i be on made" is searched less than 10 times a month globally. The numbers shown are number of results.

You are absolutely right, I added a correction. What confused me was the wording on the Google Suggest help page and the num_queries parameter name, as rbrcurtis mentioned. Please see my replies to this: http://news.ycombinator.com/item?id=1511824

Looking at the XML at the URL he is using for this experiment, the tag for the number is "num_queries", which makes me think that it is, in fact, the number of queries and not the number of results. view-source:http://google.com/complete/search?output=toolbar&q=how%2...
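For anyone who wants to pull the number out of that XML, here is a rough Python sketch. The sample document is hand-written: only the "num_queries" tag name comes from the comment above, while the surrounding element and attribute names are assumptions about the toolbar output format. The three queries and numbers are the ones quoted elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not a captured response. Element and attribute
# names other than "num_queries" are assumed for illustration.
SAMPLE_XML = """<toplevel>
  <CompleteSuggestion>
    <suggestion data="how can i work with google"/>
    <num_queries int="1150000000"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="how can i search in google"/>
    <num_queries int="2690000000"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="how can i get some money"/>
    <num_queries int="1020000000"/>
  </CompleteSuggestion>
</toplevel>"""


def parse_suggestions(xml_text):
    """Return (query, number) pairs from a Suggest-style XML document."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("CompleteSuggestion"):
        suggestion = item.find("suggestion")
        count = item.find("num_queries")
        if suggestion is None or count is None:
            continue  # skip entries that carry no number
        rows.append((suggestion.get("data"), int(count.get("int"))))
    return rows


# Sort by the reported number, largest first.
ROWS = sorted(parse_suggestions(SAMPLE_XML), key=lambda row: row[1], reverse=True)
for query, number in ROWS:
    print(f"{number:>13,}  {query}")
```

Whatever the tag is called, the parsing only recovers the number; it cannot tell you whether it counts queries or results, which is the question being debated here.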

This is like the serious version of http://failblog.org/tag/autocomplete-me/

My favorite recently was "Do Not I...(ron clothes on body)"

What teenage people want.

It's nice to share the list of sorted results -- but why on Scribd in print format?

Found poetry....

35. Too funny. 59 is also hilarious.

And would you look at all those Facebook queries. That alone should have Google shaking in their boots!
