
Improvements to searching for special characters in programming languages - TheQwerty
https://blog.google/products/search/improvements-searching-special-characters-programming-languages/
======
Animats
I can now find the C+@ programming language. So it's not heavily special cased
for common programming languages.

Google Code Search (2006-2013) [1] was more useful. I miss that. Its search
allowed regular expressions.

[1]
[https://en.wikipedia.org/wiki/Google_Code_Search](https://en.wikipedia.org/wiki/Google_Code_Search)

~~~
kristianp
It doesn't seem to work perfectly. Doing a verbatim search for "C+@"
programming language produces a lot of results without the "C+@" on the page.

------
TACIXAT
This is great. I feel Google has slowly become too user friendly. My mobile
results are always way less technical than my desktop results. If I'm in the
car (passenger) and want to look up a problem I'm having while programming, I
get mostly related queries that are a simplified version of what I'm looking
for.

I really believe that the technical crowd drives what becomes popular (app
recommendations for family and friends). I feel a lot of the "Google hacking"
queries have become less obvious and the search bubble stuff was getting
bothersome. This is definitely a step in the right direction. Hopefully I'll
be a little less frustrated with results in the future.

~~~
johnfn
Google tailors its results to the kind of person it thinks you are. For
example, if you immediately search "python", you will get results about
snakes. But if you search for programming first, and then python second, it
will now give back programming results on the second search. This continues to
apply if you searched "programming" last week.

This behavior is actually very nuanced and impressive to watch, once you
understand what's going on.

I don't think Google is becoming more user friendly at the expense of being
technical. It certainly isn't for me. What your problem sounds like is that
it's built two separate profiles for you - one for what you're likely to
search on desktop, and the other for what you're likely to search on
mobile.

~~~
trishume
That specific example isn't actually true. Even with no prior history if you
search "python" the first result is the programming language.

That's because Google is smart and despite the fact that more people know of
Python as a snake, when somebody types just "python" into a search query, it's
almost definitely true that they mean the programming language. Few people
Google for types of snakes.

A similar thing is true for "ruby" and "rust".

------
binarymax
To make this change at Google's scale is a triumph. Even dealing with this
type of tokenization on our vastly smaller document set can be challenging.

~~~
chimprich
Genuine question: why should this be any more difficult than searching for any
other type of character? I've long found it hard to understand why Google is
so bad at searching for non-alphanumeric characters.

~~~
binarymax
When indexing documents (and querying for them), there is a process the terms
go through, to split them up and then normalize them so they can be found
more easily. A trivial example is you want to find "can't" when someone
searches for "cant". Typically special characters are removed for several
reasons: the vocabulary of terms becomes smaller, which saves space and time,
you can ignore punctuation (like searching for things that abut parentheses),
you can remove accent marks and diacritics, and a host of other things.

This is hard because (a) you have to have contextual awareness of punctuation
in certain places, like the '&' character in john&jane vs the logical &&. (b)
your vocabulary of terms becomes larger - which is probably not a big deal for
most folks but if you are Google then a 0.0001% increase in the vocab is a
killer in space.

--EDIT-- The vocab increase is probably not as much as I noted above - but
even adding a dozen terms can have an impact at Google's scale.
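
To make the split-then-normalize step concrete, here is a minimal Python sketch. It is only an illustration of the general idea, not how Google or any particular engine does it; the function names `normalize_token` and `analyze` are made up for this example.

```python
import re
import unicodedata

def normalize_token(token):
    """Lowercase, strip diacritics, and drop every non-alphanumeric character."""
    decomposed = unicodedata.normalize("NFKD", token)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^0-9a-z]", "", without_marks.lower())

def analyze(text):
    """Split on whitespace, normalize each piece, and discard empty results."""
    return [t for t in (normalize_token(raw) for raw in text.split()) if t]

print(analyze("Can't find (café)"))  # "can't" and "cant" now share one term
print(analyze("john&jane"))          # the '&' is silently dropped
print(analyze("a && b"))             # the '&&' operator disappears entirely
```

The last call shows the cost: once punctuation is stripped at index time, a query for `&&` has nothing left to match against.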

~~~
chimprich
OK, thanks - that sounds plausible on the face of it, but why wouldn't you
store special characters and then ignore them when matching patterns? You
could then make an exception for strings in quotes (or some other option for
activating a more precise search).

Maybe Google hasn't previously thought the extra space/complexity was worth
the special treatment but given the relative quantity of data they already
index and the usefulness of this feature I'm surprised.
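
A rough Python sketch of that idea, assuming a toy in-memory document set: unquoted queries match on normalized terms, while a double-quoted query is matched verbatim against the raw text. The names `normalize` and `search` are hypothetical, and the verbatim path here is a linear scan, which is exactly the part that would not scale.

```python
import re

def normalize(text):
    """Lowercase and drop everything except letters, digits, and whitespace."""
    return re.sub(r"[^0-9a-z\s]", "", text.lower())

def search(query, docs):
    """Return ids of matching docs; a double-quoted query is matched verbatim."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        needle = query[1:-1]
        return [doc_id for doc_id, text in docs.items() if needle in text]
    wanted = set(normalize(query).split())
    return [doc_id for doc_id, text in docs.items()
            if wanted <= set(normalize(text).split())]

docs = {
    1: "x >>= 2 shifts x right",
    2: "x = 2 assigns to x",
}
print(search("x >>= 2", docs))    # punctuation ignored: both documents match
print(search('"x >>= 2"', docs))  # quoted: only the exact string matches
```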

~~~
nostrademons
[ex-Googler, used to work on search, this issue came up repeatedly during my
tenure there].

The storage cost was prohibitive. Search engines rely on a data structure
known as an inverted index; it's basically a list, for each token, of every
document that contains the token, and for a context-aware search engine like
Google it usually contains the position within the document of the token as
well. Single-character punctuation marks like periods, commas, parentheses,
dashes etc. appear in literally every sentence. That means that the inverted
index for periods or commas would have to contain an entry for literally every
single sentence on the web.
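
A toy positional inverted index in Python makes the cost visible. This is a sketch under simplifying assumptions (whitespace tokenization, punctuation already separated by spaces, everything in memory); `build_index` is an illustrative name, not anything from a real search engine.

```python
from collections import defaultdict

def build_index(docs):
    """Map token -> {doc_id: [positions]} for whitespace-split tokens."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(position)
    return index

docs = {
    1: "python is a language .",
    2: "a python is a snake .",
}
index = build_index(docs)
print(index["python"])  # one posting per document that mentions it
print(index["."])       # the period shows up in every single document
```

Rare terms get short posting lists; the period gets an entry for every document, and on real text for every sentence.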

There's a similar problem for common words like 'a', 'the', prepositions, etc,
but these are usually already solved by stopwording.

That's why this announcement only covers groups of punctuation with 2-3
characters. These don't appear in ordinary text, and so you can generate
posting lists for them that are reasonably-sized. (I suspect that the
economics of the index have changed as well, making storage costs cheaper, but
this work happened after I left and so I don't know details.)
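
As a sketch of what "only index punctuation groups of 2-3 characters" could look like, assuming a tiny made-up stopword list and simple regexes (none of this is taken from Google's implementation, and `index_terms` is a hypothetical name):

```python
import re

WORD_RE = re.compile(r"\w+")
PUNCT_RUN_RE = re.compile(r"[^\w\s]+")
STOPWORDS = {"a", "an", "the", "of", "in", "to"}  # tiny illustrative list

def index_terms(text):
    """Return indexable terms: non-stopword words, then 2-3 char punctuation runs."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    terms = [w for w in words if w not in STOPWORDS]
    # Single marks like ',' or '.' are skipped; longer runs are kept as tokens.
    terms += [p for p in PUNCT_RUN_RE.findall(text) if 2 <= len(p) <= 3]
    return terms

print(index_terms("The value of x >>= 2, then a && b."))
```

The comma and the period never reach the index, while `>>=` and `&&` become ordinary terms with their own, much shorter, posting lists. Token positions are dropped here to keep the example short.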

------
tyingq
Meanwhile, code searching at GitHub completely ignores characters like =, $,
{. And, it's case insensitive. Argh.

~~~
mxstbr
It's the most frustrating "feature" I've ever seen. GitHub, the platform for
hosting code, has a search function that doesn't work for code. How does that
make any sense?!

Fixing that seems like PM101 material, yet here we are in 2017 with this still
being a thing...

------
kolemcrae
Only very slightly related:

When I was a teenager I made music under the name shark^^bait

The ^^ is what made it stand out from others.

The issue is there is no efficient way to search for that phrase with the
special characters.

I have no idea if I can still find the absolutely god awful music I made back
then.

Using the phrase match in google just searches for sharkbait which doesn't
help at all.

It doesn't help that years later a little movie called Finding Nemo came out.

------
rspeer
This will be extremely helpful next time I have to use a Haskell library that
decides to implement everything as infix operators named "~<$>" and ".~=" and
stuff.

~~~
tomsmeding
Hoogle is the way to go, man.
[https://www.haskell.org/hoogle/](https://www.haskell.org/hoogle/)

~~~
rspeer
I know Hoogle exists, but that just searches one kind of documentation.
Despite the cute name, it's not Google. You can't Hoogle an error message and
see if anyone else got it.

------
lamida
Usually I use [http://symbolhound.com](http://symbolhound.com)

------
macintux
Catching up with DuckDuckGo?

~~~
james2vegas
needs to index perlvar for this; very few hits on anything from there, and for
the queries that did get hits, the results that come up are for bash only

it's google though shouldn't be surprised

------
AaronFriel
I am a bit sad that no Haskell results show up in this search:

">>= operator":
[https://www.google.com/#q=%3E%3E%3D+operator&*](https://www.google.com/#q=%3E%3E%3D+operator&*)

But it's a sight better than it was before. It actually shows meaningful
programming language results. And if I call the operator by its Haskell name
at the same time, I get very good results:

">>= bind":
[https://www.google.com/#q=%3E%3E%3D+bind&*](https://www.google.com/#q=%3E%3E%3D+bind&*)

Or just the language name:

">>= Haskell":
[https://www.google.com/#q=%3E%3E%3D+haskell&*](https://www.google.com/#q=%3E%3E%3D+haskell&*)
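
For anyone wondering about the `%3E%3E%3D` in those links: it is just `>>=` percent-encoded for use in a URL. A small Python sketch using the standard library (the helper name `encode_query` is made up for this example):

```python
from urllib.parse import quote_plus, unquote_plus

def encode_query(query):
    """Percent-encode a search query for use in a URL query string."""
    return quote_plus(query)

print(encode_query(">>= operator"))         # %3E%3E%3D+operator
print(unquote_plus("%3E%3E%3D+haskell"))    # >>= haskell
```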

~~~
kyrra
For your first search, I see "Operator Glossary - Haskell Lang" as the 9th
result.

~~~
AaronFriel
Ah, Google's personalization of searches. Here's what I see:

Operators in C++ - TutorialsPoint
[https://www.tutorialspoint.com/cplusplus/cpp_operators.htm](https://www.tutorialspoint.com/cplusplus/cpp_operators.htm)

Operators in C++ - Learning C++ in simple and easy steps : A beginner's
tutorial ... Right shift AND assignment operator, C >>= 2 is same as
C = C >> 2. C++ Loop Types · Conditional operator · C++ Pointer Operators ·
Increment operator

Assignment operators - JavaScript | MDN
[https://developer.mozilla.org](https://developer.mozilla.org) › ... ›
JavaScript reference › Expressions and operators

Feb 3, 2017 - An assignment operator assigns a value to its left operand based
on the value of its right operand. ... Right shift assignment, x >>= y,
x = x >> y.

Operators in C and C++ - Wikipedia
[https://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B](https://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B)

This is a list of operators in the C and C++ programming languages. All the
operators listed exist in C++; the fourth column "Included in C", states
whether an ...

Right Shift Assignment Operator (>>=) - MSDN - Microsoft
[https://msdn.microsoft.com/en-us/library/y9h99e01(v=vs.100)....](https://msdn.microsoft.com/en-us/library/y9h99e01\(v=vs.100\).aspx)

Using this operator is almost the same as specifying result = result >>
expression, except that result is only evaluated once. The >>= operator shifts
the bits of ...

<<= Operator (C# Reference) - MSDN - Microsoft
[https://msdn.microsoft.com/en-us/library/ayt2kcfb.aspx](https://msdn.microsoft.com/en-us/library/ayt2kcfb.aspx)

Jul 20, 2015 - except that x is only evaluated once. The << operator shifts x
left by the number of bits specified by y . The <<= operator cannot be
overloaded ...

C# Operators
[https://msdn.microsoft.com/en-us/library/6a71f45d.aspx](https://msdn.microsoft.com/en-us/library/6a71f45d.aspx)

Jul 20, 2015 - x >>= y – right-shift assignment. Shift the value of x right by
y places, store the result in x , and return the new value. => – lambda
declaration.

-= Operator (C# Reference)1 - MSDN - Microsoft
[https://msdn.microsoft.com/en-us/library/d31sybc9.aspx](https://msdn.microsoft.com/en-us/library/d31sybc9.aspx)

Jul 20, 2015 - except that x is only evaluated once. The / operator is
predefined for numeric types to perform division. The /= operator cannot be
overloaded ...

What does this ">>=" operator mean in C? - Stack Overflow
stackoverflow.com/questions/17769948/what-does-this-operator-mean-in-c

Jul 21, 2013 - unsigned long set; /*set is after modified*/ set >>= 1;. I
found this in a ... The expression set >>= 1; means set = set >> 1; that is
right shift bits of set ...

java - What does "|=" mean? (pipe equal operator) - Stack Overflow
stackoverflow.com/questions/14295469/what-does-mean-pipe-equal-operator

Jan 12, 2013 - |= reads the same way as += . notification.defaults |=
Notification.DEFAULT_SOUND; .... 2 <<= Left shift AND assignment operator C
<<= 2 is same as C = C << 2 >>= Right shift AND assignment operator C >>= 2 is
same as ...

C++ Operator Precedence - cppreference.com
en.cppreference.com/w/cpp/language/operator_precedence

Oct 12, 2016 - Precedence, Operator, Description, Associativity. 1, :: Scope
resolution ... For relational operators > and ≥ respectively. 9, == != For
relational ...

------
doall
Fantastic! As a Lisper and Clojurian, I can say that this really helps
beginners to search for reader macros.

------
ino
Any tips on searching for C and not C++ or C#? (google or ddg)

------
hashhar
Long overdue considering DuckDuckGo is quite developer friendly.

~~~
TheGrassyKnoll
Agree. On DuckDuckGo just do:

<your programming problem> !so

Takes your search directly to Stack Overflow

If you don't like the results, try it again with !g and your search is
submitted to Google.

They've got 9000+ bangs now:

[https://duckduckgo.com/bang?q=](https://duckduckgo.com/bang?q=)
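
If you want to script this, a bang is just part of the query string. A minimal Python sketch, assuming the usual `https://duckduckgo.com/?q=` search URL (not shown above) and a made-up helper name `bang_url`:

```python
from urllib.parse import quote_plus

DDG_SEARCH = "https://duckduckgo.com/?q="  # assumed search endpoint

def bang_url(problem, bang="so"):
    """Build a DuckDuckGo URL that appends a !bang to the query."""
    return DDG_SEARCH + quote_plus(f"{problem} !{bang}")

print(bang_url("segfault in strcpy"))       # redirects to Stack Overflow
print(bang_url("segfault in strcpy", "g"))  # same query, sent to Google
```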

