
Most common phone interview question at Google - jquave
https://crunchskills.com/most-common-interview-question-at-google/
======
asdginioubnou
Both solutions are O(1). The alphabet is finite. Let's say there are M
different characters in the alphabet and N different characters in the string.
If there is a duplicate, it is guaranteed to occur within the first M + 1
characters of the string by the pigeonhole principle. As N grows without
bound, the number of characters that needs to be checked reaches a limit.

(I guess, most precisely, we should define K to be min(M + 1, N). Then the
first solution is O(K^2) and the second solution is O(K))

EDIT: I assumed the first solution would compare each character to the
characters visited so far, i.e.

    
    
        for i in [0..N]
            for j in [0..i]
               if arr[i] == arr[j] return arr[i]
    

Maybe it's

    
    
        for i in [0..N]
            for j in [0..N]
               if arr[i] == arr[j] && i != j
                   return arr[i]
    

That would make it O(KN). Or O(N) if we treat the size of the alphabet as
fixed.

~~~
v_g
Interesting point. However, Alphabet is probably not the best example, Unicode
perhaps? The upper bound in Unicode is ~ 1.1 Million, but still fixed

~~~
asdginioubnou
That doesn't make a difference asymptotically, though it obviously makes a big
difference in practice.

------
chrisseaton
> This means is a magnitude faster than the previous one!

Is this right? It's a quadratic not an order of magnitude difference isn't it?
And can you be an order of magnitude faster in an O notation expression
anyway? Isn't an order of magnitude a coefficient? Don't we drop coefficients
in O notation?

~~~
fancyfredbot
It's not wrong! For some value of N it is an order of magnitude faster. It's
just a bit misleading.

~~~
eesmith
For another value of N it's three orders of magnitude faster.

In the example case where s[0] == s[1], it might be an order of magnitude
slower, if there's much object initialization for the lookup table.

------
ayberk
I used to use this question before it became too common. Any reasonably-well-
prepared candidate should be able come up with the O(N) solution and the
follow-up is always good for a discussion: "what if the string is huge but the
alphabet is small? Eg, a DNA sequence."

There's a single pass solution that's left as an exercise for the reader.

~~~
chrisseaton
> Any reasonably-well-prepared candidate

Isn't that the point? You say it like it's a problem, but it screens for
someone who's reasonably prepared, which is probably what Google wants to do
because they really start looking for the super-world-tier people they employ,
because I guess that takes more time and resources.

> There's a single pass solution that's left as an exercise for the reader.

Doesn't the blog post already include a single-pass solution that works on a
small alphabet?

~~~
ayberk
> Isn't that the point? You say it like it's a problem, but it screens for
> someone who's reasonably prepared, which is probably what Google wants to do
> because they really start looking for the super-world-tier people they
> employ, because I guess that takes more time and resources.

Definitely. That's why it makes a good question for the initial screening. If
you can't come up with the O(N) solution, there's very little chance you'll
pass the on-site interviews.

> Doesn't the blog post already include a single-pass solution that works on a
> small alphabet?

It's actually two passes: one to create the set, and one more again to find
the character. There's a solution you can do in only one pass through the
string and one through the alphabet. Complexity is the same, but it's far
fewer operations for the specific example given.

~~~
chrisseaton
> It's actually two passes: one to create the set, and one more again to find
> the character.

Those are combined into a single pass in the example in the article, aren't
they?

~~~
bootloop
Yea I don't understand what the parent comment wants to say. You need to have
a lookup structure (even if its only a bit) and then you can do one pass over
the string. The lookup or adding a character doesn't count as a pass. And
there is no way to omit this anyway.

------
kinkrtyavimoodh
I love that this question is very basic, tests only entry-level data
structures, and can at least be talked and reasoned about even if you don't
know the exact API of a hashmap, and YET I am sure people will find a reason
to complain about this claiming they never actually need to find a recurring
character in their actual job role.

------
jrootabega
Congrats, it's now off the list of phone interview questions at Google.

------
thiht
> Most peoples immediate thought would be a double for loop

Really? I've a hard time believing this

------
jimbob45
Usually, part of the question is to ask about legal characters. If they tell
you it's letters only, then you can get away with a pre-allocated array (and
give a concrete big O answer).

------
TheOtherHobbes
It's not an order of magnitude faster. The details depend on the hash table
implementation and the language and the processor specifics and the
statistical distribution of string sizes and whether you can make assumptions
about the number of symbols being used and whether a tight double loop for a
small symbol set and string size would fit in the cache and whether it would
stay there long enough to make a useful difference.

Naive Big O is just as likely to be wrong as naive anything else.

I hope this isn't a genuine phone screen question, because if someone punted
this at me as a pass/fail I'd fail them for being confused about how real
computers and real languages work behind those nice clean-looking "if thing is
in other_thing..." statements.

~~~
asdginioubnou
The second solution is safer than the first. While it will sometimes be
slower, it will never be catastrophically bad. It may have a small,
predictable overhead, but it will never unexpectedly bring down the whole
program when you get a few unusually long strings.

When searching through a sorted list, your first instinct should be to use
binary search. Your first instinct should not be "well maybe the list is
always small, so linear search would be faster." Write the safe, predictable
thing and save the optimizations for later.

