
Retiring a Great Interview Problem - dtunkelang
http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/
======
georgemcbay
Why retire the question just because you saw it on Glassdoor? The same
question was posted to sureinterview last year, so anyone you hired since
November is suspect! (If you believe that potential advance knowledge of the
question is relevant).

Worse yet, even if you did come up with this problem on your own, this exact
problem was a fairly common interview question back in the mid 90s, when
string processing interview questions were all the rage for C/C++ programming
jobs -- I must have seen it a dozen times in various interviews over the
years. The problem is familiar enough that I'd bet a decent amount of money
that it must be listed in one of those interview problem books that were
popular before the websites for this stuff started showing up over the past
couple years... If you really relied on the interviewee having no possible
advance knowledge of this question prior to the interview you surely had a
false sense of security prior to seeing it appear on glassdoor.

As long as you engage the interviewee to see that s/he really understands the
answers they are giving, I don't really see why it matters if the question has
appeared on one of these sites. If the interviewee is preparing for their
interviews enough that they are actually looking at these sites and
understanding the answers to the point where they can intelligently discuss
them, that probably already puts them in the upper tier of people you'd want
to seriously consider hiring, so retiring the question is probably
counterproductive unless you have a non-disclosed alternative that you're sure
is as good a question.

------
patio11
A spiritually similar question at a previous employer resulted in many
candidates attempting to iterate over the dictionary rather than iterating
over the string. We hired them. At least they could iterate over a dictionary.
That's a surprisingly rare skill in the hiring pool.

Maybe I'm just getting cynical in my old age, but sometimes I think the world
is awash in incompetence. We see so much of it in tech because our
incompetencies are (marginally) harder to hide.

~~~
dxbydt
>sometimes I think the world is awash in incompetence. We see so much of it in
tech because our incompetencies are (marginally) harder to hide.

A very candid Philip Greenspun in Founders@Work (p341-2): "In aviation, by
the time someone's your employee, their perceived skill and actual skill are
reasonably in line. JFK Jr is not working at a charter operation because he's
dead. In IT, you have people who think 'I am a great programmer'. But where are
the metrics that prove them wrong? Traffic accidents are very infrequent, so
they don't get the feedback that they are a terrible driver.

Programmers walk around with a huge overestimate of their capabilities. That's
why a lot of them are very bitter. They sit around stewing at their desks.
That's why I don't miss IT, because programmers are very unlikable people.
They are not pleasant to manage."

~~~
nasmorn
I second that. For musicians it is hard to hide their incompetence, for
programmers it is just a matter of picking the right succession of jobs.

------
pbh
I hope readers aren't getting the impression from this article that the code
examples provided are the correct way to do word segmentation in English.
(Though I understand this is an article about interviewing and not about word
segmentation. And this might be considered a preprocessing step for doing
things correctly...)

Norvig gives a very approachable version of English word segmentation that
uses a language model below.

<http://norvig.com/ngrams/>

~~~
dtunkelang
Peter actually emailed me directly, though I was already very familiar with
his work on this and similar problems. But yes, I make it very clear in the
post (and to candidates) that they should not assume an English -- or even a
natural language -- dictionary.

------
JL2010
I had a similar question with a twist asked of me during an interview. It went
something like this:

Given a list of all the short strings in the periodic table of elements (e.g.
Na, F, Al, etc.) and a list of all the words in the English language:

1) Write a method that finds the longest possible English word you can spell
given any combination of the strings in the periodic table of elements. Reuse
of elements in the same string is allowed.

2) Describe what kind of data types you would want for the two lists and
describe anything special about them.

3) Give a big O estimation.

I thought it was a great question :)
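
For what it's worth, here is a rough sketch of part 1, not necessarily the
answer the interviewer had in mind. It assumes every symbol is one or two
letters long, and the names `spellable` and `longest_spellable` are made up
for illustration. Both lists are treated as plain iterables of strings, with
the symbols put in a hash set for constant-time lookups.

```python
def spellable(word, symbols):
    """True if word is a concatenation of symbols (lowercase, 1-2 letters)."""
    word = word.lower()
    n = len(word)
    # ok[i] is True when word[:i] can be spelled with element symbols.
    ok = [False] * (n + 1)
    ok[0] = True
    for i in range(1, n + 1):
        for length in (1, 2):
            if i >= length and ok[i - length] and word[i - length:i] in symbols:
                ok[i] = True
                break
    return ok[n]


def longest_spellable(words, symbols):
    """Longest word in words that can be spelled from symbols, or None."""
    lowered = {s.lower() for s in symbols}
    candidates = [w for w in words if spellable(w, lowered)]
    return max(candidates, key=len, default=None)
```

Under that assumption each word costs time linear in its length, so the whole
scan is linear in the total number of letters in the word list.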

------
tzs
I would have thought of dynamic programming when asked if I could solve it
more efficiently, but I've somehow just not had much occasion to use dynamic
programming, so would have had to admit it would take an evening of reviewing
my algorithms books to actually solve it that way.

However, I would have been able to come up with an alternative O(n^2)
solution, involving building a directed graph with vertices representing each
position in the string and the end of the string, with edges connecting two
vertices if there is a dictionary word starting at the position corresponding
to one vertex and ending just before the position corresponding to the second
vertex. This can be done in O(n^2), and then you can find the shortest path
from the vertex for 0 to the end vertex on this graph in O(n^2) (e.g., by
Dijkstra's algorithm), and that gives you an exact covering of the string
using the minimal number of dictionary words.
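
A minimal sketch of that graph idea, under a couple of assumptions: the
dictionary is a Python set, and because every edge costs exactly one word,
a plain breadth-first search is swapped in for Dijkstra's algorithm. The
function name `segment_min_words` is invented for illustration. Note that
slicing and hashing each substring costs extra time, so this version is not
strictly O(n^2) without something like a trie for the lookups.

```python
from collections import deque


def segment_min_words(s, dictionary):
    """Return a segmentation of s using the fewest dictionary words, or None."""
    n = len(s)
    # Vertices are positions 0..n; an edge i -> j exists when s[i:j] is a word.
    edges = [[j for j in range(i + 1, n + 1) if s[i:j] in dictionary]
             for i in range(n)]
    # Every edge costs one word, so breadth-first search finds a shortest path.
    parent = {0: None}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n:
            break
        for j in edges[i]:
            if j not in parent:
                parent[j] = i
                queue.append(j)
    if n not in parent:
        return None
    # Walk back from the end vertex to recover the words.
    words = []
    j = n
    while parent[j] is not None:
        i = parent[j]
        words.append(s[i:j])
        j = i
    return words[::-1]
```

For example, `segment_min_words("aaab", {"a", "aa", "aaa", "ab"})` returns
`['aa', 'ab']`.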

~~~
VinyleEm
Your approach is correct. The graph that you've built is a DAG. This means
that you can solve for the lower vertices before solving for the upper ones.
This is exactly how a typical Dynamic Programming solution works. If, for
solving a problem, we can see the relationship between this problem and other
similar but smaller ones, we solve the smaller problems first and use these
solutions to build up a solution for the larger case.

EDIT: A naive implementation of your idea can take up to O(n^3 m) time, where m
is the number of words in the dictionary. Using a fancy data structure like a
trie or suffix tree or an automaton can speed it up to O(n^2). Can we better
that?
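
As a rough illustration of the trie variant (one possible sketch, not the
only way to do it): the trie here is a nested dict with `None` as a made-up
end-of-word marker, and `build_trie` and `can_segment` are hypothetical
names. Each start position walks the trie at most n characters, which is
where the O(n^2) bound comes from.

```python
def build_trie(words):
    """Nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def can_segment(s, words):
    """True if s can be split into words, using O(n^2) character steps."""
    n = len(s)
    trie = build_trie(words)
    # reachable[i] is True when s[:i] is a concatenation of dictionary words.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        node = trie
        # Walk the trie along s[i:], marking every position where a word ends.
        for j in range(i, n):
            node = node.get(s[j])
            if node is None:
                break
            if None in node:
                reachable[j + 1] = True
    return reachable[n]
```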

------
StavrosK
Only slightly joking:

    re.findall("(your|dict|here)", "yourword")


I like the idea of constructing a state machine to do all the matching.

~~~
rfurmani
bzzt wrong.

    >>> re.findall('(a|aa|aaa|ab)', 'aaab')
    ['a', 'a', 'a']

The correct answer would be ['aa', 'ab'] but unfortunately findall works
greedily and so will not find the optimal solution. It is possible to specify
it as a regex, but common implementations might take too much time to come up
with the good solution.

~~~
reidrac
I think ['a', 'a', 'ab'] is also valid, isn't it? So ['aa', 'ab'] would be
_one_ of the correct answers.

(Of course ['a', 'a', 'a'] is not, because 'b' is not a valid word and it's
not in the solution.)

~~~
rfurmani
Ah right, I was a bit too quick to challenge. Dictionary: aa, aaa, aaaa, ab
Word: aaaab
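
A quick sketch of that corrected example, assuming Python's standard `re`
module (the variable names are just for illustration). It shows `findall`
committing to its first choice, and an anchored pattern that succeeds only
by backtracking, which can take exponential time on bad inputs.

```python
import re

dictionary = ["aa", "aaa", "aaaa", "ab"]
word = "aaaab"
alternation = "|".join(map(re.escape, dictionary))

# findall commits to the first alternative that matches at each position,
# so it never reconsiders an earlier choice.
greedy = re.findall(alternation, word)

# Anchoring the whole string forces the engine to backtrack until the
# pieces cover the input exactly (exponential time in the worst case).
covers = re.fullmatch(f"(?:{alternation})+", word) is not None
```

Here `greedy` is `['aa', 'aa']`, which leaves the trailing 'b' uncovered,
while `covers` is `True` because "aaa" + "ab" spells the word.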

------
nandemo
Just for fun I decided to rewrite his first version in Haskell. This is
probably not idiomatic, though.

      import Data.List (find, inits, tails)
      import Data.Set (Set, member)

      -- Returns the string itself, or a split into two dictionary words.
      segment_string :: String -> Set String -> Maybe String
      segment_string [] _ = Nothing
      segment_string str dict =
        if str `member` dict
        then Just str
        else let pairs = zip (inits str) (tails str)
                 pairInDict (x, y) = x `member` dict && y `member` dict in
             do (x, y) <- find pairInDict pairs
                Just (x ++ " " ++ y)

~~~
ottbot
[http://www.codinghorror.com/blog/2007/02/fizzbuzz-the-progra...](http://www.codinghorror.com/blog/2007/02/fizzbuzz-the-programmers-stairway-to-heaven.html)

~~~
nandemo
So what? This problem (the full version) is more complex than FizzBuzz.
Besides, I'm not pasting code here to prove that I can program, but to show
what the solution looks like in Haskell.

------
wensing
Humbling way to start the work week. I could produce the fizzbuzz solution and
in my sharper days the recursive backtracking one, but definitely no further.

~~~
SoftwareMaven
I wouldn't feel bad. The advanced answers to this question require spending a
lot of time understanding string processing. It's like having a CSS question
that can be implemented multiple ways: a simple, obvious, slow way and a
complicated, "deep knowledge required", fast way. If you have lots of
experience with CSS, you might get the fast way, but it doesn't really say how
good a programmer you are.

(Yes, not a perfect analogy, but it hopefully gives the idea.)

~~~
kragen
> The advanced answers to this question require spending a lot of time
> understanding string processing.

No, the advanced answers are a simple application of dynamic programming. If
you've never heard of dynamic programming before, you're unlikely to invent it
in response to an interview question, of course; but if you have heard of it,
it might occur to you to try it on this problem.

(Actually, if you've heard of memoization but not dynamic programming, you
might invent dynamic programming in response to this question.)

I think this is at the opposite end of the spectrum from your CSS example.
Dynamic programming has nothing to do with string processing or with any other
particular domain. There's a list of 29 significant algorithms that apply it
at
[http://en.wikipedia.org/wiki/Dynamic_programming#Algorithms_...](http://en.wikipedia.org/wiki/Dynamic_programming#Algorithms_that_use_dynamic_programming).
It might qualify as "deep knowledge", but it's not deep _domain_ knowledge;
it's the kind of deep knowledge that would make you want to hire someone from
a different domain.

~~~
onemoreact
Counterpoint: _@Retric congratulations, you’ve reinvented dynamic programming
with your O(n) solution That’d be a perfect solution._

And yes, I had zero idea what memoization or dynamic programming meant in
that context. After looking it up on Wikipedia, it seems to mean caching
intermediate steps to avoid recalculating them, which seems obvious enough.
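
To make "caching intermediate steps" concrete, here is a small sketch that
memoizes on the start index of the remaining suffix. It is an illustration
only, not the article's code: the dictionary is assumed to be a set, and
`segment` and `solve` are made-up names.

```python
from functools import lru_cache


def segment(s, dictionary):
    """Return s split into dictionary words joined by spaces, or None."""

    @lru_cache(maxsize=None)
    def solve(start):
        # The answer for each suffix s[start:] is computed once, then cached.
        if start == len(s):
            return ""
        for end in range(start + 1, len(s) + 1):
            prefix = s[start:end]
            if prefix in dictionary:
                rest = solve(end)
                if rest is not None:
                    return prefix if rest == "" else prefix + " " + rest
        return None

    return solve(0)
```

Without the cache, the same suffix can be re-solved many times; with it,
there are only n + 1 distinct calls.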

------
pkteison
Bit of a tangent, but a real question: If a writer doesn't approve of a site's
behavior (in this case, Glassdoor is described as "a site that does not seem to
mind when interview candidates violate NDAs"), why does the writer still link
to them? Inbound links (w/o a nofollow) help sites, why help sites you don't
like?

~~~
dtunkelang
Point taken. I thought of not linking to them, or even not mentioning them.
But they do play an important part in the story of the post, and I don't think
I'm helping them so much as I'm raising employees' and employers' awareness of
their existence. At least now a few more interviewers might check to see if
their questions are posted there. From my limited data, interviewees are
already more aware of Glassdoor than interviewers.

As for feeding them page rank, I don't think I have so much to offer that it
helps them materially.

~~~
logic
> interviewees are already more aware of Glassdoor than interviewers

My first amusing thought when reading this was to assume a correlation, i.e.
that interviewees who lean heavily on Glassdoor prior to interviewing do not
eventually become interviewers.

------
mynegation
Quick heads up: page is not rendered properly in Mobile Safari on iPhone:
fixed width font lines are cut off.

~~~
dtunkelang
Sorry about that. Advice on how to fix it while maintaining good rendering
elsewhere? Looks good in Chrome on my MacBook and my Nexus One.

~~~
pavel_lishin
It's not rendering well for me, either: <http://i.imgur.com/Q89gP.png>

My suggested solution: don't wrap it in PRE tags with manual line-breaks. It's
not code, so why preserve the exact breaks? Try BLOCKQUOTE - I don't know if
it's widely supported anymore - or just italicize the whole thing.

I don't really have a good solution for what to do about the actual code,
though. :(

------
svdad
I think the concern about whether or not candidates have seen this, or any,
programming question before is missing the point. Think about what we want in
the ideal candidate -- we want them to come up with a good (elegant,
efficient) solution to the problem, and implement it. We (judging by all the
other responses) expect them to do that because they've had a solid CS
education (formal or informal) as well as significant experience.

But people with that background will give good answers, even if they haven't
seen _this specific problem_, because they have seen lots of problems like it
and recognize the pattern. And even in that case, we evaluate them based on
how well they can implement the pattern they saw, not just on whether they
recognized the correct algorithm. So what if they've seen this problem
already? Coding it up efficiently and elegantly in an interview context is
still non-trivial, and you can still push them to discuss edge cases and
performance tradeoffs.

The person who really has _never_ seen anything like this in his life, and
still can give a good answer, I have yet to meet.

------
Shenglong
With the exception of the last part of the question, you learn everything
there in your first year of CS at university. Do people who can't write this
_really_ put the language on their resume?

Can I get some stats? I really don't (want to) believe it. What percentage of
people get this question wrong? Are they all some sort of eng/cs graduate? I'm
not even a coder and I can solve this in a few minutes.

~~~
dtunkelang
I don't have stats I can share, but I assure you that this problem has
confounded many interview candidates with strong resumes. I agree with you
that it's all basic material -- that's deliberate. I'm glad you think it's too
easy. :-)

~~~
cema
Yes, it's basic. But many people who are not fresh out of college may have
spent recent years solving a completely different set of programming tasks,
and do not have it loaded in their brains.

When applicants prepare for an interview, they do not often know what kind of
knowledge to load in their heads. For example, just a couple of weeks ago I
was asked to figure out a simple bit-flipping scheme, and bit string
manipulations are something that I have not thought about in many years. So it
took me about 10 minutes for a problem that I would have done in less than a
minute when I was spending time thinking about similar things and my mind was
full with them.

Being prepared for a technical interview does not mean to have memorized a few
solutions to a few problems, but it means to have played with them
sufficiently to have the brain loaded with the material. This helps with
intuition as well as specific technical skills.

------
Havoc
It's a great question. Pretty sure it's not nearly as much of a secret as the
author thinks though. I've seen a detailed write-up before somewhere (HN?) and
I'm not even a programmer by profession.

~~~
dtunkelang
Secret is a strong word -- I link to another post on the subject in the
article. Still, seeing it on Glassdoor for my own employer crossed the line of
disclosure.

------
ohashi
I read this and went, I had someone build this for me years ago!

So I looked at the code, it used the efficient solution. Now I am even more
impressed with the programmer. I've always thought highly of him (better than
me) but it's hard to evaluate someone better than you. His solution ran
circles around mine (I had general simple case with 2 words) and now I know
exactly how much more efficient his solution was. Very neat.

------
mak120
It's a nice dynamic programming problem. The beauty of DP is that simply
memorizing one application of it does not guarantee you a solution to an
entirely different problem that might have a similar Dynamic Programming
solution. Look over the TopCoder SRM archives if you don't believe me.

So even though you are retiring this one, coming up with something similar
that tests for basically the same things shouldn't be impossible.

------
pavel_lishin
Quick note to the author - holy god, this is annoying:
<http://i.imgur.com/0cNx4.png>

~~~
dtunkelang
Sorry. The post renders well in Chrome on my Mac and on my Nexus One. But
apparently not so well in other browser / platform combinations.

------
wisty
The problem he's having is that good interview questions are getting busted,
as people post solutions on the web.

If you have a _lot_ of similar interview questions, then there's no way anyone
other than a savant can memorize them without actually learning the theory.

~~~
dtunkelang
Point taken. But it's hard to come up with good interview questions, as my
colleagues here, at Google, and at Endeca can attest. In contrast, it's much
easier to post solutions.

That's why I'm working on an approach that assumes the candidate does have
prior knowledge of the problem. But I'm not there yet.

~~~
jswinghammer
One thing that I've noticed over the years is that almost no one prepares for
an interview in any way so you'll still keep out the worst candidates with
this question.

~~~
cema
I think Google etc. are exceptions; people do prepare for technical interviews
there.

------
NY_Entrepreneur
The author just mentioned dynamic programming. Usually in dynamic programming,
e.g., as in Dreyfus and Law, to say that a problem has a dynamic programming
solution we outline the solution. But the author did not outline such a
solution.

An outline usually includes at least the definition of the 'stages' of the
dynamic programming solution. For the problem of 'string segmentation', the
obvious selection of stages would be each of i = 1, 2, ..., n for the given
string of length n. But this definition of the stages does not yield a dynamic
program because the solution at stage i needs more than just the solution at
stage i + 1 and, indeed, potentially needs the solutions at each of stages i +
1, i + 2, ..., n.

So, first-cut, there is no dynamic programming solution. For a second-cut,
there might be a dynamic programming solution if the author would outline one!

There is now some question if the Google interviewers really understand
dynamic programming!

~~~
anonymoushn
The article actually includes a solution that uses memoization, which is
equivalent to DP.

~~~
ominous_prime
I wouldn't say "is equivalent to"; rather "is a form of". While it is a
dynamic programming technique, unless it's specified people tend to think of
the bottom-up approach to be what's implied by "dynamic", with memoization
being a special case. Yes, it's semantics, but I think that's where the GP was
coming from.
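
For comparison, a bottom-up version might look like the sketch below (the
name `segment_bottom_up` and the `next_cut` table are invented here, and the
dictionary is assumed to be a set). The table is filled from the end of the
string backwards, and the entry for position i may consult any later
position, not just i + 1.

```python
def segment_bottom_up(s, dictionary):
    """Return one segmentation of s as a list of words, or None."""
    n = len(s)
    # next_cut[i] is an index j > i such that s[i:j] is a word and s[j:] is
    # segmentable; None if s[i:] cannot be segmented. Position n is the base.
    next_cut = [None] * (n + 1)
    next_cut[n] = n
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if next_cut[j] is not None and s[i:j] in dictionary:
                next_cut[i] = j
                break
    if next_cut[0] is None:
        return None
    # Follow the recorded cuts from the start to rebuild the words.
    words, i = [], 0
    while i < n:
        words.append(s[i:next_cut[i]])
        i = next_cut[i]
    return words
```

No recursion or cache is involved; the loop order guarantees every smaller
subproblem is already solved when it is needed.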

