
An Interview Question Too Many Developers Get Wrong - bgrainger
http://openmymind.net/An-Interview-Question-Too-Many-Developers-Get-Wrong/
======
ComputerGuru
* How many numbers are you looking for?

* How many times will you look for each number?

* How big is the original data?

* How sorted is the original data?

* What kind of access patterns do you have for lookups? (i.e. are the lookups themselves sorted)

* Are there perhaps special space constraints to consider that may have priority over time constraints?

* Where/how is the data stored/how is it backed? (perhaps you're reading from tape media or other non-random-access-optimized storage)

* Is the data a set or bag? (duplicate values allowed, do you care which one is returned, etc)

~~~
derefr
Probably a quicker way to get a developer to generate this same list of
considerations is to say "explain the tradeoffs in adding an index to a column
in a database table." We tend to search large tables much more frequently than
large in-memory lists these days, so that's where any discussions you'll find
on the topic are focused.

~~~
jmduke
Dumb/honest question from someone who isn't as experienced with database
mechanics as he'd like to be:

What exactly _are_ the tradeoffs of adding an index?

~~~
viraptor
Storage size of the index itself, insert/update performance. Also there may be
very little gain from adding an index to a column with only few distinct
values or a very skewed distribution (partitions may be a better approach in
those cases).

------
Groxx
Since they're evidently looking for pedantry, I present the following:

> _What factors influence which approach, between a linear scan and a
> sort+binary search, you should pick for an unsorted array?_

"A" search implies once. Linear will always be better, since sorting implies
visiting every element at least once, followed by a lookup, where linear is at
most once. You can of course come up with data structures and contents where
this will not be true, but if you're asking for all solutions to everything,
the only possible answer would be "it depends".

------
beeffective
In my personal experience, if this question were posed to me by a project
team, I would ask questions about usage and purpose (granted it would be more
business-like such as given an array of 10 million transcript credits, does
employee X exist for the Code of Conduct training). If knowing whether a
particular number (or item) existed in a large array, maybe we then create a
boolean and check during inserts/updates/deletes. Who knows, lets ask more
questions and understand each other. Not every developer works on cutting edge
Google Search algorithms in lonely cubicle all day by his lonesome.

As a developer, understanding why and coming to an efficient solution (in
terms of development hours and customer satisfaction) is about problem
decomposition and problem solving skills, not being an algorithm expert.

------
larsberg
tl;dr

The easiest way to paint yourself into a corner on a technical interview
question - especially a trick one - is to answer it without asking clarifying
questions about boundary conditions, usage patterns, data set sizes, target
hardware, etc.

~~~
hnriot
and if you didn't know this going into an interview, you probably had no
business going in the first place.

It's not a trick question, it's as much a part of quality software development
to ask the right questions as it is to start hacking on the keyboard. The
right answer to any interview question starts with "it depends, ..."

------
tunesmith
"No, I'm sorry, that answer is incorrect. The correct answer is, you don't
need to sort the array at all because you are a polar bear on an ice floe."

------
cvursache
Oh man, that interview question gives one hacker the freaks. After reading it,
my biological reaction was like: wow, I've got stuff to learn before becoming
proficient in Computer Science as a college dropout.

~~~
decadentcactus
If it helps, this type of problem has come up precisely zero times ever. Use a
built in function, loop through the array, and if in the future it's running
slow, _then_ optimise

~~~
latch
Hi. I wrote the post. I'm releasing a free ebook next Tuesday where hopefully
I can change your mind. (I also wrote The Little Redis Book and The Little
MongoDB Book, so hopefully you'll give me a chance!)

~~~
cvursache
How nice of you to share! Thank you Mr. latch (:

------
stephencanon
Depending on your tolerance for false positives, space and time requirements,
size of the array and data type, either a bitmap or a bloom filter or other
hash-based structure may be a much better option than sorting. I would expect
a candidate to offer that suggestion.

------
nly
How many needles you're looking for, and at what frequency (if ever again)
you're going to need to run the search.

Building any alternative data structure from an unsorted array will require
O(n) time. If you only need to find something once, then a O(n) linear scan
time is optimal.

------
jodrellblank
Sorting and binary searching is more code. Do you need the benefit enough to
add the code?

Do you need to implement sorting/binary searching or find libraries, or do you
have libraries handy?

I guess the "correct" answer is "lots of things could affect the choice, tell
me more about the context".

------
kinesh
Cache efficiency for linear reads CAN greatly outpace a large array where both
the sorting method and the binary search itself might cause a high number of
cache misses.

------
notaddicted
Personally, I'd use whatever built in function exists without giving it a
second thought. Until it shows up as a hot spot in the profiler I don't care
about performance.

~~~
Negitivefrags
I often see this kind of attitude expressed on internet forums as if it is the
right answer in every situation.

All of the options expressed here will usually be built in to the language,
there is no extra implementation cost at all. Pick a vector, sorted vector,
set, map or hash-map.

Knowing and picking what is usually correct will not take more than a few
seconds of extra thought.

Your attitude leads to "generally slow" programs that are the hardest to make
faster.

~~~
rhizome
_Your attitude leads to "generally slow" programs that are the hardest to make
faster._

Really? Even if they have proper encapsulation, separation of concerns, etc.
(even unintentionally)? "Refactor the function or method later if necessary"
is not the worst way to write software, and in fact I would say it's the
dominant technique _even when_ initial attempts are optimized.

------
tptacek
This is a fun question to discuss, but I don't see its value in an interview.
If you were on the fence about a candidate, it's hard to see an immediate "how
many times do I need to search" response pushing them over, and a candidate
that can implement sort+bsearch in an interview setting has cleared the
"search an array for a number" bar.

Be very wary of questions that may just be a mechanism for demonstrating how
smart your team is to candidates.

------
rhizome
Some people just love to find ways of asking about big-O without asking about
it.

~~~
smsm42
You'd be surprised how many people interview as programmers without knowing
what big-O is. Some of them aren't even bad and could answer the same question
on the substance, but don't know the terminology for some reason.

~~~
rhizome
Big-O is a interviewer's proxy for a CS degree. Most software engineering
(particularly web dev) is not specialized (or complicated) enough to require a
degree under US labor laws, so asking about Big-O is a way around the
prohibition against requiring one. It's also a signal that the technical
leadership of the company may not be able to communicate in substantive terms.

It raises two questions: in what way does the terminology serve the company's
business goals, and when was the last time the company had to refer to Big-O
in order to solve a problem?

------
sashagim
I think a different way to ask this question will help you learn more about
the candidate. After the trivial answer has been given - ask whether the
solution would change if you knew the query will be done several times for the
same array, or could it be done more efficiently.

------
city41
I'm surprised no one has brought up testing, or am I just too obsessed with
testing?

I'd argue you should have good test coverage and a well thought out API such
that you can easily swap out the searching algorithm as needed to meet your
needs without worries of breaking the component or things that depend on it.
What if the function is handed a pointer to the array, and the caller doesn't
want the data sorted?

Beyond that, I'd strongly argue the answer is: whatever can get me the answer
with the most readable and simplest code. If this part of the program later
proves to be a bottleneck, then consider different algorithms. Which again,
good test coverage plays a roll in.

~~~
LockeWatts
Yes you are. You didn't answer the question asked, you avoided the question
and talked about what you found important, rather than what the interviewer
found important.

~~~
city41
See my reply to raylu. I suppose I'm biased because I feel questions like this
are completely ineffective. Whatever the interviewer finds to be important is
not necessarily going to get them quality candidates.

~~~
LockeWatts
But that's for them to decide. You as the candidate have one directive, and an
interrupt.

If you want the job, do whatever it is you think you will impress the
employer, unless whatever that would take convinces you you don't want the
job.

------
danbruc
Bad question. What is better when searching in an unsorted array - linear
search or sorting and binary search? Assuming better implies faster only the
size of the array and the (expected) number of searches matters.

If you want me to figure out if searching an array is the best option in the
first place, then ask me that question. It is an important skill to ask
questions - especially when you have to figure out what customers really need
versus what they have requested - but expecting that a simple interview
question with extremely narrow scope like this makes me question if an array
is the correct choice is ridiculous.

It's a bit like when your math test asks you to calculate some annual
interests for you savings account and you start to question if bringing money
to a bank is the best option in the first place.

------
tzs
One factor I don't think I've seen mentioned yet is how much lead time you
have between the construction of the array and the need to search it.

For example, suppose the array is built when the program starts, but you know
the need to search it will not arise until significantly later, and you also
know that the program will have significant idle time in between. You could
then use that idle time sort the array.

------
epoxyhockey
_What factors influence which approach, between a linear scan and a
sort+binary search, you should pick for an unsorted array?_

It seems like there are some serious grammar issues with this question. If the
interviewer cannot clearly ask a question, that may explain why many
interviewees "get it wrong."

------
hexagonc
At a minimum, you have to consider how long the arrays are and whether the
items in the array are sortable.

~~~
minimax
Do you frequently find yourself dealing with arrays of unsortable numbers? ;-)

~~~
hexagonc
Epic comment fail. I read original post but forgot that they specified numbers
by the time I posted my comment. As uncertainty set in, I reread the original
post and had a Homer Simpson moment. I upvoted you to teach myself a lesson.

~~~
WildUtah
There are unsortable or only partially sortable numbers, depending on your
number system; it's not a problem if you can restrict yourself to real
numbers, though.

But they never said that numbers were the only thing in the array.

~~~
minimax
True enough! Is it complex numbers you're considering? Now I feel like I'm
playing one of those puzzles where there is puddle of water and a dead man and
you're supposed to figure out what happened.

edit: Puzzles like this: <http://kith.org/logos/things/sitpuz/situations.html>

~~~
StavrosK
Decomposition and evaporation, I'm assuming?

------
cpdean
so link baity. "I MUST NEVER GET THIS QUESTION WRONG"

------
chrisallick
job sounds boring.

~~~
jat850
I don't know how serious you are being, but the purpose of the interview
question may not have anything to do with the job whatsoever, or it may.

Either way, questions of this nature are intended to filter out people who are
not interested in getting to the root of a problem, and solving that problem
practically.

The interviewee should be looking to determine constraints, limitations,
expectations, requirements, use cases, problem scale.

The interview for any job should be looking for someone looking for those
things.

In what way would any job that requires critical thinking be considered
boring?

~~~
rocky1138
I get what you're saying, but there's a line interviewing crosses when it
becomes less about trying to figure out if the person is right for the job and
more about trying to outsmart them by painting them into some sort of
technical corner.

~~~
rhizome
Dominance games.

