Hacker News new | past | comments | ask | show | jobs | submit login
An Interview Question Too Many Developers Get Wrong (openmymind.net)
32 points by bgrainger on April 24, 2013 | hide | past | favorite | 53 comments



* How many numbers are you looking for?

* How many times will you look for each number?

* How big is the original data?

* How sorted is the original data?

* What kind of access patterns do you have for lookups? (i.e. are the lookups themselves sorted)

* Are there perhaps special space constraints to consider that may have priority over time constraints?

* Where/how is the data stored/how is it backed? (perhaps you're reading from tape media or other non-random-access-optimized storage)

* Is the data a set or bag? (duplicate values allowed, do you care which one is returned, etc)


Probably a quicker way to get a developer to generate this same list of considerations is to say "explain the tradeoffs in adding an index to a column in a database table." We tend to search large tables much more frequently than large in-memory lists these days, so that's where any discussions you'll find on the topic are focused.


Dumb/honest question from someone who isn't as experienced with database mechanics as he'd like to be:

What exactly are the tradeoffs of adding an index?


Storage size of the index itself, insert/update performance. Also there may be very little gain from adding an index to a column with only few distinct values or a very skewed distribution (partitions may be a better approach in those cases).


you have to build it and store, so the footprint of this data structure has now increased.

If the array of numbers is mutable, the index also needs to be a modifiable structure (like a tree).


* Is writing much slower than reading (although I guess you sort-of covered that with your penultimate point)?


* cardinality of the array. Maybe there are a million entires, but only 3 distinct values. keeping an unsorted set of distinct values might, in this case, be a much better solution that a sorted index. No mention was made of data cardinality.


* Are the contents going to change?


Since they're evidently looking for pedantry, I present the following:

>What factors influence which approach, between a linear scan and a sort+binary search, you should pick for an unsorted array?

"A" search implies once. Linear will always be better, since sorting implies visiting every element at least once, followed by a lookup, where linear is at most once. You can of course come up with data structures and contents where this will not be true, but if you're asking for all solutions to everything, the only possible answer would be "it depends".


In my personal experience, if this question were posed to me by a project team, I would ask questions about usage and purpose (granted it would be more business-like such as given an array of 10 million transcript credits, does employee X exist for the Code of Conduct training). If knowing whether a particular number (or item) existed in a large array, maybe we then create a boolean and check during inserts/updates/deletes. Who knows, lets ask more questions and understand each other. Not every developer works on cutting edge Google Search algorithms in lonely cubicle all day by his lonesome.

As a developer, understanding why and coming to an efficient solution (in terms of development hours and customer satisfaction) is about problem decomposition and problem solving skills, not being an algorithm expert.


tl;dr

The easiest way to paint yourself into a corner on a technical interview question - especially a trick one - is to answer it without asking clarifying questions about boundary conditions, usage patterns, data set sizes, target hardware, etc.


and if you didn't know this going into an interview, you probably had no business going in the first place.

It's not a trick question, it's as much a part of quality software development to ask the right questions as it is to start hacking on the keyboard. The right answer to any interview question starts with "it depends, ..."


We'll probably stop asking it, but I think what's been addictive about it is how quickly people give an answer.


"No, I'm sorry, that answer is incorrect. The correct answer is, you don't need to sort the array at all because you are a polar bear on an ice floe."


Oh man, that interview question gives one hacker the freaks. After reading it, my biological reaction was like: wow, I've got stuff to learn before becoming proficient in Computer Science as a college dropout.


If it helps, this type of problem has come up precisely zero times ever. Use a built in function, loop through the array, and if in the future it's running slow, _then_ optimise


You don't need to optimize prematurely, but you should understand when/why/how to optimize something like this off the top of your head if you ever want to work at a company that deals with non-trivial amounts of data. If that's the hardest question I got asked at an interview, I'd never take the job.


Hi. I wrote the post. I'm releasing a free ebook next Tuesday where hopefully I can change your mind. (I also wrote The Little Redis Book and The Little MongoDB Book, so hopefully you'll give me a chance!)


How nice of you to share! Thank you Mr. latch (:


Depending on your tolerance for false positives, space and time requirements, size of the array and data type, either a bitmap or a bloom filter or other hash-based structure may be a much better option than sorting. I would expect a candidate to offer that suggestion.


How many needles you're looking for, and at what frequency (if ever again) you're going to need to run the search.

Building any alternative data structure from an unsorted array will require O(n) time. If you only need to find something once, then a O(n) linear scan time is optimal.


Sorting and binary searching is more code. Do you need the benefit enough to add the code?

Do you need to implement sorting/binary searching or find libraries, or do you have libraries handy?

I guess the "correct" answer is "lots of things could affect the choice, tell me more about the context".


Cache efficiency for linear reads CAN greatly outpace a large array where both the sorting method and the binary search itself might cause a high number of cache misses.


Personally, I'd use whatever built in function exists without giving it a second thought. Until it shows up as a hot spot in the profiler I don't care about performance.


I often see this kind of attitude expressed on internet forums as if it is the right answer in every situation.

All of the options expressed here will usually be built in to the language, there is no extra implementation cost at all. Pick a vector, sorted vector, set, map or hash-map.

Knowing and picking what is usually correct will not take more than a few seconds of extra thought.

Your attitude leads to "generally slow" programs that are the hardest to make faster.


Your attitude leads to "generally slow" programs that are the hardest to make faster.

Really? Even if they have proper encapsulation, separation of concerns, etc. (even unintentionally)? "Refactor the function or method later if necessary" is not the worst way to write software, and in fact I would say it's the dominant technique even when initial attempts are optimized.


This is a fun question to discuss, but I don't see its value in an interview. If you were on the fence about a candidate, it's hard to see an immediate "how many times do I need to search" response pushing them over, and a candidate that can implement sort+bsearch in an interview setting has cleared the "search an array for a number" bar.

Be very wary of questions that may just be a mechanism for demonstrating how smart your team is to candidates.


Some people just love to find ways of asking about big-O without asking about it.


You'd be surprised how many people interview as programmers without knowing what big-O is. Some of them aren't even bad and could answer the same question on the substance, but don't know the terminology for some reason.


Big-O is a interviewer's proxy for a CS degree. Most software engineering (particularly web dev) is not specialized (or complicated) enough to require a degree under US labor laws, so asking about Big-O is a way around the prohibition against requiring one. It's also a signal that the technical leadership of the company may not be able to communicate in substantive terms.

It raises two questions: in what way does the terminology serve the company's business goals, and when was the last time the company had to refer to Big-O in order to solve a problem?


I think a different way to ask this question will help you learn more about the candidate. After the trivial answer has been given - ask whether the solution would change if you knew the query will be done several times for the same array, or could it be done more efficiently.


I'm surprised no one has brought up testing, or am I just too obsessed with testing?

I'd argue you should have good test coverage and a well thought out API such that you can easily swap out the searching algorithm as needed to meet your needs without worries of breaking the component or things that depend on it. What if the function is handed a pointer to the array, and the caller doesn't want the data sorted?

Beyond that, I'd strongly argue the answer is: whatever can get me the answer with the most readable and simplest code. If this part of the program later proves to be a bottleneck, then consider different algorithms. Which again, good test coverage plays a roll in.


Yes you are. You didn't answer the question asked, you avoided the question and talked about what you found important, rather than what the interviewer found important.


See my reply to raylu. I suppose I'm biased because I feel questions like this are completely ineffective. Whatever the interviewer finds to be important is not necessarily going to get them quality candidates.


But that's for them to decide. You as the candidate have one directive, and an interrupt.

If you want the job, do whatever it is you think you will impress the employer, unless whatever that would take convinces you you don't want the job.


What does any of that have to do with the question? This is a performance question. I'll accept readability/simplicity as an afterthought, but if you answer in this manner you've missed the (quite clear) point of the question.


I disagree the point of the question is clear. I also strongly argue that questions like this are not productive and that most companies have completely broken dev interviews.

I disagree because I've been in interviews where a question like this, and all questions asked for that matter, have a testing expectation to them. If a developer answers this question without considering test coverage and what that means, some companies see that as a sign of a dev who doesn't appreciate tests or have a full grasp of the entire lifecycle of development. "Trick questions" with hidden undertones are very common.

Or is it a practicality question? I do not want to hire a dev who is concerned with squeezing every drop of performance out of function calls that make up 0.001% of the app's runtime.

Is finding out the candidate is aware that sort+binary search is O(nlogn) versus O(n) of linear search really enabling you to find a quality candidate? Maybe, but probably not. We don't ask questions like this at all anymore. The bulk of our interview process is sitting down with the candidate and writing a small program with them, end to end. Not perfect either, but far more effective than ambiguous questions.


Bad question. What is better when searching in an unsorted array - linear search or sorting and binary search? Assuming better implies faster only the size of the array and the (expected) number of searches matters.

If you want me to figure out if searching an array is the best option in the first place, then ask me that question. It is an important skill to ask questions - especially when you have to figure out what customers really need versus what they have requested - but expecting that a simple interview question with extremely narrow scope like this makes me question if an array is the correct choice is ridiculous.

It's a bit like when your math test asks you to calculate some annual interests for you savings account and you start to question if bringing money to a bank is the best option in the first place.


One factor I don't think I've seen mentioned yet is how much lead time you have between the construction of the array and the need to search it.

For example, suppose the array is built when the program starts, but you know the need to search it will not arise until significantly later, and you also know that the program will have significant idle time in between. You could then use that idle time sort the array.


What factors influence which approach, between a linear scan and a sort+binary search, you should pick for an unsorted array?

It seems like there are some serious grammar issues with this question. If the interviewer cannot clearly ask a question, that may explain why many interviewees "get it wrong."


At a minimum, you have to consider how long the arrays are and whether the items in the array are sortable.


Do you frequently find yourself dealing with arrays of unsortable numbers? ;-)


Epic comment fail. I read original post but forgot that they specified numbers by the time I posted my comment. As uncertainty set in, I reread the original post and had a Homer Simpson moment. I upvoted you to teach myself a lesson.


There are unsortable or only partially sortable numbers, depending on your number system; it's not a problem if you can restrict yourself to real numbers, though.

But they never said that numbers were the only thing in the array.


True enough! Is it complex numbers you're considering? Now I feel like I'm playing one of those puzzles where there is puddle of water and a dead man and you're supposed to figure out what happened.

edit: Puzzles like this: http://kith.org/logos/things/sitpuz/situations.html


Decomposition and evaporation, I'm assuming?


If you're on a system where the write times are several orders of magnitude larger than the read times, then I'd say those qualify as unsortable. Or at least unintelligently sortable.


so link baity. "I MUST NEVER GET THIS QUESTION WRONG"


job sounds boring.


I don't know how serious you are being, but the purpose of the interview question may not have anything to do with the job whatsoever, or it may.

Either way, questions of this nature are intended to filter out people who are not interested in getting to the root of a problem, and solving that problem practically.

The interviewee should be looking to determine constraints, limitations, expectations, requirements, use cases, problem scale.

The interview for any job should be looking for someone looking for those things.

In what way would any job that requires critical thinking be considered boring?


I get what you're saying, but there's a line interviewing crosses when it becomes less about trying to figure out if the person is right for the job and more about trying to outsmart them by painting them into some sort of technical corner.


Dominance games.


"but the purpose of the interview question may not have anything to do with the job whatsoever"

And yet it's asked.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: