My Favorite Engineering Interview Question

derwiki · on Oct 28, 2010

Advice for new questions: instead of contrived ones, pick a problem you've actually encountered in your job and ask the candidate to solve it. I've never had to implement a datastore like this -- so unless that's what you're actually doing, it doesn't seem particularly relevant. Another plus is that if you're doing interesting work, you have an almost endless supply of questions to choose from. Candidates also seem to respond better when you wrap up the question with "this is something I actually worked on a few weeks ago."

FWIW, Ning turned me down back in April 2008 so feel free to ignore my advice :)

zeraholladay · on Oct 28, 2010

Asking relevant questions at an interview is a good idea, but I've noticed that a lot of technical interviewers ask questions it took them weeks to understand and fix when they use this approach. Understanding a problem is often more difficult than solving it ... knowing that the interviewer understands the problem and constraints is even more of a challenge.

I once had an interviewer give me the kobayashi maru without telling me. I thought the guy was a jerk. He could just have pulled a gun on me and demanded my wallet if he wanted to judge my personality. I declined the second round.

jemfinch · on Oct 28, 2010

Can you describe your kobayashi maru question? I'm curious.

zeraholladay · on Oct 28, 2010

Yeah, but it wasn't technical. The point of the question was to assess my entrepreneurial IQ, or so I was told after. Generally if an unemployed person's applying for a job, their entrepreneurial IQ is low at the moment and they need money. Nevertheless: I was to acquire a desk to work at, which needed to have certain dimensions for under a certain dollar figure. The internet is down, the local office supply company doesn't have it, etc, etc, until I'm starting to think the interviewer is nuts. OK, I'll sit on the floor. His point was that you need to generate options, which isn't bad advice. But it was a dumb interview question.

VladRussian · on Oct 28, 2010

the kobayashi maru isn't the impossible question itself. The kobayashi maru here is the situation of the jerk asking you the stupid question.

swaits · on Oct 28, 2010

Very dumb question, indeed.

tlb · on Oct 28, 2010

The point of the Kobayashi-Maru test isn't to test whether they're capable of passing impossible tests, it's to test their character facing death. If you're choosing fellows to go to war with, you want to choose ones that won't shit their pants or otherwise make your last moments alive together awkward.

swaits · on Nov 3, 2010

That's great for Star Trek. I'll keep interviewing technical candidates on a technical basis.

btilly · on Oct 28, 2010

Perhaps the interviewer wanted to see if you could think up something like http://glinden.blogspot.com/2006/01/early-amazon-door-desks.... - and in the process test your willingness to be in an environment with such jury-rigged solutions.

variety · on Oct 28, 2010

You can look up the Star Trek reference, but in an interviewing context, it refers to an question that is essentially impossible to answer and/or basically never occurs in real life and yet is somehow meant to judge a candidate's ability to "think outside the box" or under extreme pressure.

As well as, of course, to showcase the interviewer's vast wealth of experience and keen insight into the human condition that enables him/her to assess as much in 5 minutes from candidate's response to a completely contrived question.

elai · on Oct 28, 2010

Remember the "unwinnable" test in the recent Start Trek movie that kirk wins by cheating?

albertzeyer · on Oct 28, 2010

Maybe the test was how you react on this.

zeraholladay · on Oct 28, 2010

It was, but if you can show me a person that deals well with failure then you've shown me a CEO or saint.

rdtsc · on Oct 28, 2010

I feel the same way about little puzzles that companies like to present to the candidates, especially the ones with a "trick" answer.

I never worked in a place where I had to solve tons of small little puzzles all day, or have to memorize a set of obscure language features, and can't look them up online if I need to.

As you said, the best interviews are the ones by a white-board, working together to solve something resembling a real problem. Because, eventually that is what I will end up doing if I work there.

TheBurningOr · on Oct 28, 2010

As someone who was recently on the job market, I will say I wish more employers did this as well. As a candidate it gives you much more information about what you will actually be doing at the company too. There is just as much buzzword packing in job descriptions as there is in resumes these days and sorting through the interesting from non-interesting jobs is not always easy.

ryandotsmith · on Oct 28, 2010

True. During my latest interview, Heroku asked me to design a billing/invoice system. I won the position and my first assignment was to work on the billing / invoice system. When I have the opportunity of conducting an interview, I will most certainly use this approach.

elliottcarlson · on Oct 28, 2010

I truly apologize in advance - this is not meant to be flame bait or anything of that nature - but it bothers me when someone states they "won" a position. The interview process should be used to find a good match, both for the company as well as the candidate. When I am interviewed I am evaluating the company as well and will have no reservations turning them down if I feel I wouldn't be a good fit. If I have to win to get in to the company, then that alone would make me back off and possibly lose interest in the company.

variety · on Oct 28, 2010

Upvoted. Just for once, I'd like for an interviewer to present a question that they themselves don't know how to solve (or didn't know how to solve in the first few times they thought about it).

jsankey · on Oct 28, 2010

An idea I've tried a bit in the past (and would like to try more, given the chance): ask a question that you don't know the answer to. This shifts the power balance of the interview - you don't automatically have the answer over the candidate. You also don't have your mind closed by your expected answer(s). Both of these make the resulting discussion more comparable to how you will work with a colleague, which is what you are trying to test the candidate for, after all.

Doing this requires more effort and guts from the interviewer, but it's still easier than being on the other side of the desk. Finding a supply of suitable questions is probably the trickiest part - you could try looking up some problems from competitions just prior to the interview, or you could perhaps take a real problem you are trying to solve at that point in time (abstracted as required).

brianm · on Oct 28, 2010

I have done this before, usually if I am working on something interesting, and for which I don't like any solution I have yet hit on. In this case, I do the best I can to describe the problem, thinking thus far, etc, and turn the interview into a design session at the whiteboard.

I can only recommend doing it with a candidate you are more selling to then evaluating, and whom you think will be a good fit.

A strong candidate will jump in and you'll have a great session (I had one in particular with a great young erlanger whom we wound up losing to someone else, but the interview left a lasting impression and we kept in touch long afterwards).

sangaya · on Oct 28, 2010

This was the approach my current employer chose. They basically were like "Here's the challenges we expect you to tackle once you start. Let's discuss how you would approach each."

Essentially a high-level design session where they got to see my thought process and I got to ask lots of questions about the firm. The best part is that when I started they were pretty comfortable with my approach to certain projects as it was previously covered, and I knew what I was getting into. No big surprises for either side.

tsycho · on Oct 28, 2010

That's a very good idea. I would go further and tell the candidate that I don't know the solution myself, thus hopefully putting him/her at ease and making it clear that I am more interested in the thought and design process, rather than figuring out the "trick" in the question.

The tough part here is to choose a good question.

lrm242 · on Oct 28, 2010

These types of questions are terrible interviewing techniques. An interviewer who, even jokingly, goes into the room with an interrogative mindset is fundamentally failing at the core purpose of interviewing: finding good talent, preferably relatively undervalued talent. An interviewer should not be looking for someone who mirrors their way of thinking or or their approach to solving a problem. They should be looking for people with skills that complement the team, not duplicate it. Questions like these are typically signs of bad interviewing technique, but not always depending on how they are presented.

More often than not, however, people "design" interview questions to cause problems for prospective employees. You might think this is clever. You might think this helps weed out people who don't possess the ability to think quickly or react to feedback on their feet. All you really are doing is artificially limiting the prospective candidate pool.

Every dork with a chip on their shoulder can manufacture an a question to be unanswerable. Deep down inside what most of these bad interviewers really want is to design a question that creates one of two atmospheres in the interview room: (a) a submissive one, where the candidate is forced to placate the interviewer and reinforce their apparent intellectual superiority; or, (b) a confrontational one where the candidate and interviewer battle it out for dominance as alpha-geek. Either (a) or (b) can be seen as a positive depending on the type of interviewer that goes in. Ultimately, any interviewer that uses this technique is a bad one and any input they make into the hiring process should be seriously discounted.

The interviewer should adapt their interview style to the candidate based on reading their body language, communication skills, and overall demeanor in the interview room. Its the interviewer's responsibility to create the best interview experience possible. Some people are nervous; some don't think on their feet well; most rely on Google heavily to point them in the right direction; lots of them know more than they can demonstrate in the 5 minutes you give them to react to your stupid question.

Your job is find diamonds in the rough. People who are the best, not people who act, think, behave, or answer the way you think they should. Questions like the one in this blog post are tell-tale signs of alpha-geek interview dominance gone wild, and out of that you'll typically only get one style of candidate that makes it through the gauntlet, certainly guaranteeing you're not getting the best talent available.

jemfinch · on Oct 28, 2010

I don't understand your objection here.

The question presented is a simple, open-ended question with a number of acceptable solutions. It's not even remotely "unanswerable". It's at least mildly interesting as indicated by the number of responses at a similar question on StackOverflow: http://stackoverflow.com/questions/2573653/given-a-1tb-data-... . It's not far from the problems Ning solves. It's not an "aha" problem that you either get or you don't get. It's complex enough to reveal the candidate's thought process and problem solving methodology, but it's small enough for good candidates to get to a reasonable solution in the 45-60 minutes of an ordinary interview.

Nothing about the article indicates that the author has a chip on his shoulder, nor that he's a dork who wants to "battle it out for dominance as alpha-geek."

Nothing about the article indicates that author is looking for a specific solution reflective only of his particular prejudices in programming.

What in the world is the basis for your entire diatribe here?

lrm242 · on Oct 28, 2010

Seeing as you worked at Ning and likely participated in these types of interviews it's not surprising you don't get it. Surely Ning was a great place to work and I'm sure you found really top-notch employees through your well known tough hiring process, but I'd be willing to wager they all look very much the same when you blur your eyes a little bit.

My diatribe, as you so eloquently put it, is based on the article. The list of things that the author expects to see in the answer. The way the question is presented. The lead off with "interrogation". The "gotcha" follow-up regarding how things might fail. These questions are great at finding a very specific type of candidate, but the entire point of my comment is that this one candidate type is typically not indicative of the best engineer. Of course you'll disagree, I'm sure, seeing as you were part of this process.

brianm · on Oct 28, 2010

Asking how it will fail is a gotcha? To my knowledge and experience thus far, all systems fail. The question is "given this one, that we now have a good shared understanding of, which you designed with the tools you are most familiar with, how is it going to fail?

Not one candidate I can think of has been either surprised by this question, or felt stuck, fwiw. Again, I may be completely self delusional, but I hope not!

mquander · on Oct 28, 2010

Well, I'm eager to hear what you think the candidate type of the best engineer is, why that type of candidate would fail to come up with a reasonable train of thought given the cited question in an interview, and how you select for that candidate type.

If questions like the one posted find a type of candidate who can think well under mild pressure, who can quickly brainstorm sensible approaches to technical problems, and who can communicate their ideas and reasoning to their peers, then that sounds like a pretty good outcome to me.

jemfinch · on Oct 28, 2010

Wow. You seem angry, or frustrated. Did I say something that offended you?

I only participated in Ning's interviews as an interviewee. I worked for Ning for six weeks before the layoffs.

The only thing I'd notice when I blur my eyes a little bit is that Ning employees were better engineers than most I'd previously worked with. The same applies at my current employer, also known for its rigorous interviews. The similarities ended there.

You complain here about the things the author expects to see in an answer. He's looking to see what they think of that scale of problem and how their thoughts comport with their resume. He's looking to see what questions they ask to get more detail. He's looking to see that the solution is in reasonable bounds for hardware--that the candidate understands the limitations of the silicon we have to work with. He's looking to see whether they would reinvent the wheel or use proven technologies. He's looking to see what languages and technologies a candidate would choose to use. He's looking to see how well the candidate understands those tools. He's looking to see how cognizant candidates are of the failure modes of the system they're designing.

Maybe I'm dense, but I don't see any issue with the list of things the author expects to see expects to see. Can you be more specific, perhaps?

Likewise I don't understand the problem with the way the question is presented. He describes the data you have and the load you need to support. Is there something wrong with that presentation?

The crossed-out "interrogation" is obviously a joke. I'm certainly not known for my intuitive perceptiveness and I still saw that (and no, I never met brianm while working at Ning, to my knowledge).

How is asking about failure modes a "gotcha"? Every good engineer needs to consider how his designs and systems can fail. What in that question do you expect trips up good engineers?

The only "specific type of candidate" this question seems oriented toward finding is "good engineers." He's not asking for particular technologies. He's not even looking for a particular solution. He's just asking people an open-ended question about how they would design a system. All the best engineers I know would be able to answer that question in 30 minutes.

I don't disagree because I was part of the process, I disagree because I can't understand your objections. I hate to sink to personal attacks like you have, but have you considered that your opinion on questions like this may have been formed by your abilities? It seems a more likely possibility than that questions like this are fundamentally flawed in some way that you've as yet been unable to describe. Both companies I've worked with that have had rigorous interview questions of this sort have hired the highest quality talent I've ever had the pleasure of working with.

bjnortier_hn · on Oct 28, 2010

I would say that having "interrogation techniques" crossed out is a pretty good indication that the author's state of mind in an interview situation is aggressive and domineering.

Goladus · on Oct 28, 2010

I would have said the same thing until I read the rest of the article. In context I think it's meant primarily to be funny.

The author clearly maintains some expectations about the level of response from the candidate, but overall it's designed to be interactive. Chances are that unless the candidate gives a really superior answer, it's not going to be the only question asked.

jholman · on Oct 28, 2010

What is the meaning, in an essay, of a cross-out word?

Generally, I would say that any time an author takes the extra trouble to write in an extra word and cross it out, they are making some kind of meta-comment deliberately, rather than straightforwardly revealing that the crossed-out word is representative of their true state of mind. The crossed-out word is a meta-comment on what the author thinks about the shared idea-space of the author-plus-audience, usually to say "hey I know that we all know that this concept exists", usually to also say "and that this concept is an unmentionable", and often to also say "and I think that's a bit funny".

In this case, we all know that some people conduct interviews as interrogations (and some people think that indeed they should do so), and on the other hand some people think that this is offensive. And brianm wants you to know that he knows we all know it.

MAYBE it's on his mind because he's an aggressive domineering jerk who likes to interrogate people, but he's sensitive enough to be a little defensive about it. MAYBE it's on his mind because he's NOT aggressive or domineering, and he's frustrated by how easily interviews become interrogations, and his whole deal is trying to prevent that without making the whole interview meaningless. MAYBE something else. How're you going to find out? Read the article!

My point, tl;dr: If you see a word crossed-out, by an author you don't already know, and you think "Oh this author really means the crossed-out thing", I think that's a poor comprehension strategy.

jemfinch · on Oct 28, 2010

I would say it's obviously a joke. Do you always assume the worst of people?

variety · on Oct 28, 2010

Maybe he thought he was making a joke, but the point is that it comes off "Ha, ha, only serious":

  http://sunsite.ualberta.ca/jargon/html/H/ha-ha-only-serious.html

scott_s · on Oct 28, 2010

It does at first, but upon reading the entire essay I was convinced his intentions are honest.

Goladus · on Oct 28, 2010

> Questions like these are typically signs of bad interviewing technique, but not always depending on how they are presented.

Given that the author presents the exact question and includes a moderately lengthy discussion on how he uses the question, what is your evaluation in this specific case, rather than the general one?

Consider, for example, this comment from the article:

I’m looking for quite a few things as we go through the question. The first is their opinion of “a terabyte of data” and “5000 lookups per second.” Do they consider this to be a lot of data, or a fairly boring amount, same with the lookups per second. Leaning either way isn’t a failure, it is just information gathering for me, and referencing it against how you represented yourself in your resume, cover letter, and phone screen.

broofa · on Oct 28, 2010

Your reaction says lot more about your attitude than it does about the OP's. I found the article interesting and informative, with none of the passive/aggressive overtunes you mention.

Your lip service about finding diamonds in the rough and catering to candidates who are nervous or can't think on their feet leads to just one comment: "Put up or shut up."

The OP has given us a very good, concrete example of his interview process, the traits it reveals in candidates, and how he tailors it to meet individual candidate reactions. Until you do the same your comment here is nothing but five paragraphs of whining about how some people don't interview well.

To which I reply, "Duh."

We all know the 45-60 minute interview is imperfect, and that both interviewer and candidate are just trying to make the best of a very contrived interaction to begin with. So, please, lose the attitude and give us something we can learn from rather than bashing the OP with vague generalities.

brianm · on Oct 28, 2010

I appreciate the feedback, but don't think I fall into, nor describe in the linked blog post, the interviewer category you describe (though it is certainly possible I do and just don't realize it).

Do you consider this question to be unanswerable? As I stated, I'll accept pretty much any viable answer, my goal is to get folk to design something ~realistic (okay, this is a simplistic system, but we have finite time in an interview slot).

Honestly, I think the question I blogged about is pretty straightforward, and exists for purposes of framing a discussion about real systems and constraints. There is no answer I was fishing for at any point I used this question.

"Questions like the one in this blog post are tell-tale signs of alpha-geek interview dominance gone wild," is so far off base from what I am trying to convey that I think I utterly failed to convey my point, which was quite literally the opposite :-(

variety · on Oct 28, 2010

Part of what's problematic about this interviewing approach is that while your (secret) internal mindset may be "I'll accept pretty much any viable answer", the full-court-press interviewing style effects the exact opposite impression on the candidate -- they're left thinking, "gosh, I have to get this right, dammit, no room for slack, can't leave out any details AT ALL or it's back to another month of ramen and job boards trolling."

So yes, with your interviewing technique (orthogonal to the technical content of the question itself) you've succeeded in conveying a stance and mindset that is quite literally the opposite of what you intended.

jholman · on Oct 28, 2010

Is "full-court press" and "your interviewing technique" from personal experience with the OP? Or just from reading the article?

Because all I got from the article is that he has lots and lots of follow-up directions. I.e., that he is prepared.

variety · on Oct 29, 2010

From the article. The flip side of the fact that the interviewer has had time to carefully prepare while the interviewee has not is that what you get from that inherently contrived, lopsided discussion, kind of a long, drawn-out game of "gotcha." Fun for the <strike>interrogator</strike> interviewer, but tedious and buzz-killing for the interviewee.

It provides insight, yes, but a certain cost: many candidates fine this approach simply alienating, and more to the point, it upsets their thought processes (such that they don't really get a chance to shine).

As a candidate you can train for this kind of confrontational approach, of course, but I think lrm242 nailed what's most alienating about this interviewing style -- that ultimately the candidate is put into a position that's basically uncomfortable to be in, where they have to _placate_ their interlocuter according to some hidden criteria set -- and mostly they just want that particular thread to be overwith as soon as possible.

I just think we can get the same insight into candidates from different ways, where things are more on an even keel, and there's more "flow."[1] It takes more tact and creativity (and perhaps a bit more time), but it's not that hard, is more mutually enjoyable and (I think) ultimately provides a lot more insight.

[1] "flow" being of course a very important concept; e.g. in the sense of Murakami.

Retric · on Oct 28, 2010

Do you consider this question to be unanswerable?

Yes. Given a little time I could come up with 20 or more so ways of doing this. As to which one I would use that's not really answerable in an interview. The problem with questions like this is two weeks is a fairly long period of time. If it's literally all your working on then researching a few options and then asking some more questions is the best place to start. aka Is the access pattern random or would caching be useful? Is the data private? etc.

dbt · on Oct 28, 2010

I don't know about you, but when I am interviewing people I am not trying to find "diamonds in the rough", I'm trying to find the best person to join the team and contribute. That may or may not include experience, but it certainly includes the ability to solve problems.

This isn't an unanswerable question. I've asked and gotten great answers to a very similar question about handling data at scale. And what I'm looking for includes the opposite of a chip on the shoulder.

tlb · on Oct 28, 2010

Some interviewers are searching for talent, some are weeding out bozos. (The categories overlap: I know some talented bozos.)

palewery · on Oct 28, 2010

How do you evaluate problem solving skills of candidates during interviews?

seigenblues · on Oct 28, 2010

Man, I can't up vote this enough. I would also add that this is also usually symbolically equal to what VCs seem to do in their "interviewing"/wooing process. Well said!

garazy · on Oct 28, 2010

I couldn't agree more, when I read the "interrogate" introduction it put me off reading the rest of the article.

follower · on Oct 28, 2010

Did you read the rest of the article anyway? The "interrogate" wording may present a bad first impression but the remainder of the article is pretty reasonable.

barrkel · on Oct 28, 2010

I'm interested in why he's so against the "custom solution". Almost everything a DB will try and add in for key-value lookup is predicated on the idea that the requests aren't randomly distributed. The DB index will probably be based around b-trees, which will do a logarithmic-time search for the top few levels cached in RAM, but will fall over fairly miserably with multiple seeks as it has to page in leaf nodes and likely the nodes a level above those: I don't see it as realistic to expect a DB to do it all in a single seek.

On the other hand, if you have a perfect hash function for the keyspace into a hash addressing space (you'll want something a bit larger than 2^32), you could simply choose the device with the first few bits, then do a single seek into a (large) file on disk with the position indicated by the remaining bits of the hash addressing space, shifted over appropriately for the 128-byte stride of values. You don't even need a perfect hash: with a merely good hash function, you can cheaply read in more than 128 bytes from disk (OS will probably be reading in 4K anyhow), and use open addressing with linear probing.

You'll need enough devices to spread the load to get good latency for random seeks; probably a combination of mirroring and what amounts to sharding, using a portion of the key hash to select the device. But that's a relatively cheap numbers game.

But I'm not really seeing what a DB buys you here, in this specific scenario.

EDIT: Further investigation of tc and cdb (as linked in the article) suggests that tc (Tokyo Cabinet) may work in its hash form, but I'd rule out cdb for using two seeks, which I would expect to double the number of devices required.

brown9-2 · on Oct 28, 2010

I would imagine that for a certain number of candidates, they answer with a custom solution because they believe this is what the interviewer is expecting to hear (and asking for) when the question contains "you need to design, build, and deploy...". I think to some people, the immediate thought is that the interviewer would deduct points for originality if you said "well let's just use this tool off of the shelf".

This is why interviews are so tricky - it's hard as a candidate to juggle answering the question and trying to decipher what traits the interviewer is really trying to expose with this question.

For example with this one: is the interviewer trying to see if I understand my data structures and the size of this data to the point where I could design such a system from the ground up? Or are they trying to see how practical I can be given real-world time constraints?

brianm · on Oct 28, 2010

I agree, and I try to make clear that I really want to understand how the candidate would approach it -- if they genuinely would try to find a perfect hash and keep a hash -> offset map in memory, then mmap a big data file, that is fine. I will ask how they will find the perfect hash, etc. If you can describe well the approach you take, then this is fine.

Ofttimes, though, candidates have started designing a general purpose on-disk hash database. That is fine and dandy, and we always need a better database, but it is pretty orthogonal to the immediate problem.

Actually, ofttimes engineers have implemented general purpose on-disk hash databases as part of j. random project, so that path certainly does occur on real projects :-)

cs_loser · on Oct 28, 2010

Were I asked this, my general approach would be: (1) Given the available building blocks (disks, ram, cpus, etc) what strategies are workable? An example of a strategy would be "store it in a btree and keep all but the last two levels in ram; therefore we'll need enough ram for the top of the tree and 2x seeks for each lookup." (2) Find existing software that implements a workable strategy if at all possible, or at least the complicated components of it and code the rest yourself.

You can think of two extreme answers to this as "buy oracle" and "buy some transistors". Those are both bad answers. The first is a bad answer because, unless the candidate can answer the follow-up question "how does oracle work?" it probably just amounts to wishful thinking and they almost certainly won't be able to say when it will fail. The second is a bad answer because you can't do that in two weeks by yourself.

Take for example, Foursquare's recent outage. From what I read online, it seems like they went with "use MongoDB" as a strategy without actually understanding what Mongo was doing under the hood. Presumably they didn't figure out how it would fail (the author's follow-up question, natch) and thus they discovered that the hard way. So they chose bad answer one. On the other hand, had they decided to implement their own database, they'd know all about its characteristics, but they wouldn't have launched yet.

variety · on Oct 28, 2010

Or, they simply didn't have failover in place at all.

brianm · on Oct 28, 2010

I have gotten solutions which implement custom stores which I have had faith that the candidate could build, but not many. I am not dead set against it, but I will press for an estimate and a good explanation of it if they want to go down that route.

I try to emphasize that I want to know what they would actually do, on the job. Other questions (typically other interviewers) will explore algorithmic stuff, and folks I work with have used this question with that intent -- I use it to see how they think about building systems.

Another thing I didn't mention in the blog post is that up front I tell them exactly what I am looking to learn. I am not going on a fishing expedition, or trying to trick anyone. I find that kind of "second guess the interviewer" thing to be asinine, so strive to avoid doing it myself.

lrm242 · on Oct 28, 2010

How often do you decide exactly what you're going to do to solve a problem for your job in the span of one hour? Oh yeah, and make sure you provide an estimate and a good explanation, within that hour, as to why you've made that decision. Give me a break man, you're looking for people who tell you what you want to hear because then you've found someone who thinks like you--and that, in your mind, is the only correct answer.

ora600 · on Oct 28, 2010

The idea is that you don't have to write a DB - it is already there, usually even deployed and monitored by your ops team. Even if it gives absolutely no other value, thats a big plus.

In terms of randomly distributed requests: I'm surprised the topic of "working set" is not part of the question. Even with huge amounts of data, if the working set can be stored in RAM and only rarely swap data in and out, its a much different problem than 5000 disk reads a second. Lots of databases are optimized to efficiently maintain cached data in memory and swap as little as possible and as efficiently as possible.

ojilles · on Oct 28, 2010

Requests are evenly distributed, so caching will only be "effective" if it approaches the size of the whole dataset.

nikster · on Oct 28, 2010

I must say the problem bores me to death. I don't think I'd want to work where they ask me to solve this. However, I am confident I'd design a solution that's near perfect in the two weeks given, it just would require some experimentation on the data storage side - the code is trivial.

msie · on Oct 28, 2010

Sigh, I'd rather start my own business than run the gauntlet of interview questions again.

enneff · on Oct 28, 2010

Really? I found my interview experience at Google a lot of fun. I would happily do it again.

seiji · on Oct 27, 2010

When I see "the data never changes" http://cmph.sourceforge.net/ springs to mind

brianm · on Oct 28, 2010

Oo, very nice. This is going to get explored Thank you for the link!

DannoHung · on Oct 28, 2010

I'm confused... why specify a 1 TB disk? It doesn't seem that the problem's solution has any dependence on the aforementioned disk aside from the data it contains unless I'm misreading how he's explaining the answer?

ntoshev · on Oct 28, 2010

Yeah, I also thought he expected to serve it all from a single disk. Hopefully in real life it gets clear in the discussion.

It's a problem when asking interview questions: there are many ways in which the question may be unclear and the interviewer is mislead that the applicant can't answer. Another problem may be that the interview interferes with the thinking style of the applicant: you try to discuss it with him while he needs time to think by himself, or just the opposite: he thinks better in a discussion and you stay quiet while he sweats in stress.

user24 · on Oct 28, 2010

> there are many ways in which the question may be unclear and the interviewer is mislead that the applicant can't answer.

yeah, when I was interviewed for my university, they asked me to design an algorithm to work out the n'th term of the fibonnacci sequence. They repeatedly said I didn't need to use recursion or loops.

When I finally gave up, they said: well here's one way:

f(n) = f(n-1)+f(n-2)

I just looked at them and quietly said "well... yeah... I mean... obviously". By that point I was too depressed to say "but that's frickin' recursive, wtf?!?" but that's what I was screaming in my head.

Amazingly I still got offered a place.

edit: still bugs the crap out of me. Did I mis-hear them? Were they being deliberately misleading? Was it some kind of test to see how I reacted to an impossible problem[1]? Were they really expecting me to come up with some mathematical algorithm to calculate the nth term without recursion or looping? Argh!!!

[1]: actually it's not impossible: http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.ht...

but I don't have a background in mathematics and they knew that.

ntoshev · on Oct 28, 2010

They were likely trying to help you and wanted to say they don't care whether you write an iterative or recursive solution.

user24 · on Oct 28, 2010

I'm pretty sure I asked for clarification several times... It's possible I suppose.

Here's the problem it started from:

Eleven lily pads are numbered from 0 to 10. A frog starts on pad 0 and wants to get to pad 10. At each jump, the frog can move forward by one or two pads, so there are many ways it can get to pad 10. For example, it can make 10 jumps of one pad, 1111111111, or five jumps of two pads, 22222, or go 221212 or 221122, and so on. We'll call each of these ways different, even if the frog takes the same jumps in a different order. How many different ways are there of getting from 0 to 10?

(source: http://www.comlab.ox.ac.uk/admissions/ugrad/Sample_interview...)

dedward · on Oct 28, 2010

How the applicant seeks clarification is just as important as how he answers.. he's going to be presented with problems all the time. When I pop the open-ended technical questions, there really is no right or wrong answer - it's just there to judge someone's real ability - I have some questions you can't fake your way through. You don't have to have every little detail right, but I can tell from how you answer whether or not you are afraid to ask for clarification, and whether you know the material, and to a good degree how well you know the material. It's a conversation - not a one-way thing (at least in my case). I can tell if someone's nervous - and that's okay, too, but the job is going to be stressful, so after I explain not to be nervous and don't worry about getting every detail right.... but if you can't even begin to describe the systems I'm asking about, you don't belong on the team... same goes for if you don't seek clarification.

user24 · on Oct 28, 2010

yep, that totally threw me too. It kinda implies that you're going to be handed a physical disk with some sort of k/v store in it and you need to write some software to access it.

I immediately started thinking about seek times and elevator algorithms and that kind of stuff, which really threw me off course.

cperciva · on Oct 28, 2010

I immediately started thinking about seek times and elevator algorithms and that kind of stuff, which really threw me off course.

Not necessarily off-course. If you have 32 GB of RAM and you don't mind an average latency of 10000000 ms, a single 1 TB disk works just fine,

madaxe · on Oct 28, 2010

Because when you specify a disk, you realise he's guiding you in the direction of a specific answer.

Use the filesystem. It's what it's there for, particularly if the data never changes. Nice big cache, xfs, nested hierarchical datastore using folders based on keynames, 20 lines of code, job done. 5000 lookups a second is trivial throughput for a decent disk with a decent readahead cache.

defen · on Oct 28, 2010

I don't know of any hard drives that can do a seek+read in 0.2 milliseconds. Keep in mind that the reads are distributed randomly - you're almost certainly going to need more than one disk if you have less than a terabyte of RAM for your datastore.

codeslinger · on Oct 27, 2010

Why have SSDs killed this question? Seems like you could tweak the amount of data on disk and/or the size of the values and/or the requests/second requirement and keep using it. Also, there are no 1TB SSDs available at this point, so you'd still have to assume spinning platters for this, no?

drv · on Oct 28, 2010

I haven't thought about it deeply, but my first guess would be that the seek time of hard disks is what makes serving 5000 requests per second difficult; consumer-grade hard disks can perform something in the range of a few hundred (randomly distributed) IOPS at best due to seek time. SSDs make seeks (almost) free, so even one decent SSD should be able to service 5000 requests per second, assuming you can get a big enough one.

If multiple storage devices (hard disks or otherwise) are allowed, RAID (or just splitting the data across multiple disks) would ameliorate the seek problem somewhat, since seeks can then happen in parallel. I wouldn't see this problem as requiring a distributed solution, unless the bottleneck is the bus or storage controller rather than the single-disk seek performance.

xtacy · on Oct 28, 2010

Multiple disks, across different physical servers (instead of RAID) would be good too.

You could also try increasing the throughput by having an in memory cache. The index/hash table can also be in main memory. Depending on the keys, locality can play a crucial role in prefetching data.

jemfinch · on Oct 28, 2010

This may come as a shock, but you can put more than one SSD in a server :-P

jjcm · on Oct 28, 2010

I think that the parent thought that the system had to use a single SSD. The first time I read the question I thought that I had to use the hard drive that the data came on, and that it was all I was allowed to use. I didn't realize that the point was to design a system to simply serve the data that was given to us. After reading it, it becomes pretty clear that a simple raid5 with a few SSD's makes this problem moot.

slashclee · on Oct 28, 2010

Amazon seems to disagree with you.

http://www.amazon.com/OCZ-Express-Solid-Z-DRIVE-OCZSSDPCIE-Z...

aphyr · on Oct 28, 2010

Goshdarnit. I just sat down and started talking to myself about hash functions, filesystems, the wire protocol, and partitioning scheme for half an hour instead of making dinner. Nerd sniping?

pjscott · on Oct 28, 2010

Why fret about the wire protocol? ZeroMQ will send messages plenty fast enough, and it's really very simple to use; just a few lines of code.

Damn you, now you've got me doing it. Nerd sniper. ;-)

nirajr · on Oct 28, 2010

I love to ask something that is simple to start with, and then build on it interactively with the candidate.

One of my favorite questions is - Give me a normalized database structure that you'll implement if you were to build gmail - incorporate conversations, messages, multiple message participants and labels.

Then, depending on the candidate, I build upon the question,and go into various optimizations possible, the ways caching would be implemented, sharding/splitting/de-normalization would be done with load, etc. etc. With good candidates, its always a very interesting discussion. And even the bad ones can leave the interview thinking they know something and don't need to feel bad :)

ericflo · on Oct 28, 2010

This. A million times this. I do the same thing starting with a simple kernel, but keep growing it, throwing in monkey-wrenches, see how the candidate reasons and adapts.

Good candidates will start recognizing and acknowledging trade-offs, whereas bad candidates will settle on one clear path.

Good candidates will speak about the ideas they're trying to get across, whereas bad candidates will drop into talking about specific brands of technologies.

nirajr · on Oct 28, 2010

Definitely.

One candidate when faced with some questions on how he'll handle load said he'll switch to Oracle :) Promptly asked to leave.

Good ones clearly show that they have the ability to deal with multiple tracks of ideas, know that trade-offs need to be taken care of, in general, have an approach that with some ingenuity, almost all problems are solvable. Thats the attitude I (and I guess you too) look for.

pepsi_can · on Oct 28, 2010

I was unsure on how to answer this question since most of my experience is in application development.

I'd like to not be embarrassed when asked this question. What advice would you give someone wanting to learn enough to answer this question?

Are there any projects, books or other resources one could undertake?

wazoox · on Oct 28, 2010

Jon Bentley's "Programming pearls", definitely. This problem is somewhat comparable to the first "pearl" from this book. Anyway one of the best ever programming books :)

bialecki · on Oct 27, 2010

I like this question, but I'm wondering what a good interview looks like if someone is completely blind-sided by the question because they don't have a lot (or any) systems experience? I'm imagining an engineer who's experience ends with MySQL so knowledge of stuff like tcb is a definite no. Maybe that's not an issue, because they wouldn't be interviewing for this job if that was the case.

derwiki · on Oct 28, 2010

... not an issue if the questions are relevant to the job. On the other hand, recruiting is hard and it's easy to let a great candidate slip by for any number of reasons.

tbrooks · on Oct 28, 2010

Ning cut 40% of it's staff in April. Why are they hiring new outside developers?

jemfinch · on Oct 28, 2010

Because some of us went on to better places :) Ning was recognized for its selectivity in hiring; though it did come as a shock, everyone that I knew in my short time there landed on their feet at similar or better operations (Google, Amazon, Etsy, etc).

In the intervening months since the layoffs, there's also been some attrition (as there always is) in those who weren't laid off. I know of at least two people who weren't laid off, but left for Facebook shortly after the layoffs.

I also know that they've already hired back at least one of my former coworkers since the layoffs.

grammaton · on Oct 28, 2010

This guy is taking a lot of heat for bad interviewing technique, and this may or may not be the case - but it's hard to say, given that we don't know what kind of a position he's interviewing for. Given the job requirements of the position, this may be a perfectly reasonable question - i.e. if the job requires someone to build and maintain large custom data stores, it's pretty acceptable practice to ask questions that reveal how much they know about large custom data stores. Personally I wouldn't be over the moon crazy about this question, but it's still a hell of a lot better than being peppered with random "what is X?" type questions that test nothing about your ability as a developer, and only what you currently directly know. That is to say, those questions where you end up looking like an idiot if you didn't happen to read the same man pages as the interviewer. In the types of questions this guy is asking, you have to basically guess what they're looking for - so it can end up being a bit of a mind reading exercise, which is why I wouldn't be terribly crazy about it - but it's still by far the lesser of several evils, simply because it's a direct demonstration of how a developer thinks about development, instead of how much rote minutae they've mastered or whether they can find a way to stick eight kittens into six holes without any broken legs of flying fur, etc....

metageek · on Oct 28, 2010

> In the types of questions this guy is asking, you have to basically guess what they're looking for

Don't guess, ask. Having asked questions like this before, I can assure you that a reasonable interviewer will actually be pleased that you understand the alternatives.

(And, of course, if you get an unreasonable interviewer, you probably don't want to work there anyway.)

jtchang · on Oct 28, 2010

I've never been asked this question in an interview so when I saw it I started thinking about how I would answer it.

The first thing I did was work out the actual bytes/sec (throughput) needed for this many requests. Also given the 2 week timeframe I would question really hard what the data set looks like. Since the requests aren't randomly distributed does that mean you can narrow it down to a small enough portion to fit inside RAM.

I think the question is a good one because it does touch on a lot of aspects of infrastructure and systems design.

follower · on Oct 28, 2010

> Since the requests aren't randomly distributed

Does this follow when the original article says:

  "The keys are not distributed evenly within the keyspace."

and:

  "Requests are distributed evenly (and randomly) across the keys"

olegkikin · on Oct 28, 2010

Easily solved with SSDs. Sort the keys, value is stored right after the key.

On each request do a binary search. It takes 32 seeks in the worst case.

Even current consumer level SSDs can handle 45K IOPS, so you'd get 1400 complete searches/sec from one SSD.

You might need to spread your data over 4 SSDs, but you can search in parallel, so it will even be faster.

If you want more speed, replicate your drives and load balance.

There are better ways (with perfect hash functions), but this is the easiest, and requires no additional storage.

caustic · on Oct 28, 2010

In your case it would be better to use interpolation search[1][2] instead of binary search.

[1] http://en.wikipedia.org/wiki/Interpolation_search [2] http://sna-projects.com/blog/2010/06/beating-binary-search/

mhewett · on Oct 28, 2010

Expecting a good answer to this question in 30 minutes without any warning is totally unreasonable. Expecting anyone to implement a deployable, generalizable, tested solution to this in two weeks is totally ridiculous. I wouldn't work at a place that asked questions like this or made development calendars with schedules like that.

tlb · on Oct 28, 2010

Yet, companies who can pull off those feats are the ones who survive. A 1 TB database, even 5 years ago, shouldn't take more than a week or two to build properly. The solution I've used in the past, a Netapp file server with hashed directories of key-named files, takes zero programming and a couple hours to install. (You can easily read 5000 files / sec over NFS from one of their file servers.)

parbo · on Oct 28, 2010

I did this in three weeks:

http://bitbucket.org/rogueops/vinzclortho/wiki/Home

I worked on it on average two hours per day. It's not that ridiculous to believe that it would be (near) production quality in two weeks of fulltime work.

mhewett · on Oct 28, 2010

It's not hard to get "near production quality". It is hard to get to production quality.

ckoenig · on Oct 28, 2010

Hmm my first thought was ... well ... WTF Then I thought - man I'd like to work for this corp. but alas I think I'm to stupid for that. Some moments of reflection later I'm now convinced that I don't want to work there at all.

The question is somewhat similiar to "give me the fastest sorting algorithm and prove that it is" Everyone with a CS-degree or something similar will know this prove and Qsort from his very first years and I claim almost everyone will have forgotten the prove. (I can remember that you can prove this by looking at the "descision" tree passed on the 1on1 comparision which will have depth of about the needet size ;) - but this is it) I think most people will get the prove (even without google or wiki) in a resonable time (some hours?) but within interview-time and in this situation? In my case no way.

warmfuzzykitten · on Oct 30, 2010

It's everybody's favorite interviewing technique: Ask a fairly esoteric question that you are thoroughly familiar with and sit there and feel superior while the candidate struggles to get to an answer you will accept. Three kinds of candidates do well: those who are smarter than you are and spit out answers that surprise you; those who have heard the question and act like they're thinking it through for the first time; those who are unfamiliar with the area but have a personality you find congenial, so you coach them to a solution. In the end, it's all about ego.

VladRussian · on Oct 28, 2010

Using general purpose database in such case is just an approximation of a solution, ie. if the database supports index-optimized type of table and smart enough to cache index header blocks quickly enough.

Because of extreme primitiveness of this case, custom solution would work better here than general purpose db. Sort the pairs by key and place in the tree, with the root node containing 2^11 keys (512K), second level - 2^11 nodes of 2^11 keys each (512K each node size), 3rd level - 2^22 nodes of 2^10 keys (256K each node size). Keep root and all the second level nodes always in memory (1G RAM required). Given a key to find, look it up through the root and the second level nodes in memory - this would result in the index of the 3rd level 256K node to be read from disk. Time-wise reading a 256K block from disk is just one IO - pretty much the same time as reading 1 byte.

5000 lookups/sec at 1 IO/lookup - 20 disk array of 250 IO/sec disks - 15K ones would do it. 2 arrays of 400GB disks is still ~2 times cheaper than 1 TB SSD array. RAID5(6)-ing the disks will leave us with 6TB. Short-stroke the first 1TB (70Gb x 20) to get much better speed than 250 IO/sec (or just use less than 20 disks to start with) and enjoy the rest 5TB for non-IO intensive purposes.

follower · on Oct 28, 2010

I'm trying to work out if this is a thinko on my part or a typo on your part:

First you write:

> a system that lets you look up the value for a given key

and then later on you write:

> if they think the problem of looking up a key by value has been solved before

Did you mean to say "looking up a value by key" or am I misunderstanding it?

(Given that no one else has commented on it makes me question my question but I'm trying to get over my concern of being wrong on the internet and asking to be sure. :) )

brianm · on Oct 28, 2010

The second one is a typo on my part :-/

follower · on Oct 28, 2010

Thanks for clarifying that.

sagarun · on Oct 28, 2010

These days i read "The Guerrilla Guide to interviewing" by joel before attending/conducting an interview. http://www.joelonsoftware.com/articles/fog0000000073.html . The OP's question doesn't have place there :-)

ericlc · on Oct 28, 2010

Who is Joel?

die_sekte · on Oct 28, 2010

Joel Spolsky. Former Microsoft software engineer, now has a company that makes FogBugz. One of the great software opinionators, like Paul Graham.

pdamerval · on Oct 28, 2010

Hello, I would probably make a very poor candidate, but I wanted to ask this - the question gives a 1TB hard drive, and fills it literally to the brim with 4 billion sets of 256 bytes. And then you say that 4 or 5 years ago this relied on some sort of distributed system, but now it can be done on one box. What changed? If the drive is one terabyte, and is used to store exactly one terabyte of data, aren't you going to need to put your database overhead somewhere? Am I completely misunderstanding the question? I find it interesting though. It's an easy problem without the storage space limitation. With it, it's fascinating. Are we thinking to compress the values and/or the keys? Or am I off the mark?

josephyeo · on Oct 28, 2010

It seems to me that the problem could actually be solved by using a 64GB SSD. Given that the storage is 1TB and each of the keys and values are 128Bytes (and hence the disk is full to capacity), and assuming the keys are unique, which it would need to be to have all the keys unique, then we can safely say that every possible value in the 2^32 space is used. Therefore, we could read the data from the 1TB disk, and for each key we find, use the value of the key as the address of a memory location on the 64GB SSD. Net effect is we essentially remove all the key info from the hard disk and compress the storage into 16GB. ie, no need for the hard disk, just a 64GB SSD.

expeditious · on Oct 29, 2010

If you have more data than, say, GDBM, can swallow in one bite, how could you break this data apart and then find the right chunk amongst multiple servers when given a key?

Also, what does the interviewer mean by this:

> I generally ask these folks if they think the problem of looking up a key by value has been solved before, especially given the two weeks to be live in production requirement.

Is that a typo? s/key by value/value by key/ ?

uwittig · on Nov 1, 2010

I'm in recruiting myself. If I would ask questions like this we would have been out of new employees very soon. Questions like this only intimidate the applicant and are much to complex for interviews. Also you seem to have the idea that you know the only "truth". This is ridiculous.

gawker · on Oct 28, 2010

Great thought question. Being a junior developer, I haven't the slightest clue on how to answer it. Having been exposed to merely algorithms in school, would it be safe to assume that this wasn't really targeted at software engineers but more towards system designers?

kplcjl · on Oct 29, 2010

Sorry, I'm not a hacker, I wouldn't know the first thing about setting up an interface to a hard drive. However, the first thing that pops out at me is that you just described a 512 GB addressing scheme. You want half your disk to be unaddressable?

myelin · on Oct 28, 2010

Hah, I remember Andrey asking me this question in my Ning interviews :)

jtemplestein · on Oct 28, 2010

My favorite: "Write a script that will save you one minute of time every day"

closely followed by

"Draw a picture of your job/Facebook/the future /whatever"

swaits · on Oct 28, 2010

A picture? Are you hiring artists? Seriously. What do you get out of that?

jtemplestein · on Oct 28, 2010

A picture is a useful abstraction of reality. It forces people to think about what they'll 'answer' before they do. Some people might draw a diagram, some might draw a story, who knows. It tells a lot about people and how they think.

For some people this may seem like a fun task and they'll get excited about it, some may not find it useful and argue about it and others (the ones you want to avoid) will frown upon it and say 'Fine. I'll do it anyway'.

Most of the people I've interviewed are well accomplished engineers, but that doesn't always mean I want to work with them.

Why do you think it is NOT useful to draw a picture?

jemfinch · on Oct 28, 2010

I think it's not useful because you're not hiring me to draw pictures. I'd pass on any interview where crayon drawings of my future had any impact whatsoever on the likelihood of my hire.

timcederman · on Oct 28, 2010

It takes a certain personality type to like a task like that, and that doesn't correlate strongly with being a good engineer.

swaits · on Oct 31, 2010

It's not useful because it has nothing to do with what you're hiring them for. Frankly, if I were asked to draw a picture, I'd probably walk out. If that means you don't want to work with me, then I probably wasn't going to enjoy working with you either.

I've been hiring game programmers for a long time now. I think there are plenty of other ways to evaluate the non-tech personality/team-player side of candidates. Start with talking to them.

I ask very blunt questions in this area. Will provide examples if folks are interested.

nirajr · on Oct 28, 2010

Great questions to ask early employees for startups.

clarrs · on Oct 29, 2010

I would walk out of this interview!