> problems on DP are pretty standard in most product-company-based hiring challenges
This is sad and a little surprising to me. I've always thought of DP problems as being obviously terrible interview questions, even among the different types of algo questions (which are already heavily criticized). Candidates who have learned DP in school usually find them really easy, and candidates who haven't usually don't even have a chance at solving them, so really they just test whether the person has learned DP, rather than other algo problems where you can at least try to claim that it's a proxy for "general problem solving ability". And DP comes up almost never in the real world (and when it does come up, you're often still better off taking a heuristic/inexact approach), so testing if the candidate knows DP is also almost completely useless.
If you're an interviewer at a company that asks DP questions, please reconsider. There are almost certainly alternatives that are more fair and higher-signal.
"and candidates who haven't usually don't even have a chance at solving them"
I am one of those candidates, and I don't know why it is called dynamic programming. To me a very naive understanding of DP is this: it's just a simple cache mechanism to store intermediary results so that you don't have to recompute them and waste resources.
In the real world we always think about and do such optimizations, be it I/O, disk access or db access. I would love to understand how DP is any different.
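To make my naive mental model concrete, here's a minimal C sketch (Fibonacci is just a stand-in for any expensive, repeated computation):

    #include <stdio.h>

    /* memo[n] caches fib(n); 0 doubles as "not computed yet", which is safe
       here because fib(n) > 0 for every n >= 1 */
    static long long memo[91];

    long long fib(int n) {
        if (n <= 1) return n;
        if (memo[n] != 0) return memo[n];   /* cache hit: skip the recomputation */
        return memo[n] = fib(n - 1) + fib(n - 2);
    }

    int main(void) {
        printf("%lld\n", fib(90));  /* instant; uncached, this needs exponentially many calls */
        return 0;
    }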
> [...] I don't know why it is called Dynamic programming.
According to Richard Bellman, who came up with the name:

An interesting question is, Where did the name, dynamic programming, come from?

The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word research. [...] What title, what name, could I choose?

In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word "programming"

[...] it's impossible to use the word "dynamic" in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.

Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to.
> We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word research.
Amusingly, Engine Charlie [1] was an engineer. You'd think an engineer understands the value of research :)
DP is a general problem solving approach. It's about recognizing which parts of a naive solution have repetition which can be eliminated to improve the time complexity of the solution. Another poster aptly described it as finding states which can be collapsed into one. However you want to think about it:
- DP is a general problem solving approach
- DP is not limited to any particular implementation detail (like my parent poster attempts with "it's just a simple cache mechanism")
- DP is not a "trick" that you can learn and then solve any DP problem
If we have a DP solution, we can write it in many ways. We can write it top-down with recursion and memoization, or we can write it bottom-up without memoization. Whether we use memoization in our solution or not is a minor implementation detail. Both solutions are DP. Other posters seem to make a distinction between these two solutions as if they are entirely different and as if the recursive solution is not dp - this is also incorrect. The challenge in DP tasks is about improving time complexity by recognizing repeated sub calculations.
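To make the two forms concrete, here is one and the same DP written both ways. A minimal C sketch of coin change (the coin set {1, 3, 4} is picked arbitrarily for illustration); both functions solve the same subproblems with the same asymptotic complexity:

    #include <stdio.h>
    #include <string.h>

    static const int coins[] = {1, 3, 4};
    static int memo[1001];                    /* -1 = not computed yet */

    /* top-down: recursion + memoization */
    int min_coins_td(int a) {
        if (a == 0) return 0;
        if (memo[a] != -1) return memo[a];
        int best = a;                         /* upper bound: all 1-coins */
        for (int i = 0; i < 3; i++) {
            if (coins[i] > a) continue;
            int c = 1 + min_coins_td(a - coins[i]);
            if (c < best) best = c;
        }
        return memo[a] = best;
    }

    /* bottom-up: fill a table from 0 upward; no recursion, same subproblems */
    int min_coins_bu(int a) {
        int dp[1001];
        dp[0] = 0;
        for (int x = 1; x <= a; x++) {
            dp[x] = x;
            for (int i = 0; i < 3; i++)
                if (coins[i] <= x && 1 + dp[x - coins[i]] < dp[x])
                    dp[x] = 1 + dp[x - coins[i]];
        }
        return dp[a];
    }

    int main(void) {
        memset(memo, -1, sizeof memo);
        printf("%d %d\n", min_coins_td(37), min_coins_bu(37));   /* both print 10 */
        return 0;
    }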
In "Algorithms" [Dasgupta, Papadimitriou, Vazirani] they state that memoization (e.g. querying a hash table) can have significant overhead, leading to a large constant factor in the big-O analysis. On the other hand, the bottom-up approach using a table solves all the possible subproblems, including ones that are not needed and that the recursive approach would avoid. So from a big-O perspective the recursive top-down and the table bottom-up approaches are the same, but there can be significant constant-factor differences.
Then mention this in the interview. I ask an algorithms question that would likely be hated on HN but when candidates tell me the practical difference (API cleanliness, maintainability, performance, extensibility) between approaches that ultimately don't impact asymptotic runtime I love it.
Yeah, I thought recursion+memo wasn't actually "DP" until I looked it up. Recursion without the memo is not DP, however, since you hit exponential running time/memory, and the whole point of DP is to avoid that.
The tricky thing with dynamic programming is realizing that it fits—you have to understand how the problem you're solving decomposes into overlapping subproblems that are worth caching. Once you know what the subproblems are, implementing the algorithm is mostly a matter of getting some fiddly indexing logic right.
Take string edit distance for example: the key to the dynamic programming problem is seeing how the exponential blow up of possibilities actually has a rich overlapping structure that you can cache in a table. If you didn't know string edit distance was amenable to dynamic programming, you might not realize it at all even if you are constantly thinking about caching results. Once you see the structure, the algorithm becomes relatively straightforward.
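As a sketch (the textbook O(mn) Levenshtein table in C, assuming strings shorter than 64 characters just to keep it short):

    #include <stdio.h>
    #include <string.h>

    int edit_distance(const char *a, const char *b) {
        int m = (int)strlen(a), n = (int)strlen(b);
        static int dp[64][64];   /* dp[i][j] = distance between a[0..i) and b[0..j) */
        for (int i = 0; i <= m; i++) dp[i][0] = i;   /* delete all of a's prefix */
        for (int j = 0; j <= n; j++) dp[0][j] = j;   /* insert all of b's prefix */
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++) {
                int sub = dp[i-1][j-1] + (a[i-1] != b[j-1]);   /* match or substitute */
                int del = dp[i-1][j] + 1;
                int ins = dp[i][j-1] + 1;
                int best = sub < del ? sub : del;
                dp[i][j] = best < ins ? best : ins;
            }
        return dp[m][n];
    }

    int main(void) {
        printf("%d\n", edit_distance("kitten", "sitting"));   /* 3 */
        return 0;
    }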
Absolutely, but I attribute the ability to decompose a problem into overlapping subproblems as an understanding of recursion, so I had implicitly assumed such an understanding.
It's a dp problem, and you understand how to recurse just right, and you understand memoization, so you'll be able to solve it, right? Link your solution when you reply please.
In my "real world" we normally don't care about things like big-O complexity. We worry about doing dumb things and not taking more time than we have available. I'm not saying big-O is useless or CS wizards are never helpful. It's just that you need one or two of them for a large team of normies, IME.
I have a problem with this notion that knowledge of algorithms is required to be a good engineer though. Case in point: Senior algorithm nerd on my project is going nuts over algorithmic complexity and delaying an early delivery to another team so they can begin using our new feature. In our case, n is fairly small no matter what, so 2n vs n^2 really doesn't matter to us. The code he's optimizing isn't even the long pole in the tent. It calls other routines which take 2/3 of the total time anyway. We could just deliver now and improve later when we have real world data on how people want to use our feature, but nope, we're wasting time on endless rewrites in search of perfection which may not matter if the other team can't deploy our stuff in time.
>> Senior algorithm nerd on my project is going nuts over algorithmic complexity
This is me, but luckily where I work I have people who can keep me in check because we generally do design reviews before anything big is built.
However, I have been in situations at previous companies where big-O was ignored to take shortcuts up front, because the "data was small", and suddenly scaling to even just 100 users starts to break things because of poor design decisions when it gets into production.
I guess the lesson here is more the importance of design reviews. Also, n^2 is HUGE even for small data if there are I/O or API calls involved. Any public API you provide that is n^2 is not a good idea, because you never know who may end up using it for what.
Right. In my case, the operation was an extra memory comparison. For something already in the cache.
Sure, constraints can change and your assumptions about n<10k may prove unwise, but that's our call to make as engineers. YAGNI. If you know n is never going to grow, then why waste time on it? We're not paid to write pristine code. We're paid to solve problems while hopefully not creating new ones. Pragmatism and all that.
> In my “real world” we normally don’t care about things like big-O complexity. We worry about doing dumb things and not taking more time than we have available.
Big-O is just a formal word for how much time you’re going to take, a way to figure out if you’re likely to take more time than available. So, it sounds like you do care about it, you’re just saying it doesn’t matter if you’re formal about it?
If your team’s n^2 worst case really is in the noise, then you’re absolutely right, the senior is prematurely optimizing.
But without more context, I have to assume there could be reasons your senior “nerd” might be right. Is there a noticeable difference to the user in run time with the largest n you can possibly have? Is the other team you’re delivering to going to call your n^2 routine from one of their own, one that has its own O(n) or O(n^2) outer loop?
I’d say big-O notation isn’t difficult, doesn’t take long to learn, doesn’t require school, and knowledge of it doesn’t make someone a wizard. It’s easy. It’s a low enough bar, that I do expect everyone on my team to feel comfortable looking at a function with loops and being able to tell whether something is O(n^2) or O(n^3).
The difficult cases are determining complexity of an architecture full of lots of interconnecting O(1..n) functions, when the complexity isn’t all in one place. I watch unnecessary O(n^3) algorithms get made even on teams of people who all know big-O.
> Big-O is just a formal word for how much time you’re going to take, a way to figure out if you’re likely to take more time than available
This is a common misunderstanding about big-O. It's not about how much time you're going to take; it's a measure of how running time grows as the data grows.
"Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. "
The quote from Wikipedia is factually correct, but it seems you misunderstood my comment.
> It’s not about how much time you’re gonna take
It most definitely is about predicting running time, or memory consumption. Big-O is literally a straightforward way to write the formula for best/avg/worst case time or memory of a program, modulo a constant. The second part of your sentence contradicts the first part.
I’m not sure I buy that there’s any common misconception about Big-O either, I’ve never seen much evidence of that. Anyone who’s learned Big-O in school, or has used it at work, or read past the first sentence on Wikipedia, or commented about it on HN, knows what Big-O means and that for small n, O(n) can be bigger than O(n^2). It’s just not that tricky.
I like Wikipedia a lot, and I agree this one’s right. Why do you dislike it, and why is that relevant?
I'm not the person you're replying to, but it seems to me that there may be an example of a common misconception about Big-O in your comment here:
> Big-O is literally a straightforward way to write the formula [describing] best/avg/worst case time or memory of a program
Technically, Big-O notation should only be used to provide an upper bound on the best/avg/worst case[0]. And typically people only use it to discuss the worst case or average case, just because talking about the upper bound on an algorithm when its inputs are restricted to its best possible inputs is typically much less useful.
But providing an upper bound is not quite "writing the formula" (giving a complete picture), because you haven't specified the lower bound (Big/Little-Omega) - very loosely speaking, if you only ever talk in terms of Big/Little[1]-O, you've only provided half the picture. Of course to be fair, Big-O of an algorithm is the side of the picture that is way more likely to bite you in production!
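Loosely, the definitions in play here:

    f(n) = O(g(n)):  there exist c > 0 and n0 with f(n) <= c*g(n) for all n >= n0   (upper bound)
    f(n) = Ω(g(n)):  there exist c > 0 and n0 with f(n) >= c*g(n) for all n >= n0   (lower bound)
    f(n) = Θ(g(n)):  both of the above hold                                         (tight bound)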
Disclaimer: not a mathematician, so salt to taste :)
The “worst case” is the part referring to the upper bound. Big-O is referring to the “order” of the run time, meaning the largest term in the formula for large n.
Again, everyone knows this if they know Big-O at all. I am intentionally not being picky and pedantic with my words, because the context of this thread was people complaining about unnecessary formality and unnecessary and unhelpful over-concern about academic correctness rather than practicality. There is no widespread misconception, but there are people who like to wax philosophical and try to demonstrate their superior knowledge...
My side intention was to detail why @rafiki6 might have been incorrect without knowing it, since they claimed correctness and complained about downvotes.
Not entirely sure why I am getting downvoted here. It's factually correct and relevant to the discussion. I'm starting to think someone is actively trying to downvote my comments.
If you really do need to know that stuff, you can read up on it when it's needed. I've been programming professionally for over 15 years and have had to use this stuff once that I can remember. Knowing how to design and index a database well is way more useful.
To give the other side of the coin here: you at least need to know the basic logic behind it if you're going to be able to recognize when reading up on it might be needed.
If you have never considered the benefits of caching certain computational results in a recursive algorithm, then you probably wouldn't be as quick to recognize when that technique would be useful.
Exactly. And also, even if you somehow recognized that "you need to use a DP solution here", it's not as easy as "read up on it" (like GP says). DP is not a trick that you can just learn once and then apply everywhere. It's not a npm package that you can install that solves your problem for you.
> In my "real world" we normally don't care about things like big-O complexity. We worry about doing dumb things and not taking more time than we have available.
Any work in any kind of scale setting has to implicitly deal with this. You might not realize it immediately, but if you are dealing with a network of 100k+ nodes, or some database with 10M+ entries, or anything of that sort, there is a huge difference between doing something in O(N) or in O(N^2) or O(2^N). Just a nested loop where for each node you need to gather information from each other node is completely out of the question (quadratic vs. linear). Or where you try all combinations of something (exponential). Or where you look up something (linear search vs. logarithmic). I deal with such problems every day. And my job isn't anything special. You may call those "dumb things", but under the hood it's just asymptotic complexity, aka "big-O".
It could also be that your "real world" does not contain scale settings. But then please don't generalize; the industry is full of scale problems to be solved on a day-to-day basis.
> Any work in any kind of scale setting has to implicitly deal with this.
Obviously. And most experienced engineers know this. But a lot of us never deal with n > 10k or so. I've worked a lot in embedded, and even when n is large, it's usually just moving around a payload. Even when I've dealt with n >> 10k, say writing a network server, I've rarely been concerned with complexity. I focus on doing the minimum amount of work possible. It's basically the same thing, without the academic formalism. The main rule of engineering seems to be "don't do stupid things."
When n is guaranteed to be very small it does not matter. Even an exponential solution would be fine in these cases. The 'algo nerd' on your team ought to know this.
Sadly, perfectionists share a trait when it comes to building software: obsession with their craft at the expense of everything else, shipping times included.
I think we all derive joy from writing 'perfect' software...but then the rules are changed and perfection becomes garbage :)
Like the other piece of software I was working on for this project. Nice little test framework with a beautiful OOP structure... which became a noose as soon as I wanted to add a new feature. Now it's this frankenbeast with a weird appendage glued on.
DP means you map the problem to a grid, and each cell's cache dependencies are its neighbors on the grid.
Memoize (aka caching) was invented later than DP, and it’s easier and the same complexity. You don’t have to map your problem to a spatial layout.
One big difference is that DP often involves windowing your cache. While the problem might be on a 2d grid, the storage for your cache might be just two rows, and the algorithm will sweep down rows only once. So, a DP solution might be far less memory than a straightforward memoize, unless you’re doing work to clean up memoize as soon as it’s no longer needed.
Another big difference is that with DP you’re pre-allocating, while with memoize you’re typically using dynamic memory allocations by default. That obviously doesn’t have to be true, but makes a fairly naive DP solution faster than a fairly naive memoization.
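To sketch the windowing with edit distance as the example (each table row depends only on the row above it, so two rows suffice; strings assumed shorter than 64 characters):

    #include <stdio.h>
    #include <string.h>

    int edit_distance_2rows(const char *a, const char *b) {
        int m = (int)strlen(a), n = (int)strlen(b);
        int prev[64], cur[64];   /* two rows instead of the full (m+1) x (n+1) table */
        for (int j = 0; j <= n; j++) prev[j] = j;
        for (int i = 1; i <= m; i++) {
            cur[0] = i;
            for (int j = 1; j <= n; j++) {
                int sub = prev[j-1] + (a[i-1] != b[j-1]);
                int del = prev[j] + 1, ins = cur[j-1] + 1;
                int best = sub < del ? sub : del;
                cur[j] = best < ins ? best : ins;
            }
            memcpy(prev, cur, (n + 1) * sizeof(int));   /* slide the window down a row */
        }
        return prev[n];
    }

    int main(void) {
        printf("%d\n", edit_distance_2rows("kitten", "sitting"));   /* 3, in O(n) memory */
        return 0;
    }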
What makes dynamic programming special is that it exploits optimal substructure, where a problem can be broken down into smaller sub-problems, the solution to which must be part of the optimal solution to the larger problem.
While caching can be applied anywhere, dynamic programming can only be applied where optimal substructure is present; for example, finding the shortest path between all pairs of points in a graph. By contrast, finding the longest (simple) path doesn't have optimal substructure, so dynamic programming cannot be applied in that case.
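A sketch of that substructure in C: Floyd-Warshall for all-pairs shortest paths, where the best i-to-j path using the first k vertices as intermediates is built from best paths using only the first k-1:

    #include <stdio.h>

    #define V   4
    #define INF 1000000   /* "no edge"; small enough that INF + INF doesn't overflow int */

    int main(void) {
        int d[V][V] = {   /* d[i][j] starts as the direct edge weight */
            {0,   5,   INF, 10 },
            {INF, 0,   3,   INF},
            {INF, INF, 0,   1  },
            {INF, INF, INF, 0  },
        };
        /* after iteration k, d[i][j] is the shortest path using intermediates 0..k */
        for (int k = 0; k < V; k++)
            for (int i = 0; i < V; i++)
                for (int j = 0; j < V; j++)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];
        printf("%d\n", d[0][3]);   /* 9: 0 -> 1 -> 2 -> 3 beats the direct edge of 10 */
        return 0;
    }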
Yes, basically it is exactly that. (For why it is called that, see other answers; but that's also somewhat irrelevant.)
But the main task is to identify how to iteratively / recursively solve the problem and what to cache (and then you should also be able to tell how efficient your solution is). This is the tricky bit. It doesn't matter if you call it DP or whatever. But it matters to be able to come up with efficient solutions for such problems. And this is what companies want to see.
> In the real world we always think about and do such optimizations, be it I/O, disk access or db access
That's the key. You've come across caching as a way to optimize for a given metric. Folks who haven't heard of DP aren't aware of using DP techniques such as memoization, for example, to speed up repetitive tasks [0]. Similarly, you might not be aware of many other techniques you haven't come across [1][2]; how could you possibly solve those problems in the exact same way? For instance, QuickSort and its variants were PhD-thesis-worthy, once upon a time (if not now).
[0] Not everything is intuitive, and DP certainly isn't, imo. As a rule of thumb, naive solutions to a problem are often the intuitive ones.
It's incredibly surprising to me. Our interview problems call for things like DFS, tries, stacks, etc. I've never seen a candidate so comfortable with these much more basic topics that I could imagine them writing a DP algorithm (much less proving it correct - we never ask that) in 45 minutes.
We are taught to prefer problems with several independent hurdles, each of which can be reasonably thought through even if not familiar with an algorithms textbook. From my memories of algorithms class, DP problems usually had one very tough "aha moment" to crack. Not a good way for the candidate to show problem solving skill, just more of a binary "did they get it or not?"
> Candidates who have learned DP in school usually find them really easy, and candidates who haven't usually don't even have a chance at solving them
is directly related to
> DP comes up almost never in the real world
If you haven't studied DP at school, done competitive programming, or Leetcoded for interview prep you are not likely to have encountered it doing everyday work. And even naturally talented coders are going to struggle with unfamiliar types of problems.
Unfortunately, I don't see a way out of this. It's probably true that programmers who are good at solving DP problems are generally quite good, which is why DP problems get asked at interviews. It's a good signal for ability to learn difficult things, which is a desirable quality in employees. This means that good programmers who don't know DP are forced to learn it or accept that they won't clear some interviews. Absent some other way to clearly signal technical ability, I think DP problems are here to stay.
> It's probably true that programmers who are good at solving DP problems are generally quite good, which is why DP problems get asked at interviews.
Yeah, no. (Source: I've seen a few interviews in my time. On the order of 5k of them)
DP problems are absolute shit at giving a signal, even if the candidate knows DP problems. Because "knows DP" is pretty much the only bit of information you get, with possibly a slight seasoning of "doesn't write utterly horrible code".
The reason they're asked is the reason most interview questions are asked: The interviewer picked them up somewhere, is now familiar with them, and will ask it till the cows come home. (It also allows you to feel all smart and academic when asking it, but that's rarely if ever the main motivation)
Here's an entirely novel concept for evaluating if somebody can write code that solves actual problems. We could ask people to, IDK, write code that solves actual problems. They're there for a day. A good coder can solve some pretty interesting problems in a day.
I know, I know. Heresy. Who'd ever evaluate people by looking at how they do their actual work if you can instead recite shibboleths on a whiteboard?
(Yes, I'm bitter about the tech interview process)
I was in a team with a very talented competitive programmer guy. He came up with solutions for some pretty tough problems I couldn't wrap my head around.
Fantastic programmer right?
The other 99% beyond hardcore algorithmic optimization, like structuring code, creating simple code, naming variables, splitting code into classes/functions, and refactoring? No clue about any of that. All he produced was "read-only" and had many glaring problems.
It wasn't only during competitions either, he used the same style for assignments. Turns out he got shafted by the assistants and then he copied answers from another student (yes he got caught).
At my first company, there was this one file in the solution that we called DaVinci.cpp because it was over 20000 lines long and written by one developer who had long since retired. The thing is, the code was complex and worked so no one ever refactored it.
> The other 99% except hardcore algorithmic optimizations, like structuring code, creating simple code, naming variables, splitting code into classes/functions and refactoring? No clue about any of that.
So...you're saying a student wrote code like...a student? Color me shocked.
You can coach talented programmers to write neat code, design classes, and name their variables right. It's part of turning a junior engineer into a senior engineer. It's far harder to teach someone how to solve programming problems.
> Here's an entirely novel concept for evaluating if somebody can write code that solves actual problems. We could ask people to, IDK, write code that solves actual problems. They're there for a day. A good coder can solve some pretty interesting problems in a day.
My last interview was structured like this. Remote. Able to work on it outside of work hours (so no missed office days). Tracked a shared repo. Was paid a retainer up front for n hours worked on the task at an agreed-upon hourly rate.
Needless to say it was a great experience.
I've also had interviews in the past where the interviewer didn't have the correct answer to a question they asked me, and just got confused when I tried to explain my answer more fully. In the end they said I was incorrect, due to their incorrect answer (that they must have taken down from another source). I didn't get that job. Probably for the best.
There are also some people who just shouldn't be interviewing others. They just don't have the emotional intelligence to manage the situation.
> I've seen a few interviews in my time. On the order of 5k of them)
That's actually pretty cool. May I ask how?
> Because "knows DP" is pretty much the only bit of information you get
Yes if the candidate meets with only one interviewer or if every interviewer asks only DP. I hope most companies don't interview that way - that would truly be a broken interview process. Obviously you ask about other stuff in the other interviews.
> The interviewer picked them up somewhere, is now familiar with them,
Lol, isn't that one reason every interview question is asked? I've never asked a question whose answer I wasn't familiar with. I know how you meant that though.
> It's pretty hard to get someone solving actual problems in a single day. Do you have some examples of how this might be done?
Solve a small problem - of which millions can be found in issue trackers.
Here is an example I just read on Github:
Somebody reports that reading a JSON file results in a value that clearly is an integer being misidentified as "Not an integer". The source code is a parser written in C and the code is quite readable.
Give the candidate an hour to try to figure out what is happening (obviously, the language and context must be a match for the job).
Important, I think: it would be good if the problem is unsolved and the interviewer him-/herself does not know the answer. Also: don't send them into a room alone; watch them. I know that adds stress, but learn how to lower it, and make it clear that you too don't know the answer. If the candidate really can't cope with someone watching, well, okay, let them try alone; I just think that if the atmosphere is right this is valuable. There is no right way to approach solving such a problem, but, without meaning any judgment, I think it is just interesting to see how different people approach problems. Of course, if the interviewer has strong opinions about how it should be done, that is a bad method, and I myself would not like to be the interviewee in that case.
> How long it took somebody else in your team to solve the same problem
I don't think that is useful. Variance is too great, even for a constant single person.
I think that anything that attempts to cast this issue as some objectifiable measurement problem is on the wrong track. Measurements work on a statistical scale. Even then it's hit or miss, depending on how good the chosen values to be measured are. Just like public health issues: as soon as you try approaches that work just fine on a large scale on individual problems, you are toast. The results of those individual decisions can be measured using statistical methods, but don't try those methods on individuals.
> How long it took somebody else in your team to solve the same problem
I don't think that is a good measure. I know many developers who want to solve everything fast, because fast is productive in their eyes. The problem is they end up having a bug or three and spend some back-and-forth time with QA. So now they didn't actually solve the problem fast, and they wasted QA's time. I prefer to get a more holistic understanding of the code before fixing the issue and not get any bugs back from QA. In retro, when QA tells the team how many bugs there were, I am proud when I'm not a part of that count.
I disagree. If we're considering the debug/issue-fix approach (which I quite like, because it displays code literacy and clarity of explanation in a context like one would find in a typical workday), then if I use the same task or tasks for each candidate, I'm implicitly (and tremendously) biasing myself toward a universe shaped by the tasks I chose. What I'm really looking for is a more universally distinguishable trait of lucidity and perspicuity (coupled with sufficient technical skill, where, again, bias and variance from your tests' structure is always an issue in accurate scoring), in any case.
I think this would be great. It would give me, the interviewee, a chance to feel out how it is to work with the person interviewing me. Will they crap on every solution I propose? Or will we have some back and forth and find a solution that suits us both?
Also, you don't have the added stress of writing all the code from scratch, or having to recall on a whiteboard the boilerplate that a good IDE handles for you.
See another comment above for explanation how I got to that count. (In essence: reviewing lots of interviews)
> Do you have some examples of how this might be done?
I do :)
We did it at a previous company. We brought in several candidates at the same time, declared them a team, and gave them a half-finished video game (since it was a games company). With the goal of turning it into a better game by the end of the day. We also embedded one employee with that team for the whole day, and other reviewers would drift in and out, observe how people worked together.
The key components there:
* Some groundwork is already done, you pick up an existing project. (Also tests a really important skill - reading and understanding code)
* You work in a team - more output, and tests for your ability to work in a team.
* You have a source of experience that can answer all your questions around tooling and libraries, and you can lean on them to get some of the hairier stuff done. (Also tests if you're willing to let go of the desire to show off to instead make things happen for the team)
And yes, we'd hire the whole "team" of around three people if they were all good, and we'd pass on everybody if they were all bad, and everything in between.
It worked pretty well, gave really good results, and both interviewers and interviewees enjoyed the process. It's also something you can't really learn by rote in advance, because you do what you'd be doing every day anyways.
I won't lie, it sounds like an interesting process and I wouldn't mind participating once if I'm qualified. I wonder if you could answer some more questions about it. Sorry, I know it's asking a lot, I understand if you don't answer:
1. What if you want to hire a primarily Python-writing programmer to write Ruby or C#? They're not going to be super-productive on day 1 - maybe they'll struggle to run gem install or whatever. Even though they'd have made a perfectly fine employee given a week's worth of ramp-up, they may end up flunking the interview (Maybe it's a less of a problem in games because everything is C++? I don't know). How do you calibrate these performance variances due to lack of familiarity with the specific tech?
2. How do you ensure that the odd toxic candidate doesn't ruin the team dynamic? It's hardly fair to reject everyone if they'd have done well without the one asshole.
3. It seems odd to expect people to collaborate in a situation where (for all they know) they are actually competing for the same position(s). What if you don't actually have enough open positions to hire everyone?
4. How do you track contributions by individual candidates? Commits? Observations by the embedded employee?
5. Do you tell candidates upfront they're being graded for co-operation as well as productivity? (and is that how you're actually grading?)
1) Games are somewhat easier, because C++ is pretty common. It'd also work for larger companies, because you can group candidates by languages. I don't have a good answer for companies that can't do that.
2) That's why there's an observer with the team the whole time. If there were a truly toxic person, they'd have the power to pull them. I didn't see it happen (likely because everybody is on their best behavior during interviews), but I did see a few large egos - and they were pretty much always shunted into a corner of the project.
3) You obviously shouldn't do that if you can't afford to hire everybody. Which means it's likely not the right process for smaller companies.
4) Both by the embedded observer, and by looking at commit history. But it rarely comes down to that - what you want is "smart, and gets stuff done", and that fairly clearly separates out without counting commits. Or maybe we just got lucky.
5) All we told candidates is that we'd expect them to have a working game by end of day. Grading was less by cooperation, and more by how much of a positive contribution people were. We e.g. had candidates that up-front said "look, I work best by thinking quietly, and I'm an expert on X - would you mind if we carved out a subproject around that, and we reintegrate from time to time". And we did hire them - because a person who knows what their style is, and how to collaborate with a team with a different style, is a huge win.
On a lot of companies’ interview loops, if one interviewer says “no hire,” it’s an overall “no hire.” If that person is asking DP, and you don’t know it, then you will fail the whole interview.
> Here's an entirely novel concept for evaluating if somebody can write code that solves actual problems. We could ask people to, IDK, write code that solves actual problems. They're there for a day. A good coder can solve some pretty interesting problems in a day.
You mean solve an actual business problem that the interviewing company has? The ramp-up time to be productive in any real business that I've been associated with is significantly greater than the time available in a single interview (usually an hour or so). This is why toy problems are used.
Another issue is that it's not really fair to ask an interviewee to solve your company's problems for free. In fact, without some sort of contract, the interviewee technically owns her own work, so even if she manages to solve a real problem, the company really shouldn't use it.
(BTW, 5K interviews at one hour per interview is 625 eight-hour workdays. Assuming 50 work weeks of five days each in a year, that's 2.5 work years devoted to nothing but interviewing in your career. Yes?)
> I've seen a few interviews in my time. On the order of 5k of them)
Sorry, your "5k" doesn't pass the smell test, unless you can give some convincing reason why 20 years of one interview per day would be believable.
So, I'm calling BS. You just seem bitter about interviews that you don't like and are trying to con people into giving your opinion more weight than that of others.
Besides, if other companies are doing interviewing "wrong" as you think, then just do it "right" in yours and gain a competitive edge by hiring all those brilliant coders that other companies are too stupid to recognize. That would be a reason for you to be joyful instead of bitter, no?
If you interview at a traditional tech company, you go through 5-8 interviews in a day. If somebody were to review, say, around 4 of these packets every week, they see about 20-30 interviews every week. That's ~1000-1500 a year. The 5K mark comes faster than you think.
And if you work at a company that size, changing the interview process is a bit more work than "oh, we just won't do that".
It's not like DP is the only way to solve problems even designed for a DP solution. So an interviewee might have a general technique that applies for most problems but when faced with a DP problem their method takes more time. Now in an interview, taking more time is generally bad and this candidate would be judged lower. However, in real life scenarios, since DP problems are so rare, the interviewee's method might actually be faster.
Basically, giving any test question that is dependent on someone knowing something they could look up in a book is not as effective as testing how well someone reasons about an unknown type of problem.
For what it's worth, I use DP problems extensively for one main reason: in my experience I have found an extremely high correlation between folks who do well on DP problems and folks who are good overall engineers.
I have never had a candidate that did very well on DP problems that didn't end up being an overall great programmer (though I certainly have had folks do well on DP problems who were deficient in other ways, e.g. communication, etc.), though I have had some 'false negatives' with candidates who did poorly on DP problems who still ended up being productive employees.
Also, I wouldn't use DP problems for some types of roles, but in general I find that the ability to think recursively is overall highly indicative of programming ability. I'm not the only one who thinks this way: https://www.joelonsoftware.com/2006/10/25/the-guerrilla-guid... . Joel is talking about pointers specifically in this quote but I think it equally applies to recursion:
"I’ve come to realize that understanding pointers in C is not a skill, it’s an aptitude. In first year computer science classes, there are always about 200 kids at the beginning of the semester, all of whom wrote complex adventure games in BASIC for their PCs when they were 4 years old. They are having a good ol’ time learning C or Pascal in college, until one day the professor introduces pointers, and suddenly, they don’t get it. They just don’t understand anything any more. 90% of the class goes off and becomes Political Science majors, then they tell their friends that there weren’t enough good looking members of the appropriate sex in their CompSci classes, that’s why they switched. For some reason most people seem to be born without the part of the brain that understands pointers. Pointers require a complex form of doubly-indirected thinking that some people just can’t do, and it’s pretty crucial to good programming. A lot of the “script jocks” who started programming by copying JavaScript snippets into their web pages and went on to learn Perl never learned about pointers, and they can never quite produce code of the quality you need."
> For some reason most people seem to be born without the part of the brain that understands pointers
Well, that's just false. Everyone can understand pointers given both time and interest. I realize Joel likes to think he and people like him are "special" and born with innate super powers, but we have overwhelming evidence that it isn't true.
I certainly wouldn't recommend hiring someone who doesn't understand pointers to write embedded systems, but if you are hiring a generalist, a good engineer can figure it out if it becomes a job requirement down the road.
These are the reasons literally everything Joel says should be taken with the biggest grain of salt you can find. He seems like a smart dude, but like a lot of smart dudes that are aware of their intelligence he highly overestimates (and overvalues) his particular opinions.
My advice: don't be like Joel and use anecdata to drive your theories on what constitutes a good interview. You're already writing exactly like he does, and that's not beneficial to you or to those you're interviewing.
> Well, that's just false. Everyone can understand pointers given both time and interest.
I'd like to believe that's true, but I've spent quite a lot of time trying to explain pointers and recursion to people and either they get it right away or practically never. Sometimes they end up with some cargo-culted half-way understanding that lets them solve most problems but they still don't seem to understand what's happening. Joel is a bit pompous, that's true, but there definitely are people who cannot understand pointers when taught in a normal CS course, and if 4 years of university didn't teach them pointers I don't think they'll get them while programming in the field.
As an aside, the fact that some people can't grok some things shouldn't be controversial. I myself can't understand lots of things and probably would never understand them in the way that people who grok them instantly do. People have different modes of thinking, and I do not doubt that there exist people who cannot grasp pointers and recursion in the same intuitive way that I do. That doesn't make them less smart in general, but it does mean they're a bad fit for e.g. a C programmer position.
Speaking for myself, I was the person Joel described, except that I didn't switch majors. The pointer wall first struck at the tail end of my HS years and then crushed me early in my freshman year of college. It took me a couple of years, but I finally wrapped my head around pointers. It could be that I really do fall into your half-way camp, but I don't think that's the case.
I was a terrible student, more interested in playing games than doing any actual work, so I'm sure that was a large part of why it took "years" instead of "months", but either way it was something I learned and not something I was just born with.
I do agree that for any person there will be things that they get right away and things that they'll never get. But I think those vary more than people realize.
It's not controversial that different topics are difficult for different people, but the idea of "innate" understanding is absurd. And the idea that understanding pointers has a direct causal relationship to code quality is even more absurd. He certainly didn't qualify that statement to imply C programmers have lousy code when they don't understand pointers, but how many people who struggle with pointers want to jump on that ship anyway?
At the end of the day, learning is a personal thing, and some people struggle with some topics and not others, but under no circumstances do I think there's an innate ability to understand pointers that some people are never going to have. That just sounds like C programmers trying to stroke their egos.
> I'd like to believe that's true, but I've spent quite a lot of time trying to explain pointers and recursion to people and either they get it right away or practically never.
Have you considered that maybe you don't understand pointers yourself and just think you do? Maybe you should not start teaching and explaining pointers before you understand them well. If you can't pinpoint why your students don't understand, and just say they won't understand at all (maybe a weak math background, or something else), then I'm sorry, but you lack the knowledge to teach or use pointers.
A good excuse for struggling with recursion is that the person is a kernel programmer. There, on a very limited stack, recursion is so dangerous that programmers develop a strong reflex to get rid of recursion. If you tell them to produce something recursive, at every line of code their mind will be screaming DANGER and trying to fix the problem.
Such people are better tested on problems involving cache lines, TLB entries, memory-mapped IO hardware, DMA, atomic instruction opcodes, and interrupts.
I guess I'm more worried about false negatives (i.e. people doing poorly who would do well on the job) than false positives. I have worked with excellent coworkers who I think would likely do really poorly on DP questions because they didn't do contest programming or didn't happen to learn it in their education.
Recursion is a great thing to include in an interview (in moderation), no objections there. Many real-world situations are naturally recursive, and a good grasp of recursion helps build a strong intuition for many aspects of programming. DP (specifically designing recursion with overlapping subproblems) is a much more specific technique that I think isn't as beneficial for programming ability, though I certainly don't doubt that people who know DP well are often also good programmers.
That article by Joel goes into detail, but I agree with him that false positives (hiring someone who turns out to be a dud) are MUCH MUCH worse than missing out on someone who is potentially good (especially the rarer case of someone who is potentially good but still did poor on the interview).
> For some reason most people seem to be born without the part of the brain that understands pointers.
If you can understand P.O. box numbers, you can understand pointers. They're the same concept at the level of modern application programming, where the OS, MMU, and cache are comprehensively lying to your code to present the illusion of a completely flat address space. At that level, all a pointer is is a box number, and you can say things like "Get me the thing in box number 23" or "Get me the thing in box number 42 and the three boxes after it, put those four things together into a number and then get the four things at that box number and the three boxes after it" and, while the second thing is involved, it's straightforward and it is precisely the kind of doubly-indirected thinking Joel is talking about.
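In C, the whole box-number story is just this (a toy sketch):

    #include <stdio.h>

    int main(void) {
        int thing = 99;
        int *box = &thing;        /* box holds the "box number" (address) of thing */
        int **box_of_box = &box;  /* and this holds the box number of box itself */

        printf("%d\n", *box);          /* "get me the thing in box <box>" -> 99 */
        printf("%d\n", **box_of_box);  /* follow two box numbers in a row -> 99 */

        *box = 23;                     /* writing through the pointer changes thing */
        printf("%d\n", thing);         /* 23 */
        return 0;
    }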
To get a bit deeper, you can use the old "Paging Game", also known as the story of the Thing King, to introduce virtual memory:
This is wrong in almost every specific detail when it comes to modern systems, as it dates from circa 1972-1974, but you only need to change a few numbers around and it's good enough in a lies-to-children sense. The only substantial change that would be required is if you want to mention ASLR.
And that's it. That's it until they actually get intimate with a specific hardware architecture and OS and learn very specific, rather ephemeral details. Pointers just aren't all that special.
Yes, and then you have to make people understand that box [number 23] is itself in a box, and 23 can change at any time. And you can ask for the thing in box [the thing in box [the thing in box [the number that was passed to this function]]]. It's double and sometimes triple indirection, and understanding that pointers and the things pointed to are the same kind of data is the hard part. It's numbers all the way down.
You're making a strong point here, which is that every real world analogy of pointers breaks down since everything is data. But I'm still not sold on the idea that pointers are intrinsically a "make it or break it" topic for people to learn.
In fact, I'm resistant to that sort of conclusion in general. I don't really think there are things that people of average intelligence can't learn, given enough time and pedagogy. And I say that as someone with a background in mathematics, which is one area most typically associated with people "not getting it."
If people don't learn pointers, it could be because they're not sufficiently motivated or don't have it presented to them the right way. It could be that the overuse of analogies does them a disservice, and they really just need to pick up a book. Whatever the case may be, I'd hesitate to say it can't be learned by any population of people.
Your reference to Joel and how much you rely on your intuition tells me that you tend to have an "I'm better than others" type of feeling just because you understand DP. This is exactly the kind of toxicity I would want to avoid in the workplace. Same with Joel's reference: I'm shocked that he thinks understanding pointers is not a skill (maybe I should interview him on pointer arithmetic and see just how good he is). My experience has been that these interviewers who say they are so good at DP, pointers, and recursion, and who throw around this aura of "false negatives" and know-it-all-ness, show their true colors after the interview. They make wrong decisions on when to optimize, are clueless about scale and distributed systems, can't write complex SQL, and just don't know how to turn the little knobs to solve the problem, so they write huge unnecessary code. All this is probably because they never thought about anything beyond DP/recursion (which I would avoid at any cost in my code base).
I’m not seeing the connection here between DP problems or pointers indicating great engineers. I’ve worked with people who were great at programming problems but couldn’t actually solve real world problems without step by step guidance, folks who cannot communicate at all with the team or with product people, folks who have a terrible work ethic, etc. The ability to solve DP would be nothing but noise if we were to go back and time and hold these interviews differently... and to be honest, we almost always had at least some recursion or DP problem. All it really did was bias our hiring towards grads fresh from school.
Regarding understanding pointers... you may as well use the candidates understanding of object oriented programming or knowing how to test their code. Pointers seems like an arbitrary litmus test, and it comes off as you feel like understanding pointers make you special when it really doesn’t.
There are several "hard" things in CS/software-development that I keep having to convince myself I do, in fact, understand, every time they come up in a discussion like this, because I find them really simple and wonder whether maybe I missed something. One's pointers. I don't get what's so hard about them. I mean I get why they might be tough the first time you're introduced to them but it's exactly the same kind of reasoning/structure that's used all over the place in programming, so if one can't understand pointers I'm not sure how one's programming at all, really.
DP questions are often pretty darn awesome. The solutions are so frequently elegant and really make you think hard about problems. When I fail them I learn so much... it's practically a win-win. I've been completely stumped by DP questions even after having done tough DP problems in multiple classes in school so I most definitely don't think those who've seen them in school find them really easy, unless for some silly reason you're asking exactly the same classical DP questions that everyone sees in school, like say Levenshtein distance. Even small twists can make the solution no longer be obvious, e.g. for edit distance consider an affine gap penalty (used in computational biology).
In contrast, I find that the ability to formulate a problem using DP correlates with whether a person is able to think in a mathematical way. In many dynamic programming problems you need to first define a function P(i,j) or some such, and then define it recursively; when proving correctness, you use induction. The ability to grasp this recursion and induction is correlated with the level of abstract mathematical thinking a candidate is capable of.
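For example, for the longest common subsequence of x and y, define P(i,j) as the LCS length of the first i characters of x and the first j characters of y; then

    P(i, 0) = P(0, j) = 0
    P(i, j) = P(i-1, j-1) + 1                     if x_i = y_j
    P(i, j) = max( P(i-1, j), P(i, j-1) )         otherwise

and the correctness proof is an induction on i + j.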
That said, if a candidate goes to LeetCode or some other site to practice all the DP problems, then it's possible that the candidate can only solve DP problems without gaining other abilities.
If someone can learn to solve just DP problems by practicing in leetcode then DP is a useless metric or proxy of someone’s ability to reason mathematically.
You should have said “the ability to solve DP problems is a good metric of the ability of a candidate to solve DP problems“. I would have agreed with you in that case :)
> If someone can learn to solve just DP problems by practicing in leetcode then DP is a useless metric or proxy of someone’s ability to reason mathematically.
Metrics that can be gamed can still have a lot of signal in them because the cost of gaming the metric is higher than the benefit of gaming things.
Also, if being able to learn completely arbitrary things is correlated with intelligence and correlated with job performance, then people who are willing to jump through that arbitrary hoop are telling you that 1) they think they're more capable (faster learners) than average 2) they're willing to put in the effort to prove it to you. Maybe you're not a getting a pure signal of mathematical ability, but some combination of drive + ability + learning fast, and you care about all three.
Dynamic programming is far from completely arbitrary. But even if it were completely arbitrary, it would still serve a useful signaling function as long as people agreed on which hoops they're supposed to jump through.
A peacock's plumage serves little useful purpose except to show how fit he is; but that purpose is essential to the survival of the species.
> If someone can learn to solve just DP problems by practicing in leetcode then DP is a useless metric or proxy of someone’s ability to reason mathematically.
I'm not a fan of DP but I don't follow this logic. Does practicing maths problems for an exam make the exam a useless metric of mathematical reasoning?
I don't know anything about dynamic programming in the manner that it is taught. I also don't get why it's even a thing (which is to say I haven't studied the topic, and haven't yet seen a need to do so).
Someone recently gave me the egg dropping problem in an interview. I had never seen it before, but the answer was obvious, and I verbally stated the solution in under a minute. I asked the interviewer if I could write out the log N solution, which I believe was essentially a binary search on a logical list of length N using a bit of looping or recursion. He asked me to go through the process of explaining it, and I really didn't see the point: I knew the answer, and he knew the answer, and he asked me to implement the simplest solution first. To me, that was the solution I had started with, but that wasn't good enough. The interview went downhill from there.
I later googled the answer, saw people explaining some overly complex matrix-y shenanigans, and I tuned out.
Talk is cheap, code is what matters. But maybe the interviewer wanted to save time not writing an incorrect solution, if he believed your approach was wrong and you could talk yourself out of it? Did you get to step through how your solution would arrive at the number 14 given 2 eggs and 100 floors? And if you wrote it out, would it produce that number? (Personally I have no idea what you mean by the binary search approach; what is your logical list, since doing it on the list of floors is wrong? But I've not thought about the egg dropping problem in a long time and have only seen the DP approach.)
It might of course be that the interviewer only knew one way to solve it, any other way was therefore wrong. Or admitting they don't understand a demonstrated other way shows weakness, so is also wrong. I'd be perfectly happy to receive rejections from companies with those sort put in charge of interviewing.
A funny trick is to ask this question but first give the candidate an infinite number of eggs. Once he succeeds, ask the question with only 2 eggs. That might explain the binary search answer.
That approach would have been an easier way for me to answer the question. I wrote something like that to guesstimate the number of git commits on a GitHub project I didn't have time to clone a few years back (I think it was Linux). Basically I started at page 1, then went to 2, 4, 8, 16, etc. When I hit a 404, I went back 50% of the way to the previous number (e.g. if it was page 1024, the previous was 512, so I went to 768). If 50% was a 404, go back to the midpoint of the lower range (e.g. the midpoint of 512 and 768). If it was legit, go to the midpoint of the upper range (e.g. between 768 and 1024). Wash/rinse/repeat. It was a fun exercise to burn some time on a slow day. If someone actually wants to see the code, reply to this comment and I'll find it, or write it again.
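The shape of it, for anyone curious: a C sketch with a stand-in page_exists() predicate, since the real thing was an HTTP probe against GitHub's pagination (LAST_PAGE is a made-up number):

    #include <stdio.h>

    #define LAST_PAGE 6741L   /* hypothetical; in reality an HTTP 404 decides */

    int page_exists(long p) { return p <= LAST_PAGE; }   /* stand-in for "not a 404" */

    long find_last_page(void) {
        long hi = 1;
        while (page_exists(hi)) hi *= 2;   /* gallop: 1, 2, 4, 8, ... until a 404 */
        long lo = hi / 2;                  /* last power of two known to exist */
        while (lo + 1 < hi) {              /* then binary search on (lo, hi) */
            long mid = lo + (hi - lo) / 2;
            if (page_exists(mid)) lo = mid; else hi = mid;
        }
        return lo;                         /* the last page that exists */
    }

    int main(void) {
        printf("last page: %ld\n", find_last_page());   /* prints 6741 */
        return 0;
    }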
Now I want to go back and study up on that dynamic programming stuff again! Yay inspiration!
I don't know what version of the problem you solved, but generally you are limited by the number of eggs you are allowed to use. Say you have a hundred floors and 1 egg, binary search would drop an egg from the 50th floor, have it break (assuming the key floor was <50), and then promptly fail.
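For reference, the "14 drops for 2 eggs and 100 floors" answer mentioned above falls out of a small DP over (eggs, trials): with e eggs and t trials you can distinguish f(e,t) = f(e-1,t-1) + f(e,t-1) + 1 floors, because the first drop either breaks (leaving e-1 eggs for the floors below) or survives (leaving e eggs for the floors above). A sketch:

    #include <stdio.h>

    /* f[e][t] = how many floors are distinguishable with e eggs and t trials */
    int min_trials(int eggs, int floors) {
        int f[11][101] = {{0}};   /* assumes eggs <= 10 and answer <= 100 trials */
        for (int t = 1; t <= 100; t++)
            for (int e = 1; e <= eggs; e++) {
                f[e][t] = f[e-1][t-1] + f[e][t-1] + 1;
                if (e == eggs && f[e][t] >= floors)
                    return t;
            }
        return -1;   /* not solvable within 100 trials */
    }

    int main(void) {
        printf("%d\n", min_trials(2, 100));   /* 14 */
        return 0;
    }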
I asked a dp question in an interview. It was useful. Here's why. I know the solution, so I frequently guide the interviewee towards the correct answer and offer my help. Plus I put my nice guy hat on. I have seen a supposedly good candidate refuse my offers of help and then proceed to spaghetti code on the board a series of disjointed while loops that is obviously going to go nowhere.
I don't want to work with such a person. They still got the job, because connections, but will 100% never be on my team.
> "I know the solution, so I frequently guide the interviewee towards the correct answer and offer my help. Plus I put my nice guy hat on."
I wouldn't want to work with a person whose idea of a useful job interview is:
"I'll pick an esoteric bullshit problem, put my nice guy hat on, and guide you towards the solution as I bask in your admiration of my largesse."
It's like when self-declared "nice guys" approach women: when you have to make an explicit point about how nice you are being right now, you're probably doing something very wrong.
I think it's a win-win. They get to pick someone who is more suited to their needs (problem solving that needs some algorithmic thinking) and you get to work at a place where things are more rote.
Do you think that "I'll put my nice hat on and guide you to the solution, if I feel like it" is an optimal way to measure "problem solving that needs some algorithms thinking"?
Well, of course one can be a dick about it, but nothing in the parent post indicated that he was. Maybe you had a bad experience in the past and are projecting it onto the author of the comment. I tend to / try to assume good faith.
Picking up loosely specified directions and filling in the gaps is indeed an important skill to have. If I had the time to describe the solution in the minutest detail to a co-worker, I would have the time to write it myself. Often, it's a luxury that I don't have. In such situations I would certainly appreciate a co-worker with the skill to fill in the necessary detail from pointers and rough directions.
A whiteboard DP puzzle is totally different from any real-world engineering situation of “picking up loosely specified directions and filling up the gaps.”
No point in arguing this further. These tests are fundamentally just a performative power trip masquerading as objective measurement — the perfect storm of software engineer bias.
I think you are projecting your own biases here. OP said "DP question"; there is no mention of a 'puzzle' anywhere. As I said in another comment of mine on this page, I have had to use DP in each of my last three years in real projects. If I were to hire someone for help with these, I would definitely be looking for comfort with DP-styled thinking in a candidate. In order to do so I do have to simplify and abstract out the problem statement.
An alternative point of view is that you set them up with an unsuitable question in the first place. Also, the disadvantage of stories such as yours is that we only hear one side. What did your "help" look like? We don't know.
For my taste, the brevity and content of your story indicate a lack of introspection. It seems to me that the only option you consider is that the interviewee gets all the blame.
The reason for my suspicion is that, independent of what happened there, you should at least be aware that the story is not suitable to be shared in this form, because readers here only get your (very short) side of it. Sure, you will get agreement - from people who jump to quick conclusions without actually having a basis for them. You are not even trying to give sufficient and/or unbiased information. I know this is just a forum and casual conversation, but to compare with science, which has the same problem of communicating one's ideas to other people, the equivalent would be a paper that publishes only the author's conclusions.
If I was given a new problem like this, the first thing I would do is google to see if it has been solved before and the approaches other people have taken. Not try to think up shit myself. With 15 years of tech experience I have found it to be a far superior approach to solving problems.
Some people may well not want to ask for help with a task during an interview, seeing as it's obvious that some interviewers may mark you down for asking.
I interview regularly and I'm really keen to know if the person knows about the time-space tradeoff. DP-like algorithms lend themselves well to evaluating this. Note that I say DP-'like' because I'm essentially looking for them to say memoization and implement it. It doesn't take a crazy DP optimization algorithm to evaluate this concept. And note that this kind of time-space tradeoff is something we do encounter regularly: e.g., caching (to avoid a network call or computation), pre-computation, etc.
So the concept underlying DP is quite useful and it's premature to dismiss it as something that never comes up in the real world.
That said, I understand that it's easy to push this and go down the path of more and more esoteric algorithms.
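For reference, a generic memoization wrapper along those lines can be tiny in Python. A sketch (functools.lru_cache is the standard-library equivalent of this hand-rolled version):

    from functools import wraps

    def memoize(fn):
        cache = {}                        # args -> previously computed result
        @wraps(fn)
        def wrapper(*args):
            if args not in cache:
                cache[args] = fn(*args)   # spend memory to avoid recomputation
            return cache[args]
        return wrapper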
I actually disagree. Of the problems that are commonly asked in programming interviews, DP is perhaps the one that tests "general problem solving" the best.
Many other problems require problem-specific tricks/uncommon tricks. On the other hand, there's rarely a DP problem (that comes up in interviews) that relies on a problem-specific trick.
> On the other hand, there's rarely a DP problem (that comes up in interviews) that relies on a problem-specific trick.
That's a pretty big parenthetical. DP problems as a class are full of clever tricks for elegant solutions that took decades of research. Any practical use of DP is almost certainly going to use problem-specific tricks as well. Which makes DP tricky, especially in interviews.
Yes, there's a ton of DP problems that require clever tricks. I've had to show these tricks to friends who believed that DP problems were too formulaic.
Could you give an example of DP problems that show up in interviews that require problem-specific tricks? I'm contrasting DP problems with other interview questions like: Find whether a linked list has a cycle in O(1) memory or implement a queue with 2 stacks.
Highly disagree with this. There are pretty significant approach differences between 2D DP, 1D DP, Tree DP, string operations, etc. Determining what you optimize and cache (if you do top-down, which I prefer) is often problem-specific.
The way the optimal solutions are laid out, the most space-efficient Fibonacci (O(1) space), the "House Robber" independent-set problem, and the various problems built on string distance don't seem to have much in common with each other.
Disagree. The general concept is the same - "Where can you collapse states into one". Someone who understands DP well should find tree DP natural, and might even come up with it themselves without learning about it beforehand. I'm not sure what you mean by string operations - most string DP problems don't require special treatment.
I agree that there are significant approach differences. However, the high level approach of:
1. Find a DP state that allows you to collapse states.
2. Find DP state transitions
3. Solve the problem
is constant among problems. The most common complaint I've heard about interview problems is that they relied on some "trick" to solve - DP problems don't have these tricks.
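To make that recipe concrete, here is a sketch of it applied to the "House Robber" problem mentioned elsewhere in the thread (maximize a sum of non-adjacent values). The state is "best total for the first i houses"; the transition is "rob house i or skip it":

    def rob(values):
        # prev = best total for the prefix ending two houses back
        # cur  = best total for the prefix seen so far
        prev, cur = 0, 0
        for v in values:
            # transition: rob this house (prev + v) or skip it (cur)
            prev, cur = cur, max(cur, prev + v)
        return cur

    # rob([2, 7, 9, 3, 1]) == 12  (2 + 9 + 1)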
Fibonacci isn't considered tree DP. One example of a tree DP problem is: given a tree with N nodes, how many ways are there of coloring each node black or white such that no two white nodes are adjacent?
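A sketch of that tree DP, assuming the tree comes as an adjacency list: keep two counts per node (ways when it is black, ways when it is white), and note that a white node forces all its children black.

    def count_colorings(adj, root=0):
        # adj: adjacency list of an undirected tree
        def dfs(v, parent):
            black, white = 1, 1              # ways with v black / v white
            for c in adj[v]:
                if c != parent:
                    cb, cw = dfs(c, v)
                    black *= cb + cw         # child of a black node: either color
                    white *= cb              # child of a white node: black only
            return black, white
        return sum(dfs(root, -1))

    # path 0-1-2: count_colorings([[1], [0, 2], [1]]) == 5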
It's partially targeted for competitive programming, though, so be warned that things like Tree DP/subset DP/convex hull optimization will likely not show up in interview questions (although they are pretty cool!)
Well: (a) not exactly, because there are no restrictions on when you can color nodes black, and (b) graph coloring in general is NP-complete, while this problem has an O(n) solution.
The good thing about this type of question is that it can be practiced, and it's pretty much standard. Once you're comfortable with it you can essentially interview anywhere.
To my recollection, dynamic programming problems were fashionable interview questions prior to the current wave of machine learning. That's to say I'm pretty sure they were being asked at Google/Facebook/etc. around 2012 at least, likely earlier.
What a big load of rubbish and gratuitous generalization!
Just because it has not come up in your work, or you may have passed up opportunities to recognize its applicability at your work, does not qualify you in any way to make such a broad comment.
DP has served me very well in each of the last three years in three different projects. In two of these projects I had joined the project midway. In every case the key was to recognize that DP applies where others had not and in doing so had left significant benefits on the table. In one case the 'others' was me myself. I had overlooked a potential application the first time.
So much for "almost never in the real world"
> candidates who haven't usually don't even have a chance at solving them
The claim above sounds a lot like "If all the votes were counted then candidate B would have been elected". The whole point of an election is counting the votes.
Is it such an unfair expectation that someone applying for a CS job ought to be familiar with one of its fundamental techniques? If one is aiming for a commodity position, then perhaps yes.
It does depend on the job, and I don't doubt that there exist jobs where DP in the interview process is reasonable, especially if it's a "CS job". I'm particularly talking about software engineering interviews, to be clear, not research positions or anything like that. Even in software engineering, I could imagine DP being important if the product relies heavily on high-quality string diffing, for example. I'm certainly interested to hear your real-world applications if you're able to share. I guess I have seen memoized DAG traversal come up, so in that sense I've seen DP, but it's not quite the same as "take this non-recursive-looking problem and make it recursive so you can memoize".
Two situations come to mind when I think about DP being used in real-world applications (both just things I heard about): a talk I heard about how Google News compares news pages before and after the publisher makes an HTML change, and DNA sequence alignment problem. Both of them went like this:
1.) We want to find a perfect solution, but the search space is exponential size.
2.) Hey look, by using dynamic programming we can reduce the complexity to n^2 time and space.
3.) Wait, n^2 time/space is still too inefficient for many real-world use cases. Rather than trying to find a perfect solution, let's settle for a good enough solution by rethinking the approach and applying some heuristics. By putting some clever thought into it and sacrificing a little bit of quality, we can get something that takes O(n) time in practice.
So I think an initial goal of looking through an exponential search space for an optimal solution (often what DP is solving) is often misguided anyway, since often "optimal" isn't actually so important.
Appreciate this more nuanced take. I think "software engineering job" and "CS job" aren't quite complementary. I do disagree that 'optimal' is often unimportant. It matters a great deal when it matters, especially if it is the core of the project.
Don't know how much I can talk about the problems. One of them was definitely a string-related problem, but not about string diffing. The other two were about optimizing cost over all possible partitionings of a set (different kinds of sets in the different projects) -- hence my embarrassment at not having noted the same structure in the two different problems right away. It was one of those moments: "Wait a minute. I am doing this wrong. The sets are different, but in an abstract sense I am doing the same thing as in the other project, over a different set."
Without DP it would have been very hard to come close to the optimal solution by local exploration, unless one got lucky and started off by looking around a promising place in the search space.
I am sure DP can also be put to use when trying to optimize the placement of VMs on hosts. If done well, the savings in dollars can easily hit 6 figures or more. These are real problems with real money at stake.
This is a comment to demonstrate the differences between DP and memoized recursion to people in the sibling comments.
When I was learning DP vs. memoization, I thought the Floyd-Warshall algorithm for finding the shortest path length between all pairs of nodes in a graph was a good example of DP that wouldn't work the same way with memoization.
In the FW algorithm, because of the order of filling up the table and discarding old values on the go, we are able to run the algorithm using only O(n^2) memory; but if we were to run it in a memoized recursive fashion, I would guess you would need O(n^3) memory.
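A sketch of that in-place update, assuming dist starts as an n×n matrix of direct edge weights (math.inf where there is no edge, 0 on the diagonal):

    import math  # callers mark missing edges with math.inf

    def floyd_warshall(dist):
        n = len(dist)
        for k in range(n):          # allow intermediates from {0, ..., k}
            for i in range(n):
                for j in range(n):
                    # overwrite the previous layer in place: O(n^2) memory total
                    if dist[i][k] + dist[k][j] < dist[i][j]:
                        dist[i][j] = dist[i][k] + dist[k][j]
        return dist

    # e.g. edges 0->1 (1), 1->2 (2), 0->2 (9):
    # floyd_warshall([[0, 1, 9], [math.inf, 0, 2], [math.inf, math.inf, 0]])
    # shortens dist[0][2] to 3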
Another example is when a recurrence in DP only depends on the previous row (if we are going row by row) and you can only keep around O(n) memory swapping two rows representing the current row and the previous one. In a recursive black-boxy memoization method you will have to do with a full O(n^2) memory usage.
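For example (a sketch): the longest-common-subsequence length table depends only on the previous row, so two rows of O(n) each suffice:

    def lcs_length(a, b):
        prev = [0] * (len(b) + 1)            # row for the previous prefix of a
        for x in a:
            cur = [0] * (len(b) + 1)         # current row, built from prev only
            for j, y in enumerate(b, 1):
                cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
            prev = cur                       # discard everything older
        return prev[-1]

    # lcs_length("abcde", "ace") == 3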
Finally, there is a technique (Hirschberg's algorithm) to do DP on a table using O(n) memory instead of O(n^2) while still recovering the optimal path, by recursively splitting the table into halves and only storing O(1) rows at a time. That technique is more complicated to explain in an HN comment, though.
Update: forgot the simplest example: Fibonacci numbers. In the top-down approach, you would memoize results keyed on the index of the Fibonacci number, so you would need an array caching O(n) values. But if you build it up bottom-up, you can get away with only two variables, previous and current, and then do something like:
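(A reconstructed sketch; the comment's original snippet appears to have been lost to the formatting:)

    def fib(n):
        prev, cur = 0, 1
        for _ in range(n):
            prev, cur = cur, prev + cur   # slide the two-value window forward
        return prev                       # O(1) extra space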
There's a "simple memoization" version of fibonacci that takes more space than `fibsViaUnfold`. I think the term "simple memoization" (and equivalents used in these comments) are a bit too imprecise to be useful.
DP is a mathematical term describing certain problems. It does not state that you need to save all solutions to subproblems. If you do have a DP problem, the fastest way to solve it is with recursion and memoization as needed, because of the properties of the problem.
I am not arguing with this. In my comment my intent was to show that the distinction between the memoized top-down approach and bottom-up (the approach that is usually taught as DP in a classroom) is significant for some memory optimizations.
Dynamic Programming and memoization are definitely related techniques, but they are emphatically _not_ the same. Memoization is a black-box approach that can be applied to a generic recursive top-down, depth-first algorithm. Dynamic Programming is about rewriting the recursive top-down algorithm in a bottom-up, breadth-first manner.
This is one definition, but I don't think it's the common one. The more common definition is that dynamic programming refers to solving a complicated problem by breaking it up into simpler overlapping subproblems that can be solved independently.
Solving it with recursion/memoization vs. bottom-up is merely an implementation detail, while DP refers to a class of algorithms.
> The more common definition is that dynamic programming refers to solving a complicated problem by breaking it up into simpler subproblems that can be solved independently
I don't think that's sufficient? I thought DP also implies that you actually reuse the answers from subproblems.
"There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping sub-problems. If a problem can be solved by combining optimal solutions to non-overlapping sub-problems, the strategy is called "divide and conquer" instead"
It is a good write up, but it underscores the mystery of how DP ever got named as a technique.
Memoization is a great general-purpose technique that covers pretty much all the obvious cases for DP to a reasonable standard. Its black-box nature allows for effective and well compartmentalised application.
DP is an extremely complicated and invasive technique that requires rewriting algorithms to optimise for space & time use.
It doesn't seem like the advantages of DP are enough to outweigh the neat design of memoisation. I'd rather see an interview question on implementing memoize in Python than writing a DP algorithm.
Dynamic programming is a mathematical term for a certain kind of problems that have certain properties and can be solved in a certain way.
Memoization is specific form of caching for functions.
That is all there is to it. Your link seems to conflate memoization with a global store that you must apply in an all or nothing fashion for functions. You do not, and calculating the n'th fibonacci number is a great example of how to use memoization without using O(n) space.
I have always felt that dynamic programming is only confusing because of the name. The concept of caching previously calculated values in order to save time by avoiding recalculation (at the cost of using more space) is intuitive.
Richard Bellman coined the name, and according to legend it was because the phrase 'dynamic programming' was so anodyne that not even the most officious bureaucrat could object.
In Bellman's own words[0]:
"An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities."
Not much. There is the top down and bottom up approach. Classic memoization (top down) needs the full past results to be cached. But sometimes you can save memory by doing it bottom up and only using the previous result of the smaller problem to get the larger problem.
In most interviews you should be good by just doing top down, which is more intuitive. But sometimes the bottom up approach will be the follow up question.
Does the term only apply to recursive solutions? Caching is often useful in iterative solutions as well; does that count?
I would pretty much never start with a recursive solution in an interview, because I pretty much never end up with a recursive solution in my work. I have very rarely been asked to do something recursively having already done it iteratively.
After spending a lot of time interviewing candidates, I have pretty much come to the conclusion that DP problems don't give good signal about a candidate's problem-solving skills. In my experience, less than 5% of candidates can genuinely crack problems using DP. Most of them either give me a memorized solution or just plain give up.
If you are interviewing for a regular CRUD job, aka a web application, there are so many other problems which can give a much more refined signal about a candidate's skill. Please, for the love of God, don't ask DP. Unless you actually use it at work.
In the first problem illustrated, I wanted to also get the list of coins. I made this attempt, but this hardly seems elegant. Any comments to make it better?
    n = 10
    denom = [1, 3, 4]
    dp = [-1 for i in range(n + 1)]
    dpl = [[] for i in range(n + 1)]

    def f(n):
        if dp[n] != -1:
            return dp[n]
        if n <= 0:
            return 0
        ans = 10**10
        mini = n
        for i in denom:
            if (n - i) >= 0:
                new = f(n - i) + 1
                if new <= ans:
                    mini = i
                    ans = new
        dpl[n].append(mini)
        if mini != n:
            dpl[n] = [item for sublist in [dpl[n], dpl[n - mini]] for item in sublist]
        dp[n] = ans
        return ans

    if __name__ == "__main__":
        ans = f(n)
        print(ans, dpl)
To be more elegant, you can remove the "dp" array. If you want to keep track of the full list, then you only need to keep track of "dpl". Here is code that I wrote and it works:
    n = 10
    denom = [1, 3, 4]
    dpl = [[] for i in range(n + 1)]

    def f(n):
        if dpl[n]:
            return dpl[n]
        if n <= 0:
            return []
        ans = list(range(n))  # this is the max size possible
        for i in denom:
            if n - i >= 0:
                new = f(n - i) + [i]  # append i to the end of the array
                if len(new) <= len(ans):
                    ans = new
        dpl[n] = ans
        return ans

    if __name__ == "__main__":
        sol = f(n)
        print(sol)
Dynamic Programming is a concept from control theory in which the goal is to optimize a specific type of equation. Here, "programming" doesn't mean the same word as in "computer programming". It's just a carry over word from the field of optimization and mathematical modeling [0].
The goal is to optimize some problem that can be framed as having the following properties: an optimal solution to the overall problem will result in optimal solutions for all "subproblems", and these "subproblems" are not all disjoint. If they are disjoint, then you don't need DP; you can use a simpler divide-and-conquer approach (example: you don't need DP for mergesort).
Such problems are everywhere. Lots of path planning, shortest path type problems are DP based (high level ex: shortest path from A to C going via B necessarily will imply that the path from A to B is the shortest and B to C is the shortest). In reinforcement learning, similar ideas are crucial. Relatedly, in game theory, these types of problems arise in the form of subgame perfect equilibriums.
Thinking graphically, I believe all DPs can be represented as directed acyclic (hyper)graphs. So the optimal solution on this graph will mean that the subgraphs in the solution also have optimal solutions, and that these subgraphs overlap.
In computer science, the optimizations tend to work incrementally as a 'wavefront' from the smallest subgraph (usually some basecases, starting points) -- using the solution to the current subgraph in the larger subgraph, etc. Eventually, the wavefront covers the relevant parts of the graph. This is what people tend to call "bottom up".
I think many people struggle with understanding "Dynamic Programming." Hopefully the following clears it up for you.
Dynamic Programming can be done using Memoization (top-down; aka recursively) or Tabular method (bottom-up). So what's the difference?
When you see top-down, it means starting from the root of the problem and recursively descending to the base case. As you pop up the stack, you either calculate and store the result or look up the value in the cache. E.g., in the Fibonacci sequence, check whether fib(4) was already calculated. No? Calculate and store it, so the next time you come across it you can use the result and not worry about processing fib(1), fib(2), fib(3), etc.
When you see bottom-up, think about filling out a table from the upper-left corner, column by column and then row by row. To compute each cell, you look at prior values: the cell in the row above, the cell to the left, or the cell diagonally up and to the left. I know this sounds a bit strange, but if you solve the following problems, you'll see a repeating pattern.
Edit Distance
0/1 Knapsack
Rod Cutting
Longest Common Subsequence
Longest Path in Matrix
Coin Change
I've been writing about this in detail. Eventually I'll publish my writing to help others. I solve ~20 problems using Memoization and the Tabular method. As I solve each problem, I compare the solutions with prior problems showing the pattern. What I want to do is help people spot "patterns" vs. memorizing algorithms that are very problem specific.
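To make the two styles concrete, here is a sketch of both on coin change (fewest coins summing to n), the last problem in that list; the names and signatures here are hypothetical, not from any particular write-up:

    from functools import lru_cache

    def min_coins_top_down(n, denom):
        @lru_cache(maxsize=None)          # the memo
        def f(k):                         # fewest coins summing to k
            if k == 0:
                return 0
            return min((f(k - d) + 1 for d in denom if d <= k),
                       default=float("inf"))
        return f(n)

    def min_coins_bottom_up(n, denom):
        dp = [0] + [float("inf")] * n     # dp[k] = fewest coins for k
        for k in range(1, n + 1):         # fill the table left to right
            for d in denom:
                if d <= k:
                    dp[k] = min(dp[k], dp[k - d] + 1)
        return dp[n]

    # both return 3 for n=10, denom=(1, 3, 4): 3 + 3 + 4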
It seems that many dynamic programming solutions can be arrived at by starting with a recursive formulation, adding memoization, and then optimizing the cache based on specialized knowledge of the problem. For example:
1. Q: How can you compute Fibonacci number f(n) recursively? A: f(n) = f(n-1) + f(n-2)
2. Q: How can you memoize the results of your recursive function(s) to dramatically reduce the number of calls? A: Make a cache keyed on n that stores f(n).
3. Q: Given specialized understanding of the problem, how can you minimize the size of the cache? A: Notice that you don't need to keep all O(n) slots; you only need to keep two ints.
Can every dynamic programming solution be explained this way? Or is there a good example of a dynamic programming problem for which you really need to make a leap that can't be sensibly reached through this sequence of three questions?
2. The meat of the problem is understanding how to formulate it such that this "recursion" is possible.
Point two is often non-trivial so your "just" -- while technically correct -- isn't true in practice. If you solve a variety of medium and difficult DP problems, it will be clear why.
Based on my understanding, the key difference with DP is that you build the solution bottom up instead of top-down.
For example, in the case of computing the factorial of 10 recursively, you would start at fact(10) and move down to the base case, fact(1).
With DP, you would start with fact(1) and compute successive results based on already-computed ones, all the way up to fact(10).
Following on from this, you usually build up the solution in an array, sequentially, instead of navigating down the tree of solutions and storing as you go (memoization).
I've done a lot of programming contest stuff, and at least when speaking casually, I've always heard DP as referring to either the top-down approach or the bottom-up approach. They have their tradeoffs (though usually top-down is better because it's easier to implement), but they're both DP. It may be that you're technically not supposed to use the term "DP" for the top-down variant, but in practice people use the term to refer to both.
I'd say that in competitive programming, bottom-up is actually preferred.
Bottom up is faster, usually shorter to code, and allows certain kinds of optimizations you can't do with top down (sliding window, convex hull trick, etc).
Top down frees you from needing to think about order of computation, and also allows a different set of optimizations from bottom up (divide & conquer).
TBH it's been a while since I've done these contests seriously, but I remember the "order of computation" problem to be hard to think about for less trivial cases (like with a 3-dimensional table). But maybe I worried about it too much.
And just for fun, while we're listing these things:
Another advantage to bottom-up that I've seen is that sometimes top-down causes your recursion to get so deep that you run out of space in your call stack. Another advantage to top-down is that you're only filling in the subset of the table that's needed for the specific problem you're solving. Certainly someone with a lot of experience should be able to do both.
Some examples where order of computation isn't that trivial is when you're doing DP on trees or a DAG.
+1 to the "filling in a subset" part. Usually it's not too relevant because it's merely a constant factor change, but occasionally it'll be very important.
Sort of. "I know DP" doesn't just mean "I know what memoized recursion is". It means "I know how to take a problem, recognize that DP might help, frame it recursively with highly-overlapping subproblems, and use memoized recursion to implement an efficient algorithm for it". Especially in advanced cases, it takes a lot of skill to look at a problem in just the right way that memoized recursion makes it computationally easy.
For example, sometimes you might have a problem where the simple DP approach takes n^3 space, but with some trickery you can get it down to n^2.
Yep, it's still certainly DP, in the same sense that 1 + 1 is addition, but understanding 1 + 1 doesn't necessarily mean that you've mastered addition.
Just trying to understand the concept/name. I'm clearly not the only one confused.
Say I, at runtime, build a table of function pointers, based on some attribute/test of the input that suggests a specific type of function is faster for that type of input.
If it's faster, but not exponentially faster than a naive solution...is that "dynamic"?
Nope, DP is a bit more than "build a table" or "cache results". It's specifically about framing a problem recursively, and it needs to be the sort of recursion that has overlapping subproblems. For example, fib(8) and fib(7) will both call fib(6), which means that it's useful to cache fib(6). Some problems (like counting the number of nodes in a tree) are recursive but don't have overlapping recursion, so DP doesn't help with those.
(Sorry, I didn't mean my "addition" comment to sound condescending! I just meant that memoized recursive fibonacci is a simple example of DP, and the fact that it's (relatively) simple doesn't mean it's not DP.)
The name "Dynamic Programming" was coined to be deliberately confusing by an early researcher in the field for the sake of maintaining research funding in a hostile environment [1]. Don't worry too much about the components of the name.
DP is clearly more related to maths than to programming. If you are looking for a mathematician it makes sense... but if you are looking for a good real-world developer it's completely wrong!
If we solve Fibonacci naively, the complexity will be O(1.6^n).
Now we can solve it with DP in two ways:
1. Top Down: Instead of recursively computing the same subproblem, just store the value of this computation and look it up when needed. That's it.
2. Bottom Up: One cannot come up with the bottom-up formulation directly; we first have to identify the recursive pattern and the subproblems. Once we have this recursive pattern, just draw the graph for some values and we can identify the overlapping subproblems and the overall structure.
Constructing the graph for fib(5), we can see that the solutions to fib(4) and fib(3) are needed. Therefore, we need to find fib(3) and fib(4) before even solving for fib(5). Once we identify this graph, we can go bottom up: we solve the base case (the trivial, first node in the graph) and construct our solution on top of it.
Therefore, the easy approach is to solve it top-down. Once that is done, we can identify the subproblem graph and construct the bottom-up solution.
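A sketch of the top-down version, with the memo table explicit rather than hidden behind a decorator:

    memo = {0: 0, 1: 1}                       # base cases

    def fib(n):
        if n not in memo:
            memo[n] = fib(n - 1) + fib(n - 2) # compute each subproblem once
        return memo[n]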
This is all well and good for Fibonacci, but it feels like the difficulty becomes exponentially greater when you start running into more complicated optimization strategies; text justification is an example.
Coin changing - my understanding is that if each denomination of coin is at least twice that of the next smaller, then greedy is optimal. Note to self - see if you can prove it.
That is the case for current US coinage: 1, 5, 10, 25, 50, and 100 cents.
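Before trying to prove it, note that "at least twice the next smaller" is not sufficient on its own. A quick sketch comparing greedy against the DP answer on the denominations 1, 5, 11 (which satisfy the doubling condition) shows greedy losing at 15:

    def greedy(n, denom):
        coins = 0
        for d in sorted(denom, reverse=True):   # always take the biggest coin
            coins += n // d
            n %= d
        return coins

    def optimal(n, denom):
        dp = [0] + [float("inf")] * n           # dp[k] = fewest coins for k
        for k in range(1, n + 1):
            dp[k] = min((dp[k - d] + 1 for d in denom if d <= k),
                        default=float("inf"))
        return dp[n]

    print(greedy(15, [1, 5, 11]))    # 5  -> 11 + 1 + 1 + 1 + 1
    print(optimal(15, [1, 5, 11]))   # 3  -> 5 + 5 + 5

(Greedy does happen to be optimal for US coinage, but the doubling property alone isn't the reason.)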
I spent 2-3 weeks going through all DP problems on leetcode after work. I got that Tetris-effect where I was starting to hallucinate/dream DP problems and solutions as I would fall asleep at night.
I did this specifically for interviewing. As I've said before, these kinds of interviews screen for unusual geniuses or people with the motivation to spend many hours studying, which are two types of acceptable hires.
Thanks for taking the time to read my post. :) Like I mentioned, this article only lays the groundwork. The next article I intend to write on is DP+Strings (for example, finding the longest substring of a string which is a subsequence of another, etc)
I'm starting with classical problems and I'll soon diverge into non-classical problems, as I've mentioned in my article too.
In any case, I hope the other articles on my blog add more value in comparison!
The knapsack problem here is of particular interest, since despite the optimization problem being NP-hard, the solution can in fact be found in O(n * W). (The catch is that this is pseudo-polynomial: W can be exponential in the number of bits used to encode the input.) This feels similar to how the theoretical best comparison sort is O(n log n) but radix sort can do it in O(n) for fixed-width keys.
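For reference, a sketch of that O(n * W) table for 0/1 knapsack, compressed to a single row since each row depends only on the previous one:

    def knapsack(weights, values, W):
        dp = [0] * (W + 1)                    # dp[c] = best value with capacity c
        for wt, val in zip(weights, values):
            for c in range(W, wt - 1, -1):    # descending c: each item used once
                dp[c] = max(dp[c], dp[c - wt] + val)
        return dp[W]

    # knapsack([2, 3], [3, 4], 5) == 7  (take both items)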
Consider Fibonacci. Memoized recursion uses O(n) memory, because you don't garbage-collect anything. Bottom-up dynamic programming is O(1) memory, because you only need to remember the largest 2 subvalues you've computed.
It is basically memoized recursion. However, that does not mean it is easy. Finding the optimal substructure - that is, building the solution to the problem from the optimal solutions of its subproblem decomposition - is where the actual difficulty of the technique lies.
Exercises from a standard algorithm textbook like Kleinberg and Tardos should give enough practice in solving these problems. Not all of them are straightforward.
So, in all of these examples, the author basically turns the set of all possible solutions into a tree (where moving from ancestor to child in that tree corresponds to making a possible decision) and then recursively traverses this tree to find the best option?
Note that the memory requirement of the Coin Change solution is O(N), and that is not optimal. You should be able to get away with O(max(denom)) by keeping only a sliding window of the most recent values (you don't need to cache everything).
I agree with others' opinions that asking a dynamic programming question is a weak signal for a candidate's competence in general software development work like frontend, backend, mobile applications, and the like.
However, I would like to counter a common opinion that eventually follows in similar threads and some of my social circles: "Algorithms is an undergraduate course in which students learn specialized solutions to esoteric math problems, all of which they ultimately forget when they spend real time working in the industry, so the knowledge shouldn't be relevant in an interview."
It is fair that if you don't exercise what you learned you will gradually forget it, but I believe it's still important for a candidate to cherish algorithm design and analysis, because I consider it a great toolbox of the trade.
The concepts and techniques I learned from my undergraduate course in data structures and algorithms became the basis of my Software Engineering Toolbox.
What is my Software Engineering Toolbox? It is a collection of algorithm design concepts and techniques that I can employ anytime I am faced with a novel problem or a problem whose standard Stack Overflow solutions are inadequate.
The Software Engineering Toolbox contains the following: Arrays, Linked List, Stack, Queue, Hash Table, Binary Search Tree, Priority Queue, Set, Trie, Sorting, Binary Search, Divide-and-Conquer, Backtracking, Dynamic Programming, Range Query Trees, Graph Algorithms, Bit Mask Optimizations, Square Root Bucket Optimizations, and Multiple Pointer Optimizations.
First, I rarely implement my own data structures from scratch; all the programming languages that I use provide great standard libraries. Yet I always remind myself of the uses of these data structures, because you would be surprised by the number of people you meet who try to answer a problem that boils down to set membership with a HashMap<Integer, Boolean> when they could just use a HashSet<Integer>, or by the number of people who manually treat an Array as a Stack or a Queue when those data structures are readily available.
Second, I rarely implement my own sort or search functions from scratch; again, all the programming languages that I use provide great optimized functions. I treat Sorting and Binary Search as techniques that lend themselves to optimizing the locality of a data set such that you can easily answer basic statistics, find the bucket for a token in a ring, or merge data sets. These are simple techniques developers should readily know to exist when optimizing their code.
Third, why do I have Divide-and-Conquer and Backtracking in my toolbox? I believe that no matter what problem you face, you should be able to bruteforce it. You can't always tell someone that you can't implement something because you didn't find a Stack Overflow answer or you didn't deduce a collage of standard library functions or third-party libraries to solve your problem. Using these techniques, you can at least arrive at a pretty weak solution which is still a solution. To actually address Divide-and-Conquer and Backtracking in relation to bruteforcing, these techniques allow you to easily traverse through a search space to filter for a certain combination or permutation of items that satisfy a customer's constraints. Furthermore, Backtracking is a relatively easy to medium difficulty technique that is the basis for a lot of the Graph Algorithms people keep balking at!
Fourth, Dynamic Programming. To be honest, I rarely utilize it, but I appreciate it because the common subproblem types of 1D, 2D, Range, and Sub-Trees taught me how to order subproblems successively to solve other problems, which applies beyond DP. I discourage people from trying to pattern match Dynamic Programming problems and solutions, and I encourage them to truly digest CLRS and understand its 4 rules for Dynamic Programming to consider possible dependencies and structures for various combinations and permutations of the problem parameters to identify what the optimal substructure really is.
Finally, the remaining things are in my toolbox because they have been useful in my work on real-time network anomaly detection and streaming analytics. For example, topologically sorting distributed-tracing events into a rooted tree that I encode into a bit vector using a left-child right-sibling binary tree. Not everyone will do this, but with my toolbox I never worry much about facing new frontiers of problems, or about being tasked to create libraries and tools for myself and others to use instead of being at the whims of someone else on the Internet.
Overall, I hope people can look back at their courses in algorithm design and analysis and say, "Yeah, the problems and the solutions were really weird, but the techniques hidden away within them are actually GENERALIZABLE and are a fundamental basis to build new things and solve complex problems!"
Nonetheless, I don't want anyone who is weak in algorithm design and analysis to feel discouraged. Play to your own strengths, whatever they may be, or you can always strengthen them; it's never too late.
Finally, my Software Engineering Toolbox contains way more actual "engineering" stuff: automatic formatters, linters, fuzzers, automation, tests, mocks, coverage, "Infrastructure as Code", and blah blah blah. :")
I would like to close by saying that a good engineer knows the right tools for the job. :)
The problem with DP problems (for me) is that there seems to be a really large set of unique DP problems, and coin change, knapsack, and grid DP problems like counting paths are only a small subset of them. What's more, the rest of the problems can't easily be based on the approaches used for these... there might be dozens of problem classes to understand!
I never much enjoyed dynamic programming, and I do think it's a poor choice for timed interview questions, but I did become more interested in it when I realized there are patterns to the cache strategies that can be used to group problems: for example, the usual matrix raster-fill approach for, e.g., 0-1 knapsack, or, separately, using two cursors to fill the upper triangle of a matrix, as for longest common subsequence and for the optimal order of associative computation (like matrix chain multiplication).