
Dynamic Progamming: First Principles - foxh0und
http://www.flawlessrhetoric.com/Dynamic-Programming-First-Principles
======
psykotic
I'm fond of this old RAND report from Dreyfus, which is worth skimming if
you're mathematically inclined: Dynamic Programming and the Calculus of
Variations,
[https://www.rand.org/content/dam/rand/pubs/reports/2006/R441...](https://www.rand.org/content/dam/rand/pubs/reports/2006/R441.pdf)

One important takeaway is that dynamic programming in the Bellman formulation
is a discrete analogue of Hamilton-Jacobi theory in how it writes down an
equation for the optimal value as a function of a given endpoint rather than
writing down an equation for the path as with the Euler-Lagrange equations.
(You can reconstruct the path from the value function after the fact by
gradient descent.) The relationship between Hamilton-Jacobi and Euler-Lagrange
is the classical version of wave-particle duality. A concrete example in
geometrical optics is the eikonal equation, a Hamilton-Jacobi type PDE, versus
the geodesic equation, an Euler-Lagrange type ODE. Not coincidentally, one
common numerical method for the eikonal equation called the fast marching
method is a dynamic programming algorithm, very similar to Dijkstra's
algorithm for shortest paths.

It should be mentioned that any "local" equation like a PDE or ODE cannot
describe a globally optimal solution without strong assumptions such as
convexity. In fact, satisfying the Euler-Lagrange equation isn't even
sufficient for local optimality without further qualifications (Weierstrass
conditions). But the Bellman dynamic programming equation, being recursive,
can describe globally optimal solutions.
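
To make the value-function idea concrete, here is a rough Python sketch (my own, not from the report) of Dijkstra-style dynamic programming on a weighted graph: it solves the Bellman equation for every node and then recovers a shortest path afterwards by greedily descending the value function. It assumes an undirected graph with positive edge costs, given as an adjacency dict.

    
    
      import heapq
      
      def value_function(graph, target):
          # graph: {node: {neighbour: edge_cost}}, undirected, positive costs assumed
          # Bellman equation: V(target) = 0, V(x) = min over neighbours y of cost(x, y) + V(y)
          V = {target: 0.0}
          frontier = [(0.0, target)]
          while frontier:
              v, x = heapq.heappop(frontier)
              if v > V.get(x, float("inf")):
                  continue  # stale queue entry
              for y, cost in graph[x].items():
                  if v + cost < V.get(y, float("inf")):
                      V[y] = v + cost
                      heapq.heappush(frontier, (V[y], y))
          return V
      
      def recover_path(graph, V, start):
          # "Descend" the value function: step to the neighbour attaining the Bellman minimum
          path = [start]
          while V[path[-1]] > 0:
              x = path[-1]
              path.append(min(graph[x], key=lambda y: graph[x][y] + V[y]))
          return path
    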

~~~
mlevental
understand basically all of what you've said here except this:

>The relationship between Hamilton-Jacobi and Euler-Lagrange is the classical
version of wave-particle duality.

in what sense are the two formalisms (hamiltonian and lagrangian) related to the
relationship between time and freq space for fourier solutions to pdes?

also holy shit i would pay pretty good money for a service that typeset these
old monographs using latex.

~~~
kbenson
Seems like in a world where there's on-demand Photoshop editing through a gig
economy, something like this should be feasible, if it doesn't already exist. Maybe
a separate cost per page for text, diagrams, or both, plus a system to pay other
people to review pages for correctness (Mechanical Turk style, or just
actually use Mechanical Turk), and you have a nifty way to update old scanned
documents while helping put food on some poor grad student's table.

------
santaclaus
> Memorisation

It is memoization, not memorisation! Although for two weeks in algorithms
class I did in fact think our prof was just pronouncing memorize in a cutesy
manner.

~~~
quotemstr
Honestly though, we should just switch to using "memorization". It's a less
obscure word that communicates the intended meaning better, IMHO.

~~~
Retra
Obscure words are very well suited to esoteric subject matter where precision
is needed. A particular brand of function optimization is exactly that.

~~~
quotemstr
What precise meaning does "memoize" denote that "memorize" wouldn't?
Sometimes jargon is just jargon.

~~~
TuringTest
"Memoization" is only used in the context of caching results of previous
function calls.

So if you see the word, you instantly know that this is the context. You
wouldn't know that if you see "memorization".

~~~
jjaredsimpson
Exactly. Memorization is the process of committing something to memory. Things
like spaced repetition, flashcards, using mnemonics, etc. Memoization is the
technique of caching expensive function calls and returning those cached
values upon subsequent invocations.

Seeing the word memorize implies a process that an actor is undergoing and
says nothing about the data being memorized. Seeing the word memoize implies
the existence of an expensive function and implies a process of repeated calls
to the function. It also implies the function is pure, i.e. you can't memoize
calls to fread().
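
A minimal Python sketch of that idea (the `fib` here is just a stand-in for any expensive pure function; in practice Python's functools.lru_cache does the same job):

    
    
      def memoize(f):
          cache = {}  # results of previous calls, keyed by argument
          def wrapped(n):
              if n not in cache:
                  cache[n] = f(n)  # compute the expensive call once...
              return cache[n]      # ...and return the cached value on later calls
          return wrapped
      
      @memoize
      def fib(n):
          return n if n < 2 else fib(n - 1) + fib(n - 2)
    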

~~~
kbenson
Memoization implies caching specifically, which implies its own things, such
as cache size and eviction methodology. Not something that's necessarily
thought of when using the term "memory".

Memoization is close enough to "memorization" that you can pretty much guess what
it means without having heard of it before, and it's easy to remember, while being
specific enough to the actual concept to be easily searchable and to imply
specific concerns of its own. That's a win-win in my book.

~~~
lemming
Sadly, the older I get the more I am aware of my memory's aggressive cache
eviction policy.

~~~
kbenson
You and me both.

But that's part of it; you generally don't really associate that with memory
until you get older...

------
krat0sprakhar
While dynamic programming is taught in almost all algorithms classes, I think
I finally grokked it only after implementing it in a few practice problems. Would
strongly recommend giving a few of the exercises listed here a shot:
[https://leetcode.com/tag/dynamic-
programming/](https://leetcode.com/tag/dynamic-programming/)

~~~
baddox
It’s odd that there would be algorithms classes that don’t require some
implementations.

~~~
beardbandit
When I took Algorithms II in college at a top 10 CS program, our entire class
involved ZERO coding or programming of any sort.

~~~
megaman22
Same. Probably the worst class I ever took was my algorithms and data
structures class. I swear the professor transcribed most of CLRS onto the
blackboard over 10 weeks verbatim, and we never so much as touched a keyboard.

~~~
Bromskloss
Is paper programming in pseudocode bad, though? I mean, computing science is
not about computers, and all that.

~~~
farresito
Honestly, it's hard to completely grok a concept until you have solved at
least two or three problems on the topic, no matter how many examples
or lines of pseudocode you have seen. I think that's his complaint, and I have
to agree with him. I also took a class some years ago that was purely
theoretical, and I didn't really learn much besides some general concepts.

~~~
Bromskloss
I was actually referring to solving problems yourself in pseudocode on paper.

In any case, I'm happy with a computer science course that doesn't involve
actual computer implementations, and it's not just because it drives home the
point that they are different things, but also because you don't have to get
bogged down in the practicalities of wrestling with a particular language or a
particular toolchain.

------
et2o
Favorite examples for applications of dynamic programming? Mine are sequence
alignment/BLAST in bioinformatics, but I'm sure there are many of which I am
not aware in other fields.
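
For anyone who hasn't seen it, the core alignment recurrence is pleasantly small. A rough Needleman-Wunsch-style scoring sketch in Python (the scoring values here are arbitrary, and real tools like BLAST layer heuristics on top of this):

    
    
      def align_score(a, b, match=1, mismatch=-1, gap=-1):
          # dp[i][j] = best score for aligning a[:i] with b[:j]
          dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
          for i in range(1, len(a) + 1):
              dp[i][0] = i * gap
          for j in range(1, len(b) + 1):
              dp[0][j] = j * gap
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  dp[i][j] = max(
                      dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch),
                      dp[i - 1][j] + gap,   # gap in b
                      dp[i][j - 1] + gap)   # gap in a
          return dp[len(a)][len(b)]
    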

~~~
thechao
Line breaking. The Knuth-Plass algorithm I implemented is my all-time favorite
piece of code.

~~~
baddox
That’s one of my favorites too, although I remember it being one of the easier
implementations from my college course. We just did it with fixed-width fonts,
of course, but the results were still so impressive.

------
wht
In O.R. graduate school, Professor Gene Woolsey told us that he'd rise from
the grave and stand on our desks screaming 'No! No! No!' if we ever actually
used it to solve a practical problem.

IIRC, his complaints were about the speed of formulation, the difficulty of
understanding and communicating the model to others, and the processing required to
regenerate answers when the model changed.

I believe Optiant used Dynamic Programming for supply chain optimization, so
people do or did use it for practical problem solving... I think.

~~~
mattkrause
What were you supposed to use instead? My impression is that dynamic
programming is sometimes sluggish but often better than the alternatives.

~~~
throwaway7645
Most optimization problems use Linear Programming if the problem has
linear/continuous variables, and Mixed Integer Programming if binary values
(turning something on or off) are part of the solution. These are used in
production pretty much everywhere. LP (linear programming) problems are
extremely fast too. My company has HUGE LP & MIP problems, and the LPs still
take only a few minutes to solve on nice hardware. MIP can be much trickier.
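
As a toy illustration of what a tiny LP looks like in code (this uses scipy.optimize.linprog, and the numbers are made up; production systems typically use dedicated solvers like CPLEX or Gurobi):

    
    
      from scipy.optimize import linprog
      
      # maximize 3x + 2y  subject to  x + y <= 4,  x + 3y <= 6,  x >= 0, y >= 0
      # linprog minimizes, so negate the objective
      result = linprog(c=[-3, -2],
                       A_ub=[[1, 1], [1, 3]],
                       b_ub=[4, 6],
                       bounds=[(0, None), (0, None)])
      print(result.x, -result.fun)  # optimal (x, y) and the maximized objective value
    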

~~~
gugagore
I don't think it's that common to solve, say, shortest paths by solving the
LP/MIP formulation (unless maybe the problem at hand isn't strictly a shortest
paths problem). Is it?

~~~
xoroshiro
For something as well known as shortest path, no. One thing about LP/MIP
formulations, though, is that they're arguably easier to extend. Not to say there
aren't tricks for modifying existing algorithms, but most of the time there is
no need to spend all that effort when adding a few variables or constraints
will do the trick for this one problem that you will probably not need to
revisit for quite a while.

------
mratsim
For the last part about economic optimization, I would not approach it with
Dynamic Programming. As evidenced by Go and many other games, making the "local" best
move does not guarantee the best result in the end. If brute-forcing is
intractable, the state of the art is Monte Carlo Tree Search, as
evidenced by its dominance in board games, Magic: The Gathering, poker, robotics
competitions, RTS games, etc.

~~~
sacado2
Constraint solving works pretty well in these domains too, and AFAIK, top-
level artificial players in board games are a mix of constraint solving and
MCTS.

------
charlysl
I learned it as part of speech processing, first for Dynamic Time Warping and
then for the Viterbi and Baum-Welch algorithms. Together with Hidden Markov
Models, it's a thing of beauty how it is used to model speech.

All of this is explained very intuitively on speech.zone

The Wikipedia entry has a fun (really) explanation of why its creator
gave it that name.
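
For a flavour of the DP involved, here is a bare-bones Viterbi sketch in Python (the HMM is passed in as plain dicts; a toy, not a speech model):

    
    
      def viterbi(obs, states, start_p, trans_p, emit_p):
          # V[t][s] = (probability of the best state path ending in s at time t, predecessor)
          V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
          for t in range(1, len(obs)):
              V.append({})
              for s in states:
                  prob, prev = max((V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                                   for r in states)
                  V[t][s] = (prob, prev)
          # backtrack from the most probable final state
          prob, last = max((V[-1][s][0], s) for s in states)
          path = [last]
          for t in range(len(obs) - 1, 0, -1):
              path.append(V[t][path[-1]][1])
          return list(reversed(path)), prob
    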

------
HHest
Please note, the last name of the author referenced in the article is "Trick",
not Tick.

[http://mat.gsia.cmu.edu/classes/dynamic/dynamic.html](http://mat.gsia.cmu.edu/classes/dynamic/dynamic.html)

~~~
matt4077
Together with memo(r)ization and the typo in the title, I'm starting to think
the letter 'R' is having a bad day.

------
tommsy64
There appears to be a missing return statement in the Fibonacci Number example
in the if block. Should look like this:
[https://repl.it/NLIf/1](https://repl.it/NLIf/1)

------
molikto
Why use a monospace font?????

~~~
pingiun
I agree, it's hard to read

------
misja111
> Tail recursion [3], a variant of traditional recursion implements
> memoisation, which uses memoisation very economically.

I don't understand this part, can anybody explain?

~~~
cdax
In "traditional" recursion, each recursive call is pushed onto the function
call stack. This means that your recursive solution will eventually run out of
memory when the number of recursive calls is more than what the function call
stack can handle.

e.g., consider the following recursive solution for calculating the sum of all
positive integers up to a given `n`:

    
    
      fn sum(n):
        if n == 0:
          return 0
        else:
          return n + sum(n - 1)    // A
    

When n = 1000, your function call stack will have to hold ~1000 calls
(sum(1000), sum(999), sum(998), sum(997), and so on...) before it's able to
compute the sum. This is because the return statement at line A above needs
to keep track of the variable `n` from the calling function in order to be
able to compute the return value. If only there were some way to eliminate that
variable `n`...

That's where tail-recursion or tail-call optimization comes in:

    
    
      fn sum_tail(n, pre_sum):
        if n == 0:
          return pre_sum
        return sum_tail(n - 1, pre_sum + n)    // B
        
      fn sum(n):
        return sum_tail(n, 0)
    

In this solution, the return statement at line B does not depend on any
variable from the calling function (it simply passes that value on to next
function call), and so the calling function can immediately be popped off the
stack, saving stack space and making your solution more memory-efficient.

It's useful to know that whether any stack space can actually be saved
depends on whether your language of choice implements tail-call optimization.
e.g., JavaScript started supporting tail-call optimization with ES6
[1], and Python does not support it [2].

[1] [http://2ality.com/2015/06/tail-call-
optimization.html](http://2ality.com/2015/06/tail-call-optimization.html) [2]
[https://stackoverflow.com/questions/13591970/does-python-
opt...](https://stackoverflow.com/questions/13591970/does-python-optimize-
tail-recursion)

~~~
misja111
Thanks for the great explanation of tail recursion.

But I still don't see how tail recursion implements memoisation? One might
even argue that tail recursion is the opposite of memoisation; while tail
recursion saves memory because it eliminates the need to remember previous
results of function calls, memoisation on the other hand uses extra memory to
save results of earlier function calls.

~~~
vanderZwan
rishabhparikh is correct, but since the point is quite subtle it might help to
be a bit more elaborate.

The crux of the misunderstanding is here: _[tail recursion] eliminates the
need to remember previous results of function calls_.

Emphasis: _results_. The thing is: tail call recursion essentially means
"finish _all_ intermediate computations and pass the results of these to the
next function call".

In the non-tail call version, the sum function has no results until you hit
the deepest level of recursion; it is the equivalent of writing the sum out in
full, which causes all that extra overhead:

    
    
        sum = n + (n-1) + (n-2) + ... + 0
    

You have to recurse until you reach n == 0, and only then does the whole sum
"collapse".

The `pre_sum + n` bit in the second example is what fixes this. All
intermediate calculations are finished and then "stored" by passing them as
arguments to recursive calls. This gives the functional equivalent of a for
loop:

    
    
        sum = 0
        for i in n..0:
          sum += i
    

This is why it can be considered memoization: each recursive call builds on
the "stored" finished result of the previous call.

~~~
misja111
> This is why it can be considered memoization: each recursive call builds on
> the "stored" finished result of the previous call.

Ok, but isn't this then also the case for non-tail recursive calls? An
ordinary recursive call builds just as well on the finished result of the
previous call, and no result is really "stored" in either case.

~~~
ohkaiby
No, if you look at vanderZwan's explanation for the non-tail recursive call,
you'll see that it doesn't build on a finished result of the previous call.

> You have to recurse until you reach n == 0, and only then does the whole sum
> "collapse".

------
fmihaila
Glaring typo in the title: it should say progRamming.

------
BucketSort
Obligatory Demaine lectures -
[https://youtu.be/OQ5jsbhAv_M](https://youtu.be/OQ5jsbhAv_M)

~~~
yandrypozo
that's the best set of videos about DP!!

------
yipopov
Would it have killed him to use a proportional font? We have the technology to
accomplish that now.

