The author just mentioned dynamic programming. Usually in dynamic programming, e...

rfurmani · on Aug 8, 2011

The "obvious" selection of stages does work. When you are at position j you check all i<j until you find one where there is a segmentation up to i, and the substring [i,j) is a word. Memoization is just syntactic (semantic?) sugar on top of this and he provides the code that basically implements the above. In programming competition circles it's common to just say "it's dynamic programming" when the stages are semi-obvious, and given that he uses memoization I'd say there's a strong chance he's been involved in Topcoder

NY_Entrepreneur · on Aug 8, 2011

Yes, from a fast reading, your solution appears to be correct, and so does 'memoization', but, still, in spite of common practice in computing, it's not a dynamic programming algorithm because what 'dynamic programming' is goes back to R. Bellman and his student Dreyfus and the book Dreyfus and Law.

There may be a problem with what you outline: That substring [i, j) is a word is not so impressive! In addition we need to know, for i < p < j < q, is [p, q] a word. That is, even if [i,j) is a word, we can't conclude yet that it will be a word in the final, feasible 'segmentation' of the whole string. Indeed, even if [j, n] 'segments', we can't yet conclude that those segments will be in the final, feasible segmentation of the whole string [1, n].

Maybe in the core of the work what we want to know is, for each i = n, n - 1, ..., 1, is there a 'segmentation' of [i,n]. To tell this, at i, consider j so that i < j <= n and ask if [i, j) is a word AND at the same time if [j,n] can be 'segmented'. If so, then say that [i,n] can be segmented. So, in this, first we ask if [j,n] can be segmented since we have already addressed that issue and recorded the result, true or false.

So, the break with dynamic programming is that at stage i, we have to look at the results of not just stage i + 1 but at the results of each j for j = i + 1, i + 2, ..., n, which breaks the basic 'recursion' that is at the core of dynamic programming. That is, a key feature of dynamic programming is that at stage i, to find what to do, need only the 'state' at stage i and, for each possible state at stage i + 1, what to do at stage i + 1. That is, at stage i, do not have to look at stages i + 2, i + 3, etc.

Also for your approach, you are not working just at stage i but also with [j, i) for 1 <= j < i which is again a break with dynamic programming where we regard each i = 1, 2, ..., n as a stage.

No matter what some common practice is, just saying dynamic programming without being explicit about the stages, states, state transition functions, and the recurrence is sloppy to the point of risking often being wrong. This is not nearly the first time that computing took something solid and precise from applied math and made a mess. Messes, even if common, are not good. You will see a lot of care and discipline in Dreyfus and Law along with essentially all the applied math literature on dynamic programming, back to Nemhauser, Bellman, and others.

It's a cute string exercise with some cute solutions, but it's just not dynamic programming.

rfurmani · on Aug 8, 2011

I'm confused. Are you saying that if you have a DP table A then to compute A[i+1] you are only allowed to use A[i]? This question is classic DP. A[i] stores the segmentation of the first i letters if it exists (or it can just store a boolean, depending on implementation). Then to compute A[i] you look at the j<i see if A[j] is non-empty and if the [j,i) substring is in the dictionary. If you find a valid j you set A[i] accordingly, otherwise leave it empty.

NY_Entrepreneur · on Aug 8, 2011

There is what 'dynamic programming' is, and there is the 'word segmentation' problem of this thread.

The problem is interesting, and so are algorithms to solve it.

My view is that algorithms to solve the problem are not really dynamic programming.

If strain, then can regard the code I include below to solve the problem as dynamic programming, but can do such things quite generally.

For what 'dynamic programming' is, see the original books by R. Bellman or:

Stuart E.\ Dreyfus and Averill M.\ Law, {\it The Art and Theory of Dynamic Programming,\/} ISBN 0-12-221860-4, Academic Press, New York, 1977.\ \

Dimitri P.\ Bertsekas, {\it Dynamic Programming: Deterministic and Stochastic Models,\/} ISBN 0-13-221581-0, Prentice-Hall, Englewood Cliffs, NJ, 1987.\ \

George L.\ Nemhauser, {\it Dynamic Programming,\/} ISBN 0-471-63150-7, John Wiley and Sons, New York, 1966.\ \

Dimitri P.\ Bertsekas and Steven E.\ Shreve, {\it Stochastic Optimal Control: The Discrete Time Case,\/} ISBN 0-12-093260-1, Academic Press, New York, 1978.\ \

E.\ B.\ Dynkin and A.\ A.\ Yushkevich, {\it Controlled Markov Processes,\/} ISBN 0-387-90387-9, Springer-Verlag, Berlin, 1979.\ \

Wendell H.\ Fleming and Raymond W.\ Rishel, {\it Deterministic and Stochastic Optimal Control,\/} ISBN 0-387-90155-8, Springer-Verlag, Berlin, 1979.\ \

For the word segmentation problem, for some positive integer n, we are given a string of length n of characters of the alphabet, and we are given a dictionary of words. We are to show that the string can be partitioned ('segmented') into a sequence of words in the dictionary or show that no such partitioning exists.

A partitioning is a 'feasible segmentation'. We are only looking for a feasible segmentation and not all feasible segmentations.

Consider the string

     'catsdoll'

Okay, the 'substring'

     'cat'

is a word. But that word will not be in a feasible segmentation because the string that remains

     = 'sdoll'

has no 'feasible segmentation'.

Generally, even if there is a feasible segmentation, just because we have found a word does not mean that that word will be in a feasible segmentation.

Generally we suspect that somehow the solution will be 'iterative', 'incremental', maybe 'recursive'. So, in software terms, we suspect that we will have a do-loop with

     i = 1, 2, ..., n

or

     i = n, n - 1, ..., 1

For 1 <= i <= j <= n, we let s[i,j] be the substring of characters k of the given string for i <= k <= j.

Somehow when we get out of this look we want a feasible segmentation or a claim that there is none.

Let's consider a loop with

     i = 1, 2, ..., n

Okay, let's let b(i) = j > 0, i > j, if s[1, i] has a feasible segmentation and let b(i) = 0 otherwise. When b(i) = j, then s[1, j] has a feasible segmentation and s[j + 1, i] is a word in the dictionary.

In the loop, the pass for i gets the value of b(i) and in this uses the values b(j), 1 <= j <= i - 1.

If we come out of the loop with b(n) = 0 or n = 0, then we conclude that there is no solution.

Fine.

But it's not really dynamic programming. The obvious candidate for 'stages' would be i = 1, 2, ..., n. But in stage i, we have to consider essentially 'stages' 1 <= j < i, and it's a strain to consider that dynamic programming. That is, in dynamic programming, for the work at stage i, we're supposed to need to use only the results of the work in stage i - 1. Otherwise we have not really decomposed the problem into 'stages' with the usual recurrence relationship between stages. Or if we do the work i = n, n - 1, ..., 1, then to do the work at stage i we're supposed to need to use only the work at stage i + 1.

But if define the stages, decisions, state transition functions, and optimal value functions in relatively tricky ways, then we might be able to regard the problem as dynamic programming.

Below is some corresponding code. The code is in Rexx which is an elegant old 'scripting' language developed something over 25 years age by Mike Cowlishaw at IBM. It is fair to regard all the elementary values as just strings. If in addition, the strings are legal base 10 numbers, then Rexx can perform arithmetic with up to 50 significant decimal digits.

One special feature is a.x for any string x regarded as an 'index'. Then a.x will be a string.

The code:

     /* WORD01.RXS --                                      */
     /*                                                    */
     /*   Solution to 'word segmentation' exercise at      */
     /*                                                    */
     /*        http://news.ycombinator.com/item?id=2859182 */
     /*                                                    */
     /* Created at 15:40:00 on Monday, August 8th, 2011.   */

       exec_name = 'word01.rxs'

     /*   Read dictionary into 'stem' variable dict. from  */
     /*   file in_file:                                    */

       in_file = 'dict.dat'

     /*   Set the 'default' value of 'stem' variable       */
     /*   dict.:                                           */

       dict. = 0
       Do Forever
         If Lines(in_file) = 0 Then Leave
         word = Strip( Linein(in_file) )
         dict.word = 1
       End

     /*   So, now, a string word is in the dictionary if   */
     /*   and only if dict.word = 1                        */

     /*   The given string to check for a feasible 'word   */
     /*   segmentation' solution is:                       */

       input_string = ''
       input_string = 'a'
       input_string = 'b'
       input_string = 'up'
       input_string = 'downup'
       input_string = 'down'
       input_string = 'downzzzz'
       input_string = 'aintgottaword'
       input_string = 'gotword'
       input_string = 'catsdoll'
       input_string = 'therecanbeanotherreasontocontactbefore'

     /*   Then we find the length of this given string:    */

       n = Length( input_string )

     /*   Here is the description of the main logic:  For  */
     /*   i = 1, 2, ..., n, we find b.i so that            */
     /*                                                    */
     /*             /  0,  if Substr( input_string, 1, i ) */
     /*            |       has no solution                 */
     /*            |                                       */
     /*     b.i = <                                        */
     /*            |                                       */
     /*             \  j > 0, otherwise.                   */
     /*                                                    */
     /*   When b.i = j > 0, then                           */
     /*                                                    */
     /*        Substr( input_string, 1, j )                */
     /*                                                    */
     /*   has a solution and                               */
     /*                                                    */
     /*        Substr( input_string, j + 1, i - j )        */
     /*                                                    */
     /*   is a word in the dictionary.                     */

     /*   At in b.jm1 below, it is convenient for the      */
     /*   logic to have:                                   */

       b.0 = 1

     /*   Pass through this look determines if             */
     /*                                                    */
     /*        Substr( input_string, 1, i )                */
     /*                                                    */
     /*   has a feasible word segmentation:                */

       Do i = 1 To n

     /*   We preemptively set b.i = 0 and then in the      */
     /*   loop on j change the value if appropriate:       */

         b.i = 0

     /*   This loop violates the spirit of a dynamic       */
     /*   program with stages i:                           */

         Do j = i To 1 By -1
           jm1 = j - 1
           If b.jm1 = 0 Then Iterate
           word = Substr( input_string, j, i - j + 1 )
           If dict.word = 0 Then Iterate
           b.i = j
           Leave
         End

       End

     /*   Since in the loop on i above we had i = 1 To n,  */
     /*   it is essentially inevitable that here we will   */
     /*   have to work from i = n down to 1:               */

       If b.n = 0 | n = 0
         Then
           Do
             Say "There is no solution."
           End
         Else
           Do
             Say 'There is a solution:'
             k = 0
             i = n
             Do Forever
               j = b.i
               If j = 0 Then
                 Do
                   If i = 1 Then Leave
                   i = i - 1
                   Iterate
                 End
               k = k + 1
               segment.k = Substr( input_string, j, i - j + 1 )
               If j = 1 Then Leave
               i = j - 1
             End
             m = k
             Do k = m To 1 By -1
               Say Format( m - k + 1, 5) segment.k
             End
           End

kragen · on Aug 9, 2011

I just put a bottom-up reformulation of the dynamic-programming solution at http://canonical.org/~kragen/sw/inexorable-misc/wordseg.c, in the function "segment". The stages are the prefixes of s: the substrings s[0:0], s[0:1], s[0:2],... s[0:n], where n is the length of s. The state of some stage s[0:i] is a finite map from all of its segmentable prefixes s[0:j] {j∈[0,i)} to segmentations of those prefixes. (Only one segmentation per prefix.) The transition function may leave the state unchanged, or it may add a pair to the map, mapping s[0:i] to some segmentation, which it can do if ∃j:∃w∈dict: (s[0:j] || word) = s[0:i], in which case the segmentation is (state[s[0:j]], word).

(In my program, the state[] table is represented simply as a vector of integers: seglen[i] is simply the length of the last word of state[s[0:i]], or 0 if s[0:i] is not present in the finite map. This is sufficient to efficiently reconstruct a segmentation.)

This is completely different from your formulation where you're thinking about i+1, i+2, etc. It's not surprising that you think that your formulation isn't dynamic programming!

Now, this is a solution by forward induction or "bottom-up dynamic programming", and I wrote it that way because it's easier to see the mapping to dynamic programming. But if you solve the problem by backward induction or "top-down dynamic programming" instead, you may be able to solve the problem a lot more efficiently, because you can avoid computing most of the table entries. And that's what happens if you just write the recurrence out directly as a recursive function and then memoize it.

NY_Entrepreneur · on Aug 9, 2011

You may have a sufficiently tricky formulation to have a dynamic programming solution. But your states and stages are a bit strange!

Much of why we use dynamic programming is that the work at stage i needs only the work at stage i + 1 (for the backward iterations), and here in some problems we can get some huge savings in computing. Also this 'framework' does well handling uncertainty.

Yes, the usual way is to do find the solution from the end and then use it starting at the beginning. In the code I posted on this thread, I found the solution starting at the beginning and then used it by starting at the end and then printed out the words in the reverse order in which I found them.

anonymoushn · on Aug 9, 2011

Just to clarify, would you say that you cannot solve optimal matrix chain multiplication using dynamic programming, because each state depends on a non-constant number of other states? If so, what is the proper name for the commonly known algorithm used to solve optimal matrix chain multiplication, which is described in CLRS as "dynamic programming?"

NY_Entrepreneur · on Aug 9, 2011

CLRS can write what they want, but they can't rewrite what R. Bellman wrote.

Part of the problem of 'what is dynamic programming' is that if permit the 'states' to keep growing in 'complexity' with the stages, then nearly anything can be called dynamic programming. A big example is stochastic optimal control with an arbitrary exogenous stochastic process. In that case, at each stage, the state is just the whole past history of the process, and then for estimates of the future we take the conditional expectation given the history. So, to make stochastic optimal control, and discrete time stochastic dynamic programming, 'meaningful' we ask for a Markov assumption -- the past and future are conditionally independent given the present -- on the exogenous stochastic process.

The key, central point about dynamic programming is the computational savings we can get, when there is good news, from the basic recurrence relation. There at state i, get to do all the work using just the work did at state i + 1 for backward recurrence and state i - 1 for forward recurrence. If can't do this, and if the 'complexity' of the states keeps growing along with the stages, then are into essentially just arbitrary problems with nothing special, and little hope of gains, from dynamic programming.

How general? There is a big AI theme that the problem is solved: Just use dynamic programming! Alas, the state variables at each stage are the past history of the universe and, thus, a bit much to handle on Windows XP for now!

Again, once again, still again, for, what, a dozen times on this thread, the algorithm of the word segmentation problem is not really dynamic programming because at each stage need to use the results of the work not just at the last stage but at all the previous stages. Yes, can patch up the situation by saying that all the work of all the previous stages is now part of the current state, but if do that then the AI problem is also solved by dynamic programming because can just say that the current state has all the past history of the universe and then we don't have much for dynamic programming being anything different from everything.

For CLRS and "matrix chain multiplication", etc., to know if a solution algorithm is DP, the burden is on the proposer, not me. If someone describes the problem and writes out a clean description of why their solution is DP, then we can consider it.

But clearly what has happened is that essentially every algorithm in computer science with an outer loop from i = 1 to n is called DP, and that's not helpful.

eru · on Aug 8, 2011

At university we defined dynamic programming as reducing a problem to finding paths in directed acyclic grahp. By that definition, there's a simple way to map the problem to dynamic programming. And yes, it's sloppy, if you don't mention the `stages' or equivalently, how you'd (lazily) construct the graph.

NY_Entrepreneur · on Aug 9, 2011

One famous application of dynamic programming is finding shortest paths in directed acyclic graphs. Maybe should credit E. Dijkstra. As I recall, that is not regarded as the most efficient such algorithm, but I haven't considered that problem for years.

eru · on Aug 9, 2011

As far as I know Dijkstra's algorithm is the fastest path finding algorithm in general directed graphs without cycles of negative length. But you have to employ fibonacci heaps (or so) to get the best run-time.

If you have some heuristics to guide your path finding, e.g. if you are trying to find paths on a map, then you can do better than Dijkstra's algorithm. See A*.

eru · on Aug 10, 2011

P.S. There's also Bellman-Ford.

anonymoushn · on Aug 8, 2011

The article actually includes a solution that uses memoization, which is equivalent to DP.

ominous_prime · on Aug 8, 2011

I wouldn't say "is equivalent to"; rather "is a form of". While it is a dynamic programming technique, unless it's specified people tend to think of the bottom-up approach to be what's implied by "dynamic", with memoization being a special case. Yes, it's semantics, but I think that where the GP was coming from.

NY_Entrepreneur · on Aug 8, 2011

Yes, I can go along with the claim that the article uses memoization. But that 'memoization' is "equivalent to DP" is not correct, not even close to correct, really is just nonsense. What dynamic programming is has been solid, clear, and fixed all the way back to Bellman, and memoization just is not the same thing at all.

Yes, a memoization solution to the segmentation problem might work in a loop on i for i = n, n - 1, ..., 1, and the usual backward recurrence in dynamic programming does also, but that similarity does not mean that the loop is dynamic programming.

For more, dynamic programming has stages, states, state transition functions, and a basic recurrence relationship. The recurrence does the work at stage i just from the work at stage i + 1; the string problem does NOT do this. In the case of uncertainty, we also need essentially a Markov assumption,

Again, the string problem is cute, and there is at least one cute solution, but it's not dynamic programming.