Hacker News new | past | comments | ask | show | jobs | submit login
Programming Puzzles (arxiv.org)
93 points by pramodbiligiri 30 days ago | hide | past | favorite | 56 comments



A bit off-topic...

The best online programming puzzles were the ones ITA Software posted to entice people to apply for jobs there.

What was effective about them was that, reading the problem, you wanted to know what a solution would look like, and knew the only way to know was to solve it. Having solved it, you wanted to know how your solution compared to others'. The only way to find out was to send it in. Having sent it, you might get an invitation to visit, and then how could you not?

After Google bought ITA, the puzzles were up for a while longer, then seemed to vanish. If they are archived somewhere, I would like to know where.


(from the sibling archive link)

  > Roll Your Own Chat Server
  > Implement a simple standalone TCP-based chat server, using the following protocol:
  > ...
  > When implementing the server, aim for scalability and robustness. (Many submissions fail due to lack of robustness!) 
  > ...
  > Please do not use an existing networking framework
Oh man. You weren't kidding, that's high-precision nerd-sniping.


I suspect the earlier puzzles were most of the better ones.

What I sent in was the Add-a-gram (on that page, stashed behind the "Puzzle Archive" link). I sent one solution, and when they called, they admitted others were faster, so I redid it that evening ten times faster and sent that. What they never told me, even after I had been there for years, was that my second solution was faster than theirs or any others they had on file. It looks like they retired it right after I started there, maybe because I had shown it didn't actually require backtracking, which they were trying to filter for–ITA was mostly a Lisp shop, so they wanted people who would always reach for recursion first. My solutions were relentlessly iterative.

Some years after I left ITA, they posted the Bitvectors puzzle, which I had to solve, in a fever, but never sent off. (At the time I think it had a different name.) That solution also didn't require backtracking.


I just don't get how anyone can use recursion for anything serious. Especially when it can always be unrolled into a more comprehensible / representative of what's actually happening iteration.


Lots of problems have vastly simpler recursive solutions. Try putting a recursive descent parser into a loop. Plenty of tree walking/formatting/analyzing algorithms are much cleaner using recursion. Recursively walking directory structures is cleaner using recursion. Many puzzles, such as Tower of Hanoi, are cleaner with recursion. Many sort algorithms, like merge sort, quicksort, etc., are cleaner with recursion. I could find plenty more algorithms where recursion makes the solution cleaner, simpler (hence less error prone), and sometimes even faster to execute.

And some algorithms use mutual recursion to solve problems - often putting those into a non-recursive formulation results in incredibly messy code. Mutual recursion is powerful enough that F# (among others) added an entire keyword to the language simply to support it.

And since many languages do proper tail recursion, there is no penalty for using it properly.


In CS, you encounter lots of problems that are naturally solved by recursion not because such problems are especially common, but because they are favorites of CS professors.

Language parsers do a lot of recursion because the languages we invent are invented by CS people who like problems that can be solved that way, and because language itself is one that lends itself to that.

It is hard to tease out which recursions we find because professors just like it, and which ones real-world problems demand. Many commonly cited as the latter turn out, on close examination, to be the former.


>It is hard to tease out which recursions we find because professors just like it, and which ones real-world problems demand

As someone that works on real world problems all over the spectrum (google my name), I can say with extensive expertise that a significant number of problems are vastly better solved with recursion.

I didn't cite most of the list above because a textbook said so. It's because I use and design algorithms all over the map in all sorts of languages for a living, precisely for real world problems.

I take it you don't use this tool when appropriate do you?


In fact I always do.


Yet you've filled this thread with demonstrable fallacies regarding performance, overflowing stacks, and inability to see cleaner code.

Odd.


In fact, I wrote none of those. I don't know what "inability to see cleaner code" even means, if anything, so I could not have made claims about it.

But I wonder, do you really believe that overflowing one's stack is, universally, not possible?


>In fact, I wrote none of those

A quote from you: "You encounter it in the recursive case, painfully, when your stack overflows."

And we're done.


> but because they are favorites of CS professors.

I work on compilers and compilers are full of graphs. Graphs typically have a lot of algorithms that are naturally recursive. It's not because of some theoretical fetish.


But CS professors love graph problems. So, it's not so easy to say.


Manifestly, recursion is never strictly necessary, and is furthermore almost always slower, or anyway no faster, than you could do maintaining state "by hand".

But the recursive code can be much shorter, and easier to understand, because our languages have built-in apparatus to support it. They didn't always; original FORTRAN could not do recursion, and used no stack. The stack isn't free at runtime, but it's pretty cheap, and its fatal failure mode is rarely encountered nowadays.

In my work, I have always found the iterative version cleaner and faster, but that is an artifact of the kinds of problems I (and most people) solve. Having the iterative version enables more flexibility in details, because everything is explicit. E.g., escaping at the end can be much faster, and patching around errors actually possible.

Computer Science education loves to dwell on recursion far beyond its real-world usefulness, in part because CS professors carry a fondness for the Lisp of their youth, but also because arbitrarily complex problems that benefit from recursion are cheap to invent and easy to describe. In real life, recursion in any pure form rarely goes, usefully, more than three levels deep, but we like to make data structures that do.

At ITA, they did airline trip computations that would naturally recurse on the number of stops, but no one wants more stops, and time spent on possible trips with more than two is almost always wasted.

In a sense, recursion is like object-orientedness: an elegant pattern that fits many problems, but makes a poor basis for a world view or, exclusively, a language. Lisp and Smalltalk are not sidelined because of ignorance.


Ok. A fun exercise is to write a topological sort of a graph in both iterative and recursive styles.

I've used this as an interview question at least 100 times, so I've got at least 50 hours of watching people try to program this and with extremely high probability your iterative solution will be ugly and hard to follow and the recursive one very tidy. Of course, you can screw up the recursive one, but it so straightforward it almost writes itself.


I think in this particular case I'd go for the iterative one -- it's impossible to screw up (at least in the "keep track of the indegree" variant, I don't know if that's what you're thinking)

But the point you're making stands: recursion is first and foremost a thinking tool. Code as you think it need not be the same as code as you type it.


On an interview I would, of course, go straight for recursion. In production, and where performance matters, I might try both.

A more meaningful interview question would be to present a recursive topo sort and ask for the iterative rendition.


> Manifestly, recursion is never strictly necessary, and is furthermore almost always slower, or anyway no faster, than you could do maintaining state "by hand".

Recursion is required in procedures like Ackermann's Function [1]. You can still simulate or write functions that avoid the syntactic fact that the procedure definition refers (either directly or indirectly) to itself, but you'll always be papering over the fact that it cannot be implemented with a fixed number of state variables.

[1] https://en.wikipedia.org/wiki/Ackermann_function


> Recursion is required to implement non-primitive recursive procedures

This is very untrue. While loop and a stack.

> cannot be implemented with a fixed number of state variables

Of course it can't, any function outside DSPACE(O(1)) can't. The fact that you have an implicit call stack doesn't change the space complexity.


> This is very untrue. While loop and a stack.

>> You can still simulate

> any function outside DSPACE(O(1)) can't

The space complexity of the algorithm is only incidentally related to the number of unique variables required to represent it's state.

You said it best:

> The fact that you have an implicit call stack doesn't change the space complexity


As I said, and as you admit, contradicting yourself: "You can still simulate..."

We could better say that recursion is papering over a fundamental fact that the iterative form reveals. You encounter it in the recursive case, painfully, when your stack overflows. Assuming infinite memory is the fatal flaw in most modern systems.

Ackermann's function never appears in the real world.


>You encounter it in the recursive case, painfully, when your stack overflows.

People who write high performance code don't worry, because they write the recursion properly, and tail recursion doesn't use the stack.

If your understanding of recursion leads to stack overflows then you're missing a significant tool in your toolbox.

There is no more need to overflow a stack using recursion than there is to overflow a stack or heap trying to store state when unwinding a recursive function into an iterative one.

However, the recursive solution is vastly simpler in many cases since you don't have to fiddle with state, which is mutable and error prone, and can instead use immutable concepts (in any language) to write correct code more quickly with zero penalty in execution speed or space. It makes you a more valuable employee to learn these techniques.


Have you heard of this Turing fellow? He produced a result according to which a function such as Ackermann can be calculated with a read-write head moving over an unlimited tape. No recursion there.


I believe you can fairly easily spot recursive search tree implementations in the wild, in production code.

Compilers and interpreters often use recursion, and that goes to as many levels as the program requires.

Garbage collectors that use recursion to traverse the reachable objects are not unheard of.

Reference counting smart pointers are recursive. E.g. when we call object->dropReference(), and the refcount goes to zero, that may trigger otherObject->dropReference().

In C++, recursion is built into the language; e.g. destroying an object can trigger a cascade of recursive destructor calls. The object may have members which are smart pointers; those get destroyed, and may cause their referenced objects to get recursively destroyed.

Have you heard of a "recursive descent parser"? GNU C++ uses one (a huge source file written in C++, well over a megabyte long). This will recurse as deeply as the program's nesting goes; C++ programs often go to more than three levels of nesting. (There are some non-recursive hacks mixed in there, like some operator precedence parsing involving an explicit stack: Shunting-Yard or similar?)

https://github.com/gcc-mirror/gcc/blob/master/gcc/cp/parser....

Let's switch over to embedded. Have you heard of BusyBox? BusyBox provides scaled down system utilities for embedded systems. It is very widely used.

BusyBox's "libb" internal library contains a function called "recursive_action" for walking file system trees. This is actually recursive, and frequently goes more than three levels deep in actual use:

https://github.com/brgl/busybox/blob/master/libbb/recursive_...

This is used by BusyBox programs like mdev (udev replacement) lsusb, lspci, chmod, netstat, grep, tar, ...

Also, HN isn't a good place to exhibit Lisp condescension/ignorance.


You seem to be pretending that I said recursion was never useful. I can assure you I have never said any such thing.


I would never in a million years have guessed that anybody believed file system trees appear in nature. Or programming languages.


> In a sense, recursion is like object-orientedness: an elegant pattern that fits many problems, but makes a poor basis for a world view or, exclusively, a language. Lisp and Smalltalk are not sidelined because of ignorance.

In fact, in this case, the specific ignorance that Lisp isn't a "all recursion, all the time, everywhere" language family. It includes multi-paradigm languages.


You are welcome to your opinion as to why Lisp is sidelined.

There could be many reasons, such as dislike for its syntax. But that is not ignorance, just de gustibus


What I'm saying is that: the specific hypothesis that Lisp is brushed aside by some people because it has nothing to offer for repeated calculation other than the technique of recursion is necessarily a hypothesis about ignorant people. The hypothetical people hold an incorrect belief that is easy to dispel with readily available facts, which is a good working definition of ignorance. Other hypotheses may involve purely de gustibus matters; that one doesn't.

(Your wording strongly suggests that you not only believe the hypothesis but that you don't find those people ignorant.)


Okay, here's the very recent problem I had to solve. There is a huge dump of a VM heap, represented as a key-value map: the values are either simple atomic values or compound structures that have addresses (keys into this map) as their parts. There is a list of "entry points" into the heap: addresses of the GC roots in the heap. The problem is to reconstruct the values stored in the heap. E.g., reconstructing starting from "heap(10)" in this heap:

    heap(10): cons(int(1),heap(18))
    heap(18): cons(int(2),heap(20))
    heap(20): cons(int(3),nil)
should result in

   [1, 2, 3]
Here's the thing: recursive walk is totally straightforward:

    def restore(v):
        if v.kind...
        elif v.kind == 'tuple':
            return Tuple([restore(v[i]) for i in range(v.arity)])
        elif v.kind == 'cons':
            return List(restore(v[0]), restore(v[1]))
        elif v.kind...
        else:
            return UnknownValue()
But since Python has a limit of 1000 nested function calls, restoration of long lists fails with StackOverflow, and so I had to rewrite it using manual stack. The result is sort of ugly, in fact, I had to use two stacks: one for temporary reconstructed pieces of data, one for "what to do next". Essentially it's implementing a variation of the shunting yard algorithm and while the resulting code is kinda clear in intent, it's still three times as long as the recursive version and is not obviously correct. In fact, I had to tweak it several times (including the introduction of the second stack) before I stopped seeing some (not all!) lists reconstructed in the reverse order.


If you are used to Lisp-like constructs, tail recursion can give solutions that are both idiomatic and efficient.

[0] https://en.wikipedia.org/wiki/Tail_call


I.e., the compiler converts the recursive code you wrote into the iterative form you could have written, and runs that instead because it's faster.


I find this argument interesting because, to me, it leads to endorsing a view that we should be using goto and conditional jumps instead of for/while loops and similar structured programming elements. I mean, the former are more explicit, the latter implicit, since the compiler converts the latter into the former. Having maintained a fair amount of unstructured code, I'd rather stick with the higher level languages since they allow me to express my intent more clearly. Which pushes me to using recursion where it's appropriate and the language has appropriate facilities to handle tail call elimination.


We make languages and compilers the way we do for reasons.

But it is a mistake to confuse what automation hides from you, for practical reasons, with fundamental qualities of the universe or of computation.

Another common example of the error is describing an algorithm as N log N, assuming that the numbers involved occupy constant space and multiply in constant time. In real-world computation, space tends to log N, and multiply in close to (log N)^2 time and power consumption. We get away with it for the same reason recursion works: the problems we apply our algorithms to usually fit in conveniently constrained space, and our multiplier hardware takes the same time and power for small numbers as for the biggest ones we can conveniently represent.


Which would you prefer?

  def msort(xs, lo=None, hi=None):
    if lo is None:
      lo = 0
      hi = len(xs)
    if hi-lo < 2:
      return
    mid = lo + (hi-lo) // 2
    msort(xs, lo, mid)
    msort(xs, mid, hi)
    merge(xs, lo, mid, hi)

  def msort(xs):
    s = [[0, len(xs) // 2, len(xs), "step 1"]]
    while s:
      lo, mid, hi, step = xs[-1]
      if step == "step 1":
        if hi-lo < 2:
          s.pop()
        else:
          s[-1][3] = "step 2"
          s.append([lo, lo + (mid-lo) // 2, mid, "step 1"])
      elif step == "step 2":
        s[-1][3] = "step 3"
        s.append([mid, mid + (hi-mid) // 2, hi, "step 1"])
      else:
        s.pop()
        merge(xs, lo, mid, hi)


The one that doesn't blow up the stack.


Because you get a stack for free.

It's very natural for depth first search of a tree.


Unpleasant fact best learned early: the stack is not free.

The "very natural" appearance is an artifact of the languages we have (very unnaturally) constructed.


The machine stack given by modern CPUs is blazingly fast, because microarchitectures typically have a return stack buffer that allows the processor to perfectly branch-predict returns. Manually coding recursive algorithms means you have to switch over the return state, which involves a different predictor that will be less accurate.

Given that, and machines typically have an architectural stack pointer register with dedicated instructions to push and pop to it, the manual datastructure you come up with to simulate a stack is almost certainly going to be slower.


It is always an error to assume you know which of any architectural alternatives will be faster. You have to measure, if it matters.


Please do. This is my prediction based on years and years of optimizing machine code and measuring on recent microarchitectures. I look forward to seeing your handwritten assembly proving your overgeneralized claims about recursion being inherently slower.


I did not, in fact, claim that recursion is inherently slower than iteration. You are welcome to read what I did write, above.

Usually it is considered better (hint, hint) to do that before replying.


You wrote: "the stack is not free."

Which is even more vague (and broad) than what I was talking about. Given that we are talking about converting recursive algorithms into iterative versions that nevertheless use a stack datastructure, the comparison between the machine stack (used for recursion) and a program-defined stack is a perfectly appropriate criteria to evaluate your vague statement. I gave specific microarchitectural reasons why I predict the machine stack will be faster and you responded (a fairly reasonable response) that we should measure. So please do. I'm not the one making broad claims so the burden of proof still lies on you.


It is a pure fact that the stack is not free. That it is heavily optimized is itself proof of the fact; you don't optimize what is already free.

And, especially for heavily optimized features, you can only meaningfully measure specific cases, so generalizing is generally wrong. Which was the point of my remark.


Like any other tool? If it's useful, use it, if it's not, don't. In many graph problems and parsing problems, it's useful, because it lets you express your solution more "naturally" with respect to the underlying domain. That's where I've used it seriously.


If you don't get that then I have bad news for you: it's a powerful technique that oftentimes makes code shorter, more elegant, and improves readability.


Mmm, I remember doing this, with select(3) for maximum portability and all... then learning that neither Linux nor Windows actually follow what POSIX says select(3) should do, e.g. how to report a socket with non-blocking connect(3) operation when it finally fails to connect: it should be reported as readable (reading from it would fail immediately and would not block), writeable (same reason), having exceptional condition pending (it has a pending error).

Nope, Linux reports it as readable/writeable but not having exceptional condition, while Windows reports it as only having exceptional condition pending but not as readable nor writeable. Beautiful, just beautiful. In the end you have to put sockets in such scenarios into all three sets and check what getsockopt(s, SOL_SOCKET, SO_ERROR) returns for them, that's the portable way.



BTW, I got one of my best interview questions from a problem I encountered there.

Undergraduates solve it in two minutes; people with a BS need 5; MS 20; PhD never, or as fast as an undergraduate. The iterative solution is much simpler than a recursive one, so the question identifies how well-tuned your instincts are to problem spaces. If you go straight for recursion, you'll be there all day.

I found that everybody who found the minimal solution used the same hand gesture while describing it.


What is the question?


That would be telling.

I always recited the statistics at the outset, so that the interviewee would know how much complexity to expect in the answer. It never seemed to help them.



OT but still vaguely regarding puzzles - Jane Street has a pretty satisfying puzzle every month:

https://www.janestreet.com/puzzles/


> Based on a small user study, we find puzzle difficulty to correlate between human programmers and the baseline AI solvers.

Now that we have AI to solve "LC type of questions", can we stop asking candidates to invert a linked list in an interview.


Alternatively: ask them to code it, and hire the ones who refuse.


I, for one, welcome our new AI overlords




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: