
The skyline problem - brianpgordon
https://briangordon.github.io/2014/08/the-skyline-problem.html
======
bnegreve
This is an instance of finding the Pareto front [1] of a set of points in a
two dimensional space.

This problem is also known as " _skyline queries_ "!

According to [2], author's bound O(n log(n)) is the best bound for d=2.

[1]
[http://en.wikipedia.org/wiki/Pareto_efficiency](http://en.wikipedia.org/wiki/Pareto_efficiency)
[2]
[http://pdf.aminer.org/000/211/201/on_the_computational_compl...](http://pdf.aminer.org/000/211/201/on_the_computational_complexity_of_finding_the_maxima_of_a.pdf)

------
infinotize
To interviewers who have given this problem: how do candidates typically
perform? I would expect a lot of people to come up with some variant of an
O(n^2) solution on the spot, but probably not get much further without some
help, in an interview where you realistically have ~30 min to solve a problem.

Of course, this being HN, there are already several comments stating how the
optimal solution is obvious.

~~~
drostie
I'm also interested in how this varies by different languages.

For example, a programmer decently knowledgeable in JS will probably start
with the "pre-data-structure-change" solution presented in the article,
because if you're pretty good with JS then you know how events work and you
probably start with, "okay, I'm going to scan left to right and emit an event
on each new edge; that will look like this."

    
    
        var events = [];
        buildings.forEach(function (x) {
            events.push([x.l, 'add', x.h]);
            events.push([x.r, 'rem', x.h]);
        });
        events.sort(function (x, y) { 
            return x[0] - y[0];
        });
    

If you're lucky, they'll then say, "okay, I now need a height-stack where I
can query the largest height, so..."

    
    
        function max(arr){
            return Math.max.apply(null, arr);
        }
        var height_stack = [], out;
        out = events.map(function (e) {
            var x = e[0], y = e[2], cmd = e[1];
            switch (cmd) {
            case "add":
                height_stack.push(y);
                return [x, max(height_stack);
            case "rem": 
                height_stack.splice(height_stack.indexOf(y), 1);
                return [x, max(height_stack)];
            default:
                throw new TypeError("No such command: " + cmd);
            }
        });
    

(There are a couple of other ways to implement max() above, depending on
style; you could use .reduce(max_fn) for example, where max_fn is Math.max
truncated to exactly two arguments.)

But I wouldn't expect them to come up with binary heaps a la
[http://eloquentjavascript.net/1st_edition/appendix2.html](http://eloquentjavascript.net/1st_edition/appendix2.html)
. I guess the best you could expect the candidate to do is to say, "then I
Google someone who has done public-domain binary heaps and steal their code."

------
mpalme
The problem es essentially to map from an indiscrete space to a discrete one.
The solution that comes into my mind is a simple sweep-line algorithm:

1\. Take all the start x-coordinates of a buildings (=start events).

2\. Take all the end x-coodinates of the buildings (=end events).

3\. Sort all the events the by the value of the coordinate.

4\. Now walk over all the events and keep count of how many there currently
are (or in this case: how high the building is) by adding one for a start
event and substracting one for a end event.

[http://en.wikipedia.org/wiki/Sweep_line_algorithm](http://en.wikipedia.org/wiki/Sweep_line_algorithm)

------
prawks
A closely analogous problem which I have had to implement is the problem of
drawing n overlapping events with varying duration on a timeline, coloring the
event based on how many are underneath it.

Personally I find that a simpler problem to visualize, however it deals with
almost the same information. Rather than what height to draw, you're keeping a
count of how many events are underneath. You increment at each leading
"critical point" and decrement at each ending one.

The problem presented here is nice because it requires thought on both the
algorithm as well as the data structure. Very nice article.

------
signa11
if we _think_ about it somewhat, it becomes apparent, that it would take
linear time to merge one building to the skyline, and also two _skylines_
together.

merging skylines together gets us more bang-for-the-buck. and is trivially
done with scanning them from left->right, match x-coordinates, adjust
heights...

all of this is just divide-and-conquer, and usual recurrence rules apply i.e.
T(n) = 2T(n/2)+O(n), which means T(n) = O(n lg n)

------
srean
I am very bad at these things when put in a spot. OTOH if you would let me
pace around for a bit and stay invisible for a few minutes then its not too
bad.

Havent checked for correctness, but my off the cuff attempt would be to sort
the left and right edges, then use a variable for the height and a stack.
Proceed left to right. If I encounter a left edge and current height is lower
than stack, push height on stack and update current height. If left edge lower
than current, just push its value on the stack. If I encounter a right edge
that matches current height pop height from stack in to current height. If
right edge lower than current height, just pop the stack. The awkward case is
when the left and right edges coincide.

~~~
prawks
This is pretty much the solution given. I've encountered a similar problem
before (draw a bunch of overlapping squares with differing z-values) and done
it similarly. The difficulty you end up having is that a stack doesn't give
you a fast way to find the maximum height (what building/square) is on top at
any given point. Hence, the article suggests a max heap.

------
brianpgordon
This is targeted toward students who could benefit from someone hand-holding
them through the design of an algorithm, but anyone can learn something. It's
always good to have another potential interview question in your repertoire.

~~~
optimusclimb
Yup. I sure solve THAT problem every day. My boss comes in and is like, "well,
we have the bugs from that security audit, we need to come up with a design
for the feature we want to try and roll out in two quarters so we can agree on
it and cut tasks/divy up work, and then we're using two many boxes for the Foo
processor. I think you've never optmized that right? Why don't you have a look
at that. Also, don't use any standard library functions or look anything up.
When you're done with that, I have a question for you about manhole covers."

Yup, another great interview question for your repertoire to weed people out
and talk about the kinds of things that come up in the day to day of being a
modern software engineer.

~~~
legutierr
Does a modern software engineer not need to demonstrate an ability to design
efficient algorithms that solve problems that haven't been encountered before?

Or, in your mind, is the solution to every problem that a modern software
engineer might encounter already found in some stackoverflow post?

~~~
peeters
> Does a modern software engineer not need to demonstrate an ability to design
> efficient algorithms that solve problems that haven't been encountered
> before?

On the job? Sure they do. In an interview room? That's not an adequate
replication of the job environment. I am not asked to design an algorithm for
a problem introduced to me 17 seconds ago and have 5 minutes to do it.

Plus, you're committing the #1 sin of interviews; that many of the people that
can answer the question in an interview setting _have_ encountered the problem
before, and that's why they can answer it.

~~~
JoeAltmaier
That answers something that's been bothering me about interview coding. The
ones who pass, are the ones who've seen the problem before. So its not a
coding test nor a creativity test; its an experience test only.

~~~
cousin_it
I've passed many algorithm interviews without knowing the answer beforehand.
It's very possible, people just don't bother to learn. Any good textbook on
algorithms will give you the tools to create new ones. Mostly it's application
of a few basic concepts, like big-O complexity, divide and conquer, or dynamic
programming. Go through 100 problems on Project Euler and this stuff will
become second nature.

------
curiousDog
A regular google interview question as well!

------
tomjen3
Why not just sort the rectangles by the left side position, then walk through
the list, drawing a line from the current position to either the right of the
next rectangle (in case that one is taller) or to the right of the current
rectangle, in case it is not?

This even makes it easy to parallize it because you can use the quick sort
algorithm to split the rectangles in two non-overlapping sets and once your
two cores have made the two parts of the skyline you only need to update the
line between them.

~~~
Retric
Widths are not fixed. A tall skinny building might be drawn over a short wide
one.

Also, you end up as N^2 degenerate case if the buildings form a pyramid.

------
sgeisenh
If you think recursively, the n log n divide and conquer solution is pretty
straightforward.

The problem becomes much more interesting when you consider optimal parallel
performance. The best span you can do is log^2 n. I believe the problem can be
reduced to sorting, though it is not required to explicitly sort the sequence
of buildings in order to achieve the desired asymptotic performance.

~~~
cperciva
_I believe the problem can be reduced to sorting_

Speaking of which: Given values 0 < _a_i_ < _A_ for some _A_ , apply the
skyline algorithm to the rectangles _a_i_ < _x_ < A, 0 < _y_ < _a_i_. The
resulting sequence of heights gives you the values _a_i_ sorted in increasing
order; thus the skyline problem cannot be solved in less than _O(n log n)_
time.

------
CharlesC
Got this as an interview question at Blizzard. Failed pretty spectacularly. I
kept getting close but then the interviewer would jump in with "Well think
about it this way...what if blah blah blah?"

------
michaelmior
My first thought was to dump the rectangles into an interval tree. Also O(n
log n), but less space efficient since it always requires O(n) space
regardless of the structure of the problem.

~~~
brianpgordon
Are you sure that it's n log n? I think that the pyramid shaped configuration
defeats this strategy by forcing the interval tree to report something on the
order of n intervals with every query. So there are n queries on the interval
tree, and each query takes log(n) + n time.

Also, something interesting to note that I was thinking about yesterday is
that you don't actually need to use an interval tree for this. You can use a
range tree instead, which is simpler (to my mind). You only need an interval
tree to do a certain kind of query which we never need to do:

[https://briangordon.github.io/images/range-
interval.png](https://briangordon.github.io/images/range-interval.png)

~~~
michaelmior
I'm not sure what you mean by the pyramid-shaped configuration. In any case, n
queries taking log(n) + n time each is O(n log n). I think you're right that a
range tree should work fine.

~~~
brianpgordon
I mean this thing from the article:

[https://briangordon.github.io/images/worstcase.svg](https://briangordon.github.io/images/worstcase.svg)

> n queries taking log(n) + n time each is O(n log n)

No, it's O(n^2).

~~~
michaelmior
Wow, clearly commenting too late. Of course you're right :)

------
thewarrior
How would you efficiently find the critical points ?

~~~
nitrogen
The critical points are given as inputs to the algorithm in the form of the X
coordinates of the rectangles' edges.

~~~
thewarrior
D'oh .. Missed that . Thanks.

------
praptak
Fun fact: this is a specific case of finding the union (usually: area of) of a
set of axes-oriented possibly overlapping rectangles.

------
frownie
is it me or the algorithm suggested at the end degenerates if the rectangles
are organized as the steps of a stair and each of them extends to the
rightmost position ? In that case it degenerates because the heap manipulation
degenerates...

~~~
brianpgordon
I don't think so. No matter what shape the problem is in, there are always
exactly n inserts and n deletes to the heap. Those are O(log n) operations,
and there are 2n of them, so it's O(n log n).

~~~
wiz21
Ahhh read too fast :-( Crucial sentence is "Fortunately, we know about a data
structure which can keep track of an active set of integer-keyed objects and
return the highest one in O(logn) time: a heap."

But then again. If I read well, he wants to 1st sort the points, 2 iterate
over them and using the heap. So we have n log n for the sort and n * n log n
for the heap use.. Basically (1+n) n log n (or i'm still tired ? :-))

~~~
brianpgordon
The uses of the heap are:

1\. Peek the root of the heap to find the tallest rectangle in the active set.
This is a constant time operation.

2\. Add a new rectangle to the active set. This is a O(log(n)) operation.

3\. Remove a rectangle from the active set. This is a O(log(n)) operation if
you use something like a hash table to locate the element in the heap.

Each rectangle is added once (at its left edge) and removed once (at its right
edge). The root of the heap is peeked on every critical point. All together,
that makes the algorithm O(n log n).

Or am _I_ missing something?

------
arikrak
The animations are cool and helpful. How were they created?

~~~
brianpgordon
I used the trial of Edge Animate:

[https://creative.adobe.com/products/animate](https://creative.adobe.com/products/animate)

It's a really interesting tool. Here are the animation source files if you
want to play around with them:

[https://dl.dropboxusercontent.com/u/14282009/skyline.zip](https://dl.dropboxusercontent.com/u/14282009/skyline.zip)

~~~
arikrak
Thanks, I'll check it out!

