

Ask HN: Compressing sets of rectangles... help me find an algorithm? - gruseom

I'm looking for a fast way to do the following: given a set of rectangles in the plane, return a minimal set of rectangles that covers exactly the same space.

By "minimal" I mean "having the fewest number of rectangles". However, it's more important that the algorithm be fast, so it's ok to approximate the smallest set if that would speed things up.

A variation, if that's too easy, is to introduce color: given a set of colored rectangles, return a minimal set of rectangles that covers exactly the same space, with the restriction that no rectangle can include regions that have different colors.

This feels like the kind of thing there must be standard work on (image compression?), so I've probably been searching in the wrong places. Do any of you guys have ideas about the problem and/or pointers about where to find relevant work? Thanks!
======
keefe
The first question I have is : am I correct to infer that the set of
rectangles must be a subset of the original rectangles? In other words, it is
not sufficient to simply generate an arbitrary set of rectangles.

Finding a minimum set of rectangles to cover a particular space can be a
tricky problem. I worked on rectangularization of dose curves in radiosurgery,
and under certain (restrictive) error conditions, finding a minimum set of
rectangles to cover an x-monotone curve can actually reduce to an NP-complete
problem (k-MCST). I suspect your error cases are not that restrictive.

Just off the top of my head, the first thing this reminds me of is the convex
hull problem as described here <http://en.wikipedia.org/wiki/Convex_hull>,
where the gift wrapping algorithm is a pretty good solution. If you take each
corner point of a rectangle and each intersection point of any two sides of a
rectangle as an input to the convex hull, this will generate a set of points
on the boundary which are required for your covering. If original rectangles
are required, any rectangle with a point in the convex hull is required. This
generates a set of required rectangles; you could then remove these from your
set and repeat, maybe? Similarly, rectangles incident on holes in the polygon
formed by the gift wrapping algorithm would be required. I think that's not
exactly right, but it is a general approach that should work with some
tweaks.

If original rectangles are not required, compute the convex hull of corner
points and intersections and include any holes. Then you have a polygon that
you need to decompose into rectangles, which is a well-known problem.

This also bears some similarity to the skyline problem
<http://acm.uva.es/p/v1/105.html> which uses a plane sweeping technique that
is very similar to the gift wrapping approach for convex hull.

I'm not really sure I understand the color restriction, that strikes me as
potentially complicating because of the similarity to graph coloring...

OK I could probably keep running my mouth about this kind of topic all day so
I am going to get back to bug fixing... if that is at all helpful and you need
more detail I can try to provide it

EDIT: <http://en.wikipedia.org/wiki/Convex_hull_algorithms> There's a list of
algorithms for finding the hull; perhaps a modification of the divide and
conquer technique
<http://www.cse.ohio-state.edu/~gurari/course/cse693s04/cse693s04su85.html>
would let you get your coloring constraints in? OK, I have to get out of this
thread and back to work (:

~~~
gruseom
Original rectangles are not required. What's important is only that the new
rectangles cover the exact same portions of the plane as the original
rectangles. This rules out computing the convex hull and then decomposing,
because that would fill in any holes, i.e. the new rectangles would be
covering space that the old rectangles didn't cover, which won't work.

Maybe I can explain this more fully. We are doing computationally-intensive
processing of different regions of the plane. The order in which the regions
get processed is unpredictable. We need to quickly, precisely and compactly
answer the question: "which regions have been processed so far"?

Suppose you hire a guy to paint your wall. He's very anal (or, let's be modern
about it, has Asperger's) and only ever paints in axis-parallel rectangles.
He's easily distracted and liable to paint rectangles in arbitrary locations,
so the process doesn't proceed in any linear way, but he does cover the wall
eventually. Paint is expensive, so you instruct him never to paint over the
same area twice. Our problem is to efficiently answer the question, "What
portions of the wall have been painted so far?"

Obviously we could just keep a list and add each new rectangle as it gets
painted. But as the wall gets filled in, many painted areas will blend
together. We'd like to keep the list as small as possible, i.e. we want the
painted-so-far rectangles to be as large as possible. If we didn't care about
that, we could just divide the wall into pixels and track the state of each.

It's not a disaster if we paint a few spots twice, but it's really bad to mark
an area as painted if it hasn't been. That's why we can't ignore holes in the
bounding polygon, i.e. space between painted rectangles. (Edit: actually no,
it is bad if we paint the same spot twice. Imagine that the paint explodes if
you do that. :))

The painting analogy can be extended to include color. In the original
problem, we're painting the wall with a single color and all we care about is
painted vs. not-painted. But if we introduce multiple colors, then we want to
track not just "which areas are painted" but "which areas are painted blue and
which areas are painted yellow". We still want to answer that question with as
few rectangles as possible, but now we're not allowed to merge blue areas with
yellow areas. That being said, a non-color solution would get us a long way.

NP-complete problems are found everywhere in this space, no question about it.
But there are also often useful results around approximations and heuristics.
We don't need the optimal set, just small enough to make what we're doing
fast.

Edit: the above also sheds light on why spatial indexing structures like
R-trees don't work for this out of the box (though maybe they can be adapted
to do so). These structures let you add rectangles and find them again. But
they don't merge rectangles. If our painter takes a fancy to one-centimeter
squares and decides to paint a thousand of them all in a row, an R-tree would
end up with a thousand rectangles, even though the total painted area is a
single n-by-1 rectangle at every step. Something like an R-tree with a
"compaction" phase would be close to what we need, but maybe not close enough,
because insertion into an R-tree is expensive.
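
A minimal sketch of the kind of "compaction" described above, assuming non-overlapping, axis-parallel rectangles with integer coordinates. The `Rect` tuple layout and the names `try_merge` and `coalesce` are made up for illustration. Two rectangles are merged only when they have the same color and share a full edge, so their union is itself a rectangle. This is a naive quadratic pass, not a minimal-cover algorithm, and it does not address the cost of finding neighbors quickly.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (x0, y0, x1, y1, color), covering [x0, x1) x [y0, y1).
Rect = Tuple[int, int, int, int, str]


def try_merge(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the union of a and b if that union is a rectangle, else None."""
    ax0, ay0, ax1, ay1, ac = a
    bx0, by0, bx1, by1, bc = b
    if ac != bc:
        return None  # never merge regions of different colors
    # Same vertical extent, touching along a left/right edge.
    if (ay0, ay1) == (by0, by1) and (ax1 == bx0 or bx1 == ax0):
        return (min(ax0, bx0), ay0, max(ax1, bx1), ay1, ac)
    # Same horizontal extent, touching along a top/bottom edge.
    if (ax0, ax1) == (bx0, bx1) and (ay1 == by0 or by1 == ay0):
        return (ax0, min(ay0, by0), ax1, max(ay1, by1), ac)
    return None


def coalesce(rects: List[Rect]) -> List[Rect]:
    """Merge pairs repeatedly until no pair can be merged."""
    out = list(rects)
    merged = True
    while merged:
        merged = False
        for i in range(len(out)):
            for j in range(i + 1, len(out)):
                m = try_merge(out[i], out[j])
                if m is not None:
                    out[i] = m      # replace the pair with its union
                    del out[j]
                    merged = True
                    break
            if merged:
                break
    return out


# The thousand unit squares in a row collapse to a single rectangle.
row = [(i, 0, i + 1, 1, "blue") for i in range(1000)]
compacted = coalesce(row)
```

Because the result depends on merge order, this can get stuck in a non-minimal arrangement; it only illustrates the merge test itself.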

~~~
keefe
I'm writing this between moments at my job, so it is a bit rambling... I end up
talking about similar problems I've worked on, suggest an interface, and then
in the end address the particular problem and what modifications of convex
hull I think could be used to solve the original problem we were discussing,
including holes. Before I get into all that, I want to point at this question
on StackOverflow dealing with triangulation of a polygon with holes
<http://stackoverflow.com/questions/406301/polygon-triangulation-with-holes>
and point out the typical algorithm used in triangulation,
<http://en.wikipedia.org/wiki/Delaunay_triangulation>. This paper also looks
promising but I haven't read it:
<http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WG3-4BVPRY5-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=8cf8e1775c4672a08acda0eaab0ab695>

This is relatively similar to the problem I worked on, where a dose curve had
to be decomposed into rectangles to minimize dose delivery time. A paper on
shape rectangularization from my advisor that addresses this is here
<http://www.springerlink.com/content/t2555274j2k16641/>, but I suspect other
resources will be more immediately useful. The constraints were more
interesting there: the patient is represented as a set of voxels, each with a
prescription and a tolerance, so improper decomposition could lead to
exceeding the tolerance of a critical structure. But I digress...

My understanding of this is that what you would like to have in your system is
a data structure representing an arbitrary, complex polygon derived from the
union of a set of rectangles, such that the data structure can be queried with
some rectangle and answer with another set of rectangles that indicates the
unpainted portion of the query rectangle. So something like:

    PaintHistory {

        // returns a polygon or set of rectangles indicating
        // regions to be painted
        Rectangle[] findUnpaintedSubregion(Rectangle);

        // adds the given rectangle to the internal representation
        // of the painted area
        void paintRectangle(Rectangle);

    }

So, you would want to query findUnpaintedSubregion as fast as possible and
receive your set of rectangles to paint, which could then be added with
paintRectangle. I assume you already have such an interface that you are
currently backing with some naive scheme.
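
For concreteness, here is one possible naive backing for that interface, sketched in Python. It assumes integer, half-open, axis-parallel rectangles stored as `(x0, y0, x1, y1)` tuples; the helper `subtract` and the snake_case method names are illustrative, not from any library. It keeps a flat list and does a linear scan per query, so it shows the semantics only, not a fast structure.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), covering [x0, x1) x [y0, y1)


def subtract(r: Rect, p: Rect) -> List[Rect]:
    """Return the part of r not covered by p as up to four disjoint rectangles."""
    rx0, ry0, rx1, ry1 = r
    px0, py0, px1, py1 = p
    # No overlap at all: r survives whole.
    if px0 >= rx1 or px1 <= rx0 or py0 >= ry1 or py1 <= ry0:
        return [r]
    out: List[Rect] = []
    if ry0 < py0:                      # full-width strip on the low-y side of p
        out.append((rx0, ry0, rx1, py0))
    if py1 < ry1:                      # full-width strip on the high-y side of p
        out.append((rx0, py1, rx1, ry1))
    y_lo, y_hi = max(ry0, py0), min(ry1, py1)
    if rx0 < px0:                      # strip on the low-x side, within p's y-range
        out.append((rx0, y_lo, px0, y_hi))
    if px1 < rx1:                      # strip on the high-x side, within p's y-range
        out.append((px1, y_lo, rx1, y_hi))
    return out


class PaintHistory:
    def __init__(self) -> None:
        self.painted: List[Rect] = []  # disjoint rectangles painted so far

    def find_unpainted_subregion(self, query: Rect) -> List[Rect]:
        """Carve every painted rectangle out of the query, one at a time."""
        pieces = [query]
        for p in self.painted:
            pieces = [piece for q in pieces for piece in subtract(q, p)]
        return pieces

    def paint_rectangle(self, rect: Rect) -> None:
        """Record only the new parts, so stored rectangles never overlap."""
        self.painted.extend(self.find_unpainted_subregion(rect))


history = PaintHistory()
history.paint_rectangle((0, 0, 2, 2))
todo = history.find_unpainted_subregion((1, 1, 3, 3))
```

The stored list only ever grows here; nothing merges adjacent rectangles, which is exactly the compaction problem the thread is about.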

I think building such a structure could involve all kinds of clever data
structures depending on your exact constraints. If the paint really "blows up",
then it doesn't seem approximations are appropriate, but I am reminded of the
Barnes-Hut algorithm (for the N-body problem,
<http://www.cs.berkeley.edu/~demmel/cs267/lecture26/lecture26.html>)
and its use of the quadtree data structure to divide up the plane
<http://en.wikipedia.org/wiki/Quadtree>, but I am not sure if that is useful
in this particular case.

Originally, we were talking about the related problem of decomposing the
union of a set of rectangles into a set of non-overlapping rectangles
that completely cover the area covered by the original set of rectangles and
no more. I think solving this problem efficiently (in O(n log n)) would suggest
a data structure that could give more efficient real-time querying.

"...This rules out computing the convex hull and then decomposing, because
that would fill in any holes..."

I think all this means is that the convex hull is not sufficient in itself. If
you have the points in the convex hull and a set of points representing each
hole, then rectangularization should be easy because the convex hull should be
rectilinear.

What about this algorithm :

1) Compute the set of points which are intersections of rectangles or their
corner points. Here is a fast algorithm for this:
<http://en.wikipedia.org/wiki/Bentley%E2%80%93Ottmann_algorithm>

2) Compute the convex hull of these points to give a containing polygon.

3) Compute the set of all polygons that represent holes in the convex hull
polygon. I think this is a well studied problem, but you can imagine a number
of naive solutions and then optimize them. For example, any polygon
representing such a hole must be composed of corner points or intersection
points, which are in the set we already identified. Do a plane sweep in
increasing x direction and parse through the source rectangles, discarding
points covered by other rectangles and points on the convex hull. This could
probably be done as a slight modification of step 1. Definitely a tricky part,
but I think a solvable problem...

4) At this point you have a set of points representing the rectilinear polygon
that is a convex hull and you have a set of points representing each
rectilinear polygonal hole in the convex hull. So, now you just have to
decompose this complex polygon into rectangles - I think if you just draw
vertical and horizontal lines at each point and compute intersections, that
should lead to a rectangularization with minimal cleanup and you are done.
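
The "draw vertical and horizontal lines at each point" idea in step 4 can be sketched directly on the input rectangles, skipping the hull and hole steps. This is an illustration under assumptions: integer half-open rectangles as `(x0, y0, x1, y1)`, a hypothetical function name `decompose`, and a deliberately naive coverage test. Every rectangle edge becomes a grid line, so each grid cell is either fully covered or fully empty; covered cells are joined into horizontal runs, and a run is extended upward when the identical run appears in the next row. Holes are preserved because empty cells are never emitted.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), covering [x0, x1) x [y0, y1)


def decompose(rects: List[Rect]) -> List[Rect]:
    """Cover exactly the union of rects with disjoint rectangles (not minimal)."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})

    def covered(cx0: int, cy0: int, cx1: int, cy1: int) -> bool:
        # A grid cell lies entirely inside or entirely outside each rectangle.
        return any(r[0] <= cx0 and cx1 <= r[2] and r[1] <= cy0 and cy1 <= r[3]
                   for r in rects)

    out: List[Rect] = []
    open_runs: Dict[Tuple[int, int], int] = {}  # run -> index in out, from previous row
    for y0, y1 in zip(ys, ys[1:]):
        # Collect maximal horizontal runs of covered cells in this row.
        runs: List[Tuple[int, int]] = []
        start = None
        for x0, x1 in zip(xs, xs[1:]):
            if covered(x0, y0, x1, y1):
                if start is None:
                    start = x0
            elif start is not None:
                runs.append((start, x0))
                start = None
        if start is not None:
            runs.append((start, xs[-1]))

        next_open: Dict[Tuple[int, int], int] = {}
        for run in runs:
            if run in open_runs:
                i = open_runs[run]
                rx0, ry0, rx1, _ = out[i]
                out[i] = (rx0, ry0, rx1, y1)   # same run as the row before: extend it
            else:
                i = len(out)
                out.append((run[0], y0, run[1], y1))
            next_open[run] = i
        open_runs = next_open
    return out


# A square frame around a one-unit hole at (1, 1)-(2, 2).
frame = [(0, 0, 3, 1), (0, 2, 3, 3), (0, 0, 1, 3), (2, 0, 3, 3)]
pieces = decompose(frame)
```

The coverage test scans every rectangle for every cell, so this is far from O(n log n); a sweep line with an interval structure would be the usual way to speed it up.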

I hope some of that is helpful, certainly not a link to a complete solution
but all that comes to the top of my head now....

~~~
jibiki
> If you have the points in the convex hull and a set of points representing
> each hole, then rectangularization should be easy because the convex hull
> should be rectilinear.

I don't follow.

    
    
        aa
        aa  
      bbbbbb
      bbbbbb
        cc
        cc
    

Convex hull is an octagon (with no right angles)?
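
This can be checked numerically. The sketch below assigns assumed coordinates to the picture (each character is one unit wide, each pair of rows is one rectangle two units tall) and runs Andrew's monotone chain, a standard convex hull algorithm, on the twelve corner points. The coordinates and variable names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Pop while the last two points and p do not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Assumed coordinates for rectangles a, b, c as (x0, y0, x1, y1).
rects = [(2, 4, 4, 6), (0, 2, 6, 4), (2, 0, 4, 2)]
corners = [(x, y) for x0, y0, x1, y1 in rects for x in (x0, x1) for y in (y0, y1)]
hull = convex_hull(corners)

# Count hull edges that are neither horizontal nor vertical.
edges = list(zip(hull, hull[1:] + hull[:1]))
diagonal = [(p, q) for p, q in edges if p[0] != q[0] and p[1] != q[1]]
```

With these coordinates the hull has eight vertices and four of its eight edges are diagonal, so it is not a rectilinear polygon.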

~~~
keefe
Can you just modify the gift wrapping algorithm such that the next point
selected must be on a straight line from the previous point?

<http://en.wikipedia.org/wiki/Orthogonal_convex_hull> suggests it is still
doable in O(n log n), but I totally overlooked this modification in my post,
my bad...

~~~
jibiki
> <http://en.wikipedia.org/wiki/Orthogonal_convex_hull>

Ah, that's what I was missing, thanks.

------
frossie
If I understood the question correctly... you want to Google for "minimum
rectangular covering", e.g.

<http://www.springerlink.com/content/d707j4x7107qq362/>

Disclaimer: IANAMathematician

~~~
gruseom
That search term digs up some stuff I hadn't seen before that might be
applicable. The particular paper you mentioned starts with a polygon rather
than a set of rectangles, though.

~~~
frossie
But a set of rectangles outlines a polygon, right? You pack them however they
are and draw between exterior vertices. Voila, polygon.

~~~
gruseom
Spaces between rectangles make holes in the polygon. We can't cover any space
that wasn't already covered.

------
jdoliner
Well, if you're willing to sacrifice a bit of speed, I think I have a pretty
good heuristic algorithm you can use. Step 1: Find a maximal point; by this I
mean a covered point `p' s.t. no other covered point has x >= p.x and
y >= p.y (such a point clearly exists). Step 2: Now it shouldn't be too hard
to see that for a rectangle to cover `p' it would have to have `p' as one of
its corners, so we set one corner as `p' and then grow the other corner until
it runs into walls.

This is just off the top of my head so I haven't really done much analysis.
You could make it a bit slower and actually get the rectangle that takes up
the most area instead of just growing until you hit walls. Not sure how much
that actually improves it.
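
One way to read the two steps, sketched on a grid of unit cells rather than on arbitrary rectangles (a simplifying assumption; the name `greedy_cover` and the cell representation are hypothetical). The maximal cell is taken as the one with the highest y and, among those, the highest x. The rectangle then grows left along that row and downward for as long as the whole row span is still uncovered.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]            # (x, y) of a painted unit cell
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), covering [x0, x1) x [y0, y1)


def greedy_cover(cells: Set[Cell]) -> List[Rect]:
    """Partition the painted cells into rectangles, one maximal corner at a time."""
    remaining = set(cells)
    out: List[Rect] = []
    while remaining:
        # Step 1: a maximal cell (no remaining cell is both above-or-level and right-or-level).
        px, py = max(remaining, key=lambda c: (c[1], c[0]))
        # Step 2: grow left along the top row until hitting a wall.
        x0 = px
        while (x0 - 1, py) in remaining:
            x0 -= 1
        # Then grow downward while the entire row span is still available.
        y0 = py
        while all((x, y0 - 1) in remaining for x in range(x0, px + 1)):
            y0 -= 1
        # Remove the cells just covered so later rectangles never overlap this one.
        for x in range(x0, px + 1):
            for y in range(y0, py + 1):
                remaining.remove((x, y))
        out.append((x0, y0, px + 1, py + 1))
    return out


# A plus shape: one cell on top, a row of three, one cell at the bottom.
plus = {(1, 2), (0, 1), (1, 1), (2, 1), (1, 0)}
cover = greedy_cover(plus)
```

The output covers exactly the input cells, but growing left before growing down is an arbitrary choice and can produce more rectangles than necessary on other shapes.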

~~~
gruseom
This raises the question of exactly how to grow the rectangles, i.e. how to
pick out the ones that have adjacent space to consume, what to do with the
leftovers after a merge (since the union of two rectangles may not be a
rectangle) and so on. It's too slow to examine all the rectangles every time
you want to answer one of these questions, so you need a way of arranging
them. I've written code that is vaguely similar to what you suggest (but is
too slow). It sorts the rectangles in one x-coordinate and then one
y-coordinate, which still leaves a lot of searching to do. Perhaps keeping
four sorted lists of the rectangles would help, but that's costly too, since
as you coalesce rectangles you need to insert and delete from the set.

------
tlb
I think Cairo, the graphics rendering library, keeps an internal data structure
that's the set of overlapping filled regions and updates it with and/or/xor
operators.

~~~
gruseom
Thanks. Perhaps I'll ask about this on their mailing list.

------
mikhailfranco
Search for R-Tree and the many variants used for spatial indexing. You will
find the papers very pragmatic about the tradeoffs between optimal coverage
and speed of creation, update and traversal. Parallelizing with transactional
guarantees is still a bit of a research topic. It would be interesting to do
some experimental implementations in, say, Erlang.

~~~
gruseom
We've actually been deep into R-trees, and I wrote an Rstar-tree recently, but
the problem posted above, of coalescing adjacent rectangles that have some
sort of affinity (e.g. same color), doesn't appear in the R-tree literature
I've read so far. Probably you could do it using neighbor queries and
remove/reinsert, but this is probably too slow. If you have any further
suggestions along these lines, do please post them.

