

Making Everyone Go Away - tshadwell
http://www.bennjordan.com/blog/?p=590

======
michaelt
The traditional way to do this is taking a stack of aligned images and
applying a median filter:
[http://www.jnack.com/adobe/photoshop/fountain/](http://www.jnack.com/adobe/photoshop/fountain/)
which relies on each pixel, in the median case, not having a tourist in front
of it. There might be tourist destinations so busy that, over a set of photos,
certain pixels contain a tourist more often than not.

Or you can just do it on christmas day:
[http://www.ianvisits.co.uk/blog/2008/12/25/deserted-london/](http://www.ianvisits.co.uk/blog/2008/12/25/deserted-london/)

~~~
kerkeslager
Just so it's clear: the median filter way of doing this doesn't rely on
finding two images with non-overlapping people, it uses the open spaces from
all the images. As such, you don't need an exponentially growing number of
images.

You'll still run into issues with two things:

1\. Someone napping or being otherwise still throughout your photos will show
up in the finished product.

2\. Systems which are stationary but put out a lot of internal movement
(trees, video screens) will likely show up as random-colored pixels within the
range of their colors. For trees this would look like a blur. For TV screens
it would probably end up gray-ish or staticky.
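For the curious, the per-pixel median is nearly a one-liner with NumPy. A minimal sketch, assuming `frames` is a stack of already-aligned photos (the alignment step is out of scope here):

```python
import numpy as np

def median_composite(frames):
    """Per-pixel median across a stack of aligned frames.

    frames: array-like of shape (n_frames, H, W, 3), dtype uint8.
    Each output pixel takes the middle value at that position, so a
    transient tourist is rejected as long as the spot is clear in
    more than half of the frames.
    """
    stack = np.asarray(frames)
    return np.median(stack, axis=0).astype(np.uint8)
```

As the parent notes, this fails exactly where a pixel is occluded in the majority of frames (the napper, or a very busy spot).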

~~~
alephnil
This reminds me of how the first picture of a human was taken. Louis
Daguerre took pictures of busy streets in Paris, but because the photographs
had an exposure time of around 15 minutes, the streets looked empty. The
exception was a man having his shoes shined, since he had been standing
still for the period the picture was taken.

[http://petapixel.com/2010/10/27/first-ever-photograph-of-a-h...](http://petapixel.com/2010/10/27/first-ever-photograph-of-a-human-being/)

This can be considered a kind of analog filter (though would this not be an
average rather than a median filter?).

------
glimcat
This is a reasonably well known problem in video processing, often referred to
as "background extraction." It mostly amounts to running local outlier
rejection on the video frames then generating a composite image. There are
better and worse algorithms for this, but it's just noise rejection. Start
with a median filter, tweak window size and number of frames, exploit color if
desired.

Key trick is LOCAL outlier rejection. You don't take the median of the global
dataset, you take the median of a subset of frames. Then you do it to a subset
of results, and so forth until you get a pretty image. Then you can highlight
problem areas and go back and try to sample them from different data,
depending on how much you care about that. An incidental benefit of this is
that it lets you dramatically speed up the job by throwing CPU cores at it, if
that's something you care about.
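A minimal sketch of that subset-then-reduce idea, assuming NumPy and an aligned stack (the chunk size is an arbitrary choice; in practice each chunk could be handed to a separate core):

```python
import numpy as np

def hierarchical_median(frames, chunk=8):
    """Median of per-chunk medians, rather than one global median.

    Splitting the stack into chunks bounds memory per worker and lets
    each chunk be processed independently (e.g. on separate cores);
    the chunk results are then reduced with another median pass.
    """
    stack = np.asarray(frames)
    partials = [np.median(stack[i:i + chunk], axis=0)
                for i in range(0, len(stack), chunk)]
    return np.median(np.stack(partials), axis=0).astype(np.uint8)
```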

(Lots of relevant academic papers for after that.)

The problem encountered in the article is that 2100 frames at 22.4 MB per
frame napkins out to 5.8 gigapixels uncompressed. For reference, at 30
FPS that would be 70 seconds of continuous video. Using high-res stills
is going to balloon your storage cost & processing time, which has nothing to
do with the underlying problem.

A good workaround if you want a high-resolution _result_ would be to do
processing at a reduced resolution, then upscale from there. E.g. if you drop
resolution by 1/4 for processing, you could take the output and for each pixel
find the best matches in the source data and sample a larger window from those
to get a full-resolution result.
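One way that workaround might look, as a rough NumPy sketch (block-average downsampling, nearest-neighbour upscaling, and picking a single best source frame per pixel are my assumptions, not the parent's exact recipe):

```python
import numpy as np

def upscale_from_sources(frames, factor=4):
    """Estimate the background at reduced resolution, then recover
    full resolution by sampling the original frames.

    1. Downsample the stack by block averaging.
    2. Take the per-pixel median of the small stack as the background.
    3. For each full-res pixel, copy the value from the frame whose
       pixel is closest to the (nearest-neighbour upscaled) estimate.
    """
    stack = np.asarray(frames, dtype=np.float64)
    n, h, w, c = stack.shape
    small = stack.reshape(n, h // factor, factor,
                          w // factor, factor, c).mean(axis=(2, 4))
    est = np.median(small, axis=0)                    # low-res background
    est_up = est.repeat(factor, axis=0).repeat(factor, axis=1)
    dist = np.abs(stack - est_up).sum(axis=-1)        # (n, h, w) per-frame error
    best = dist.argmin(axis=0)                        # best source frame per pixel
    return np.take_along_axis(
        stack, best[None, :, :, None], axis=0)[0].astype(np.uint8)
```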

Or you could use one of those nifty video super-resolution algorithms that
have been popular in recent computer vision papers. Depending on what you
chose to do when you captured the data, and what you feel like implementing.

Times Square is still problematic, mainly because it has persistent crowds
during many times of the day. People move out of the way, crowds don't unless
there are gaps (which may be inserted due to e.g. stop lights, transit arrival
times). Best advice there is to catch it when the traffic is less dense, or
when it's disrupted (movie filming, accident, random variation).

~~~
lloeki
'video' makes me wonder if there's a way to 'hack' an encoder (h264/avc?) to
use P- and B-frame information to mark moving areas (that's what they're
supposed to be good at), so that "only" still data (I-frame minus what moved)
remains. Now you've got a lot less data to churn.

Also, in this process, could marking pixels with "stillness" probabilities,
resulting in some blending factor (alpha channel?) before flattening the
I-frames, work?

~~~
voltagex_
I may be completely off-track here, but this seems to happen when VLC drops
frames in an h264 source - you get green/black blocks except for the moving
parts of the image. Maybe libvlc/libav could help you here?

------
crb
What we believe to be the first human being caught on photo, by Louis Daguerre
in 1838, was a shoe-shine boy in Paris.

The rest of the people around him were not captured by the long exposure, but
he stayed still long enough to be captured.

[http://www.dailymail.co.uk/news/article-1326767/Louis-Daguer...](http://www.dailymail.co.uk/news/article-1326767/Louis-Daguerres-man-shoe-cleaning-boy-humans-photographed.html)

Technology marches on, but the song remains the same.

------
jweir
Peter Funch has a series of compositions like this, only he is adding, not
removing.
[http://www.v1gallery.com/artist/show/3](http://www.v1gallery.com/artist/show/3)
Scroll to the middle for examples of everyone holding a folder, wearing
red, yawning, etc.

Pretty intensive work to get the photos, shadows and composition together.

Edit: a link directly to an example
[http://petapixel.com/assets/uploads/2009/12/babel11.jpg](http://petapixel.com/assets/uploads/2009/12/babel11.jpg)

~~~
tombrossman
Hugin can do some of this, too. I use it mostly for stitching panoramas
together but it is possible to hold the camera still and catch several
'instances' of a moving object. For example, here is someone pier jumping into
the sea, and you can see four instances of her while the other people (and of
course all stationary objects) are perfectly normal:
[https://imgur.com/h5Ysb6b](https://imgur.com/h5Ysb6b)

There was a news story I saw recently about a photographer doing this near an
airport, who ended up with dozens of planes in the final image, like a big
traffic jam in the sky.

~~~
jweir
Here is one such series of airplanes...

[http://twistedsifter.com/2012/02/picture-of-the-day-striking...](http://twistedsifter.com/2012/02/picture-of-the-day-striking-multiple-exposure-shot-of-takeoffs-at-hannover-aiport/)

~~~
lstamour
Actually, the update below says that it was basically made in Photoshop,
because there were planes that apparently don't serve that airport, and the
relative sizes of planes were off. Still a neat idea, but probably hard to
capture precisely like that without an automated stationary camera.

------
ChrisClark
Google+ Auto Awesome has a feature that does this for you. If you take
multiple pictures of a subject with your phone it will automatically give you
a photo with all the moving people/cars/etc from the scene removed.

The second feature listed on this page:
[http://googlesystem.blogspot.ca/2013/10/auto-awesome-action-...](http://googlesystem.blogspot.ca/2013/10/auto-awesome-action-eraser-and-movie.html)

~~~
reitanqild
Yep. And as much as I like the latest version of Google+, sometimes it is
funny:

Removing the kids and their dad playing soccer so I can get a better view of
the toolshed. ;-)

(Tbh: AutoAwesome also created an "action version". I also guess they are data
mining which AutoAwesomes we keep and which ones we delete.)

------
existencebox
Unless I'm entirely wrong, this is the same Benn Jordan of Flashbulb fame. A
hilariously talented individual, if you haven't already listened to his music,
you really should.

[http://theflashbulb.bandcamp.com/album/nothing-is-real](http://theflashbulb.bandcamp.com/album/nothing-is-real)

~~~
rxdazn
I too was surprised to find that he was into tech stuff.

And yes he's done some really cool records.

(It does say on his homepage that he is The Flashbulb)

~~~
hansjorg
He posted some videos on YouTube a while ago of a live visualization system he
made:

[https://www.youtube.com/watch?v=t9jW3SYTtho](https://www.youtube.com/watch?v=t9jW3SYTtho)

Have to agree with existencebox, he is hilariously talented. His blend of live
guitar, drill and bass and acid is awesome (slightly annoying video editing
here):

[https://www.youtube.com/watch?v=4_SxlRQhHOA](https://www.youtube.com/watch?v=4_SxlRQhHOA)

------
lloeki
This is technological brute force. Behold a more low-tech approach[0]:

> _how do you photograph one of the biggest, most populous cities in the world
> without having any people in your pictures? With a lot of patience and an
> alarm clock!_

[0]: [http://blog.roberttimothy.com/2013/05/Deserted-empty-London-...](http://blog.roberttimothy.com/2013/05/Deserted-empty-London-photos-of-British-capital-without-any-people-28-days-later-style-pictures.html)

[Also]: [http://humanless.org/](http://humanless.org/)

------
onion2k
I wonder how much of the difference in pictures with lots of people is down to
things like shadows rather than people obscuring the view. You might get a
'good enough' result by approaching the problem using averages of the pixel
values after applying a high pass filter - take a few hundred pictures, build
an array of all the same pixels in each image (e.g. 0,0), sort them by some
factor such as luminance, and take an average of the most common within a
small tolerance.
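One possible reading of that procedure in NumPy (interpreting "most common within a small tolerance" as: keep only samples whose luminance is within a tolerance of the stack's median luminance; the tolerance value and the Rec. 601 luma weights are my assumptions):

```python
import numpy as np

def tolerant_average(frames, tol=10.0):
    """Per pixel, average only the samples whose luminance lies within
    `tol` of the median luminance at that position, so outliers
    (people, shadows) are excluded from the mean.

    Assumes at least one sample falls within the tolerance, which is
    guaranteed for an odd number of frames (the median is then an
    actual sample).
    """
    stack = np.asarray(frames, dtype=np.float64)        # (n, H, W, 3)
    luma = stack @ np.array([0.299, 0.587, 0.114])      # (n, H, W)
    med = np.median(luma, axis=0)                       # (H, W)
    mask = np.abs(luma - med) <= tol                    # keep in-tolerance samples
    weights = mask[..., None].astype(np.float64)
    return (stack * weights).sum(axis=0) / weights.sum(axis=0)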

~~~
graeham
Agreed, the original method seems very inefficient to me. (I think you would
want a low-pass filter though? And probably a mode-filter rather than a mean
filter?)

------
jamessb
It would have been nice to see the initial unsuccessful results using
Photoshop, since it looks like a median filter can potentially work quite well
at removing tourists: [http://nifty.stanford.edu/2014/nicholson-the-pesky-tourist/](http://nifty.stanford.edu/2014/nicholson-the-pesky-tourist/)

The alternative approach is just to use a really long exposure time, like
"Silent World": [http://www.popsci.com/technology/article/2012-04/artists-use...](http://www.popsci.com/technology/article/2012-04/artists-use-filter-normally-used-nasa-remove-people-crowded-urban-places)

------
smoyer
It's interesting that the clouds overhead (in the Chinatown before and after
pictures) are averaged out, but the clouds on the horizon are reasonably
fixed. I'm guessing this has to do with the apparent rate of movement since
the clouds directly over your head always seem much faster than those
elsewhere.

------
tobw
This research paper has a very similar idea: [http://people.mpi-inf.mpg.de/~granados/projects/bgest/index....](http://people.mpi-inf.mpg.de/~granados/projects/bgest/index.html)

They basically stitch together a set of images of the same place such that the
result does not contain occlusions. This is done by always selecting the input
image with the most frequent color, similar to OP's idea, assuming that
occlusions will cause 'outlier' color values. They use a Markov Random Field
to cleverly control where the stitching seams should be and use Poisson
Matting to create a smoothly stitched result.

------
jschulenklopper
The trick from the analog photography era seems easier: take many shots with
very low exposure on the same image frame from a fixed viewpoint. Over time,
the things that are static make a stronger imprint on the film, whereas the
moving things are too volatile to get recorded (in more than one shot). No
need for averaging a stack of digital images in Photoshop...

------
cessor
I wonder if the process can be improved by using long exposure pictures. You
might know the effect from long exposures at night: the lights of the cars are
in the frame, but the cars themselves have vanished. During daylight a similar
effect could be achieved - apply a strong neutral density filter to the lens,
so that the exposure can be increased to a couple of minutes maybe. The moving
people, cars and so on are not still enough to make the sensor react.

I was astonished when I saw this video (at approx 4:57 minutes in)
[https://www.youtube.com/watch?v=T24_uq0AY6o](https://www.youtube.com/watch?v=T24_uq0AY6o)
where the stream of people has almost vanished.

I wonder if the results could be the same, but with less data, if you just
take a couple of long exposures and merge them, removing all things that are
too soft (-> movements) in the process.

However this is a really cool idea for a photo project :)

------
peteretep
2,100 images doesn't pass the sniff test for me, but I might be totally out.
I'd have thought if you spaced out 100 photos over a minute each, you could
take the mode pixel, and be pretty close. Am I massively under-estimating the
complexity?

~~~
stonemetal
Depends on tolerance for failure. Take a heavily trafficked area like time
square, what is the probability that your 100 pictures would capture a clean
pixel for a spot across the square even once, let alone enough for it to
"win". Say lots of people are wearing black coats, black might accidentally
"win" a few pixels here or there.
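That intuition can be napkin-checked. If a given pixel is occluded with some probability p in each shot (independence between shots is an assumption), the chance the background is clear in a strict majority of n shots, i.e. that a median or mode would recover it, is a binomial tail:

```python
from math import comb

def p_background_wins(n, p_occluded):
    """Probability that a pixel is clear in a strict majority of n
    independent shots, so a per-pixel median/mode recovers the
    background there.
    """
    need = n // 2 + 1   # strict majority of clear shots
    return sum(comb(n, k) * (1 - p_occluded) ** k * p_occluded ** (n - k)
               for k in range(need, n + 1))
```

With 100 shots, the background still wins comfortably when the per-shot occlusion probability is below one half, but the tail collapses quickly once occlusion is more likely than not, which matches the worry above.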

~~~
pbhjpbhj
I'd actually quite like to see a series of progressively enhanced images made
in this way. Like 1 pic every second, take the mode pixels; repeat for 10
pics, 100, 1000, ... might be interesting. There's probably a range across the
number of images that ties in with the busy-ness of an area to create some
interesting visual effects? Like, is there a point where an image starts to
appear out of the noise? Could that be an effective measurement technique
(counting insects / cataloguing their level of activity)?

------
joshvm
Same kind of thing can be done to satellite imagery:

[http://www.wired.com/2013/05/a-cloudless-atlas/](http://www.wired.com/2013/05/a-cloudless-atlas/)

------
andrewcooke
you want mode, not median. there's no reason to think that the background is
close to the median colour. that's kinda the whole point of the article...

(median works well if you have symmetrically distributed noise, which is true
when denoising astronomy photos, for example, but not here).

~~~
tomkarlo
I was thinking the same thing, but for sets of images where the occluding
items are a relatively small percentage of the image area (and are moving
around enough between frames), taking the median pixel value is effectively
the same thing as the mode, but faster. (E.g. if you take 10 frames, the 1-2
frames where a pixel is occluded will almost certainly be outliers to the
8-9 frames where the pixel is almost exactly the same, and the median will
take the value in the middle of those 8-9 frames.)

(Another problem with mode is that you'd have to posterize the image to ensure
that shifting light, noise and camera movement don't cause the values of
background pixels to vary slightly. Mode is much more brittle in that
respect.)

If I recall correctly, Lightroom / Photoshop has a handy "median" filter, but
no mode filter, which is why the popular method uses median.
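A sketch of what mode-with-posterization might look like (the bin count is an arbitrary choice, and the per-pixel loop is slow but keeps the idea clear):

```python
import numpy as np

def mode_composite(frames, levels=32):
    """Per-pixel mode after posterizing: raw 8-bit values rarely repeat
    exactly, so quantize to `levels` bins first, pick the most frequent
    bin per pixel, and return that bin's centre.
    """
    stack = np.asarray(frames)                 # (n, H, W, 3), uint8
    step = 256 // levels
    q = stack // step                          # posterize to bin indices
    n, h, w, c = q.shape
    flat = q.reshape(n, -1)                    # (n, H*W*3)
    out = np.empty(flat.shape[1], dtype=np.uint8)
    for i in range(flat.shape[1]):             # per-pixel (per-channel) mode
        vals, counts = np.unique(flat[:, i], return_counts=True)
        out[i] = int(vals[counts.argmax()]) * step + step // 2
    return out.reshape(h, w, c)
```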

------
sjtrny
See Figure 2 in
[http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf](http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf)

There's lots of work on background estimation. This is just one approach using
nuclear norm minimisation.

------
jacquesm
This is where 'crowdsourcing' could possibly be taken at its most literal. All
those Eiffel Tower pictures that every tourist just has to take might contain
enough data to allow a complex adaptation of this technique to produce an
image _without_ those very tourists.

~~~
gohrt
This is Microsoft Photosynth's function, IIRC

------
onejoe
Also worth watching the painstakingly done Empty America series of videos by
Ross Ching: [http://rossching.com/empty-america](http://rossching.com/empty-
america)

------
lmm
If you go in the early morning in summer, it's surprisingly easy to get shots
of major landmarks with no one around. (I've done this in London.)

~~~
eldelshell
Yeah, also 8am on January 1st is a good day to feel alone on Earth.

~~~
mnw21cam
28 days later and 28 weeks later were filmed at silly o'clock in the morning
in London to get an abandoned feel. They just had to digitally edit out some
early morning commuter traffic on a bypass that was visible in a couple of the
scenes.

------
blueblob
what would happen to some things that aren't people, like a flag?

------
chiph
How does it not filter out the sun and other sky-related elements?

~~~
raving-richard
It does filter out clouds. Look at the before and after photos.
[http://i.imgur.com/NFNjBZD.jpg](http://i.imgur.com/NFNjBZD.jpg)

------
NAFV_P
You can do this with ONE photo.

Pick an urban scene that doesn't have an electronic goods store in sight, then
wait for the release of the next iPhone.

