
Unbounded High Dynamic Range Photography Using a Modulo Camera - dnt404-1
http://www.media.mit.edu/research/highlights/unbounded-high-dynamic-range-photography-using-modulo-camera
======
dr_zoidberg
> No more will photographers or even ordinary people have to fumble with
> aperture size and exposure length.

This is a "computer scientists" understanding of photography, and this phrase
alone can even be seen as "dangerous" by photographers. There's more to
aperture size and exposure length than "how much light reaches the sensor",
like focus, depth of field, motion blur and bokeh, to name a few, that is not
aknowledged by that understanding. The technology is good, but it won't change
the way photographers deal with photography, at best, it will give them a
better tool to work with.

I also wonder how much impact this could have. Foveon Inc. had developed a
great sensor that was going to be revolutionary, about 20 years ago. Today
only Sigma uses it, and the UX is so bad in those cameras that the (very good)
technology they had their hands on never got a chance to shine.

Also, is this implementation very different from having a 16-bit ADC in normal
sensors? The end result would be the same: higher dynamic range. And the
increase in bits seems a lot closer (because it would mean little change to the
current manufacturing processes) than a whole new type of sensor being used in
cameras.

~~~
Tloewald
> This is a "computer scientists" understanding of photography, and this
> phrase alone can even be seen as "dangerous" by photographers.

The actual headline of the article is _more_ informative and less misleading
than the subhead:

"Unbounded High Dynamic Range Photography Using a Modulo Camera"

In photographer's terms, they're referring to ending _blown highlights_
(caused by filling the wells in a sensel) not _saturated_ images (caused by
exaggerating color data).

There's a school of photography that basically lives and dies by over-
saturating images (e.g. [http://kenrockwell.com](http://kenrockwell.com) ) and
they'd likely have a cow over the headline, but understand and be totally
onboard for the actual goal.

Depending on how quickly the well can reset and how many times resets can be
counted this will either be better or worse than simply improving existing
approaches. The state of the art in full-frame sensors is just under 15 bits of
dynamic range, so to _equal_ it an 8-bit sensor would need to be able to count
resets at least 128 times AND be able to do this in 1/8000 of a second, at the
same sensel density and efficiency! Don't hold your breath.
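
A quick back-of-the-envelope check of that (just the arithmetic above, nothing
from the article):

```python
# How fast an 8-bit sensel would have to reset to match ~15 bits of DR.
target_bits = 15           # roughly the best current full-frame sensors
sensel_bits = 8            # bits before the well wraps
shutter = 1 / 8000         # fastest common shutter speed, in seconds

resets_needed = 2 ** (target_bits - sensel_bits)   # 2^7 = 128
reset_budget = shutter / resets_needed              # ~1 microsecond each
print(resets_needed, f"{reset_budget * 1e6:.2f} us per reset")
```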

[http://www.dxomark.com/Cameras/Nikon/D810](http://www.dxomark.com/Cameras/Nikon/D810)

(I'd also suggest that trends towards using multiple sensors to assemble high
resolution images will easily blow this away since they can use high- and low-
sensitivity sensors to synthesize more dynamic range and resolution. Also see
recent Olympus patents to capture polarization data at the same time.)

BTW: assuming they can do this stuff, they presumably can read the sensor
pretty darn fast — so they should be able to eliminate image "tearing" in
digital video and the need for mechanical shutters altogether. That's probably
a bigger issue than dynamic range.

I suspect that clearing a well in < 1/10,000,000 of a second (which would only
lose 10% of the photon detection time during a 1/8000s exposure) is going to
be tough. Assuming 25% of the sensel real estate needs to be sacrificed to the
modulo circuitry, that's a loss of 32.5% of photon data which is about the
same loss as for pellicle mirrors (as seen in Sony's pretty unsuccessful SLT
cameras) for a feature that won't be much use in many situations.
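
Spelling out that photon-budget estimate (same assumptions as above: 128 resets
at ~1/10,000,000s each during a 1/8000s exposure, 25% of sensel area
sacrificed):

```python
exposure = 1 / 8000     # seconds
reset_time = 1e-7       # assumed time to drain the well once
n_resets = 128          # from the 15-bit comparison above
area_lost = 0.25        # sensel area assumed given up to the modulo circuitry

dead_fraction = n_resets * reset_time / exposure      # ~10% of the exposure
light_kept = (1 - area_lost) * (1 - dead_fraction)    # ~0.67 of the photons
print(f"{dead_fraction:.1%} of exposure lost, {1 - light_kept:.1%} of photon data lost")
```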

~~~
coldtea
> _BTW: assuming they can do this stuff, they presumably can read the sensor
> pretty darn fast_

What's funny is that the iPhone (and any high-end Android phone) has a way
faster processor than your typical $10,000 DSLR.

~~~
Tloewald
Actually they're not so much faster -- although the camera companies don't
have the economies of scale to iterate nearly as fast as the phone companies
(so they hang onto a processor generation for a couple of years).

A good smartphone can capture 4K video or 20fps 8MP or whatever off the sensor
— both around 160MB/s. A $10k DSLR is handling 16MP at 11fps — that's around
300MB/s (remember that the DSLR is handling 14-bits per sensel).
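
The raw-readout arithmetic behind those two figures, if anyone wants to plug in
other cameras (uncompressed, ignoring overhead):

```python
def readout_mb_per_s(megapixels, fps, bits_per_sensel):
    """Raw sensor readout in MB/s, ignoring compression and overhead."""
    return megapixels * 1e6 * fps * bits_per_sensel / 8 / 1e6

print(readout_mb_per_s(8, 20, 8))    # smartphone: 8MP, 20fps, 8-bit  -> 160 MB/s
print(readout_mb_per_s(16, 11, 14))  # DSLR: 16MP, 11fps, 14-bit      -> 308 MB/s
```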

The interesting thing is that if you look across the camera lineups, there's
almost no difference in CPU between a $300 consumer camera and a $10k
professional one. There is a big difference in RAM: the DSLR can store 50+
uncompressed images in RAM (let's say 3-4GB), while the phone or low-end camera
processes them into JPEG before storage.

------
darkmighty
The title here is terrible because over-saturation is an effect of poor tone
mapping, and tone mapping is inevitable if you want to display an HDR image on
a low dynamic range display. An accurate title would be the end of over-
_exposed_ images, but why not just keep the original?

~~~
kps
Two different senses of the word ‘saturation’. The subtitle of the article
(current title here) is unfortunate.

~~~
tyingq
Yes. I expected to see an article about stock images being overused across
lots of publications, and whatever the fix for that is.

------
soylentcola
Looks to be more related to exposure than saturation but still, it's an
interesting new take on dynamic range. I'm more a hobbyist than an expert but
current tone mapping techniques (at least anything mostly automated) can often
leave you with "halos" around objects where the software has feathered the
edges between lighter and darker areas. Then the article mentions the issues
with using multiple shots for tone mapping.

I still try to just expose "properly" for the effect I want and shoot RAW to
give me a little leeway in tweaking shadows and highlights but I'd be
interested in trying out software that uses this technique if only for
something new to mess with.

~~~
darkmighty
Halos are a problem, but they can mostly be removed with good post-processing.
Note they're not a fundamental aspect of tone mapping. Tone mapping is just
whatever algorithm you choose to compress the HDR information into the low
dynamic range we have on displays. The great thing about this camera compared
to multi-exposure HDR (apart from the halo issue you mentioned) is that you can
actually take pictures of moving objects or even make HDR movies (the temporal
variation should actually help with better phase unwrapping). That would be
amazing for the film industry, I think.

~~~
soylentcola
Out of curiosity, with regard to video, what would be the difference between
what you describe and a nicer version of the sort of processing used in (for
example) some Axis security cameras? I guess the current method is just to
capture a decent range and then process each frame to boost the gain on
shadows and lower the gain on highlights, right? Sort of an extension of
quick-and-dirty HDR on still photos.

Either way, this is the sort of stuff that has always interested me about
photography and videography. The creative aspects are obviously the end goal
for most people (myself included) but I've always been a sucker for the
technical aspects of it...the craft to the art as it were. Nothing prompts me
to pick up my camera and shoot some photos or some video like learning about
some new tool or technique that I want to try out. It may be the mark of the
eternal amateur but still enjoyable.

~~~
darkmighty
I'm not familiar with those cameras. But if they're using a conventional
sensor, this technique yields qualitatively different results.

------
slr555
I am probably less technical than most of you, but the upside of this
technology (at the point it is real-world ready) might be producing what I
would call a RAW file that is RAW(ER). By that I mean the file would simply
contain a wider range of scene information that could be accessed by Adobe
Camera Raw or the like. For the class of photographers that have no interest
in RAW, the information would help the JPEG engine in the camera make a
better-informed JPEG (or whatever the standard is at that time). To me,
capturing more scene info is better than less, assuming there are minimal
penalties being paid elsewhere (file size, burst shooting, etc.).

As a side note, it seems to me that aperture and shutter speed would still be
essential, as they affect not only exposure but depth of field and subject
motion. Perhaps a technique that allows proper exposure independent of
f-stop/shutter/ASA would allow exploration of extremely wide apertures with
slow shutters but no need for ND filters.

In any event it will be interesting to see what happens.

------
Mithaldu
What's with the completely useless thumbnail at the bottom?

That said, the technological explanation sounds simple and solid. Why aren't
cameras doing that already?

~~~
putterson
I wondered the same; it takes two hyperlinks to get to the project's home page
where they have a larger version.
[http://web.media.mit.edu/~hangzhao/modulo.html](http://web.media.mit.edu/~hangzhao/modulo.html)

~~~
Mithaldu
Nice, thank you!

------
baldeagle
It seems like this would make big glass even more important. This solution
protects against overexposure, but doesn't address underexposure in a scene
that moves (think indoor bar photography). So it isn't an end-all solution that
will let everyone capture reality with their cell phones, but it is an awesome
step in that direction.

------
kefka
I had an idea similar to this that solved the oversaturation issue.

Use two cameras, mounted at right angles to the incoming light path and
separated by a few inches. At the T-junction where their views cross the light
path, put a prism.

The prism would give each camera half of the available photons. Now use any
standard HDR processing technique.

~~~
paulmd
Interestingly enough that's how Technicolor used to work. Three separate
exposures from a single lens, split off by pellicle mirrors.

The downside is that, like any pellicle, you are dividing your light between
multiple exposures, meaning you need some mixture of a longer exposure, a
faster lens, or more lighting. Technicolor required special high-intensity arc
lighting. Also, the pellicles must be kept spotless because dust shows up on
the final image.

Next-gen sensors will actually implement this physically - some receptor sites
will be larger than others.

~~~
kefka
Admittedly, my understanding of traditional optics is very limited. I happened
upon it in a dream whilst trying to figure out how to handle dynamic scenes
while using SLAM.

I was thinking of a prism (new word for me: pellicle mirror) like this one:

[http://static1.squarespace.com/static/54ca877ce4b014ea90e14b...](http://static1.squarespace.com/static/54ca877ce4b014ea90e14bda/54ca9f5de4b021f8b6d68cc1/54caa186e4b021f8b6d6c6b8/1422565766946/penta_3.png?format=original)

Except I don't need the deconvolutional prism; I can handle that in software
easily with an X or Y matrix flip. And I assume we are using the same CCD
module for both paths; given that, I can compare the contrast between the
pixels, calculate the min and max, and compute the HDR image from both
sources.

Calibration would be easy to handle, using a checkerboard to link the CCDs to
each other (assuming alignment errors will happen), as sketched below.

I also see no reason why the pellicles wouldn't be kept clean. The same goes
for the CCDs as well. I'd embed them in a dark acrylic box and try to seal
them as well as I can.
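
Rough sketch of how I'd do that calibration in OpenCV (just the idea; the file
names and checkerboard size here are made up):

```python
import cv2

PATTERN = (9, 6)  # inner corners of the checkerboard (whatever board you print)

def corners(gray):
    found, pts = cv2.findChessboardCorners(gray, PATTERN)
    if not found:
        raise RuntimeError("checkerboard not found")
    return pts.reshape(-1, 2)

img_a = cv2.imread("sensor_a.png", cv2.IMREAD_GRAYSCALE)  # one capture per CCD
img_b = cv2.imread("sensor_b.png", cv2.IMREAD_GRAYSCALE)

# Homography mapping sensor B's view onto sensor A's pixel grid
H, _ = cv2.findHomography(corners(img_b), corners(img_a), cv2.RANSAC)
b_aligned = cv2.warpPerspective(img_b, H, img_a.shape[::-1])

# b_aligned and img_a can now go into any standard HDR merge.
```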

~~~
paulmd
The acrylic box blocks all your light (dark is bad), and now you have to keep
THAT clean instead. Any surface in your optical path needs to be kept clean,
but especially those that are between the center of the optical system and the
sensor. Dust on the front element doesn't matter too much; dust on the rear
element matters a lot, and pellicle mirrors are even worse since they're closer
to the focal plane. Scratches from cleaning are really bad too.

As long as your lens system is sealed it's not a problem. But many systems are
interchangeable-lens, and when the lens is removed then dust can float in. Or
some lenses generate a "vacuum" effect as they focus in and out.

Sensors often incorporate an ultrasonic motor which vibrates it and throws
dust off. Or some sensors are mounted on shake tables to reduce blur from
operator motion - you can do a similar thing there too. At the end of the day
you just have to be careful though.

Like I said, totally possible optically, and yes, very simple to process. The
downsides are that when you split the beams each sensor only gets half the
light intensity, which means you need to get more light into the optical
system in the first place, and you need 2 sensors, which are by far the single
most expensive component in the whole camera.

You can actually do something similar with Magic Lantern, a hacked-up open-
source firmware for Canon cameras. They scan alternate rows of the sensor at
different sensitivities, and use the low-sensitivity row as the most-
significant-bits of dynamic range for the high-sensitivity image. You lose a
bunch of vertical resolution, but hey, you don't need 2 sensors.
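
The merge itself is simple enough to sketch in a few lines (this is just the
row-interleave idea, not Magic Lantern's actual code):

```python
import numpy as np

def merge_dual_iso(raw, gain_ratio=16.0, clip=4095):
    """raw: 2D array where even rows were read at low ISO and odd rows at
    high ISO (12-bit values). Returns a rough HDR merge at full height."""
    low = raw[0::2].astype(np.float64) * gain_ratio   # scale low-ISO rows up
    high = raw[1::2].astype(np.float64)
    # where the high-ISO rows clipped, trust the scaled low-ISO rows instead
    merged_half = np.where(raw[1::2] >= clip, low, high)
    # crude upsampling back to full height -- this is the lost vertical resolution
    return np.repeat(merged_half, 2, axis=0)
```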

------
jlarocco
I wonder how this compares to just adding a few extra bits to the sensor data.
The paper
([http://web.media.mit.edu/~hangzhao/papers/moduloUHDR.pdf](http://web.media.mit.edu/~hangzhao/papers/moduloUHDR.pdf))
compares against an "8-bit intensity camera", but most cameras today use 12-16
bits internally. Isn't "this pixel overflowed" essentially just an extra bit
of information?

The pictures using the "modulo camera" are definitely better quality, but
don't look noticeably better than what I could do with a RAW from my camera
and 2 seconds in Capture One or Lightroom adjusting the shadows and
highlights.

------
salimmadjd
As a photographer who at times has needed to deal with HDR, this is great
news. In the past, I had often wondered why some scheme like this wasn't
possible. However, the title of this post is horrible. I understand they were
trying to market it in a way that was accessible to most readers. That said,
photography has become quite popular and most people now understand HDR, so
they could have just presented it as true HDR from a single-frame image.

------
oppositelock
I would love to feed a photo of an overexposed zebra to their reconstruction
algorithm and see what happens. I think it will be spectacular!

------
dharma1
Read about something like this on
[http://image-sensors-world.blogspot.co.uk](http://image-sensors-world.blogspot.co.uk)
a while ago - instead of saturating the CMOS when max brightness is exceeded,
it just keeps going.

Looking forward to seeing this commercialised.

------
IshKebab
This is a great idea. It would be interesting to see the failure modes though
because the phase-unwrapping surely can't handle every situation (e.g. adding
a constant value >255 to the whole image).

------
megrimlock
Anyone know what they mean by an "inverse modulo" algorithm? That only seems
meaningful for numbers coprime to the sensor ceiling value.

------
Tekker
"Beginning of the end for over-saturated images" ?

Well, then, say goodbye to the Disney channel, the most oversaturated channel
out there.

------
paulmd
Here's an idea: instead of separately tracking the number of "resets", we just
add those bits onto the left-hand side of the measurement. We could call it...
having more bits in the ADC. Expose for the highlights and then pull more
detail out of the shadows with your extra bits of ADC precision.

There are not 1 million ADC units in a 1-megapixel camera; there are a few
ADCs (a Canon 7D has 2 image processors with 4 ADCs each) that are iterated
across the CCD sites to progressively read them out. To make this new sensor,
you need to cram 50 million sets (50MP is state of the art) of voltage
comparator/charge reset/digital counter circuits onto the CCD, and for N extra
bits of rollover range they need to be able to trigger, erase the charge, and
resume exposure up to 2^N - 1 times during the exposure time E (which is, say,
1/500th of a second, since they're complaining about movement during multiple
HDR exposures). Practically speaking the trigger/operation/resume period must
be significantly less than E/2^N, since I see no way to retain the exposure
during the period when the charge well is being drained to zero. If this time
is non-trivial, that translates to losing the fine bits of your ADC accuracy.

In their image they compare a 13-stop exposure (the current state of the art)
to an 8-stop exposure (state of the art in 1900). So assuming they didn't just
pull a photoshop out of their ass, they are overall claiming a 5-stop increase
over the state of the art. That's 5 extra bits of recovery (1 stop is double
the range, i.e. an extra bit), so the reset circuit has to be able to fire up
to 2^5 - 1 = 31 times during an exposure. That implies the trigger/erase/resume
cycle must complete in roughly 1/16,000th of a second or less (practically
speaking a lot less, to leave time for the actual exposure). With this math I'm
also assuming that the comparator is perfect and doesn't lose any accuracy -
variance in trigger threshold or trigger time translates to losing some of your
accuracy again. They also need to generate little enough waste heat to avoid
hot pixels (this is a major reason the ADC is on the image processor rather
than the CCD), and you need 50 million sets of these on the CCD.
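
For anyone who wants to check the numbers (same assumptions as above, nothing
from the paper):

```python
exposure = 1 / 500            # the "moving subject" exposure assumed above
extra_stops = 13 - 8          # claimed gain over a plain 8-stop exposure
max_resets = 2 ** extra_stops - 1        # 31 resets for the brightest sensels
cycle_budget = exposure / max_resets     # ~65 microseconds per cycle, at most
print(max_resets, f"{cycle_budget * 1e6:.0f} us per trigger/erase/resume cycle")
```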

If you can do it then go for it. It's a nice idea on paper but I think there
are a lot of physical obstacles to overcome. If it were that easy someone would
have done it already. It's especially difficult given that reading out the
sensor is _destructive_: measuring it wipes the charge, so I'm not sure how any
of this would work at all given that.

Now what would actually be interesting is to apply the image-processing
techniques to dual-DR imaging, as they mention in their "related works"
section. The open-source Magic Lantern firmware for Canon DSLRs allows you to
scan alternating rows at different ISOs, so effectively you can capture a lot
more dynamic range at the expense of your vertical resolution. It's all
volunteers and they probably haven't applied all the fancy image-processing
magic with the convolutions and the hippity-hop rap music. How about
reconstructing those overexposed lines instead? Or working on some of the
sensors with physical implementations of dual-DR?

~~~
mungoman2
Reconstruction is based on guessing in software/ISP. That way you don't need
to keep track of the number of resets.

------
jcr
The original title of the article is, "Unbounded High Dynamic Range
Photography Using a Modulo Camera"

------
vernie
Can somebody explain how this design differs from existing self-resetting
pixel designs?

~~~
oppositelock
I'm not sure what you mean by self-resetting pixel designs. Do you perhaps
mean the self-shuttering (rolling shutters) on CMOS sensors?

This is very different. If you oversaturate either a self-shuttering sensor or
a shuttered CCD, that information is gone. All you know is that this is "100%
bright, or more". Your software which turns raw sensor readings into an image
makes the area full brightness, and you get blown-out highlights.

With this approach, say the light is bright enough to make a pixel read 567%
(if that were possible). A standard sensor would record 100%. This sensor would
empty itself five times and record 67%. Then the algorithmic magic happens: it
looks at neighboring areas, figures out how many roll-overs happened, and adds
the 500% back in.
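
A toy version of that wrap-and-unwrap idea on a 1D scanline (just to
illustrate; the real reconstruction works on neighboring areas in 2D and is
much smarter):

```python
import numpy as np

WELL = 256  # 8-bit well: readings wrap every 256 "units" of light

def modulo_read(true_light):
    return true_light % WELL          # the sensor only keeps the remainder

def naive_unwrap(readings):
    # Add back multiples of WELL whenever a neighbor drops sharply,
    # assuming the true scene varies slowly -- exactly the assumption
    # that breaks on an overexposed zebra.
    out = readings.astype(np.int64)
    for i in range(1, len(out)):
        while out[i] < out[i - 1] - WELL // 2:
            out[i] += WELL
    return out

scene = np.array([100, 220, 340, 460, 580, 700, 820, 700, 580])  # true light
wrapped = modulo_read(scene)    # [100 220  84 204  68 188  52 188  68]
print(naive_unwrap(wrapped))    # recovers the original smooth ramp
```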

You can't do that perfectly (hence my comment before about an overexposed
zebra), and it will introduce new forms of aliasing, kind of like how
demosaicing the Bayer mask creates moiré patterns.

~~~
vernie
I've seen CMOS pixel designs in the past that can detect saturation and
effectively dump the well (reset) repeatedly until saturation no longer
occurs.

------
jchrisa
Reminds me of the implementation of the TRNG from yesterday's front page.

------
brador
This was the runner-up prize-winning paper; what won?

~~~
Joeboy
[https://twitter.com/ICCP_2015/status/592403545468448768/phot...](https://twitter.com/ICCP_2015/status/592403545468448768/photo/1)

~~~
mjcohen
That's a picture. Where's the paper?

~~~
amritamaz
Here's the paper link:
[http://www1.cs.columbia.edu/CAVE/publications/pdfs/Nayar_ICC...](http://www1.cs.columbia.edu/CAVE/publications/pdfs/Nayar_ICCP15.pdf)

------
iliis
A related, but quite different, technology I have worked with is so-called
Dynamic Vision Sensors (DVS, [1]): a sensor developed to mimic a biological
retina that only reports _changes_ in brightness, asynchronously for every
pixel. Essentially, you get a stream of short messages saying "Pixel XY just
got a little brighter at time T".
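
To give a feel for it, the event stream is basically just tuples like this,
which you integrate yourself (a minimal sketch; real drivers and event formats
vary per device):

```python
from collections import namedtuple
import numpy as np

# One DVS event: pixel (x, y) got brighter (+1) or darker (-1) at time t (us)
Event = namedtuple("Event", "x y t_us polarity")

def accumulate(events, width=240, height=180, window_us=30_000):
    """Sum the most recent events into a frame-like image for visualisation.
    A static scene produces no events, so the image stays at zero."""
    img = np.zeros((height, width), dtype=np.int32)
    if not events:
        return img
    t_end = events[-1].t_us
    for ev in events:
        if t_end - ev.t_us <= window_us:
            img[ev.y, ev.x] += ev.polarity
    return img
```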

While such a thing is not very useful for taking pictures (pointing the camera
at a static scene will generate no output at all) or videos (you don't get
discrete frames, you get a continuous stream of differentiated pixels), it has
a lot of potential for computer-vision tasks:

For one, as the camera only reports changes, only the interesting information
is transmitted, which greatly reduces computational load (e.g. no need for
"yes, that white wall is still there"-style calculations for every frame).

Secondly, because these changes are reported independently per pixel, they are
very fast: microsecond accuracy with a dozen or so microseconds of delay is
easily achievable. For comparison, a fast camera at 120 FPS (which will produce
_a lot_ more data to process) has an accuracy and latency of 1/120 s = 8333 us.
This also implies that motion blur is basically nonexistent.

And thirdly, as absolute intensity doesn't matter, you don't have any problems
with high dynamic range either.

The only downside is that you don't really get a picture, and most of the
traditional computer-vision approaches are unusable ;)

But still, such a camera is very neat, even if you just track the movement of
your mice or want to balance a pencil on its head[2].

The real deal, however, is fully vision-based 3D SLAM, which would allow for
very fast and robust movement without any external help from motion-capture
systems or expensive (in money, weight and power) sensors like laser scanners.
AFAIK we're not there yet (see [3] for some work in that direction), but that
would bring pizza delivery by drone directly to your desk quite a bit closer
to reality...

--

[1]
[http://www.inilabs.com/products/davis](http://www.inilabs.com/products/davis)

[2]
[http://www.ini.ch/~conradt/projects/PencilBalancer/](http://www.ini.ch/~conradt/projects/PencilBalancer/)

[3]
[http://rpg.ifi.uzh.ch/research_dvs.html](http://rpg.ifi.uzh.ch/research_dvs.html),
also
[https://www.doc.ic.ac.uk/~ajd/Publications/kim_etal_bmvc2014...](https://www.doc.ic.ac.uk/~ajd/Publications/kim_etal_bmvc2014.pdf)

------
tlb
Title changed from "Beginning of the End for Oversaturated Images"

------
cozzyd
Instagram and the like will ensure the future proliferation of shitty images,
no matter the technology.

