

OpenRoss – fast, scalable, on-demand image resizer - Peroni
http://developers.lyst.com/data/images/2014/06/23/openross/

======
heynemann
There's also thumbor
([https://github.com/thumbor/thumbor](https://github.com/thumbor/thumbor)).
It's a very mature implementation of this type of server and has been very
battle-tested
([https://github.com/thumbor/thumbor/wiki/Who%27s-using-it](https://github.com/thumbor/thumbor/wiki/Who%27s-using-it)).

At globo.com we have nearly a billion images (we are a big portal). Can you
imagine pre-generating that many images every time a new format gets added?

We serve everything with thumbor with a Varnish cache in front of it and we're
very happy with it. It has enabled our designers to work with any image size
they can think of.

If you guys need more info, please check thumbor's docs:
[https://github.com/thumbor/thumbor/wiki/](https://github.com/thumbor/thumbor/wiki/)

~~~
OutThisLife
Is the amount of traffic the site gets worth billions of images?

~~~
heynemann
Definitely... We get around 50M page views/day on our website. And since we
are a media company we need to store images of celebrities, sports and news
in general.

------
m0th87
I'm a fan of pilbox [1]. It's "fast" (builds off of OpenCV + Tornado), easy to
understand and does much more than just image resizing. Also, it should be
simpler to deploy than OpenRoss - since it doesn't bake in nginx, you can
deploy it to Heroku (which is what we do), or behind Amazon's ELB.

We use it on a fairly large production site, and it works fine. If you throw
it behind CloudFront or some other CDN that supports cache busting, it's
plenty fast.

1:
[https://github.com/agschwender/pilbox](https://github.com/agschwender/pilbox)

------
willcodeforfoo
I just built a very simple version of this in about 100 lines of go, using
[http://github.com/disintegration/imaging](http://github.com/disintegration/imaging)
for the image processing.

I was surprised how... latent downloading from S3 is, and how it didn't seem
to matter if I was running on Digital Ocean or EC2, downloading the image from
S3 was always the slowest part. (Digital Ocean was actually faster!)

It takes about a second to process an image, but it is cached in front of
CloudFront similar to the article.

Any tips for improving the latency from S3?

------
simonw
I've worked with systems that pre-size images on creation, and I'd much rather
work with a system like this that resizes and caches on demand.

You never know when you're going to need a new size (the introduction of
retina images a few years ago doubled the dimensions of images you need to
serve for example, and your design team is likely to come up with new size
requirements occasionally as well) and backfilling to resize millions of
images is a big pain and costs a lot in terms of storage.

------
archier
This architecture can definitely save a bundle on storage costs, but it usually
runs into slowness when resizing, or into problems with image quality when
resizing repeatedly.

22ms for a GraphicsMagick resize is quite fast. I'm curious what average input
and output sizes were used when computing that number.

------
jacklevin74
[http://imageshack.us/pages/resize/](http://imageshack.us/pages/resize/) --
Our Imagizer cluster does 6 Gbps at the moment, that's 750 MBytes/sec all day
long. It's fast and scalable ;)

[https://imageshack.com/discover](https://imageshack.com/discover) All of the
images here are rendered with Imagizer. We store all originals in our
HBase/Hadoop cluster, while Imagizer does on-demand transformations.

It works with non-imageshack links too:

[http://imagizer.imageshack.us/v2/500x500q90/http://actionfor...](http://imagizer.imageshack.us/v2/500x500q90/http://actionforspace.com/wp-content/uploads/2014/04/star-trek-voyager-ship-hd-wallpaper-background.jpg)

------
Gigablah
Why not skip the backend completely and just use nginx + cdn?

(Have nginx proxy_pass to S3 and image_filter the response)

[http://nginx.org/en/docs/http/ngx_http_image_filter_module.h...](http://nginx.org/en/docs/http/ngx_http_image_filter_module.html)

~~~
Peroni
The CDN does not have all sizes of images. We use the on-demand resizer + cache
because we may modify our website design in the future and need a new image
size. Serving the exact image size makes our pages faster to render and saves
bandwidth. Plus that image filter is quite limited and doesn't handle
compositing.

------
bluejade
At populr.me, we use a variation on the architecture described here. One
difference is that we cache resized images to S3, rather than to on-disk
cache. This enables all servers to share the cache. Otherwise, when a new
server is brought online, it doesn't benefit from the cache, so for a time,
every request it receives incurs the most costly path of source image
retrieval and resizing.

An added benefit to caching to S3 is that since S3 won't run out of space, we
can cache rendered images for longer (we use S3 lifecycle to keep cache
expiration simple). The scaled images tend to be smaller than the source
images, so the retrieval from S3 is pretty fast. Over the past week,
retrieving scaled images from S3 has cost ~46ms versus ~84ms for the larger
source images.
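
The lookup order described above (shared cache first, source retrieval and resize only on a miss) could be sketched roughly like this in Python. It is not populr.me's actual code: `get_resized`, `fetch_source` and `resize` are hypothetical names, and a plain dict stands in for the shared S3 bucket.

```python
from typing import Callable, MutableMapping, Tuple

CacheKey = Tuple[str, int, int]


def get_resized(
    shared_cache: MutableMapping[CacheKey, bytes],
    fetch_source: Callable[[str], bytes],
    resize: Callable[[bytes, int, int], bytes],
    image_path: str,
    width: int,
    height: int,
) -> bytes:
    """Return the resized image, rendering it only on a shared-cache miss."""
    key = (image_path, width, height)
    cached = shared_cache.get(key)
    if cached is not None:
        # Cheap path: the small, already-scaled image is in the shared store.
        return cached
    # Costly path: retrieve the larger source image and resize it.
    resized = resize(fetch_source(image_path), width, height)
    # Store the result so every other server benefits from this render.
    shared_cache[key] = resized
    return resized
```

Because the cache object is passed in rather than owned by one server, a freshly started server that is handed the same store gets hits on its very first requests.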

------
saetaes
I'm curious to know how people feel about offline (pre-transformed) vs.
on-demand transformations. Are there any HN'ers out there that have worked on a
site with a large set of images, and have an opinion on this? Adobe's Scene7
product works in an offline mode as far as I can tell, and seems to have
captured a large segment of retail companies with product catalogs.

~~~
hungnv
I worked for a social network, and our system let users upload a photo and
then transformed it into a few fixed sizes derived from the original. We used
both pre-transformation and on-demand transformation. We pre-transformed the
images viewed most often by users, like the news feed photo (720x720), the
large photo (1024x768), and the original (served if the user's screen is
detected as a big screen); we have to resize those as soon as possible. Other
sizes, like thumbnails, we transform on demand using nginx resize filter
plugins, with caching by Varnish and/or Traffic Server. That system has worked
well so far. I would say on-demand transformation is a good idea, since you
don't have to store resized images that are never viewed by any user, so you
save storage. But the idea must be implemented well, very well if you're going
to serve millions of users.

------
jondot
As healthy feedback - the correct architecture sits right under your nose,
since you already have all of the components.

You should pre-process images in your scraping cycles, and not when a client
comes to request it.

In this way, your "scale" is always predefined, bounded, expected, and much
smaller - defined by your scraping scale and not user scale.

Good luck!

~~~
dalke
Isn't that what they started with, and decided against? "In our infancy, we
saved all product images with 10 preset sizes, and then rendered the image
which was nearest in size to what we required. As we grew, this solution
became unwieldy for the levels of traffic we were experiencing and nor was it
appropriate for our mobile app."

~~~
mantraxC
They say that, but there's one giant missing ", because..." in that paragraph.

They never explain why they render in 10 sizes (why don't they know which
sizes they need), or why loading one of those sizes on a mobile app is not
feasible.

It's stated like that and left hanging in the air.

Either their use case is very weird, or they're deliberately vague for some
reason only they know about.

Also to clarify why 10 sizes is weird, normally what you'd do is double the
width/height of your images with every size (so quadruple the pixels), same as
one does with mip-maps. Or with icons.

Let's say the smallest sensible size is 128x128 (minimum needed to discern a
product, and easy to downsize from there, won't get much smaller at good
quality).

So we have 128, 256, 512, 1024, 2048, that's 5 sizes. And I'm pretty sure the
images they need aren't over 2048x.

So 10 sizes is just pointless.
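
To make the doubling scheme above concrete, here is a minimal Python sketch. The function names `size_ladder` and `pick_size` are made up for illustration, and serving the next size up from the requested one is an assumption, not something stated in the article or in this thread.

```python
from typing import List


def size_ladder(smallest: int = 128, largest: int = 2048) -> List[int]:
    """Build the list of widths by doubling from smallest up to largest."""
    if smallest <= 0:
        raise ValueError("smallest must be positive")
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes


def pick_size(requested: int, ladder: List[int]) -> int:
    """Return the smallest stored size that covers the request.

    Assumption: the client downscales, so we prefer the next size up and
    fall back to the largest stored size when the request exceeds it.
    """
    for size in ladder:
        if size >= requested:
            return size
    return ladder[-1]


ladder = size_ladder()
print(ladder)                  # the five sizes from 128 to 2048
print(pick_size(300, ladder))  # a 300px slot is served the 512 image
```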

And with OpenRoss, doing cropping and adding whitespace at the server,
producing pointless image duplicates, is senseless. Even the most crippled
client-side technique can do the cropping and whitespace for you (yeah, even
html).

This screams "we didn't think this through much".

~~~
Peroni
When the site was originally designed, there were only 4 or 5 sizes, one for
the feed, one for the product page, one for related products, one for
thumbnails, etc.

As the design changed, extra sizes got added to the image processing, and as
with most early stage start-ups, it worked, so there was no need to fix it.

We eventually ended up in the position of having 2.5M products all with
preprocessed images of certain sizes from our design history. As we wanted
more flexibility with design, but also knew that a lot of our images would
have a very low likelihood of being accessed (fashion items have single runs
and are never remade in future seasons), a big batch process didn't seem
appropriate. Additionally, it would mean storing several different copies of
images in our S3 bucket, even if we knew the product would not likely be seen
again.

A more attractive solution (at least to us) was to do the hybrid approach,
where we would resize on demand, and then cache for a long time. This way, we
only do the processing for images that need it, in almost a functionally
identical way to large scale batch processing, but the process is demand-led.

~~~
jondot
But the nice thing is - using a storage solution where you don't pay for what
you don't use.

------
bjt
Based on all the links to similar apps in this thread, writing an
image-resizing service to sit behind a CDN seems to be a rite of passage. My
creation from a couple years ago:
[https://bitbucket.org/btubbs/thumpy/src](https://bitbucket.org/btubbs/thumpy/src)

It's still running happily in production.

------
johne20
I would love to see a golang version of this.

~~~
kawera
Maybe these:

[https://github.com/ReshNesh/pixlserv](https://github.com/ReshNesh/pixlserv)

[https://github.com/emicklei/karina](https://github.com/emicklei/karina)

------
nperson
You should definitely limit the sizes that can be generated; otherwise, with
URLs like
[http://host/WIDTH/HEIGHT/MODE/path/to/image](http://host/WIDTH/HEIGHT/MODE/path/to/image)
it's pretty easy to mildly DDOS you.
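
A rough sketch of that kind of limit in Python, assuming the /WIDTH/HEIGHT/MODE/path/to/image layout above. The whitelist contents, the mode names and `parse_resize_path` are all hypothetical, not OpenRoss's actual configuration or API.

```python
from typing import NamedTuple

# Hypothetical whitelist; a real deployment would list the sizes its designs use.
ALLOWED_SIZES = {(128, 128), (256, 256), (512, 512)}
# Hypothetical mode names, for illustration only.
ALLOWED_MODES = {"fit", "crop"}


class ResizeRequest(NamedTuple):
    width: int
    height: int
    mode: str
    image_path: str


def parse_resize_path(url_path: str) -> ResizeRequest:
    """Parse /WIDTH/HEIGHT/MODE/path/to/image and reject unlisted sizes."""
    parts = url_path.lstrip("/").split("/", 3)
    if len(parts) != 4 or not parts[3]:
        raise ValueError("expected /WIDTH/HEIGHT/MODE/path/to/image")
    width_text, height_text, mode, image_path = parts
    if not (width_text.isdigit() and height_text.isdigit()):
        raise ValueError("width and height must be non-negative integers")
    size = (int(width_text), int(height_text))
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size {size[0]}x{size[1]} is not allowed")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode {mode!r} is not allowed")
    return ResizeRequest(size[0], size[1], mode, image_path)
```

Rejecting the request before any image is fetched or resized keeps an attacker from forcing a fresh render for every WIDTH/HEIGHT pair they can enumerate.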

------
chaostheory
At first I thought OpenRoss was a new competitor to GraphicsMagick and
ImageMagick, but it's just a Twisted plugin. The title should be changed.

~~~
bpicolo
The title is fine. It just says image resizing. Image/GraphicsMagick are a
whole different ballgame.

------
waterlooalex
Looks like you built your own [http://cloudinary.com](http://cloudinary.com)
but with fewer features

------
mantraxC
At scale, processing thousands of highly granular items (images) in any way is
an embarrassingly parallel task even in its most crude and naive
implementation.

So claiming "fast", ok, but claiming "scalable" feels like a redundant
buzzword, a bit hand-wavey.

When you make such generic claims, be quick to explain what you mean, or
people will not treat you seriously.

~~~
TeeWEE
Yeah, that's what I was thinking. There is no state to be managed. It's just a
pipeline which doesn't need anything more than more machines to handle more
load. It's relatively easy to implement.

