At globo.com we have nearly a billion images (we are a big portal). Can you imagine pre-generating that many images every time a new format gets added?
We serve everything through thumbor with a Varnish cache in front of it, and we're very happy with it. It has enabled our designers to work with any image size they can think of.
If you guys need more info, please check thumbor's docs: https://github.com/thumbor/thumbor/wiki/
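To give a rough idea of what on-demand resizing looks like (a sketch, not our exact setup): with unsafe URLs enabled, a resize is just a URL, and in production you replace the unsafe segment with an HMAC signature:

    http://your-thumbor-host/unsafe/300x200/smart/example.com/path/to/image.jpg

Here 300x200 is the requested size and smart enables smart cropping.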
We detailed how we scaled thumbor at Yipit last year: http://tech.yipit.com/2013/01/03/how-yipit-scales-thumbnaili...
The blog post doesn't mention S3, but we have a storage plugin that reads from and writes to S3.
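For context, thumbor's config is just a Python file and storage is pluggable; a minimal sketch of the idea (the module paths and setting names below are illustrative, not our plugin's real names):

    # thumbor.conf -- point STORAGE (originals) and RESULT_STORAGE (rendered
    # images) at S3-backed implementations instead of the default file storage.
    STORAGE = 'my_s3_plugin.storages.s3_storage'                  # hypothetical module
    RESULT_STORAGE = 'my_s3_plugin.result_storages.s3_storage'    # hypothetical module
    S3_STORAGE_BUCKET = 'my-image-originals'                      # illustrative setting names
    S3_RESULT_STORAGE_BUCKET = 'my-image-renders'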
We use it on a fairly large production site, and it works fine. If you throw it behind CloudFront or some other CDN that supports cache busting, it's plenty fast.
I was surprised by how much latency there is downloading from S3, and how it didn't seem to matter whether I was running on Digital Ocean or EC2: downloading the image from S3 was always the slowest part. (Digital Ocean was actually faster!)
It takes about a second to process an image, but it's cached with CloudFront in front, similar to the article.
Any tips for improving the latency from S3?
You never know when you're going to need a new size (the arrival of retina displays a few years ago, for example, doubled the dimensions of the images you need to serve, and your design team is likely to come up with new size requirements occasionally as well), and backfilling to resize millions of images is a big pain and costs a lot in storage.
22ms for a GraphicsMagick resize is quite fast. I'm curious what average input and output sizes were used when computing that number.
https://imageshack.com/discover
All of the images here are rendered with Imagizer. We store all originals in our HBase/Hadoop cluster, while Imagizer does on-demand transformations.
It works with non-ImageShack links too.
An added benefit of caching to S3 is that, since S3 won't run out of space, we can cache rendered images for longer (we use S3 lifecycle rules to keep cache expiration simple). The scaled images tend to be smaller than the source images, so retrieval from S3 is pretty fast: over the past week, retrieving scaled images from S3 has taken ~46ms versus ~84ms for the larger source images.
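For reference, that lifecycle expiration is a one-time rule on the cache bucket; a sketch with boto3 (the bucket name, key prefix, and retention here are made up):

    import boto3

    s3 = boto3.client("s3")

    # Expire rendered images automatically so the S3 "cache" cleans itself up.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-render-cache",                  # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-rendered-images",
                "Filter": {"Prefix": "rendered/"},  # hypothetical key prefix
                "Status": "Enabled",
                "Expiration": {"Days": 30},         # pick whatever retention fits
            }],
        },
    )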
(Have nginx proxy_pass to S3 and image_filter the response)
It's still running happily in production.
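Roughly, that setup is a handful of lines of nginx config (the bucket URL and settings below are placeholders; image_filter is the stock ngx_http_image_filter_module):

    # /thumb/WIDTHxHEIGHT/key fetches the original from S3 and resizes it on the fly.
    location ~ ^/thumb/(\d+)x(\d+)/(.*)$ {
        resolver 1.1.1.1;                                    # needed since proxy_pass uses variables
        proxy_pass https://my-bucket.s3.amazonaws.com/$3;    # hypothetical bucket
        image_filter resize $1 $2;
        image_filter_jpeg_quality 85;
        image_filter_buffer 10M;                             # allow originals larger than the 1M default
    }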
You should pre-process images in your scraping cycles, not when a client comes to request them.
That way, your "scale" is always predefined, bounded, expected, and much smaller: it's defined by your scraping scale, not your user scale.
They never explain why they render in 10 sizes (or why they don't know which sizes they need), or why loading one of those sizes in a mobile app isn't feasible.
It's stated like that and left hanging in the air.
Either their use case is very weird, or they're deliberately vague for some reason only they know about.
Also, to clarify why 10 sizes is weird: normally you'd double the width/height of your images with each size step (so quadruple the pixels), the same as one does with mip-maps, or with icons.
Let's say the smallest sensible size is 128x128 (the minimum needed to discern a product; it's easy to downsize from there, and it won't get much smaller at good quality anyway).
So we have 128, 256, 512, 1024, 2048, that's 5 sizes. And I'm pretty sure the images they need aren't over 2048x.
So 10 sizes is just pointless.
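Just to spell that ladder out (a trivial illustration, not anyone's actual code):

    # Mip-map style ladder: double each dimension per step, five rungs cover 128-2048px.
    sizes = [128 * 2 ** i for i in range(5)]
    print(sizes)  # [128, 256, 512, 1024, 2048]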
And with OpenRoss, doing the cropping and whitespace-padding on the server, which produces pointless duplicate images, is senseless. Even the most crippled client-side technique can do the cropping and whitespace for you (yeah, even plain HTML).
This screams "we didn't think this through much".
As the design changed, extra sizes got added to the image processing, and as with most early-stage start-ups, it worked, so there was no need to fix it.
We eventually ended up with 2.5M products, all with preprocessed images at the various sizes from our design history. As we wanted more flexibility with design, but also knew that a lot of our images would have a very low likelihood of being accessed (fashion items have single runs and are never remade in future seasons), a big batch process didn't seem appropriate. Additionally, it would mean storing several different copies of images in our S3 bucket even when we knew the product would likely never be seen again.
A more attractive solution (at least to us) was the hybrid approach, where we resize on demand and then cache for a long time. This way, we only do the processing for images that need it, in a way that is almost functionally identical to large-scale batch processing, except the process is demand-led.
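For anyone curious, the demand-led flow boils down to something like this sketch (bucket names and helpers here are illustrative, not the OpenRoss code itself):

    import io

    import boto3
    from PIL import Image

    s3 = boto3.client("s3")
    SOURCE_BUCKET = "product-images"   # hypothetical buckets
    CACHE_BUCKET = "rendered-cache"

    def get_resized(key, width, height):
        cache_key = f"{width}x{height}/{key}"
        try:
            # Cache hit: serve the previously rendered image straight from S3.
            return s3.get_object(Bucket=CACHE_BUCKET, Key=cache_key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            pass
        # Cache miss: fetch the original, resize, cache the result, then serve it.
        original = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)["Body"].read()
        img = Image.open(io.BytesIO(original)).convert("RGB")
        img.thumbnail((width, height))        # preserves aspect ratio
        out = io.BytesIO()
        img.save(out, format="JPEG", quality=85)
        s3.put_object(Bucket=CACHE_BUCKET, Key=cache_key,
                      Body=out.getvalue(), ContentType="image/jpeg")
        return out.getvalue()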
I have a system I'm trying to sunset that has a 160 and 130 size. Totally redundant, and doesn't really save that much space/bandwidth.
Still, OpenRoss is a cool project to learn from. It might not fit many use-cases, but apparently it works for Lyst.
Bottom line: over time you tend to realise that there is great flexibility in this type of architecture.
That's actually quite rude.
Facebook, Twitter and Google Plus all have a feature where a URL gets expanded into a "rich preview" (Twitter Product Cards, for example, are particularly relevant to a site like Lyst: https://dev.twitter.com/docs/cards/types/product-card).
These all work best with an image that has been pre-cropped and resized.
As an added bonus, a new one of these emerges every now and then, each with a different size requirement.
So claiming "fast", ok, but claiming "scalable" feels like a redundant buzzword, a bit hand-wavey.
When you make such generic claims, be quick to explain what you mean, or people will not treat you seriously.