
How Imgix Built a Stack to Serve 100k Images per Second - movielala
http://stackshare.io/imgix/how-imgix-built-a-stack-to-serve-100-000-images-per-second#stackshare-weekly
======
jawngee
I love imgix to death, really the only web service I use daily that constantly
amazes me.

I've been using them for a few clients now and it's been a fundamental shift
in how we think about media management in the CMS. Rolling out new features
with different image size requirements using existing media doesn't require us
doing anything other than changing a few parameters in the URL. Before, we'd
have to run processing tasks on a fleet of EC2 instances and wait a few days.

We compared a few similar services, but none of them really got the flow right
like Imgix does. Some require you to set up the parameters beforehand or define
everything in their web interface or via a REST API. And quality-wise, none of
them are nearly as good as Imgix.

Honestly, I can't sing their praises highly enough.

~~~
boundlessdreamz
> Before, we'd have to run processing tasks on a fleet of EC2 instances and
> wait a few days.

Few days? Why?

~~~
ddorian43
There are 2 ways to do thumbnails: 1) precompute (what they did) and 2) compute
on demand (imgix).

Of course, the best is to do something in the middle.

~~~
jawngee
Imgix is the middle, they process the parameters once and cache the result.

~~~
ddorian43
That is just caching.

The middle is: you precompute some fixed presets (like he does, for lower
latency) but also have the ability to generate on demand if a preset doesn't
exist (used when adding new presets, so there's no reason to wait days for the
new preset).
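A minimal sketch of that hybrid scheme (the names, the in-memory cache, and the stub renderer are all illustrative, not anyone's actual implementation): serve a precomputed preset when it exists, otherwise render on demand and backfill the cache so later requests take the fast path.

```python
# Hybrid thumbnailing: precomputed presets plus on-demand fallback.
PRESETS = {"thumb": (150, 150), "card": (400, 300)}
cache = {}  # (image_id, preset) -> rendered bytes; presets can be prewarmed


def render(image_id, size):
    # Stand-in for the real resize work (e.g. an image library call).
    return f"{image_id}@{size[0]}x{size[1]}".encode()


def serve(image_id, preset):
    key = (image_id, preset)
    if key in cache:                          # fast path: already rendered
        return cache[key]
    data = render(image_id, PRESETS[preset])  # slow path: render on demand
    cache[key] = data                         # backfill for future requests
    return data
```

Rolling out a new preset then just means adding an entry to `PRESETS`; the first request for each image renders it, and everything after hits the cache.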

~~~
skuhn
Once you've built a system fast enough to handle image rendering work
on-demand, adding batch or precomputation features is largely just creating
requester logic to ask for the image before the real demand arrives.
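That "requester logic" can be sketched very simply (the base URL, preset list, and parameter names below are hypothetical): a prewarmer just enumerates every preset URL so a worker can GET each one ahead of real traffic.

```python
# Batch precomputation as requester logic over an on-demand renderer.
from urllib.parse import urlencode

BASE = "https://images.example.com"  # hypothetical rendering service
PRESETS = [{"w": 150, "h": 150}, {"w": 400, "h": 300}]


def prewarm_urls(image_ids):
    """Yield every preset URL so a worker can request them before users do."""
    for image_id in image_ids:
        for params in PRESETS:
            yield f"{BASE}/{image_id}?{urlencode(params)}"
```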

------
tacos
This company repeatedly sets off my B.S. detector with its technology story.
Here's a previous article where they didn't have their schtick quite so
polished:

[http://www.wired.com/wiredenterprise/2013/05/imgix-graphics-...](http://www.wired.com/wiredenterprise/2013/05/imgix-graphics-card-internet/)

Believe it or not: Google already had awesome, high-performance image tech.
YouTube re-encoded every _video_ in 2010 as a favor to Apple. Meanwhile these
guys are saying Google couldn't make _thumbnails_? So they had to do a startup
using the GPUs in Mac minis?

“We just couldn’t crunch those images down to a smaller size,” Zacharias
remembers. “It would have taken a significant amount of Google’s entire
processing power just to do that.”

Preposterous, dishonest claim.

I looked at this back then and discovered:

Google provides it through App Engine. Adobe provides this through Akamai.
[http://cloudimage.io/](http://cloudimage.io/) does it through Rackspace CDN.
[http://cloudinary.com/](http://cloudinary.com/) does it through CloudFront.

That said Imgix seems to provide a stable service people enjoy. Perhaps they
should sell that instead of these oddball tech claims. Because what they're
doing isn't hard or novel. They sure are making it complicated, though.

~~~
15155
> They sure are making it complicated, though.

Seems like a large degree of it is marketing-related.

"Any intelligent fool can make things bigger, more complex, and more
violent..."

------
discardorama
> An early Facebook photos engineer mentioned he had seen remarkable
> performance and quality coming out of the Apple Core Graphics stack.

What does the Apple stack have, which can't be done on a Linux box with a
high-end GPU like a K20 or a Tesla?

~~~
tacos
[Talking purely CPU] You can resize 100k images a second on a 2013 laptop.
[Source: the Google guy who serves your images.] A couple ARM cellphones could
meet this entire company's [CPU] image processing needs.

Think of what YouTube is doing for videos in the time you read this sentence
if you want to put their silly claims in perspective.

Anybody remember when Steve Jobs said the 400MHz PowerPC G4 with AltiVec was a
supercomputer? But the only code that showed up was a reverb that took a
weekend to run and a couple FFT libraries that had accuracy problems? Good
times.

~~~
duaneb
100k images a second is definitely more than my laptop can even read off a
disk, unless they're already small images.

------
bpicolo
My middle-click is broken on this site and doesn't open things in a new tab. :(

------
micheljk
> "In most cases, your images will traverse the entire stack without ever
> touching a hard drive."

> "Our fetching and caching layers are largely custom built, using MogileFS"

Could you describe the use case for MogileFS in your project? Is it some sort
of "cold storage" for the fetched image data? Can you provide more details:
number of trackers, storage nodes, storage size, ...? Do you also consider
using a Ceph cluster in the future?

~~~
skuhn
Commodity CDNs will always have some amount of cache misses. For most services
that's fine; there isn't much additional overhead in serving a static
JavaScript file a few extra times. But if you're doing computational work for
each request, you want to avoid unnecessary work as much as possible.

It's helpful for us to hold onto master images after fetching -- these images
can be large and may not be served to us from a particularly fast source. The
speed of light does come into play, so anything we can do to shorten distances
is beneficial. This cache layer respects the Cache-Control headers of the
original objects, so the behavior there is under our users' control.

There's a similar need to cache rendered output ("derivative images"), because
while most operations are very fast they do still add a little bit of latency.
This helps with our rendering capacity, but it's really about improving time-
to-serve latency wherever we can.
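Respecting the origin's Cache-Control header might look something like this sketch (a simplification, not imgix's actual logic; real parsing has more directives, and the 3600-second default is an assumed placeholder):

```python
# Derive a retention TTL for a fetched master image from the origin's
# Cache-Control header, so cache behavior stays under the origin's control.
import re


def cache_ttl(cache_control):
    """Return how long (in seconds) a fetched object may be retained,
    or 0 if the origin forbids caching."""
    if cache_control is None:
        return 3600                      # assumed default when unspecified
    cc = cache_control.lower()
    if "no-store" in cc or "no-cache" in cc:
        return 0                         # origin opted out of caching
    m = re.search(r"max-age=(\d+)", cc)
    return int(m.group(1)) if m else 3600
```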

We evaluated a few open source object stores and decided on MogileFS. It
largely came down to the simplicity of what Mogile does and how it does it. We
figured that we could rip pieces out and adapt it to our needs with a lot less
work than more elaborate systems (like Ceph). With that said, we didn't
actually do an implementation test with Ceph, so it may in fact be great for
our use case.

Not sure off-hand about the number of servers deployed for use with Mogile,
but it isn't a gigantic deployment. We aren't storing anything that can't be
re-acquired, so we can adjust our storage to hit particular cache hit ratios
rather than storing the enormous long-tail forever.

The first generation of storage servers are 16-core / 32GB / 6x4TB systems --
more like what I would use for computationally related storage like HDFS than
a large object store. We could stick with this hardware spec, or we might move
to SSDs if latency becomes an issue (I personally doubt it will). We might
also move to more dense systems, with more like 20-30 drives. That's all based
on our metrics and how things perform in practice.

------
dheera
I wonder if one could slash costs by building a stack on top of Imgix that
concatenates several "master" images into a single "grandmaster" image and
then just serve them up individually with cropping operations ...

------
tyho
How does the cost of Mac Pros compare with more traditional servers?

~~~
bluedino
Cheaper than re-writing and re-testing everything to work on Windows/Linux +
whitebox servers.

~~~
tacos
They're using code that they cannot license elsewhere. They're making OS calls
to process the images.

------
fasteo
Somewhat disappointed by the title. This is just a list of software they use;
IMHO, that's not a "stack" article.

>>>> The load balancing and distribution layer is based on custom C code and a
LuaJIT framework we created called Levee. Levee is capable of servicing 40K
requests per second on a single machine

Is Levee open sourced? It would be nice to see the source code, and it would
have been great to include some discussion of why they chose to implement this
instead of using an off-the-shelf LB solution (like HAProxy or OpenResty,
which are also mentioned in the article). This is the kind of thing I like
seeing in a "stack" article.

~~~
skuhn
This is pretty typical in my experience for StackShare posts. For example:
[http://stackshare.io/500px/how-500px-serves-up-over-500tb-of...](http://stackshare.io/500px/how-500px-serves-up-over-500tb-of-high-res-photos)

imgix is very interested in open sourcing things we create, and there are a
few smaller projects at [https://github.com/imgix](https://github.com/imgix)
now. Hopefully Levee will join them in the future, once we have reached a
certain point in development.

------
sul4bh
From the article: "Levee is capable of servicing 40K requests per second on a
single machine."

How are concurrent connections simulated for testing?

