
Ask HN: Service Idea: Cloud based image hosting - al_james
Hi there.

I run a medium-sized travel blogging site. When I started developing the site, I did not give much thought to dealing with user-uploaded photos. I simply saved them to a directory on the server and resized them to the correct sizes at upload time.

However, as the site grew, photos became a bit of a pain in the butt:

* It's a pain to synchronise them across multiple servers.

* If you change the sizes of the photos in your templates, you need to regenerate all the images at the new size (hoping you kept the originals!).

* A web server set up to serve images efficiently is different from one set up to serve dynamic HTML efficiently. Running two web servers with limited resources can be a pain.

Of course, most of these problems go away if you use something like Amazon S3. However, you still have the problem of making sure your images are available in the right sizes, converting file formats, etc.

I propose a cloud-based service to make this all easier for developers: something with a simple HTTP API that you post images to. You can then request the image back in whatever size you want. Other image processing functions could be supported (e.g. cropping, black and white, saturation).

Say you save an image to the service with the file name 'image1'. You could then form a URL to apply these changes:

http://fictional-image-sevice.com/myaccount/image1?crop(50,50,600,400)&size(300,200)&black_and_white&format(jpg)

This would return a cropped, resized, black and white version of the source image as a JPG (regardless of the original file format).

This would remove the headache of having to worry about image resizing or processing.

It would be a low-cost service. It would use Amazon S3, so storage and bandwidth costs would be the same as S3's. It would simply charge a fee per 1000 image requests.

(Really simple) client libraries would be available in all popular languages.

I have a really fast implementation of this image 'proxy' written in node.js that can handle these transformations, and it caches the results in memory and then on disk to speed things up. So it would be fast. Eventually it may be possible to offer a geo-aware version that downloads from a local server.

Any legs?
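
For illustration, here is a minimal Python sketch of how a query string like the one in the example URL might be split into an ordered list of operations. The function name `parse_ops` and the grammar (lowercase names, optional comma-separated arguments in parentheses) are assumptions drawn from that one example, not a description of the node.js implementation mentioned in the post.

```python
import re

# Assumed grammar: a lowercase name, optionally followed by "(arg,arg,...)".
_OP = re.compile(r"([a-z_]+)(?:\(([^()]*)\))?")


def parse_ops(query):
    """Turn 'crop(50,50,600,400)&size(300,200)' into [(name, [args]), ...]."""
    ops = []
    for part in query.split("&"):
        if not part:
            continue  # tolerate stray '&' separators
        match = _OP.fullmatch(part)
        if match is None:
            raise ValueError(f"bad operation: {part!r}")
        name, raw_args = match.group(1), match.group(2)
        args = [a.strip() for a in raw_args.split(",")] if raw_args else []
        # Numeric arguments become ints; anything else (e.g. 'jpg') stays a string.
        args = [int(a) if a.isdigit() else a for a in args]
        ops.append((name, args))
    return ops
```

Keeping the operations in URL order matters here, since cropping before resizing gives a different result from resizing before cropping.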
======
jacquesm
I built _exactly_ that about a year ago, serving up billions of images (not
billions of different images but billions of requests) every day.

The way I ended up doing the scaling (we didn't need any other format
conversions) was using the retrieval URL and a 404 handler that is smart
enough to be able to access a list of 'allowed' sizes (so that doesn't become
a potential attack vector). So if you access a file in a size that hasn't been
made yet it gets created on the fly.
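
A rough Python sketch of that create-on-miss idea, under stated assumptions: the `WIDTHxHEIGHT/name` path layout, the `ALLOWED_SIZES` values, and the `serve`, `store` and `render` names are all hypothetical stand-ins, not the actual system described here.

```python
import re

ALLOWED_SIZES = {(100, 100), (300, 200), (640, 480)}  # hypothetical allow-list
_PATH = re.compile(r"(\d+)x(\d+)/([\w.-]+)")  # assumed layout: 300x200/name.jpg


def serve(path, store, render):
    """Return image bytes for path, creating the variant on a miss if allowed.

    store  -- dict mapping path to bytes (stands in for files on disk)
    render -- callable (name, width, height) returning bytes (the resizer)
    Returns None where a real server would answer 404.
    """
    if path in store:
        return store[path]  # already generated, serve it directly
    match = _PATH.fullmatch(path)
    if match is None:
        return None
    width, height = int(match.group(1)), int(match.group(2))
    if (width, height) not in ALLOWED_SIZES:
        # Refuse arbitrary sizes so a caller cannot force unbounded resize work.
        return None
    store[path] = render(match.group(3), width, height)
    return store[path]
```

The allow-list check comes before any rendering, so a request for an unlisted size costs only a dictionary lookup and a regex match.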

The whole thing has been up and running across 9 servers for a year now, it
has triple replication and a bunch of varnish servers on the front end to make
it fast.

We have two ways of putting data on there, one through an API that accesses
the servers directly, another using a queuing mechanism.

To improve the legibility of the urls we used a virtual path rather than a
bunch of parameters.

so
http://mycdn.com/storage/client/format/id/id/id/id/id/id/id.jpg

where the 'id' bits are 2 digits from the image identifier.
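
As a sketch of that path scheme in Python, assuming a numeric identifier that is zero-padded to an even number of digits (the padding, the `virtual_path` name and the example values are guesses for illustration, not details of the real system):

```python
def virtual_path(client, fmt, image_id, ext="jpg"):
    """Build /storage/<client>/<format>/<2-digit chunks of the id>.<ext>."""
    digits = str(image_id)
    if len(digits) % 2:
        digits = "0" + digits  # assumption: pad so the id splits into pairs
    chunks = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return f"/storage/{client}/{fmt}/" + "/".join(chunks) + f".{ext}"
```

For example, `virtual_path("acme", "thumb", 1234567)` gives `/storage/acme/thumb/01/23/45/67.jpg`. With two digits per level, no directory holds more than 100 entries at that level.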

The nodes have 4TB of storage each. Originally we used XFS, but deletion was
too much of a bottleneck, so we ended up switching the system to EXT3 after it
was already live, which improved performance quite a bit.

I'm sure that if you build this 'properly' (as in nicely abstracted, multi-
user, with redundancy by using multiple locations and so on) that there is a
market for it but I'm not sure how big that market would be.

So yes, this probably has legs.

~~~
ritonlajoie
Is what you built public or private ?

~~~
jacquesm
Private. Building this taught me that a lot of stuff that I thought was 'easy'
is actually pretty hard when you need to do it often enough :)

I always thought live video was hard, but it turns out that handling large
numbers of images is actually _much_ harder. That really surprised me.

------
arfrank
Google App Engine just released something similar to this a few weeks ago:

Announcement:
http://googleappengine.blogspot.com/2010/08/multi-tenancy-support-high-performance_17.html

Docs:
http://code.google.com/appengine/docs/python/images/functions.html

It could be set up to do image resizing on the fly per URL parameters you pass
to it, and storage/bandwidth is cheaper than S3 if I recall correctly. It's
based on the same infrastructure as Picasa.

Edit: In fact it could be easily used to create such a service rather than
having to build out the functionality oneself.

~~~
al_james
That's _very_ interesting. Thanks!

~~~
arfrank
No problem, let me know how it goes; my email is in my profile. Your question
piqued my interest in building such a service on top of GAE with just basic
API access and billing for usage. As long as you cover the hosting cost Google
charges you, it would seem to be relatively straightforward. I'm just not sure
if there is enough control built in to determine what bandwidth went where.

~~~
nl
GAE (can) serve images out of the blobstore, so you can monitor statistics
when you tell it to get the image out. Docs are here:
http://code.google.com/appengine/docs/java/images/overview.html#Transforming_Images_from_the_Blobstore

(I expect you'd probably want to use memcache to cache images rather than
hitting the blobstore every time, though)

(Note that you pretty much _have_ to keep the images in the blobstore because
you don't have filesystem access. You might be able to keep them in the
datastore if you wanted, but those are the only two AppEngine options)

~~~
al_james
Hmmmm.... The 1 MB limit in and out of the image service could be a problem.

~~~
nl
The image service is nice, but really only needed if you want to do
transforms. You can serve raw image data out of the blobstore.

------
BobbyH
WordPress does this, but rather than serve the images from S3, it uses S3 to
store the images and populate a self-hosted Varnish cache:
http://blog.apokalyptik.com/2007/10/10/so-you-wanna-see-an-image/
This reduces the S3 bill by an order of magnitude
(<http://ma.tt/2007/10/s3-news/>), so you may want to consider this approach.

In fact, using this approach, you could use S3 (just storage) to undercut S3
(storage+bandwidth) on cost and get lots of customers! I'd be in! :-)
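
To make the cost argument concrete, here is a toy Python sketch of a read-through cache sitting in front of an origin store. It is only an in-process stand-in for what Varnish does in the setup above; the `ReadThroughCache` class and the `fetch_origin` callable are hypothetical, and a real cache would also need eviction and expiry.

```python
class ReadThroughCache:
    """Serve repeat requests locally; go to the origin only on a miss."""

    def __init__(self, fetch_origin):
        self._fetch = fetch_origin  # callable key -> bytes, e.g. an S3 GET
        self._cache = {}            # unbounded here; real caches evict
        self.origin_requests = 0    # how many requests the origin would bill

    def get(self, key):
        if key not in self._cache:
            self.origin_requests += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]
```

With this in front, a thousand requests for the same image produce a single origin fetch, which is where the saving on the origin's request and bandwidth charges would come from.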

