Show HN: Raw Image Storage for photographers using S3 and Glacier (picnib.com)
44 points by jjbohn on Aug 8, 2013 | 49 comments

Couple of things...

First, I really dig the "Made with ♥ in Nashville, TN" footer. Way to have some pride. People need to pimp their non-SV residences more.

Second, I've been mulling over doing something like this. Since you're doing it already, please steal my idea. Make a connector for Lightroom and make it stupid simple to link a RAW with exported JPEGs.

If you're looking for Lightroom integration, mosaic (http://mosaicarchive.com) provides an almost identical service and already has a plugin.

But was mosaic made with love in Nashville? ;-)

I do IT support for a few working architectural photographers. Over the years I've thought about this too (at least, I've worked the numbers a few times). It's hard to make it work; they need a minimum of 8TB of storage for existing material, then growth by about 2.5TB yearly (to date, but most are about to start time-lapse video). Though, your cold storage approach is intriguing.

Napkin calculations:

  0.04  /GB * 1000 = 40.00 /month (curr) =  480.00/yr
            * 2000 =                     =  960.00/yr
            * 3000 =                     = 1440.00/yr

  0.004 /GB * 8000 = 32.00 /month (cold) =  384.00/yr + 48.00/yr/TB
                                                      + 100.00/yr/2TB
                                                      + 175.00/yr/3.5TB

  1 shoot = 25GB (stills only) or 50GB w a 30 sec TL (add 25GB)

  ~100 shoots /yr ==> 2.5TB/yr @ 25GB, 5TB/yr @ 50GB

[Aside: that's roughly 5-12 mins per GB to upload (25 Mbps - 10 Mbps; only 3 mins @ 50 Mbps). So consider additional internet cost.]
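
A rough check of the upload figures, as a sketch only. It assumes decimal units (1 GB = 8000 megabits) and a fully saturated link with no protocol overhead; both are my assumptions, not the commenter's. Under them, the 5-12 minute range corresponds to roughly one gigabyte, and a full 25GB shoot takes hours:

```python
def upload_minutes(size_gb: float, link_mbps: float) -> float:
    """Minutes to push size_gb gigabytes over a link_mbps link (no overhead)."""
    megabits = size_gb * 1000 * 8  # assumed: 1 GB = 1000 MB = 8000 megabits
    return megabits / link_mbps / 60


for mbps in (10, 25, 50):
    per_gb = upload_minutes(1, mbps)
    per_shoot_hours = upload_minutes(25, mbps) / 60
    print(f"{mbps:>2} Mbps: {per_gb:4.1f} min/GB, {per_shoot_hours:4.1f} h per 25GB shoot")
```

Real uploads will be slower than this, since home connections rarely sustain their rated upstream speed.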

Without frequently juggling current to cold, or doing yearly compounding calculations, I'd eyeball this at ~$1300/yr, with growth of $150/yr, each year.
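
For anyone who wants to redo the napkin math with their own numbers, here is a minimal sketch. The two rates and the 8TB archive plus 2.5TB/yr growth come from the figures above; the 2TB of active storage and the idea that each year's new shoots land in cold storage are assumptions I picked for illustration.

```python
ACTIVE_RATE = 0.04   # $/GB/month, the "curr" rate in the napkin math
COLD_RATE = 0.004    # $/GB/month, the "cold" rate in the napkin math


def yearly_cost(gb: float, rate_per_gb_month: float) -> float:
    """Yearly storage cost in dollars for gb gigabytes at a flat monthly rate."""
    return round(gb * rate_per_gb_month * 12, 2)


def projected_cost(year: int, active_gb: float = 2000,
                   archive_gb: float = 8000, growth_gb: float = 2500) -> float:
    """Cost in a given year (0 = today), assuming new shoots end up in cold storage."""
    cold_gb = archive_gb + growth_gb * year
    return round(yearly_cost(active_gb, ACTIVE_RATE) + yearly_cost(cold_gb, COLD_RATE), 2)


for year in range(4):
    print(f"year {year}: ${projected_cost(year):,.2f}")
```

With those assumptions the total starts near $1,344/yr and climbs by about $120 each year, which is in the same ballpark as the eyeballed estimate.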

One way to make this more attractive is to leverage the online storage to build other recurring-revenue generating services, but most photographers don't go for that type of thing. (In my experience, stock image sites, etc. are primarily the realm of hobbyists.)

You also have the trust/confidence problem. I suspect every photographer will want to maintain their own backup as well. So, they won't "save" money (even if online storage was less expensive, but I suspect it's not) because they still need to pay for local storage. Of course, for apples to apples, you need to factor in time to maintain those backups, either my hourly rate or their own; things like disk-testing, data juggling (downsizing 1TB -> 3TB drives, etc.). But again, it's still 'extra' cost, because they won't (IMO) switch to solely online storage.

A question:

Are you providing tools to allow photographers to leverage their online storage? E.g. Dropbox-like sharing with clients (even just contact sheets?), integration against online print services, store-fronts for image sales, etc.?

All that said, I'm interested in seeing your private beta. I think this can work, one day. :)

Definitely want to let them leverage the storage eventually, including contact sheets, psd storage, etc. Starting with just the basics for now to see where it goes.

There was a story on HN a few weeks ago by a photographer wanting this kind of service. I found the discussion very interesting.

Is this in any way inspired by that? Or is this a coincidence?

Very much inspired by it. Wish I could find the post actually. I had some services running for my wife's photography business, but nothing that could really be released in any way. Seeing that other post though made me start taking the steps to turn it into a SaaS.

I believe the article you are looking for is here: http://paulstamatiou.com/storage-for-photographers

Discussion: https://news.ycombinator.com/item?id=6020969


There was some good discussion in this thread:

https://news.ycombinator.com/item?id=5673628 see orofino

And in an even older thread:

https://news.ycombinator.com/item?id=4619132 see akmiller

That's what I love about HN :) The minute I read the post I wondered how long it would take for someone to build it - turns out it takes 29 days.

I've been using the FastGlacier GUI client and paying $0.01/GB/mo. Of course Glacier pricing increases for retrieval, early deletion, etc. But there is only one middle-man (Amazon). On OSX, Arq looks like the best client.

We applied to YC with this idea last year. I wrote about it in the last thread on this topic:


Overall, your two biggest challenges are going to be convincing people they need backups (they still don't know, we tried) and pricing. Currently, Backblaze gives me unlimited backups for $5/month or less. With nearly half a terabyte of photos, it's cheaper for me to back up everything with them than use a service like this, which doesn't back up the rest of my stuff.

That said, I hope you guys can find a way to make it work!

Great posts. Thanks for sharing them. Will definitely keep all of what you said in mind.

I found your pricing to be unclear. You say it starts from $0.04/GB, but is this for the cold storage or active storage? If I uploaded 50GB of photos to "cold storage" immediately, what would be the monthly cost? What if I then uploaded another 10GB into active storage, what would the monthly cost be for 50 cold and 10 active?

I've signed up though. Email is similar to username ;-)

Yeah, I'm a pretty terrible copywriter. Let me see if I can make it more clear on the site. Thanks for the input.

I don't use Smugmug but what possible advantage does this have over Smugmug? Even their top tier plan with lots of features is only $300 a year. All their plans have unlimited storage. $300 a year would only cover 625GB at $0.04/GB.

Smugmug only backs up the JPEGs you upload for display. If you want to store your RAW files on Smugmug, you have to use their separate SmugVault service, which costs $0.09/GB/month.

ahhh ok, I didn't notice that :) I stand corrected

If you're content with the 2048x2048 pixel limit for your stored photos, Picasa is unlimited and free. (You do have to sign up for Google+; without it, the pixel limit for photos that don't count towards the quota is smaller.)

And if you're content with JPEG only (I assume it's not RAW, though I have no real way to test it).

Though tbh I've begun recommending Flickr over Picasa. Better (significantly) 3rd-party app support, 1TB is more than the vast majority will use in a long time, and it's easy to default everything to private if desired.

I find working with Flickr more cumbersome if you only use the web UI. For example, Flickr has a limit of 200 photos that you can upload in one batch, which is a problem when you have a slowish upload link (ADSL/cable).

Yeah, there are definitely some problems. But then G+ has problems too. I've had marginally better luck with Flickr's uploader fwiw (faster, fewer fatal errors for no reason), though the browsing experience is slower.

But if you're uploading lots of photos there are lots of tools out there for only Flickr that put both web UIs to shame, automatically retry, etc. The Picasa application is... surprisingly decent, but I've had worlds of pain with its buggy syncing, and it gives off a feeling of almost-abandonware.

If someone offered open-source, encrypted backups at that price, I would buy it in an instant. What I want is, basically, rdiff-backup on EncFS, but I haven't found a way to hack the two to work together.

* Arq http://www.haystacksoftware.com/arq/

* JungleDisk https://www.jungledisk.com/

Arq supports encrypted backup to Amazon S3 and/or Glacier, and works perfectly with occasionally connected external hard disks. I use this to back up RAW photos from Aperture stored on an external USB hard disk (at least 50GB worth).

Very nice, thank you! It looks like neither supports headless Linux, though...

Can you describe your workflow for Aperture?

It's nothing too special. I use a separate Aperture library per year, each stored on an external hard disk. It's a Western Digital "My Book Edition II" with 2 x disks in RAID1 configuration for redundancy. This is configured to back up to Amazon Glacier using Arq, which is clever enough to only back up when the drive is connected and only uploads the new or changed files. Works quite effectively and once configured I don't really need to think about it.

I have a friend that backs up his photos to my computer using a combination of ecryptfs, rdiff-backup, and rsync. It goes something like this:

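    # Mount eCryptfs: plaintext view at ~/photos-backup, ciphertext kept in ~/.photos-backup-crypted
    mount -t ecryptfs ~/.photos-backup-crypted ~/photos-backup
    # Write incremental backups into the plaintext view
    rdiff-backup ~/photos ~/photos-backup
    # Ship only the ciphertext to the remote machine
    rsync -a --del ~/.photos-backup-crypted remote-server:photos-backup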

That's pretty much what I want, but it requires you to have as much free space as the thing you want to back up.

What I wanted to do was mount EncFS in reverse mode (so it gets plain files and creates an encrypted virtual volume) and rdiff-backup that to a remote host, which works very well. However, the problem is that, when your directories are disparate, you can't easily make them appear as one directory, so, to back up 5 locations you'd have to do this 5 times.

I can fake it using AUFS or similar, but it's too large a dependency. Someone wrote an EncFS patch to let it follow symlinks in reverse mode (https://code.google.com/p/encfs/issues/detail?id=147), which would solve the problem, but I don't think EncFS is maintained any more.
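
To make the "do this 5 times" problem concrete, here is a rough sketch that only builds the commands for each location: an `encfs --reverse` view, an `rdiff-backup` run against it, and an unmount. The paths, the `backup-host` name, and the `backups/<name>` remote layout are all hypothetical, and it assumes the mount points already exist and that no two sources share a basename.

```python
import shlex
from pathlib import PurePosixPath


def backup_commands(sources, mount_root, remote):
    """Build one encrypted view plus one rdiff-backup run per source directory."""
    commands = []
    for src in map(PurePosixPath, sources):
        view = PurePosixPath(mount_root) / src.name   # encrypted virtual volume
        dest = f"{remote}::backups/{src.name}"        # rdiff-backup host::path form
        commands.append(["encfs", "--reverse", str(src), str(view)])
        commands.append(["rdiff-backup", str(view), dest])
        commands.append(["fusermount", "-u", str(view)])
    return commands


# Hypothetical locations; prints the commands instead of running them.
for cmd in backup_commands(["/home/me/photos", "/home/me/projects"],
                           "/tmp/enc", "backup-host"):
    print(shlex.join(cmd))
```

It works, but every location becomes its own repository on the remote side, which is exactly the inelegance being complained about.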

Have you tried Duplicity (http://duplicity.nongnu.org/)? I use it for backups and think it's great. You can even back up directly to Amazon S3.

I have. Unfortunately, due to the way it encrypts files (it basically turns everything into one big, encrypted tarball) it needs to redo the whole thing from time to time, as diffs stack and it gets unwieldy.

Combined with a slow home connection, it basically never managed to upload all my files, and it could never resume. That's why I want to go the EncFS route, I don't care if file sizes and numbers are visible if it makes uploading resumable, sane and easier.

It's not just about metadata being visible. Duplicity is designed to work when the remote machine doesn't have the private key to decrypt your files. If you use EncFS instead, anybody who compromises the server with your backups can also get the key to decrypt them.

EDIT: Never mind, I just saw your other comment and learned about reverse mode. That's a simple but clever feature.

Nah, they can't get your private key if you transfer just the encrypted files. That's the whole point, really, but obviously they can get it if you mount it on the server itself (and thus disclose your passphrase).

SpiderOak provides encrypted backup for any types of files for around $0.09/GiB. When I signed up for a free account, I got a 10% off email a few days later.

I currently use SpiderOak, but I'm nothing if not paranoid, and they aren't open-source. Other than that, it's pretty good for backups, and quite cheap, too.

Yeah, you wonder why they don't open-source their front-end client code so that it can be easily audited to make sure the encryption is done right. It would go a long way to increasing some users' peace-of-mind. Then again, since it's all javascript, you could probably look at the html source, but it's probably minified and obfuscated.

It's not JS, it's Python, AFAIK. I talked to their CTO (I think it was the CTO) about it, who said that it was a big priority with them, but they couldn't open-source it yet. It would do a lot for my peace of mind, though, and I wouldn't look for alternate solutions if everything was okay with it.

Oh, so it's not a web-based client. Sorry, I confused it with Mega. To be honest, why not roll your own alternate solution? This one here is just a frontend to S3 and Glacier. It shouldn't be too difficult to write a script that takes all the files in a directory, encrypts them with GPG or OpenSSL, and then uploads them to Glacier.
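
Something along these lines, as a minimal sketch: it only builds the `gpg` and AWS CLI commands for each file rather than running them, and it assumes public-key encryption to a recipient you already have a key for. The file names, recipient, and vault name are placeholders.

```python
import shlex
from pathlib import PurePosixPath


def archive_commands(files, recipient, vault):
    """Return (encrypt, upload) command pairs, one pair per input file."""
    pairs = []
    for name in files:
        src = PurePosixPath(name)
        enc = src.with_name(src.name + ".gpg")   # ciphertext written next to the original
        encrypt = ["gpg", "--encrypt", "--recipient", recipient,
                   "--output", str(enc), str(src)]
        upload = ["aws", "glacier", "upload-archive", "--account-id", "-",
                  "--vault-name", vault, "--body", str(enc)]
        pairs.append((encrypt, upload))
    return pairs


# Placeholder inputs; in practice the list would come from walking a directory.
for encrypt, upload in archive_commands(["shoot1/IMG_0001.CR2"],
                                        "me@example.com", "photo-archive"):
    print(shlex.join(encrypt))
    print(shlex.join(upload))
```

This re-uploads everything every time; it does no diffing or bookkeeping of what is already in the vault.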

Actually wait, yeah, that's pretty feasible. It seems that glacier pricing is free for low storage volumes. https://aws.amazon.com/glacier/pricing/

The problem is all the diffing and stuff. You can get half a terabyte of storage for $3/mo, so glacier support isn't necessary, but getting accurate diffing is hard (there's a reason why rdiff-backup is as large as it is, and duplicating the entire thing doesn't make much sense).

I would much prefer to write a small layer to glue EncFS and rdiff-backup together (and I started doing just that: https://github.com/skorokithakis/encrups), but I haven't managed to find an elegant solution for the directory aggregation yet.

Tarsnap too expensive?

Around 8 times more expensive than this, yeah.

Note that the $0.04 price is likely "cold storage" via glacier, which is cheap for write-only storage ($0.01/GB/month) but it's much more expensive than S3 for retrieval ($50/million requests to S3's $1) and there's ~4h round-trip to get data back according to Colin's old analysis of Glacier for Tarsnap: http://www.daemonology.net/blog/2012-09-04-why-tarsnap-doesn.... I'm guessing the data in cold-storage will not be interactible-with or visible in the interface.

Note that the second-to-last paragraph of Colin's post notes the possibility of Tarsnap "glacializing" a complete machine, dropping storage costs to 3~5c/GB/month, but it would not be readable or deletable save by "deglacializing" the machine (15~20c/GB) to move the data to S3 for retrieval or deletion.

Yeah, the data I have is pretty much just old projects, photos, stuff like that that I archive for sentimental purposes, and that I never really need, but don't really want to delete either.

I'd be fine with something that's basically just "keep this data for me just in case all my other backup locations go down".

Looks good, I signed up. You only support photos?

Right now it's photo-centric just to make sure we get that experience down. Eventually I could see it move to other areas though. My wife's a photographer and I saw a need for this kind of service that was focused on photographers. There are lots of other photo backup services, but the only ones that do raw files are more syncs than anything. She needed a place to persist images she was done with in a cheap way.

I work for an advertising photographer. We constantly work with 3 local (separate) copies of the same data. Would love to find a viable cloud service.

If we could test this that'd be great!

"With our pay-as-you-go pricing, you pay a base rate" LOL
