
Ask HN: Which file storage service would you recommend for the long term? - storer
[EDITED]

I have about 1TB of files right now, and I expect that to grow by 0.5TB every
year.

My goal is to

* use a service for the next 50 years
* lose none of my files
* be able to add to my files every day
* be able to access any of my files any time

Which service would you recommend?

The ones I know of: [Dropbox, Google Drive, Microsoft OneDrive, iCloud, S3]

I'm not really tied to a platform. I have a Mac, a PC, and a Google account.

Right now I use Dropbox, fwiw.
======
kisamoto
I use Arq [[https://www.arqbackup.com](https://www.arqbackup.com)] which is
provider independent and doesn't lock you into a specific cloud like Backblaze
or CrashPlan.

You can set it up to sync to Glacier/S3/Dropbox/GDrive/OneDrive/Local NAS/FTP
server or any combination of them if you want to have multiple copies.

I get 1TB of OneDrive free with Office365, so I back up to that but also to S3
so that if I were to stop using Microsoft I'm not tied down.

Arq has an unlimited upload rate, never deletes files, and has a good restore
client. It can be bought with unlimited upgrades for $60. At 1TB (not counting
the cost of accessing the data) you'd pay $30/month on S3, $24/month with
reduced redundancy, or $10/month on Glacier for less frequently accessed data.
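
To put those per-TB prices against the original question (1TB now, growing by
0.5TB a year), here is a rough sketch. It assumes the quoted prices stay flat
for the whole period, which is very unlikely, and it ignores request and
retrieval fees. The function name and defaults are made up for illustration.

```python
def projected_cost(price_per_tb_month, start_tb=1.0, growth_tb_per_year=0.5, years=50):
    """Total storage spend over `years`, billing each year at its starting size.

    Assumes a flat price per TB-month and ignores retrieval/request fees.
    """
    total = 0.0
    for year in range(years):
        size_tb = start_tb + growth_tb_per_year * year
        total += size_tb * price_per_tb_month * 12
    return total


if __name__ == "__main__":
    # Prices per TB-month as quoted in the comment above (assumed constant).
    for name, price in [("S3", 30), ("S3 reduced redundancy", 24), ("Glacier", 10)]:
        print(f"{name}: ${projected_cost(price):,.0f} over 50 years")
```

The point is less the exact totals than that the bill grows with the archive,
so the cheapest tier matters more every year.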

Because it's a bit more expensive I only tend to put absolute essentials (like
photos) on S3. Arq does also compress to help save some cost here.

Worth mentioning that Arq also encrypts everything client side before
transmitting so everything is stored encrypted as well. Might be important for
you.

~~~
jwr
Another vote for Arq if what the OP wants is backup. This lets you pick and
switch storage providers. I don't think you should plan on staying with any
provider for more than 2-3 years; the landscape changes too fast. Arq lets you
be independent.

~~~
copperx
Aren't the backups now dependent on (1) the storage provider and (2) the Arq
software developer?

You lose Arq, you lose your backups.

~~~
tfe
The format Arq uses is open and documented, so that you can restore without
having Arq if necessary:

[https://www.arqbackup.com/s3_data_format.txt](https://www.arqbackup.com/s3_data_format.txt)

And there are tools to help:

[http://sreitshamer.github.io/arq_restore/](http://sreitshamer.github.io/arq_restore/)

------
jkot
> My goal is to ... lose none of my files

Duplicates. Perhaps two or three online services, plus local disk array at
home. Check data integrity every year...

Every decade you will have to swap a service or two, as companies evolve.

~~~
balu_
Any idea how to cleverly check data integrity over some distributed services?
(without downloading everything)

~~~
tobylane
SSH in and do the checksums yourself.
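
A minimal sketch of that idea, assuming you have shell access and Python on
each machine that holds a copy (plain object stores such as S3 don't give you
that). Each host builds a small manifest of SHA-256 digests, and only the
manifests are transferred and compared. The function names here are
hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifests(local, remote):
    """Return the paths whose digests differ or that exist on only one side."""
    return sorted(p for p in local.keys() | remote.keys() if local.get(p) != remote.get(p))
```

Run `build_manifest` on each host, copy the resulting dictionaries (for
example as JSON) to one place, and pass them to `compare_manifests`. A manifest
is a few hundred bytes per file, so nothing large has to be downloaded.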

------
Jedd
How much 'be able to access my files' do you need - constant refreshing, or
just the ability to recover? Glacier will give you the latter, for a price,
but is designed to be something that you only read from during an emergency.
Writes / updates, and storage, are very cheap.

Disclaimer - I work for Riverbed, who used to have a lovely product called
WhiteWater, subsequently renamed SteelStore and sold to NetApp. This is a
backup / archive system that uses whatever 'cloud' (not a word I'm fond of)
backend you prefer. Data is de-duplicated at ingress, and encrypted at rest.
It would require a local VM, but that's a small matter these days. No idea on
pricing.

If you're at all inclined, and primarily to address the '50+ years'
requirement, you could look at rolling your own solution. Architecture would
be (initially) something like an RPi with a 3TB drive - bought and installed at
(at least) two friends' places. Perhaps a contra-deal arrangement with them.
VPN for security & convenience. LUKS for at-rest encryption. Something sitting
on top of rsync (over SSH) for in-transit encryption + file-level
deduplication -- I use dirvish[1] for archives that are too large / too binary
blobby for git. I've got two family members already doing rsync (using
unison[2]) backups over the net to VMs I host that then get dirvished.

[1] [http://www.dirvish.org](http://www.dirvish.org)
[2] [http://www.cis.upenn.edu/~bcpierce/unison/](http://www.cis.upenn.edu/~bcpierce/unison/)
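
The file-level deduplication mentioned above is usually done with hard links:
each snapshot looks like a full copy, but files unchanged since the previous
snapshot share the same data on disk. The sketch below is not dirvish's code,
only an illustration of the general idea. It assumes a filesystem that
supports hard links, a destination directory that does not exist yet, and
uses size plus modification time as the "unchanged" test. The names are
hypothetical.

```python
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and _looks_unchanged(src, old):
                os.link(old, dst)       # unchanged: share the existing copy
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy


def _looks_unchanged(a, b):
    """Cheap check: same size and same modification time."""
    stat_a, stat_b = os.stat(a), os.stat(b)
    return stat_a.st_size == stat_b.st_size and stat_a.st_mtime_ns == stat_b.st_mtime_ns
```

Because the check is only size and mtime, it will not notice a file whose
bytes changed without its timestamp changing, so it does not replace checksum
verification.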

------
junto
I use a Synology NAS drive that backs up to Amazon Glacier for an expensive
(but invaluable) restore if my NAS drive goes up in a fire.

Synology gives me my own private cloud (all major OS and mobile clients) and
loads of other benefits as well.

I'm very happy with it.

------
yellowapple
> use a service for the next 50 years

Probably won't happen. Businesses come and go. Unless you want to be
constantly shuffling file storage backends, such a plan will probably end in
disaster.

If you want something with the requirements you describe, you'd be better off
self-hosting. I'd set up two servers - one in your house or office or what
have you, and a second in some further-away location (perhaps a relative's
house). The "servers" could be as simple as ordinary desktops with a bunch of
Western Digital Red drives. Pop your operating system of choice on them, set
them up with some way to transfer files (I personally just SFTP important
files to an OpenBSD box; once I find a suitable secondary location, I'll
probably set up another one and have the first one pump the contents of its
backup folder to the second one on a daily basis with something like rsync),
and be done with it.

The downside is that now you have to manage the hardware side _and_ the
software side of things, but at least now you have some assurance that - so
long as you actively maintain your backup server(s) - your data will be safely
backed up.

------
Veratyr
What to use depends on your use case. Are you looking for backup or are you
looking for sync and 'always there'?

For the former, I'd recommend using Amazon's Glacier or Google's Nearline
storage.

If you intend to retrieve your files often, Nearline will do better as the
costs are lower and retrieval is faster. You'll likely want to wait until it's
out of beta first though, which might take a while.

If you don't intend to retrieve them often, Amazon's Glacier might be a good
choice.

There are also services like CrashPlan and Backblaze that are definitely worth
looking into. They're definitely easier to use and CrashPlan at least gives
you the option to encrypt.

For the latter use case, I'd recommend Google Drive. You can get a ton of
storage through Apps for Work Unlimited and there's very little chance of it
disappearing in the near future with Google's apparent goal of moving all your
data into their cloud.

~~~
storer
I'm wary of Google products, because they seem to go through radical changes
or get killed off pretty regularly.

EDIT: same with iCloud (or MobileMe or whatever it was before that)

~~~
blfr
I highly doubt Google would close a data storage service (like Nearline or
Drive) without ample time for users to move elsewhere.

------
yarone
I'm a fan of Backblaze. Been using them for about 6 years. They meet all of
your requirements. They don't make it particularly easy to access your files
from another device (not a great cloud interface like Dropbox), which is
probably one way they keep costs low ($5 per month, unlimited storage).

~~~
leetNightshade
One negative of Backblaze is that they still don't support Linux. I'm not a
huge Linux user, but I'd like to have the option; otherwise I'm seriously
disinterested in the service. The more open/options the better.

------
rip747
For my personal stuff I just use Google Drive. I don't think that Google is
ever going to abandon it since it's tied into all their products as the storage
solution.

For my servers, I've been wanting _something_ that I can use Amazon S3 with.
Just did a search and found out about CloudBerry
([http://www.cloudberrylab.com/](http://www.cloudberrylab.com/)). This is
perfect since I also have a NAS and scripts that back up the server nightly.
With this I could essentially just install and repoint the existing script to
the new drive. Also, S3 has built-in versioning which is really nice.

~~~
Teckla
I had trouble with Google Drive on Windows. The Google Drive process would
frequently take 100% of a core even if just a single 1k file was in the Google
Drive folder. It's broken.

This really bummed me out because I was evaluating it before sending Google my
money, but I have concerns about the quality of their client. Apparently this
has been a problem for years.

------
sertys
I can recommend Filement (filement.com) for that. It would act as a sync
interface between you and your data. As well as adding and managing your
devices, you can add your cloud accounts, so you can move your backups
around and diversify further. That said, 50 years is a huge amount of time in
tech, and no business venture or even protocol can really guarantee that. But
since you have not specified a requirement for automated, unassisted backups
over that time, I believe the process will be organic and your data will find
its way to the future, hopefully intact.

------
msamir
For a person with terabytes of data and growing, your condition of 50 years of
guaranteed service is strange.

I'd suggest deploying your own service using ownCloud
[https://owncloud.org/](https://owncloud.org/) on your own dedicated servers.

~~~
hrehhf
It worries me that [https://owncloud.org](https://owncloud.org) seems to
support only TLS 1.0, not 1.1 or 1.2. I can't speak about their software, but
it is not a good sign to see the security warning when I first go to their
site.
[https://www.howsmyssl.com/s/about.html](https://www.howsmyssl.com/s/about.html)
considers TLS 1.0 "Bad".

Edit: in their favor, at least they do provide GPG signatures for their
downloads.

------
leejoramo
The best way to plan for 50 years is to continually have a 1, 5 and 10 year
plan.

Your 50 year requirement is the real tough problem here. I would have a hard
time recommending any of the types of services you are referring to for more
than a year at a time. To achieve anything beyond that you need to look for a
service that is VERY standards based and portable.

You do not mention a need for "backup/restore" or "version history". So let's
treat sync & cloud access separately from backup & versioning.

File Sync and Cloud Access

I think anything that syncs local files to the cloud will do as long as you
don't rely on special features of a specific service. This sort of means that
iCloud and S3 are out of the running since neither of them is a cross-platform
file directory syncing system.

You could keep copies on one or more systems that you control and use the
service to provide sync and cloud access. Then if you decided that the service
you currently use no longer meets your needs, or is out of business,
you could easily switch to a different syncing/cloud service.

Today I would say that Dropbox is the best solution. Personally, I am
beginning to replace Dropbox with BitTorrent Sync and keeping a close eye on
syncthing.net. Although BT Sync doesn't have a web interface, I consider that
a plus for privacy, and BT Sync's mobile apps and built in sharing features
replace most of what I need from Dropbox's web interface.

Backup/Versioning

This is where it gets really tricky looking out 50 years. I simply cannot
recommend thinking out that long term. I think you can look out 10 years at a
time, and then make choices for the next 10 years.

Something like rdiff-backup would last at least 10 years, but it doesn't
have an easy-to-access interface (last I checked, the rdiff-backup web
interfaces were not very good).

Personally, I use a combination of rdiff-backup and CrashPlan pointed at local
and remote storage. CrashPlan gives me remote access via the web and apps, and
a professionally managed data store. (I feel that while I don't trust others
with protecting my data, I don't trust myself 100% either). I also use Arq
(which now has a Windows client, and provides open source code for doing a
restore) and back up to a remote site.

[https://www.crashplan.com](https://www.crashplan.com)

[https://www.arqbackup.com](https://www.arqbackup.com)

[http://www.nongnu.org/rdiff-backup](http://www.nongnu.org/rdiff-backup)

[https://syncthing.net](https://syncthing.net)

[https://www.getsync.com](https://www.getsync.com)

~~~
acdha
> The best way to plan for 50 years is to continually have a 1, 5 and 10 year
> plan.

This is the key point to focus on: you don't actually have a backup plan if
you don't regularly access the stored data – until then, it's just hope.

I like the CrashPlan + Dropbox approach _but_ there's a key gap: you need
something else to check strong signatures (SHA-256/512) on data to avoid
scenarios where a bit flips on one of your computers, the file is updated and
you don't notice it for a long time, which is really easy for data which is
infrequently accessed or where the client programs typically cover minor flaws
(most image/video/audio formats).

My favorite example is a JPEG-2000 file which had corruption in the middle of
its tile hierarchy – either a thumbnail or full 1:1 zoom looked fine but
something like a 50% zoom would show corruption. You'll probably never find
something like that without automated checksums until it's too late to recover
a copy from somewhere else.
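
A minimal sketch of such an automated check, under one stated assumption: a
file whose content hash changed while its modification time stayed the same is
treated as suspected silent corruption, whereas a change in both is treated as
a probable deliberate edit. That heuristic will miss corruption that happens
while a file is being saved. The function names and the baseline format are
hypothetical.

```python
import hashlib
import os


def fingerprint(path):
    """Return (SHA-256 hex digest, mtime in nanoseconds) for one file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.stat(path).st_mtime_ns


def audit(paths, baseline):
    """Compare files against a baseline of {path: (digest, mtime_ns)}.

    Returns (suspect, edited). 'suspect' files changed content but kept the
    same mtime; 'edited' files changed both content and mtime.
    """
    suspect, edited = [], []
    for path in paths:
        if path not in baseline:
            continue  # new file: nothing to compare against yet
        old_digest, old_mtime = baseline[path]
        new_digest, new_mtime = fingerprint(path)
        if new_digest == old_digest:
            continue
        (suspect if new_mtime == old_mtime else edited).append(path)
    return suspect, edited
```

Record `fingerprint` for every file once, store the result somewhere other
than the disk being checked, and rerun `audit` on a schedule. Anything in
`suspect` should be restored from another copy while one still exists.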

What I'd really like is something like a mature large-file optimized Git repo
where there's a strong, checksummed history and it's immediately obvious if a
remote version is out of sync.
[https://github.com/bup/bup](https://github.com/bup/bup) and
[https://git-annex.branchable.com/](https://git-annex.branchable.com/) are very
interesting.

Archive Team is trying git-annex as a way to back up the Internet Archive
globally, which will likely lead to changes helping that tool become solid for
general usage:
[http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK](http://archiveteam.org/index.php?title=INTERNETARCHIVE.BAK)

------
ic-junk
Another service for $5 / Month Unlimited Cloud Backup is
[https://www.backblaze.com/](https://www.backblaze.com/)

I haven't used it for accessing individual files from backup, just whole drive
backup.

~~~
sp332
It's a little clunky to restore individual files (at least it was a few years
back when I used it), but not horrible. I'd be worried about losing files just
because it doesn't have an infinite history. If something corrupts your data
and you don't notice, the good version will be gone in a few weeks. Over 50
years the chances of that happening add up.
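
How quickly those chances add up can be sketched with a one-line formula. The
annual probabilities below are invented purely for illustration, and the
calculation assumes each year is independent of the others.

```python
def chance_of_at_least_one_loss(annual_probability, years=50):
    """P(at least one bad year) = 1 - P(every year is fine)."""
    return 1 - (1 - annual_probability) ** years


if __name__ == "__main__":
    # Hypothetical per-year chances of an unnoticed corruption outliving
    # the provider's version history.
    for p in (0.001, 0.01, 0.05):
        print(f"{p:.1%} per year -> {chance_of_at_least_one_loss(p):.1%} over 50 years")
```

Even a risk that looks negligible in any single year becomes substantial over
five decades, which is the argument for keeping a longer version history or
independent checksums.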

------
DyslexicAtheist
Find a service where the encryption keys belong to you. I've tried Wuala but
their Java client sucks. After I exceeded my storage, the client didn't prompt
me to upgrade but told me that I was now locked out of Wuala and my service
was suspended, then went into an infinite loop, forcing me to exit the X
window manager and kill the process.

Also Google Drive has become annoying (e.g. I upload a ripped DVD movie which
I own and paid for and share it with my son who lives in the same house, and
Google tells me that I'm sharing copyrighted content (wtf)). So they definitely
look at content and behave like big brother.

------
zatkin
Based on your criteria, it sounds like you could back up to physical hard
drives rather than a service. You should store your files on external
hard drives and keep them in a safe place.

~~~
mikro2nd
...multiple copies in multiple places

~~~
zatkin
That wasn't part of his criteria, but I understand what you're getting at.
Personally, I wouldn't do physical hard drives, but I know others who have
done it (and done it well) and are happy.

------
philippnagel
Haven't used it, but seems to fit your use case:
[http://aws.amazon.com/de/glacier/](http://aws.amazon.com/de/glacier/)

~~~
storer
Seeing as I haven't heard of it, it doesn't seem to meet my criteria of being
popular enough to be likely to be around in 50 years.

EDIT: It's Glacier I haven't heard of, not Amazon.

~~~
Veratyr
Dropbox (which you have heard of) is built on top of Amazon S3. Glacier is a
sibling product to S3 and similarly unlikely to disappear.

~~~
akhatri_aus
Dropbox doesn't use S3 anymore. They did initially. That being said S3 is a
pretty good solution.

~~~
kooshball
What do they use now?

~~~
toomuchtodo
Possibly OpenStack's Swift. It's API-compliantish with S3.

------
sarciszewski
Have you considered S4 by Least Authority?

[https://leastauthority.com](https://leastauthority.com)

------
PauloManrique
I suggest OneDrive. With a cheap Office subscription you get 1TB and can sign
up for unlimited space at preview.onedrive.com

