

Ask HN: Block-based cloud storage application. Promising - or not? - ahaslam

I've built a fairly complete prototype of a block-level virtual disk backed by commodity cloud storage. It provides robust, efficient, zero-maintenance offsite storage, i.e. it delivers data security.

It works by implementing a massive virtual disk volume (up to 256 terabytes) formatted with the platform's filesystem of choice (NTFS/Windows only at this stage).

Block updates are encrypted and compressed on the fly, then marshalled and snapshotted before being exported to Amazon S3 (but it could be any cloud storage provider).

From the user's perspective, they've just added a massive internal hard disk accessible via the standard filesystem APIs and tools.
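The post says block updates are "encrypted and compressed" without stating the order. Compression only pays off if it runs before encryption, because good ciphertext is statistically indistinguishable from random bytes and does not compress. The sketch below illustrates this with Python's standard `zlib`, using `os.urandom` output as a stand-in for ciphertext (no real cipher is involved). The 4 KiB block size and the `compressed_size` helper are illustrative assumptions, not details from the post.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size; the post does not state one


def compressed_size(data: bytes) -> int:
    """Return the size of `data` after zlib compression at the default level."""
    return len(zlib.compress(data))


# A freshly formatted filesystem is mostly zero-filled blocks.
plain_block = bytes(BLOCK_SIZE)

# Stand-in for ciphertext: random bytes, which is what a good cipher's output
# looks like statistically. This is NOT encryption, only an illustration.
ciphertext_like = os.urandom(BLOCK_SIZE)

print("compressed plaintext:   ", compressed_size(plain_block), "bytes")
print("compressed 'ciphertext':", compressed_size(ciphertext_like), "bytes")
```

The zero-filled block shrinks to a few dozen bytes, while the random block comes out slightly larger than 4096 bytes. So in a pipeline like the one described, a sensible order would be: compress the block, then encrypt it (the author mentions AES-256 later in the thread), then upload.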
One way of thinking about it is as a cloud-backed TrueCrypt, but with full-volume versioning, thin provisioning and compression added.

Its primary benefits are:

* data security (offsite vs local, cloud storage robustness vs hard disk robustness)

* the ability to map and manage storage capacity many orders of magnitude larger than the local storage capacity of the installation platform, i.e. hundreds of terabytes of storage available on a tablet

* a transparent cloud storage gateway: users can leverage cloud storage via the familiar storage disk model

As opposed to the folder/file representation implemented by the WebDAV/FUSE offerings, the block-based model retains all of the native filesystem attributes associated with the user's files (permissions, encryption, compression) as well as other filesystem metadata (journalling, quotas etc).

Bandwidth and storage overheads are surprisingly low: an empty 256 terabyte NTFS volume still carries a 600 MB filesystem metadata overhead, yet that formatted volume can initially be represented using just 150 KB of bandwidth/cloud storage, while subsequent snapshots of the same volume only incur ~40 KB each.

So it can readily and efficiently perform regular, fine-grained snapshots of the user's data, consuming bandwidth and storage in the same ballpark as file-based technologies.

So that's what it does. Here's where I am at the moment:

When I first started on it (quite some time back), my initial thought was to release it as a prosumer product or as a freemium service.

What I have come to realise is that even though tools like Dropbox and ZumoDrive are not focused primarily on data security (i.e. they deliver sharing and collaboration), the level of mindshare these services enjoy makes marketing any sort of cloud storage offering an uphill task, especially for a startup.

So I have had cause to stop and re-evaluate whether I should push on or drop it.

Consequently, I'm after feedback:

Is this approach/technology/product promising or a dog?

Are the benefits of full-fidelity data compelling enough to differentiate it?

Is it just a solution looking for a problem?
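The thin provisioning and cheap incremental snapshots described in the post follow from storing only blocks that have actually been written, and uploading only blocks that changed since the last snapshot. Below is a minimal in-memory sketch of that idea. The `SparseVolume` class, the 4 KiB block size and the `snap-N/block-M` object-naming scheme are hypothetical, encryption is omitted, and the returned dict merely stands in for objects that would be uploaded to S3 or another store.

```python
import zlib


class SparseVolume:
    """Thin-provisioned virtual disk: only written blocks consume space."""

    def __init__(self, size_bytes: int, block_size: int = 4096):
        self.size = size_bytes
        self.block_size = block_size
        self.blocks: dict[int, bytes] = {}  # block index -> block contents
        self.dirty: set[int] = set()        # blocks changed since last snapshot
        self.snapshot_id = 0

    def write_block(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.size // self.block_size:
            raise IndexError("block index outside volume")
        if len(data) != self.block_size:
            raise ValueError("writes must be whole blocks")
        self.blocks[index] = data
        self.dirty.add(index)

    def read_block(self, index: int) -> bytes:
        # Unwritten blocks read back as zeros without occupying any storage.
        return self.blocks.get(index, bytes(self.block_size))

    def snapshot(self) -> dict[str, bytes]:
        """Return compressed copies of the blocks changed since the last
        snapshot, keyed by hypothetical object names, then reset tracking."""
        self.snapshot_id += 1
        delta = {
            f"snap-{self.snapshot_id}/block-{i}": zlib.compress(self.blocks[i])
            for i in sorted(self.dirty)
        }
        self.dirty.clear()
        return delta


TiB = 1024 ** 4
vol = SparseVolume(256 * TiB)
vol.write_block(0, b"\x01" * 4096)
vol.write_block(7, b"\x02" * 4096)
first = vol.snapshot()   # contains both written blocks
vol.write_block(7, b"\x03" * 4096)
second = vol.snapshot()  # contains only block 7
print(sorted(first), sorted(second))
```

A real implementation would also need to persist the block map itself and make each snapshot restorable, presumably by resolving every block to its most recent version at or before that snapshot; the thread does not describe how the author does this.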
======
TuaAmin13
My $0.02 as a storage administrator (primarily NAS)

It sounds like your "data security" isn't my data security. You list
"robust, efficient, zero-maintenance offsite storage" and "offsite vs local,
cloud storage robustness vs hard disk robustness". When I say data security I
mean that my data is encrypted and locked down six ways from Sunday.
"Robustness" isn't security. "Offsite" doesn't mean security. I've got
turnstiles, card readers, locked cages, and passwords standing between you and
my local data. "Cloud storage robustness" is security from failures, not
necessarily security from hackers, which is what data security means to me.
You say it's encrypted, but there's no mention of whether this is encrypted
storage or encrypted information transfer.

Other than you needing to re-evaluate your buzzwords, it sounds like you have
a cloud SAN rather than a cloud NAS. That does sound different enough that you
could definitely have a niche to pursue. There are programs that work better
with SAN storage than NAS storage, but at some level I'm wondering: If I'm
running my own MSSQL server (or something else I'd host in house that needs
SAN storage), why would I pay for a local server but remote storage? Why
wouldn't I just have a remote server with remote storage or a local server
with local storage? I'm not immediately seeing the use case; perhaps you can
paint me a word picture. Sure, 256TB is great, but if you need 256TB you
probably have more than enough money to buy your own storage, or you have some
crazy financial regulations or something that would require you to keep it in
house. On the smaller scale, I run into the question "Why am I paying to keep a
local file server around to distribute this remote block device?"

Again, a solid use case or pain point might make me a believer in why I
need this product. From what you've described I'm not seeing it, but I like
the technology potential.

~~~
ahaslam
I also like the technology potential - the question I am asking myself is what
use to put it to where there is real demand.

Yes - it is a cloud SAN. And it could easily be bundled with a software iSCSI
stack in a virtual machine image as a virtual cloud storage gateway appliance
for private datacenters. But for enterprise, I don't see the use case as
primary storage, i.e. as the underlying storage for a DB. More likely as
'near-line' or archival storage, where data is retained in a form that is
readily mounted and accessible using standard FS APIs for indexing, data
mining etc. As I've been focused on consumers, I haven't done enough research
into the comparative cost against other archival mechanisms like tape.

When I built it, I guess I was inspired by addressing my own pain points: I
didn't have a flexible, easily interfaced data protection system that
automatically made sure I had a copy of my files offsite. Like having the
equivalent of a massive offsite external USB drive. I also wanted a solution
that kept my files 'live' - rather than locked up in some arcane backup
application image.

* Thanks for the feedback on the terms around security - will take this into account.

* Encryption is AES-256 on the fly, so data is encrypted before it hits local or remote storage, and it uses SSL for the transfer.

* The 256 TB capacity is really just a way of raising the upper limit on capacity so that users don't have to worry about exhausting the available space in the volume, something that file-based solutions don't need to worry about. Users can choose whatever size volumes they want.
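The 256 TB ceiling lines up with a known NTFS limit: the classic NTFS implementation addresses just under 2^32 clusters, and 64 KiB was the largest cluster size Windows offered until later releases added larger ones. Whether that is why the author picked 256 TB is an assumption. The second line shows, for an assumed 4 KiB block size in the block store, how many blocks such a volume spans, which is why only written blocks can be tracked individually.

```latex
2^{32}\ \text{clusters} \times 2^{16}\ \tfrac{\text{bytes}}{\text{cluster}}
  = 2^{48}\ \text{bytes} = 256\ \text{TiB}

\frac{2^{48}\ \text{bytes}}{2^{12}\ \text{bytes/block}}
  = 2^{36} \approx 6.9 \times 10^{10}\ \text{blocks}
```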

~~~
TuaAmin13
For the case of near-line storage I believe this would work. I understand why
you would need this in that case.

At that point you'd simply have to address pricing. With some napkin math, I
just paid $15/TB/month (amortized over the warranty period) for near-line
storage. I'm taking liberties with the assumptions here (I've left out power
and cooling, but that offsets the cost of longer transfers to the cloud vs
internal to your data center). I'm also not factoring in personnel costs.

For consumers I have absolutely no idea what the pricing would be like.
$15/TB/month seems _cheap_, and if you told me what your product does I would
probably be expecting something between 25 and 100GB for that same price.
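The gap between the two price points is easier to see per terabyte. A minimal sketch using only the figures quoted above ($15/month, 25 GB and 100 GB), with decimal units (1 TB = 1000 GB) assumed; `price_per_tb_month` is a hypothetical helper:

```python
def price_per_tb_month(monthly_price: float, capacity_gb: float) -> float:
    """Convert a monthly price for `capacity_gb` into dollars per TB per month
    (decimal units: 1 TB = 1000 GB)."""
    return monthly_price * 1000 / capacity_gb


near_line = 15.0  # $/TB/month, the napkin figure quoted above
for gb in (100, 25):
    consumer = price_per_tb_month(15.0, gb)
    print(f"{gb} GB for $15 -> ${consumer:.0f}/TB/month, "
          f"{consumer / near_line:.0f}x the near-line figure")
```

This prints $150/TB/month (10x) for 100 GB and $600/TB/month (40x) for 25 GB, so the consumer price TuaAmin13 would expect sits roughly 10x to 40x above the near-line cost per terabyte.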

~~~
databace78
Impressive technology... but there are a number of products that do what you've
built (or come close). Someone already mentioned Nasuni.

TwinStrata might be closer (block-oriented I think, with an iSCSI interface).
<http://www.twinstrata.com>

I think Gartner refers to these products as "cloud storage gateways." Also
known as "hybrid cloud storage." Google either of those to get a list.

Good question about the pain points, TuaAmin. Those usually include: a) never
having to run out of storage again, b) never having to deal w/ tape backups
again, c) knowing you can recover data to anywhere in the event of a disaster.

------
amock
Have you seen <http://www.nasuni.com/>? It sounds like your product is very
similar.

~~~
ahaslam
Haven't seen that particular one, thanks. Nasuni is actually a NAS filer, but
yes, the underlying benefit is the same. I haven't seen a SAN version yet
though.

I believe this class of cloud-backed SAN/NAS products falls under the umbrella
of 'hybrid' storage appliances, acting as gateways to easily distribute your
data from a private cloud to the public cloud - as well as accelerators by
virtue of their caching capabilities.
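The "accelerator" role comes from serving hot blocks from a local cache instead of fetching them from the cloud on every read. A minimal sketch of a least-recently-used block cache follows. LRU is an assumed eviction policy (no product's actual policy is described in the thread), and `fetch_remote` is a hypothetical callable standing in for a read against S3 or a similar store.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """LRU cache of blocks sitting in front of a slow remote block store."""

    def __init__(self, capacity: int, fetch_remote: Callable[[int], bytes]):
        self.capacity = capacity          # max number of cached blocks
        self.fetch_remote = fetch_remote  # hypothetical remote read
        self.cache: OrderedDict[int, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, index: int) -> bytes:
        if index in self.cache:
            self.hits += 1
            self.cache.move_to_end(index)   # mark as most recently used
            return self.cache[index]
        self.misses += 1
        data = self.fetch_remote(index)
        self.cache[index] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data


remote_reads = []


def fake_remote(index: int) -> bytes:
    """Stand-in for a cloud GET; records which blocks were fetched."""
    remote_reads.append(index)
    return index.to_bytes(4, "big")


cache = BlockCache(capacity=2, fetch_remote=fake_remote)
for i in (1, 2, 1, 3, 2):
    cache.read(i)
print(cache.hits, cache.misses, remote_reads)  # prints: 1 4 [1, 2, 3, 2]
```

With room for only two blocks, re-reading block 1 is a hit, but block 2 was evicted when block 3 arrived and has to be fetched again; a larger cache or a workload with more locality raises the hit rate.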

------
persona
Maybe <http://bitcasa.com> would be interested in the technology, if they don't
already solve all those problems...

