Do all these services provide equivalent or abstractable guarantees? Amazon S3, for example, provides 'read-after-write' consistency, meaning once you've received a positive response to a put operation, you can expect to be immediately able to retrieve that object. But it used to be 'eventually consistent', meaning it was possible to receive a positive response to a put, but then not be able to read the object immediately after.
Similar guarantees are needed around how soon after deleting something readers can expect to get 404s.
If these guarantees differ, you might find abstracting over the stores doesn't work the way you'd like...
That is pretty old, considering how much S3 has changed in the past two years. In particular, it talks about reading S3's us-standard region from the East and West coasts, and that has evolved to where us-standard (which is really us-east-1) is now the same as all other regions when it comes to consistency (presumably because all requests physically go to the East coast now, but I haven't tested that).
Per that AWS page, you might want to add an asterisk that you lose read-after-write if you make a GET or HEAD to the key name before your first PUT. So checking whether the object exists will drop you down to eventual consistency.
S3 only has read-after-write consistency for PUTs to new objects (not overwrites), and if you do a HEAD/GET before the PUT, that degrades to eventual consistency.
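To make that caveat concrete, here's a minimal sketch of the sequence being described (boto3 is just one convenient client to illustrate with, and the bucket and key names are placeholders):

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET, KEY = "example-bucket", "brand-new-object.txt"  # placeholders

    # Under the old model, an existence check on a not-yet-created key was
    # exactly what dropped you from read-after-write to eventual consistency
    # for the subsequent first PUT.
    try:
        s3.head_object(Bucket=BUCKET, Key=KEY)
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise  # only "does not exist yet" is expected here

    s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"hello")

    # Read-after-write for a brand-new key: this GET is expected to succeed
    # right after a successful PUT -- unless a HEAD/GET preceded the first
    # PUT, per the caveat above.
    print(s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read())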
Netflix have turned the S3 metadata eventual-consistency problem into an s3mper metadata eventual-consistency problem. The difference is that they can now inspect and reason about s3mper's metadata.
Spotify have had to do the same thing for Google Cloud Storage.
I can't help but wonder if eventually consistent object stores will go the way of NoSQL databases once a consistent, scalable hierarchical filesystem appears.
Also on GCS, if you do a HEAD after a DELETE on an object in a bucket that is under lifecycle management, it returns 200 instead of 404. Not really a consistency issue, but it can really come and bite you if you're not aware of it. GET returns 404 but HEAD returns 200.
I reported it as a bug but Google said it was by design. More specifically they said: "You are correct, if the versioning enabled in your bucket then the object metadata is saved as an archive object in the bucket [1]. This is the reason you are getting 200 for your HEAD request."
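If you want to reproduce the discrepancy, a rough sketch against the GCS XML API looks like this (bucket, object name, and token are placeholders; the status codes in the comments are the ones reported above):

    import requests

    # Placeholder bucket/object and access token, just to illustrate the
    # reported HEAD-vs-GET mismatch after DELETE on a versioned /
    # lifecycle-managed bucket.
    URL = "https://storage.googleapis.com/my-versioned-bucket/deleted-object"
    HEADERS = {"Authorization": "Bearer <access-token>"}

    head = requests.head(URL, headers=HEADERS)
    get = requests.get(URL, headers=HEADERS)

    # Reported behaviour: the live version is gone, so GET gives 404, but
    # HEAD can still return 200 because an archived (noncurrent) version's
    # metadata remains.
    print("HEAD:", head.status_code)  # reported: 200
    print("GET: ", get.status_code)   # reported: 404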
> I can't help but wonder if eventually consistent object stores will go the way of NoSQL databases once a consistent, scalable hierarchical filesystem appears.
Of course they will. Eventual consistency is a huge tradeoff that I don't think anyone would make if they weren't forced to.
Exactly. Availability and partition tolerance are often hard requirements, so we have to do all sorts of gymnastics to deal with eventual consistency. In personal projects where availability isn't a big concern and my most precious resource is my own time I tend to make different tradeoffs.
Very good question. Of course all services behave a little differently, which is out of our control. To be honest, I don't have an exact answer on that. In my first tests I could chain a download directly after the upload with all services. But I don't know if that's because they're all guaranteed to be read-after-write or whether some were eventually consistent but just very fast. In general, if one service is read-after-write and you want to switch it for one that isn't, you might get problems, unless you've programmed defensively (checking for existence before proceeding).
Give us some time to check that and run some more tests.
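For what it's worth, the defensive approach mentioned above can be as simple as polling until the freshly uploaded object is readable. A minimal sketch, assuming a boto3-style client (the helper name and parameters are made up):

    import time

    def wait_until_readable(client, bucket, key, timeout=30.0, interval=0.5):
        """Poll after an upload until the object is visible to readers.

        Assumes a boto3-style client exposing head_object(). On an
        eventually consistent store this bridges the gap between a
        successful upload and the object actually being readable;
        polling happens *after* the PUT, so it doesn't trigger the
        HEAD-before-first-PUT caveat discussed earlier in the thread.
        """
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                client.head_object(Bucket=bucket, Key=key)
                return True
            except Exception:
                time.sleep(interval)
        return False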
That's kind of the point though. This abstraction seems doomed to be extremely leaky. Honestly, I'd rather just deal with them separately than be constantly fighting an API.
I'm using CrashPlan to back up from my laptop to another computer on the network. I see that rclone can back up to a local filesystem. Do you happen to know if I can do with rclone what I do with CrashPlan? i.e. use rclone to back up my local filesystem to a remote server?
It looks very sleek though, I'll have to give it a try.
This is our brand new unified API for enterprise cloud storage providers. It's part of the CloudRail API Integration Solution, which consists of multiple unified APIs for different categories like social, payment, consumer cloud storage etc. Our value props are: a single API for multiple providers, and no API changes since we keep the integrations up to date. All that without a hosted middleware, so we never touch the data. Looking forward to hearing your feedback.
Why not allow people to specify URLs? E.g. Swift (what Rackspace runs) can be run anywhere, Ceph can imitate S3 or Swift, and Eucalyptus lets people run API-compatible versions of AWS.
Providers can also add new regions at the drop of a hat.
Limiting it to pre-defined URLs is a pain, and relies on the tool being updated whenever new regions are added.
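For comparison, the stock SDKs already let you do this for S3-compatible stores; e.g. with boto3 you just point the client at whatever endpoint you run (the URL and credentials below are placeholders):

    import boto3

    # Placeholder endpoint and credentials for an S3-compatible store
    # (Ceph RGW, Eucalyptus, a private region, ...).
    s3 = boto3.client(
        "s3",
        endpoint_url="https://objects.example.internal:7480",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )
    print([b["Name"] for b in s3.list_buckets()["Buckets"]])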
It has compatibility with local filesystem, Backblaze B2, Google Cloud Storage, Microsoft Azure, and OpenStack Swift. Note that this is software and not a service.
It appears that CloudRail is not a service either, just a commercially licensed library (like Qt, for example). In fact, I find their simple library approach a bit less awkward than your local server approach, though I suppose that was necessary in order to make a neat drop-in replacement for code that's already targeting S3.
S3Proxy uses Apache jclouds underneath, which has broad compatibility with object stores, including S3 clones. However, no two S3 implementations are alike, so you will need to test.
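A quick way to do that testing is a round-trip smoke test against whatever endpoint you're targeting; a sketch with boto3 (the endpoint, port, and credentials are placeholders for your own deployment, e.g. a locally running S3Proxy):

    import boto3

    # Placeholder endpoint/credentials for the S3-compatible store under test.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8080",
        aws_access_key_id="identity",
        aws_secret_access_key="credential",
    )

    # Minimal put/get/delete round trip to catch the most basic
    # incompatibilities before trusting a new backend.
    s3.create_bucket(Bucket="smoke-test")
    s3.put_object(Bucket="smoke-test", Key="probe", Body=b"ping")
    assert s3.get_object(Bucket="smoke-test", Key="probe")["Body"].read() == b"ping"
    s3.delete_object(Bucket="smoke-test", Key="probe")
    s3.delete_bucket(Bucket="smoke-test")
    print("basic round trip OK")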
Nope, we don't use it but it is a very cool project. We have SDKs for Android, Objective-C, Swift, Java and Node.js so far. That's why we couldn't use it anyway. We don't want to compete with open source projects like this. Our goal is to offer easy integrations for all platforms and all major use cases (not only cloud storage). Btw, to support open source, we made our solution free for non-commercial projects.
It's a nice feature-set, looks well-designed, and handy - but wouldn't anyone using one of these APIs find it super trivial to implement this themselves? I'm not sure I see the cost-benefit making sense here? I'm keen to be convinced, though - what's the killer use case? (I'm not aware of APIs changing or being deprecated in any big way)
Edit: ah, the other APIs make way more sense to me. You can offer multiple options to a customer. But this doesn't seem to have an equivalent?
This allows you to easily integrate multiple of these providers if necessary, or easily switch. Of course you can integrate them one by one on your own, but that takes a lot of time. With CloudRail it's one API which works even cross-platform. Our ultimate goal is to handle all your integrations, not only cloud storage. So unified APIs for fast API integrations and API Change Management to keep your integrations running forever.
Can you provide a use-case for when you would need to be able to easily switch (or potentially hot-swap) cloud storage? It sounds like some of the justification for ORMs: while it would be nice to have if it took no more time to integrate with that API, it's hard for me to see how it could justify extra effort in implementation/maintenance unless you have a concrete understanding of when you'd need it.
I see what the feature is, but I don't see the use case or benefit - in this particular case! The social, payment, consumer storage, I definitely do. But it feels a little like the age-old example of abstracting away your database layer, in case you wanted to change which database you use later. (It's incredibly rare to do it!)
Even if you don't want to switch and believe in non-changing APIs, it is easier to integrate with CloudRail :) But as already mentioned, this interface is really part of a bigger offering. We want to handle all your integrations eventually, and enterprise cloud storage has to be part of that.
Our system monitors the APIs and informs us about changes, or the provider does. This usually happens months before the integration would actually break. Afterwards CloudRail updates the SDKs and informs the affected users via email and in the portal. And affected means really affected, so only if you use this specific (broken) function. All you need to do then is update the SDK to its latest version. We are also working on an optional and completely automated way to update the SDK. But most developers want to test it first. Btw, any opinions on the auto-update here?
It's nice, but what incentive does CloudRail have to provide perpetual gratis access to someone else's API?
Unlike an app that can still work when the original developer goes broke, an API requires always-on access.
Does my whole build break because CloudRail can't handle the traffic anymore? At least with AWS or GCS or Azure, etc., one has a business relationship with the API provider and so can keep the lights on.
CloudRail doesn't touch the data. Everything flows P2P between the app and the provider. It's like if you integrate the native AWS SDK. So even if CloudRail goes completely down, your integrations will keep running.
Happy to see you guys having success! I did my master's at the University of Mannheim. Never thought that this would happen, but I kind of miss the city.
I was looking for the same thing but for email providers. I would like to let users subscribe to MailChimp, AWeber, GetResponse, etc. using a single integration instead of having to integrate each one of them.
We have a very simple interface for email here: https://cloudrail.com/unified-email-api/ But it just offers sending emails. What you described is actually a potential candidate for a next interface. Would love to discuss that use case with you. Feel free to reach out: support@cloudrail.com
Terraform abstracts the APIs of multiple cloud providers into HCL, the HashiCorp Configuration Language. What Terraform is essentially doing is enabling you to skip the part where you write code to talk to AWS' API, allowing you to spin up an EC2 instance, or talk to GCE and store something in GCS, and instead just work in one language, via one tool. If the underlying APIs to those providers shift and change over time, HashiCorp updates the logic under the hood within Terraform, and your code continues working. What it does not do is provide a single resource type or function/method that you can use to upload objects to both AWS and GCE - you have to write two separate resources to work with each of the two example cloud providers.
What CloudRail is saying is: the upload() function will work whether you tell it to push the object to S3 or GCS.
What Terraform forces you to do is change `resource "aws_s3_bucket_object" "picture" {}` to `resource "google_storage_bucket_object" "picture" {}` when you want to switch uploads from AWS to Google.
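To make the contrast concrete, the kind of unified upload() being described could be sketched like this in Python (purely illustrative: this is not CloudRail's actual SDK, and the boto3 / google-cloud-storage clients it wraps are my own choice of example):

    import boto3
    from google.cloud import storage as gcs

    def upload(provider, bucket, key, data):
        """One call site for two backends -- what 'upload() works for
        S3 or GCS' means in practice. Illustrative stand-in only."""
        if provider == "s3":
            boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=data)
        elif provider == "gcs":
            gcs.Client().bucket(bucket).blob(key).upload_from_string(data)
        else:
            raise ValueError("unknown provider: %s" % provider)

    # Same call either way:
    # upload("s3",  "my-bucket", "picture.png", image_bytes)
    # upload("gcs", "my-bucket", "picture.png", image_bytes)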
As I understood it, it is more a solution for dev-ops to create the infrastructure. Like a provider-agnostic CloudFormation. CloudRail is about making it super simple to integrate APIs into an app.