Is being 100% open-source really the motivating factor to use this over S3? I wo...

notmyname · on Nov 8, 2011

As others have said, being open source is important because it gives you more control over your data. If you have access to the code that is storing your data, you have the option to host it yourself or pay someone else to host it with the same client. I've written about this more on my blog (http://programmerthoughts.com/openstack/democratization-of-d...).

If you are looking for an alternative to S3, I'd ask that you look at Openstack swift (http://swift.openstack.org). It's 100% open source, proven at scale in production, and, if you are hosting it yourself, can offer lower op-ex than using S3 (of course, there is a cap-ex cost to buy your hardware).

dotBen · on Nov 8, 2011

If you have access to the code that is storing your data, you have the option to host it yourself or pay someone else to host it with the same client.

That's certainly true in 'regular' open source economics.

But storage is somewhat unique in that the significant cost factor is not the price of alternative proprietary software but the hardware costs of the storage medium itself - and those are constant regardless of the license structure the software layer is using.

Additionally, economies of scale come into play to such an extent that hosting this kind of storage myself would be far more expensive than using a volume player like Amazon, which operates entire datacenters (this argument goes beyond just the op-ex/cap-ex tradeoff).

rarrrrrr · on Nov 8, 2011

We also expect price to be a motivator, since at $0.06/GB it costs less than half of what S3 does.

But we're not competing with S3 directly as a general cloud storage solution. We're specifically focusing on the case of long term archival storage.

You can compare the two services as tradeoffs from the expression: Inexpensive, High Throughput, Low-Latency (pick any two.)

S3 picks High Throughput and Low Latency.

Nimbus.io picks Inexpensive and High Throughput.

But for bulk archival and restore tasks, does 100ms of latency really matter to you? In other words, are you equally happy if your backup/restore job completes in 2 minutes vs. 2 minutes and 0.1 seconds? Do you care enough to pay more than twice as much? So that's why we're focusing on the archival market.
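A rough way to see the latency point, as a sketch: for a throughput-bound job, latency is paid once per sequential request, while streaming time scales with the data size. The 1.2 GB size and 10 MB/s throughput below are hypothetical numbers chosen to reproduce the "2 minutes vs. 2 minutes and 0.1 seconds" example; they are not Nimbus.io or S3 figures.

```python
def job_duration_s(total_bytes: int, throughput_bytes_per_s: float,
                   latency_s: float, sequential_requests: int = 1) -> float:
    """Estimate job time: each sequential request pays the latency once,
    and the payload streams at the given throughput."""
    return sequential_requests * latency_s + total_bytes / throughput_bytes_per_s


size = 1_200_000_000      # hypothetical 1.2 GB archive
throughput = 10_000_000   # hypothetical 10 MB/s sustained throughput

fast = job_duration_s(size, throughput, latency_s=0.0)   # 120.0 s
slow = job_duration_s(size, throughput, latency_s=0.1)   # about 120.1 s
print(f"added latency costs {100 * (slow - fast) / fast:.3f}% of job time")
```

The same function also shows the caveat: a job made of 1,000 small sequential requests would pay about 100 s of latency on top of the same 120 s, so the argument holds for large streaming transfers rather than chatty workloads.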

Egregore · on Nov 8, 2011

Yes, being open source is a motivation, because you can create your own cloud with your own hardware when you need it.

And it's additional assurance that you'll be able to deploy your system even if they go out of business.

rarrrrrr · on Nov 8, 2011

(SpiderOak / Nimbus.io cofounder here)

In addition to supporting the founders' personal ethics about software freedom, we feel an open source backend is important simply for the sake of confidence.

Some people will want to purchase the minimum of 10 machines and host a Nimbus.io storage cluster themselves (and we are also making our hardware specs open source.) Other cloud storage providers may even do this. We hope a few people will consider the hosted option, paying Nimbus.io $0.06 per GB.

In any case, all of these are a win for us. We're already spending money every day to maintain a reliable storage backend for our encrypted Backup & Sync business at SpiderOak.com. Nimbus.io is an evolution from that. Community involvement here is most welcome. :)

Aside from that, it's just a design we are excited to share. Every other distributed storage system I could find uses replication instead of parity. A system based on parity sacrifices latency but can deliver higher throughput on individual requests (at about 1/3 the cost.) There are use cases even outside of archival storage where this is attractive.
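To illustrate the replication-vs-parity tradeoff, here is a minimal sketch of the simplest parity scheme: a single XOR parity shard, which survives the loss of any one data shard. This is an illustration only. The comment doesn't say which code Nimbus.io uses, and production systems typically use Reed-Solomon-style erasure codes that tolerate several lost shards. The 4-data-plus-1-parity layout is a hypothetical example.

```python
from functools import reduce


def split_into_shards(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length shards, zero-padding the last one."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    return [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(shards: list[bytes]) -> bytes:
    """The parity shard is the XOR of all data shards."""
    return reduce(xor_bytes, shards)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """XOR-ing the parity with every surviving data shard yields the lost shard."""
    return reduce(xor_bytes, surviving, parity)


def storage_overhead(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per byte of user data for a k+m parity layout."""
    return (data_shards + parity_shards) / data_shards


data = b"archival payload protected by a single parity shard"
shards = split_into_shards(data, k=4)
parity = parity_of(shards)

lost = shards[2]                      # pretend the node holding shard 2 died
survivors = shards[:2] + shards[3:]
assert rebuild_missing(survivors, parity) == lost

print(storage_overhead(4, 1))         # 1.25 (vs. 3.0 for three full replicas)
```

The lower stored-bytes ratio is where the cost savings come from. In exchange, reading an object means gathering shards from several nodes, which can add latency but also lets a single request draw on several disks' throughput at once.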

hemancuso · on Nov 8, 2011

I don't see how a parity based implementation can work in a meaningful way across multiple datacenters. You certainly couldn't rebuild if you lost an entire datacenter due to disaster. Replication is the only way here.

So any comparison to S3 in that regard is meaningless - Nimbus can't achieve that level of durability, correct?

Additionally, if you're just doing parity across multiple chassis in a single datacenter and lost a couple of racks due to a power outage, it would seem the network would likely shit the bed trying to rebuild, potentially bringing the whole system down. Have you guys worked through the nastier failure cases that architectures like S3 can avoid?

rarrrrrr · on Nov 8, 2011

Excellent points.

Geographic redundancy with parity complements the network topology we find in many cities: a metro-area fiber ring connecting many data centers with low-cost site-to-site (not internet) bandwidth. It's even cheaper to just buy excess capacity with a lower QoS.

Every archival storage provider I've talked to has a write-heavy workload; write traffic may be more than 3x read traffic. In that situation, for example, replicating between two sites requires a site-to-site connection equal to the size of the incoming data. Since site-to-site connections are full-duplex, in the parity system the bandwidth for both reads and writes is provided at a price similar to what would be spent on replication bandwidth for writes alone.

That said, the first iterations of Nimbus.io won't provide geo-redundancy beyond what creating an offsite backup inherently provides. We expect to add geo-redundant storage as an upgrade option at a slightly higher price (still way under S3).

Replying to your second point: if a transient condition occurred, such as only a couple of racks losing power, the system wouldn't trigger an automatic rebuild right away. It would continue to service requests using parity and hinted handoff until the machines come back online. In any case, when the system decides a full rebuild is needed, the rebuild rate is balanced against servicing new requests (similar to how a RAID controller can give tunable priority to rebuild vs. traffic).
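The two policies described here, riding out short outages and throttling rebuild traffic, could look something like the following sketch. The function names, the grace-period parameter, and the reserved-fraction rule are hypothetical illustrations of the idea, not Nimbus.io's actual logic.

```python
def should_start_rebuild(down_for_s: float, grace_period_s: float) -> bool:
    """Short outages (e.g. a couple of racks losing power) are ridden out with
    parity reads and hinted handoff; only longer outages trigger a full rebuild."""
    return down_for_s > grace_period_s


def split_bandwidth(capacity: float, client_demand: float,
                    rebuild_priority: float) -> tuple[float, float]:
    """Divide a node's bandwidth between client requests and rebuild traffic.

    rebuild_priority (0..1) reserves that fraction of capacity for rebuild,
    much like a RAID controller's tunable rebuild rate. Clients get up to the
    remainder, and any capacity they leave unused also goes to the rebuild.
    """
    if not 0.0 <= rebuild_priority <= 1.0:
        raise ValueError("rebuild_priority must be between 0 and 1")
    reserved = capacity * rebuild_priority
    client = min(client_demand, capacity - reserved)
    return client, capacity - client


# Hypothetical node with 1000 MB/s, reserving 25% for rebuild:
print(split_bandwidth(1000.0, 900.0, 0.25))  # busy:  (750.0, 250.0)
print(split_bandwidth(1000.0, 300.0, 0.25))  # quiet: (300.0, 700.0)
```

Raising rebuild_priority shortens the window of reduced redundancy at the cost of client throughput, which is the same knob the RAID analogy refers to.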

wmf · on Nov 8, 2011

I don't see how a parity based implementation can work in a meaningful way across multiple datacenters. You certainly couldn't rebuild if you lost an entire datacenter due to disaster.

Sure you can. Given a system that can tolerate loss of N shares, you need to ensure that no datacenter holds more than N shares. In practice, this means you need many smaller datacenters, not two or three; whether that is economically feasible depends on the provider.
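The placement rule can be checked mechanically. This sketch assumes a hypothetical code that splits each object into 10 shares and tolerates the loss of any 4 (for example, a 6-of-10 erasure code); the datacenter names are made up.

```python
import math
from collections import Counter


def min_datacenters(total_shares: int, tolerable_losses: int) -> int:
    """Fewest datacenters such that none has to hold more than tolerable_losses shares."""
    return math.ceil(total_shares / tolerable_losses)


def place_round_robin(total_shares: int, datacenters: list[str]) -> list[str]:
    """Assign share i to datacenters[i % len(datacenters)]."""
    return [datacenters[i % len(datacenters)] for i in range(total_shares)]


def survives_any_datacenter_loss(placement: list[str], tolerable_losses: int) -> bool:
    """True if losing any single datacenter loses at most tolerable_losses shares."""
    return max(Counter(placement).values()) <= tolerable_losses


print(min_datacenters(10, 4))                                     # 3
print(survives_any_datacenter_loss(
    place_round_robin(10, ["dc-a", "dc-b", "dc-c"]), 4))          # True (4, 3, 3 shares)
print(survives_any_datacenter_loss(
    place_round_robin(10, ["dc-a", "dc-b"]), 4))                  # False (5, 5 shares)
print(min_datacenters(10, 1))                                     # 10 with single parity
```

The last line is the point about many smaller datacenters in numbers: the fewer losses the code tolerates, the more datacenters you need.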

methodin · on Nov 8, 2011

Isn't that the whole point of S3 - you don't need to make your own cloud with your own hardware?

JoeAltmaier · on Nov 8, 2011

It's a leap to assume that folks desiring cloud storage will want to host their own some day. I think it's a little like saying "I want an open-source car, so when they switch back to horses, I'm ready!"

bmelton · on Nov 8, 2011

It's less like that than it is "When we're operating at a scale that makes it more affordable to own hardware than to lease, then it'd be nice if we could."

The other unmentioned benefit is in scratching your own itch. If you want feature X, and Amazon won't give it to you, you can develop it in-house and host it yourself.

jasongullickson · on Nov 8, 2011

I think it's more like saying "I want a driver's license, so when I can afford a car I don't have to keep riding the bus."

blackiron · on Nov 8, 2011

The license for the server-side code (the one you would use to create your own cloud) is AGPL. Isn't this license too restrictive for business?

hugoroy · on Nov 8, 2011

Why would it be? It's the same as the GPL; the only difference is that the source of modified versions must be made available to users who interact with the software over a network. I don't see what's restrictive for business.

mseebach · on Nov 8, 2011

If you go and sell somebody a hosting solution based on Nimbus, you'd need to share your source code.

What I'm not sure of is whether, if you build, say, a photo-sharing webapp using Nimbus as the storage back-end, your webapp becomes AGPL by linking. I'm fairly certain the GPL would require this, but, as per the rationale for the AGPL, you don't care about that when you run a webapp.

Curiously, if Nimbus adopted the exact S3 API instead of "similar to", it would not constitute linking, as it's using a standard interface.

justincormack · on Nov 8, 2011

AGPL uses the same definition of linking as GPL so communication over a network API to a storage backend is not linking.

http://www.quora.com/Does-the-AGPL-extend-the-idea-of-linkin...

mseebach · on Nov 8, 2011

Ah, thanks for that. I was under the impression that the bar for linking across the network was rather higher.

edanm · on Nov 8, 2011

Is there anyone, anywhere, who considers that as a plus? Who would actually consider rolling out their own cloud infrastructure?

notmyname · on Nov 8, 2011

Actually, many people consider it. Some are simply cautious about hosting their data with a third party. Some are prevented from using a third party for compliance or regulatory reasons. Also, it's generally more cost-effective for extremely large datasets to be self-hosted rather than hosted by a third party.

edanm · on Nov 8, 2011

Aren't the bulk of customers who turn to the cloud rather small operations, who are trying to "outsource" as much of their infrastructure issues as possible? And aren't these customers much more concerned about pricing, rather than possible future growth?

Note: I don't mean to ask this sarcastically. I'm actually asking.

notmyname · on Nov 8, 2011

From my experience working with Rackspace Cloud Files, customer sizes are all over the map. Some customers are very small. Some are very large. I know that S3 has a similar variance in customer size.

From my experience talking to users (and potential users) of Openstack (http://openstack.org), there again is variance. Most people are relatively small (a few hundred GB to a few hundred TB). Some are much bigger (several PB). The most exciting thing I heard was that CERN is evaluating Openstack swift (http://swift.openstack.org) for their storage needs. A researcher from CERN gave a keynote at the last Openstack design summit. CERN generates 25 PB/year and has a 20-year retention policy. They have vast storage needs: at that rate, a full 20-year retention window holds roughly 500 PB. Storage needs vary greatly.

I've seen that outsourcing infrastructure is great to a point, but the largest users can generally get substantial cost savings by bringing their infrastructure back in house.

mseebach · on Nov 8, 2011

The cloud is great for scaling, but once you have a large dataset and more-or-less predictable growth, it could easily become more economical to handle it yourself. Using something like Nimbus would make such a migration easier.

On the other hand, it's not like the S3 interface is rocket science. Rewriting your app's file-storage interaction is the least of the effort in a multi-terabyte migration.
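Some back-of-the-envelope arithmetic supports this: moving the bytes is often the slow part. The dataset size and link speeds below are hypothetical, and the sketch assumes the link can be kept fully busy for the whole copy unless a lower utilization is given.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, link_mbit_per_s: float, utilization: float = 1.0) -> float:
    """Days needed to copy `terabytes` (decimal TB) over a link of the given speed.

    utilization is the fraction of the link's nominal speed actually achieved.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * utilization)
    return seconds / SECONDS_PER_DAY


for link in (100, 1_000):  # hypothetical 100 Mbit/s and 1 Gbit/s links
    print(f"10 TB over {link} Mbit/s: {transfer_days(10, link):.1f} days")
```

At roughly nine days of saturated transfer on a 100 Mbit/s link, rewriting the storage calls in the app is the small part of the job.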

spatten · on Nov 8, 2011

A lot of Canadian companies are unable to use S3 (or EC2) due to the Patriot Act.

Sure, they can get around this by using European buckets, but that kind of sucks for latency.

I can easily imagine setting up a company using this software with servers in Canada. So yes, it's a plus.

(It's not just Canada, of course, and it's not just the Patriot Act. Gambling companies, for example, can't host in the US.)

asharp · on Nov 8, 2011

A cloud hosting provider.

hopescope · on Nov 8, 2011

"I could care less if this is open source." David Mitchell explains why this phrase makes no sense and means exactly the opposite of what you want to say: http://www.youtube.com/watch?v=om7O0MFkmpw

michael_dorfman · on Nov 8, 2011

The phrase is usually used ironically: "I could care less" is used to mean the opposite "I couldn't care less".

Similarly, when my daughter says "Nice hat, Dad", she is not actually complimenting me on my choice of haberdashery, but rather, pointing out that she thinks it is not nice at all.

This message brought you by Irony: Making Communication More Interesting Since the Dawn of Language.

morsch · on Nov 8, 2011

For what it's worth, I would not classify this as irony (or at least it's hardly a prototypical case). I'm sure the current meaning of "I could care less" is quite thoroughly conventionalized: it's part of everyday speech, and many people do not notice any non-literal effects such as irony -- as evidenced by the prescriptionist videos which feel the need to explain to people the "true" meaning of the expression. Irony may have had a role in the etymology of the expression. All of this is very similar to a dead metaphor.

repsilat · on Nov 8, 2011

> I'm sure the current meaning of "I could care less" is quite thoroughly conventionalized: it's part of everyday speech

Only in some places - I can only remember having heard it on television from the US. I shiver in pain every time I hear it, too, so I'm pretty sure I haven't heard it in person (having lived in New Zealand and Australia).

RyanMcGreal · on Nov 8, 2011

In fairness, I wouldn't offer the pedantry of a prescriptionist who was moved to create a video as evidence that the average person doesn't get the sarcasm of "I could care less". I think it's rather more likely that the average person couldn't care less whether the phrase is literally correct, as long as the listener or reader understands its meaning.

morsch · on Nov 8, 2011

I'm sure most people would see the original non-literal features of the phrase if they were to think about it. The point is, they don't! Not because they're dumb but because the entire expression has unit status in their vocabulary. The fact that people do not notice the original non-literalness in the phrase (and indeed understand it as intended) is evidence that it's not non-literal anymore.

I'm harping on about this because it's such a nice poster child for an entrenched (conventionalized) meaning of an entire expression as opposed to just a word, and for the lack of compositionality of meaning in language. In other words, there's more to the meaning of a sentence than just the meaning of its words. Compositionality is one of the points of debate between different schools of thought in linguistics.

RyanMcGreal · on Nov 8, 2011

I suppose I just have a hard time getting too exercised about what is, essentially, a banal artifact of a highly idiomatic language. When I find myself getting bogged down over a particular expression, I step back and ask myself: if person A uses this expression, will person B understand what they mean? Really, that is all that matters.

verroq · on Nov 8, 2011

You are talking about sarcasm. Not irony. And a commonly made grammar mistake isn't irony. Unless your whole post was wrapped in a big <sarcasm> tag and I've just made a fool out of myself.

michael_dorfman · on Nov 8, 2011

The relationship between sarcasm and irony is subtle; sarcasm often makes use of irony, but is characterized by its "biting" nature. The quote from my daughter is ironic and sarcastic; the quote about "caring less" is ironic, and may or may not be sarcastic depending on the context.

Irony here refers to verbal irony: a discrepancy between the literal meaning of a phrase and its intended meaning, such as saying "What a nice day!" when it is raining.

Thus, "I could give a shit" and "I couldn't give a shit" are identical in meaning, as the former is doubtless intended ironically. Similarly for caring less.

gojomo · on Nov 8, 2011

It's a contranymic idiom. A contridiom!

morsch · on Nov 8, 2011

No. He is talking about irony, not sarcasm. Irony is typically understood to mean saying A while being aware (or of the opinion) that !A. Viz. "Nice hat, dad", or "Real good idea" (when it's not). Sarcasm (cutting remarks) often involves irony, but not always; it's an orthogonal concept.

Also, dropping the "not" in "Could not care less" is not reasonably said to be a grammar mistake. It's usually not sarcasm, either. Whether you agree that it's irony is a different matter (I'm pretty sure it's not).

white_devil · on Nov 8, 2011

Really now? Next you'll be telling us that "then" is used ironically instead of "than", even when "than" is what people mean.

Face it, people say "I could care less" for the same reason they say "then" instead of "than". There's just something about the English language that makes most of its native speakers unable to use it.

You came up with this irony theory because you've made the same mistake yourself, and your ego wants to deflect the accompanying shame.

The word "not" is not a particularly big word, but then again, most native speakers can't use "than" either.

It's amazing what an outburst of theorizing wankery your comment sparked.

bmelton · on Nov 8, 2011

This argument is pervasive on HN. Here is the best example of it[1], and please see CodyRobbins's posts[2] on the subject, some of HN's best posts ever.

[1] - http://news.ycombinator.com/item?id=853100

[2] - http://news.ycombinator.com/item?id=854042

ryanklee · on Nov 8, 2011

Wow. At first I thought you were linking to something actually interesting and thoughtful regarding the nature of security and open source software. Turns out you were being boorish, prejudicial and wrongheaded about the absolutely normal and acceptable use of language. There is no such thing as linguistic prescriptivism. What you are espousing is the linguistic intellectual equivalent of creationism. It only exists in unfortunate circles of bored, annoying laymen and their grammarian forebears, who were equally unfortunate, bored and annoying. For 10,000 reasons why you should leave everyone alone with your silly prejudices, see Language Log or any of the dozen or so other blogs linguists run. </rant>

pessimizer · on Nov 8, 2011

If "I could care less" is the most you can say about something, that's an insult. If you say that you couldn't care less, that's just a lie: you're responding to it.

http://en.wikipedia.org/wiki/Damn_with_faint_praise

I love David Mitchell, but this rant by other people is a long-held irritation of mine: false pedantry :)