Today bet365 have signed the agreement to purchase all Basho IP. We expect the agreement to be ratified in the US courts next week. Once cleared, our intention is to open source all code, help rebuild the community and collaboratively take the development of RIAK forward.
In the coming weeks we will hopefully answer the questions people have and will be calling on the community to help forge the initial RIAK Roadmap.
One of the initial questions we have for the community is which OSS license would people like applied to the code? Our thought is the most open and permissive.
GPLv3, as it protects against patent trolls and also encourages building a better ecosystem.
There are companies that refuse to use GPLed code. They want to modify and then sell closed forks of your product, without paying you, and by doing so they fragment the userbase. GPL prevents this.
If you want to allow them to do so you can always switch to BSD or dual-license - perhaps after a private agreement and if you find the companies reputable.
Fantastic news! I just used Riak Core to build a multi-master service. It was impressively straightforward! Glad to hear there are people intent on continuing the project as OSS.
Will this include the Basho technical documents? Much of the better technical documentation lives on basho.com and was returning 404s.
Great!
There aren't that many persistent (disk-based) KV-stores out there. I planned to, but didn't get to actually use Riak, and was sad when the company seemed to go down.
As far as licensing goes, it would be great to have one that would allow me to link the client code libraries and run the servers without having to worry about anything. If I understand correctly, Apache2 goes way beyond that, while GPL is somewhat ambiguous on that. (Yes, one could implement their own client, but that'd be reinventing the wheel).
As I replied below, the distinction between libraries and servers is a good one. I was too extreme in my statement without making that distinction.
For libraries, yes, absolutely, companies should rightly be cautious, because it's relatively easy to expose yourself to legal risk.
Nobody is "going away with the work of others" – the whole point of a license is so contributors decide how others can use their work. What exactly is the issue here?
For a lot of companies, GPLv3 is too aggressive in its requirements around patent defense, etc. Though many were never fond of the copyleft nature of GPLv2 (and v1), they allowed it. GPLv3 raised the bar to a degree that many lawyers won't sign off on its use.
Apache 2.0 is the standard for almost every company I've dealt with, with MIT and BSD 3-clause being accepted as well. But Apache 2.0 seems to make most lawyers happy.
You say the GPLv3 is too aggressive on requirements around patent defense, but the Apache 2.0 is okay? The Apache 2.0 license requires a patent grant. How do they differ in the patent grant?
I'd speculate that for many companies the GPLv3 is too strong in its copy-left stance, by disallowing Tivoization.
GPLv3's patent text was created by literally copying the corresponding part of the Apache 2.0 text, and then adding a paragraph to handle patent agreements such as the one between Novell and Microsoft.
The distinction between the two texts is the handling of patent agreements.
If your organization uses any of: Linux, Android, emacs, gcc or any GNU utility, OpenJDK, or MongoDB, then GPL is clearly not a no-go. GPL only poses issues when the product is a library. For standalone products or ones with linking/access exceptions like OpenJDK, GPL poses no issues, and organizations aren't afraid to use it, especially when they don't intend to modify it.
GPL is a great license that ensures that improvements to the product are shared back with the community. In fact, for a database like Riak, the Affero GPL (as MongoDB uses) may be even better, as organizations that improve Riak and run it internally on their servers without redistributing it would also be required to contribute their modifications back.
> If your organization uses any of: Linux, Android, emacs, gcc or any GNU utility,
Two things wrong with this right off the bat. First, not all of these are GPLv3, which is the version with the problematic patent clauses that scare companies away. Second, it is not just libraries where GPLv3 is generally accepted to pose potential issues. You need look no further than Apple and the great lengths they went to in order to avoid ever shipping GPLv3 software. The GPL kicks in as soon as you distribute the software.
> First, not all of these are GPLv3, which is the version with the problematic patent clauses that scare companies away.
So, if Linux were GPLv3 you're saying that most companies wouldn't use it?
> You need look no further than Apple and the great investment they went through to avoid ever shipping GPLv3 software.
Most companies aren't like Apple.
> The GPL kicks in as soon as you distribute the software.
That's right, and that's a good thing. If you distribute GPL software that you modify, you need to make your modifications public. But as Riak is not a library, it does not infect any other component. If you distribute software with (say, Android) Linux but don't modify Linux then the GPL doesn't infect the other components of your software, as they're not actually linked (or Linux has some explicit exceptions).
> GPL only poses issues when the product is a library
What about the client libraries? Do you consider them to be a part of the product? It might make sense to release the client code (library) and the server code under different licenses (Apache2/GPL, respectively) if GPL is to be used.
> It might make sense to release the client code (library) and the server code under different licenses (Apache2/GPL, respectively) if GPL is to be used.
I think Apache 2.0 is probably the best for end users in terms of acceptability in enterprises, etc. A lot of places have issues with the virality of GPL licensing.
Anecdotally, those who decide to use the GPL don't like the results: users are afraid to extend functionality, and conscientious extenders contribute code that is specific to their use case and can't be reused.
In 2012, I worked on a nine month project to migrate a fairly large hot dataset (10s of TBs) off of SimpleDB and onto Riak. At the time, we were experiencing tens of hours of downtime a quarter, much of which could be attributed to SDB.
My favorite story about that migration is that four months into the project, we were moving along nicely and ready to start cutting over to Riak after the Christmas holiday. My office had decided to go out to lunch our last work day before everyone headed off for the holidays, and I received a call from a SDB support team member informing me that we either needed to move off their platform by Christmas Day or they'd have to shut us off (we'd start degrading other customers). Fortunately we were nearly ready to pull off the cutover, so we quite literally cut all of our traffic over on Christmas Eve (one of our busiest days of the year).
Over the following years, our database continued to grow and grow, and all the while Riak trucked along. It wasn't always a smooth road, and we certainly had our challenges from time to time, but I have yet to hear of anyone running a massive 100TB hot database who hasn't had to do work and maintenance.
Throughout the years I used Riak, it served me very well. I'm grateful for the innovative work so many at Basho created during their tenure, and I'm glad to see Bet365 attempt to steward the project to a new phase of its life. If I could toast every last Basho employee, I would. Thank you all!
> I received a call from a SDB support team member informing me that we either needed to move off their platform by Christmas Day or they'd have to shut us off
Wat?! That's an excellent reason to burn fields and salt them.
I was / am pretty sympathetic to their situation. At that point, SDB was already deprecated in favor of dynamo, and they had been asking us to move off of SDB for over a year (before I had joined the company via acquisition). If someone ever does an oral history of AWS, I'm sure that SDB would be the "grand mistake" they made in their first few years. It was never ready for prime time in the way it needed to be.
> Amazon SimpleDB is designed to integrate easily with other AWS services such as Amazon S3 and EC2, providing the infrastructure for creating web-scale applications.
If I hadn't heard your story, I wouldn't see anything there to steer me away from it.
I think Riak properly materialized the NoSQL promise of fully scale-out architectures. I have not seen an open source product that delivers on this (no, Mongo does not cut it for me). Sad to see they did not manage to make a business out of it... The bet takers may be a good patron to the project, but ideally it would be governed by an independent shop that focuses on further development and marketing.
RethinkDB was good from a distributed-systems perspective, but a nightmare to maintain in production. Backups and restores would take 12+ hours on ten gigabytes or so, and slow queries would grind the whole system to a halt, which isn't possible in Erlang-based systems like Riak thanks to the BEAM's preemptive scheduling.
Riak builds on a theoretical foundation laid out in Amazon's Dynamo paper. The other NoSQL stores did not have this theoretical underpinning, and so they just offer a "best effort".
Sure, but that said, no other DB Jepsen-tested until that point necessitated the kind of gymnastics he had to go through to get it to fail. It's pretty solid CS, and it's a shame the project had the end of life that it did.
Dynamo is not exactly a performant or efficient model. It's the equivalent of pulling all the distributed systems guts out and handing them to the user to deal with. And the resulting toll is quantifiable: http://damienkatz.net/2013/05/dynamo_sure_works_hard.html
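To make the "handing the guts to the user" point concrete: in a Dynamo-style store, the application picks per-request quorum values (N replicas, W write acknowledgements, R read acknowledgements) and is responsible for choosing values whose overlap gives the consistency it needs. A minimal sketch of the overlap rule, purely illustrative and not Riak's actual API:

```python
# Illustrative sketch of Dynamo-style quorum overlap (not Riak's API).
# N = replicas per key, W = write acks required, R = read acks required.
# If R + W > N, every read quorum intersects every write quorum, so a
# read is guaranteed to contact at least one replica holding the latest write.

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if any R-replica read set must intersect any W-replica write set."""
    return r + w > n

# Typical Dynamo-style defaults: N=3 with quorum (2-of-3) reads and writes.
assert quorums_overlap(n=3, r=2, w=2)      # overlap: reads can see the latest write
assert not quorums_overlap(n=3, r=1, w=1)  # fast, but reads may return stale data
```

The point of the criticism above is exactly this: the database exposes the trade-off per request rather than deciding it for you, so the application carries the burden of getting it right.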
Damien's a very smart guy, but I don't think I agree with him here:
> Within a datacenter, the Mean Time To Failure (MTTF) for a network switch is one to two orders of magnitude higher than servers, depending on the quality of the switch.
A switch is highly unlikely to fail. They seem to be bulletproof. But having worked with a datacenter (on the engineering team of an early AWS competitor), switch _misconfiguration_ was all too common. Maybe a tech accidentally plugs in the wrong ethernet cable and forms a switching loop. Maybe someone fat-fingers a tag and a broken VLAN gets automatically deployed to 10,000 nodes. Either way, the _switch_ is alive, well, and pushing packets - but they're the _wrong_ packets and the result is indistinguishable from hardware failure to the end user.
At datacenter scales, these things happen... not infrequently. If you engineer your database to expect that netsplits are rare, you're going to have a bad time.
VLANs were the bane of my existence when I had to figure out how to deal with them. I don't envy anyone whose job is to manage them on switches for a lot of servers.
Good points. Weren't the last couple of AWS outages partly due to misconfigured network configs? Depending on your problem, the replicated reads make sense given those kinds of outages. Though I'm new-ish to Riak's "core" design, when you get the APL with the servers, isn't it feasible to create a design similar to what Damien's proposing, using a preferred master for a given vnode?
The problem was replication across the cluster and getting them to coordinate their values. We eventually did what you suggested in our dev environments so that we didn't lose our sanity.
> They're open sourcing the code (likely because they need help)
Unlikely. I genuinely believe Martin Davies is doing it as a service to the Erlang community. bet365 doesn't need to open source anything. They've got deep enough pockets to hire developers to maintain Riak internally for the foreseeable future.
> Is Riak so critical to Bet365 that the right move was to _buy the company_ versus switching to a different storage system?
Yes. There isn't a storage system readily available which offers the same capabilities as Riak, and at the scale at which bet365 operates, migrating gradually takes years. By capabilities I mean the ability to make different tradeoffs in different use cases. Riak has a nice set of levers to trade off consistency for availability. Its built-in support for CRDTs is quite amazing.
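For readers unfamiliar with CRDTs, the core idea can be sketched in a few lines. This is a generic grow-only counter (G-counter), not Riak's actual data-type implementation: each node increments only its own slot, and merging takes the per-node maximum, so replicas converge regardless of merge order.

```python
# Illustrative G-counter CRDT sketch (the concept behind Riak data types,
# not Riak's implementation). Merge is commutative, associative, and
# idempotent, so concurrent updates on different replicas always converge.

def increment(counter: dict, node: str, by: int = 1) -> dict:
    """Return a new counter with this node's slot incremented."""
    c = dict(counter)
    c[node] = c.get(node, 0) + by
    return c

def merge(a: dict, b: dict) -> dict:
    """Element-wise max over all known nodes."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def value(counter: dict) -> int:
    return sum(counter.values())

# Two replicas diverge during a partition, then merge after it heals.
r1 = increment({}, "node-a", 3)
r2 = increment({}, "node-b", 2)
assert value(merge(r1, r2)) == 5
assert merge(r1, r2) == merge(r2, r1)  # merge order doesn't matter
```

This is why CRDT support matters for an AP database: writes accepted on both sides of a netsplit can be reconciled automatically, without the application hand-writing conflict resolution.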
We use a huge amount of RIAK at bet365, and prior to the IP acquisition had secured the rights to the enterprise code. We have a large development team and are already working on enhancements. Securing the IP, and the subsequent opening of the code allows us, and any other enterprise users in a similar position, to release code for the good of the wider community.
> Is Riak so critical to Bet365 that the right move was to _buy the company_ versus switching to a different storage system?
Quite possibly, yes. Data has incredible inertia. Not only because of the storage system used, but also all of the years of ops tooling, application integration, etc. that surrounds it. I don't know anything about Bet 365, but if they had any non-trivial amount of their business data locked up in Riak then from a cost/risk analysis they may have decided that purchasing the company was the safest and cheapest option to protect their own company.
The note mentions a receiver, which means the company is being liquidated by a court, which implies that it's effectively (though not technically) bankrupt.
How much does it cost to move to Postgres? How much does it cost to buy the assets?
How much of a halo do you get from writing a nice post about moving to Postgres? How much of a halo do you get from buying and open sourcing something cool?
I doubt Postgres could handle what Riak's doing. Riak is a Dynamo style database, which basically means that you get zero features but you get something that's fast and scalable. Postgres gives a different set of tradeoffs.
This really doesn't surprise me. My experience with Riak (admittedly in 2012-2014) was that it did not live up to the hype. As strictly a key-value store, it wasn't bad, but toss in 2i, etc, and it quickly became problematic.
Open sourcing the software will surely be a good thing, though. I'm eager to see where it leads.
I worked with Riak pretty extensively in that time period. I'm in more or less the same boat: it had cool features, but they didn't make up for the generally slow performance and limited feature set.
(Admittedly, our use of Riak was clearly a case of premature optimization: we didn't have anything like the scale to require an available data store.)
As a technical evangelist for Basho, I would encourage sales prospects to think carefully about what they really needed. One of the problems trying to sell Riak is that, unless you know you need the availability it can offer at the scale it can support, you're probably better off choosing something with more features and less scale.
And, although clearly we're swimming in more data than ever, not all that many people have both massive data sets and a burning need to never lose any of it. A lot of data can be lossy without any real consequence.
In my experience with Riak, it absolutely lived up to its promise of scale, even if single-request speed was never quite as zippy as anyone would have liked.
However, the Dynamo design is just borderline unpalatable from a developer's perspective. I enjoy being aware of exactly the trade-offs being made, and Basho was always really open about it, but I don't blame anyone for not wanting to dig deep in the weeds to make it work for fairly common use cases.
I have no experience with Riak, but I once went to a conference (a short while ago) and saw a presentation by a higher-up at Basho (I won't disrespect the person by naming them). In short: it was a very generic, bland, and positively abstract presentation about big data. The other presentations were highly technical, so I was excited to get to know Basho as well; I came out of it thinking "wtf was this guy talking about?". It's obviously not a reflection of what Basho stood for, or even what the individual in question stood for, but that's the only memory Basho has left in me.
A shame you were left with that impression. For quite some time, from top to bottom, Basho had quite a reputation for solid technical talks. The distributed systems conference RICON was quite excellent.
(Disclaimer: former Basho technical evangelist & engineer.)
Ah they didn't. I went to work for them after I was let go by Basho in January, but have since moved on to general contract work on Riak. I am lucky enough to remain in contact with the people at bet365. My take, for what it is worth, is that this is a very generous gesture from bet365 and it helps the whole community. bet365 really hide their light under a bushel in terms of how big a UK technology success story they are. I'm looking forward to working on the newly open sourced replication code.
I've no experience with Riak whatsoever, but years ago I was contacted by a headhunter about an open position at Basho. They were looking for a field support engineer for a local gaming company. They would hire me and place me directly with the client. Maybe I'd have gotten some training on Riak, but it was never mentioned during the rounds. All interviews were done remotely. Eventually I said no to the offer.
Some time later, even the guy who interviewed me was out from Basho.
I knew a couple of folks in Client Services at Basho, one of whom later moved over to the engineering side. From what he told me, the client services group was a great team to work with and had great management.
Basho has silently gone dark since around Easter, with no indication of anything on their site; it still looks like Riak/Basho are an active, living company. Despite being hosted on basho.com, this is just a post to a mailing list, still not an official account/announcement of anything.
I have a client with a support contract and while I don't know for certain that they didn't contact someone within the organization, I know that most people, including myself, found out only when someone asked if we could contact Basho support and they were informed that it wasn't there anymore.
Wow, that one slipped by me totally. I had no idea Basho was in trouble (though I'm not a Riak user, I have used some of their other Erlang projects, including webmachine and lager).
That is only one avenue to open source contributions. They may contribute via individual commits directly to other libraries such as https://github.com/basho/riak etc
This is great news! I've spent the last two years working with Riak. I love its admin tools, and Basho engineers were the best to work with. I'm glad to see Riak live on.
I evaluated it in early 2011. At that time all NoSQL dbs were new in market.
For me, the restrictions on usage in cluster mode, the limited client support across languages, and the lack of file-storage abilities like Mongo's GridFS made it unattractive.
Eventually Mongo seemed like the better choice because it was completely free for any kind of use, with amazing features and lots of libraries. I am glad I made that decision :)
If you are going to open source Riak, then maybe consider donating it to Apache Software Foundation? Maybe not immediately, but in the long term it feels like it would make sense.