

Introducing Riak 2.0: Data Types, Strong Consistency, Full-Text Search - siculars
http://basho.com/introducing-riak-2-0/

======
twoodfin
On a related note, if you're not listening to the Think Distributed
podcast[1], you're missing a terrific combination of research and engineering
discussion around distributed systems. It routinely features a few of the
Basho guys. The episode on causality, especially, was first rate.

[1] [http://thinkdistributed.io/](http://thinkdistributed.io/)

~~~
paxcoder
From what I heard of the podcast, it's meh.

------
biot
Can anyone comment on what they've found is the sweet spot (in terms of
features and functionality) where using Riak is the #1 choice? I love the
technology and have wanted to use it for a while but haven't had a need for
something not better suited to a relational database.

~~~
moomin
I think the sweet spot is when you realize that MongoDB has lost your data.

~~~
manishsharan
I am not a MongoDB fan (I am a user but not a fan), but what you are implying
is FUD; MongoDB is unlikely to lose your data, though it is not as idiot-proof
as traditional DBs. Anyone who bothers to read the documentation would know how
to avoid data loss. Having said that, I am not happy about Mongo's DB locking,
but it hasn't affected my applications' throughput yet.

~~~
ismarc
Any particular advice on how to not lose data when map-reducing from a sharded
collection into another sharded collection? Because we had to migrate away
from mongodb in a hurry when we had ~20-40% of our data just not get written
(basically 1-2 shards worth) from that scenario. And by in a hurry, I mean we
spent a significant amount of time debugging, troubleshooting and tracking
down what was causing the issue (we were on 2.2 at the time, not sure if it's
been fixed since...) and realized there was no way to fix the bug unless we
were willing to delve into how mongo did the writes from map-reduces. So once
we found the underlying issue, we quickly migrated away.

The number of conditions where it will silently lose data and you have no
control over the write consistency is absurd. However, every time it's brought
up, people shout it down because they assume you're talking about the known
write issues (the 32-bit version, dataset size vs. RAM size, not setting the
write concern to confirm writes, etc.), and not completely different ones that
aren't resolvable.

~~~
manishsharan
@ismarc, I have never encountered your particular use case. Is there a bug
report with details for reproducing this? Like I said, I am not a fan of
Mongo, and were I to encounter an issue like yours in my use cases, I would
bite the bullet and migrate to something else.

~~~
ismarc
There may still be. The first two reports I opened were closed with pointers
to docs on how to set the write concern. The reproduction is pretty easy: if a
shard tries to write its results to a shard that is write-locked, none of that
shard's map-reduced data after that point is written to any shard. The more
evenly distributed your data, and the more shards you have, the more likely
you'll hit the condition. Combined with the fact that all unsharded
collections always go to the same shard, the whole system becomes useless
unless you can fit your entire dataset in all collections in RAM on a single
box.

------
rch
Looks like it is going to be a great release. The first thing I wonder about
with search is facets, especially when Solr is involved. I assume that being
compatible with Solr clients means supporting faceted search, but has there
been any public discussion of memory management with large, distributed
indexes?

Edit: Discussion of facets starts around 25 minutes into the Yokozuna talk at
RICON East 2013

------
pharkmillups
We're streaming talks live today and tomorrow from RICON West in San
Francisco.

ricon.io/live.html

There are no less than six talks dedicated to this release.

------
ghc
Want more details about Riak's new Data Types? They're on GitHub:
[https://github.com/basho/riak/issues/354](https://github.com/basho/riak/issues/354)

~~~
igtheflig
That issue was created before we did the work, and it is a bit out of date. A
fuller treatment is here; it includes references to papers that helped us, as
well as how we overcame some of the hurdles in the issue you linked to:
[https://gist.github.com/russelldb/f92f44bdfb619e089a4d](https://gist.github.com/russelldb/f92f44bdfb619e089a4d)

------
gpsarakis
Is this version compatible with 1.4.2 or will it require a full data
dump/restore?

~~~
gregburd
If you have a 1.4.2 cluster you can do a rolling upgrade to 2.0 without doing
a "dump/restore". Please remember, this is a tech preview; don't upgrade your
production cluster to 2.0 until it's been released for production use. Feel
free to test drive this preview, and if you do, please send us feedback!
#disclaimer I work for Basho

~~~
gpsarakis
OK, thanks for answering!

------
oomkiller
IIRC, the previous implementation of Riak Search had some gotchas when it came
to rebalancing and adding/removing nodes. Does anyone know if this new arch
fixes or helps with those issues?

------
Nate75Sanders
First time I had seen CRDT as "Conflict-free" instead of "Commutative", but it
seems to be a reasonably popular choice for C after some googling.

~~~
tsantero
CRDTs can be either commutative or convergent; much of industry and academia
has settled on the opaque "conflict-free" instead of CmRDT/CvRDT for
simplicity.
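For readers unfamiliar with the distinction, a state-based (convergent) CRDT
can be sketched in a few lines. Here's a hypothetical G-Counter in Python, an
illustration of the general idea rather than Riak's actual implementation:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merge takes per-slot max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, amount=1):
        # Each replica only ever increments its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other):
        # Merge is commutative, associative, and idempotent (per-slot max),
        # so replicas converge regardless of message order or duplication.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state in either order:
a = GCounter("a")
b = GCounter("b")
a.increment()
a.increment()
b.increment()
a.merge(b)
b.merge(a)
# Both now agree: a.value() == b.value() == 3
```

A commutative (operation-based) CRDT would instead broadcast the increment
operations themselves; the "conflict-free" umbrella term covers both styles.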

------
Ycros
That is an incredibly well written announcement.

