
The end of NoSQL? FathomDB launches scalable relational DB - amirnathoo
http://fathomdb.com/news/demo2010
======
Semiapies
Tacking "The end of NoSQL?" onto the title is karma-whoring via exploiting the
silly database holy-warring we've seen on HN lately.

~~~
jemfinch
Accusations of karma-whoring are karma-whoring ;)

~~~
pmichaud
Recursion, I love it.

~~~
GrandMasterBirt
HN has fixed that, can only vote up once. Damnit, genius!

------
oomkiller
Here we go again. The AltDB movement is not just about scaling. While one of
the goals is to make things scalable, it's not THE goal.

BTW AltDB is my term for the NoSQL movement. It implies much less.

~~~
justinsb
Care to define the goals? Apparently it's not about scaling. It's not about
SQL, because there's no reason you can't support a SQL-style query language.

So I do seriously want to know what's left... We're always on the lookout for
the next feature to add ;-)

~~~
_pius
To me, it's about reducing the impedance mismatch between your data model and
the implementation of that model in the database schema. Some data work better
in a relational model and others work better in a document model. The idea of
one ending the other seems preposterous to me.

Congrats on the launch though!

~~~
neilc
Of course, this is a very old idea (predating the invention of relational
databases, even) -- it just gets reinvented every decade or so, with a fresh
helping of media hype.

~~~
jrockway
Of course, this decade, we actually have OOP, so OO databases make a lot more
sense.

~~~
easp
We had OOP last decade, and the one before that too.

------
adamt
I am sorry - but the subject here misses the point. Just having someone else
who can host a cloud MySQL database for you in what appears to be a single
location doesn't solve the problem that people have running high-availability
global internet services.

The challenges with MySQL are that scaling it effectively to a multiple-
master, multi-datacenter solution is almost impossible. Especially if that's
over a wide area network.

~~~
justinsb
The new technology isn't based on MySQL. It's a fully distributed, no single-
point-of-failure, scalable relational database. You can choose fully managed
hosted traditional MySQL, or we'll be hand-picking early customers for the
scalable technology. Contact me if you're interested.

We're starting with single-location, but we can certainly support cross-
datacenter configurations. CAP seems to dictate that this will either require
acknowledgement from the remote datacenter (choose consistency) or a window of
loss (choose availability / partition tolerance).
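
For what it's worth, that choice can be sketched in a few lines of Python. This is only a toy model of the general tradeoff, not how FathomDB works; `Site`, `ReplicatedStore` and the `synchronous` flag are names made up for illustration.

```python
class Site:
    """One datacenter holding a copy of the data (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True

    def apply(self, key, value):
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class ReplicatedStore:
    """Two-site store that is either synchronous or asynchronous."""

    def __init__(self, local, remote, synchronous):
        self.local = local
        self.remote = remote
        self.synchronous = synchronous
        self.pending = []  # writes acked locally but not yet shipped

    def write(self, key, value):
        if self.synchronous:
            # Choose consistency: no ack unless the remote site has the write.
            self.remote.apply(key, value)  # raises if the link is down
            self.local.apply(key, value)
        else:
            # Choose availability: ack after the local write, ship later.
            self.local.apply(key, value)
            self.pending.append((key, value))

    def flush(self):
        """Ship queued writes; whatever is still pending is the window of loss."""
        while self.pending:
            key, value = self.pending[0]
            self.remote.apply(key, value)
            self.pending.pop(0)
```

With `synchronous=True` a write fails whenever the remote site is unreachable; with `synchronous=False` it succeeds, but anything still in `pending` is lost if the local site goes down before `flush()` runs.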

Until we see customers regularly hosting their webservers in multiple
datacenters, we're going to stay focused on optimizing for a single location.
But maybe my viewpoint is out-of-date... How many datacenters are you
currently distributing your webservers across?

~~~
adamt
Your website FAQ question 1 says it's standard MySQL. In another comment here
you say: "a new database which we've built ourselves, which we've announced
but isn't publicly available yet."

If it's standard MySQL, then you hit the same issues as if I were to host
MySQL myself. If it's your own database you've built yourself, but is not
available yet, then I am not going to trust it with my data.

In what I do I often need to have a high-availability website, where a loss
of connectivity, flood, power issue etc. in that data centre can cause outages.
Most of the time on our critical projects we have one or two physical data-
centers with EC2 for testing and disaster recovery.

Although being based in Europe - we do a lot of work in SE Asia. The
connectivity between Asia and Europe (often you end up with 400ms+ RTT and 10%
packet loss) means you have to host local websites in the region, and even in
country (you might be able to drive between Malaysia and Singapore, but the
internet inter-connectivity is terrible).

If you have any kind of application that ever needs high levels of read-write
traffic you quickly end up needing to distribute the database in some way, and
for this type of application, it's an awful lot easier to build upon Cassandra
and do a bit more work in the application than to try and get MySQL or
PostgreSQL to act as a multi-site, multi-master database over sub-optimal
links.

~~~
justinsb
There's a lot we need to do... we just put up the page with the DEMO video
today, everything else isn't yet updated to reflect the announcement.

This is why we're pursuing both products: if you have a hair-on-fire problem,
and are willing to spend the time to investigate and become comfortable with a
new DB, then you should use the new database. If you're not there yet and
MySQL works for you, we'll run it for you so it's less painful. You've chosen
Cassandra because you're obviously in the first group; willing to learn about
new databases because of your Asia/Europe pain. You're invested in that, so
I'm not going to try to persuade you to change your mind; we're looking to
help those that have the same problem, but haven't already locked themselves
in to non-SQL databases like Cassandra.

------
btilly
Consistent, Available, and Partitionable on the fly. Choose any two.

Each choice is justifiable for different applications. But you have to choose.
Anyone who thinks they can make the choice for me and then tries to tell me
that they have met all my possible data needs is selling snake oil.

~~~
justinsb
I think you mean Partition Tolerance. I would ask you whether you've actually
read the proof, but that question seems redundant.

~~~
moe
So, ad hominem attacks aside, which tradeoff did you make for your product, or
did you find a way to sidestep the problem altogether?

I mean, you're announcing this thing like the 8th world wonder ("The end of
NoSQL") and seem to have no problem comparing yourself to Oracle, no less.

Do you have anything to show for it? Or at least a date _when_ you will show
us something?

~~~
justinsb
The reason why I mentioned Oracle in the demo is precisely because of the
whole CAP debacle. So many people talk about CAP as if the SQL database was
itself an impossibility, much less Oracle's scalable offering - but you can
pick up the phone and an Oracle salesperson will sell you the physical device
that you're implying can't exist.

So, if somebody misquotes the CAP theorem, it's an indication to me that
perhaps the thread isn't going to go anywhere. I'll point them to the place
where they can read about it (the proof was where the CAP theorem was
formalized, not Eric Brewer's original presentation), and move on. In this
case, that wasn't fair to btilly, who just made a typo.

So, as I said in the presentation and elsewhere here, if you have a hair-on-
fire problem with scaling your database, and are willing to be an early
customer of our new database, and would make a good reference customer in a
few months, we'd love to work with you today. Otherwise, you'll have to wait
till we open it up more widely.

~~~
jbellis
> you can pick up the phone and an Oracle salesperson will sell you the
> physical device that you're implying can't exist

Bullshit, and I will elaborate.

Oracle has two "scalable offerings:"

RAC relies on a single large san to "scale."

Exadata relies on super-fast interconnects to get multiple machines to look
more like a single huge one, which has the obvious speed-of-light limitations
as well as the price one.

Neither approach is the kind of scale out with commodity hardware that
something like Cassandra gives you.

Invoking the mythical "pay Oracle enough money and they will make it scale"
mantra is frequently done by people who either don't know better or are
deliberately muddying the water, but that doesn't make it right.

~~~
justinsb
Just because Oracle isn't using the exact same techniques as Cassandra, that
doesn't make Oracle's ability to scale fictitious.

~~~
jbellis
I see you aced Straw Man 101.

Did you not understand what I wrote, or do you think that "large san" or "fast
interconnects limited to machines in the same couple of racks" count as
"scaling?"

~~~
justinsb
Ouch - aren't you getting a little bit personal here? I'll try to remain
professional, and I'll accept the fact that you're quoting things you didn't
say, as the re-write was clearer as to your intention.

When it comes to scaling, there's no rule which says you have to scale in any
particular way. Many database customers consider scaling _up_ a form of
scaling, and probably 99% of the world's database users will never go beyond
what can be achieved on a single machine. They don't care about your pet
project, about how the way it works is cooler and technologically purer. They
consider databases a tool, and they don't really care how it works. I don't
think too deeply about how a can opener works. I'm able to open more cans
faster with a more powerful electric can opener. Even scaling up is still
scaling.

There are customers with bigger needs. For some of them, SAN based scaling is
just what they need; more IOPS let their database keep running and they can
get on with their lives. If they need more IOPS, they add more drives to their
SAN. Scaling with a SAN is still scaling.

Some customers have more complex demands, and look to Exadata or Netezza. They
might start with just a few nodes, and add more nodes as their load increases.
It's still scaling.

Now, we in CompSci circles get excited about scaling using clusters. That's
_sexy_ scaling. Cassandra and FathomDB can both change the rules of the game,
and I'm excited about what FathomDB can do here, just as I can see you're
passionate about Cassandra. But let's not pretend that most customers really
care about how it scales; they care about what we can do. To customers, if it
looks like a duck, and it quacks like a duck, it is a duck. But when they're
choosing a database, scaling is not the only requirement, and certainly
scaling in a certain way is unlikely to be on their list of requirements. If
you've ever read an RFP, there's a bewildering number of questions that have
nothing to do with technology at all, and often the technology section is a
depressingly short list. Although part of the RFP game is to try to get the
purchaser to write in requirements that only you can deliver, purchasers are
considerably more savvy than that.

You can jump up and down like Rumpelstiltskin arguing that yours is the only
database that's _really_ scaling and the Oracle solution didn't meet the
requirements, and that they should have chosen you. At the end of the day all
you're left with is one person jumping up and down screaming about the rules
of the game, and a customer that got what they needed and an Oracle
salesperson that earned their commission.

~~~
moe
I really don't like how you keep sidestepping the hard questions that were
raised.

Instead of curling up in semantics discussions, how about simply answering a
few of those? I think you could do that without revealing anything about the
magic sauce of your product.

To reiterate:

* Does your system support the full SQL vocabulary, including all join types?

* Is it ACID?

* Do I really not have to arrange my data in any special way (schema-, or partition-wise)?

* Does it really scale near-linearly, regardless of the workload that I apply?

* Why do you bother with MySQL-Hosting as a secondary product if your new db scales down just fine?

~~~
justinsb
It might seem to you that we can answer these questions without revealing
secrets, but consider that (1) I've not been answering them and (2) you're
asking whether we can do what was thought to be impossible, so real answers
will necessarily provide direction. Short answers will just annoy you, full
answers will reveal too much, and frankly, there's little upside in replying.
If you'd make a good early-adopter customer for us, contact us and we can have
the discussion. But equally, I see that you haven't replied to my call for
early-adopters!

One I can definitely answer: yes, full SQL vocabulary support (though we
haven't implemented everything yet!)

~~~
moe
_Short answers will just annoy you, full answers will reveal too much_

Oh well, a simple yes or no would be fine, really.

However, I guess it's safe to assume then, that you're not ACID and that data
_will_ have to be rearranged to accommodate your system. That's my take-home
because a simple "yes" to either question would not have revealed anything.

~~~
justinsb
Well, if you promise it won't annoy you: Yes, Yes, Correct, Yes (for non-
pathological workloads), Choice is good

------
FooBarWidget
How does it scale? After some reading it seems to be just a hosted database
service with several vertical levels of scaling. What happens if Tera instance
can't handle your load? Can FathomDB scale horizontally without sharding?

~~~
justinsb
Sorry - it's difficult to be totally clear in 6 minutes! We have two offerings
... a fully-managed MySQL database-as-a-service which lets you grow up to the
biggest server on the cloud, and a new database which we've built ourselves,
which we've announced but isn't publicly available yet.

The scalable technology does horizontal scaling, and no - you don't have to
shard. Of course, in some sense your data is sharded, because it is
distributed across multiple machines, but we're not doing anything that you'd
consider sharding in anything other than the most pedantic sense!
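
To illustrate the "pedantic sense" only: below is a generic sketch of rows being spread over machines by hashing the key, with the application never choosing a shard. It is an assumption-laden toy, not a description of FathomDB's technique; `Cluster`, `put` and `get` are hypothetical names.

```python
import hashlib


class Cluster:
    """Toy cluster: each node is a dict, rows are placed by hashing the key."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, row):
        self.nodes[self._node_for(key)][key] = row

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)
```

The caller only ever sees `put` and `get`; which machine holds a row is an internal detail. Note that with plain modulo placement, changing the node count remaps most keys, which is one reason real systems tend to use something more elaborate.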

------
stephenjudkins
This sounds like a great, very valuable service. However, as far as I can
tell, this service is not fundamentally different than running a MySQL server
with replication. Nothing indicates it's "scalable" in the way that say,
Cassandra, is. Replication is a perfectly adequate system for many people but
it has its limits, especially for a write-heavy workload.
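
A back-of-the-envelope model of that limit, under loudly stated assumptions: every replica must apply every write, reads are spread evenly, and a read costs the same as a write. The function name and parameters are made up for this sketch.

```python
def replicated_capacity(node_ops, replicas, write_fraction):
    """Max total client ops/sec for a fully replicated setup (toy model).

    Each node applies all writes plus 1/replicas of the reads, so
    total * (w + (1 - w) / replicas) must stay within node_ops.
    """
    per_node_load = write_fraction + (1.0 - write_fraction) / replicas
    return node_ops / per_node_load


# Read-only load scales with the number of replicas...
read_only = replicated_capacity(1000, 10, 0.0)
# ...but a 50% write load barely benefits from ten machines.
write_heavy = replicated_capacity(1000, 10, 0.5)
```

In this model the ceiling is `node_ops / write_fraction` no matter how many replicas are added, so at 50% writes ten nodes stay below twice the throughput of one.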

If FathomDB is more sophisticated than a highly managed MySQL-as-a-service I
would be very curious to know.

That said, many of the complaints about scaling out MySQL stem from the
painful management overhead of running a replicated setup, managing backups,
etc. If FathomDB can vastly reduce the cost and difficulty of doing these
things it may move MySQL a long ways towards addressing many of the reasons
people are moving away from it.

"The end of NoSQL" is just another piece of claptrap that I've seen recently
on this and related topics. I would agree that "AltDB" might be a better term
simply because it's less inflammatory and better expresses the goals of many
of the diverse projects now lumped under "NoSQL".

~~~
justinsb
We need to work on our messaging! We're offering two different technologies: a
fully managed MySQL as-a-service, and the new scalable database technology.
But they're both relational databases in the service model.

With the MySQL-as-a-service 'traditional' tech, we take care of the backups,
monitoring etc. We'll offer fully managed replication in future. As you say,
these basic steps will make running MySQL much more attractive.

We've learned though, that there are still problems even here. You still have
to think about how big your server should be. You still have to think about
what happens when you outgrow the biggest server your cloud offers.

The new technology does scaling across machines in the same way that Cassandra
promises, or that Oracle's Exadata does for relational DBs. It lets you start
on a shared server and grow seamlessly to the point where you're running
across multiple servers. But it's still early days for that tech (after all,
lots of people still think it's impossible, despite the fact that Oracle is
happily selling it!), and so we're not opening it up publicly yet, whereas the
standard hosted MySQL is publicly available.

~~~
moe
_The new technology does scaling across machines in the same way that
Cassandra promises, or that Oracle's Exadata does for relational DBs._

Care to elaborate a little bit more on your approach?

The claim you make (linear horizontal scalability) implies you have either
created your own RDBMS or patched an existing one in a _really_ exciting way.

I'm curious about how you managed in such a short time-frame what so far only
Oracle can offer (with 20 years of experience under their belt)?

I'm also curious about when your new database offering will be available for
public testing? Because currently the exorbitant claims make this smell a lot
like vaporware...

~~~
justinsb
We're looking for early-adopter customers with hair-on-fire problems with
their database that would also make good reference customers in a few months.
If you fit the bill, please contact me!

Otherwise, sorry, but you'll just have to wait :-)

To put the effort into perspective: it's not a new DB from scratch - that
would be a Herculean effort - just the lowest levels. It's not based on
MySQL/Drizzle; it's based on a different open-source DB with a friendlier
license and more hackable code. It's a new technique for scaling across
machines, which is refreshingly simple, which is why you're not seeing too
many details :-) It's not just Oracle that knows how to scale databases, BTW;
Greenplum, Netezza, Vertica etc. have all built distributed databases
(admittedly for OLAP workloads) in relatively short timeframes.

~~~
codeslinger
I'm quite interested in how you're handling joins in a manner that is
scalable. What tradeoffs did you have to make? Do you support full join
semantics, or are some types of joins not supported? It's not that much of a
stretch to scale out an RDBMS not using any joins, but I can't bring myself to
really take your claims seriously until you elaborate on how you handle joins
(if at all).
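
For context, one textbook way to run an equi-join across machines is a repartition (shuffle) hash join: both tables are redistributed by the join key so matching rows meet on the same node, and each node then joins locally. The sketch below shows that generic technique and makes no claim about what FathomDB does; all function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, node_count):
    """Shuffle rows so that equal join keys land on the same node."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def local_hash_join(left, right, key):
    """Ordinary in-memory hash join, run independently on each node."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


def distributed_join(left, right, key, node_count=3):
    """Inner equi-join: repartition both sides, then join node by node."""
    left_parts = partition(left, key, node_count)
    right_parts = partition(right, key, node_count)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp, key))
    return out
```

The tradeoff is the shuffle itself: unless the tables are already partitioned on the join key, rows have to cross the network before any node can start joining, and this only covers equi-joins.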

~~~
rbranson
I think that FathomDB's product is scalable to the degree that the existing
database vendors are able to scale using multi-node clusters. Oracle's RAC
tops out at 100 nodes. All of the other vendors (Teradata, Greenplum, etc)
have some niche or secret sauce that allows them to scale vertical markets
like OLAP. I don't see them scaling to Google or Facebook scale with an ACID-
compliant relational database. They aren't really bringing anything new to the
table other than a cheaper price point.

You bring up a very good point with the join issue. In addition, I'm wondering
how they're going to scale writes. This is the bottleneck that eventually
chokes RAC out, as it has to hold true to ACID principles. Even without ACID
guarantees, eventually the system would spiral down into a chaotic quagmire of
inconsistency as it scales. You've essentially got to have a locking and/or
arbitration system that can quickly return an absolutely-positively we-wrote-
this-in-a-consistent-way after a write query is executed. Never mind trying to
execute MVCC transactions over a cluster of nodes.
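
As a rough illustration of what such an arbitration step costs, here is a minimal two-phase commit sketch: a write only counts once every node has voted yes. It is a simplified model (no timeouts, no coordinator failure handling), and `Participant` and `two_phase_write` are names invented for this example.

```python
class Participant:
    """One node taking part in a distributed write (toy model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.staged = None
        self.data = {}

    def prepare(self, key, value):
        # Vote no if this node cannot guarantee the write.
        if not self.can_commit:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_write(participants, key, value):
    """Return True only if every participant committed the write."""
    # Phase 1: collect votes from every node.
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        # Phase 2: everyone agreed, so make the write visible everywhere.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Every write pays two round trips to every participant, and a single slow or unreachable node blocks the answer, which is the scaling pressure described above.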

SQL can be scaled, ACID can't. Without ACID, the warm fuzzy feeling SQL gives
you isn't quite as warm and fuzzy.

------
clintavo
A tad off topic but...Does anyone actually know _how_ to get a FathomDB
account? I thought, as a long-time Rackspace customer with the promo code
"RACK" they promote on their site, it would be relatively simple. I've tried
to register three times over the past few months, but I never hear a peep
from FathomDB........

------
justinsb
Not sure whether it's the end of NoSQL, but we're certainly working to deliver
the promises of NoSQL while retaining the power of SQL.

To borrow from a much better man than I... This is not the end. It is not even
the beginning of the end. But it is, perhaps, the end of the beginning.

------
rbanffy
You know... Most of these nosql databases feel oddly nostalgic for me. It's
the kind of stuff mainframe folks were using in the 70's with some clustering
thrown in.

~~~
rbranson
What? IMS?

~~~
davidmathers
_What? IMS?_

That would be XML. I'm guessing: <http://en.wikipedia.org/wiki/CODASYL>

------
davidmathers
Is this drizzle?

~~~
justinsb
No - the scalable database is our own technology. We looked at writing a
MySQL/Drizzle storage engine, but there's a lot of pieces that are different
that stretch up through the stack, and I was a bit concerned over the MySQL
licensing now that it's part of Oracle.

In future we could build a MySQL/Drizzle storage engine, and we'll probably
run Drizzle-as-a-service once we start to see customer demand.

