
Moving From Oracle to CouchDB: Data Management at CERN - mbroberg
https://cloudant.com/blog/moving-from-oracle-to-couchdb-data-management-at-cern/
======
mje__
"In the end, Zdenek had a single-server CouchDB installation on the order of
hundreds of megabytes of metadata stored."

I think the title is a little inflammatory, no?

~~~
jarek
"[I]t came down to storing lots of binary attachments that could be easily
shared with other applications in the CMS. That was the main reason for the
move."

Yeah this is a non-story. A better headline would be "Why a small application
dropped Oracle for CouchDB". Compared with CERN experiments that generate
petabytes per year, this is like publicizing Google's technology choice for
its employee-list database. Not that ATLAS data goes into an Oracle database,
but...

Where are the zealous HN title mods when you need them, eh?

------
pjmlp
Misleading title. It is only about a single person that used to work at CERN.

CERN has tons of developer groups, each with its own set of technologies.

~~~
Create
[http://home.web.cern.ch/about/updates/2013/02/cern-and-oracl...](http://home.web.cern.ch/about/updates/2013/02/cern-and-oracle-celebrate-30-years-collaboration)

[http://ais.web.cern.ch/ais/presentations/eoug99/Implementing...](http://ais.web.cern.ch/ais/presentations/eoug99/ImplementingWorkflow.pdf)

google:// Oracle CERN Technical training: available places in forthcoming
courses

~~~
pjmlp
I used to work at CERN. What are you trying to prove by posting links about
Oracle at CERN without explaining them?

~~~
Create
see comment about Oracle.

[https://www.youtube.com/watch?v=bSEv4vF5CTs](https://www.youtube.com/watch?v=bSEv4vF5CTs)

------
Create
Cost has nothing to do with it: they get it "free", as in US contribution in
kind. And obviously, at the end of the day, Oracle profits much more.

[http://openlab.web.cern.ch/about/partners/oracle](http://openlab.web.cern.ch/about/partners/oracle)

------
fs111
hundreds of MB? What? Use sqlite and be done with it.

------
Groxx
Important note that I'm not seeing in other comments here: they started with
_both_ Oracle and CouchDB. This was basically a cleanup in favor of CouchDB,
since running two different kinds of DBs is best avoided if possible for
simplicity reasons.

------
benjarrell
Surprised it is not everyone else's reason: Cost.

Regarding database dumps, I'm curious how/why data pump[1] is not what he
needed.

1: Overview of Oracle Data Pump:
[http://docs.oracle.com/cd/E11882_01/server.112/e22490/dp_ove...](http://docs.oracle.com/cd/E11882_01/server.112/e22490/dp_overview.htm)

~~~
mattzito
Yeah, I'm not going to defend Oracle (or their pricing model), but in this
case it sounds like the person who did the migration was just not an Oracle
expert. Data pump is great for database dumps (and very fast), and SecureFiles
is basically tailor-made for storing large amounts of binary data. There's a
feature for schema versioning that would have allowed them to seamlessly run
multiple versions of the same schema exposed to different sets of clients.

Oracle is vastly more powerful than pretty much any of the open source
technologies, but it's eventually going to go away because of scenarios like
the one described in the article. Tech folks are starting their careers on OSS
technologies, learning on them, getting comfortable, and then when they start
working on something like Oracle, they aren't familiar with all of the
capabilities under the covers, and migrate off of it to the more familiar
platform.

And the cost, of course.

~~~
rdtsc
> Oracle is vastly more powerful than pretty much any of the open source
> technologies

Does it have multi-master replication like CouchDB? What about append-only
storage of data (so can do live backup snapshots)? Or a REST-ful interface (so
can directly use it from web clients, via a simple proxy)? A web based data
browser and viewer like Futon?

Because those are very important features I like in Couch. I am not saying
this knowing that Oracle lacks those features, as I don't know Oracle well
(maybe it has them). But it seems to me that your claim that Oracle is a
strict superset of all the other database technologies out there is a bit
hyperbolic.

~~~
mattzito
First off, I'm not saying it's _better_. It's heavy, complicated, incredibly
expensive, the really cool features are even more expensive, but
_feature-wise_, it's an amazing example of what you can do inside of a
relational database engine if you have years and years and billions of dollars
to spend on R&D.

To answer your questions:

> Does it have multi-master replication like CouchDB?

Yes! It has several kinds of replication. Active/passive, active/passive with
the standby available for reading, multi-master, cascading replication
(master->standby->secondary standby). You can also do combinations of the
above, like master<->master->standby<-master->standby->standby (if you want to
get really crazy).

But it's a lot more than that. Synchronous? Sure. Asynchronous? Sure. Semi-
synchronous? Yes, I can say that I want to allow the standby database to get
up to X minutes out of sync with the primary before I switch over to
synchronous and force clients to block until I'm caught up.

Hey, what about file-based replication for items that are not even technically
managed by Oracle or inside the database? No problem.

What about failover? Well, Oracle can not only have its clients detect that a
database has failed and handle the failover automatically, but you can
actually have it fail over and automatically spin up another standby so you
don't have a SPOF.

I could go on for pages just on replication scenarios that Oracle supports.

> What about append-only storage of data (so can do live backup snapshots)?

Yes, absolutely, but to be honest, you don't need append-only storage to do
live backup snapshots in Oracle. You can do point-in-time consistent backups
while the database is serving transactions (it works under the covers
similarly to append-only, but the nuances are a little different). Not only
can you run backups this way, but you can actually request that your session
"see" a view of the database as it was an hour or a day ago. You can also
simply say, "return the database to the state it was as of X transaction or at
Y time", and the entire database can revert back to that state.

> Or a REST-ful interface (so can directly use it from web clients, via a
> simple proxy)? A web based data browser and viewer like Futon?

Yes, Oracle APEX, which is basically an Oracle front-end application server,
but lighter than a "real" J2EE stack, can do both of these things. If you
want, there's a fairly simple markup language, similar to ERB, that you can
use to write simple CRUD rails-like applications on top of APEX, but you don't
have to if you don't want to.

Again, I'm not saying that Oracle is _better_, because all of these features
come at a complexity cost that is massive (and a technical debt that requires
highly skilled specialty technologists). If I were building an application
today, I would not base it on Oracle, because for what 90% of the world needs,
there are open source databases or data stores that fill the need just fine.
But you gotta hand it to Oracle for the sheer amount of technology they've
crammed into an RDBMS.

~~~
rdtsc
Thanks for responding. I had no idea about these features. Well heck you can
say it is better. That would be alright.

I have been living in the open source world and just never had to deal with
commercial DBs at this point.

So I agree with your point. A lot of these features are there, but up-and-coming
developers might not know about them and will always pick open source choices.

I guess PostgreSQL is the direct competitor at the moment, and I have only
heard good things about it, while Oracle is hated by large numbers of
developers (for reasons not necessarily related to technical features).

------
knodi
I wish they had asked whether there was anything about CouchDB they don't
like or have concerns about.

