
MongoDB and Realm make it easy to work with data, together – MongoDB
https://www.mongodb.com/blog/post/mongodb-and-realm-make-it-easy-to-work-with-data-together
======
slau
As an ex-Realmer, I'd like to congratulate everyone involved in Realm in the
past few years. Great engineering talent, extremely dedicated and motivated
people. This is the company that has helped me understand the value of good
marketing. I've also been inspired by what a good product owner can do.

Kudos to everyone. It's been a long road, and I'm glad that the codebase that
initially started as a text editor finally found a new home.

------
jinjin2
This is awesome! Two of my favorite products joining together. Now I just hope
they keep their promise and keep investing in Realm.

------
lkrubner
At some point I'll write up my notes about how I've been using MongoDB. I've
basically given up on SQL databases. Every architecture, for every scale, from
small to Enterprise, is really better handled by MongoDB, sometimes in
conjunction with Kafka (since any sufficiently large operation is
automatically heterogeneous and polyglot, with different database
technologies).

When you're a small startup just getting off the ground, you can create a
single MongoDB instance (ignore everything you've heard about Web Scale) and
stuff data into it as needed, without thinking much about the structure. You
can add contracts to your database functions, gradually tightening them as you
learn what your project is really about. To get a sense of that style of
development, please see what I wrote in "How ignorant am I, and how do I
formally specify that in my code?"

http://www.smashcompany.com/technology/how-ignorant-am-i-and-how-do-i-specify-that
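In code, that contract-tightening style might look like the sketch below. All the names here (`CONTRACT`, `check_contract`, `save_user`) are illustrative, not from the post, and a plain list stands in for a pymongo collection so the example runs without a server:

```python
# A sketch of "contracts on your database functions": a thin wrapper that
# validates documents before they are written. The contract starts nearly
# empty and gains entries as the project's real shape becomes clear.

CONTRACT = {
    # field name -> (required, expected type)
    "email": (True, str),
    "age": (False, int),
}

def check_contract(doc):
    """Raise ValueError if doc violates the current contract."""
    for field, (required, typ) in CONTRACT.items():
        if field not in doc:
            if required:
                raise ValueError(f"missing required field: {field}")
            continue
        if not isinstance(doc[field], typ):
            raise ValueError(f"{field} must be {typ.__name__}")
    return doc

def save_user(collection, doc):
    """Validate, then insert. `collection` can be a real pymongo
    collection; a plain list works for demonstration."""
    check_contract(doc)
    if isinstance(collection, list):
        collection.append(doc)       # in-memory stand-in
    else:
        collection.insert_one(doc)   # real pymongo collection
    return doc
```

The point is that the schema lives in one small dict you edit as you learn, rather than in migrations you have to plan up front.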

MongoDB is great for ETL. You can pull JSON from 3rd party APIs and store it
in its original form, then later transform it into the different forms you
need.
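A minimal sketch of that store-raw-then-transform pattern, with made-up field names (`order_id`, `items`) and a list standing in for the raw collection:

```python
# Keep the 3rd-party API response exactly as received, in a small envelope;
# run transforms as separate, repeatable steps that derive views from it.

def store_raw(raw_collection, payload):
    """Store the payload untouched, so any future transform can re-read it."""
    doc = {"source": "partner-api", "raw": payload}
    raw_collection.append(doc)  # stand-in for collection.insert_one(doc)
    return doc

def transform_for_web(doc):
    """One of possibly many views derived from the same raw document."""
    raw = doc["raw"]
    return {
        "id": raw["order_id"],
        "total": sum(item["price"] for item in raw["items"]),
    }
```

Because the original payload is never discarded, a bug in a transform costs you a re-run, not a re-fetch.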

In large Enterprises, you will inevitably be trying to get multiple services
and databases to work together. The old style for dealing with this was the
ESB (Enterprise Service Bus) or SOA (Service Oriented Architecture) but in
recent years most of the big companies I've worked with have moved toward
something like a unified log, as Jay Kreps wrote about in "The Log: What every
software engineer should know about real-time data's unifying abstraction". If
you haven't read that yet, go read it now:

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

In this context, MongoDB can offer a flexible cache for the most recent
snapshot your service has built, based off of what it read from Kafka.
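The snapshot-cache idea can be reduced to a fold over an ordered event log: what you would read from Kafka goes in, and what you would upsert into MongoDB (latest state keyed by entity id) comes out. The event shape below is an assumption for illustration:

```python
# Fold an ordered event log into a latest-state map. In production the
# loop body would be an upsert into a MongoDB cache collection; here a
# dict stands in so the logic is visible and testable on its own.

def apply_event(snapshot, event):
    """Later events win; a delete event removes the entity entirely."""
    key = event["id"]
    if event.get("deleted"):
        snapshot.pop(key, None)
    else:
        snapshot[key] = {**snapshot.get(key, {}), **event["fields"]}
    return snapshot

def build_snapshot(events):
    snapshot = {}
    for event in events:
        apply_event(snapshot, event)
    return snapshot
```

Since the log is canonical, the snapshot is disposable: drop it and replay the log to rebuild it.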

Some people are led astray by MongoDB's flexibility, and they start treating
canonical data as a cache. Obviously that leads to disaster. I believe this is
what happened to Sarah Mei; her experiences led her to write "Why You Should
Never Use MongoDB":

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

The one rule I would suggest is that you always need to be clear, in your own
head, which collections are canonical and which are cache. When I talk to
teams who are new to this, I tell them to use a naming convention, such as
adding a "c_" to the start of every collection that is canonical. All other
collections can be assumed to be caches. And the great thing is, it is very
cheap to create caches. You can have 20 caches for the same data, in slightly
different formats. You can have one cache where the JSON is optimized to what
the Web front-end needs, and another cache where the JSON is optimized for the
mobile app, and another cache where the JSON is optimized for an API for
external partners. Just don't fall into the trap that Sarah Mei mentions,
where you treat everything as a cache. You need to be clear in your head which
data is canonical. If you are using Kafka the way Jay Kreps mentions, then the
data in Kafka is canonical and everything in MongoDB is a cache. But at
smaller operations, I've used MongoDB to hold both the canonical data and the
caches, in different collections.
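That naming convention fits in two tiny helpers. The collection names below are hypothetical; the rule is just that canonical collections get the "c_" prefix and everything else is assumed rebuildable:

```python
# Helpers for the "c_" convention: canonical collections are prefixed,
# all other collections are caches you may drop and rebuild at any time.

def is_canonical(collection_name):
    return collection_name.startswith("c_")

def caches_only(collection_names):
    """The collections you are allowed to drop and regenerate."""
    return [n for n in collection_names if not is_canonical(n)]
```

A cleanup or rebuild job can then operate on `caches_only(...)` without any risk of touching canonical data.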

~~~
tmountain
_When you're a small startup just getting off the ground, you can create a
single MongoDB instance (ignore everything you've heard about Web Scale) and
stuff data into it as needed, without thinking much about the structure. You
can add contracts to your database functions, gradually tightening them as you
learn what your project is really about. To get a sense of that style of
development, please see what I wrote in "How ignorant am I, and how do I
formally specify that in my code?"_

This strategy seems like it forgoes what I consider an important step in any
project, which is, thinking critically about your data model and getting that
right before you start building code on top of that structural foundation.

I could see doing what you're describing to build a prototype, which I would
then extract lessons from and subsequently toss out. But this seems like a
dangerous way to start something that will end up in production (and
potentially be maintained for years to come), as it glosses over the
importance of coming up with a really coherent data model, and let's face it,
data is the heart and soul of most projects.

Am I wrong?

~~~
lkrubner
" _I could see doing what you 're describing to build a prototype_"

It's very much for prototypes, and especially greenfield projects. If I was,
instead, doing something like building a new service, inside an Enterprise
that was already using something like the unified log architecture that Jay
Kreps has described, then I would certainly think hard about what the schema
would be for the particular service I was building -- after all, in such
situations you're never going to pull all of the data out of Kafka, so you
automatically have to figure out what part of the data you want. LinkedIn
currently stores 900 terabytes of data in its Kafka instance, and I'm unlikely
to write a new service that actually needs all of the 900 terabytes of data.
So merely by thinking about the question "What of this data do I need?" I'm
already implicitly thinking about a schema.

Having said all of that, how often have you written a service where you got
the schema 100% correct on your first try, with no further changes needed?
Possibly you are smarter than I am, but I personally have never done that. All
of my first attempts need later adjustment.

------
zubairq
Can someone confirm that Realm raised USD 40 M and was acquired for USD 39 M?

~~~
ljhaywar
TechCrunch is reporting those stats:
https://techcrunch.com/2019/04/24/mongodb-to-acquire-open-source-mobile-database-realm-startup-that-raised-40m/

------
VWWHFSfQ
I'm sure MongoDB is fine now, but it's too late. I don't care. I'll never use
it again or recommend it to anyone.

