
Azure Cosmos DB, a globally distributed database - andysinclair
https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
======
ahelwer
Designed with TLA+! :D Small interview with Leslie Lamport:

[https://techcrunch.com/2017/05/10/with-cosmos-db-
microsoft-w...](https://techcrunch.com/2017/05/10/with-cosmos-db-microsoft-
wants-to-build-one-database-to-rule-them-all/)

Hope Cosmos team releases a whitepaper on their experiences with the language.
I'd heard snatches of gossip here and there that TLA+ was used inside Cosmos,
but no concrete details.

edit: apparently there's also a video of Lamport talking about this
[https://www.youtube.com/watch?v=L_PPKyAsR3w](https://www.youtube.com/watch?v=L_PPKyAsR3w)

~~~
dharmashuklaMS
The Cosmos DB codebase consists of millions of lines of C++ code. It is fully
asynchronous and, like all large-scale (stateful) distributed systems, it is
also extremely complex. The database engine of Cosmos DB (including the B-tree
and the log-structured storage engine) is fully latch-free; the engine,
resource governance subsystem, global distribution infrastructure, partition
management, etc. are all deeply integrated. Each of these subsystems has
extremely complex state machines, which are hard to describe with the required
degree of _precision_ in English. Any correctness bug introduced through a
lack of precision can potentially lead to data loss, corruption, or partial or
complete failure of the entire service. This is where TLA+ comes in.

Background: Dr. Leslie Lamport's work has been a constant source of
inspiration for the Cosmos DB team. A few of the engineers on the team had
learnt TLA+ initially on their own and started seeing its benefits.
Subsequently, other members of the team started applying it as well. Leslie
had also taught a fantastic class on TLA+ (across Microsoft), which engineers
on the Cosmos DB team attended. It was a wonderful, once-in-a-lifetime
opportunity for the team to learn TLA+ from Leslie.

To be clear, Leslie personally didn't write any of the TLA+ specs for Cosmos
DB. It was Cosmos DB engineers who wrote the TLA+ specs to specify & verify
the design (incl. consistency models). The net result is that TLA+ made Cosmos
DB a more robust system, which is crucial to offer strict and comprehensive
SLAs encompassing availability, consistency, throughput and latency at the
99th percentile. Further, writing TLA+ specs for the five consistency models
that we have exposed (as well as many that we have experimented with
internally) enabled us to precisely define the semantics of each consistency
model. This in turn enables developers building apps on top of Cosmos DB to
rely on those well-defined semantics.

Hope this is helpful.

~~~
ahelwer
Thanks, Dharma! I probably actually TA'd the TLA+ class which your engineers
attended ;)

Still hope you write a whitepaper. The AWS paper is super valuable when making
the case for TLA+ use in industry.

------
dharmashuklaMS
Hi, this is Dharma from the Azure Cosmos DB team. We are super excited to make
the service available today. We published the first in a series of technical
blog posts here -> [https://azure.microsoft.com/en-us/blog/a-technical-
overview-...](https://azure.microsoft.com/en-us/blog/a-technical-overview-of-
azure-cosmos-db/). We would love to answer any Cosmos DB questions.

~~~
daxfohl
Are graph ops/queries atomic? I.e., if you run a tree query on a tree-graph at
the same time you're re-parenting a tree node, is there a chance that the node
could end up in the result tree twice or zero times?
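
To make the anomaly in this question concrete, here is a minimal Python sketch of a traversal that reads each child list at visit time, with no snapshot, while a writer re-parents a node in between. Everything in it (`traverse`, `reparent`, `run_with_concurrent_move`, the tiny tree) is hypothetical and says nothing about how Cosmos DB actually executes graph queries; it only shows how "twice" and "zero times" can arise without isolation.

```python
def traverse(children, root, after_read=None):
    """Depth-first walk that reads each node's child list at visit time (no snapshot)."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        seen.append(node)
        kids = list(children.get(node, []))
        if after_read:
            after_read(node)  # stand-in for a concurrent writer running between reads
        stack.extend(reversed(kids))
    return seen


def reparent(children, node, old_parent, new_parent):
    """Move node from old_parent's child list to new_parent's child list."""
    children[old_parent].remove(node)
    children[new_parent].append(node)


def run_with_concurrent_move(children, trigger, node, old_parent, new_parent):
    """Traverse from 'root', re-parenting `node` once, right after `trigger` is read."""
    moved = []

    def hook(visited):
        if visited == trigger and not moved:
            reparent(children, node, old_parent, new_parent)
            moved.append(True)

    return traverse(children, "root", hook)


# X is moved from A to B after A's subtree was already walked: X shows up twice.
twice = run_with_concurrent_move(
    {"root": ["A", "B"], "A": ["X"], "B": []}, "X", "X", "A", "B")

# X is moved from B to A after A's children were already read: X is never seen.
zero = run_with_concurrent_move(
    {"root": ["A", "B"], "A": [], "B": ["X"]}, "A", "X", "B", "A")

print(twice)  # ['root', 'A', 'X', 'B', 'X']
print(zero)   # ['root', 'A', 'B']
```

An atomic (snapshot) read would return X exactly once in both cases, which is the guarantee being asked about.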

Also, if they're atomic, are they optimistic or pessimistic transactions? And
if they're atomic, does that mean queries are done on the write server? (My
understanding is that read-only transactions on read replicas are not
supported, or at least they weren't under DocDB.)

Any lay-programmer insight into what's going on under the hood, or at least
performance/atomicity implications, without giving up too much proprietary
info, would be appreciated.

~~~
dharmashuklaMS
Read-only transactions existed in DocumentDB too. DocumentDB exposed a strict
subset of the capabilities that Cosmos DB provides (and that already existed
underneath). Hence, read-only transactions exist in Cosmos DB too. The first
technical overview blog post attempted to provide a high-level overview. We
are hoping to cover specific areas in future blog posts or conference
publications.

------
jasondc
> Latency: 99.99% of <10 ms latencies at the 99th percentile

Impressive SLA to guarantee. I'm curious whether this will hold up across all
the varied customer workloads that are coming, e.g. updating a lot of fields
in a large document (or just a very large insert).

~~~
yazaddaruvala
"For a typical 1-KB item, Cosmos DB guarantees end-to-end latency of reads
under 10 ms and indexed writes under 15 ms at the 99th percentile, within the
same Azure region."

i.e. 1 KB item; Same Azure region;

This now seems more plausible.
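
For anyone unsure what "under 10 ms at the 99th percentile" means in practice, here is a small Python sketch that computes a nearest-rank p99 over latency samples and compares it to a target. The sample values and the helper names (`percentile`, `meets_latency_target`) are made up for illustration; this is not how the SLA is actually measured.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil of pct% of n, 1-based
    return ordered[max(rank, 1) - 1]


def meets_latency_target(samples_ms, target_ms=10.0, pct=99):
    """True if the pct-th percentile latency is strictly below target_ms."""
    return percentile(samples_ms, pct) < target_ms


# Hypothetical read latencies in milliseconds: 99 fast reads and one slow outlier.
reads_ms = [2.0] * 99 + [40.0]
print(percentile(reads_ms, 99))        # 2.0
print(meets_latency_target(reads_ms))  # True
```

Note that a single 40 ms outlier in 100 requests does not break a p99 target, while two of them would.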

One thing I'm curious about is if they tested load on a single partition, or
if they only tested latencies for random access.

~~~
aravindkr1
Cosmos DB guarantees both low latency and that you can achieve your
provisioned throughput with SLAs. Latency is guaranteed at p99 regardless of
storage size or number of partitions.

~~~
yazaddaruvala
Sorry, I'm not talking about "number of partitions" but if I "forcibly"[0] hit
the same partition, will the SLA hold?

[0] i.e. If I somehow pick keys which are on the same partition.
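
A rough Python sketch of the scenario I mean, assuming (and this is only an assumption, not Cosmos DB's documented scheme) that a partition key is hashed to pick one of N partitions. `partition_for` and `load_per_partition` are hypothetical names.

```python
import hashlib
from collections import Counter


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def load_per_partition(keys, partition_count):
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, partition_count) for k in keys)


# 1000 requests over distinct keys versus 1000 requests that all use one key.
spread = load_per_partition([f"user-{i}" for i in range(1000)], 4)
hot = load_per_partition(["user-42"] * 1000, 4)

print(sorted(spread.values()))  # load shared across several partitions
print(sorted(hot.values()))     # [1000]: every request hits the same partition
```

The second workload is the one I'm asking about: total throughput is the same, but a single partition absorbs all of it.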

------
judah
This isn't a new database. This is a rebranding of the generically-named Azure
DocumentDB, plus some new features.

~~~
sealord
Not exactly. DocumentDB was primarily a document store. Cosmos allows you to
store graphs and KV pairs as well.

~~~
jchrisa
I was unable to glean this from a quick read of the consistency documentation:
does CosmosDB support uniqueness and foreign key constraints?

~~~
sealord
Afraid not - there's no real concept of constraints with CosmosDB.

------
joshuatalb
This feels like MS' version of Google's Cloud Spanner that's GA in a few days.
Same kind of marketing too.

~~~
strmpnk
Having evaluated both, I'd say Cloud Spanner is quite a bit behind in some
regards. It's not that Spanner itself can't do something, but that Cloud
Spanner hasn't productized some important features like multi-datacenter
failovers. It's certainly coming, but CosmosDB (aka DocumentDB) does a lot of
this today (and has for a while, as this is not entirely new).

~~~
azurezyq
But from Cosmos DB's doc, cross-dc strong consistency doesn't even seem to be
supported.

[https://docs.microsoft.com/en-
us/azure/documentdb/documentdb...](https://docs.microsoft.com/en-
us/azure/documentdb/documentdb-consistency-levels)

"Azure Cosmos DB accounts that are configured to use strong consistency cannot
associate more than one Azure region with their Azure Cosmos DB account."

~~~
strmpnk
I didn't claim cross-dc consistency. I said failover, which is shockingly hard
to make many competitors do. The key here is that only one DC can take writes
but the failover works transparently with your client (also with clear SLAs).

~~~
lern_too_spel
Why do you need failover if you have global multimaster like Spanner?

~~~
aliuy
An interesting thing to point out is the current beta for Cloud Spanner does
not have multi-region deployments... and instead, allows you to do single-
region deployment in your choice of 3 (not >30) regions:

[https://cloud.google.com/spanner/docs/instance-
configuration](https://cloud.google.com/spanner/docs/instance-configuration)

------
voellm
One of the best parts of the perf SLA is we did it with all Data Encrypted at
Rest. I'm biased. I lead security for CosmosDB.

~~~
sargun
Why is encryption at rest difficult? I presume it's all AES (hardware
accelerated), with some key derived at system boot time?

Are y'all doing encryption in the D/C, or just on the WAN? If you're doing it
on the WAN, any luck using MACSEC?

------
henriksen
Talk about the foundations of Cosmos DB:
[https://www.youtube.com/watch?v=Yfmw7swCtZs](https://www.youtube.com/watch?v=Yfmw7swCtZs)

------
rattray
I find this product slightly befuddling.

It seems like a "just throw all your data in this" kind of database, probably
intended for everything but core application relational data (so, good for
analytics, messaging, etc).

It sounds like the atom-record-sequence model at the heart of it is pretty
key, but there's not a lot in the article about what that is and how it works.
Is this a well-understood data structure used elsewhere?

The project seems very ambitious, and I could see it being used pretty heavily
at a lot of companies. Thoughts?

~~~
datasage
It's not uncommon in SaaS stacks to have multiple data stores, each
specialized for different use cases. I do see the appeal of having one system
that can support each type of use case and be tuned as needed.

------
vikestep
I can't seem to find the old DocumentDB prices any more, but it seems like
it's a lot cheaper now? Also, is User-Defined Performance something new? Last
time I looked into DocumentDB, you had to pay the monthly RU fee per 10GB
disk.

One more thing: as someone who went from DocumentDB to Azure Storage (Tables)
back in April 2016 because of the higher price, slower queries, and
scalability problems, is there anything that may make Cosmos DB a better
option?

------
lobster_johnson
Is Cosmos related to the work on Corfu/CorfuDB [1] [2] in any way?

[1] [https://www.microsoft.com/en-
us/research/publication/corfu-a...](https://www.microsoft.com/en-
us/research/publication/corfu-a-shared-log-design-for-flash-clusters/)

[2] [https://github.com/CorfuDB/CorfuDB](https://github.com/CorfuDB/CorfuDB)

~~~
GovindMS
As Dharma mentioned, Azure Cosmos DB has been many years in the making. Azure
Cosmos DB started as “Project Florence” in late 2010 to address the developer
pain points faced by large-scale applications inside Microsoft. Observing that
the challenges of building globally distributed apps are not a problem unique
to Microsoft, in 2015 we made the first generation of this technology
available to Azure developers in the form of DocumentDB.

The database engine design is inspired by LLAMA ->
[http://db.disi.unitn.eu/pages/VLDBProgram/pdf/research/p853-...](http://db.disi.unitn.eu/pages/VLDBProgram/pdf/research/p853-...),
Bw-tree ->
[https://pdfs.semanticscholar.org/7655/9c6cc259c6ab5baf7bd19d...](https://pdfs.semanticscholar.org/7655/9c6cc259c6ab5baf7bd19d..),
and schema-agnostic indexing techniques ->
[http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf](http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf).
Please note that these papers are significantly behind the current state of
the implementation. The most crucial aspect that these papers don't cover is
the integration of the database engine with the larger distributed system
components of Cosmos DB, including resource governance, partition
management, and the implementation of the replication protocol, consistency
models, etc. Our goal is to publish all of the design specifications,
including the TLA+ specs, over time.

------
yunong
What are the CAP tradeoffs of Cosmos? It's not clear to me from looking at the
SLA docs.

~~~
aliuy
CAP only talks about the uncommon unhappy path (given a network partition,
what is the tradeoff between availability and consistency?). The PACELC
theorem builds on CAP to describe that, even in the absence of a network
partition, there is a tradeoff between latency and consistency (in other
words, it describes the tradeoffs associated with BOTH the common happy path
and the uncommon unhappy path).

Azure Cosmos DB offers 5 well-defined consistency models for you to choose
from, so that you can choose the right tradeoffs for a given application or
scenario. This way, you aren't stuck choosing between the hard extremes of
Strong and Eventual consistency.

See: [https://docs.microsoft.com/en-
us/azure/documentdb/documentdb...](https://docs.microsoft.com/en-
us/azure/documentdb/documentdb-consistency-levels)
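
As a rough illustration of "more than two extremes", here is a Python sketch that models the five levels as an ordered enum. The three intermediate names come from the public consistency-levels documentation rather than from this thread, and treating the levels as a strict linear order is a simplification. `ConsistencyLevel`, `satisfies`, and `intermediate_levels` are hypothetical names, not an SDK API.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """Weakest to strongest; the strict ordering is a simplifying assumption."""
    EVENTUAL = 0
    CONSISTENT_PREFIX = 1
    SESSION = 2
    BOUNDED_STALENESS = 3
    STRONG = 4


def satisfies(configured: ConsistencyLevel, required: ConsistencyLevel) -> bool:
    """True if the configured level is at least as strong as the required one."""
    return configured >= required


def intermediate_levels():
    """Levels strictly between the Eventual and Strong extremes."""
    return [lvl for lvl in ConsistencyLevel
            if ConsistencyLevel.EVENTUAL < lvl < ConsistencyLevel.STRONG]


print([lvl.name for lvl in intermediate_levels()])
# ['CONSISTENT_PREFIX', 'SESSION', 'BOUNDED_STALENESS']
print(satisfies(ConsistencyLevel.SESSION, ConsistencyLevel.CONSISTENT_PREFIX))  # True
print(satisfies(ConsistencyLevel.EVENTUAL, ConsistencyLevel.SESSION))           # False
```

In PACELC terms, moving up the enum generally trades latency for consistency, and moving down does the reverse.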

------
willchen
Very interesting DB service. If I'm reading the docs right it sounds like you
can't do JOINs across documents?

[https://docs.microsoft.com/en-
us/azure/documentdb/documentdb...](https://docs.microsoft.com/en-
us/azure/documentdb/documentdb-sql-query#a-idadvancedaadvanced-database-
concepts-and-sql-queries)

~~~
extesy
Not using the DocumentDB API, but you can kind of do joins using graph
traversal with the Gremlin API.

------
rectalogic
I don't see CosmosDB on the HIPAA compliance list, anyone know if there are
plans to add it? [https://www.microsoft.com/en-
us/trustcenter/compliance/hipaa](https://www.microsoft.com/en-
us/trustcenter/compliance/hipaa)

~~~
GovindMS
[https://azure.microsoft.com/en-us/blog/azure-compliance-
docu...](https://azure.microsoft.com/en-us/blog/azure-compliance-documentdb-
certified-for-iso-27001-hipaa-and-the-eu-model-clauses/)

------
tracker1
From the intro page[1]... Many of the descriptions comparing to NoSQL are
wrong. There are plenty of NoSQL options that have similar features; they
aren't universal, but they can be and often are there. Cassandra, for example,
probably does just as well in multi-zone/dc concurrency. Consistency options
are also similarly tunable. Cockroach 1.0 was announced earlier as well.

It's not that I don't appreciate the option. This seems far closer to what
DocumentDB should have been earlier on. Though tbh, I think Storage Tables are
already pretty useful.

[1] [https://docs.microsoft.com/en-us/azure/cosmos-
db/introductio...](https://docs.microsoft.com/en-us/azure/cosmos-
db/introduction)

~~~
smithkl42
Azure Table Storage? Uggh. Nasty. I've tried half a dozen times to use them,
and every time I've given up. Great for write-only data that you never want to
see again. Horrible for real-world querying.

~~~
dharmashuklaMS
Cosmos DB has native extensibility to support various APIs; the Azure Table
Storage API is one of them. If you are an Azure Table Storage customer, by
virtue of accessing Cosmos DB using the Table Storage API, you can now get all
of the capabilities of Cosmos DB, incl. automatic indexing, global
distribution, etc.

Your Table Storage queries should be really fast with Cosmos DB, since Cosmos
DB supports efficient indexing and querying.

------
VikingCoder
I'd like to see an in-depth comparison with Google Cloud Spanner.

~~~
afeezaziz
Is there a real life case study of people/startup using Cloud Spanner? I am
wondering what are the use cases of Spanner that cannot be fulfilled by other
Google Cloud Platform products.

~~~
random3
relational model, globally distributed, ACID transactions, 99.999 availability

to name a few

~~~
arosenbaum
Please post a link where they talk about transactions - I can't find it in the
docs anywhere. Availability SLA is clearly not 5 9's -
[https://azure.microsoft.com/en-
us/support/legal/sla/cosmos-d...](https://azure.microsoft.com/en-
us/support/legal/sla/cosmos-db/v1_0/)

~~~
nindalf
random3 is replying to someone asking about Google's Spanner, which does have
those transactions.

------
hoodoof
Does it provide search?

It's a strange thing, but almost all new database technologies seem to leave
search as an afterthought for some later day instead of starting on day one
with the assumption that "it's all about search".

A database system that doesn't support rich search capabilities is restricted
to very limited types of applications.

Often search is left unimplemented for years, or perhaps never implemented.

~~~
harigov
It does support search through SQL-like queries. You can use existing
functions or implement new ones in JavaScript. If you want free-text search,
you will be disappointed, but otherwise it is pretty decent.

~~~
hoodoof
So half marks on search for Azure Cosmos DB.

If free text search is such a hard problem then you'd think that would be even
more reason to start with solving that toughest of all problems. If it's a
really hard problem then it will be even harder to retrofit later into some
system that is already architected and built.

A light spanking for the architect. No search would have been a thorough
spanking.

~~~
harigov
The good news is that you can configure all the data to be copied over to
Azure Search on a regular basis and then use search capabilities provided by
Azure Search. This may sound like a complicated thing to do but the
functionality provided by Azure makes it ridiculously easy to configure
something like this.

~~~
hoodoof
Search should be a first-class feature of a database, not an add-on.
Every. Single. Time. I have ever needed to "just add on search", it has
resulted in deep pain.

Nothing you say will convince me that it's just a super easy dance in the
daisies to "just do X" or "just do Y" and suddenly it's database search heaven.

Database application architects need to wise up that the world revolves around
search - built in, not add on.

------
sargun
Does CosmosDB have any relationship to Microsoft Cosmos
([http://web.stanford.edu/class/ee380/Abstracts/111026a-Hellan...](http://web.stanford.edu/class/ee380/Abstracts/111026a-Helland-
COSMOS.pdf))? Or is this another case of Dynamo / DynamoDB?

~~~
GovindMS
Hi Sargun! See Dharma's detailed reply, which covers the papers and the
history from 2010. This is a different effort with a different focus.

Azure Cosmos DB has been many years in the making. Azure Cosmos DB started as
“Project Florence” in late 2010 to address the developer pain points faced by
large-scale applications inside Microsoft. Observing that the challenges of
building globally distributed apps are not a problem unique to Microsoft, in
2015 we made the first generation of this technology available to Azure
developers in the form of DocumentDB. Since that time, we’ve been steadily
adding new capabilities both in the database engine and in the larger
distributed system components. Azure Cosmos DB is the result. It is the next
big leap in globally distributed, at-scale cloud databases. As a part of this
release of Azure Cosmos DB, DocumentDB customers, with their data, are
automatically Azure Cosmos DB customers. They now have access to the new
system and the capabilities offered by Azure Cosmos DB today, as well as those
we add as we keep evolving the service.

The database engine design is inspired by LLAMA ->
[http://db.disi.unitn.eu/pages/VLDBProgram/pdf/research/p853-...](http://db.disi.unitn.eu/pages/VLDBProgram/pdf/research/p853-...),
Bw-tree ->
[https://pdfs.semanticscholar.org/7655/9c6cc259c6ab5baf7bd19d...](https://pdfs.semanticscholar.org/7655/9c6cc259c6ab5baf7bd19d..),
and schema-agnostic indexing techniques ->
[http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf](http://www.vldb.org/pvldb/vol8/p1668-shukla.pdf).
Please note that these papers are significantly behind the current state of
the implementation. The most crucial aspect that these papers don't cover is
the integration of the database engine with the larger distributed system
components of Cosmos DB, including resource governance, partition
management, and the implementation of the replication protocol, consistency
models, etc. Our goal is to publish all of the design specifications,
including the TLA+ specs, over time.

------
anand_MSFT
See
[https://www.youtube.com/watch?v=Yfmw7swCtZs](https://www.youtube.com/watch?v=Yfmw7swCtZs)
by Turing Award Winner, Dr. Leslie Lamport, as he talks about Azure Cosmos DB

------
dagi3d
If I understood it correctly, they mentioned they offer horizontal scalability
for their databases, and I wonder how it works for the graph data model.

~~~
aravindkr1
This is covered in [https://docs.microsoft.com/azure/cosmos-db/gremlin-
support](https://docs.microsoft.com/azure/cosmos-db/gremlin-support). You can
specify a partition key for your graphs for scale out, and access vertices and
edges using the partition key + item key ("id") like g.V(['USA', 'Seattle']).
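
To illustrate why the partition key + item key pair matters, here is a minimal in-memory Python sketch of a partitioned vertex store where a point lookup touches exactly one partition. It is only a sketch: `PartitionedGraphStore`, its methods, and the hashing scheme are hypothetical and are not the Cosmos DB implementation or the Gremlin API.

```python
import hashlib


class PartitionedGraphStore:
    """Toy vertex store: the partition key picks the partition, (key, id) picks the item."""

    def __init__(self, partition_count: int = 4):
        self._partitions = [{} for _ in range(partition_count)]

    def _index(self, partition_key: str) -> int:
        # Stable hash of the partition key (assumed scheme, for illustration only).
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def add_vertex(self, partition_key: str, vertex_id: str, properties: dict) -> None:
        partition = self._partitions[self._index(partition_key)]
        partition[(partition_key, vertex_id)] = dict(properties)

    def get_vertex(self, partition_key: str, vertex_id: str):
        # Only one partition is consulted; no fan-out across the others.
        partition = self._partitions[self._index(partition_key)]
        return partition.get((partition_key, vertex_id))


store = PartitionedGraphStore()
store.add_vertex("USA", "Seattle", {"label": "city"})
print(store.get_vertex("USA", "Seattle"))  # {'label': 'city'}
print(store.get_vertex("USA", "Boston"))   # None
```

A lookup without the partition key would have to check every partition, which is the cost the key pair avoids.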

------
hoodoof
Databases are one area where Amazon is way behind Google and Microsoft.
DynamoDB is thoroughly awful, so it's good to see some competition.

------
jupp0r
> something no other database service can offer.

Needs to be updated since Spanner was released

------
dmarlow
I want a competitor to Google Cloud SQL in Azure.

~~~
curiousDog
It's called Azure SQL DB (unless you want MySQL, then you'd want IaaS SQL DB)

~~~
dmarlow
With the exception that Azure SQL doesn't horizontally scale across
regions/datacenters like Google Cloud SQL does.

~~~
curiousDog
I don't think Google Cloud SQL does either in a fully managed way. Spanner
OTOH does that, but it's NewSQL. Traditional sharding has been supported for a
while: [https://docs.microsoft.com/en-us/azure/sql-database/sql-
data...](https://docs.microsoft.com/en-us/azure/sql-database/sql-database-
elastic-scale-introduction). You can create databases wherever you want.

------
martinknafve
No backup/restore?

~~~
dharmashuklaMS
Cosmos DB does local persistence and replication both within a region and
across any number of regions. All data is durable and made highly available
via replication. You don't need to take backups or restore them for ensuring
durability or availability. See (1) [https://docs.microsoft.com/en-
us/azure/cosmos-db/introductio...](https://docs.microsoft.com/en-
us/azure/cosmos-db/introduction) and (2) [https://docs.microsoft.com/en-
us/azure/cosmos-db/introductio...](https://docs.microsoft.com/en-
us/azure/cosmos-db/introduction#global-distribution). That said, if you need
backup/restore for the cases where you accidentally delete your data and want
to resurrect it, Cosmos DB automatically takes backups for you periodically.

~~~
martinknafve
Actually, I wish Microsoft would stop referring to replication when asked
about backup. It's the modern way of saying "You probably don't need backup,
because you have RAID". There's a reason Azure SQL Database has self-service
point-in-time-recovery despite also having replication.

Do you see many use cases where there is no need for backup to protect against
accidental deletion, overwriting, deletion by application vulnerabilities and
so on?

As far as I understood the backup docs, I should contact Microsoft Support
within 8 hours if any of those things happens. Is that still correct? What if
we don't notice the issue until 7 days later?
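
To spell out the replication-versus-backup point, here is a toy Python sketch: a delete is faithfully copied to every replica, so only a separate point-in-time snapshot can bring the data back. `ReplicatedStore` and its methods are hypothetical and do not model any Azure service.

```python
import copy


class ReplicatedStore:
    """Toy store that applies every write and delete to all replicas."""

    def __init__(self, replica_count: int = 3):
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key):
        # Replication propagates mistakes as faithfully as it propagates good writes.
        for replica in self.replicas:
            replica.pop(key, None)

    def snapshot(self):
        """Point-in-time copy kept outside the replica set (the 'backup')."""
        return copy.deepcopy(self.replicas[0])

    def restore(self, snap):
        self.replicas = [copy.deepcopy(snap) for _ in self.replicas]


store = ReplicatedStore()
store.put("invoice-1", {"total": 100})
backup = store.snapshot()

store.delete("invoice-1")  # accidental delete
print(any("invoice-1" in r for r in store.replicas))  # False: gone everywhere

store.restore(backup)
print(store.replicas[0]["invoice-1"])  # {'total': 100}
```

The open question in my comment is how far back such a snapshot goes and who can trigger the restore.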

~~~
Ecio78
Totally agree with you. Even if you have delayed copies, you still need proper
backups in place. They recently implemented long-term retention for Azure SQL
backups in Azure: [https://azure.microsoft.com/en-us/blog/azure-sql-database-
no...](https://azure.microsoft.com/en-us/blog/azure-sql-database-now-
supporting-up-to-10-years-of-backup-retention-public-preview/), as it
previously allowed only up to 35 days. I would expect something similar to be
provided for this type of DB as well.

~~~
martinknafve
One can hope. Blob and table storage still have no form of backup ~8 years
after introduction.

When I've asked, they have referred to replication and told us to call them if
we accidentally lose data. But we need to contact them within two hours,
otherwise it's too late. And of course Azure Support never responds that
quickly when I submit a case to them.

------
Dimi9909
Is this similar to DynamoDB in AWS?

