
We built a distributed DB capable of running across 100s of global locations - ctesh
https://www.macrometa.co/blog/why-global-edge-fabric
======
ignoramous
Great work. Building a distributed database isn't easy at all and takes
considerable effort. I'd like to see more on failure scenarios. Here are a few
preliminary questions (admittedly, I haven't read the entire post, so
apologies if my queries are answered already):

What is the SLA for durability and availability of the db?

\- how are scenarios like edge locations going down for multiple minutes
handled? Are the writes lost?

\- how many replicas of data at a single edge location?

\- Are there limits on the table size (how does one enforce this in a multi-
master mode)?

\- what's the SLA on replication time to say, at least 50% edge locations, and
then to 100%?

Are DDL operations allowed? If so, how are the conflicts handled?

If data is stored in LSMs underneath, how are geo location queries handled? Is
there an index/materialised view? If so, how long does that take to generate?

What's the TPS/QPS supported for the KV interface?

\- Is there a scenario where a set of operations at X TPS across edge
locations might then take a long time to converge globally?

It'd be great if you could list the scenarios where the DB can lose writes to
'true conflicts' in a FAQ somewhere.

\- For instance, what happens in a create-delete-create scenario where a table
is created at one edge location, deleted at another, created at a third edge
location; all the while when there's writes and reads happening globally?

Thanks.

~~~
ctesh
Thanks for the truly thoughtful questions - I will put together a short FAQ
and post a link and a TL;DR answer here.

We also have a more technical internals paper almost completed and will
publish next week.

Thanks again! This is such a helpful comment.

------
CharlesW
Shouldn't this be a "Show HN", since the submitter is the company
CEO/President?

~~~
durga_mm
Hi CharlesW - Thanks. We will probably publish another post with details on
how people can try out the global edge fabric. That would probably be a good
fit for "Show HN"?

Regards

------
ctesh
The TL/DR on how:

1\. Causal consistency with the ability to create collections (tables) with
strict consistency - uses vector clocks and not wall clock timestamps for
ordering DB operations

2\. Streams to propagate DB changes from one geo location (node) to another
with guaranteed ordering and reliable delivery

3\. Generalized Operational CRDTs to make all DB operations Associative,
Commutative, Idempotent and Distributed (the new ACID)

4\. One data model - multiple interfaces - store and query your data as
Key/Value DB, DocumentDB, Graph DB, StreamDB, Geo location DB (query by
lat/long/altitude)

5\. Automatic GraphQL and REST API generation for your schema

6\. Multi master - query and update any of our 25 global locations and get one
consistent view of data globally

7\. Access via CLI, REST, GraphQL (built-in server), client drivers for
Python and JavaScript today (Java, Go, Ruby in the works)
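
To make point 3 concrete: a merge that is associative, commutative, and idempotent converges to the same result no matter the order or repetition of delivery between locations. Here is a textbook last-writer-wins register in that style (an illustration of the general technique, not Macrometa's actual code):

```python
# Illustrative sketch only (not the vendor's internals): a last-writer-wins
# register whose merge is associative, commutative, and idempotent, so any
# delivery order of updates between locations converges to the same value.

def lww_write(value, ts, node):
    # A tagged write; the (ts, node) pair breaks ties deterministically.
    return (ts, node, value)

def lww_merge(a, b):
    # max() over the (ts, node) tag: insensitive to order and repetition.
    return max(a, b)

w1 = lww_write("blue", ts=3, node="pop-a")
w2 = lww_write("green", ts=5, node="pop-b")

# Any merge order, any duplication, same result:
assert lww_merge(w1, w2) == lww_merge(w2, w1) == lww_merge(w2, lww_merge(w1, w2))
assert lww_merge(w1, w2)[2] == "green"
```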

~~~
sagichmal
This is buzzword salad, not a coherent technical summary.

~~~
ctesh
OK, take 2:

1\. We use vector clocks to establish the causal order of operations, instead
of timestamps and the Network Time Protocol, which are unreliable over WAN
distances

2\. We share database state changes/updates between each Point of Presence
(PoP) and the others in the cluster using asynchronous streams, which use a
pull model instead of a push model to maintain the ordering of messages over
the network. This has the added benefit of letting us know exactly how current
each PoP is with changes.

3\. We don't use quorum between PoPs to establish consistency - there's a
white paper on our site that shows the how and why of coordination-free
replication. The gist is that we have developed a generalized operational CRDT
model that gives us associative, commutative, idempotent convergence of
changes between PoPs in the cluster without needing quorum

4\. The DB is multi-master - you can access and change data at any PoP. It's
also multi-model and lets you query your data through multiple interfaces such
as key/value, document (JSON), graph, etc.

5\. The DB automatically creates GraphQL and REST APIs for your schema, taking
away the complexity and effort of a lot of boilerplate development on the
backend

6\. The DB is available as a managed service in 25 PoPs today - you can
request an account and we will give you one. We will be generally available
with a free tier in April, at which point you can sign up online and
self-administer your cluster

7\. You can access the DB via a CLI or GUI, or write code that accesses it via
REST, GraphQL, or native language drivers in JavaScript and Python today (we
are working on other languages, with a view to releasing them over the next
few months)

hope this helps...
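
To illustrate point 1, here is a minimal vector-clock sketch showing how a causal partial order over events is established without trusting wall clocks or NTP (illustrative only; the names and structure are assumptions, not the product's actual implementation):

```python
# Minimal vector-clock sketch. Each node keeps a per-node event count; the
# element-wise comparison of two clocks tells you whether one event causally
# preceded the other or whether they were concurrent.

def vc_increment(clock, node):
    # Local event: bump this node's entry.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(a, b):
    # On message receipt: element-wise max of the two clocks.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def happened_before(a, b):
    # a -> b iff a <= b element-wise and a != b.
    return all(a.get(k, 0) <= b.get(k, 0) for k in set(a) | set(b)) and a != b

# PoP A writes, ships the update to PoP B, which then writes:
a1 = vc_increment({}, "A")                 # {'A': 1}
b1 = vc_increment(vc_merge(a1, {}), "B")   # {'A': 1, 'B': 1}
assert happened_before(a1, b1)             # causally ordered

# A concurrent write at PoP C is ordered neither before nor after:
c1 = vc_increment({}, "C")
assert not happened_before(b1, c1) and not happened_before(c1, b1)
```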

~~~
the_arun
The REST interface is good, but could we add business validation before
mutating data? The bulk of the work a backend does is this business logic. How
could we do this?

~~~
ctesh
Hello and greetings the_arun.

The short answer is yes. Macrometa integrates a function-as-a-service (FaaS)
layer which can be hooked into the database and triggered by events on a
stream or a data collection.

So you can, for example, do the following: expose a RESTful or GraphQL API
(including deeply nested GraphQL queries) for one or more collections. When
mutating, attach a validation function to the collection as a trigger that is
called before the mutation is applied to the DB. You can also have a trigger
that calls a function after the mutation is complete.
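
A pre-mutation validation trigger of the kind described above might look roughly like this; the function names, collection name, and event shape here are hypothetical, invented for illustration, and are not the actual FaaS API:

```python
# Hypothetical sketch of a pre-mutation validation trigger. The names and
# calling convention are invented for illustration only.

def validate_order(doc):
    """Business validation run before the write is applied."""
    if doc.get("quantity", 0) <= 0:
        raise ValueError("quantity must be positive")
    if "customer_id" not in doc:
        raise ValueError("customer_id is required")
    return doc

def on_pre_mutation(collection, doc, validator=validate_order):
    # The platform would invoke something like this before committing a write
    # to `collection`; raising here aborts the mutation with an error response.
    return validator(doc)

# A valid document passes through unchanged:
ok = on_pre_mutation("orders", {"customer_id": "c1", "quantity": 2})
assert ok["quantity"] == 2
```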

One can also do this on streams, with functions triggered by messages on a
specific topic.

Lastly - there is full support for running containers as well and you can use
the endpoints exposed by the container as a trigger.

Oh, and one more thing - the DB is real-time. It will notify clients of
updates to collections automatically (like Firebase).

Hope this helps..

~~~
the_arun
Thanks for the details. This sounds very similar to Amazon's DynamoDB. Are
there features that make Macrometa better than DynamoDB?

~~~
ctesh
We share some feature overlap with DynamoDB (key/value and document DB
interfaces). Where we differentiate: global replication across all 25 of our
global PoPs (50 by the end of 2019); an integrated GraphQL generator (REST as
well); real-time - the DB notifies clients of changes to data, i.e. no need to
poll; tight integration with streams and pub/sub; functions and containers
that run as triggers or stored procedures in the DB; geo queries (query by
lat/long/altitude); and integrated Elasticsearch (July 2019). There's more -
we will announce in April.

------
willvarfar
This is actually exciting. Most sales pitches for secret sauce services are
much flimsier and don't ooze the actual technical possibilities and promises
that this one does.

So, does it solve the double-spend problem or not? The prose is a bit
ambiguous on that.

I don't want to work that out for myself. I want to be told one way or
another, and perhaps given some simple examples like "users x and y both want
to do z at roughly the same time; depending on several factors like the exact
timings and network latencies, there are several outcomes. They are..." etc.

And of course we want to know what kind of realistic throughput and latency it
gets, how easy it is to add and remove POPs (and what data gets lost when you
do), etc.

Finally, it really needs to commission and publish a Jepsen analysis. Hasn't
every database vendor learned how critical that is to winning the programming
public's heart?

This seems to be a startup that is going to sell it as a service, and I wish
them all the best. The DB world is full of better tech that didn't make the
dent it technically deserved (HyperDex, TokuDB, perhaps RethinkDB, etc.), so a
new startup has to be both vocal and embrace Jepsen from the beginning. I
think HN is ripe for a new darling, and whichever company becomes that darling
might capture a lot of the ill-informed web-scale-before-we-have-a-customer
market ;)

~~~
ctesh
Thanks for pointing out the ambiguous bits - I’ll clean it up in the next few
days.

Yes, it does solve double-spend - it lets you mark (check a box) a collection
with a SPOT property (single point of truth), which restricts the number of
edge regions that collection is replicated across. Additionally, the SPOT
collection is replicated across two separate data centers (and separate
availability zones within those data centers) to provide high availability.
The developer doesn't need to deal with the complexity of where and how this
is done - they continue to access the collection like a regular one, and the
DB will redirect queries (and do things like joins) between the regular and
SPOT collections.
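
To make the double-spend point concrete: restricting a collection to a single point of truth means conflicting debits serialize through one authoritative copy, where a conditional check can reject the loser instead of merging both writes. A minimal sketch of that pattern (a lock stands in for single-region serialization; this illustrates the idea, not the SPOT implementation itself):

```python
# Illustrative sketch of why a single point of truth prevents double-spend:
# all writes serialize through one authoritative copy, so a conditional
# (check-then-debit) update can reject the losing writer outright.

import threading

class SpotAccount:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()  # stands in for single-region serialization

    def debit(self, amount):
        with self._lock:
            if self.balance < amount:
                return False  # second spender is rejected, not merged
            self.balance -= amount
            return True

acct = SpotAccount(100)
results = [acct.debit(100), acct.debit(100)]  # two "simultaneous" spends
assert results.count(True) == 1 and acct.balance == 0
```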

I’m writing a FAQ and will include your questions there.

Jepsen is in the plans - we hope to get a report out in the summer.

------
raghava
This looks pretty neat!

Considering that there's a FaaS layer too, which could be used along with the
DB layer offering, what degree of vendor-lockin are users getting into?

Also, this product seems only targeted at those direly needing edge computing,
thus missing out on many who might be using Firebase and Serverless
(Lambda/Functions etc) for the works. Is that intentional?

~~~
ctesh
You bring up a great point. One can absolutely use us as a Firebase
alternative. We have a few customers using us with Lambda and container
services directly on their favorite cloud provider, in just two locations, for
high availability of their apps.

The edge-computing positioning is intentional, in that we think the edge is
the way to build globally distributed apps and APIs. Our goal is to make it as
simple to build and run an app in 25 regions as it is to build an app against,
say, Firebase or DynamoDB.

------
zzzcpan
So, does it actually have CRDT operations? Doesn't seem like any of the
interfaces expose them. Can you use a convergent counter, for example?

Also product page [1] has some weird claims. There is no such thing as "Strong
Session Consistency", this is just another name for weak consistency
guarantees. And Strong Eventual Consistency, which is a thing, requires using
mergeable conflict-free operations that you don't seem to expose? The "smart
operation ordering using intent prioritization" is also neither strong
eventual consistency nor strong consistency.

[1] [https://www.macrometa.co/product](https://www.macrometa.co/product)

~~~
durga_mm
Hi zzzcpan - Yes, the global edge fabric internally uses CRDT operations. It
is a conscious choice not to expose the CRDT data types: the idea is that
developers work with the API & query layer as they do with other databases,
and underneath, the system takes care of translating to the necessary CRDT
operations. A convergent counter is in the works and will be available
shortly.

The system has strong consistency within a region (i.e. a data center) and
strong eventual consistency across data centers. To answer your question: yes,
we do converge (merge) the changes across regions using conflict-free
operations.
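
For the curious, a convergent (grow-only) counter of the kind mentioned here is typically built as below; this is the textbook construction, not necessarily what the product will ship:

```python
# Hedged sketch of a convergent grow-only counter: each region tracks its own
# increments, and merging takes the element-wise max, so replicas converge
# regardless of the order in which they exchange state.

def gc_increment(counts, node, n=1):
    counts = dict(counts)
    counts[node] = counts.get(node, 0) + n
    return counts

def gc_merge(a, b):
    # Element-wise max: associative, commutative, idempotent.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def gc_value(counts):
    return sum(counts.values())

# Two regions increment independently, then exchange state:
us = gc_increment({}, "us-east", 3)
eu = gc_increment({}, "eu-west", 2)
assert gc_value(gc_merge(us, eu)) == gc_value(gc_merge(eu, us)) == 5
```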

Intent prioritization rules come into play only when it is impossible for the
system to determine the developer's intent, i.e. when two conflicting changes
occur across regions.

I hope the above helps.

Regards

------
tckr
Where do I create an account?

