A similar process co-located with a Fauna region can normally also perform simple reads in a few milliseconds.
Similarly, a browser client querying a lambda function multiple times from the other side of the world will also be quite slow, even if the lambda "thinks" its queries are fast because its database is right next to it.
It is not completely clear to me what else is going wrong, but the basic premise of the benchmark is invalid, and there are other errors in regard to Fauna. For example, index serializability only affects write throughput; it has nothing to do with read latency. The status page reports write latency, not read latency, etc.
For a more even-handed comparison of Fauna to DynamoDB, see this blog post: https://fauna.com/blog/comparing-fauna-and-dynamodb-pricing-...
Fauna is a temporal database, not a time series database. The test code updates the score on every post after every read, creating new historical events each time. These events have to be read and skipped over during the read query, so latency will keep increasing in proportion to the number of updates that have occurred.
By default, Fauna retains this data and makes it queryable (with transactional guarantees) for 30 days, unlike DynamoDB or Redis. Reducing the retention period would help a bit, but event garbage collection is not immediate so there will still be differences for heavily churned documents. Normally, having a few updates to a document or an index has no noticeable impact but in this case it appears to be swamping the other factors in the latency profile.
It is possible to manually remove the previous events in the update query; doing that should reduce the latency. Nevertheless, Fauna is not a time series database so this is a bit of an anti-pattern.
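For reference, a single historical event can be removed in FQL with Remove; the document ref and timestamp below are placeholders, not values from the benchmark, and a document's actual events can be listed with Paginate(Events(ref)):

```fql
// Hedged sketch: delete one prior "update" event of a document by its
// timestamp so reads no longer have to skip over it. The ref and the
// microsecond timestamp are placeholders, not values from the benchmark.
Remove(
  Ref(Collection("posts"), "1"),   // assumed document ref
  1614910000000000,                // ts of the old event, from Paginate(Events(ref))
  "update"
)
```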
See the Fauna endpoint: https://71q1jyiise.execute-api.us-west-1.amazonaws.com/dev/f...
See the code: https://github.com/upstash/latency-comparison/blob/master/ne...
If you mean deleting FaunaDB's internal events, I don't know how to do that. Can you guide me?
I believe you're confusing it with the Dynamo described in a paper Amazon published long ago. AWS DynamoDB has nothing to do with the eventually consistent design that paper describes. Calling the new AWS service DynamoDB was purely a marketing gimmick.
I'm currently reading The DynamoDB Book, and even its author acknowledges that DynamoDB is serverless more as a byproduct, and that it shouldn't be chosen for that reason alone.
But as a product developer all these highly integrated AWS services that can be provisioned with CloudFormation are pretty convincing.
What's Fauna's IaC story?
Upstash is not non-durable. It is based on multi-tier storage (memory + EBS) implementing the Redis API. In a few weeks I will add Upstash Premium, which replicates data to multiple zones, to the benchmark app.
In the blog post, I mentioned the qualities where Fauna is stronger than the others: https://blog.upstash.com/latency-comparison#why-is-faunadb-s...
In my code, the Upstash database was eventually consistent. Similarly, the index in FaunaDB was not serialized.
But both of those should not affect the latency numbers in the histogram because those numbers are all read latency.
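For context, this is roughly what a non-serialized index definition looks like in FQL (the index name and source are assumptions, not the benchmark's actual definition); serialized: false relaxes write serializability for that index but does not change read latency:

```fql
// Sketch of a non-serialized index (names are assumptions). Setting
// serialized: false trades write serializability for write throughput;
// reads against the index are unaffected.
CreateIndex({
  name: "posts_by_score",            // assumed index name
  source: Collection("posts"),       // assumed source collection
  values: [{ field: ["data", "score"] }],
  serialized: false
})
```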
An even-handed comparison from the authors of FaunaDB?
This part isn’t correct. The two follower replicas can serve a consistent read even if the leader is behind / down. And there is no guarantee the primary even has the data persisted to disk when the subsequent read call is made.
As a general courtesy, when you see a dramatic difference, it's better to involve the parties or ask for review before publishing your results.
- FaunaDB is providing strong consistency and isolation; Upstash and DynamoDB are providing eventual consistency.
- FaunaDB is replicating the data worldwide and offering similar access everywhere; Upstash and DynamoDB are deliberately configured in the same AWS region as the lambda function.
DynamoDB transactions do increase latency, but not to the same degree as Fauna's. However, they do not achieve the same level of transactional correctness either, or really any correctness at all in a multi-region context.
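For reference, a minimal sketch of a DynamoDB transactional write (table, key, and attribute names are assumptions, not the benchmark's schema); TransactWriteItems is ACID within a single region, but global-table replicas still converge asynchronously, which is the correctness gap described above:

```javascript
// Hedged sketch: a single-item transactional update. Table, key, and
// attribute names are assumptions, not values from the benchmark.
const params = {
  TransactItems: [
    {
      Update: {
        TableName: "posts",                            // assumed table name
        Key: { id: "post-1" },                         // assumed key
        UpdateExpression: "SET score = score + :inc",
        ExpressionAttributeValues: { ":inc": 1 },
      },
    },
  ],
};

// With the v2 AWS SDK this would be executed as:
//   const AWS = require("aws-sdk");
//   await new AWS.DynamoDB.DocumentClient().transactWrite(params).promise();
console.log(params.TransactItems[0].Update.TableName);
```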
I guess if your partition interval were known to be very short, like a flaky ISDN link, retries could make sense for some use cases, but then you should just get a better link.
CockroachDB discusses a multi-city vehicle sharing use case where multi-region transactions could be worth consideration, but I'm skeptical:
(Developers and students get all excited about distributed systems, CAP, etc. but as a DBA, network partitions are largely not solvable from both technical and business standpoints. The solutions that do work include using vector clocks, or investing in a very reliable network, which is what Google is doing with dedicated fiber.)
I was just interested if a ConsistentRead would change the latency.
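In case it helps, this is the only change needed in the read path (table and key names are assumptions); ConsistentRead: true routes the GetItem through the partition's leader instead of any replica:

```javascript
// Hedged sketch: the same read, eventually consistent vs strongly consistent.
// Table and key names are assumptions, not the benchmark's actual values.
const eventualParams = {
  TableName: "posts",        // assumed table name
  Key: { id: "post-1" },     // assumed key
};

const consistentParams = {
  ...eventualParams,
  ConsistentRead: true,      // read from the leader; roughly doubles read cost
};

// With the v2 AWS SDK this would be executed as:
//   const AWS = require("aws-sdk");
//   await new AWS.DynamoDB.DocumentClient().get(consistentParams).promise();
console.log(consistentParams.ConsistentRead);
```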
Given data is always replicated to at least 2 of 3 storage nodes before ACK’ing, you can always just read from 2 different replicas and be sure you have the latest data.
(My understanding is shallow compared to real experts, but even so I know this is a deep topic and this is only one example of the type of thing you'd want to consider when figuring out whether to take this comparison at face value.)
AWS Aurora Serverless v2 is close to what I'm looking for, but I think it's held back by trying to make non-serverless technology serverless.
Fauna is close but I don't really need global consistency and don't want to pay the 150ms latency price for it: https://status.fauna.com/
Unless you meant I can limit "how global" my data is and that would improve write speeds?
What's slowing Fauna here are the global writes and the non-idiomatic FQL queries.
Right now the code is doing a bunch of separate queries, but in idiomatic FQL this would be done in a single transaction.
I'm going to do a PR to update the FQL code if the author accepts it.
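As a rough illustration (collection and field names are assumptions, not the repo's actual schema): every FQL query executes as a single transaction, so the per-post updates can be folded into one round trip:

```fql
// Hedged sketch: update the score on every post in one query/transaction
// instead of issuing one query per post. Names are assumptions.
Map(
  Paginate(Documents(Collection("posts"))),
  Lambda(
    "ref",
    Update(Var("ref"), { data: { score: 0 } })
  )
)
```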
I have to say the latencies you're getting are much higher than I've experienced with Fauna. Obviously a KV database is expected to be faster, but I'd be surprised to see more than 100ms at the 50th percentile.
I'd love to run the same test from Cloudflare Workers instead of AWS Lambda. Could you DM me on Twitter to discuss?
I just saw this comment by Evan from Fauna which explains why the latency could be so high:
Curious what Firebase latency would be in comparison, when called from the same GCP data center or AWS.
Fauna claims to route your request to the nearest data center, so I'm interested in validating this. Seeing 400ms latency where I'd expect <50ms matters to me, especially on Lambda, where you're billed while waiting for the response.
However, the product in question (Upstash) looks very interesting: serverless storage with a pay-as-you-go model and a maximum price cap is an excellent idea.
I would love to have more options in this space, especially in GCP to use together with Cloud Run.