Ambry: LinkedIn’s Scalable Geo-Distributed Object Store (micahlerner.com)
33 points by mlerner on April 9, 2023 | 10 comments



I've worked on Ambry at LinkedIn for a little while; I'd be happy to answer any questions about the architecture or things we've done since 2016. I wasn't part of the original team. One thing I would call attention to from the article:

> it’s key-value based approach to interacting with blobs doesn’t support file-system like capabilities, posing more of a burden on the user of the system (who must manage metadata and relationships between entities themselves).

I think this trade-off is one of Ambry's strongest design decisions. By giving up key-value access, Ambry gets to dictate the location of an object at write time: when a partition fills up, set it to read-only and create new partitions on new hosts. Because Ambry generates the blob ID, the system can embed information (like the partition number) right in the ID. With a key-value approach you need to worry about balancing (and re-balancing) the key space over your topology, and with dense storage nodes, re-balancing is VERY expensive.
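
To make that concrete, here's a minimal sketch of a storage-generated ID that carries its own routing information. This is illustrative only, not Ambry's actual blob ID layout (the real one has more fields, e.g. version, account, and container):

    import java.nio.ByteBuffer;
    import java.util.Base64;

    // Toy blob ID that embeds the partition it was written to, so a frontend
    // can route a GET from the ID alone, without a key->location lookup.
    public class ToyBlobId {
        public static String create(long partitionId, long uniqueSuffix) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * 2);
            buf.putLong(partitionId);   // routing info baked in at write time
            buf.putLong(uniqueSuffix);  // uniqueness within the partition
            return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
        }

        public static long partitionOf(String blobId) {
            byte[] raw = Base64.getUrlDecoder().decode(blobId);
            return ByteBuffer.wrap(raw).getLong(); // first 8 bytes = partition
        }

        public static void main(String[] args) {
            String id = create(42, System.nanoTime());
            System.out.println("blob id: " + id + " -> partition " + partitionOf(id));
        }
    }

Because a sealed (read-only) partition never moves, an ID like this stays valid without any re-mapping step.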

Also, most applications don't actually need key-value access. For storing something like media (think: LinkedIn profile photo), you've already got a database row for the user profile; now one of those fields is a reference to your object store. It might as well be a storage-generated reference instead of one where the application tries to manage reference uniqueness and ends up using UUIDs or something similar anyway.

Apologies for the new account, I try to keep my main HN account semi-anonymous.


Are you saying that when one sends a new blob to Ambry, it returns an arbitrary key that must be used to retrieve the data in the future?

I always imagined (perhaps naively and incorrectly) that systems like S3 must have a middleware service layer to translate user-defined keys into storage addressing.


That's right: You upload an object, and the HTTP 200 response includes a header that is the object ID. Your imagination is almost certainly correct: creating a mapping from user-defined key to storage location is really best done by a metadata store so that data doesn't need to be reshuffled with topology changes.
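
A rough sketch of that flow against an Ambry-style frontend is below. The hostname, port, and header names here are assumptions for illustration; consult Ambry's REST API docs for the real ones:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sketch of "upload an object, get back a storage-generated ID".
    public class UploadSketch {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            byte[] photo = new byte[]{ /* ... object bytes ... */ };

            HttpRequest post = HttpRequest.newBuilder(URI.create("http://ambry-frontend:1174/"))
                    .header("x-ambry-service-id", "profile-photos") // assumed header name
                    .POST(HttpRequest.BodyPublishers.ofByteArray(photo))
                    .build();
            HttpResponse<Void> resp = client.send(post, HttpResponse.BodyHandlers.discarding());

            // The blob ID comes back in a response header (assumed to be Location here);
            // the application stores it in its own database row as the reference.
            String blobId = resp.headers().firstValue("Location").orElseThrow();
            System.out.println("store this reference: " + blobId);

            // Later, a GET on that ID fetches the object back.
            HttpRequest get = HttpRequest.newBuilder(URI.create("http://ambry-frontend:1174" + blobId))
                    .GET().build();
            client.send(get, HttpResponse.BodyHandlers.ofByteArray());
        }
    }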

A classic distributed key-value store (such as Dynamo[1]) spends a lot of its design on how to consistently map user keys to storage locations. Most of these solutions are focused on performance (so avoiding a middle-layer lookup) and accept the tradeoff that data will need to be re-balanced when doing things like expanding the cluster. This isn't an acceptable tradeoff for object stores that have much higher data density (100+ TB per NIC).

[1]: https://www.allthingsdistributed.com/files/amazon-dynamo-sos...
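
To get a feel for the re-balancing cost mentioned above, here's a toy illustration (deliberately using naive hash(key) mod N placement, not Dynamo's consistent hashing, which softens but does not eliminate data movement):

    import java.util.stream.IntStream;

    // With hash(key) mod N placement, growing the cluster from 10 to 11 nodes
    // moves roughly 90% of keys to a new owner. At 100+ TB per node, each
    // "moved key" is real data shipped over the network.
    public class RebalanceCost {
        public static void main(String[] args) {
            int keys = 1_000_000;
            long moved = IntStream.range(0, keys)
                    .filter(k -> Math.floorMod(Integer.hashCode(k), 10)
                              != Math.floorMod(Integer.hashCode(k), 11))
                    .count();
            System.out.printf("%.1f%% of keys change owner when adding one node%n",
                    100.0 * moved / keys);
        }
    }

Ambry sidesteps this entirely: sealed partitions never move, and new data simply lands on new partitions.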


The readme at https://github.com/linkedin/ambry starts with:

> Ambry is a distributed object store that supports storage of trillions of small immutable objects (50K-100K) as well as billions of large objects.

1. What is considered to be the size of a "large object"? 2. Can Ambry handle large-object multipart uploads, or would one have to build that themselves by storing chunks as separate objects?


Internally, Ambry stores large objects as a series of chunks. At LinkedIn we've found that 4 MB makes a good chunk size. So (for example), your 40 MB upload could be streamed directly, with Ambry handling the chunking, or you could chunk it yourself. Client-side chunking has the advantage of cleaner resumption from a broken connection. After all the chunks are uploaded, Ambry creates (or the client asks Ambry to create[1]) a special metadata blob which contains a listing of all the chunk blob IDs. Clients can use the blob ID of the special metadata blob as a single reference to the large (composite) object without needing to worry about the underlying chunking for GET and DELETE operations.

Depending on context, anything larger than the chunk size could be considered a large object, but using this method Ambry easily handles gigabyte- and larger-sized objects.

[1]: https://github.com/linkedin/ambry/blob/9b7a49ac79b1678fd7fd7...
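
A rough sketch of the client-side variant is below. The upload helper is a placeholder (a real client would POST each chunk to the frontend as in the earlier sketch), and the 4 MB chunk size is the one mentioned above:

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Client-side chunking: split the stream into fixed-size chunks, upload each
    // as its own blob, and keep the returned IDs in order for the stitch step.
    public class ChunkedUploadSketch {
        static final int CHUNK_SIZE = 4 * 1024 * 1024; // 4 MB

        // Placeholder for "upload these bytes, get back a blob ID".
        static String uploadChunk(byte[] chunk) {
            return "chunk-blob-id-" + chunk.length; // assumed; not a real Ambry call
        }

        public static List<String> uploadInChunks(Path file) throws Exception {
            List<String> chunkIds = new ArrayList<>();
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[CHUNK_SIZE];
                int read;
                while ((read = in.readNBytes(buf, 0, CHUNK_SIZE)) > 0) {
                    // Each chunk is retried independently; a broken connection only
                    // costs the in-flight chunk, not the whole upload.
                    chunkIds.add(uploadChunk(Arrays.copyOf(buf, read)));
                }
            }
            return chunkIds; // pass these, in order, to the stitch call
        }
    }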


Thank you. This answers the multipart upload question.

Another one, if I may. Can a large object be uploaded to Ambry using chunking, like S3, GCS, and Azure Blob Storage do? The upload client splits an object into 8KB chunks (configurable) and uploads them in parallel using byte ranges. Each chunk upload has its own retry policy. This prevents a situation where bad egress would interrupt the large upload as a whole. Essentially how an SFTP put operation works. This is different from the multipart upload discussed earlier.

Can this be done with Ambry? I don’t see any immediate mention of byte ranges in the source you linked.


Short answer: Yes.

However, Ambry doesn't use byte-ranges for the upload. Each chunk can be uploaded separately and the client then requests a _stitch_ operation[1]. In that operation, the client specifies the list of blob IDs for each chunk (in order) and asks Ambry to create a metadata blob listing those IDs. Ambry then returns the blob ID for the metadata blob.

[1]: https://github.com/linkedin/ambry/blob/9b7a49ac79b1678fd7fd7...
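
A sketch of what the stitch step might look like from the client's side is below. The endpoint path and JSON field name are assumptions for illustration; the handler linked above is the authoritative reference:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;

    // Sketch of a stitch request: list the chunk blob IDs in order and ask the
    // frontend to create the composite (metadata) blob.
    public class StitchSketch {
        public static String stitch(HttpClient client, List<String> chunkIds) throws Exception {
            String body = "{\"signedChunkIds\":[\""          // assumed field name
                    + String.join("\",\"", chunkIds) + "\"]}";
            HttpRequest req = HttpRequest.newBuilder(URI.create("http://ambry-frontend:1174/stitch"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<Void> resp = client.send(req, HttpResponse.BodyHandlers.discarding());
            // The blob ID of the new metadata blob is the single handle for the whole object.
            return resp.headers().firstValue("Location").orElseThrow();
        }
    }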


Very similar interface to S3 multipart uploads and (IIRC) Google Cloud Storage's equivalent. Relatively easy to put that composition & resumption logic on the client side.


Etymology: an ambry is a cabinet in a church for storing sacred items such as vessels or vestments. A common usage today is to store holy anointing oils.

https://en.wikipedia.org/wiki/Ambry


I have worked on Ambry for a few years now. This paper is a bit old, but captures most of the core concepts of Ambry.

Some of the most fascinating parts of the journey after this paper have been scaling the system to support hundreds of GiBps of throughput per cluster, multiple workloads supporting other databases and stream-processing systems, and rethinking the replication to make it compatible with the public cloud. Check out the GitHub repo to learn more: https://github.com/linkedin/ambry



