ExtFUSE seems really cool and great for implementing performant drivers in users...

dekhn · on Sept 6, 2023

I am curious to hear what solutions people have found for this- for example, does anybody cache S3 in cloudfront and then point S3 clients at cloudfront?

This comes up because we store a lot of image data where time-to-first-byte affects the user experience but the access patterns preclude caching unless we are willing to spend $$$.

eblume · on Sept 6, 2023

You may wish to investigate cloudflare's image API: https://developers.cloudflare.com/images/cloudflare-images/

If the reason you were unable to use a CDN cache was because your access patterns require a lot of varying end serializations (due to things like image manipulation, resizing, cropping, watermarking, etc.), then this API could be a huge money saver for you. It was for me.

OTOH if the cost was because compute isn't free and the corresponding cloudflare worker compute cost is too much, then yeah, that's a tough one... I don't have a packaged answer for you, but I would investigate something like ThumbHash: https://evanw.github.io/thumbhash/ - my intuition is that you can probably serve some highly optimized/interlaced/"hashed" placeholder. The advantage of thumbhash here could be that you can optimize the access pattern to be less spendy by simply storing all of your hashes in an optimized way, since they will be extremely small, like small enough to be included in an index for index-only scans ("covering indexes"). (I have not actually tried this.)

dekhn · on Sept 6, 2023

I'm an AWS shop, running science code, cloudflare is not a product I would consider.

laurencerowe · on Sept 7, 2023

I would consider making S3 the source of truth and mirroring your data to a regular server with a bunch of SSDs. If egress is substantial this may save you money. (When I built encodeproject.org AWS egress fees made it worth putting in a stateless proxy server so we only paid the lower Direct Connect fees.)

A 4U server with 24 8tb 2.5” $400 consumer sata ssds is probably the best bang for the buck. Probably about $20k plus hosting fees. That’s 192TB of storage.

dekhn · on Sept 7, 2023

I guess I should have explained better: we're running batch science jobs, or applications that serve up images to users, already in AWS. The user experience is already "fast enough" with an EC2 server with a bunch of SSDs, except that the dataset size is hundreds of terabytes and we don't know the access pattern ahead of time. S3 has a huge savings in terms of total byte cost, and for batch science jobs that already speak S3 natively or for workflows where the engine itself can do S3 to local staging we have good performance.

See https://www.nature.com/articles/s41592-021-01326-w figure 1 a and b for a demonstration of the distinct time-to-first-byte for uncached data (local POSIX, S3, and nginx/http). I'm trying to find a way to accelerate that TTFB for a dataset that lives in S3 and we don't know which parts could be cached before a user shows up and clicks on a dataset.

(I know it's a weird use case. It's not one I particularly want to support, as my own interests are more in the large-scale batch compute than the interactive user).

laurencerowe · on Sept 7, 2023

> See https://www.nature.com/articles/s41592-021-01326-w figure 1 a and b for a demonstration of the distinct time-to-first-byte for uncached data (local POSIX, S3, and nginx/http).

Looking at your figure, almost a second to download a 128KB chunk with HDF5 seems extremely high. How many requests are being made here? Is the HDF5 metadata being re-read from scratch for each iteration of the benchmark?

It might be interesting to trace log each http request being made and its timing during an iteration of the benchmark.

Using AWS CloudShell I see range requests for the last 128KB of a 512MB S3 file without authentication taking about 25ms when reusing a connection and 35ms when not.

> I'm trying to find a way to accelerate that TTFB for a dataset that lives in S3 and we don't know which parts could be cached before a user shows up and clicks on a dataset.

If latency from S3 turns out to be the problem I'm not sure there's any alternative but paying for the space your data uses on fast storage. The cheapest option on AWS will likely be EBS SSD volumes. That runs to $0.08/GB/month or $8000/month for 100TB (4x more than S3) before IOPS charges. You could try they EBS HDD volumes but they do not advertise latency figures and are still 2x S3 pricing.

dekhn · on Sept 7, 2023

We don't use HDF5. Just look at the numbers for OME-NGFF and TIFF.

We have already tried EFS and it worked for our needs but cost about the same as your EBS. We had to enable the "EFS go fast" button (at least it exists!) which greatly increases the cost.

ashishbijlani · on Sept 6, 2023

A high-latency network will certainly become the bottleneck, but ONLY for file reads/writes. Metadata attributes (e.g., symlinks, dentries, inodes) are maintained by the file system (FUSE). Caching them in the kernel with ExtFUSE will yield faster metadata ops (e.g., lookups).

laurencerowe · on Sept 7, 2023

What userspace operations do these metadata attribute lookups map to? Listing a directory will incur an S3 list request, so another a network call though I’m not sure how fast that is.

Note, it’s not that the network is slow, it’s that object storages aren’t designed for low latency access (but with parallel reads can serve data at high bandwidth.)