> Though it has Python bindings, OpenSlide is implemented in C and reads files through standard OS file handles, but our data sits on cloud storage that is accessible via HTTP. This means that, to open a WSI file, one must first download the entire file to disk, and only then load it with OpenSlide. But what if we need to read tens of thousands of WSIs, a few gigabytes each? The total can exceed what a single disk can hold, and even if we mounted multiple disks, the cost and time of transferring all this data to every new machine would be prohibitive. On top of that, usually only a fraction of the WSI is of interest, so downloading the entire file is wasteful. A solution is to read the bytes we need, when we need them, directly from Blob Storage. fsspec is a Python package that lets us define “abstract” filesystems with custom implementations for listing, reading, and writing files. One such implementation, adlfs, works with Azure Blob Storage.
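For illustration, a minimal sketch of the fsspec/adlfs pattern the quoted post describes (the account, container, and blob names are hypothetical; credentials are assumed to come from the environment):

```python
import fsspec

# "abfs" is the fsspec protocol implemented by adlfs. The account name,
# container, and blob path below are hypothetical placeholders.
fs = fsspec.filesystem("abfs", account_name="myaccount")

with fs.open("wsi-container/slide_0001.tiff", "rb") as f:
    f.seek(10 * 1024 * 1024)        # jump to an offset of interest
    chunk = f.read(1024 * 1024)     # only this byte range goes over HTTP
```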
AWS S3 has byte-range fetches specifically for this use case [1]. This is quite handy for data lakes and OLAP databases. Apparently, 8 MB and 16 MB are good sizes for typical workloads [2].

[1] https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing...

[2] https://d1.awsstatic.com/whitepapers/AmazonS3BestPractices.p...
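As a sketch, a ranged GET with boto3 (the bucket and key are made up; the 8 MiB size follows the whitepaper's suggestion):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key. The Range header is inclusive, so this asks
# for exactly the first 8 MiB of the object.
resp = s3.get_object(
    Bucket="my-data-lake",
    Key="tables/events/part-0000.parquet",
    Range="bytes=0-8388607",
)
first_8_mib = resp["Body"].read()
```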
The OP here is using Azure Blob Storage, which is essentially Azure's S3 competitor. Like S3, it accepts the Range header.[1] (I'm presuming Blob Storage was modeled after S3, frankly. The capabilities and APIs are very similar.)
Similarly, GCP Cloud Storage appears to also support the Range header[2].
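A quick way to check either service is to issue a plain HTTP GET with a Range header and look for a 206 response (the URL below is a placeholder; a SAS token on Azure, or an OAuth token on GCS, would normally be attached):

```python
import requests

# Hypothetical blob URL. Both Azure Blob Storage and GCS honor the
# standard HTTP Range header for partial reads.
url = "https://myaccount.blob.core.windows.net/wsi/slide_0001.tiff"
resp = requests.get(url, headers={"Range": "bytes=0-1048575"})

# 206 Partial Content means the server returned only the requested bytes.
assert resp.status_code == 206
first_mib = resp.content
```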
Even with byte-ranges in S3, I know that some projects like OME report very poor performance compared to local filesystems (their use case requires very low latency to show various subrectangles of multidimensional data, converted to image form, for pathologists sitting in a browser). They have been exploring moving from monolithic TIFF image files to zarr, which preshards the data and compresses it in blocks.
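A minimal sketch of what that chunked layout looks like with zarr (the shape and chunk size are illustrative, not OME's actual settings):

```python
import numpy as np
import zarr

# Each 512x512 chunk is stored (and compressed) as its own object, so a
# viewer can fetch just the subrectangle it needs instead of the whole
# monolithic image. Dimensions here are made up.
z = zarr.open(
    "slide.zarr",
    mode="w",
    shape=(65536, 65536),
    chunks=(512, 512),
    dtype="uint8",
)
z[:512, :512] = np.random.randint(0, 256, (512, 512), dtype="uint8")
```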
"Local filesystem" is not a thing... When you mount NFS on your laptop, is it local to your laptop or not? What if you have a caching client?
In other words, "local" or "remote" is not a property of a filesystem.
Various storage products exist that try to solve the problem of data mobility, that is, moving data quickly to a desired destination (usually by "pretending to move" it, but in such a way that the receiving end can start working as soon as possible).
For an open-source example, see DRBD. There are also proprietary products designed to do the same.
Local or remote is a property of a filesystem. It has conventionally been understood as whether the block device or file service is attached directly to the host bus, versus more indirectly through a NIC or other networking technology. Of course, this idea breaks down pretty quickly; many servers used SANs ("storage area networks"), which gave local-like performance from physically separated storage devices over a fiber-optic network. And as you point out, you can "remote" a block device, since block devices are really just an abstraction.
I don't see what your point is; many applications support multiple storage backends, which is what I was referring to. The performance issues I was discussing compared applications that use the host system's VFS layer with those that use the S3 API layer.
You compared "local filesystem" to the performance of S3. But you have no idea what you are comparing to what. Both are undefined, because neither you nor anyone reading what you wrote can know what you are measuring.
Like I said, there's no such thing as a "local filesystem". You invented this term, or repeated it after someone who invented it on the spot. Neither you nor they had a coherent explanation of what it means, so now nobody can understand what it is you are trying to say.
In essence, you are counting angels on the head of a pin.
Also, this is not about block devices. Filesystems are programs, and a lot of them are distributed programs: they run on many computers at once. I'll repeat the example I gave earlier with NFS: it's a distributed system with a server and a client, and both of them are the filesystem. You cannot say that it runs "locally" or "remotely" because it's both, or neither, or whichever one you choose; i.e., it's a worthless definition.
I don't know where you are coming from; everything I'm saying is entirely consistent with how the industry talks about storage (I work in IT, and "local filesystem" and "remote filesystem" are terms we all use, including with our storage vendors).
Here's an example paper comparing filesystems and their performance for precisely this kind of problem:
https://www.nature.com/articles/s41592-021-01326-w figure 1A, B
I work in this field and the figure text makes perfect sense to me and all my coworkers.