Welcome to Apache OpenDAL

fishpen0 · 2023-12-22T12:32:38 1703248358

It’s a shame they have committed to never supporting BigQuery and other cloud provider services. We really need something like this at my org, but there’s no compelling reason to move all of analytics off of BQ, nor is there a good reason to add yet another copy layer between the apps and the analytics tools.

We literally have an entire set of teams building a DAL internally, and it’s basically a forever project, while existing app engineers just keep doubling down on table federation and other stuff that breaks lineage and ownership models.

sanderjd · 2023-12-22T18:19:33 1703269173

I don't really grok how the API of this - which appears to be very file-oriented - would be adapted to querying a SQL interface like BigQuery. How would that look?

erickj · 2023-12-22T12:42:52 1703248972

OOC what's the scale of data storage that you're using with bq?

janejeon · 2023-12-22T07:51:22 1703231482

This is really interesting. For a project I was building I figured out that having a sort of "universal access layer" for all types of storage was a requirement, and this looks like it's right up my alley.

xuanwo · 2023-12-22T08:14:52 1703232892

I'm an OpenDAL committer. Thank you for your interest! I'm happy to assist with integrating OpenDAL into your project.

contravariant · 2023-12-22T10:48:15 1703242095

Just because the author of this piece seems to be looking at this post, some minor nitpicks regarding spelling/phrasing:

> GCS has native JSON API which more powerful

Missing a verb.

> OpenDAL needs to implement features in zero cost way which means:

"in a zero cost way" is probably better.

> What OpenDAL does?

What does OpenDAL do?

> Free to zero cost

This is a tricky one, right now it sounds a bit weird because you're turning 'to zero cost' into a verb, suggesting users would have to 'zero cost' things themselves. The problem is that 'free' and 'at zero cost' mean the same thing, so it's hard to use both in the same phrase. Free of cost could work.

xuanwo · 2023-12-22T11:19:30 1703243970

Thanks a lot! I will fix those spelling issues ASAP.

xuanwo · 2023-12-22T14:00:52 1703253652

Addressed in https://github.com/apache/incubator-opendal/pull/3805

verdverm · 2023-12-22T11:38:34 1703245114

Did you notice the "edit this page" link at the bottom?

I always put those buttons on my sites and get a lot of minor edits like this on GitHub.

I'm curious in the user research sense.

contravariant · 2023-12-22T13:07:04 1703250424

I hadn't noticed it actually, though it's also a bit inconvenient for me to write a git commit at the moment.

Hope this helps.

verdverm · 2023-12-22T13:38:10 1703252290

Yea, very normal response to dropping someone into an edit page on GitHub. That option works really well for typos

I put three links on my site pages

1. edit page

2. Open issue for page (autofill some details like page name in the subject)

3. Open issue for project

Sounds like you did my option 2 here.

elif · 2023-12-22T13:56:55 1703253415

Free and zero cost do not mean the same thing in software

AdrianoKF · 2023-12-22T11:56:22 1703246182

Does this intend to fill a similar spot in the Rust ecosystem as fsspec (https://filesystem-spec.readthedocs.io/en/latest/) does for Python, or am I getting the wrong idea?

cjalmeida · 2023-12-22T17:13:30 1703265210

I was thinking about this as well. Arrow/PyArrow integration is very useful.

lijok · 2023-12-22T18:41:45 1703270505

Looks great. Very impressed with the vision (although language could be improved), and the "won't implement" decisions made.

I would provide a bit of challenge on tenet 2 however. Supporting "storage_class" for S3 is a compromise that clearly has to be made, and yet, it appears you're not realizing you'll have to make other compromises like it in the future. I would suggest a storage-specific configuration class for each storage backend, and then you won't need to make these arbitrary concessions. The power of OpenDAL will be its standardized data API, not its simplified configuration.
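To make the suggestion concrete, here is a minimal Python sketch of per-backend configuration classes. Every name in it (`S3Config`, `FsConfig`, `describe`) is hypothetical and chosen for illustration; none of it is OpenDAL's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class S3Config:
    """Hypothetical S3-only settings; storage_class lives here, not in the shared data API."""
    bucket: str
    region: str = "us-east-1"
    storage_class: Optional[str] = None  # e.g. "STANDARD_IA"


@dataclass(frozen=True)
class FsConfig:
    """Hypothetical local-filesystem settings."""
    root: str


BackendConfig = Union[S3Config, FsConfig]


def describe(config: BackendConfig) -> str:
    """Backend-specific knobs stay inside the config object they belong to."""
    if isinstance(config, S3Config):
        suffix = f" ({config.storage_class})" if config.storage_class else ""
        return f"s3://{config.bucket}{suffix}"
    return f"fs://{config.root}"
```

With this shape, adding another backend-only option means adding a field to one config class, while the read/write interface stays untouched.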

I'm also not convinced that the project should implement OpenDAL Gateway. I cannot see how it will provide anyone any value other than making things more confusing.

sbt567 · 2023-12-22T06:07:47 1703225267

Very interesting. Is this a kind of "programmable rclone"?

tison · 2023-12-22T09:26:11 1703237171

Sounds likely.

The core part of OpenDAL is a Rust crate that provides fs-like APIs over different storage backends, but we are also investigating other interfaces such as a CLI. We have an experimental binary named `oli`[1].

You're welcome to start a discussion[2] to share how you use rclone, and we may find it fits within OpenDAL's scope :D

[1] https://github.com/apache/incubator-opendal/tree/main/bin/ol...

[2] https://github.com/apache/incubator-opendal/discussions

calvinmorrison · 2023-12-22T10:01:49 1703239309

Sounds like a job for 9p

prabir · 2023-12-22T09:17:12 1703236632

Was surprised that Rust didn’t have VFS libraries. Created my own async-vfs crate, but I'm now using OpenDAL for a Nextcloud alternative that I have been working on in Rust.

xuanwo · 2023-12-22T09:30:56 1703237456

Thanks for using OpenDAL!

NuSkooler · 2023-12-22T21:39:54 1703281194

This looks very promising. I love that Apache went straight to Rust vs the usual Java or similar.

elsadek · 2023-12-22T09:14:00 1703236440

This reminds me of Mule ESB when it was in its beginning.

latchkey · 2023-12-22T18:07:37 1703268457

How it ended: In 2018, Mulesoft was acquired by Salesforce for $6.5 billion in a cash-and-stock deal.

gavinray · 2023-12-22T16:27:46 1703262466

This is essentially Hasura without a networked API or the ability to make queries that do cross-datasource joins

j-a-a-p · 2023-12-22T10:27:57 1703240877

Slightly related: https://xkcd.com/927/

andrewstuart · 2023-12-22T06:36:51 1703227011

A programmer friend and I had a running joke that it was impossible to work out what any given Apache project actually does.

crancher · 2023-12-22T07:02:28 1703228548

"OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way."

Generally agree, but this one seems clear enough.

TeMPOraL · 2023-12-22T10:45:39 1703241939

The quote by itself is so generic it's useless.

That's my peeve with marketing advice these days. Describing what the product will do for the user in emotional/vague terms carries zero information relevant to evaluating the product and making a use/purchase decision. It's either treating the customer as a generic unsophisticated idiot who can't understand what the product actually does and just needs to be told it'll make them happy, or it's a pure manipulative play. Either way, this is not the style OSS projects should pick up.

bryanrasmussen · 2023-12-22T10:57:14 1703242634

>The quote by itself is so generic it's useless.

To me it means: you will be able to write code that saves and retrieves data from various online services that offer data storage, without having to know the particulars of each of those services' APIs.

Is this what it does? If so, I wouldn't call that useless information; at least it is useful enough for me to decide whether I should read on.

Kinrany · 2023-12-22T14:21:53 1703254913

It is accurate but not concise. The explanation should start with the purpose of the project described in about one sentence, then elaborate by adding context that may or may not be known to the reader.

xuanwo · 2023-12-22T14:26:16 1703255176

Could you offer some guidance? This slogan is the clearest explanation I've come across. I'm open to better suggestions! Thanks in advance.

Kinrany · 2023-12-22T16:50:33 1703263833

I linked the overview page because the image on it explains the project better than any description I could find.

One of the issues I see is mixing "what" and "why".

The "what": an abstraction for accessing object storage services, implemented as client libraries in multiple languages plus utilities built on top.

The "why", the problem to be solved: there are many services that provide key-value storage with values being possibly very large; it is reasonable to want to write code that works with any of these services but there is no common interface to write this code against.
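As a rough illustration of that "why", the Python sketch below writes one function against a shared interface and runs it across two different stores. The names (`ObjectStore`, `MemoryStore`, `DirectoryStore`, `copy_object`) are hypothetical stand-ins, not OpenDAL's API, and the two stores only imitate real services.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical common interface: keys map to (possibly large) byte values."""

    def read(self, key: str) -> bytes: ...

    def write(self, key: str, data: bytes) -> None: ...


class MemoryStore:
    """Stand-in for one storage service; keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        return self._objects[key]

    def write(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)


class DirectoryStore:
    """Stand-in for a second service; keeps objects as files under a root."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def read(self, key: str) -> bytes:
        return (self._root / key).read_bytes()

    def write(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


def copy_object(src: ObjectStore, dst: ObjectStore, key: str) -> None:
    """Written once against the interface; works for any pair of stores."""
    dst.write(key, src.read(key))


def demo() -> bytes:
    """Copy one object from the in-memory store to the directory store."""
    src = MemoryStore()
    src.write("reports/2023.csv", b"id,total\n1,42\n")
    with tempfile.TemporaryDirectory() as root:
        dst = DirectoryStore(root)
        copy_object(src, dst, "reports/2023.csv")
        return dst.read("reports/2023.csv")
```

`copy_object` never mentions either concrete store, which is the property a common interface is meant to provide.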

Caveat, I may very well be misunderstanding the project.

Questions I have:

- how does it compare to object_store

- what is the philosophy of the project; in what way is it opinionated about the design of object storage services

fatherzine · 2023-12-22T17:05:19 1703264719

the slogan is generic enough to cover anything from curl to jdbc. be more specific.

sanderjd · 2023-12-22T18:22:28 1703269348

I think the quote seems to describe the project well. It sounds generic because it does a generic thing.

cybrox · 2023-12-22T10:49:17 1703242157

It does seem clear enough but I don't think the marketingy vague description does its actual functionality justice here.

speedgoose · 2023-12-22T07:21:43 1703229703

I agree it took me some time to understand what it does.

It seems to be a Rust library/crate allowing filesystem-like operations on many storage systems, as long as you can do basic key/value things. From your filesystem, to Redis, S3, SQLite, etcd…

They also provide Python and NodeJS bindings.

The operations are apparently (not supported by all storage systems):

stat, read, write, create_dir, delete, copy, rename, list, scan, presign, blocking
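One way to picture "not supported by all storage systems" is a backend that advertises which operations it can perform. The Python sketch below is a toy under that assumption; `Capability`, `KeyValueBackend`, and `UnsupportedOperation` are hypothetical names and do not reflect how OpenDAL itself reports capabilities.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Capability:
    """Hypothetical flags describing which operations a backend offers."""
    can_list: bool = False
    can_rename: bool = False


class UnsupportedOperation(Exception):
    """Raised when a backend is asked for an operation it does not offer."""


class KeyValueBackend:
    """Toy key/value backend: supports listing by prefix, but not rename."""

    capability = Capability(can_list=True, can_rename=False)

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._data[path] = bytes(data)

    def read(self, path: str) -> bytes:
        return self._data[path]

    def stat(self, path: str) -> int:
        # Only the content length is reported in this sketch.
        return len(self._data[path])

    def delete(self, path: str) -> None:
        self._data.pop(path, None)

    def list(self, prefix: str) -> List[str]:
        if not self.capability.can_list:
            raise UnsupportedOperation("list")
        return sorted(key for key in self._data if key.startswith(prefix))

    def rename(self, src: str, dst: str) -> None:
        if not self.capability.can_rename:
            raise UnsupportedOperation("rename")
        self._data[dst] = self._data.pop(src)
```

A caller can inspect `backend.capability` before choosing a strategy, for example falling back to a read, write, and delete sequence when rename is unavailable.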

tison · 2023-12-22T08:26:23 1703233583

It's hard for me too, even working on OpenDAL.

But now my picture is that you can use it as either:

1. A drop-in replacement for the S3 SDK (AWS's Rust SDK is ...);

2. A quick way to let your users configure whichever cloud storage they have (freeing you from supporting multiple cloud OSS backends with different SDKs).

A growing number of DB projects use OpenDAL in the second way, like Databend, GreptimeDB, QuestDB, RisingWave, etc.
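The second usage can be pictured as picking a backend from user-supplied configuration. The Python sketch below assumes a scheme string plus an options map; `build_backend`, `MemoryBackend`, and `FsBackend` are hypothetical names for illustration and are not OpenDAL's API.

```python
from pathlib import Path
from typing import Callable, Dict


class MemoryBackend:
    """Toy backend that keeps objects in a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._data[path] = bytes(data)

    def read(self, path: str) -> bytes:
        return self._data[path]


class FsBackend:
    """Toy backend that keeps objects as files under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def write(self, path: str, data: bytes) -> None:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read(self, path: str) -> bytes:
        return (self.root / path).read_bytes()


# Hypothetical registry: the application code below never names a concrete backend.
_BUILDERS: Dict[str, Callable[..., object]] = {
    "memory": MemoryBackend,
    "fs": FsBackend,
}


def build_backend(scheme: str, options: Dict[str, str]):
    """Construct a backend from a user-supplied scheme and options map."""
    try:
        builder = _BUILDERS[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme}") from None
    return builder(**options)
```

The application only calls `build_backend(scheme, options)` with values read from its own config file, so supporting another storage service means registering one more builder.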

wey-gu · 2023-12-22T09:43:46 1703238226

So do projects like Mozilla's sccache.

- https://github.com/mozilla/sccache

asah · 2023-12-22T13:19:31 1703251171

[flagged]

jpitz · 2023-12-22T16:54:54 1703264094

Looking at the kind of APIs they are supporting, this appears to be aiming at filesystem-like behavior. If I was building such a thing, I would leave SQL out of scope as well.