westonpace's comments

westonpace · 2024-04-30T15:35:28

Lance dev here. We are working on a new version of our format[1] as well. We are watching Nimble too. If they are interested in solving our use cases then that is less work for us.

At the moment it is not clear that is the case. However, it is too early to tell. Our biggest concerns are:

- Good integration with object storage

- Ability to write multi-modal data without exhausting memory

- Support for fast point-lookups (with the option of cranking up the amount of metadata for richer lookup structures that will be cached in RAM)

Neither Nimble nor Lance is intended to replace Parquet/Arrow. Parquet and Arrow are designed to be spread throughout a solution as a universal interchange format. E.g. you will often see them throughout ETL pipelines so that different components can transfer data (even if it isn't a lot of data). With Arrow and Parquet, interoperability is a higher priority than performance (though these formats are fast as well). They are developed slowly, via consensus, as they should be.

Nimble and Lance are designed for "search nodes" / "scan nodes", which are meant to sit in front of a large stockpile of data and access it efficiently. There are typically only a few such components (usually just a single one) in a solution (e.g. the database). Performance is the primary goal (though we do attempt to document things clearly should others wish to learn from or build upon them). I'd advise anyone building a search node or scan node to make the file format a configurable choice hidden behind some kind of interface.
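
To make that last piece of advice concrete, here is a minimal sketch of what "hidden behind some kind of interface" could look like. This is only an illustration, not how Lance or Nimble are structured: `TableFormat`, `ScanNode`, and `open_format` are hypothetical names, and JSON Lines and CSV are stand-ins for real columnar formats so the example runs on the standard library alone.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class TableFormat(ABC):
    """Hypothetical interface: the only surface the scan node depends on."""

    @abstractmethod
    def write(self, path: Path, rows: list[dict]) -> None: ...

    @abstractmethod
    def scan(self, path: Path, columns: list[str]) -> Iterator[dict]: ...


class JsonLinesFormat(TableFormat):
    """Stand-in format #1: one JSON object per line."""

    def write(self, path, rows):
        with path.open("w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")

    def scan(self, path, columns):
        with path.open() as f:
            for line in f:
                row = json.loads(line)
                yield {c: row[c] for c in columns}


class CsvFormat(TableFormat):
    """Stand-in format #2: CSV with a header row (all values read back as str)."""

    def write(self, path, rows):
        # Assumes at least one row so the header can be derived from it.
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def scan(self, path, columns):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                yield {c: row[c] for c in columns}


# The format is chosen by a config string, not hard-coded in the scan node.
FORMATS = {"jsonl": JsonLinesFormat, "csv": CsvFormat}


def open_format(name: str) -> TableFormat:
    return FORMATS[name]()


class ScanNode:
    """Sits in front of the data; never imports a concrete format directly."""

    def __init__(self, format_name: str):
        self.fmt = open_format(format_name)

    def project(self, path: Path, columns: list[str]) -> list[dict]:
        return list(self.fmt.scan(path, columns))
```

With this shape, swapping formats is a config change (`ScanNode("csv")` vs. `ScanNode("jsonl")`), and a faster format can be added later by registering one more class in `FORMATS` without touching the scan node itself.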

[1] https://blog.lancedb.com/lance-v2/