
Rust DataBase Connectivity (RDBC) - andygrove
https://andygrove.io/2020/01/rust-database-connectivity-rdbc/
======
CodesInChaos
1\. Can `Box<dyn RowAccessor>` even work like that, considering it's not
object safe? My understanding is that adding the `Self: Sized` bound to `get`
makes it compile, but also means that this method won't be available in a
dynamic context, so the accessor you get from the rowset is completely
useless.
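To illustrate point 1, here is a minimal sketch (a hypothetical trait, not RDBC's actual definitions) of why a method with its own type parameter forces the `Self: Sized` bound, and what that bound costs when going through `dyn`:

```rust
// Hypothetical trait, for illustration only. A method with its own type
// parameter has no single vtable slot, so without the `Self: Sized` bound
// the trait cannot be used as a `dyn` object at all.
trait RowAccessor {
    fn get<T: Default>(&self, index: usize) -> T
    where
        Self: Sized; // restores object safety, but excludes `get` from `dyn`
}

struct DummyRow;

impl RowAccessor for DummyRow {
    fn get<T: Default>(&self, _index: usize) -> T {
        T::default()
    }
}

fn main() {
    let concrete = DummyRow;
    let x: i32 = concrete.get(0); // fine on a concrete type

    let _boxed: Box<dyn RowAccessor> = Box::new(DummyRow); // allowed...
    // ...but this would not compile: `get` requires `Self: Sized`, which
    // `dyn RowAccessor` is not -- the "completely useless" case above.
    // let y: i32 = _boxed.get(0);
    assert_eq!(x, 0);
}
```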

2\. Why do the accessors return results? IO errors should already happen when
loading into the rowset. So at that point the only error the column accessor
could run into is an out of bounds index, which is a panic worthy programming
error.

3\. Why do the accessors return options? Shouldn't that be absorbed into the
generic `T` and only be used for nullable columns?

4\. Why return owned accessors (boxes) instead of references?

5\. Columnar data storage without any low-level access to the in-memory
representation seems rather pointless from a performance point of view. You
can't access it with SIMD instructions, and you incur an indirect-call
overhead for each accessed element. Do you expect people to downcast the
column accessor to a specific type for high-performance access?

IMO abstracting over storage formats via dynamic dispatch is the wrong
approach. The proper way is to make all types take a generic parameter for
the format.
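A rough sketch of that alternative (all names hypothetical): make the consuming code generic over a `Format` parameter, so accessor calls monomorphize and can be inlined and auto-vectorized instead of going through a vtable per element.

```rust
// Hypothetical types, for illustration: abstracting over the storage format
// with a generic parameter instead of `dyn` dispatch.
trait Format {
    fn get_f32(&self, row: usize, col: usize) -> f32;
}

// Columnar storage: one contiguous Vec per column.
struct Columnar {
    columns: Vec<Vec<f32>>,
}

impl Format for Columnar {
    #[inline]
    fn get_f32(&self, row: usize, col: usize) -> f32 {
        self.columns[col][row]
    }
}

// Generic over the format: calls are monomorphized, so the optimizer can
// inline them and vectorize the loop -- no indirect call per element.
fn sum_column<F: Format>(data: &F, col: usize, rows: usize) -> f32 {
    (0..rows).map(|r| data.get_f32(r, col)).sum()
}

fn main() {
    let data = Columnar {
        columns: vec![vec![1.0, 2.0, 3.0]],
    };
    assert_eq!(sum_column(&data, 0, 3), 6.0);
}
```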

~~~
twic
What you write sounds extremely sensible to me! Thoughts:

2\. Loading the rows might be lazy, using a cursor etc., so you could encounter
IO errors while traversing the rowset. You really, really want lazy rowsets
for iterating over enormous results without having to materialise the whole
lot in memory.

5\. Maybe as well as element-by-element access, there should be some sort of
bulk access to get a buffer full of elements in one go.
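A bulk API along those lines might look something like this (a hypothetical trait, not part of RDBC): expose the column's backing buffer as a slice alongside the per-element getter, so callers pay one accessor call per column rather than one per element.

```rust
// Hypothetical trait, for illustration: per-element access plus a bulk
// accessor that hands back the column's whole backing buffer at once.
trait F32ColumnAccessor {
    fn get(&self, index: usize) -> Option<f32>;
    /// Bulk access: the whole column as one contiguous slice.
    fn as_slice(&self) -> &[f32];
}

struct VecColumn(Vec<f32>);

impl F32ColumnAccessor for VecColumn {
    fn get(&self, index: usize) -> Option<f32> {
        self.0.get(index).copied()
    }
    fn as_slice(&self) -> &[f32] {
        &self.0
    }
}

fn main() {
    let col = VecColumn(vec![1.0, 2.0, 3.0]);
    // One call to fetch the buffer, then plain slice iteration that the
    // optimizer can vectorize.
    let total: f32 = col.as_slice().iter().sum();
    assert_eq!(total, 6.0);
    assert_eq!(col.get(1), Some(2.0));
}
```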

A. I'd like to see some way to use a ColumnAccessor, or some other thing, to
access elements in a row. I would like to write code like:

    
    
        use std::io::Result as IoResult;
        
        struct Column<T> {
            column_type: std::marker::PhantomData<T>,
        }
        
        struct Row {}
        
        impl Row {
            fn get<T>(&self, column: &Column<T>) -> T { unimplemented!(); }
        }
        
        struct RowSet {}
        
        impl RowSet {
            fn get_column<T>(&self, name: &str) -> IoResult<&Column<T>> { unimplemented!(); }
        
            fn get_row(&self, index: u64) -> IoResult<&Row> { unimplemented!(); }
        }
        
        pub fn main() -> IoResult<()> {
            let rows = RowSet {};
            let weight_column = rows.get_column::<f32>("weight")?;
            let row = rows.get_row(23)?;
            let weight = row.get(weight_column);
            Ok(())
        }
    

The point being that I can do the lookup of the column once, ensuring that it
exists and has the type I expect, and then safely extract column values from
rows later on.

~~~
CodesInChaos
2\. I understood the `RowSet` as representing a batch, not the whole data set.

A. I agree that the column should only be requested once, similar to what you
propose. Though I'd move the lifetime into the `Row` type instead of returning
a reference. Unfortunately you don't totally escape the runtime check, since
you still need to check if the column and row come from the same RowSet.
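What "moving the lifetime into the `Row` type" might look like, as a hypothetical sketch (none of these are RDBC's actual types): `get_row` returns a small `Row<'_>` value that borrows the `RowSet`, and the same-owner check is done at access time by comparing `RowSet` addresses.

```rust
// Hypothetical sketch, not RDBC's API: the lifetime lives in `Row` itself,
// so `get_row` returns a small value rather than a `&Row` reference.
struct RowSet {
    columns: Vec<Vec<f32>>, // simplified: every column holds f32s
}

struct ColumnId<'a> {
    owner: &'a RowSet,
    index: usize,
}

struct Row<'a> {
    owner: &'a RowSet,
    index: usize,
}

impl RowSet {
    fn get_column(&self, _name: &str) -> ColumnId<'_> {
        // Real code would look the name up; hard-coded for the sketch.
        ColumnId { owner: self, index: 0 }
    }
    fn get_row(&self, index: usize) -> Row<'_> {
        Row { owner: self, index }
    }
}

impl<'a> Row<'a> {
    fn get(&self, col: &ColumnId<'a>) -> f32 {
        // The remaining runtime check: a programmer error, so panic
        // (via assert) rather than returning a Result.
        assert!(
            std::ptr::eq(self.owner, col.owner),
            "column comes from a different RowSet"
        );
        self.owner.columns[col.index][self.index]
    }
}

fn main() {
    let rows = RowSet {
        columns: vec![vec![70.5, 80.0]],
    };
    let weight_col = rows.get_column("weight");
    let row = rows.get_row(1);
    assert_eq!(row.get(&weight_col), 80.0);
}
```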

~~~
twic
> I'd move the lifetime into the `Row` type instead of returning a reference

I'm afraid I don't understand; which lifetime, and which returned reference?

> Unfortunately you don't totally escape the runtime check, since you still
> need to check if the column and row come from the same RowSet.

True, but at that point it's a programmer error, so at least you can panic
rather than returning a Result!

------
peatmoss
I appreciate that the author here takes pains to review existing standards and
implementations like ODBC/JDBC, and also reviews newer ideas like columnar
stores and projects like Apache Arrow. It inspires confidence when I see
engineers do some degree of review before swashbuckling their way into new
code. It’s like the literature review in graduate school: first read, then
code.

~~~
xxxtentachyon
Andy Grove is a PMC member on Apache Arrow, and a great columnar store
community member. It was probably less of a literature review for him than a
reflection on past experience.

~~~
peatmoss
I didn’t have that context—thank you. Still, I guess it’s an endorsement when
systematic review of prior art can be mutually confused with experience :-)

------
pimeys
There are also other projects working on solving similar problems.

One is sqlx[0], an asynchronous crate connecting to MySQL and PostgreSQL that
validates queries at compile time and is built as a new ground-up
implementation using async/await and async-std.

And our project is quaint[1], which builds on top of existing tokio-based
database crates, giving a unified interface and a query builder.

[0] [https://crates.io/crates/sqlx](https://crates.io/crates/sqlx) [1]
[https://crates.io/crates/quaint/](https://crates.io/crates/quaint/)

------
cwp
I've never understood the appeal of this sort of thing. The idea is to create
a common interface to a bunch of different databases. But you still have to
write or generate SQL that is specific to whatever database you're actually
using. Outside of a very narrow range of apps (e.g. SQL-IDE kinds of tools),
you're going to be both coupled to a specific database and limited by lowest-
common-denominator design constraints in the interface library. Ugh.

~~~
pjmlp
Having to fine-tune SQL, or change stored procedure call syntax, is a tiny
change compared to rewriting the complete database binding code from scratch.

Plus, for many kinds of applications, the ANSI SQL coverage across RDBMSes is
quite OK.

------
ragerino
I think the initiative is great. I did lots of Java database development.
Starting with JDBC, circumventing J2EE with Spring and Hibernate, and ending
up doing JPA with Spring. I haven't seen anything similar in Rust.

Therefore I recommend looking at JPA for object-relational mapping, and Spring
Framework's JdbcTemplate for things like basic CRUD support and JPA
abstraction.

~~~
sverhagen
These tools are my daily ones too, but they're abstractions on top of JDBC,
and as such they fill a different gap than the one the author is trying to fill.

------
eb0la
The work done by this guy is incredible. Apache Arrow committer (DataFusion),
wrote a SQL crate, Ballista (think Spark + k8s - Java + Rust), and now this.

I feel too much envy.

------
thayne
> RDBC is specifically for the use case where a developer needs the ability to
> execute arbitrary SQL against any database and then be able to fetch the
> results.

So RDBC solves the problem of different connection protocols. But what do you
do about the fact that writing a query that works for every database is
extremely difficult?

~~~
spullara
You don't do anything about that in this context. The queries you write must
target the database you are going to use.

~~~
thayne
Right, I'm just curious what the use case is where you want to be able to
abstract over the connection for a variety of relational databases, but don't
need to abstract over the differing syntax for the queries.

~~~
jonfk
The use case would be the same as the one covered by JDBC and ODBC: creating a
standard interface for connecting to databases at a lower level than the ORM.
This could then be used by an ORM or query builder to communicate with the
database. That's how, for example, I can use Spring JPA, and if I write either
fairly portable SQL or use the CRUD repository methods, I can connect to
multiple different SQL databases by simply providing a different JDBC URL.

The differing syntax of each database would be handled at a higher level, by
the query builder or ORM. The ORM could decide to let users choose for
themselves whether to write portable SQL and fail at runtime if they don't,
or, like Diesel in Rust, enforce the usable features through its types.

In my experience that is something currently missing in the Rust ecosystem.
Each library seems to have its own connection logic, and if I want to write a
program that can run on different databases in Rust right now, I have to write
different code to connect to each database I'd like to support.

Diesel has been serving me really well, but I currently need to choose a
specific backend when building a query, and implementing the dynamic style of
connection described above would be quite cumbersome. It would be nice if I
could choose a standard SQL backend that exposes only the subset of features
supported by most SQL databases and build queries against that. Hopefully the
RDBC project can help with something like that.

------
ww520
This is great. Using traits as the interface is clever. That means as long as
a DB-specific driver implements the traits, it will work.

~~~
snuxoll
That’s the point, just like JDBC, ODBC, DB-API, ADO.Net, etc.

------
dingribanda
Efficient access is more than the interface on the client. The wire format
matters. If the server sends the data in the row form from a columnar stored
table in row form, it wont be efficient, there will be too many
transformations. Perhaps a way for the client to tell the server what format
it wants the data may be useful.

------
pylua
Why not use macros to map the result back? I understand that that is not
consistent with JDBC and may be crossing into ORM territory, but it does seem
like a clean way to handle it in Rust.
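A sketch of the macro idea, with the expansion written out by hand (hypothetical `Row` and `FromRow` types, not RDBC's API): a `#[derive(FromRow)]` macro could generate an impl like the one below for each struct.

```rust
// Hypothetical sketch of macro-based result mapping. The Row type and
// FromRow trait are stand-ins; a derive macro would generate the impl.
use std::collections::HashMap;

struct Row {
    values: HashMap<String, String>, // simplified: everything stored as text
}

trait FromRow: Sized {
    fn from_row(row: &Row) -> Result<Self, String>;
}

struct User {
    id: i64,
    name: String,
}

// What `#[derive(FromRow)]` might expand to for User:
impl FromRow for User {
    fn from_row(row: &Row) -> Result<Self, String> {
        Ok(User {
            id: row
                .values
                .get("id")
                .ok_or("missing column: id")?
                .parse()
                .map_err(|e| format!("bad id: {e}"))?,
            name: row
                .values
                .get("name")
                .cloned()
                .ok_or("missing column: name")?,
        })
    }
}

fn main() {
    let mut values = HashMap::new();
    values.insert("id".to_string(), "7".to_string());
    values.insert("name".to_string(), "ada".to_string());
    let user = User::from_row(&Row { values }).unwrap();
    assert_eq!(user.id, 7);
    assert_eq!(user.name, "ada");
}
```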

