
RediSQL – A Redis module that provides a functional SQL database - geospeck
https://github.com/RedBeardLab/rediSQL
======
antirez
I think that what the module does is actually not clear here in the comments
I'm reading. It's mostly a nice "Hello World" module that shows how Redis can
act as a proxy, but it's just a small glue-module that calls the SQLite
library with the string passed as query. So you are running queries on a local
sqlite server, proxied by Redis.
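In essence, such a glue layer does little more than the following (a minimal illustrative sketch, not the module's actual code, with Python's sqlite3 standing in for the embedded C library):

```python
import sqlite3

# In-memory SQLite database standing in for the module's embedded instance.
conn = sqlite3.connect(":memory:")

def exec_query(query: str):
    """Pass the raw query string straight to SQLite and return all rows,
    roughly what a thin Redis-to-SQLite glue command does."""
    cur = conn.execute(query)
    rows = cur.fetchall()
    conn.commit()
    return rows

exec_query("CREATE TABLE t (a INT, b TEXT)")
exec_query("INSERT INTO t VALUES (1, 'one')")
print(exec_query("SELECT * FROM t"))  # [(1, 'one')]
```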

While this can be a good programming example for newcomers, I suggest looking
at modules that are really implementing new DB paradigms on top of Redis. For
instance this module builds a graph DB on top of Redis data types, with a
query language and so forth:

[https://github.com/swilly22/redis-module-graph](https://github.com/swilly22/redis-module-graph)

~~~
arielm
I completely agree. As someone who uses both Redis and SQL heavily, I
understand both have their use cases and wouldn't want to merge the two to
begin with.

On top of that, this looks like a simple proxy, so it's only making things
less efficient.

OP, I'd suggest explaining the benefits better. Maybe I missed it, but I don't
really get this at all.

~~~
siscia
The module was born to fulfill one of our needs: we had several micro-services
using Redis and no DB at all, but then we found ourselves needing SQL
capabilities; Redis modules were being announced, so we gave them a shot.

What do you mean by efficiency? Operations per second? Sure, it is less
efficient than Postgres or than SQLite embedded in your application.

However, if you mean convenience and simplicity, this solution may (arguably)
be efficient enough.

This module sits between a "full" client-server DB, a Redis instance (which
doesn't provide SQL capabilities), and SQLite, which is not client-server.

To be completely honest, as HN teaches us, building a great technical product
without showing it as soon as it provides even a tiny amount of value is not
desirable.

The comments in this thread indicate that there is some interest, and I got
some valuable suggestions and feedback on what to develop next.

~~~
antirez
Hello siscia. What's not clear to me is whether this is intended for production
or if it's an example. I understand the general problem of providing SQLite
with a network interface/API, and I guess, but I don't know, that there are
many solutions for that. If you are trying to solve the problem of SQLite not
having a network interface by default by allowing access from Redis via the
Redis protocol, then the implementation is not optimal AFAIK, because it does
not use any threading but stops Redis while executing the queries in SQLite.
Redis modules allow for non-blocking commands that do the bulk of the work in
threads, so I would suggest at least that approach if this is what you are
doing. IMHO there should also be a more sensible conversion of data types from
the SQL reply to the Redis protocol.

If Redis has to be stopped during SQL queries IMHO it's better to use another
network layer with SQLite.

~~~
siscia
Ciao Salvatore,

I claim that the module is alpha code.

There is a non-blocking implementation on the way. The problem I am facing is
the tradeoff between using a single thread or multiple threads, but I came to
the conclusion that it depends on the use case. (On writes SQLite locks the
whole database, not a row at a time, so I expect a single lockless thread to
be faster on write-intensive workloads and multiple threads to be faster on
read-intensive ones.)

What do you suggest for the conversion of data types from the SQL reply to
Redis? Right now a SELECT returns a nested array: the outer array for the
rows, the inner ones for the columns.
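The nested-array shape can be sketched like this (a hypothetical Python illustration using sqlite3; the optional column-name header is just an idea, not something the module provides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('ada', 'ada@example.com')")

def select_as_nested_array(query, with_header=False):
    """Return rows as a nested array: outer list = rows, inner = columns.
    Optionally prepend the column names as the first row."""
    cur = conn.execute(query)
    rows = [list(r) for r in cur.fetchall()]
    if with_header:
        # cursor.description holds one 7-tuple per column; [0] is the name.
        rows.insert(0, [d[0] for d in cur.description])
    return rows

print(select_as_nested_array("SELECT * FROM users", with_header=True))
# [['name', 'email'], ['ada', 'ada@example.com']]
```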

Thanks

~~~
antirez
Hey, glad to know you are going multi-threaded on that, since I believe it
will be very easy. To start you could just use a thread per request (not just
to start, probably even in the long run): immediately block the client via the
module API, fire the thread, wait for the thread to complete the work and
accumulate the reply, and resurrect the blocked client, feeding it the reply.
AFAIK this can work very well.
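The pattern can be sketched outside the module API, with a plain Python thread and a queue standing in for the Redis blocked-client calls (purely illustrative, not the module's code):

```python
import sqlite3
import threading
import queue

def run_query_in_thread(db_path: str, query: str):
    """Block the caller, run the query in a worker thread, then hand the
    accumulated reply back: the shape of the blocked-client pattern,
    minus the actual Redis module API."""
    reply = queue.Queue(maxsize=1)

    def worker():
        # Each worker opens its own connection: sqlite3 connections are
        # not meant to be shared across threads.
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(query).fetchall()
            reply.put(("ok", rows))
        except sqlite3.Error as e:
            reply.put(("err", str(e)))
        finally:
            conn.close()

    threading.Thread(target=worker).start()
    return reply.get()  # the "unblock": feed the reply to the client

status, rows = run_query_in_thread(":memory:", "SELECT 1 + 1")
print(status, rows)  # ok [(2,)]
```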

About the reply, returning a nested array as you are doing may actually be
enough after all, even if the reply needs to be parsed according to the exact
query. The alternative would be to annotate the reply with field names and
things like that.

Another thing that could probably be improved: provide direct support (by
passing multiple arguments) for quoting strings passed to SQL, in order to
give the user an easy path to avoid security issues.
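That idea maps naturally onto bound parameters; a sketch of what such a command could look like, using Python's sqlite3 `?` placeholders as a stand-in (the function name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")

def exec_with_args(query: str, *args):
    """Let the caller pass the statement and its arguments separately;
    SQLite binds the values itself, so no manual quoting is needed."""
    cur = conn.execute(query, args)
    conn.commit()
    return cur.fetchall()

# The dangerous input is bound as data, not interpolated into the SQL text.
exec_with_args("INSERT INTO users VALUES (?, ?)",
               "bob'); DROP TABLE users; --", "pw")
print(exec_with_args("SELECT count(*) FROM users"))  # [(1,)]
```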

~~~
siscia
I tried to spawn a new thread for each request and it was embarrassingly slow;
maybe I was doing something wrong on my side, but I believe I will settle for
a pool.

About the reply, the only thing I see as manageable is to include the names of
the columns as the first row; I was considering the idea but I am not
completely sold yet.

I was thinking of binding a particular statement to a key, in order to make it
possible to share standard queries between different clients:

    
    
      REDISQL.CREATE_STATEMENT insert_user "INSERT INTO users VALUES($1, $2, $3)" 
      > OK
      REDISQL.EXEC_STATEMENT insert_user name, password, email
      > OK
    

or with multiple insert in the same transaction

    
    
      REDISQL.CREATE_TRANSACTION
      > "REDISQL.TRANSACTION_12345"
      REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name1 password1 email1
      > OK
      REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name2 password2 email2
      > OK
      REDISQL.EXEC_STATEMENT "REDISQL.TRANSACTION_12345" insert_user name3 password3 email3
      > OK
      REDISQL.COMMIT_TRANSACTION "REDISQL.TRANSACTION_12345"
      > 3 # the number of statements executed

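A rough sketch of the named-statement idea (hypothetical Python, with SQLite's `?` placeholders standing in for `$1, $2, $3`; the transaction part is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT, email TEXT)")

statements = {}  # statement name -> SQL text, shared by all "clients"

def create_statement(name: str, sql: str) -> str:
    """Register a statement under a name, like REDISQL.CREATE_STATEMENT."""
    statements[name] = sql
    return "OK"

def exec_statement(name: str, *args) -> str:
    """Look up the named statement and execute it with bound arguments,
    like REDISQL.EXEC_STATEMENT."""
    conn.execute(statements[name], args)
    conn.commit()
    return "OK"

create_statement("insert_user", "INSERT INTO users VALUES (?, ?, ?)")
exec_statement("insert_user", "ada", "secret", "ada@example.com")
print(conn.execute("SELECT count(*) FROM users").fetchone())  # (1,)
```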
~~~
arielm
Just to throw my two cents into the mix. The idea of being able to _query_ a
single key in a SQL-like fashion is interesting, but I can't think of a
production implementation that would be useful enough.

Instead, I'd look at using Redis in front of something like MySQL, so some
operations go to Redis and others go right to the server.

Having to write SQL statements as redis EXECs isn't elegant IMO.

~~~
siscia
> Just to throw my two cents into the mix.

You have no idea how much those 2 cents are appreciated, really, I mean it.

> The idea of being able to _query_ a single key in a SQL-like fashion is
> interesting

Not sure I understood what you mean here. Would you mind providing some kind
of example, or explaining it further?

> Instead, I'd look at using redis in front of something like MySQL so some
> operations go to redis and other right to the server.

Extremely interesting; how do you discriminate, though? If it is going to be a
"regular" cache you don't need a module, Redis can already do that by itself.
What use case were you thinking of?

> Having to write SQL statements as redis EXECs isn't elegant IMO.

I do completely agree, the API is going to get more structured for sure.

------
dvirsky
I've been working on a secondary indexing module for redis with a subset of
SQL WHERE for selection predicates, if anyone's interested. It's not stable
yet, but it's coming along nicely.

The idea was actually NOT to create an SQL-like database, but just to proxy
ordinary Redis commands via a secondary index. So you select ids and perform
HGET or HGETALL on them, etc. And the same goes for indexing: you perform
something like HMSET "through" the index, and it performs the operation while
indexing the hash automatically.
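That flow might be sketched like this (plain Python dicts standing in for Redis hashes and the index module; entirely illustrative):

```python
# Hashes keyed by id, plus a secondary index on one field.
hashes = {}  # id -> dict of fields (stand-in for Redis hashes)
index = {}   # indexed field value -> set of ids

def hmset_through_index(obj_id, fields, indexed_field="city"):
    """Store the hash and index it automatically, i.e. write
    'through' the index."""
    hashes[obj_id] = fields
    index.setdefault(fields[indexed_field], set()).add(obj_id)

def select_ids(value):
    """Use the index to select ids, then fetch each hash (HGETALL)."""
    return [hashes[i] for i in sorted(index.get(value, set()))]

hmset_through_index("u1", {"name": "ada", "city": "london"})
hmset_through_index("u2", {"name": "bob", "city": "paris"})
print(select_ids("london"))  # [{'name': 'ada', 'city': 'london'}]
```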

Also, indexes can be used as direct data types with no automation, you just
insert ids and tuples of data into them, and you can query them and get ids
back.
[https://github.com/RedisLabsModules/secondary](https://github.com/RedisLabsModules/secondary)

------
EugeneOZ
I tried to do the same (before the great Lua support, using PHP), but overall
performance was similar to MySQL (which is much more reliable than my
experiment), so I abandoned it.

So it would be very interesting to see benchmarks and comparisons with MySQL;
I think right now we have more ways to implement it better.

~~~
EugeneOZ
I'd really be interested to hear the reasons for the downvoting. Do you think
I'm lying, or is there a comparison somewhere?

~~~
nik736
PHP.

~~~
EugeneOZ
PHP wasn't the bottleneck in this case; most of the time was spent on
communication with the DB (as usual in web apps). I think (and hope) this
solution can show better performance because less communication is required,
but numbers are more interesting than assumptions.

Update: the comment about proxying SQLite eliminated this hope.

------
siscia
Author here, if you have any questions please feel free to ask.

Also I have introduced the module in a blog post here:
[http://redbeardlab.tech/2016/12/13/redisql](http://redbeardlab.tech/2016/12/13/redisql)

~~~
pizza234
Which would be a use case, from a technical perspective?

The blog mentions:

    
    
        If you are writing microservices, to avoid bothering the master DB with your ephemeral needs for a structured database
    

and

    
    
        I see this module as perfect for small datasets of unimportant data.
    

In other words, why would one use this module in place of a straight SQLite
database (possibly in-memory, if there is really such a need for speed)? Such
a module doesn't simplify anything (as far as I understand), since SQL is used
anyway.

I would understand this as an experiment (something like "ReDoom"), but since
there will be a paid version, I'm trying to position the product technology.

~~~
edibleEnergy
Well, SQLite has to be accessed over a filesystem. This sounds like it would
proxy to a sqlite db located on the Redis server. Still sounds dicey but I
could see the appeal if you didn't have the DBA skills necessary to set up a
decent PostgreSQL or other SQL server.

~~~
pizza234
Well, although the requirements which would lead to such a setup are clearer
now, it still smells to me.

Under these conditions there is still a false dichotomy between RediSQL on one
side and "complex RDBMSs" on the other.

What about simple RDBMSs like H2? Such a solution would meet the requirements:

    
    
        - network-accessible RDBMS
        - no configuration
    

The only requirement I can think of which would make this solution inadequate
would be "no JVM on the system", which sounds quite far-fetched.

------
partycoder
Redis and NoSQL data stores in general are for merge-on-write, not
merge-on-read. That's why most of them are non-relational (no relations,
foreign keys, or joins).

This project encourages merge-on-read, which is exactly the problem k/v stores
try to solve in another way (merge-on-write).

The problem with relational data stores is that once you have relations you
need consistency, and then you need to move operations into the data store
itself (stored procedures), and transactions, etc... and suddenly you need an
army of certified DBAs and a $1+ million budget on datastores alone.

It's not the right way to go. Feel free to have SQL but adding joins is a bad
idea.

~~~
Jweb_Guru
The problem of consistency is related to supporting multi-key read-write
transactions of any sort, and has nothing to do with support for joins or
relations. You're not escaping from it somehow by using a NoSQL database.

~~~
partycoder
In a relational database you are almost certainly going to maintain data about
the same "entity" distributed across different tables. And that is encouraged
(hence the term "relational databases").

In k/v datastores, e.g. document-oriented ones, you can just put all the data
related to one entity in one record and that's it... and dealing with
multi-key operations is discouraged rather than encouraged.

Why discouraged? Because it's hard to load-balance, it creates the need for
synchronization and locking at a broader level, it adds complexity to
maintenance operations, and so on, so forth.

The SQL thing exists because dBase, one of the first database products, had
it. And then every product that entered the market had to go with it to
diminish friction.

~~~
Jweb_Guru
Okay, so your solution to consistency is "grab a big ol' lock over all
possible related data, and hope like hell that you got all of it and it can
live in one key." That doesn't really work when the amount of related data you
have gets large enough, and it's really easy to have an efficient SQL database
behind the lock (that's essentially how VoltDB works). Behind the lock, you
can still have multiple tables, joins, and everything people associate with
RDBMSes, without any of the consistency issues you mention.

Your discussion about what's "encouraged" or not is frankly irrelevant, as
key/value stores are super easy compared to general relational databases and
most RDBMSes are already quite good key/value stores (by contrast, most
key/value stores make lousy RDBMSes). The simple fact of the matter is: if
your key/value store doesn't support multikey read/write transactions, it has
strictly less functionality than a database that does, regardless of how you
use it, and if it _does_ support multikey read/write transactions it'll have
all the same performance issues you associate with RDBMSes (well, that's a
slight oversimplification because it may provide a more limited API).

------
gtrubetskoy
I did something similar a while ago (2012) (it even made it to the top of HN).

[http://thredis.org/](http://thredis.org/)

Thredis also added multi-threading to Redis. The code is here:
[https://github.com/grisha/thredis](https://github.com/grisha/thredis)

------
stevekemp
It's a cute project; I remember you posted it before. Somebody else had a go
at this a few years back:

[https://groups.google.com/forum/#!topic/redis-db/TLfwGfldgUU](https://groups.google.com/forum/#!topic/redis-db/TLfwGfldgUU)

~~~
siscia
It is a little different from Thredis since, in my understanding, Thredis is
more of a Redis fork, while RediSQL is a Redis module.

Using a Redis module gets you A LOT of things for free, and you really stand
on the shoulders of giants.

~~~
gtrubetskoy
> threadis is more a redis fork. While rediSQL is a redis module.

Thredis author here - Redis did not support modules back then, had I done this
today it prolly would have been a module (though it's been a while since I
hacked on Redis and I don't know how modules work)

~~~
siscia
Hi :)

Sure, I took a deep look at your project and really appreciate that you
open-sourced it.

Do you have any feedback to share?

Cheers,

Simone

------
hoschicz
Where does it save the data? Just in a blob in Redis? Does it support
replication? How about persistence -- can I use standard Redis persistence?
I'm kinda new to modules in Redis.

Definitely could use this for migration off SQLite to a solid DB.

~~~
siscia
Just updated the readme.

For now it works in memory; after Saturday it will save data in a standard
SQLite file.

Replication and SQL are fairly tricky beasts together. What is your use case?

~~~
hoschicz
My use case is: one SQLite DB for every client, scaling by distributing
clients across servers. I would like to have all clients in one DB and make
scaling easier.

~~~
siscia
That is quite interesting, and achievable within reasonable bounds of complexity.

I could associate each DB with a Redis key and stream all the UPDATEs on each
DB.

At this point you could simply listen to the stream and apply those updates to
a master DB.

What do you think?

------
azureel
Hello siscia, do you have any benchmark against a sqlite Db on Ramdisk?

~~~
siscia
Hi Azureel, benchmarks are a little tricky in this situation.

You are sending data to the Redis instance, which reads your statement and
passes it to the module; the module executes it and finally replies.

And everything really depends on your network and the load on Redis itself.

However, just to give some numbers: on my machine I do 1,000 inserts in 0.6
seconds, the longest insert took 0.01 seconds, and overall that works out to
1000 / 0.6 ≈ 1,700 inserts per second. Consider that each insert is an
isolated transaction.
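For comparison, a similar micro-benchmark against local SQLite alone (a sketch, not the linked test script) could look like:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT)")

start = time.perf_counter()
n = 1000
for i in range(n):
    # Commit after every insert, i.e. one isolated transaction per insert,
    # matching how the module executes each statement.
    conn.execute("INSERT INTO t VALUES (?)", (i,))
    conn.commit()
elapsed = time.perf_counter() - start
print(f"{n / elapsed:.0f} inserts/second")
```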

The test I run is exactly the one you can find here:
[https://github.com/RedBeardLab/rediSQL/blob/381e3796ad31c231...](https://github.com/RedBeardLab/rediSQL/blob/381e3796ad31c231719380afb92352c54b244b8c/test/performance/rediSQL_bench.py)

Of course, if you start to throw in JOINs, everything slows down.

Also please consider that, at the moment, it is a BLOCKING version: while
executing the statements, Redis itself is waiting.

Finally, I believe there are possibilities to improve these numbers.

------
xena
It'd be really interesting to use redis modules like this in the future to
have redis itself be both the main interface to the database for programs and
a caching layer.

------
m82labs
Fully in-memory SQL can also be accomplished using In-Memory OLTP via MS SQL
Express on Linux now. And you get a fairly feature-complete engine, along with
multithreaded execution, no blocking, and some great client options. The only
limits right now are 4 cores and ~350MB of in-memory data per database.

~~~
nolok
My memory is a bit fuzzy right now, but hasn't MySQL itself provided that
forever too? Or do you mean a fully in-memory DB, with a hard copy on the hard
drive?

~~~
m82labs
I think most RDBMS systems have a way to pin a table or database in memory.
The difference with the Microsoft solution is that it is a separate engine
designed explicitly for memory, so there is no locking or blocking of pages.
There are some other neat features as well, like hash indexes. I have been
able to push an 8-core box to ~80,000 inserts per second from a remote client,
and over 200,000 per second for a test I ran on the host. I have not yet run
tests for the Express edition, but I will be interested to see how it performs.

To answer your other question, you have the option to either keep it all in
memory (and lose it on reboot) or commit the data to disk. In my testing I
only saw a ~10% performance penalty for committing data to disk (granted, we
have extremely fast SSDs).

