
Show HN: Pydb – a lightweight database with Python syntax queries, using ZeroMQ - asrp
https://github.com/asrp/pydb
======
goerz
There's already an (albeit deprecated) debugger package called pydb
([http://bashdb.sourceforge.net/pydb/](http://bashdb.sourceforge.net/pydb/)).
It would be good to choose a different name for this, most importantly because
`pip install pydb` is already taken.

~~~
asrp
Thank you! I looked up databases to see if there were obvious name clashes but
never thought of debuggers despite using `pdb` a lot.

I'm open to suggestions for names if anyone has one. (Although I guess it'd be
nice if the top discussion thread wasn't about naming.)

EDIT: I just saw you opened an issue on github so I'm happy to take name
suggestions there.

~~~
asrp
I've just changed the project's name to pyzdb. Thanks again!

------
jmduke
It might not hold up to heavy use cases, but I can already imagine a bunch of
ways this would make my life easier (I use Python scripts to handle a bunch of
social media stuff and basic analytics, for instance, where I use either text
files or some other proxy to handle state.)

The only request I'd have is a syntactically saccharine way of spinning up the
server within the client itself, which is in general an awful idea but would
make my life easier for toy use cases.

~~~
asrp
Thanks! I'm glad to see this looks potentially useful to someone other than
myself.

If the client could start servers (other than by importing from server.py),
what would be the intended outcome if two clients try to spin up servers at
the same time? Otherwise would a file lock do? In that case, just file locking
combined with pickle could possibly be enough for your needs.
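
A minimal sketch of the file-lock-plus-pickle idea, for illustration only. The helper name `locked_state` and the choice of a dict as the stored state are assumptions, not part of the project. It relies on `fcntl.flock`, so it is Unix-only, and the lock is advisory: it only protects against other processes that take the same lock.

```python
import fcntl
import os
import pickle
from contextlib import contextmanager


@contextmanager
def locked_state(path):
    """Yield a dict loaded from `path` and write it back on exit.

    An exclusive advisory lock is held for the whole read-modify-write
    cycle, so two scripts using this helper cannot interleave updates.
    (`locked_state` is a hypothetical helper, not part of pyzdb.)
    """
    # Open for read/write, creating the file if needed, without truncating.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            data = f.read()
            state = pickle.loads(data) if data else {}
            yield state
            # Reached only if the caller's block finished without raising.
            f.seek(0)
            f.truncate()
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Usage would look like `with locked_state("state.pickle") as state: state["runs"] = state.get("runs", 0) + 1`. If the body raises, nothing is written back and the previous file contents are kept.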

~~~
amenod
+1 (I find it useful too). I actually started writing my own implementation of
something similar (named "PersistendDict" - feel free to steal the name ;) )
but never finished it so I will definitely check out your library. Thanks!

As for starting / stopping the server... I personally don't care whether that
functionality is there or not, as I can always start the server manually on a
dev machine, and it should be started differently on production machines
anyway. But that's just my opinion.

------
ThePhysicist
I wrote a similar library, BlitzDB, a while ago:

[https://github.com/adewes/blitzdb](https://github.com/adewes/blitzdb)

It's a pure Python database engine with a MongoDB-like query engine and
support for three different backends: File (native), SQL (via SQLAlchemy) and
MongoDB.

The library transparently translates a large number of MongoDB queries into
SQL or its own native storage backend, and when using the SQL backend it can
do things that MongoDB can't, like queries spanning multiple relationships.

The latest version is not fully documented yet but I'm using it on several
production projects myself. I'm looking for a maintainer and contributors btw,
so if you're interested feel free to get in touch with me!

~~~
asrp
Interesting! I will have to take a deeper look at BlitzDB. I don't know much
about MongoDB (so probably wouldn't be a good maintainer) but the sample code
looks good. I can't tell how it handles nesting though.

------
daenney
What's the purpose/gain of layering ZMQ into this? I read the architecture bit
but I'm still unclear as to what benefit this brings. I guess it allows for
multiple clients to use the database at the same time? I can see how the
queueing thing is useful for writes if you don't want to have to handle more
than one at the same time for the sake of complexity, but wouldn't doing this
for reading slow things down unnecessarily?

~~~
asrp
As you correctly observed, I mainly used ZeroMQ for the fan-in, so I need only
consider one request at a time without worrying about chunking, disconnection
or other lower (socket) level issues.

For speed, the idea is that you could potentially have multiple read-only
servers answering queries simultaneously (all taking from the dealer). This
isn't fleshed out yet. It possibly involves splitting requests into two queues
for read and write requests (instead of only "run").

I'd be interested in hearing any info about potential slowdowns if you have
them.

~~~
daenney
Thanks for the reply, that clears things up!

> I'd be interested in hearing any info about potential slowdowns if you have
> them.

I figured if you have a central queue that everything needs to go through then
you'd also be limited to a single read at a time. But if you can have multiple
read-only secondaries then that's unlikely to be an issue.

------
jlpdyh
You could use `-e git://github.com/asrp/undoable` in requirements.txt to save
some steps.

~~~
asrp
Oh, thanks, I wasn't aware of that flag. Though for the moment

    
    
        pip install -e git://github.com/asrp/undoable#egg=undoable
    

tells me `setup.py` doesn't exist (because it doesn't). I haven't gotten
around to packaging yet not knowing (before today) if anyone's interested.

I'll probably look into that soon.

~~~
asrp
I decided to push a quick patch to github so installation with just `pip
install -r requirements.txt` is possible now. I'll look into proper packaging
later on. Thanks again!

------
emehrkay
Will this code not cause issues? I know that you aren't modifying args or
kwargs in the _run method, but it just seems like a potential point of
failure or a Python anti-pattern.

    
    
        def _run(self, func=None, args=(), kwargs={}):

~~~
asrp
Yes, indeed! Thanks for pointing that out. I actually saw that when I was
cleaning this up a bit for release and couldn't make up my mind.

I mean I'm not modifying args or kwargs now but if I did later, I could shoot
myself in the foot in a not so obvious way. But on the other hand, I don't
know a succinct way to express these default values. I'd probably go with
`args=None, kwargs=None` and then `args = args if args else ()`.

~~~
emehrkay
I typically do this, but I don't know if it is the most "pythonic":

    
    
       def func(list_arg=None, dict_arg=None):
           list_arg = list_arg or []
           dict_arg = dict_arg or {}

~~~
moreati
It's not ideal. For instance if I as the caller wished to provide an empty
dict-like object (e.g. dict_arg=collections.OrderedDict()), then your code
would silently ignore it, and use a new dict.

Instead of checking for any object that evaluates to False, you should
explicitly check for None, e.g.

    
    
       def func(list_arg=None, dict_arg=None):
           if list_arg is None:
               list_arg = []
           ...
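
To make the difference concrete, here is a small illustrative sketch covering the three variants discussed in this thread. The function names are made up for the example and do not come from the project.

```python
from collections import OrderedDict


def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and the same object is reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def keep_or_default(dict_arg=None):
    # `or` replaces any falsy argument, including an empty OrderedDict
    # that the caller passed in on purpose.
    return dict_arg or {}


def keep_if_given(dict_arg=None):
    # Only a missing argument (None) is replaced; an empty container
    # supplied by the caller is kept as-is.
    if dict_arg is None:
        dict_arg = {}
    return dict_arg
```

Calling `append_shared(1)` and then `append_shared(2)` returns the same list both times, which by then holds both items. Passing an empty `OrderedDict()` to `keep_or_default` returns a fresh plain dict, while `keep_if_given` returns the very object that was passed in.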

------
nerdponx
Why would I want to use this over SQLite?

------
vapemaster
Seems like this could be a good idea for small personal or temporary
dashboards. Especially those with viz powered by packages that work natively
with Python data structures like bokeh or plotly.

------
MeteorMarc
The syntax reminds me of PyDal:
[https://pypi.python.org/pypi/pyDAL](https://pypi.python.org/pypi/pyDAL)
[http://www.web2py.com/books/default/chapter/29/06/the-
databa...](http://www.web2py.com/books/default/chapter/29/06/the-database-
abstraction-layer#Shortcuts)

------
toumorokoshi
vanilla zeromq is a pretty bad choice for any database. zmq explicitly makes
no guarantees about reliable delivery, so losing random inserts or queries
here or there would be considered acceptable.

subscribers also lose the first few messages the publisher sends, unless you
make sure you start the subscriber first. The publisher will make no
indication of which messages are lost and which ones have actually been sent
to someone:

[http://zguide.zeromq.org/page:all#Getting-the-Message-
Out](http://zguide.zeromq.org/page:all#Getting-the-Message-Out)

I would suggest building something on top of request-reply instead: it's
actually possible to build reliable delivery on that.

[http://zguide.zeromq.org/page:all#reliable-request-
reply](http://zguide.zeromq.org/page:all#reliable-request-reply)

------
webmaven
Rather reminds me of ZODB[0].

[0] [http://www.zodb.org](http://www.zodb.org)

~~~
tyingq
The interface reminds me of tied hashes and lists in Perl, or something like
this[1] for Python.

[1][https://github.com/bob2827/pydis](https://github.com/bob2827/pydis)

------
d0vs
Looks like a better shelve
[https://docs.python.org/3.6/library/shelve.html](https://docs.python.org/3.6/library/shelve.html)

------
BerislavLopac
Reminds me of TinyDB
[https://tinydb.readthedocs.io](https://tinydb.readthedocs.io)

------
gigatexal
Hmm this is pretty interesting. Going to check this out.

