Show HN: Doculite – Use SQLite as a Document Database (npmjs.com)
160 points by thenorthbay on Aug 7, 2023 | 56 comments
Hi!

As I was working on a side project, I noticed I wanted to use SQLite like a document database on the server. So I built DocuLite. DocuLite lets you use SQLite like Firebase Firestore. It's written in TypeScript and is an adapter on top of sqlite3 and sqlite.
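
Here's a rough sketch of what usage looks like (simplified; see the package page for exact details):

  // Firestore-style reads and writes against a local SQLite file.
  // (Import of Database from doculite omitted.)
  const db = new Database(); // creates/opens sqlite.db in the cwd
  const ref = db.collection("users").doc("alice");
  await ref.set({ id: "alice", name: "Alice" });
  const alice = await ref.get();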

Reasons:

1) Using an SQL Database meant having less flexibility and iterating slower.

2) Alternative, proven Document Databases only offered client/server support.

3) No network. Having SQLite server-side next to the application is extremely fast.

4) Replicating Firestore's API makes it easy to use.

5) Listeners and real-time updates enhance UX greatly.

6) SQLite is a proven, stable, and well-liked standard. And, apparently one of the most deployed software modules right now. (src: https://www.sqlite.org/mostdeployed.html)

What do you think? Feel free to comment with questions, remarks, and thoughts.

Happy to hear them.

Thanks




I'm the core maintainer of the npm sqlite package that your library uses. I recommend that you don't make it a direct dependency, but use dependency injection or an adapter pattern instead; that way your library isn't dependent on a specific package version of sqlite/sqlite3 (the npm sqlite package API is pretty static at this point, though!).

The npm sqlite package used to have sqlite3 as a direct dependency in older major versions and most of the support issues were against sqlite3 instead of sqlite. Taking that dependency out and having the user inject it into sqlite instead removed 99% of the support issues. It's also really nice that sqlite has no dependencies at all.

If you go the adapter pattern route, you can support other sqlite libraries like better-sqlite3. Sometimes sqlite/sqlite3 doesn't fit a user's use-case and alternative libraries do.

Same deal with the pub/sub mechanism: you already have the TypeScript types defined to create abstractions. It would be nice to see adapters for Redis streams / Kafka / etc., as in-memory pub/sub may not cut it after a certain point.
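
Something as small as this would do (names illustrative, not your actual types):

  interface PubSubAdapter {
    publish(channel: string, message: string): Promise<void>;
    subscribe(channel: string, onMessage: (message: string) => void): Promise<void>;
  }

The in-memory default implements it, and a Redis- or Kafka-backed class can be swapped in without touching the rest of the library.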

Great start on your library!


Hey! Thanks for the feedback. I'd just be worried people might not be using the correct libraries, since I think they still provide different functionality.

So exporting one Database that lets people pick which driver they want to use (incl. better-sqlite3, which people have asked for) might be a simple and user-friendly solution.


Interesting project! I have lately been trying out these cool and perhaps in some way similar sqlite libraries:

- https://github.com/haxtra/kvstore-sqlite (Basic key-value store for SQLite databases.)

- https://github.com/haxtra/super-sqlite3 (Fast SQLite library with optional full db encryption, simple query builder, and a host of utility features, all in one neat package.)

- https://github.com/haxtra/live-object (Standard javascript object with built-in JSON serialization to file. Dreams do come true sometimes.)

All from github user: https://github.com/haxtra

I think the super-sqlite3 source might also be an inspiration for the 'driver' topic: "super-sqlite3 is a thin wrapper around better-sqlite3-multiple-ciphers, which extends better-sqlite3 (the fastest SQLite library for node.js) with full database encryption using SQLite3MultipleCiphers. super-sqlite3 then adds its own query builder and other convenience features."

And do check out this user's XRay (JavaScript object browser component) library for your preferred component framework.

In my bookmarks I also found these other related and interesting links:

- https://dgl.cx/2020/06/sqlite-json-support (An article about SQLite as a document database, using the relatively new 'generated columns' feature of SQLite 3.31.0, which you seem to be using)

- https://www.npmjs.com/package/best.db (easy and quick storage)

- https://tinybase.org (This project seems to be an even more similar idea to Doculite) https://github.com/tinyplex/tinybase (The reactive data store for local-first apps.)

Good luck with your project!


> I recommend that you don't make it a direct dependency, but use dependency injection or an adapter pattern instead; that way your library isn't dependent on a specific package version of sqlite/sqlite3

Do you have examples of what you mean?

Do you just mean in the code or is this something about how it is imported as a dependency in package.json?


They would have to install sqlite and sqlite3 separately and feed the sqlite instance to your library when constructing it. It would no longer be a dependency in package.json (maybe a devDependency when you write unit tests using it). Look at how the npm sqlite source code does it as an example.

https://github.com/kriasoft/node-sqlite

In the src/index.ts file, the `open()` function takes in an instance of sqlite3, vs the file importing it from the sqlite3 package itself.
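
Roughly, usage looks like this (per the node-sqlite README):

  import sqlite3 from "sqlite3";
  import { open } from "sqlite";

  // The caller installs sqlite3 themselves and injects the driver;
  // the sqlite package declares no dependency on it.
  const db = await open({
    filename: "./sqlite.db",
    driver: sqlite3.Database,
  });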

An adapter / driver pattern would extend this where you can interchange usage of either the sqlite package or an alternative.

For example, let's say you have two SQLite drivers. They both allow you to execute a statement, but their methods and maybe parameters are named differently. One might expose an `exec()`, while the other exposes `run()` to do the same thing.

Your codebase is probably coded for one or the other, which means you can't freely interchange the drivers.

So what you do to support this is create an interface with methods that describe what you want to do. In the above case, you might describe a method called `executeStatement()`.

Then you create two classes, one for each driver. They both implement `executeStatement()`, but under the hood, they'll run `exec()` and `run()` in their implementations.

So your main code now will accept an instance of anything that implements your interface, and instead of calling `exec()` directly, you'll be calling `executeStatement()`, which under the hood calls `exec()`.

So it goes like this:

- Define a common interface for interacting with different database libraries (the abstraction)

- Implementation classes using that interface (the drivers)

- Your constructor takes in anything that conforms to that interface

- The user creates an instance of your implementation (e.g. SqliteDriver) and feeds in the instance of the driver (e.g. `new SqliteDriver(<output of npm sqlite open()>)`)

- Your code calls the interface methods in place of the actual database calls

You had something like:

  db.collection('users').set(..)
So rather than calling SQLite `run()`, it'd be calling your interface method `executeStatement()` (which calls `run()`) in that `set()` call.
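
In TypeScript the whole thing is only a few lines (class and method names here are illustrative):

  // The abstraction your library codes against.
  interface SqliteDriver {
    executeStatement(sql: string, params?: unknown[]): Promise<void>;
  }

  // Adapter for a driver exposing an async run(), like the npm sqlite package.
  class NodeSqliteDriver implements SqliteDriver {
    constructor(private db: { run(sql: string, ...params: unknown[]): Promise<unknown> }) {}
    async executeStatement(sql: string, params: unknown[] = []): Promise<void> {
      await this.db.run(sql, ...params);
    }
  }

  // Adapter for a driver exposing a synchronous exec() instead.
  class ExecDriver implements SqliteDriver {
    constructor(private db: { exec(sql: string): void }) {}
    async executeStatement(sql: string): Promise<void> {
      this.db.exec(sql);
    }
  }

  // Your library only ever sees the interface.
  class Doculite {
    constructor(private driver: SqliteDriver) {}
    // ...set() and friends call this.driver.executeStatement(...)
  }

The user constructs whichever adapter matches their driver and hands it to you, and your package.json stays free of sqlite dependencies.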

It looks like the "Bridge" pattern is what you want here:

https://www.phind.com/agent?cache=cll1zw1n60010la08mn89dd8g

The generated code is Java, but the concept is exactly the same as what I've described above.


The one feature that I'd want out of this is atomic writes. If I have a document and want to increment the value of a field in it by one, I'm not sure that's possible with Doculite today: if two requests read the same document at the same time and both write an incremented value, the value is incremented by one, not two.

The way _I_ would expect to do this is something like this:

  const ref = db.collection('page').doc('foo');
  while (true) {
    const current = await ref.get();
    try {
      await ref.set({ likes: current.likes + 1 }, { when: { likes: current.likes } });
      break; // write accepted
    } catch {
      // conditional write rejected; re-read and retry
    }
  }
If `set()` attempts to write to the ref when the conditions in `when` are not matched exactly, the write should fail and you should have to try the operation again. In this example, the `set()` call increments the like value by one, but specifies that the write is only valid if `likes` is equal to the value that the client read. In the scenario I provided, one of the two concurrent requests would fail and retry the write (and succeed on the second go).


Interesting. Updating values by incrementing them is a use case I barely had in Firebase. I mostly only dealt with one-time updates to values, e.g. by the user or scheduled jobs. In which scenario would the current design cause you problems?


Incrementing is only one possible use case. Any time you read data and then write back based on that data, you need to ensure that nobody wrote to the document in the interim. RDBMSs do this with transactions.

Consider the case where a user is submitting an e-commerce order. You want to mark their order as processed and submit it for fulfillment. If you read the order to check if it's already submitted, two requests to submit it made at almost the same time (e.g., hitting the button twice) will both read that it's unfulfilled and each try to submit it.

By doing an atomic write you can be sure that at most one request submits the order.
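
With a conditional `set()` like the one sketched earlier (hypothetical API), that's a single call:

  // At most one concurrent request flips the status; the others fail
  // and can report "already submitted" instead of double-fulfilling.
  await orderRef.set(
    { status: "processing" },
    { when: { status: "unfulfilled" } }
  );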


Thank you for the feedback. Seems this is important.


I’m sorry to be this frank, but if you weren’t even aware of the importance of transactional safety for read-modify-write operations, you shouldn’t be in the business of creating such a library and advertising it. This is really basic database stuff, so you are only at the beginning of the learning curve regarding database topics.


If we allow ignorance to be the gatekeeper of progress, we will stifle the growth of every individual around us.

No one person can know everything. Teaching how to solve these problems will produce better software and better people too.

https://xkcd.com/1053/


I have no issues with the OP having that library as a side project and learning experience. But at their experience level it’s not suitable to be promoted publicly. To be fair, they only asked for opinions on it.


If I had a time machine, I'd use it to identify who in my life gave me this same "advice" and gift them a copy of their obituary from the future.

For a commercial product, yours is honest feedback.

But this isn't that. You popped into a literal show-and-tell to heckle some kid proud of his macaroni art. You didn't even critique the product, just insulted its maker. What the hell is wrong with you to think this is appropriate, or even helpful?


It’s possible to use optimistic locking and versioning to ensure that things haven’t changed underneath you unexpectedly.
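
For example, with a version column that every write bumps and checks (a sketch assuming the npm sqlite run() API; schema illustrative):

  // The write only lands if the row still has the version we read.
  const result = await db.run(
    "UPDATE docs SET value = ?, version = version + 1 WHERE id = ? AND version = ?",
    newValue, id, expectedVersion
  );
  if (result.changes === 0) {
    // Someone else wrote first: re-read, re-apply, retry.
  }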


Honestly, it's weird that you never ran into this. This is a requirement for any data store and I've never not used atomic updates at any company. Most basic example: what if two users load the same object at the same time and you want to increment a "seen" count...?


I guess that kind of accuracy wasn't needed in most of the application to deliver user value. There probably were parts of the application where we used transactions. Could be a cool feature to build for this, though.



That's just sugar on top of atomic writes.


Interesting! I thought it would be simpler to implement than full-blown atomic writes, so I assumed it was different.


I'd like to enable the same in my startup.

What are you using for this today?


If you're using something like MySQL or Postgres, you can do locking with the built-in transaction primitives (see, for instance, SELECT FOR UPDATE). Mongo has tools like findAndModify which can help.

If you're using SQLite you can use exclusive transactions to perform the read+write, but there's probably a more efficient way to go about it. You can craft an UPDATE that selects on the primary key and the condition and then use sqlite3_changes() to get back the number of records modified (and fail if it's zero), but that may not be possible with your setup.
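
Something like this (a sketch with better-sqlite3 against the value/id schema mentioned elsewhere in the thread; `info.changes` is the sqlite3_changes() value):

  import Database from "better-sqlite3";

  const db = new Database("sqlite.db");
  const expectedLikes = 41; // the value read earlier

  // Increment likes only if the document still holds the value we read.
  const info = db
    .prepare(
      `UPDATE page
          SET value = json_set(value, '$.likes', json_extract(value, '$.likes') + 1)
        WHERE id = ? AND json_extract(value, '$.likes') = ?`
    )
    .run("foo", expectedLikes);

  if (info.changes === 0) {
    // The condition no longer holds: re-read and retry.
  }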


Why do you have async reads and writes? There's no client/server setup here; using async/await just introduces pointless waiting.

https://github.com/WiseLibs/better-sqlite3


If you use the library on a server in a Node.js environment, wouldn't it be useful to fetch data asynchronously (e.g. in Remix / Next.js)? Besides, I'm not sure better-sqlite3 offers the listener functionality I care about. Skimming the docs, it seems it doesn't.


better-sqlite3 is orders of magnitude faster than the async SQLite bindings. We found this to be true when testing SQLite options for Notion's desktop app anyways. The "why should I use this" bits sound boastful but are reasonable.

https://github.com/WiseLibs/better-sqlite3#why-should-i-use-...


Listener functionality could be something you'd have to write yourself, I suppose.

As for on the server: no. SQLite is a C library, not a separate application; the work happens inside the Node process. Regardless of how you do it, any call into SQLite is going to block. Adding promises or callbacks on top of that is just wasting CPU cycles, unlike reading from the filesystem or making a network request, where the work is offloaded outside of Node's main thread (and hence why it makes sense to let Node do other things instead of waiting).

In fact, if you synchronously read and write within a single function with no awaits or timeouts in between, you don't have to worry about atomicity: no other request is being handled in the meantime.
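
Concretely (a sketch with better-sqlite3; table and column names illustrative):

  import Database from "better-sqlite3";

  const db = new Database("sqlite.db");

  // Runs start to finish with no await, so the event loop can't
  // interleave another request between the read and the write.
  // (A second process writing the same file could still race.)
  function incrementSeen(id: string): number {
    const row = db.prepare("SELECT seen FROM pages WHERE id = ?").get(id) as { seen: number };
    db.prepare("UPDATE pages SET seen = ? WHERE id = ?").run(row.seen + 1, id);
    return row.seen + 1;
  }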


Yeah, interesting. I'm wondering what happens when I start to introduce things from "outside" the system that need async operations, like processing webhooks.


I assumed they were implementing the Firestore API (which I’m fairly certain is async).


Kind of nifty... Just curious if this is using the JSON functions/operators for SQLite under the covers?

https://www.sqlite.org/json1.html

Edit: where is the database file stored? A parameter for the Database() constructor seems obvious, but not seeing it in the basic sample.


Yes, I'm using json_extract and generated virtual columns: https://www.sqlite.org/json1.html#jex Edit: the database is stored in a sqlite.db file in the cwd.


Found where you're using those: https://github.com/thenorthbay/doculite/blob/c05d98c209d0031...

It looks like your tables have a single value column and an id generated column that extracts $.id from that value:

  CREATE TABLE IF NOT EXISTS ${collection} (
    value TEXT,
    id TEXT GENERATED ALWAYS AS (json_extract(value, "$.id")) VIRTUAL NOT NULL
  )
GENERATED ALWAYS AS was added in a relatively recent SQLite version, 3.31.0 (2020-01-22). Do you have a feel for how likely it is for Node.js users to be stuck on an older version? I've had a lot of concern about Python users who are on a stale SQLite for my own projects.
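
One cheap mitigation is a startup check (a sketch using the npm sqlite API):

  const { version } = await db.get("SELECT sqlite_version() AS version");
  const [major, minor] = version.split(".").map(Number);
  if (major < 3 || (major === 3 && minor < 31)) {
    throw new Error(`SQLite ${version} is too old: generated columns need 3.31.0+`);
  }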


Often SQLite libraries just bundle SQLite instead of relying on the system one; better-sqlite3 does that just fine and has provisions for building against a custom SQLite version if you really need it.


Yup! I actually don't know about that. I figured this would be used for setting up newer, server-side Remix or Next projects rather than more dated ones. I could imagine that by keeping sqlite and sqlite3 up to date, people might also not be stuck on more dated versions of SQLite. The tracing/profiling functions were also only implemented recently in sqlite3 (the Node-to-C binding), in 2023.


Cool, might be worth taking an optional input for the path, or ":memory:" for an in-memory database.


>DocuLite lets you use SQLite like Firebase Firestore.

Honestly, that sounds absolutely frightening to my ear. There's Redis, there's Mongo, Orient, and plenty of other document-oriented rapid-development databases that would be a substantially better solution than turning SQLite into Firebase.

Yes, you can use data change notification callbacks. They're a cool feature of SQLite, but did you know that their performance is a big concern in a large-scale database (a document database grows very quickly by design)? What about COUNTs? Batching operations? Deadlocks? This will get out of hand quickly, because a hammer is being used as a shovel here.


I just did a quick search and struggled to find anything about the performance issues you referred to - can you link something so I can take another look? Thanks for the suggestions.


From the code, it's easy to see that you create a WRITE transaction which has the side effect of triggering READ transactions. It's also important to understand that if you mix reads and writes, you cannot do them concurrently with SQLite, so every transaction will be sequential and blocking. Looking at the code further, I understand that it will not scale well, since a single write can trigger a waterfall of callbacks, creating bottlenecks. The problem in your case is that sqlite3_update_hook doesn't transport data back, and that it listens for changes in all tables that have ROWID optimization on. So, first things first: you only need one such callback, and a different approach for integrating it into a document abstraction than a table-name-and-rowid predicate in a dozen registered callbacks.
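
i.e. one global dispatcher instead of a pile of per-listener predicates (a sketch assuming node-sqlite3's "change" event, its wrapper around sqlite3_update_hook):

  type RowListener = (rowid: number) => void;
  const subscribers = new Map<string, Set<RowListener>>(); // table -> listeners

  // sqlite3Db is the underlying sqlite3 Database instance.
  sqlite3Db.on("change", (_type: string, _db: string, table: string, rowid: number) => {
    subscribers.get(table)?.forEach((notify) => notify(rowid));
  });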

You will uncover more problems with SQLite for this job as you dig deeper. What you really want is fast writes and dumb querying by document id, whereas SQLite gives you a powerful querying suite that you don't utilize but still have to pay for with slower writes. This is a classic system-design problem. Just try rewriting your collection and document abstractions using proper document data storage and you will see how much less complicated it is.


If you are serious about transactionality, data consistency, and isolation levels, this sounds different from the direction you want to go. Both Firestore and Realm have shortcomings here. Fauna (where I work, btw), Surreal with FoundationDB on the backend, and Spanner are the only ones that can guarantee strict serializability in a distributed environment. I could argue that Fauna is the most turnkey (least pain to try, test, implement). With those "strict serializable" DBs it is much easier to avoid data anomalies such as the ones mentioned in this thread.


Yup! I doubt this project will naturally evolve into whatever you're describing. And that's ok! This project might help people who want to use SQLite like Firebase, with a similar API and experience but without letting network requests increase latency (see: https://news.ycombinator.com/item?id=31318708).

Addressing the main points, like implementing atomic transactions (a combined read and write operation on a doc), seems warranted since they exist in Firebase as well.


I forgot to share this very cool alternative approach to realtime reactivity, via websockets, by subscribing to actual raw queries from the frontend!

- https://github.com/Rolands-Laucis/Socio

- https://www.youtube.com/watch?v=5MxAg-h38VA&list=PLuzV40bvrS... (video updates of the code and functionality)


I'd see if you can easily port this on top of browser-based SQLite in WASM; that'd expand your user base and could lead to some of the "holy grail" of offline-first/sync systems.


Why not just use IndexedDB in the browser if you don’t want an SQL database?


a) IndexedDB's API is horrid, so everyone wants to layer an abstraction on top anyways.

b) You can't use IndexedDB on the server, so you wouldn't be able to write sync code that runs on both the client and the server.


The same reasons you wouldn’t use IndexedDB on the server?

Modern offline-first applications include a local backend every bit as complex and demanding as a server-based backend.


This is cool. There are a tonne of options for local KV stores that likely outperform this by a large margin, but the obvious benefit here is that SQLite is really simple to configure and operate!


I wouldn’t count on it. SQLite is unreasonably fast in ways you’d think it shouldn’t be.


Thank you!



That is an interesting approach.

> 6) SQLite is a proven, stable, and well-liked standard.

How does adding this adapter on top affect the stability?

Have you looked at SurrealDB? - https://surrealdb.com/

Seems like it would provide you with what you need.


> Alternative, proven Document Databases only offered client/server support.

We currently use Realm for this use case. It’s a local database just like SQLite, but with native support for documents and listeners. Did you try that out?


Just didn't find it quickly enough after some browsing. Also, using SQLite seemed so simple.


Thank you, this can be really handy. I would love to have compatible replicas (implementing both the same API and the same storage schema) for other languages like C# and Python.


Does this mean documents in a database, or a database as a document?

I've tried the second, but the time comes when you need to (re)order items, which gets clumsy.


If I understood you correctly: documents in a database, and the database as a file. If not, please let me know.


> Listeners and real-time updates enhance UX greatly.

how is this implemented?


I'm using SQLite's underlying Data Change Notification functionality. It's exposed by one of the libraries mine relies on.


I read point 1 as against using SQLite at first. :-D



