
DuckDB – An embeddable SQL database like SQLite, but supports Postgres features - pcr910303
https://duckdb.org/
======
simonw
I spent a while looking at this today. It's really interesting.

It's not based on SQLite at all (except for borrowing the SQLite shell
implementation) but it looks very much like SQLite, in particular:

\- It's designed to work as an embedded library, eliminating the network
overhead you usually get when talking to a database

\- Each database is a single file on disk

\- It ships as an "amalgamation" build - a single giant C++ file (SQLite is a
single giant C file)

Impressively, if you run "pip install duckdb" it Just Works - you can then
"import duckdb" and start using it, with an interface that looks very similar
to the sqlite3 module that ships with Python.

The key reason this exists is that it's a column store, with vectorized
operations across columns - making it ideal for analytical workloads. This
blog entry has some benchmarks that illustrate how well it works in that
regard: [https://uwekorn.com/2019/10/19/taking-duckdb-for-a-
spin.html](https://uwekorn.com/2019/10/19/taking-duckdb-for-a-spin.html)

It's also backed up by some strong computer science. It's by the academic
researchers behind MonetDB and includes implementations of a bunch of
interesting papers: [https://duckdb.org/docs/why_duckdb#standing-on-the-
shoulders...](https://duckdb.org/docs/why_duckdb#standing-on-the-shoulders-of-
giants)

It's a really interesting piece of software, and unlike many other "new
databases" it feels like it fills a very genuine gap in my toolbox.

~~~
webmobdev
If everything you said is accurate, somebody should make another pitch for
WebSQL (
[https://en.wikipedia.org/wiki/Web_SQL_Database](https://en.wikipedia.org/wiki/Web_SQL_Database)
) with a custom version of DuckDb!

Every browser maker was interested in implementing it but the W3C couldn't go
ahead with it because everyone chose to implement it using SQLite, where as
W3C required more than one db back-end implementation to move forward.

~~~
wffurr
Having sqlite included in all browsers with standard bindings would have been
great.

~~~
jansommer
It would! But it would also be a security nightmare. Exploit SQLite and you
could use it in Safari, Chrome and Firefox - everybody could be affected.

~~~
andyfleming
How would that be different from someone editing something in
window.localStorage?

~~~
jansommer
If that's all you need, it wouldn't make sense to include SQLite in the
browser. If you're going to include a relational database you might as well
use it

------
ksec
I really wish _ALL_ software project / framework / libary could follow the
lead here. Instead of your marketing page telling me how world changing
awesome tech you have, which really is a _consumer_ marketing strategy. Just
do it like DuckDB,

When to use DuckDB ; When to _not_ use DuckDB

~~~
FridgeSeal
We should make a list of technology that does this, because I know Clickhouse
also has a reasonably detailed page on when to use it and when to not use it
and why. Postgres also has a very nice “do and donts” wiki page.

~~~
stjo
Postgres “do and donts”:
[https://wiki.postgresql.org/wiki/Don%27t_Do_This](https://wiki.postgresql.org/wiki/Don%27t_Do_This)

------
adrianmonk
First of all, really neat project.

There's an issue with the Installation section of the web page, though.
Running Chrome on Linux, it says "System detected: Linux", which is right. But
under "CLI", it offers me a download of duckdb_cli-osx-amd64.zip.

Just in case it was just the zip file name that's wrong, I downloaded it, but
the duckdb binary inside is a "Mach-O 64-bit x86_64 executable" according to
the file command.

The correct file is available if I click on Other Installations
([https://duckdb.org/docs/installation/](https://duckdb.org/docs/installation/)).

Also, if I try it on running Firefox on Linux, it says "System detected: UNIX"
and gives me the OS X download.

~~~
mytherin
Thanks for reporting that! We will have a look at fixing it :)

~~~
alexchamberlain
FWIW I got the same issue on Android, though offering me the "correct" link
wouldn't do me much good either :)

------
viraptor
I was hoping from the title that it aims for postgres SQL compatibility, but I
can't find it explicitly mentioned in the docs. This really makes me think I
really want something like sqlite://memory which completely disregards speed
or even persistence. Instead you could say for example "open an in-memory
database that behaves like postgres 9" and run your tests against it. With
typical fixtures of 10 or so rows, you wouldn't even need a query plans - just
naive filte+copy over whole tables for most operations.

Currently I'm using
[https://pypi.org/project/testing.postgresql/](https://pypi.org/project/testing.postgresql/)
which spawns a temporary server per each test suite, which has quite a lot of
overhead.

~~~
c17r
Man, I’ve wanted the “compatible SQL engine with only RAM storage” for testing
for YEARS. Closest I got was some shenanigans with MSSQL’s LocalDB.

~~~
mStreamTeam
I use docker to spin up new postgresql DBs on the fly

~~~
GordonS
Same here, and I run the same set of migrations that run in production. To he
clear, this is only done once per test session, not for individual tests, and
the tests are written in such a way that they don't interfere with each other.

The overhead is actually pretty small, less than 10s. I'd saw too much for
unit tests, but we'll within the tolerable range for integration/functional
tests. Compared with the time I'd spend hacking together some brittle and
unrealistic in-memory alternative, I much prefer to use a real database.

------
awb
Cool project. Really appreciate the "When to use DuckDB" and "When to not use
DuckDB" sections.

Anyone using this in production? Also, any benchmarks vs. other DBs?

~~~
legg0myegg0
I work at a Fortune 100 company and we have this in production for our self-
service analytics platform as a part of our data transformation web service.
Each web request can do multiple pandas transformations, or spin up it's own
DuckDB or SQLite db and execute arbitrary SQL transformations. It fits our use
case like a glove and is super fast to/from Pandas.

------
sradman
3-month old HN thread [1] with good information from the authors and others.

“An embeddable column store for data science” (Pandas and dbplyer) would have
been a good description at that time but the addition of JDBC/Java support
expands the use cases.

[1]
[https://news.ycombinator.com/item?id=23287278](https://news.ycombinator.com/item?id=23287278)

------
mastazi
I think the title chosen by the poster misses the mark a bit, the main point
seems to be that it’s an OLAP DB optimised for analytics.

------
ryndbfsrw
The performance of DuckDB's predecessor, MonetDBLite, was/is stellar. DuckDB,
when I tried in 6 months ago, was a bit behind in terms of csv parsing or
aggregation by groups and didn't seem much faster than regular sqlite so I
didn't really jump at it. Does anyone here know if the performance improved
since then and how it compares to MonetDBLite?

~~~
mytherin
We are currently actively working on performance optimization, and have
recently finished implementing parallelization. Expect more performance
improvements soon :)

------
Piisamirotta
This is slightly offtopic but do you guys have any good guides/best
conventions how to save timeseries data to sqlite/similar? I'm no DB expert so
struggling with this. I'm having a dynamic number of measurements that usually
have the same timestamp. So one table for timestamp_id, signal_id and value?

------
jononor
What would be the recommended way to migrate data from SQLite and DuckDB?
Should I dump SQL from SQLite and load in DuckDB? Or use Parquet, or CSV
export/import? I think those interested in the tagline "like SQLite but more
features" might be interested in a quick howto/guide like that.

------
sterlinm
This talk they did with a database research group at CMU may be of interest as
well. [https://youtu.be/PFUZlNQIndo](https://youtu.be/PFUZlNQIndo)

------
microtherion
A suggestion for the web site: It would be good to mention the license more
prominently. I had to click through to Github to find out what it was, and I
believe this information would matter to more people than just me.

~~~
microtherion
Correction: As CRConrad, whose comment was inexplicably downvoted, pointed
out, the MIT License IS mentioned on the front page (And the Wayback Machine
proves it was mentioned before the HN post).

------
jsumrall
Very cool project! Seems basically to be a column store and stores everything
in a single file.

That reminds me of LMBD, which is similar in some ways, except bring a
key/value store. Very highly recommend checking it out too to see what
embeddable DB makes sense for your project.

[http://www.lmdb.tech/doc/](http://www.lmdb.tech/doc/)

------
chrismorgan
What is it that makes a database an OLAP database rather than a generic
relational database?

Expressed otherwise: why would I choose DuckDB over SQLite, or SQLite over
DuckDB?

~~~
FridgeSeal
OLAP databases can/are still Relational databases. The difference is that
they’re optimised for different workloads.

SQLite/MySQL/Postgres/MSSQL etc are all OLTP databases whose primary operation
is based around operations on single (or few) rows.

OLAP databases like ClickHouse/DuckDB, Monet, Redshift, etc are optimised for
operating on columns and performing operations like bulk aggregations, group-
bys, pivots, etc on a large subsets or whole tables.

If I was recording user purchases/transactions: SQLite.

If I was aggregating and analysing a batch of data on a machine: DuckDB.

I read an interesting engineering blog from Spotify(?) where in their data
pipeline instead of passing around CSV’s or rows of JSON, passed around SQLite
databases: DuckDB would probably be a good fit there.

~~~
chrismorgan
I gather that there may be some differences in functionality offered, but that
it’s probably not much. So I presume performance is the biggest area of
difference. In that regard: how big _are_ the differences in performance for
each category’s representative workloads?

~~~
legg0myegg0
Before the latest optimization, and only using 1 core, vs. SQLite we were
seeing 133x performance on a basic group by or join, and about 4x for a pretty
complex query. It was roughly even to Pandas in performance, but it can scale
to larger than memory data and now it can use multiple cores! As an example, I
could build views from 2 Pandas DataFrames with 2 columns and 1 million rows
each, join them, and return the 1 million row dataset back to Pandas in 2
seconds vs. 40 seconds with SQLite/SQLAlchemy... Pretty sweet. DuckDB is going
to be even faster now I bet!

~~~
chrismorgan
How about the other way? When will SQLite perform better than DuckDB?

~~~
FridgeSeal
If your workload is lots of singular INSERT or UPDATE’s.

------
3np
Please think twice before choosing an embeddable database for hostable
software. I see so many open-source projects that get hampered own the line
because they chose sqlite and there's now a SPoF that would require a
significant rewrite to eliminate - especially as sqlite depending on
configuration either corrupts or becomes extremely slow when stored on
distributed/network file systems. I assume this goes for duckdb as well.

Embeddable databases absolutely have legit use-cases, but I see them being
used for projects where it'd have saved so much trouble to use a proper ORM
(perhaps with sqlite support!) from the beginning.

~~~
brunoluiz
Not sure where an ORM would help in this case. If anything, it might get in
the way sometimes (in terms of performance), as some queries might not be well
optimised.

But yeah, people shouldn't just "follow the hype". They should always ask why
would use this and not that.

~~~
737maxtw
I think ORM might have been brought up because in many embedded DBs there is
some level of language native bindings that eliminate the need for an ORM. A
couple of examples that come to mind would be mnesia (KV store rather than SQL
but still an example) or some of the SQLite providers for c# which give you
their own mini orm (that you will have to refactor if you change out dbs
instead of using sqlite with an orm from the get go)

My preference for ORMs is still on the 'give me a SQL DSL and map my objects'
side. Let me use SQL as SQL, give me convenience in interfacing with the DB.
Complex features like object tracking tend to be nice up front but require
extra diligence down the line to avoid pitfalls, and tend to be risky from a
standpoint of composability.

Sorry. Almost went on an ORM rant there.

------
davidgaleano
Can multiple separate clients access the same single-file database at the same
time?

~~~
hfmuehleisen
One of the DuckDB authors here. The answer is yes, but only in read-only mode
at the same time.

~~~
davidgaleano
What about one single client writing and multiple clients in read-only mode?
Any problems with storing the file on network storage? Basically, how far can
you push it before it is better to just use PostgreSQL?

------
KingOfCoders
Sorry for my perhaps stupid question,

has anyone tried this with ESP32/Arduino? I would need an anlytical DB for an
hiking application (Temperatur/Pressure data etc.)

~~~
Avamander
Ooh, this sounds like a fun use-case but a power loss sounds super painful.

~~~
KingOfCoders
Would also store on MicroSD.

~~~
jimsmart
That’s part of the problem. Power loss during SD card write will, more likely
than not, cause corruption - FAT file systems do not have journaling.

~~~
KingOfCoders
Well yes. But then it's just hiking not emergency services, SD cards also do
break, so I would need dual slot etc.

------
torrance
Strict typing has been the one big feature missing from SQLite, that this
presumably brings.

~~~
gralx
Foreign keys enabled by default or by configuration setting would be another
big feature missing from SQLite3, but I couldn't find any mention of foreign
keys in DuckDB's documentation.

~~~
joppy
SQLite has foreign keys, but they are disabled by default for backwards-
compatibility reasons.
[https://sqlite.org/foreignkeys.html](https://sqlite.org/foreignkeys.html)

~~~
gralx
And it's not possible to enable them by default - there is no existing
configuration setting to override that design choice. They hope to include the
feature in SQLite4, as I understand it.

~~~
ComputerGuru
SQLite 4 was killed off in favor of just bringing a few changes to SQLite 3
iteratively.

[https://www.sqlite.org/src4/doc/trunk/www/index.wiki](https://www.sqlite.org/src4/doc/trunk/www/index.wiki)

------
aabbcc1241
Simple, embedded database deserve more attention, especially those with native
language support, e.g. map,filter,reduce methods on wrapped data that behave
similar to javascript array; and support to nested data structure (like auto
join and normalize).

(I made one using Proxy API, called ref-db. The cons is it require manual
indexing to speed up lookup by non-primary key)

------
xiaodai
Say I want to fix logistic Regression. Can I do that in database? Or have to
extract it out as a dataframe? Cna I stream the data?

~~~
winrid
If you want to do linear regression aggregations with any DB, one thing you
can do is store the coifficents and then aggregate them on request. You
sacrafice some accuracy for speed.

~~~
xiaodai
Model fitting. How to do?

~~~
prionassembly
Stochastic gradient descent.

------
cmrdporcupine
I love/hate it when I see pieces of software come out that would solve
problems I had 10 or 15 years ago and had imperfect solutions for. This would
have been great for a software project for call centre analytics I was
employed to work on back in the mid-2000s. I believe I used HSQL (embedded
Java DB) but it was really not the right tool. At the time I looked at MonetDB
but its architecture wasn't right for what I was doing. This would have fit
the niche perfectly.

This is really neat. Wish I had a problem I could use it for now.

~~~
tracker1
I used embedded firebird for a couple projects in the early/mid 2000's myself.
Wish it were a bit more popular.

------
ulanmaasan
So I tried a simple query in the live demo and it seems that the WHERE clause
isn't filtering anything. Am I missing something?

    
    
      SELECT * FROM part WHERE p_size = 7;

~~~
mytherin
Looks like a bug, likely related to an optimizer since adding count(*) does
produce the correct result. I will have a better look after the weekend. The
demo runs a relatively old version so it’s possible it has already been fixed
in the meantime. Thanks for the report!

------
agustif
I was discussing with a coworker what would be the best option, embbbedabble
or service for a database runnning for a Windows GUI app.

My bet was on embeddable, so SQLite, is that the right choice?

It seeems a nightmarish scenario to have 500 postgres instances running on
client machines and being accessible from the internet albeit with security
enabled and such...

am i right?

~~~
alecco
Countless mobile apps use SQLite.

~~~
agustif
In this case though I'm talking of a COBOL backend very old windows app.

We want to move data from COBOL files to a database, we're making our own
migration tools/scripts in python (In order to mainly rename weird name
table/column names to sanish ones), and we can target any SQL flavour/database
we want to insert that data at.. the question is what's the better call
here...

SQLite seems pretty good enough, at least until we can move our whole app to a
more web-friendly environment at which point something like postgres might
make more sense? but that could take a couple years... (we're a small
team/company in a very specific niche/locale)

~~~
737maxtw
SQLite works well for app files. Bentley uses it for all sorts of stuff and I
had way fewer corrupt files in their format than Autodesks. :)

Do note that sqlite itself doesn't have the kindest handling of date/times,
you are at the mercy of your data provider (unless you are going native)
Especially when timezones are involved.

If it's running on windows machines you could consider Sql Server LocalDB
(runs in user mode, i believe you are allowed to distribute as such in an
app). But it doesn't sound like that would make sense from what I've read for
your case and tech stack.

Sqlite is a good start because even if you are wrong, it should be pretty dang
easy to migrate to whatever is right. :)

~~~
agustif
First of all thank you for all this info.

Actually SQL Server was one of the other contendents, but I disregard it due
to it not being open source and having some hard caps in the free license (I
think it's 10GB)...

But, I didn't knew about SQLserver local db, so we will look into that!

------
kfk
How does duckdb compare to aws athena or spectrum? Both services offer sql
directly on parquet files. I guess you would say duckdb is not distributed but
if you create one on time when the user needs to run the sql on the parquet
you’d be in a similar use case?

------
stormdennis
So could the DuckDB, Postgresql combination function in a manner analogous to
Firebird which has both embedded and server modes. I'd find that really
exciting

------
dcl
I will definitely look at using this for some personal projects. The R API is
greatly appreciated. Thanks for alerting me to it.

------
akkishore
Can we create a view over multiple parquet files?

~~~
mytherin
Yes, totally possible. You can use UNION ALL to merge them together. We are
also working on globbing support for the Parquet reader that should be
released next week [1].

[1]
[https://github.com/cwida/duckdb/issues/773](https://github.com/cwida/duckdb/issues/773)

------
rilut
Really want to use this for EDA. I hope it has support for client apps like
DBeaver or something

------
pknerd
Any GUI Client to explore data?

~~~
infinite8s
Web or desktop?

------
stormdennis
This sounds great. I wonder if there are plans for an ODBC driver at any
point?

------
zmmmmm
Amazing work!

It was asking a lot, but still :-( that JSONB and range datatypes are missing.

------
threcius
Good name.

------
xgenecloud
Kudos, much needed. Neat documentation. Would love to try with JS APIs.

------
unnameduser1
Can someone recommend whats the best way to connect to a memory DuckDB
instance from php(php_fpm) and get return in usable format?

E.g. would be used for offline statistical analysis

Alterntively, any way to export cli sql results as csv?

------
secondcoming
Is this backward compatible with SQLite?

~~~
jononor
The on-disk format is different it seems. But there is an implementation of
the SQLite C API for interacting with the database. So it depends on which
level / what purpose you want compatibility.

------
mraza007
Just curious about these Do you think embeddable databases are common in car
navigation systems

------
RileyJames
This sounds like a solution I’d use. Embedded. Simplifies simple sites.

