
Recently minted database technologies that I find intriguing - biggestlou
https://lucperkins.dev/blog/new-db-tech-1/
======
petercooper
I edit a database newsletter – [https://dbweekly.com/](https://dbweekly.com/)
– so I tend to keep my eyes out for new releases, what's coming along, and
whatnot. I thought I'd share a few more things that have jumped out at me
recently in case anyone's in the mood for spelunking.

1\. QuestDB – [https://questdb.io/](https://questdb.io/) – is a performance-
focused, open-source time-series database that uses SQL. It makes heavy use of
SIMD and vectorization for the performance end of things.

2\. GridDB – [https://griddb.net/en/](https://griddb.net/en/) – is an in-
memory NoSQL time-series database (there's a theme lately with these!) out of
Toshiba that was recently boasting 5 million writes per second and 60 million
reads per second on a 20-node cluster.

3\. MeiliSearch -
[https://github.com/meilisearch/MeiliSearch](https://github.com/meilisearch/MeiliSearch)
– not exactly a database but basically an Elastic-esque search server written
in Rust. Seems to have really taken off.

4\. Dolt – [https://github.com/liquidata-
inc/dolt](https://github.com/liquidata-inc/dolt) – bills itself as a 'Git for
data'. It's relational, speaks SQL, but has version control on everything.

TerminusDB, KVRocks, and ImmuDB also get honorable mentions.

InfoWorld also had an article recently about 9 'offbeat' databases to check
out if you want to go even further:
[https://www.infoworld.com/article/3533410/9-offbeat-
database...](https://www.infoworld.com/article/3533410/9-offbeat-databases-
worth-a-look.html)

Exciting times in the database space!

~~~
gen220
Has anybody tried using Dolt? It looks quite young.

I think this idea is really valuable, but I usually see it implemented as a
time-series extension on top of Postgres or MySQL, like SQLAlchemy-Continuum
or TimescaleDB. That is, you can get most of the useful git-like time-travel
semantics (modulo schema migrations) out of timeseries data with a separate
transaction history table.
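Roughly the pattern I mean, sketched with stdlib sqlite3 (table and column
names are just illustrative):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
        -- every write is also appended here; history rows are never updated
        CREATE TABLE accounts_history (id INTEGER, balance INTEGER, txn_time TEXT);
    """)

    def set_balance(account_id, balance, txn_time):
        # write the current state and append an immutable history row
        db.execute("INSERT OR REPLACE INTO accounts VALUES (?, ?)",
                   (account_id, balance))
        db.execute("INSERT INTO accounts_history VALUES (?, ?, ?)",
                   (account_id, balance, txn_time))

    def balance_as_of(account_id, txn_time):
        # "time travel" = latest history row at or before txn_time
        row = db.execute(
            "SELECT balance FROM accounts_history WHERE id = ? AND txn_time <= ? "
            "ORDER BY txn_time DESC LIMIT 1", (account_id, txn_time)).fetchone()
        return row[0] if row else None

    set_balance(1, 100, "2020-06-01")
    set_balance(1, 250, "2020-06-10")
    print(balance_as_of(1, "2020-06-05"))  # 100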

I'm curious what Dolt's performance profile looks like (i.e. how reads and
writes scale with "branch" count and row count, and how they handle indices
across schema migrations), since the aforementioned solutions on Postgres are
building on an already-very-performant core.

edit: TerminusDB also looks very cool, although it's approaching this problem
from a graph-database angle rather than a relational one. Their architecture
(a Prolog server on top of a Rust persistence core) also seems super
fascinating; I'd love to read more about how they chose it.

~~~
Fiahil
I work as a SWE at a large AI consultancy. We've been experimenting with "git
for data" products for a while, and we've been trying to get rid of them
(notably Pachyderm) for at least 2 years.

What they all share is a) awful performance, and b) bad design.

Git semantics, ("branche", "merge", "commit") are not well suited for data,
because merging dataframes and creating "branches" often leads to
misunderstandings and delays. Time travel is very nice to have, but it's often
the case where you would like to consume your input datasets at different
point in time in the same repository (unless you do one dataset per
repository, but then, what's the point ?).

Performance is bad because all updates need to go through some kind of
coordination mechanism (etcd, ZooKeeper, or Raft directly). In a single-
instance scenario, you often end up flooding it or needing additional memory
to cope with the load. However, you could deliver high throughput and high
availability by using proper sharding and distributing updates to specific
masters (as you would in any actor-based architecture).

As a replacement, we're now using a custom event-sourcing framework on top of
AWS S3/Azure blob. It's faster, more reliable, and most importantly, better
designed.
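The gist of it, very roughly (not our actual framework; the bucket and prefix
names are invented, and this assumes the boto3 package):

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET, PREFIX = "my-datasets", "events/orders/"

    def append_event(seq, event):
        # zero-padded sequence numbers keep S3's lexicographic listing ordered
        s3.put_object(Bucket=BUCKET, Key=f"{PREFIX}{seq:012d}.json",
                      Body=json.dumps(event))

    def replay():
        # rebuild current state by folding over the immutable log, in order
        state = {}
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
                event = json.loads(body.read())
                state[event["id"]] = event  # last-write-wins per entity
        return state

Since events are append-only and ordered, replays are reproducible, which is
the trait we actually cared about from the git model.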

~~~
zachmu
We're building a very literal "git for data" product, Dolt (doltdb.com). I'm
very curious about this criticism:

> Git semantics, ("branche", "merge", "commit") are not well suited for data,
> because merging dataframes and creating "branches" often leads to
> misunderstandings and delays.

Can you give a concrete example of what you mean? I'm wondering if this is a
failing of the tool you're using or of the model itself.

> Time travel is very nice to have, but it's often the case where you would
> like to consume your input datasets at different point in time in the same
> repository

Dolt supports doing exactly this. See my reply to a child comment below.

> Awful performance

It's not obvious to me why this needs to be true in the general case, unless
it's caused by b) bad design. Are you mostly talking about write performance,
or are reads also a problem?

~~~
Fiahil
I don't want you to take it personally, but, from my point of view, Dolt falls
into the same bucket as DVC, Pachyderm, and some others. The shortcomings don't
come from the tools themselves; it's a fundamental issue with the design.

First, branching and merging. In git, branching allows you to make
uncoordinated parallel progress for the price of a reconciliation step. In a
datastore, you want the exact opposite: a single, consistent, available source
of truth. Having different branches of the same dataset brings more confusion
while solving zero problems.

Then, commits. In git, a commit represents a snapshot of the entire state of
your repository. This is particularly attractive because it guarantees that
your code will build no matter what kind of update follows (harmless: editing
a readme; severely destructive: removing a src folder). In a datastore, this
is nice but unnecessary. As I mentioned in this thread, datasets move at
different speeds, and attaching a new hash to something that didn't change
doesn't add value. However, I have to admit I failed to mention earlier that
datasets are often unrelated and not relational. This would be worth
reconsidering if they were, of course. Most of the time, a dataset is
represented as a single dataframe (or a single collection of dataframes).

There are some points where git semantics make sense: immutability of commits,
linearizability within branches. Both are extremely important if you want to
enable reproducibility of your pipeline. These are traits that come from Event
Sourcing.

Reproducibility is also claimed by DVC and Pachyderm, but their issue is more
a problem of trying to do too many things at once without managing to do them
right. Running code within Pachyderm pipelines was a recipe for disaster and
the first thing we got rid of.

As for performance, the write side is where it matters, because writes need to
be coordinated. Reads are almost never an issue with good caching. In any
case, such a system should be robust enough to fill the gap between CSV files
sent to S3 and a full Kafka cluster, e.g. not complaining at a few TB. To my
knowledge, the only multi-leader datastore suitable for storing sharded
datasets as a continuous log is Kafka.

------
chickenpotpie
I’ve always found it strange that in almost every job I’ve had, databases have
been one of the most important pieces of the architecture but the least
debated. I’ve spent hours debating languages and frameworks, but databases
always come down to whatever we have a license for or what others at the
company are using. Engineering teams will always say they make sure to use the
right tool for the job, but no one ever talks about whether it’s right to keep
using the same database for a new product.

~~~
mywittyname
My experience is that places with DBAs tend to keep much tighter control over
database design than those without. I've worked at several places where
developers did not touch the database schema. They couldn't even propose
changes; instead, you'd give the DBA team the requirements and they would give
you a solution.

------
rst
Materialize is neat, but there are other database systems that refresh at
least some materialized views on the fly while being smart about not
rebuilding the entire view every time. See, for example, Oracle, where FAST
REFRESH ON COMMIT does most of what Materialize is advertised as doing, at
least for views that the feature supports (restriction list here:
[https://stackoverflow.com/questions/49578932/materialized-
vi...](https://stackoverflow.com/questions/49578932/materialized-view-in-
oracle-with-fast-refresh-instead-of-complete-dosnt-work) ). Mind you, this
comes with Oracle's extremely hefty price tag, so I'm not sure I'd recommend
it to anyone who isn't already stuck with Oracle, but it is technical
precedent.

It would be interesting to compare notes, and see what Materialize does
better.

~~~
frankmcsherry
Hi folks. Frank from Materialize here.

The main differences you should expect to see are generality and performance.

Generality, in that there are fewer limitations on what you can express.
Oracle (and most RDBMSs) build their Incremental View Maintenance on their
existing execution infrastructure, and are limited to the queries whose update
rules they can fit into that infrastructure. We don't have that limitation,
and are able to build dataflows that update arbitrary SQL92 at the moment.
Outer joins with correlated subqueries in the join constraint? Fine.

Performance, in that we have the ability to specialize computation for
incremental maintenance in a way that RDBMSs are less well equipped to do. For
example, if you want to maintain a MIN or MAX query, it seems Oracle will do
this quickly only for insert-only workloads; on retractions it re-evaluates
the whole group. Materialize maintains a per-group aggregation tree, the sort
of structure that previously led to a 10,000x throughput increase for TPCH
Query 15 [0]. Generally, we'll build and maintain a few more indexes for you
(automatically), burning a bit more memory but ensuring low latencies.
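To make the aggregation-tree idea concrete, here's a toy sketch (plain Python,
nothing like our actual Rust implementation): values live in hash buckets at
the leaves of a tournament tree, each internal node caches the min of its
children, and an insert or retraction re-aggregates only the log-depth path
from the touched bucket to the root rather than rescanning the whole group.

    class MinTree:
        def __init__(self, buckets=1024):
            self.b = buckets
            self.leaf = [dict() for _ in range(buckets)]  # value -> multiplicity
            self.tree = [None] * (2 * buckets)            # cached minimums

        def _update(self, value, delta):
            i = hash(value) % self.b
            counts = self.leaf[i]
            counts[value] = counts.get(value, 0) + delta
            if counts[value] == 0:
                del counts[value]
            # recompute this leaf's min, then fix only its ancestors
            node = self.b + i
            self.tree[node] = min(counts) if counts else None
            node //= 2
            while node >= 1:
                kids = [m for m in (self.tree[2 * node], self.tree[2 * node + 1])
                        if m is not None]
                self.tree[node] = min(kids) if kids else None
                node //= 2

        def insert(self, value):
            self._update(value, +1)

        def retract(self, value):
            self._update(value, -1)

        def min(self):
            return self.tree[1]  # the root holds the group-wide minimum

    t = MinTree()
    for v in (5, 3, 8):
        t.insert(v)
    t.retract(3)    # only a log-depth path is re-aggregated
    print(t.min())  # 5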

As far as I know, Timescale's materialized views are for join-free aggregates.
Systems like Druid were join-free and are starting to introduce limited forms
of joins. KSQLdb has the same look and feel, but a) is only eventually
consistent and b) round-trips everything through Kafka. Again, all AFAIK and
could certainly change moment by moment.

Obviously we aren't allowed to benchmark against Oracle, but you can evaluate
our stuff and let everyone know. So that's one difference.

[0]: [https://github.com/TimelyDataflow/differential-
dataflow/tree...](https://github.com/TimelyDataflow/differential-
dataflow/tree/master/tpchlike)

~~~
sradman
Materialize sounds like it has reinvented Michael Stonebraker's StreamSQL [1]
and SAP's Continuous Computation Language (CCL) [2], which was created as part
of a StreamSQL competitor named Coral8 and lives on in an enterprise product
now named SAP HANA Smart Data Streaming. This space has gone by many names:
Streaming SQL, Complex Event Processing (CEP), and Event Streaming.

I think the Continuous Computation Language (CCL) name captures the essence of
these systems: data flows through the computation/query.

These systems have always had promise but none have found anything but niche
adoption. The two most popular use cases seem to be ETL-like dataflows and
OLAP style Window queries incrementally updated with streaming data (e.g.
computations over stock tick data joined with multiple data sources).

[1]
[https://en.wikipedia.org/wiki/StreamSQL](https://en.wikipedia.org/wiki/StreamSQL)

[2]
[https://help.sap.com/doc/PRODUCTION/e1b391d2a3f3439fbab27ed8...](https://help.sap.com/doc/PRODUCTION/e1b391d2a3f3439fbab27ed882618873/2.0.00/en-
US/streaming_ccl_reference.pdf)

~~~
frankmcsherry
The projects you've mentioned are attempts to address stream processing needs
with a SQL-like language. That is fundamentally different from providing
incremental view maintenance of _actual_ SQL using streaming techniques (what
Materialize does).

If you want to maintain the results of a SQL query with a correlated subquery,
StreamSQL in Aurora did not do that (full query decorrelation is relatively
recent, afaiu). I have no idea what TIBCO's current implementation does.

If you want to maintain the results of a SQL query containing a WITH RECURSIVE
fragment, you can do this in differential dataflow today (and in time,
Materialize). I'm pretty sure you have no chance of doing this in StreamSQL or
CCL or CQL or BEAM or ...

The important difference is that lots of people do actually want to maintain
their SQL queries, and are not satisfied with "SQL inspired" languages that
are insert-only (Aurora), or require windows on joins, or only cover the easy
cases.

~~~
sradman
> The projects you've mentioned are attempts to address stream processing
> needs with a SQL-like language. That is fundamentally different from
> providing incremental view maintenance of actual SQL using streaming
> techniques (what Materialize does).

With all due respect, CREATE SINK and CREATE SOURCE are SQL-like. I would
argue that the pipeline created from the set of SINKs and SOURCEs is the key
concept to grasp for developers new to your platform. The purity of the
PostgreSQL CREATE MATERIALIZED VIEW syntax and other PG/SQL constructs seems
like a minor selling point, in my (very narrowly informed) opinion. I hope I'm
wrong.

Our difference of opinion involves marketing language and perceived
differentiators. There are some important use cases for continuous SQL queries
over Kafka-like data streams that remain unaddressed (as far as I know). I
hope Materialize gains traction where others have failed to do so. If PG/SQL
compatibility was the only thing holding back this style of solution then
kudos to you and your team for recognizing it. Good luck (honestly).

------
sudhirj
I'm working on a Redis adapter for DynamoDB – Dynamo is really a distributed
superset of Redis, and most of the data structures that Redis has scale
effectively to the distributed hash table + B-tree-like system that Dynamo
offers. Having a well-known and well-understood API like Redis is a boon for
Dynamo, whose own API is much more low-level and esoteric.
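To give a flavour of the mapping (a sketch, not the real redimo
implementation; the table and attribute names are invented, and it assumes
the boto3 package): a Redis sorted set becomes DynamoDB items whose partition
key is the set name and whose sort key is the member, with the score stored
as an attribute.

    from decimal import Decimal
    import boto3

    table = boto3.resource("dynamodb").Table("redimo")

    def zadd(key, member, score):
        # O(1) hash write, like Redis ZADD (DynamoDB numbers are Decimals)
        table.put_item(Item={"pk": key, "sk": member,
                             "score": Decimal(str(score))})

    def zscore(key, member):
        # point lookup on (partition key, sort key), like ZSCORE
        item = table.get_item(Key={"pk": key, "sk": member}).get("Item")
        return item["score"] if item else None

Range queries by score (ZRANGEBYSCORE and friends) would go through a
secondary index keyed on (set name, score), leaning on Dynamo's B-tree-like
sort keys.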

The Go library is in beta; I'm working on a server that's wire-compatible with
Redis.

[https://dbproject.red](https://dbproject.red)

[https://github.com/dbProjectRED/redimo.go](https://github.com/dbProjectRED/redimo.go)

~~~
tmzt
Will you be able to support Redis-native replication from DynamoDB?

~~~
sudhirj
It would be super complicated, but technically feasible. DynamoDB has a change
log, called Streams, that would technically work, with a lot of aggregation,
even given the distributed nature. But I would think about it only as a late-
stage and very expensive feature.

------
barking
I really wish one of the existing DB technologies, Firebird, got a shot in the
arm. It has both embedded and server modes, which makes it unique as far as I
know. Also, the database is a single file which, thanks to Firebird's "careful
write" methodology, remains consistent at all times; not only can you make a
backup at any time because it has MVCC, but even a file copy of the database
file with open transactions should not be corrupted. The installer size comes
in under 10 MB. It's being actively improved and is open source with a very
liberal licence, but sadly it only gets a tiny fraction of the attention that
SQLite, Postgres, etc. receive.

------
willvarfar
My understanding of TileDB is that it is 100% client-side. There is no server.
In a sense it's like handling ORC or Parquet or even SQLite files on S3
(except TileDB's are fancy R-trees), with a delta-lake-like manifest file for
transactions too.

I think in the future there's going to be a sine wave of smart clients
consuming S3 cleverly, then smartness growing in the S3 interface so that
constraints and indices and things happen in the storage again, and back and
forth...

~~~
jakebol
This is a good description, except that TileDB (the open-source client) is not
transactional but eventually consistent, at least on S3 and other object
stores.

I like your point about consuming S3 cleverly; it's often difficult to get
good out-of-the-box performance from S3, so abstracting that away to the
degree possible is good for end users. The cloud vendors, though, are always
one or two steps ahead of companies that build upon their services. AWS
Redshift, for instance, can already pre-index objects stored on S3 to
accelerate queries at the storage layer. It's difficult as a third-party
vendor to compete with that.

~~~
biggestlou
This is a very interesting development that I'd like to learn more about.
Whenever I've played around with writing databases (just as toy projects) I've
always done so using RocksDB or something similar as a backend. This "thick
client" model, though, seems to have a lot of potential benefits, most notably
no need to worry about disk space or volumes (so say goodbye to a bunch of
config parameters) and no need for a tiered storage setup or S3 migration
tools (already accomplished!). Not ideal for most use cases but intriguing for
some!

~~~
jakebol
There are a lot of issues with S3, though: latency, poor performance for small
reads/writes, timeouts, API rate limits, API costs, and consistency issues
poorly understood by third-party developers.

A "thick client" also doesn't perform well unless that client is located on a
node in the same region. I think, as with everything, it works well in some
cases and not well in others.

------
slifin
I do wonder how databases like Datomic and Crux are perceived (if at all) in
the wider database community.

~~~
dgb23
I'm frankly shocked by how well these Clojure Datalog DBs work. Both the data-
modelling side (datoms) and the query side are extremely expressive and well
integrated into Clojure itself.

The principle, from my perspective, is: what would a database (API) look like
if both change of data over time and structural flexibility were first-class
concepts? The result is almost mindbogglingly simple.

Datalog is already such a powerful language in and of itself, and I really
wonder why it is still such a niche thing.

Don't get me wrong. SQL is great. It is there for a reason. And it is and
should be "the default" for most use-cases. But Clojure Datalog DBs have a
fairly clear and large advantage, especially when data is temporal and/or
graph-like.

------
matlin
I support FoundationDB's approach to databases, which is basically: provide a
consistent, distributed, ordered key-value store, then build whatever type of
database you need on top of it, whether that's RDBMS, document, graph, etc.
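A toy illustration of the layer idea (not FoundationDB's actual API): given
only an ordered key-value store, a "document layer" is just a key-encoding
convention plus range scans.

    class OrderedKV:
        # stand-in for FDB's consistent, ordered key-value store
        def __init__(self):
            self.data = {}

        def set(self, key, value):
            self.data[key] = value

        def scan(self, prefix):
            # range scan over the ordered keyspace
            return sorted((k, v) for k, v in self.data.items()
                          if k[:len(prefix)] == prefix)

    kv = OrderedKV()

    def put_doc(collection, doc_id, doc):
        # each field becomes one KV pair under a tuple-encoded key
        for field, value in doc.items():
            kv.set(("doc", collection, doc_id, field), value)

    def get_doc(collection, doc_id):
        return {k[3]: v for k, v in kv.scan(("doc", collection, doc_id))}

    put_doc("users", "u1", {"name": "Ada", "karma": 42})
    print(get_doc("users", "u1"))  # {'name': 'Ada', 'karma': 42}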

With that said, CouchDB 4.0 (on FDB) is going to be killer. Master-Master
replication between clients and server with PouchDB is phenomenal when you
remove the complicated eventual consistency on the server side.

And as a plug, I'm building a multi-tenant/multi-application database on top
of it.

------
andrewstuart
I've found databases fascinating and try various DBs as they come out.
I always find some issue or caveat or problem and I decide in the end that
Postgres gets most of the way there anyway and I return to Postgres.

Whenever I get tempted by a shiny new database I remind myself "don't bet
against Postgres".

~~~
akulkarni
+1000 Postgres. This is exactly why we chose to build TimescaleDB on top of
PostgreSQL. Reliability, JSON, mature tooling, tons of connectors.

------
grizzles
Software is advancing so fast. It's interesting to constantly reconsider the
things I consider myself ahead of the curve on vs. behind the curve on. Prisma
looks great, so I've updated my "I want functional DBs, not ORMs" post:
[https://github.com/ericbets/erics-
designs/blob/master/funcdb...](https://github.com/ericbets/erics-
designs/blob/master/funcdb.md)

~~~
kiwicopple
From your blog post and the linked article:

> making a database look like an rpc api

I'd recommend checking out PostgREST for this (if you're using Postgres). We
used this approach in my previous startup quite successfully.
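The flavour of it: PostgREST turns each table or view into an HTTP endpoint,
so the database itself becomes the RPC API. A sketch against a hypothetical
local instance exposing a `todos` table (assumes the requests package):

    import requests

    base = "http://localhost:3000"

    # INSERT INTO todos ... becomes a POST
    requests.post(f"{base}/todos", json={"task": "ship it", "done": False})

    # SELECT * FROM todos WHERE done = false becomes a filtered GET
    rows = requests.get(f"{base}/todos", params={"done": "is.false"}).json()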

We also have plans at Supabase[1] to make Postgres databases semi-ephemeral.
You'll be able to spin them up/down using a restful API. The schemas/types
will be accessible via a restful API: [https://supabase.github.io/pg-
api/](https://supabase.github.io/pg-api/). Not quite as simple as SQLite, but
a good alternative.

Databases are cool.

[1] [https://supabase.io](https://supabase.io)

------
pachico
> What I have yet to see but always secretly wanted, however, is a database
> that natively supports incremental updates to materialized views. Yep,
> that’s right: Materialize listens for changes in the data sources that you
> specify and updates your views as those sources change.

This is precisely one of the features that make ClickHouse shine.
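For instance (a sketch using the clickhouse-driver package; the table and
column names are invented): every INSERT into the source table incrementally
folds into the view, with no manual refresh step.

    from datetime import datetime
    from clickhouse_driver import Client

    client = Client("localhost")
    client.execute("""
        CREATE TABLE events (ts DateTime, amount UInt64)
        ENGINE = MergeTree ORDER BY ts
    """)
    # the view is updated incrementally as rows arrive in `events`
    client.execute("""
        CREATE MATERIALIZED VIEW daily_totals
        ENGINE = SummingMergeTree ORDER BY day
        AS SELECT toDate(ts) AS day, sum(amount) AS total
        FROM events GROUP BY day
    """)
    client.execute("INSERT INTO events (ts, amount) VALUES",
                   [(datetime(2020, 6, 15, 12, 0), 5)])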

------
mywacaday
Does anybody know of a good educational resource on software best practices
that is kept up to date? Ideally something that does not include the latest
bleeding edge but rather things that are battle-hardened or getting there.
Something that includes both open-source and commercial software would be
ideal.

------
StavrosK
This is only tangentially related, but I rediscovered an old project of mine
from years ago today and am rather excited about it:

[https://github.com/skorokithakis/goatfish](https://github.com/skorokithakis/goatfish)

It's basically a 200-line document database in Python that's backed by SQLite.
I need to store a bunch of scraping data from a script but don't want a huge
database or the hassle of making SQLite tables.

Goatfish is perfect because it stores free-form objects (JSON, basically),
only it lets you index by arbitrary keys in them and get fast queries that
way.

It's pretty simple, but a very nice compromise between the simplicity of in-
memory dicts and the reliability of SQLite.

~~~
SahAssar
These days I'm guessing it's better to just use SQLite's builtin JSON support
and expression indexes, right?
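Something like this, I mean (stdlib sqlite3; assumes SQLite was compiled with
the JSON1 extension, which it almost always is these days):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (body TEXT)")  # free-form JSON blobs

    # expression index on a key inside the JSON, so lookups don't scan
    db.execute("CREATE INDEX idx_name ON docs (json_extract(body, '$.name'))")

    db.execute("INSERT INTO docs VALUES (?)", ('{"name": "ada", "lang": "py"}',))
    row = db.execute(
        "SELECT body FROM docs WHERE json_extract(body, '$.name') = ?",
        ("ada",)).fetchone()
    print(row[0])  # the matching document, found via the index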

~~~
StavrosK
Probably so, which means I should adapt Goatfish to work as a layer over that,
since its API is much nicer than raw SQL.

------
statictype
>What I’m really hoping for is the emergence of extremely “hackable,”
resolutely non-monolithic DBs that provide a plugin interface for highly use-
case-specific data types,

Isn't this basically what FoundationDB is?

------
xchaotic
Interesting notes, but I feel like the DB itself has been commoditised and the
battle is elsewhere now. Anyone building a database engine today will find out
that to make it sustainable they also need an ecosystem on top of it: tooling,
community, paid support, active devs, consultants (for which they may have no
runway). Finally, I find anything that calls itself a database and uses S3 as
a backend a bit ridiculous. S3 has eventual consistency, so you can't do the
operations that differentiate a database from a file system.

------
hn_check
"What I have yet to see but always secretly wanted, however, is a database
that natively supports incremental updates to materialized views"

SQL Server ala 10+ years ago enters the discussion.

~~~
jiggawatts
I've actually used materialized views in production on SQL Server, and they're
great when they work, but they have far too many limitations. Most of these
make sense, but some of the join limitations are internal constraints of the
engine, not a fundamental limit. This prevents their use for a wide range of
scenarios where they would be useful...

------
moonchild
> And it’s worth noting that many have tried to do what Prisma does and failed
> because they.

Because they what?

~~~
biggestlou
Derp! Fixed that.

------
leetrout
I miss rethinkdb. I loved their approach and their tooling.

~~~
biggestlou
Yeah, very cool DB. I even considered doing a chapter on it in the next Seven
Databases in Seven Weeks book but alas...

------
tehlike
Mandatory mention: RavenDB. Probably the only LINQ-native database, with lots
of performance optimizations squeezed in.

------
remorses
I think that the GraphQL adapters like Hasura and Goke are also an important
innovation. For small MVP projects you can create a GraphQL API to query your
database directly from the frontend, which reduces development time by a
factor of 2 at least.

~~~
biggestlou
Hmmm. I'm not finding Goke anywhere. Could you provide a link?

------
imglorp
I'll just throw in a note for a new product, AWS's QLDB. It's an internal
managed product that combines a replicated, immutable, versioned document
database with ACID transactions and a provable history of every modification.
There's some streaming and a subset of SQL on the back end.

Something this focused should have a few applications where bit-level
auditability matters, e.g. financial records, chains of events, etc. Of course
it comes with some tradeoffs vs. a relational or KV DB.

I wonder if there would be room for a self-hosted clone?

------
lima
ClickHouse also supports incremental streaming from Kafka into a materialized
view.

You can even detach and reattach the view from its backing table.

------
coolleo
What is the best resource for creating your own database, just for learning
purposes? Great resources will be appreciated.

Thank you.

~~~
lhdj
[https://github.com/danistefanovic/build-your-own-x#build-
you...](https://github.com/danistefanovic/build-your-own-x#build-your-own-
database)

------
arauhala
Hi Luc,

What's your perspective on predictive databases like
[https://aito.ai](https://aito.ai)?

I'm one of the Aito.ai founders. If you would like to hear more, I'm happy to
talk one-to-one.

Regards, Antti

------
monksy
I love these kinds of posts. They're targeted towards what people are finding
interesting and they're highly tech related. It's a great way to find new
technology.

------
melvinroest
The website didn't load for me. So here it is:
[https://web.archive.org/web/20200615193041/https://lucperkin...](https://web.archive.org/web/20200615193041/https://lucperkins.dev/blog/new-
db-tech-1/)

Also, I'd like to add one database to the list (I've worked there for 3 weeks
now): TriplyDB [0]. It makes linked data easier.

Linked data is useful when people from different organizations want a shared
schema.

In many commercial applications one wouldn't want this, as data is the
valuable part of a company. However, scientific communities, certain
government agencies and other organizations -- that I don't yet know about --
do want this.

I think the coolest application of linked data is how the bio-
informatics/biology community utilizes it [1, 2]. The reason I found out at
all is that one person at Triply is working to see whether a similar thing can
be achieved in psychology. It might make conducting meta-studies a bit easier.

I read the HN discussions on linked data and agree with both the naysayers
(it's awkward and too idealistic [4]) and the yaysayers (it's awesome). The
thing is:

1\. Linked data is open, open as in open source; the URI [3] is baked into its
design.

2\. While the 'API'/triple/RDF format can be awkward, _anyone_ can quite
easily understand it. The cool thing is: this includes non-programmers.

3\. It's geared towards collaboration. In fact, reading between the lines,
I'd argue it's really good for collaboration among a big, heterogeneous group
of people.

Disclaimer: this is my own opinion; Triply does not know I'm posting this and
I don't care ;-) I simply think it's an interesting way of thinking about
data.

[0] triply.cc

[1] A friend of mine once modeled part of the biochemistry of C. elegans from
linked data into Petri nets:
[https://www.researchgate.net/publication/263520722_Building_...](https://www.researchgate.net/publication/263520722_Building_Executable_Biological_Pathway_Models_Automatically_from_BioPAX)

[2]
[https://www.google.com/search?client=safari&rls=en&q=linked+...](https://www.google.com/search?client=safari&rls=en&q=linked+data+and+biology&ie=UTF-8&oe=UTF-8)
\-- I quickly vetted this search

[3] I still don't know the difference between a URI and URL.

[4] I think back in the day, linked data idealists would say that all data
should be linked to interconnect all the knowledge. I'm more pragmatic and
simply wonder: in which socio-technological context is linked data simply more
useful than other formats? My current very tentative answer is those 3 points.

~~~
iaabtpbtpnn
I recently had to deal with some RDF data expressed as N-triples. So,
naturally I loaded it into a proper triplestore and embraced the whole W3C RDF
universe and started teaching myself SPARQL, right? Nah, instead I just
translated that shit into CSV and loaded it into Postgres, then post-processed
it into a relational schema that I can actually understand and query with a
language that other people at my company also know. The RDF was just a poorly
specified way of communicating that schema to me, along with a data format
that's no better than a 3-column CSV. Great stuff from the Semantic Web folks
here, real powerful technology.
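
The whole "translation," by the way, was about this involved (a sketch; real
N-triples needs a proper parser to handle escapes and typed literals):

    import csv

    with open("dump.nt") as nt, open("triples.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["subject", "predicate", "object"])
        for line in nt:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # '<s> <p> <o> .' -> three fields, trailing dot dropped
            s, p, o = line.rstrip(" .").split(" ", 2)
            writer.writerow([s, p, o])

    # then in psql: CREATE TABLE triples (...);
    #               \copy triples FROM 'triples.csv' CSV HEADER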

Edit: Also, to answer your question, the difference between a URL (Uniform
Resource Locator) and a URI (Uniform Resource Identifier) is that the URL
actually points to something, an object at a particular location, and you can
paste it into your web browser to view that something. A URI just uses a URL-
like scheme to represent identifiers, such that your domain and directory
structure provide a kind of namespace that you control. But as long as it
follows the format, your URI can contain literally anything, it doesn't have
to be human-readable or resolve to anything in a web browser. It might as well
be mycompany_1231241542345.

~~~
zozbot234
> ...I just translated that shit into CSV and loaded it into Postgres, then
> post-processed it into a relational schema that I can actually understand
> and query with a language that other people at my company also know. The RDF
> was just a poorly specified way of communicating that schema to me, along
> with a data format that's no better than a 3-column CSV. Great stuff from
> the Semantic Web folks here, real powerful technology.

I'm not sure what point you're trying to make; that _is_ exactly what RDF is
for! It's not an implementation technology, it's purely an interchange format.
You should _not_ be using a fully general triple store unless you really have
no idea what the RDF you work with is going to look like.

SPARQL is the same deal; it's good for exposing queryable endpoints to the
outside world, but if your queries are going to a well-defined relational
database with a fixed schema and the like, you should just translate the
SPARQL to SQL queries and execute those.

~~~
iaabtpbtpnn
Well I’m glad to hear that my solution was sane, but I just don’t see what
technological innovation was contributed by the RDF. The file was canonical
N-triples, AKA a CSV with space instead of comma. The predicates establish
relationships between subjects and objects, but those relationships could be
one to one, one to many, or many to many. Should a given predicate be modeled
relationally as a column, or with some kind of join table? I have no idea from
the RDF. Say, those objects, are the types I see attached to the values the
only types that could possibly appear for that predicate? Who knows! Sure, the
data has been interchanged, but the “format” is so generic that it’s useless.
Why not just give me a 3-column CSV and tell me to figure it out on my own,
rather than pretend that RDF provided some improvement?

~~~
melvinroest
Thanks for your comment on it by the way. I'm still in the phase of gathering
what everyone thinks of it. I've noticed that RDF seems a bit polarizing. I
have the suspicion that people who feel neutral about it also don't feel the
need to chime in.

------
tourist_on_road
How does TileDB compare to something similar like Milvus?

~~~
biggestlou
I haven't come across Milvus but I'll check it out!

------
gigatexal
Whatever became of RethinkDB? I think it was just too far ahead of its time;
it had some really interesting ideas.

------
davedx
Is tiledb useful for storing ML models?

~~~
Shelnutt2
You can certainly store ML models in TileDB. I've been working on an example
notebook showing how to store an OpenCV model in TileDB; it's not quite
finished yet but will be published soon.

Using Python as an example language, TileDB-Py offers simple functions, like
"from_numpy" [1], with which we can write basic and simple models directly.
For more advanced (and more common) cases of complex models, it's easy to
create an array to store the model and all associated metadata. TileDB's array
metadata and time-travel capabilities even let you see how your model changes
over time as you update it.

[1] [https://github.com/TileDB-Inc/TileDB-
Py/blob/bcaee16b194675f...](https://github.com/TileDB-Inc/TileDB-
Py/blob/bcaee16b194675f10235669fcb615928c0af964d/tiledb/libtiledb.pyx#L3974)
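
For the simple case, that looks roughly like this (the URI here is made up):

    import numpy as np
    import tiledb

    weights = np.random.rand(128, 64)   # a toy stand-in for a model
    tiledb.from_numpy("s3://my-bucket/model_v1", weights)

    with tiledb.open("s3://my-bucket/model_v1") as A:
        restored = A[:]                 # read it back as a numpy array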

Disclosure: I'm a member of the TileDB team.

~~~
davedx
Really cool, thanks for the reply.

------
biggestlou
REMOVED: I complained about the rewritten title in a way that was excessively
harsh and have removed that comment.

~~~
dang
The article title was linkbait: "you" and "should" are linkbait tropes,
especially when combined. The site guidelines call for rewriting such titles.
See
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html).

When we change titles, we always try to use representative language from the
article. I replaced it with a phrase from the article where it says what it's
about in a neutral way. I added a hyphen to "recently minted" because it seems
to me that grammar requires it. However, if we worship different grammar gods
I am happy to let yours hold sway. The hyphen is now toast.

This is bog-standard HN moderation. I dare say the reason you "really really
hate it" is probably because you notice the cases you dislike and weight them
much more heavily:
[https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...](https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=by%3Adang%20notice%20dislike&sort=byDate&type=comment).
Meanwhile the ones you don't find objectionable pass over relatively
unnoticed. That is bog-standard HN perception bias.

If we didn't meticulously try to keep titles linkbait-free and less-misleading
(more hyphens, sorry!), HN's front page would be completely different, and
since the front page is the most important thing here, the entire forum would
soon be completely different. That requires editing titles all the time.
petercooper, who posted elsewhere in this thread, has an HN title-change
tracker (oops, I hyphened again!) which is quite nifty. It doesn't show all of
them, though, and it can't tell the difference between moderator edits and
submitter edits.

There is a power shortage in my neighborhood and my laptop battery is about to
expire, so if you don't see a link momentarily, or if I don't respond anywhere
on the site for a few hours, it's because shelter-in-place (uh-oh!) prevents
me from finding somewhere else to plug in.

Edit: here it is:
[https://news.ycombinator.com/item?id=21617016](https://news.ycombinator.com/item?id=21617016).
Note how we edited out the bloody pig mask.

~~~
biggestlou
I apologize for the strength of my language. There are several words I used in
that comment that I should not have. It's far too easy to forget in the moment
that there are real people making these decisions who may very well read what
you say. I understand where you're coming from with the decision and will try
to select more appropriate titles in the future. And will be much more mindful
about how I register complaints.

~~~
dang
Aw, thanks. I should have been nicer too. I was making myself chuckle while I
wrote some of that, which is one way to mitigate the tedium of writing similar
things over and over, but I fear that it comes across as a bit mean sometimes.
By the way: nice article!

Battery is at 2%...

~~~
sillysaurusx
I hope you're doing alright with the shelter-in-place. Losing power sucks.

If it goes on too long, and someone has a car nearby, you can use that as a
power source. (Be sure to keep the engine running though, otherwise you'll end
up with two objects that need a power source...)

~~~
dang
Considering that HN didn't fall apart while I was offline, it was a guilt-free
blessing.

I mean, I still had my phone. But moderating HN with a phone is like trying to
do surgery with only your thumbs, through a two-inch hole. In the dark.

~~~
sillysaurusx
I was always super curious what your browser extension for modding HN is like.
If there's such a huge difference between phone vs laptop, it sounds like the
extension is doing much of the work. It's a neat power tool.

