

Ask HN: Building Database on KV Store with Bytecode Generated Based on SQL - netforay

Does my idea have any merit...

I need to implement a multi-tenant database for an ERP-like application. As of now I am using PostgreSQL with meta tables (like this: https://developer.salesforce.com/page/Multi_Tenant_Architecture).

My plan is to use a simple key-value database like BDB, LMDB, or LevelDB and remove all SQL from the code. All indexing and querying is performed by Java code generated specifically to use the K-V db for that purpose.

By doing this I want to gain 10x DB performance compared to SQL. There is no SQL, so no SQL parsing. No query plan preparation, as I will prepare the plan at compile time and generate the bytecode (using ASM) to use the KV db. There is no other process; it is embedded, so no memory copy and no network in between. I even plan to use LMDB specifically, so that no data copy is needed across buffers (LMDB uses memory mapping). Every plan is prepared in advance in code. Any SQL database uses some kind of KV store in the backend anyway.

My question to the experts here is: how does this sound? Can I get 10x (or even more) performance compared to PostgreSQL across the network? What other parameters might I be missing?

Notes: Replication and scalability beyond one server are not required, as they are handled at the application level.
I know that my query plans may not be as good as what a DB server produces, but at least they are in my control, and it is not rocket science to copy similar plans.
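A minimal sketch of what "the query plan is prepared in advance in code" could look like. This uses `java.util.TreeMap` as a stand-in for an ordered KV store like LMDB; the `orders` table, the `byCustomer` secondary index, and the composite-key layout are all made up for illustration, not taken from any real schema:

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// The "plan" (index range scan + primary-key lookup) is fixed in ordinary
// Java code, the same shape that generated bytecode would take.
public class KvQuery {
    // Primary store: orderId -> amount (stand-in for the row payload).
    static final NavigableMap<Long, Long> orders = new TreeMap<>();
    // Secondary index: composite key (customerId, orderId) -> orderId.
    static final NavigableMap<long[], Long> byCustomer =
        new TreeMap<>(Comparator.<long[]>comparingLong(k -> k[0])
                                .thenComparingLong(k -> k[1]));

    static void insert(long orderId, long customerId, long amount) {
        orders.put(orderId, amount);
        byCustomer.put(new long[]{customerId, orderId}, orderId);
    }

    // Hand-compiled form of: SELECT SUM(amount) FROM orders WHERE customer_id = ?
    static long sumForCustomer(long customerId) {
        long sum = 0;
        // Range scan over all index entries whose first component matches.
        for (long id : byCustomer.subMap(
                new long[]{customerId, Long.MIN_VALUE}, true,
                new long[]{customerId, Long.MAX_VALUE}, true).values()) {
            sum += orders.get(id);  // point lookup in the primary store
        }
        return sum;
    }

    public static void main(String[] args) {
        insert(1, 100, 10);
        insert(2, 100, 25);
        insert(3, 200, 7);
        System.out.println(sumForCustomer(100)); // prints 35
    }
}
```

With an ordered store like LMDB the `subMap` range scan would become a cursor positioned at the composite-key prefix; the overall shape of the generated code stays the same.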
======
tom_b
I think if you are willing to trade the flexibility of a normal relational back-
end DB for storage of your own making, you can certainly optimize for a
specific workload. But is that your case? Joe Celko made a comment in one of
his SQL books that implied all data hackers wake up one day and think "I know,
I'll just put everything into one big K/V table . . . "

It's interesting that you focus hard on "no SQL. So no SQL parsing. No query
plan preparation, . . . ". Have you found SQL parsing and query plan prep to
be actual bottlenecks in Postgres? This would be surprising to me.

Personally, I have found that occasionally doing simple things like memory
mapping multi-gigabyte pre-sorted data and using simple binary search from
within Java (well, Clojure to be truthful) can be quite performant. But that
case involved static data and well-known search patterns. I have also flirted
with column-stores for analytical workloads and they are pretty awesome for
that.
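The technique described above (memory-mapping pre-sorted fixed-width data and binary-searching it in place) can be sketched in Java roughly like this; the file layout (big-endian longs) and the self-built temp file are assumptions for the example, since in practice the sorted file would be prebuilt:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSearch {
    // Binary search over a mapped buffer of sorted longs; no data is copied
    // into the heap beyond the individual keys inspected.
    static boolean contains(LongBuffer keys, long target) {
        int lo = 0, hi = keys.limit() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = keys.get(mid);
            if (v == target) return true;
            if (v < target) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("sorted-keys", ".bin");
        path.toFile().deleteOnExit();
        // Write a small sorted data set (stands in for a prebuilt file).
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(5 * Long.BYTES);
            for (long k : new long[]{3, 7, 19, 42, 100}) buf.putLong(k);
            buf.flip();
            ch.write(buf);
        }
        // Map the file read-only and search it directly.
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            LongBuffer keys = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                                .asLongBuffer();
            System.out.println(contains(keys, 42)); // true
            System.out.println(contains(keys, 5));  // false
        }
    }
}
```

As noted, this only pays off for static data with well-known access patterns; there is no write path, no concurrency control, and no recovery story here.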

There is such an explosion of options available these days in both relational
and the NoSQL family. As much as I like hacking, I do try hard to ask myself
"why can't one of these solutions do this hard work for me?"

I suspect a 10x improvement will elude you (no such thing as a free lunch),
but I geek out hard reading what other people do in these cases, so if you
have a business case (or personal side-interest) to tackle your own data
store, have at it. Post us what you find sometime. Good luck.

~~~
hyc_symas
You can already get a 10x improvement just by switching from SQL to LDAP. SQL
parsing _is_ expensive and inefficient, even with pre-compiled queries.

Writing indexers is pretty tedious; I personally would use something that has
already done this for me. Like OpenLDAP.

~~~
netforay
So a directory server can be a replacement for SQL? Does anyone use it for that
purpose? I always thought they served completely different requirements. I
will look into the possibility. Thanks for the suggestion.

