
> How else could you solve what Vespa does using Rust, Go, or C/C++ libraries?

Let me try answering my own question. I hope someone jumps in and tells me where I'm wrong or how this could be improved :)

     1) Get PostgreSQL extensions via the "package manager" pgxnclient
     1.1) pg_bouncer - For connection pooling
     1.2) yoke - As a high-availability cluster manager with auto-failover and automated cluster recovery
     1.3) prestodb.io - Distributed SQL query engine for pgsql
     1.4) pglogical - Logical streaming replication for using a publish/subscribe model
     1.5) pg_lambda - To create your own AWS (meta) Lambda
     1.6) pg_strom - To offload tasks to the GPU
     1.7) zombodb - To utilize full-text searching via indexes backed by Elasticsearch
     2) Put it all together with pglogical and presto to separate GPU- and CPU-intensive tasks.
     2.1) "Build Missing Middleware" - To design/fuse a query visually that combines multiple backends
     2.1.1) Create a binary data-stream by integrating pg_lambda, pg_strom, presto and zombodb
     2.1.2) "Build Missing Middleware" - A tensor processing extension to use ML Model evaluations
     2.1.3) "Use Missing Middleware" - For data-processing via Machine-Learning models
     2.1.4) "Use Missing Middleware" - To output ML processed results into the database
     2.2) Partition these queries using "pg_lambda + middleware" to create accelerated and fused query results
So what's missing to create a Vespa alternative using existing technologies is everything in point 2), if I'm not mistaken. Torrent-based replication isn't exactly necessary except at Twitter/Facebook scale, and if you reach that stage you can hire a libtorrent author.
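
To make point 2.1) a bit more concrete, here is a rough sketch of what the missing middleware would have to do: ask one backend for text hits, fetch the matching rows from another, and rank the joined result with a model. Everything in it is a stand-in I made up for illustration (`text_backend`, `sql_backend`, `toy_model`, the tiny corpus, the weights); none of it is the API of zombodb, presto, pg_strom, or Vespa.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def text_backend(query: str) -> Dict[int, float]:
    """Hypothetical stand-in for a full-text index: doc id -> text score."""
    corpus = {
        1: "vespa serving engine",
        2: "postgres replication guide",
        3: "postgres full text search",
    }
    terms = query.lower().split()
    hits = {}
    for doc_id, text in corpus.items():
        words = text.split()
        # Fraction of the document's words that match a query term.
        score = sum(words.count(t) for t in terms) / len(words)
        if score > 0:
            hits[doc_id] = score
    return hits


def sql_backend(doc_ids: Iterable[int]) -> Dict[int, dict]:
    """Hypothetical stand-in for the relational side: doc id -> row."""
    rows = {1: {"clicks": 10}, 2: {"clicks": 50}, 3: {"clicks": 5}}
    return {i: rows[i] for i in doc_ids if i in rows}


def toy_model(text_score: float, row: dict) -> float:
    """Made-up linear ranking model; a real one would be learned."""
    return 0.7 * text_score + 0.3 * (row["clicks"] / 100)


def fused_query(
    query: str, rank: Callable[[float, dict], float] = toy_model
) -> List[Tuple[int, float]]:
    """Join the two backends on doc id and sort by model score, best first."""
    hits = text_backend(query)
    rows = sql_backend(hits.keys())
    scored = [(i, rank(hits[i], rows[i])) for i in hits if i in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `fused_query("postgres search")` returns the matching doc ids with their scores, best first. The hard part that this toy skips is exactly what point 2) is about: doing the join and the model evaluation close to the data instead of in a single Python process.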



I now think basing this on PostgreSQL was wrong, and believe that a meaningful approach to creating a Vespa alternative yourself is to base it on content-addressable storage[1] and add a DB layer on top (e.g. using AUFS).

It would have the following properties: a decentralized, distributed, resilient, highly available, software-defined storage and retrieval system.
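
As a minimal sketch of the "content-addressable storage with a DB layer on top" idea, assuming SHA-256 as the address and an in-memory dict as the blob store: `ContentStore` and `Table` are hypothetical names for illustration, not part of Infinit, AUFS, or any existing library.

```python
import hashlib
import json


class ContentStore:
    """Toy in-memory content-addressable store: the key is the SHA-256 of the value."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so it is stored once.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Any node can verify a blob without trusting whoever served it.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob does not match its address")
        return data


class Table:
    """Hypothetical DB layer: maps primary keys to content addresses."""

    def __init__(self, store: ContentStore):
        self._store = store
        self._index = {}

    def upsert(self, key: str, row: dict) -> str:
        # Canonical JSON so that equal rows hash to the same address.
        blob = json.dumps(row, sort_keys=True).encode("utf-8")
        address = self._store.put(blob)
        self._index[key] = address
        return address

    def select(self, key: str) -> dict:
        return json.loads(self._store.get(self._index[key]))
```

Because blobs are immutable and self-verifying, replicating them between peers is the easy part; the mutable piece is only the small key-to-address index, which is where the database layer would have to do its consistency work.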

According to http://vespa.ai/#featurematrix:

        FEATURE                     VESPA   ELASTIC SEARCH   RELATIONAL DATABASES
        ACID transactions                                    •••
        Optimized for analytics             •••              ••
        Optimized for serving       •••     •                ••
        Scalable                    •••     ••               •
        Easy to operate at scale    ••                       •
        Text search                 •••     ••               •
        Machine learned ranking     •••     •                2.1.2) - 2.1.4)
        Middleware logic container  •••                      1.4)
        Live reconfiguration        •••                      1.2)
And yet I have to admit that even though the GitHub repository looks quite chaotic, building an alternative, even with existing technologies, would be a big feat.

Initially I would've chosen PostgreSQL as a base, but the "HA layer" is something that shouldn't be decoupled or treated as an afterthought. That's why CAS is a much better basis for integration. Also, integrating the PostgreSQL engine into, say, a ZFS kernel extension would be a mess, and integrating the database engine into a distributed p2p algorithm would only add compatibility issues and no real advantages.

[1] https://en.wikipedia.org/wiki/Content-addressable_storage#Op...

PS: Clever acquisition by Docker! "Infinit.sh is a content-addressable and decentralized (peer-to-peer) storage platform that was acquired by Docker Inc." In my eyes it is one of the best implementations and one of the easiest targets for adding a database layer on top.



