Hi yingw787, I work on the product team at Rockset. Thanks for your thoughts! I'...

yingw787 · on April 13, 2019

My impression of most databases is that locating the data physically close together (i.e. an internal network connection ties together database nodes) provides assumptions for performance optimization (e.g. based on internal testing we think there is the tail latency at this percentile is X milliseconds between requests on database nodes, or the network will only fail requests X% of the time, therefore we can optimize this factor in source). If you have disparate data located elsewhere, it may be more difficult to bake in such assumptions (e.g. requests across public Internet may fail more often), and more difficult to achieve performance, and therefore the value-add from a product like Rockset would be to tie together disparate data sources. But I just read your comment that the data is transformed to a Rockset specific format, so it might matter less in that case because you do have a persist filesystem.

For the extensions API, I was imagining something like postgresql-contrib: https://www.postgresql.org/docs/current/contrib.html

In Rockset's case, I thought it would make sense if the data came from multiple locations, extensions requests might take that as a top-level assumption; hence the idea of a Rockset extension for something like Zapier, where multiple Internet services are tied together into automation pipelines (or in Rockset's case, read/write query pipelines).

I just thought of this now, but the client interface for a database like PostgreSQL is useful enough where other databases like CockroachDB can implement it too: https://www.cockroachlabs.com/blog/why-postgres/

Hope this helps :)