
Not quite. As described in the FAQ, quoted in full in the next comment.


The syntax is still changeable, and any comments for improvement are welcome.

To keep the discussion focused, I'll include the two most significant FAQs here:

*What about the interoperability with SQL?*

Some libraries and tools enable developers to write queries in a new syntax and translate them to SQL (e.g., [PRQL](https://prql-lang.org/), [SaneQL](https://www.cidrdb.org/cidr2024/papers/p48-neumann.pdf), etc.). The existing SQL ecosystem provides solid database implementations and a rich set of data tools, so people tend to think you must speak SQL; otherwise, you lose the whole ecosystem.

But wait a minute, those libraries translate their new language to SQL because they don't implement the query engine (i.e., the database) themselves, so they have to talk to SQL databases in SQL. However, ScopeQL is the query language of ScopeDB, and ScopeDB is already a database built directly on top of S3.

Thus, what we can leverage from the SQL ecosystem are data tools, such as BI tools, that generate SQL queries to implement business logic. For this purpose, one should write a translator that converts SQL queries to ScopeQL queries. Since both ScopeQL and SQL are based on relational algebra, the translation should be doable.
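
To make the mapping concrete, here is a simplified before/after pair. The ScopeQL half is a rough illustration of pipeline-style clauses, not a verbatim sample of the grammar:

    -- SQL as a BI tool would generate it
    SELECT service, count(*) AS errors
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY service
    ORDER BY errors DESC;

    -- A pipeline-style rendering (illustrative; exact keywords may differ):
    -- each SQL clause becomes one step, read top to bottom
    FROM logs
    WHERE level = 'ERROR'
    GROUP BY service AGG count(*) AS errors
    ORDER BY errors DESC

Since every step consumes the previous step's relation, a translator can emit the steps in the order the SQL clauses are logically evaluated.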

*Project Foo has already implemented similar features. Why not follow them?*

ScopeQL was developed from scratch but not invented in isolation. We learned a lot from existing solutions, research, and discussions with their adopters, including the syntax of PRQL, SaneQL, and the SQL extensions provided by other analytical databases. We also deeply empathize with the challenges outlined in the [GoogleSQL](https://research.google/pubs/sql-has-problems-we-can-fix-the...) paper.

However, as answered in the previous question, we first developed ScopeDB as a relational database. Then, we learned about the scenarios where an enhanced syntax helps users maintain their business logic and increases their productivity. So, implementing the enhanced syntax directly is the most efficient way.


I don't have a dedicated benchmark for these primitives, but we use them in a database that processes petabytes of data [1] and we haven't found any specific bottlenecks.

[1] https://www.scopedb.io/blog/manage-observability-data-in-pet...

Most of the performance cost would come from the sync Mutex in use. I can imagine that by switching among the std Mutex, parking_lot's Mutex, and perhaps a spin lock in some scenarios, one could gain better performance. Mea has an abstraction (src/internal/mutex.rs) for this switch, but I haven't implemented the feature flag for it since the current performance is acceptable for our use case.
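
For illustration, here is a minimal sketch of the compile-time switch; the "parking_lot" feature name is hypothetical, and this is not Mea's actual src/internal/mutex.rs:

    #[cfg(feature = "parking_lot")]
    mod imp {
        pub struct Mutex<T>(parking_lot::Mutex<T>);

        impl<T> Mutex<T> {
            pub fn new(value: T) -> Self {
                Self(parking_lot::Mutex::new(value))
            }

            // parking_lot has no poisoning; the guard comes back directly.
            pub fn lock(&self) -> parking_lot::MutexGuard<'_, T> {
                self.0.lock()
            }
        }
    }

    #[cfg(not(feature = "parking_lot"))]
    mod imp {
        pub struct Mutex<T>(std::sync::Mutex<T>);

        impl<T> Mutex<T> {
            pub fn new(value: T) -> Self {
                Self(std::sync::Mutex::new(value))
            }

            // Internal state is never left inconsistent across a panic,
            // so unwrapping the poison error is acceptable here.
            pub fn lock(&self) -> std::sync::MutexGuard<'_, T> {
                self.0.lock().unwrap()
            }
        }
    }

    // Callers see exactly one concrete Mutex type per build.
    pub use imp::Mutex;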

The internal semaphore's implementation may also be improved. Currently, to keep the code safe, I implement the linked list with `Slab<Node>` (you can check src/internal/waitlist.rs for details; a sketch of the technique follows the reference below). Using an intrusive linked list like [2] may help, but that's not always a net win and needs much more time to get right.

[2] https://github.com/Amanieu/intrusive-rs
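
Here is a minimal sketch of the `Slab<Node>` technique (the idea only, not the actual waitlist.rs): links are slab keys instead of raw pointers, so a waiter can remove itself in O(1) without any unsafe code.

    use slab::Slab;

    struct Node<T> {
        value: T,
        prev: Option<usize>,
        next: Option<usize>,
    }

    struct WaitList<T> {
        nodes: Slab<Node<T>>,
        head: Option<usize>,
        tail: Option<usize>,
    }

    impl<T> WaitList<T> {
        fn new() -> Self {
            Self { nodes: Slab::new(), head: None, tail: None }
        }

        /// Enqueue a waiter; the returned key lets the waiter unlink
        /// itself later, e.g., when its future is dropped (cancellation).
        fn push_back(&mut self, value: T) -> usize {
            let key = self.nodes.insert(Node { value, prev: self.tail, next: None });
            match self.tail {
                Some(tail) => self.nodes[tail].next = Some(key),
                None => self.head = Some(key),
            }
            self.tail = Some(key);
            key
        }

        /// Unlink a waiter by key in O(1), fixing up its neighbors.
        fn remove(&mut self, key: usize) -> T {
            let node = self.nodes.remove(key);
            match node.prev {
                Some(prev) => self.nodes[prev].next = node.next,
                None => self.head = node.next,
            }
            match node.next {
                Some(next) => self.nodes[next].prev = node.prev,
                None => self.tail = node.prev,
            }
            node.value
        }
    }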


Interesting. Thanks! I've been experimenting a bit with my keepcalm library. I have some experimental async concurrency primitives in there, and I'd like to compare them with what you've got here to potentially replace them.


Feel free to create an issue on GitHub for sharing and discussion :D


> they were probably just trying to be humble about their accomplishment

Thanks for your reply. To be honest, I simply consider depending on open-source software a trivial choice. Any non-trivial Rust project can pull in hundreds of dependencies, and you'll find the same when you audit distributed systems written in C++/Java.

For example, Cloudflare's pingora has more than 400 dependencies. Other databases written in Rust, e.g., Databend and Materialize, have more than 1000 dependencies in the lockfile. TiKV has more than 700 dependencies.

People seem to jump into debates about the number of dependencies or complain about the source code being closed, missing my point: I'd like to show how you can organically contribute to the open-source ecosystem during your $DAYJOB, and that this is a way to make writing open-source code sustainable.


And contributing back is one of the approaches to maintaining open-source dependencies. I described how to deal with OSS dependencies in [1] (yet to be translated :P).

[1] https://www.tisonkun.org/2024/11/17/open-source-supply-chain...


This article is actually a translation. In the original article[1], I talked about commercial open-source and how one can collaborate with the open-source community when running a software business.

That section was moved to the second-to-last section of the posted blog. It includes:

[QUOTE]

In Chapter 4 of The Cathedral & the Bazaar, The Magic Cauldron, it reads:

> … the only rational reasons you might want them to be closed is if you want to sell the package to other people, or deny its use to competitors. [“Reasons for Closing Source”]

> Open source makes it rather difficult to capture direct sale value from software. [“Why Sale Value is Problematic”]

While the article focuses on when open-source is a good choice, these sentences imply that it’s reasonable to keep your commercial software private and proprietary.

We follow it and run a business to sustain the engineering effort. We keep ScopeDB private and proprietary, while we actively get involved in and contribute back to our open-source dependencies, open source our common libraries when suitable, and maintain the open-source twin to share the engineering experience.

[QUOTE END]

I wrote other blog posts analyzing the open-source factors of commercial software[2][3][4][5], and I have put them into practice at several companies, as well as earned merit in open-source projects.

When you think about it, many developers work for their employers, and using open-source software in their $DAYJOB is a good motivation to contribute more (especially for distributed systems, which individuals seldom need). I know there are open-source developers who develop software that has nothing to do with their $DAYJOB. I also maintain projects that have nothing to do with my $DAYJOB (check Apache Curator, the Java binding of Apache OpenDAL, and more).

[1] https://www.tisonkun.org/2025/01/15/open-source-twin/

[2] https://www.tisonkun.org/2022/10/04/bait-and-switch-fauxpen-... (yet to be translated)

[3] https://www.tisonkun.org/2023/08/12/bsl/

[4] https://www.tisonkun.org/2022/12/17/enterprise-choose-a-soft...

[5] https://www.tisonkun.org/2023/02/15/business-source-license/


I've updated the Gist with a full Cargo.lock file that can be audited - https://gist.github.com/tisonkun/06550d2dcd9cf6551887ee6305e...

Running `cargo audit -n --json | jq -r '.vulnerabilities.list[] | (.advisory.id + " - " + .package.name)'` gives:

RUSTSEC-2023-0071 - rsa

which is transitively introduced by sqlx-mysql; we don't use the MySQL driver in production.
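
For what it's worth, cargo-audit supports an ignore list in `.cargo/audit.toml`, so the audit can pass cleanly while the advisory stays documented; a minimal sketch:

    # .cargo/audit.toml
    [advisories]
    # RUSTSEC-2023-0071 (rsa) is only reachable through sqlx-mysql,
    # and the MySQL driver is not used in production.
    ignore = ["RUSTSEC-2023-0071"]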


Datadog, too, built their own event store: https://www.datadoghq.com/blog/engineering/introducing-husky...

It may not be named a "database," but it actually takes the place of one.

Observability vendors will try to store logs in Elasticsearch and later find it overly expensive, with weak support for archiving cold data. Data warehouse solutions require a complex ETL pipeline and can be awkward when handling log data (semi-structured data).

That said, if you're building an observability solution for a single company, I'd totally agree with starting with a single-node PG with backups, and only considering other solutions when the data and query workload grow.


In 2025 I'd consider starting with ClickHouse instead, if you're going the DIY route.


In the linked article below, we talked about "If RDS has already been used, why is another database needed?" and "Why RDS?"

Briefly, you need to manage metadata for the database. You can write your own Raft-based solution or leverage existing software like etcd or ZooKeeper, which may not be "a relational database." But then you need to deploy them with EBS and reimplement data replication plus multi-AZ fault tolerance, and the performance is likely still worse than RDS, because a first-party RDS can typically use internal storage APIs and advanced hardware. That advantage is not something software alone can deliver. (A sketch of what RDS-backed metadata looks like follows the link below.)

https://flex-ninja.medium.com/from-shared-nothing-to-shared-...
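
To make that concrete, here is a hypothetical sketch of a metadata compare-and-swap on RDS; the table and column names are made up for illustration, but the point is that a single transactional statement replaces a homegrown replication path:

    -- Hypothetical manifest table: one row per dataset, with a version column.
    UPDATE manifest
    SET version = version + 1, snapshot_location = :new_location
    WHERE dataset_id = :id AND version = :expected_version;
    -- 0 rows updated => a concurrent writer won; reload the row and retry.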


Here are several points I have in mind:

1. JSONPath/SPath supports multiple selectors, e.g., $["a", "b"] or $[1, 3:10:2, 101]. This can be a bit tidier than chained dot or subscript accesses.

2. JSONPath/SPath supports descendant queries: a descendant segment produces zero or more descendants of an input value. For example, $..[0] selects the first element of every descendant value that is an array.

3. JSONPath defines filter selectors. SPath doesn't support them yet, but they're on the roadmap. They make it more powerful as a query language.

4. The original reason I wrote this library, however, was to use JSONPath/SPath syntax beyond JSON values. That is, I've written a database for processing semi-structured data [1], and my clients told me they'd like to extract inner values with JSONPath syntax. All the existing JSONPath libraries, mature or not, of course assume they are handling JSON values. But in ScopeDB, we define our own variant value (a sketch of that abstraction follows the reference below).

[1] https://www.scopedb.io/reference/datatypes-variant
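
To sketch point 4 in Rust: the trait and method names below are hypothetical, not SPath's actual API; they only show how a path engine can stay generic over the value representation, so the same engine runs on serde_json values, ScopeDB variants, or anything tree-shaped.

    trait VariantValue {
        fn get_field(&self, name: &str) -> Option<&Self>;
        fn get_index(&self, index: usize) -> Option<&Self>;
        fn children(&self) -> Vec<&Self>;
    }

    // The engine speaks only the trait; `$..[0]` becomes "walk all
    // descendants (including the root) and keep each value's element 0".
    fn descendant_first<V: VariantValue>(root: &V) -> Vec<&V> {
        let mut out = Vec::new();
        let mut stack = vec![root];
        while let Some(value) = stack.pop() {
            if let Some(first) = value.get_index(0) {
                out.push(first);
            }
            stack.extend(value.children());
        }
        out
    }

    // Backing the trait with serde_json is then a single impl.
    impl VariantValue for serde_json::Value {
        fn get_field(&self, name: &str) -> Option<&Self> {
            self.as_object().and_then(|object| object.get(name))
        }
        fn get_index(&self, index: usize) -> Option<&Self> {
            self.as_array().and_then(|array| array.get(index))
        }
        fn children(&self) -> Vec<&Self> {
            match self {
                serde_json::Value::Array(array) => array.iter().collect(),
                serde_json::Value::Object(object) => object.values().collect(),
                _ => Vec::new(),
            }
        }
    }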


Nice, these are all solid and I appreciate it!

