Ask HN: I want to build my own query language
3 points by bewal416 2 days ago | 11 comments
Our product is starting to get more and more requests for custom reports. I’ve built some basic tables with filters and exports to Excel/PDF, but they fall short of the nuance our customers need, especially those in regulated markets.

One customer needs full first names but only the first letter of the last name. Another needs a very specific JOIN with an entity almost no other customer cares about. To accommodate, I’ve been building custom Looker reports for each customer, which won't scale well.

I started looking into how other SaaS companies solved this. Many built their own SQL-like query languages:

- Salesforce -> SOQL
- Shopify -> ShopifyQL
- Stripe -> Sigma

All of them seemed to address the same problem I’m seeing: customers have unique reporting needs that no-code GUIs can’t handle. A drag and drop builder is great for non-techies, but most real requests require joins and transformations, and I’m trying to avoid becoming a consulting shop for every customer.

I'm particularly impressed by Stripe Sigma because of how they combine SQL with an LLM layer. Users can ask for a report in plain English, customize it in a lightweight BI tool, and edit the query only when needed.

Has anyone gone through this or have advice on alternative approaches? I’m open to any direction here.





Unless your data is really unusual, I’d generally recommend that you avoid writing your own query language and processor: it’s just damn hard to make it work well. Instead, look at how to put something like DuckDB in front of your data so people can just write SQL.

Or a step up from that: build a compiler that converts queries in a human-friendly or application-specific language to SQL or something similar.

I'd stick with SQL; they can pull queries straight out of ChatGPT if they don't know it themselves.

If everyone lives within one database I'd throw up a per-customer read-only database in front of it for running their queries so they don't create performance issues.


We do have a single-tenant DB. That’s one of my architecture challenges: how to handle permissions and trim the schema down to only the entities my users need.

You could possibly achieve that with some views, or whatever the equivalent is in your database, plus database accounts that can only access those views.

Another option might be to let them ingest their data directly into the BI tools they already use, where they can do whatever they want. The cool thing about that is it can entrench you in their infrastructure, and it offloads a lot of the complexity you're dealing with.


Okay, just spent the whole day tinkering with this:

1) I create a baseline set of views I want my customers to have
2) For each new customer, I’ll run a script that creates a replica of those views, filtered by their customer ID
3) I’ll allow my customers to write pure SQL, limiting them to SELECT queries and a couple of niche business rules, and masking any DB-level errors, because exposing those just feels wrong
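
Steps 1 and 2 above could be sketched roughly like this in MySQL. The `orders` table, its columns, the view names, and customer ID 42 are all made up for illustration:

```sql
-- Step 1: baseline view exposing only the columns customers should see.
-- (`orders` and its columns are hypothetical.)
CREATE VIEW base_orders AS
SELECT id, customer_id, status, total_cents, created_at
FROM orders;

-- Step 2: per-customer replica, filtered by customer ID (42 is a placeholder).
-- The provisioning script would emit one of these per baseline view.
CREATE VIEW customer_42_orders AS
SELECT id, status, total_cents, created_at
FROM base_orders
WHERE customer_id = 42;
```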

How does that approach sound?


I think the main thing you're missing is creating an account in the DB that only has access to those views, so for each customer you'd do something like:

    CREATE USER customer_xyz WITH PASSWORD 'foo';

    CREATE VIEW customer_xyz_data AS SELECT * FROM data_stuff WHERE customer_id=x;

    GRANT SELECT ON customer_xyz_data TO customer_xyz;

So then two things are happening: SELECT-only is enforced by the grant no matter what, and their account is categorically unable to touch anything outside of that view. As long as you run their queries through that account, they will always be sandboxed.

You can enforce all of that yourself, but if they're using an account that can read or write other tables, you will always have to sanitize their input carefully: not just restricting it to SELECT, but limiting joins and nested queries too.


Gotcha. Yeah, I was thinking of working with my engineers to figure out a permissions layer, but I understand enforcing that at the DB level would guarantee security.

Dumb question: is creating a set of views for each customer even efficient for my MySQL database? I could realistically see us having ~12 customer-facing views. Is having 12*N views a smart and scalable way to architect this?


A view is just a query that pretends to be a table, so it will come down to the complexity of that query. Each time you query the view, the database runs the user's query combined with the view's query, so performance comes down to whether your DB is optimized for basically "SELECT field1, field2, field3 FROM (SELECT * FROM data_stuff WHERE customer_id=x)". Whether you execute that query as a view or as ad-hoc SQL doesn't make a difference in itself.
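
One way to check this on a real table is to compare the plans for the view and for the equivalent ad-hoc query. This is only a sketch: `field3`, the literal values, and the index name are assumptions, and `customer_xyz_data` is the view from the earlier example.

```sql
-- Plan for the user's query against the view.
EXPLAIN SELECT field1, field2 FROM customer_xyz_data WHERE field3 = 'abc';

-- Plan for the same thing written as ad-hoc SQL (42 stands in for the customer's ID).
EXPLAIN SELECT field1, field2 FROM data_stuff WHERE customer_id = 42 AND field3 = 'abc';

-- If both plans scan the whole table, an index on the filter column may help.
CREATE INDEX idx_data_stuff_customer_id ON data_stuff (customer_id);
```

For a simple filter view like this, MySQL can normally merge the view into the outer query, so I'd expect the two plans to match, but it's worth confirming on your version.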

"Your side" of this can be optimized easily enough, but the user-submitted queries are likely to be inefficient or miss indexes, which is why one database per customer can be better since they each have their own resources.

You can also create the views and accounts as needed and destroy them when sessions end, rather than keeping them permanently. When the user signs in, you create the view and account; after the session ends or after some period of inactivity, you remove them.
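
A rough sketch of that lifecycle in MySQL syntax, assuming a schema named `reporting` and a password generated per session (both hypothetical), and reusing the placeholder names from the earlier example:

```sql
-- On sign-in: create the account, the filtered view, and the read-only grant.
CREATE USER 'customer_xyz'@'%' IDENTIFIED BY 'generated-per-session-secret';

CREATE VIEW reporting.customer_xyz_data AS
SELECT * FROM reporting.data_stuff WHERE customer_id = 42;  -- 42 is a placeholder

GRANT SELECT ON reporting.customer_xyz_data TO 'customer_xyz'@'%';

-- On session end or inactivity timeout: tear both down again.
DROP VIEW IF EXISTS reporting.customer_xyz_data;
DROP USER IF EXISTS 'customer_xyz'@'%';
```

As far as I know, MySQL views default to SQL SECURITY DEFINER, so the customer account should only need SELECT on the view, not on `data_stuff` itself.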


Makes sense. The fact that my SQL Editor puts tables and views in the same section on its left sidebar was the main reason I did a double-take.

The idea of deleting and recreating views is an interesting one. I see that as a really cool approach, especially since we can go without it in v1 and add it as we scale.

Thank you for all your advice so far! This has been truly helpful.


You're welcome!


