Hacker News new | past | comments | ask | show | jobs | submit login

I've done some previous digging into natural language SQL queries -- there's a good amount of research around this. But the error rate is always too high for useful production deployment and the systems I've looked at never handled ambiguity well. The target user for this is someone who knows nothing about SQL so ambiguity is guaranteed.

Most importantly, you can't have the system confidently answer incorrectly.

I just tested this query to evaluate joins: "What is the name of the series with the highest rating?" The answer: 5.8 :/

A typical user will be even less clear than this, asking the question as: "What's the top tv series?"

The ideal system should be able to confirm with the user that 1) they'd actually like to evaluate tv series based on rating. And 2) it should be able to guess without user confirmation that I really want the 'name' of the tv series.

It's a hard problem but massively valuable if it can be solved.

You're right! I package public trained models for natural language queries (https://github.com/paulfitz/mlsql/), and the models are bad at saying they don't know. I'm optimistic though. There's significant year-on-year improvement (driven by real progress in NLP), and the training datasets are getting more interesting. There are now conversational datasets (e.g. https://yale-lily.github.io/cosql) where the model is trained to ask follow-up questions, and an explicit goal is "system responses to clarify ambiguous questions, verify returned results, and notify users of unanswerable or unrelated questions". That could be a big win.

Great job compiling the datasets! For the Yale group, I like the goal of their research. If the system can only answer 70% of questions, but it's pretty sure it knows the answer when it claims it does, that can still be very useful in many domains. And the conversational approach is a good one for clarifying ambiguity. Will continue to monitor their progress. Thanks!

I worked for Cleargraph, a startup which built a natural language query layer on top of RDBMSes and which we sold to Tableau. We reached the same conclusions as you: that such a system must properly handle ambiguous queries, and users need to explicitly understand the results they're viewing.

The way our system worked, broadly, was with a computational linguistics approach rather than an NLP one. We used an augmented bottom-up parsing algorithm to assemble a user's query into an AST, then translated that AST into both SQL (though that changed post-acquisition) and back into a canonical English-like query.

For ambiguous inputs, the parser would produce multiple candidate ASTs which we could offer to the user in the form of search results, for them to select the expression they preferred (solving your point 1).

The parser was constructed to use semantic information about the database schema to infer fields in the case of underspecification (handling your point 2). For example, we might understand about the `Series` table that `name` was the "display column", and should be the default field by which record-level results were displayed. Semantic information could, to various degrees, be inferred - always with the option to be manually overridden.

It seemed clear to us that a schema-agnostic approach to SQL generation would be untenable for a real product. Annotation of the schema over time (starting with basic information like field names, synonyms, and types, then adding semantic information like currency or unit labels) is key to letting a query generator produce sensible results to underspecified queries. For this reason alone, I'm skeptical of the quality of results that something like Photon can produce. Every schema differs in syntax and semantics. Perhaps a big enough corpus of schemas can allow for a generalized model. The more you can constrain the domain in building such a system, the better the results. (look at Statmuse [0]: pretty impressive query capability over sports facts).

[0]: https://www.statmuse.com/

Statmuse works a lot better than I expected. It needs to be able to answer which player is the hottest/cutest/most handsome though. Huge gap that needs to be corrected, tout suite.

Just threw some 'curveballs' at statsmuse and it held up pretty well. Good argument for restricting the domain as you say.

We paid a lot of attention to this exact problem at Delver, talking to potential users it was a big concern especially for reporting/BI. We’d never give results until we had a single unambiguous parse of a question, but this required a core that was quite old fashioned NLP, with machine learning only really being used around the edges. So ultimately we’d always generate a series of parse trees with embedded semantics that composed together, but the parser could look at things like external ontologies or word embeddings to try and work out the exact analogy in the domain for something the user said (and confirm with them that the new language was what they meant).

It was reasonably cool but I was a terrible CEO and slowly killed it:


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact