

Show HN: Delver, a natural language interface to your app - thom
http://delver.io/

======
neya
Actually, this is a brilliant _idea_. However, I am not sure if I will use
this service for my organization. Why? It's got something to do with
(everything to do with!) my data and my trust.

Most likely, you're just converting my questions into queries and running them
over my data. Your homepage doesn't make clear whether you're offering just an
API or returning results from our data yourselves. If you offered just an API,
we would still have to process the converted queries and build our own
interfaces to display the results. So I'm going to assume you take care of the
data processing as well.

But wait: you need access to my data to perform those queries. Let's say I
have a million users. This means you could potentially log every single query
AND the result of the query into YOUR databases.

This means,

If I ask you "Which percent of my users pay the highest and are from the
United States?"

And you perform a query on my one million users to find that out, you'd be
holding data that my competitors or third-party advertisers would circle you
like sharks to get. All I can do at that point is hope that you won't sell my
data to them, which leaves me and my data in a pretty vulnerable position.

Not to say that this is a bad idea. It's a brilliant idea, but I'm not sure
how you are going to earn the trust factor.

~~~
thom
Yup, this is a big issue, and obviously our lives would be easier if everyone
was happy for us to just gobble up their data. As I've mentioned elsewhere, we
want to enable a scenario where we know your schema and ontology, and can
therefore generate the query. That allows us to query on your network, but as
you say, there's still a huge trust issue.

I'd love to hear your thoughts on the minimum desirable interaction here - is
an API that just translates natural language to, say, SQL useful enough for
you to pay for?

~~~
neya
> is an API that just translates natural language to, say, SQL useful enough
> for you to pay for?

Definitely!

If you get the natural language to SQL right, then it is many times easier for
me as a developer to parse some JSON and display the results in an interface
than to learn the natural language processing part myself and then try to
implement it!
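A minimal sketch of the developer side neya describes, assuming a hypothetical translation API that returns JSON with `sql` and `params` fields. The response shape and field names are invented for illustration; no such API is documented in this thread. The translated query runs against your own database, so the data itself never has to leave your side.

```python
import json
import sqlite3

# Hypothetical response from an NL-to-SQL service for the question below.
# The JSON shape ("sql" plus positional "params") is an assumption.
RESPONSE = """
{
  "question": "How many users are from the United States?",
  "sql": "SELECT COUNT(*) AS n FROM users WHERE country = ?",
  "params": ["US"]
}
"""

def run_translated_query(conn, response_text):
    """Parse the translator's JSON and execute the SQL on a local connection."""
    payload = json.loads(response_text)
    cursor = conn.execute(payload["sql"], payload["params"])
    columns = [d[0] for d in cursor.description]
    # Rows as dicts are easy to hand to whatever UI displays the results.
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany("INSERT INTO users (country) VALUES (?)", [("US",), ("US",), ("DE",)])
print(run_translated_query(conn, RESPONSE))  # [{'n': 2}]
```

Since the SQL originates outside your system, it would be prudent to execute it on a read-only connection; with SQLite that can be done via `sqlite3.connect("file:app.db?mode=ro", uri=True)`.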

EDIT: I wanted your email. Never mind, I found it.

~~~
thom
I'll add that as an option on the survey actually, ta. The main thing we need
to know right now is realistic integration scenarios.

------
mgkimsal
The big assumption here is that the data you'll be querying is, in fact,
decently structured. I like the idea, but I've come across too many structures
that aren't organized to be queryable at all, because they've been constructed
by people without any understanding of relational concepts, or indeed data
integrity or normalization.

"If the 3rd character in "customer ID" is 7 or 8, and the start date is after
2007, then they are a "premium" customer, and the maximum order amount doesn't
apply, if they're shipping an order to Michigan, Texas or Florida".

"If customer ID is greater than 80000 and the date is after November 2009,
then the real customer number should be reduced by 10000, because we had a
problem and needed to reuse customer IDs". (meaning, an invoice for customer
12000 on November 2004 is related to a different customer than customer 12000
for an invoice on November 2010).

"If the employee's start date is >2005, then check table 1 for employee data
if their last name starts with A-M, and table 2 if their last name starts with
N-Z, otherwise check table 0 (legacy). Oh, and in table 4, if the employee ID
starts with L, that means "legacy", so use table 0 to find their information,
but remove the L".
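To make mgkimsal's point concrete, here is the first rule above written as code. It is a sketch under stated assumptions: `customer_id` is a string, "after 2007" means a start year of 2008 or later, and the order cap is waived only when a premium customer ships to one of the three states. None of this logic is visible in a schema, which is why a natural language layer can't infer it on its own.

```python
from datetime import date

PREMIUM_ID_DIGITS = {"7", "8"}      # 3rd character that marks a premium ID
EXEMPT_STATES = {"MI", "TX", "FL"}  # states where premium orders skip the cap

def is_premium(customer_id: str, start_date: date) -> bool:
    """Premium if the 3rd character of the ID is 7 or 8 and they started after 2007."""
    return (
        len(customer_id) >= 3
        and customer_id[2] in PREMIUM_ID_DIGITS
        and start_date.year > 2007
    )

def max_order_applies(customer_id: str, start_date: date, ship_state: str) -> bool:
    """The maximum order amount is waived only for premium customers shipping to MI, TX or FL."""
    exempt = is_premium(customer_id, start_date) and ship_state in EXEMPT_STATES
    return not exempt

print(max_order_applies("AB7123", date(2009, 3, 1), "TX"))  # False: cap waived
print(max_order_applies("AB7123", date(2009, 3, 1), "CA"))  # True: wrong state
```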

These are situations I've run into in the last few years, and I'm sure many
of you have similar WTFs in your experiences. If someone has their data in
good, solid, structured formats/tables, natural language syntax might be
fun/easy/exciting, but those people can also be served by things like Crystal
Reports, some books, and a few hours of learning. The companies that most
desperately need NLP->SQL probably also have the worst data.

~~~
thom
Yeah, there's certainly a level of schema insanity beyond which we won't be
able to offer a lot of value. In those cases we can still consume data that
looks like subject-predicate-object, but the onus would be on integrators to
supply that, and then you have security and timeliness issues.
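For data too irregular to map directly, the fallback described here is for integrators to supply subject-predicate-object data. A minimal sketch of flattening relational rows into such triples, assuming a simple `table/primary-key` subject naming scheme; that convention is illustrative only, not a format Delver has specified.

```python
from typing import Any, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, Any]

def rows_to_triples(table: str, key: str, rows: Iterable[Dict[str, Any]]) -> List[Triple]:
    """Turn each row into (subject, predicate, object) triples.

    The subject is '<table>/<primary key value>'; every other non-null
    column becomes a predicate named after the column.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            if column != key and value is not None:
                triples.append((subject, column, value))
    return triples

employees = [
    {"id": 1, "name": "Ada", "start_year": 2004},
    {"id": 2, "name": "Grace", "start_year": None},  # nulls produce no triple
]
for triple in rows_to_triples("employee", "id", employees):
    print(triple)
```

The messy business rules (reused IDs, legacy tables) would be resolved by the integrator before emitting triples, which is the burden-shifting thom describes.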

Even people with good data and the expertise to query it will often still have
end users who want the data. The cycle of 'call IT department, ask for data,
wait for data' or 'email SaaS provider, request report, wait for report' can
be short-circuited in these cases, and I believe that's of value.

------
thom
The VM hosting this page is struggling a bit, apologies. If anyone's
interested, I'd love it if you filled out the survey here:

[https://docs.google.com/a/hotwoofy.com/forms/d/1ixCUouKsq1Q4...](https://docs.google.com/a/hotwoofy.com/forms/d/1ixCUouKsq1Q4cwPlftu9skSz7kTJkqYx0QG2q8nawl0/viewform)

~~~
alok-g
I would love to see the results of this survey. Thanks.

~~~
thom
I'll certainly publish some anonymous stats about data sources and people's
desired integration method (API/SDK/appliance etc), as it's likely to be
interesting to other businesses offering a service that integrates into
people's environments.

------
TheAnimus
Love the idea of being able to provide NLP to our users in a very low-effort
way.

However, we wouldn't be able to disclose even anonymised data, let alone have
something that communicates with the outside world munging our real data. From
a security attack-vector standpoint alone, effectively allowing any query would
be a deal breaker.

The problem is I can't see much happening in the way of tuning; we would be
the clients from asshole land:

_Oh yeah, when I make a query, I don't get good results back._
OK, let's have a look. What's the query like?
_Can't tell you that._
OK, what's the data like?
_Can't tell you that._
What can you tell us?
_System sucks._

But obviously, if someone else is providing the data for tuning the NLP to
actually work on our data, and we can get the output as an AST that we can put
anywhere we want, as we would the output from our DSL, I could imagine the
business case for paying a few cents per user, per month.

~~~
thom
Yup, people are making the privacy/security aspect pretty clear, thanks for
confirming that. There are also issues about partitioning of data from one
data source that we need to address - only being able to query data on your
own user_id etc.

When you talk about tuning, are you saying you'd be unlikely to have time to
train the system after initial setup? We're making an iterative model that'll
allow you to add new concepts, new sentence structures etc as you go, and
we've thought a few times it would be good to expose a log of queries
(especially failed ones), and also allow end users to say 'this is
wrong/nonsense' whenever they get results.

~~~
TheAnimus
To be honest, I'd rather the security of the query wasn't handled by your
thing. My data source shouldn't allow user Bob to ever see data that is not
intended for him.
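One way to get the guarantee described here, where the data source rather than the NL layer stops Bob from seeing anyone else's rows, is to run the generated query against a copy holding only his rows. A minimal SQLite sketch; the `orders` table and its columns are hypothetical, and copying rows per request only makes sense for small data. A production system would more likely lean on database features such as PostgreSQL's row-level security or per-user views.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER, user_id INTEGER, total REAL)"

def run_for_user(source: sqlite3.Connection, user_id: int, generated_sql: str):
    """Execute machine-generated SQL over only this user's rows."""
    rows = source.execute(
        "SELECT id, user_id, total FROM orders WHERE user_id = ?", (user_id,)
    ).fetchall()
    # The sandbox database contains only this user's orders, so the
    # generated query cannot read anyone else's orders from it.
    sandbox = sqlite3.connect(":memory:")
    try:
        sandbox.execute(SCHEMA)
        sandbox.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return sandbox.execute(generated_sql).fetchall()
    finally:
        sandbox.close()

source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 99.0)],  # user 1 plays the role of Bob
)
print(run_for_user(source, 1, "SELECT COUNT(*), SUM(total) FROM orders"))  # [(2, 15.0)]
```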

Time to train wouldn't be the problem; it would be a case of letting you guys
near the data. The lawyers would have kittens. Having nice tuning tools would
be a good idea, as they would let us do it ourselves.

Would the thing knock out an AST, or would it be SQL only? As it stands, one of
the benefits of our own DSL (using Irony) is that we can implement the AST in
T-SQL or just C# code against POCOs.
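The appeal of an AST is that the same query can be executed in different places. A toy sketch of that idea with an invented two-node AST (`Compare` and `And`): it renders to parameterized SQL, or evaluates directly against in-memory objects, roughly the Python analogue of running it over POCOs. These node types are hypothetical and are not Delver's or Irony's representation.

```python
import operator
from dataclasses import dataclass
from typing import Any, List, Tuple, Union

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

@dataclass
class Compare:
    field: str  # assumed to come from the integrator's mapping, never raw user text
    op: str     # one of the keys in OPS
    value: Any

@dataclass
class And:
    left: "Node"
    right: "Node"

Node = Union[Compare, And]

def to_sql(node: Node) -> Tuple[str, List[Any]]:
    """Render the AST as a WHERE-clause fragment with ? placeholders."""
    if isinstance(node, Compare):
        if node.op not in OPS:
            raise ValueError(f"unsupported operator {node.op!r}")
        return f"{node.field} {node.op} ?", [node.value]
    left_sql, left_params = to_sql(node.left)
    right_sql, right_params = to_sql(node.right)
    return f"({left_sql} AND {right_sql})", left_params + right_params

def evaluate(node: Node, obj: Any) -> bool:
    """Evaluate the same AST against a plain Python object."""
    if isinstance(node, Compare):
        return OPS[node.op](getattr(obj, node.field), node.value)
    return evaluate(node.left, obj) and evaluate(node.right, obj)

@dataclass
class Customer:
    country: str
    total_spent: float

query = And(Compare("country", "=", "US"), Compare("total_spent", ">", 1000))
print(to_sql(query))  # ('(country = ? AND total_spent > ?)', ['US', 1000])
customers = [Customer("US", 1500.0), Customer("US", 20.0), Customer("DE", 5000.0)]
print([c for c in customers if evaluate(query, c)])
```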

~~~
thom
Well, internally we're just passing around bits of Lisp before building the
query. Part of the aim in gathering data through our survey is to get a feel
for what data sources people want, so it's likely we'll have more than just
SQL. Given that, I don't see why we couldn't also offer a homebrew, abstract
structure, if not the internal representation itself.

Also, in my distant C# days I was very impressed with Irony, glad it's still
around.

------
ique
I think your three examples at the bottom (Data -> SQL -> Delver query) should
all use the same example. The first table is a timesheet-like thing, in your
SQL you're making a list of people and their number of purchases, and in the
third, the Delver query, you're asking a number of questions.

I realize you might be aiming this application at people who actually can't
read the SQL, but you're also saying "You're drowning in tedious reports."
implying they actually use SQL and this is to replace it.

So make the examples consistent and give people some insight into how this
works and what actual SQL query a question generates.

~~~
thom
You're right, the examples don't really tell a story; thanks for the feedback.

We need to work to clarify the message a little, because we're in the
situation where we're simultaneously targeting IT departments and developers
who would benefit from removing the burden of ad-hoc reports, but also their
end-users who want the data. I'll give that some thought.

------
icebraining
Pretty cool. I became interested in this topic when I read the paper on
Natural Language Query System for RDF Repositories[1], which mapped NL queries
to a PIM ontology using SPARQL, but alas, I never explored it further.

It's nice to know the technology is becoming available to the average
programmer like myself ;)

[1]:
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115.8453&rep=rep1&type=pdf)

~~~
thom
You can certainly go a long way with pretty naive models that take advantage
of RDF[1][2], because you can just match words in a query to those in the
ontology. Our model is a little more complicated, as we need to support
everything from simple things like 'not' to negate part of a query, to more
complex stuff like 'more recently', and compositional queries (what products
have sold more than products that are red, tricky stuff like that).
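The naive RDF-style approach can be sketched as plain word matching: look up each word of the question among the ontology's labels. The tiny ontology below is made up, and the sketch deliberately exposes the limitation thom mentions, since a word like "not" is simply dropped instead of negating part of the query.

```python
import re

# Hypothetical ontology: surface labels mapped to ontology terms.
ONTOLOGY = {
    "product": "ex:Product",
    "products": "ex:Product",
    "red": "ex:colour=red",
    "sold": "ex:sold",
}

def match_terms(question: str):
    """Return the ontology term for each word that matches a label, in order."""
    words = re.findall(r"[a-z]+", question.lower())
    return [ONTOLOGY[w] for w in words if w in ONTOLOGY]

print(match_terms("Which products are not red?"))
# ['ex:Product', 'ex:colour=red'] (the 'not' is silently lost)
```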

[1]: [http://gate.ac.uk/sale/dd/related-work/Kaufmann_nlp+reduce_E...](http://gate.ac.uk/sale/dd/related-work/Kaufmann_nlp+reduce_ESWC2007.pdf)
[2]: <http://gate.ac.uk/sale/eswc10/freya-main.pdf>

------
apunic
As useful as this Facebook Graph Search-inspired service may sound, I doubt its
usefulness. I just prefer an expressive _non-natural_ language to specify my
needs (especially when retrieving data). This could be useful for mainstream
querying, like drunk female friends nearby, but again, FB already solves that.

However, I would have loved to see a demo.

~~~
thom
Yup, if everyone who wants to get at the data already has the skills to do
that, we've got nothing to offer. But many organisations have knowledge
workers or end users that won't learn SQL, and don't have people available to
regularly run ad-hoc reports, or the time to constantly expand the reporting
available on their intranet/admin suite.

I'll post again when we have a demo up.

------
seanmcdirmid
So these guys just do queries right? Not full on dialogues that Siri supports,
but rather the trivia questions that Siri can resolve using Wolfram Alpha (via
Wolfram's own hand-constructed NLP tech)?

Not that this is bad, but it is pretty far away from a general natural
language interface where you basically have a conversation with your app.

~~~
thom
We are read-only, that's absolutely correct; hopefully the site's not
misleading in that respect. We would absolutely _love_ long term to allow
commands as well as queries, and become something akin to the Siri third-party
API everyone wants (although we're not tackling speech anytime soon).

That said, what you call trivia questions, we call decision support,
reporting, and other grown up things. :P

~~~
seanmcdirmid
The title of this post was just a bit misleading. I didn't mean to use trivia
as a pejorative; it's just what I call the whole "use data X to answer question
Y" domain of NLP, which is fairly distinct from dialogue processing systems.
You seem to be basically in competition with Wolfram Alpha, except you focus on
custom structured data sources, though they seem to do something here too [1].

[1] <http://products.wolframalpha.com/enterprise/>

~~~
thom
Yup, Wolfram are a scary competitor, EasyAsk too. In my head, we're targeting
the lower-end of the market - potentially smaller data models, no consultancy
face-time. We'll see how realistic that turns out to be.

------
alok-g
See also: <http://www.easyask.com/>

It may be harder to see this from their website now, but they provide natural
language interfaces to SQL databases.

PS: I am also working on something like this, though not fully defined as yet.

~~~
thom
EasyAsk do some cool things, and offer a much wider business intelligence
suite as well. I like to think we're targeting smaller enterprises than they
are. Essentially, we want to be a self-serve app, and we can't learn and grow
the way we want to if we have to send consultants to work with you on your
integration (something which EasyAsk will happily charge you six figures to
do).

------
offdrey
You should probably put a little more info on your landing page: a preview,
some examples maybe...

~~~
thom
Yup, all on its way. Right now I'm just hoping to get a few responses to the
linked survey so I know what front-ends to concentrate on.

------
phreeza
Is there something along these lines but in open source? I have been dreaming
of having natural language querying combined with Freebase or such, and with
computational in addition to pure data endpoints, to build a kind of open
source Wolfram Alpha.

~~~
DanielRibeiro
Python's Natural Language Toolkit is pretty cool: <http://nltk.org/>

~~~
thom
Yup, we're on the JVM but I've heard nothing but good things about NLTK (and
have been envious of some of the things it makes easy).

------
gingerlime
Looks interesting. Would be nice to have a live demo database where you can
try out different natural language searches and see the results.

Also interested to know the model: is this SaaS? If so, perhaps some pricing
info would be useful, or how it interfaces with your data, security
considerations etc.

~~~
thom
Thanks for the feedback. As mentioned elsewhere, we're very keen to open up
with something like a Magento plugin or something targeting a similar product
just to showcase the querying front-end and mapping tech with a known schema.

Business model-wise, we'll be offering per-seat licensing for internal apps,
and likely traffic-based pricing for more public apps. We are a SaaS app - the
intelligent bits happen in the cloud. The default option is to integrate one
of our agents to gather data, or publish data to us via an SDK (or REST API).
Because it's a deal breaker for many people, we're working hard to enable
scenarios where data never leaves your network, but we know your schemas and
ontology, and so the NLP bit happens on our side, but the querying happens
privately. Obviously this can have an effect on our ability to do entity
recognition, so it's an interesting problem.

------
tharshan09
Just sent my request! Hope I get an invite for using this, looking forward to
trying it out on stvplus.com

~~~
thom
Thanks for signing up, looks like some really interesting data to play with!

------
theoutlander
Very interesting. Can you share an example mapping to a data source?

~~~
thom
Mappings aren't entirely monolithic things in our architecture. From the
integrator's point of view there's a guided process of connecting to a data
source, enriching the physical schema with some ontological data, mapping verb
frames to those ontological concepts etc. We're making this as simple as
possible - you need to speak English (English only at the moment, apologies),
know your schema, and know the concepts it represents.
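To give a feel for what enriching a schema with concepts and mapping verb frames might involve, here is a hypothetical mapping expressed as plain Python data, plus a function that turns one verb frame into the join it implies. The structure, names, link-table layout, and integer `id` keys are all assumptions for illustration; the thread does not describe Delver's actual mapping format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    table: str         # physical table holding instances of the concept
    label_column: str  # column used to name an instance in results

@dataclass(frozen=True)
class VerbFrame:
    subject: str     # concept name of the grammatical subject
    obj: str         # concept name of the object
    via_table: str   # link table recording the relationship
    subject_fk: str  # link-table column pointing at the subject's id
    obj_fk: str      # link-table column pointing at the object's id

CONCEPTS = {
    "customer": Concept("customers", "name"),
    "product": Concept("products", "title"),
}

VERB_FRAMES = {
    # "<customer> bought <product>" is recorded through order_lines.
    "bought": VerbFrame("customer", "product", "order_lines", "customer_id", "product_id"),
}

def frame_to_sql(verb: str) -> str:
    """Build the join implied by '<subject> <verb> <object>', assuming integer `id` keys."""
    frame = VERB_FRAMES[verb]
    s, o = CONCEPTS[frame.subject], CONCEPTS[frame.obj]
    return (
        f"SELECT s.{s.label_column}, o.{o.label_column} "
        f"FROM {s.table} s "
        f"JOIN {frame.via_table} v ON v.{frame.subject_fk} = s.id "
        f"JOIN {o.table} o ON o.id = v.{frame.obj_fk}"
    )

print(frame_to_sql("bought"))
```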

We're really hoping to produce a demo integration soon, and I've mentioned
Magento elsewhere, or something like Wordpress etc. I'll publicise that when
it's available.

------
alagu
Would love to see a live example - seems too good to be true.

~~~
thom
That should be coming soon - our first product may well be a bespoke
implementation for something like Magento as a demo, but the big game is
letting anyone with structured data integrate. I'll be absolutely clear - some
people's data is going to be difficult to work intelligently with, and in
those cases we'll shift much more of the burden to the integrator. The sweet
spot is obviously people with fairly clean data models - if you've got a
standard Ruby on Rails app using Active Record, for example, your data is
pretty friendly.

~~~
krichman
I think it's going to take work from an integrator in all cases... How is it
going to translate "developers free on Friday that know Javascript" into the
three-way table join that you are asking for unless you first define "know"
and "free" in terms of your database tables?
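krichman's example shows why some setup is unavoidable: "know" and "free" only mean something once someone ties them to tables. A sketch assuming a hypothetical schema of `developers`, `skills` (who knows which language) and `bookings` (days already allocated). Under those integrator-supplied definitions, the question becomes a three-table query.

```python
import sqlite3

# Hypothetical integrator-supplied definitions:
#   "X knows L"      -> a row in skills(developer_id, language)
#   "X is free on D" -> no row in bookings(developer_id, day) for that day
QUERY = """
SELECT d.name
FROM developers d
JOIN skills s ON s.developer_id = d.id AND s.language = :language
LEFT JOIN bookings b ON b.developer_id = d.id AND b.day = :day
WHERE b.developer_id IS NULL
ORDER BY d.name
"""

def free_developers_who_know(conn, language, day):
    """Names of developers who know `language` and have no booking on `day`."""
    return [name for (name,) in conn.execute(QUERY, {"language": language, "day": day})]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE developers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE skills (developer_id INTEGER, language TEXT);
CREATE TABLE bookings (developer_id INTEGER, day TEXT);
INSERT INTO developers VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO skills VALUES (1, 'Javascript'), (2, 'Javascript'), (3, 'Python');
INSERT INTO bookings VALUES (2, 'Friday');
""")
print(free_developers_who_know(conn, "Javascript", "Friday"))  # ['Ann']
```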

~~~
thom
Apologies if I implied we just magically work things out; there is indeed a
setup process where you help the app learn your ontology and how it maps to
your physical infrastructure. This is, however, an iterative process, and
you're likely to be able to get off the ground in hours, not days. Some
people's data will be beyond us, I freely admit - in those cases you can do
the work and supply us with something more or less resembling RDF.

~~~
krichman
I only object to what appeared to be magic. Now that I know there will be a
setup it seems practical and useful. This is a great idea for exploring data,
I hope you are wildly successful.

