
Prior Knowledge: A Predictive Database For Developers - iProject
http://techcrunch.com/2012/09/11/prior-knowledge-a-predictive-database-for-developers/
======
AndrewKemendo
So, this seems fairly intuitive to me, mostly because I have been trying to
build one of these for predicting strategic actions by states, i.e., when is
country X going to launch a missile. I am curious what they are using as their
heuristics for developing solutions and how they are determining confidence
intervals (my guess: number of samples).

It is actually pretty easy to build a process that "fills in the blanks"; the
hard part, obviously, is filling those blanks in correctly. In my experience
that takes a fairly deep set of expert opinions and coherent "causal" node
programmatics which have assigned confidence intervals. Not to mention the
fact that people are going to be scrutinizing this data, and if it is wrong,
they are going to lose faith in the system, and quickly.
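
To make the "fills in the blanks" part concrete, here is a minimal sketch in
Python. It is not a claim about how Prior Knowledge works; it is a naive
baseline under my own assumptions: find the rows that agree with the known
fields, take the most common value for the blank, and attach a Wilson score
interval so that the confidence depends on the number of samples. The names
`fill_blank` and `wilson_interval` are hypothetical, made up for this example.

```python
from collections import Counter
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        # No evidence at all: the proportion could be anything.
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def fill_blank(rows, partial, target):
    """Guess the missing `target` field of `partial` from matching rows.

    Returns (guess, number_of_matching_rows, interval), where the interval
    bounds the share of matching rows that carry the guessed value.
    """
    # Only the fields that are actually known are used for matching.
    known = {k: v for k, v in partial.items() if k != target and v is not None}
    matches = [
        r for r in rows
        if r.get(target) is not None
        and all(r.get(k) == v for k, v in known.items())
    ]
    if not matches:
        return None, 0, (0.0, 1.0)
    value, count = Counter(r[target] for r in matches).most_common(1)[0]
    return value, len(matches), wilson_interval(count, len(matches))
```

With only a handful of matching rows the interval comes out very wide, which
is the "number of samples" effect: the guess may be fine, but there is little
reason to trust it yet.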

When you talk about implementing this on databases, however, either the blanks
being filled must not belong to critical processes which require precision, or
the system needs to iterate corrections very quickly. In the end I think
this is useful for non-critical products which users can correct if necessary.
If, however, they have a novel solution with a significant history of "right"
answers, they may be onto something in machine learning.

------
micro_cam
Intriguing. The linked article mentions linear regression but this article:

http://blog.priorknowledge.com/blog/scaling-not-your-problem/

suggests something that handles scale, heterogeneous data, and missing data
gracefully. Maybe Random Forests?

------
codenerdz
Their PreQL demo during Disrupt was pretty amazing. I'm looking forward to
using their API for several ML experiments.

------
Sol2Sol
Prior Knowledge's Veritable database sounds intriguing. I do a bit of work on
the analytics side with medical claims data. I wonder how effective Veritable
would be against a large medical claims data set where you have multiple
hospital claims, lab claims, and pharmacy claims for each member.

