
A Predictive Database - tlarkworthy
https://aito.ai/blog/introducing-a-new-database-category-the-predictive-database/
======
fragsworth
I think this kind of tool can be neat, and probably sometimes useful, but
honestly I hope tools like this do not become common and end up used in
inappropriate places. I am sick and tired of recommendations based on actions
that I never intended to affect my recommendations.

Take YouTube, for instance. It has gotten so bad that I actively avoid
watching videos that I might otherwise want to see (even when I just want to
educate myself - a simple example might be Nazi war propaganda videos),
because I really don't want all my YouTube recommendations to turn into
similar crap. I therefore use the service less. And I like the service less.
And I think it is now more difficult for people to discover interesting videos
because they largely rely on the inherent behavior pattern-matching and not
robust, intentional searches. And people avoid searching for specific topics
they don't want popping up on their sidebar.

I am learning that my behavior is what changes my settings, and therefore I
should change my behavior if I want my settings to be good. Robust search is
falling by the wayside. This is an objectively terrible situation.

~~~
maccard
I've found myself watching those sorts of videos in a separate Firefox profile
that I'm not logged into YouTube in.

~~~
pmart123
I find this very true for things like movie reviews or product reviews. You
likely will be bombarded by a ridiculous amount of similar content for weeks
after you no longer have any intention to watch similar content.

------
arauhala
Hi, the author here. We made the predictive database to help developers test,
prototype and productize predictive functionality - lightning fast.

Just to be clear, the predictive database's value proposition is twofold.
First: querying for predictions in an instant is much faster than fitting &
deploying an ML model and then using it. Second: it looks like a database and
it is used like a database, so it is familiar and easy to use.

I am available for any questions, here or via email (antti@aito.ai)

~~~
bytematic
Is this using Collaborative Co-Occurrence for recommendations?

~~~
arauhala
The recommendations are content based. Basically, if you have a preference for
a certain item or feature, it will get a better score.

Compared to the collaborative approach: content-based scoring works better for
learning routines, e.g. the weekly grocery shopping routine. It also works
better in situations where there aren't a lot of samples about the recommended
content, but there is lots of metadata about the options. E.g. the sales
situation is like this: you likely haven't sold to this customer company
before, but you may have a lot of information about it.
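
To make the content-based idea above concrete, here is a minimal sketch. It is
not Aito's implementation; the function names, the feature sets and the
averaging rule are all assumptions made for illustration. Each candidate is
scored by how often its features show up in the user's history, so an item
that was never bought can still rank well through its metadata.

```python
from collections import Counter


def build_preferences(purchased_items):
    """Count how often each feature appears in the user's past items."""
    prefs = Counter()
    for features in purchased_items:
        prefs.update(features)
    return prefs


def score(candidate_features, prefs):
    """Average preference count over the candidate's features (hypothetical rule)."""
    if not candidate_features:
        return 0.0
    return sum(prefs[f] for f in candidate_features) / len(candidate_features)


# Hypothetical purchase history: each past item is a set of metadata features.
history = [
    {"dairy", "organic"},
    {"dairy", "lactose-free"},
    {"bread", "organic"},
]
prefs = build_preferences(history)

# None of these candidates has been bought before; only their metadata is known.
candidates = {
    "organic milk": {"dairy", "organic"},
    "soda": {"drink", "sugary"},
    "rye bread": {"bread"},
}
ranked = sorted(candidates, key=lambda name: score(candidates[name], prefs), reverse=True)
print(ranked)
```

A collaborative approach would instead need other users' interactions with
these exact items, which is the data that is missing in the cold-start sales
example.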

------
mrits
Would have been cool if they weren't already called recommendation engines.
Introducing side effects into data loading or DBMS scheduled tasks is not the
future but the past.

~~~
arauhala
Aito can do statistical predicting, recommending, matching and relating, and
of course normal queries and FTS search.

It has been built from the bottom up to give programmers the ability to query
the unknown (like an ML system) in addition to the known (like a database).

I'm not certain how this relates to the recommendation systems you are talking
about. Perhaps you could provide me a link to one.

------
kcolford
This overlooks a huge part of data science which is cleaning your data. The
rest is relatively available with off the shelf tools.

~~~
asdfman123
The reason data science is so expensive is because you have to spend so much
time cleaning your data, teasing out the true relationships between things,
and avoiding common pitfalls.

If you want quick, cheap predictions, there are tools out there that make it
very easy, like Azure Machine Learning Studio where you just paste data and
have common algorithms run on it.

------
cocktailpeanuts
It's just a query language, not a "new database category".

~~~
fr1tkot
They've also somehow modified Lucene's search algorithm to perform some ML
calculations, but it's not clear what they've implemented. Still not a "new
database category" though.

~~~
arauhala
Founder here. Aito has a custom database that has been optimized from the
bottom up for statistical operations.

This lets Aito create models on the spot to answer the predictive queries.

~~~
jrumbut
Could you share any of the thought process that led to making this from the
ground up rather than extending/layering on top of an existing DBMS (or,
conceivably, multiple DBMSes)?

Were there indexing and storage engine considerations? Was it a lack of
interface support for this kind of thing? Marketing? I could see a lot of
arguments either way and wondered what convinced you.

It's always auspicious to start a software project in Finland, all the best of
luck on this! The site looks great.

~~~
arauhala
Aito builds a model for predictive queries on a millisecond scale. It requires
heavy optimizations and preparations in the DB to reach that performance. The
indexes are optimized for statistics, and there are extra data structures not
found in normal DBs.

The ML is also embedded inside the database to minimize various overheads,
and to have direct access to data & invested. If you need to do thousands of
statistical operations in 10ms, just IPC can become a huge overhead. You want
to put the data & math in the same process.

Overall, it's all based on tight AI+DB integration to enable the instant
modeling.

~~~
lumost
Can you share details about the "instant modeling" capability? While simple
copulas and correlation matrices may be calculable in milliseconds, larger
models are likely to have more performance considerations and training
latencies.

~~~
arauhala
It's done for discrete data. Operations on such data can be optimized to a
pretty extreme level. Aito scales to around a million rows and a million
features before slowing down too much.

We believe we can make it scale to 10m or 100m rows in the future. Maybe more.
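
One way to picture why statistics over discrete data can be made fast is a
bitmask index. The sketch below is an assumption for illustration only and
says nothing about Aito's actual data structures; `build_index`, `count` and
the sample rows are hypothetical. Every (column, value) pair gets a bitmask of
the rows containing it, and a co-occurrence count becomes a bitwise AND
followed by a bit count.

```python
from collections import defaultdict


def build_index(rows):
    """Map each (column, value) pair to a bitmask of the row ids that contain it."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            index[(column, value)] |= 1 << row_id
    return index


def count(index, first, *rest):
    """Number of rows that match every given (column, value) feature."""
    mask = index.get(first, 0)
    for feature in rest:
        mask &= index.get(feature, 0)
    return bin(mask).count("1")


# Hypothetical shopping log with purely discrete columns.
rows = [
    {"weekday": "sat", "item": "milk"},
    {"weekday": "sat", "item": "bread"},
    {"weekday": "sat", "item": "milk"},
    {"weekday": "tue", "item": "milk"},
]
index = build_index(rows)

n_sat = count(index, ("weekday", "sat"))
n_sat_milk = count(index, ("weekday", "sat"), ("item", "milk"))

# Conditional frequency of milk given Saturday, straight from the counts.
print(n_sat_milk / n_sat)
```

The counts needed for a conditional probability come from the index alone,
without scanning the rows again at query time.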

~~~
lumost
Got it. Any plans on supporting models requiring backprop/gradient descent?

------
YeGoblynQueenne
>> The end users have gotten used to AI-driven features like recommendations.
Features like personalization can provide huge benefits for both the user and
the business.

While it's true that "the end users have gotten used to AI-driven features
like recommendations" most of the time the user is "used to" recommendations
like Medieval peasants were "used to" poverty, wars and the Black Death. If I
had a penny every time I heard an "end user" making fun of e.g. Amazon's
recommendation algorithm I'd be a penny billionaire (latest example:
"every time I order shoes on Amazon it shows me shoes for a week afterwards").

"Personalisation" in particular usually means personalised advertisement. I
don't think at this point anyone can seriously deny that personalised
advertisement is just personalised nuisance. It seems that only the people
working in advertisement companies are immune to this observation. As a small
bit of concrete evidence- well, that's why we have ad-blockers (and the
success of ad-blockers, evidenced by attempts to er, block them, is evidence
of the strength of feeling against internet advertisement, personalised or
otherwise).

So, yes, personalised ads can provide "huge benefits" for businesses, as long
as those businesses can profit while ignoring the annoyance those ads cause to
the users. How the user benefits- that's another matter and I'm very skeptical
of the article's claim that the user also reaps "huge benefits" by
personalisation in this context.

Edit: just noticed the author of the article is participating in the thread. I
hope the above doesn't come across as a criticism of the product itself. I'd
be interested to know how the "predictive database" can help reduce the
nuisance of targeted advertisement. For example, is the predictive database
smarter than a typical recommender engine? Can it avoid situations like "I get
shoe ads for a week afterwards"?

------
ieatwatermelons
Not sure what is so special about this. Sounds like just a normal SQL database
queried with a YAML file and (AI part is just) ranked with fuzzy search.

~~~
arauhala
Founder here. Aito has a custom database optimized for statistical operations.
It essentially creates Bayesian models in real time to answer the queries.
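
As a rough illustration of what "a Bayesian model created at query time" can
mean, here is a naive Bayes sketch that keeps no stored model and computes
everything from raw counts when the query arrives. Naive Bayes with add-alpha
smoothing is a technique swapped in for the example, not a description of what
Aito does internally; the `predict` function, its parameters and the data are
hypothetical.

```python
from collections import Counter


def predict(rows, target, where, alpha=1.0):
    """Naive Bayes posterior over `target` values given the `where` evidence,
    computed from counts at query time (no model is trained in advance)."""
    target_values = sorted({row[target] for row in rows})
    class_counts = Counter(row[target] for row in rows)
    scores = {}
    for value in target_values:
        matching = [row for row in rows if row[target] == value]
        # Prior P(target = value) with add-alpha smoothing.
        p = (class_counts[value] + alpha) / (len(rows) + alpha * len(target_values))
        for column, observed in where.items():
            n_column_values = len({row[column] for row in rows})
            hits = sum(1 for row in matching if row[column] == observed)
            # Likelihood P(column = observed | target = value), smoothed.
            p *= (hits + alpha) / (len(matching) + alpha * n_column_values)
        scores[value] = p
    total = sum(scores.values())
    return {value: p / total for value, p in scores.items()}


# Hypothetical purchase table.
rows = [
    {"diet": "vegetarian", "bought": "tofu"},
    {"diet": "vegetarian", "bought": "tofu"},
    {"diet": "omnivore", "bought": "bacon"},
    {"diet": "omnivore", "bought": "tofu"},
]
result = predict(rows, "bought", {"diet": "vegetarian"})
print(result)
```

The scan over `rows` here is deliberately naive; a database built for this
would presumably answer the same counts from precomputed indexes.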

~~~
zcw100
Sounds like BayesDB. (That was an academic project and I’m not sure if it’s
still being worked on).

~~~
arauhala
True. Aito does resemble BayesDB. BayesDB wasn't the inspiration for Aito (I
found it afterwards), but it is an extremely impressive piece of work.

The biggest difference between BayesDB and Aito is that BayesDB is built on
top of SQLite, while Aito has its own custom implementation. My understanding
is that the SQLite approach puts pretty hard limits on BayesDB's scaling. The
custom database enables pretty radical optimizations, which allow a much, much
bigger scale.

------
fogetti
> For example if we predicted how likely a vegetarian is to purchase bacon,
> Aito could return that it is very likely, because based on data, that's the
> common average.

Why is this a good thing??? Maybe it's just me who doesn't understand the
point of this? I see this "feature" as a benefit in some cases, but this makes
me very doubtful very quickly in most instances.

------
anentropic
but why is JSON the query language...?

------
Tycho
Any similarities to BayesDB?

