
> There are immediate applications to data warehousing.

I am not too worried about data warehouses. They can be fed by batch processes that run, say, once a day or once a week. That setting favors algorithms that optimize aggregate measures of performance (e.g., amortized complexity, average-case complexity, throughput) rather than per-operation measures (e.g., worst-case complexity, latency). Machine learning techniques have a track record of delivering good aggregate results.
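
As a textbook illustration of the aggregate-versus-per-operation distinction (not something from the original comment), consider an array that doubles its capacity when full. The hypothetical `append_costs` function below simulates it and counts element writes plus copies: the average cost per append is a small constant, yet one unlucky append copies every element already stored.

```python
def append_costs(n):
    """Simulate n appends to an array that doubles its capacity when full.

    Each append costs 1 for writing the new element, plus the number of
    existing elements copied if that append triggers a resize.
    """
    capacity, size = 1, 0
    costs = []
    for _ in range(n):
        cost = 1
        if size == capacity:
            cost += size       # copy every existing element into the larger buffer
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


costs = append_costs(1024)
print(max(costs))               # 513: the append that copies 512 elements
print(sum(costs) / len(costs))  # 1.9990234375: average (amortized) cost per append
```

A batch job only cares about the second number; a latency-sensitive request that happens to hit the resize pays the first.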

Online databases, however, require optimal worst-case performance. The occasional massively slow operation is not okay; we'd rather every operation be a tiny bit slower.
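
This trade-off, many slightly slower operations instead of a rare very slow one, is what deamortization buys. A minimal sketch, assuming a hypothetical `IncrementalArray` class: instead of copying everything when it grows, it moves two old elements per later append, so no single append does more than three element writes. The cost model counts element writes only and treats allocating the new buffer as constant-time, which holds for an uninitialized allocation but not for Python's `[None] * k`.

```python
class IncrementalArray:
    """Hypothetical dynamic array that spreads resize copying over later appends,
    so every append does at most 1 + COPY_PER_APPEND element writes."""

    COPY_PER_APPEND = 2  # old elements moved per append; any value >= 1 finishes before the next resize

    def __init__(self):
        self.cur = [None]   # active buffer
        self.size = 0
        self.old = None     # previous buffer whose elements are still being moved
        self.migrated = 0   # number of elements of self.old already copied into self.cur

    def append(self, x):
        """Append x and return the number of element writes this call performed."""
        if self.size == len(self.cur):
            # Start a buffer twice as large, but postpone copying. The cost model
            # treats this allocation as O(1); Python's [None] * k is actually O(k).
            self.old, self.migrated = self.cur, 0
            self.cur = [None] * (2 * len(self.cur))
        self.cur[self.size] = x
        self.size += 1
        writes = 1
        # Move a bounded number of old elements per call.
        if self.old is not None:
            for _ in range(self.COPY_PER_APPEND):
                if self.migrated == len(self.old):
                    self.old = None
                    break
                self.cur[self.migrated] = self.old[self.migrated]
                self.migrated += 1
                writes += 1
        return writes

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        # Elements not yet migrated still live in the old buffer.
        if self.old is not None and self.migrated <= i < len(self.old):
            return self.old[i]
        return self.cur[i]


arr = IncrementalArray()
writes = [arr.append(i) for i in range(1024)]
print(max(writes))  # 3
```

Every append now pays a little extra copying work, but the worst case is a constant rather than proportional to the array size.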

> Do I have any indexed queries? If not, nothing to do here.

Of course I do. Needless to say, complex schemata come with lots of indices, which are needed to speed up queries that each pull data from 10-15 tables.

> Does the data change rapidly? If so, nothing to do here.

Of course the data changes rapidly. Crucially, an online transactional database's input comes from external parties (e.g., public-facing web servers) that cannot be trusted. Many machine learning techniques are vulnerable to so-called “adversarial examples”. Feeding adversarial examples to an ML-powered index could make the database's performance drop drastically, in ways that are provably impossible with B-trees, whose height is logarithmic in the number of keys no matter what the keys are.
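
A minimal sketch of the concern, assuming a single least-squares line as the "model" (much simpler than the hierarchical models used in published learned indexes) and a skewed key set standing in for attacker-chosen input. The hypothetical `LinearLearnedIndex` predicts a key's position, then binary-searches a window whose width is the model's worst error on the stored keys, so the key distribution directly controls how much searching each lookup needs.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical toy learned index: a least-squares line maps key -> position,
    and the largest error on the stored keys bounds a local binary search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)  # assumed distinct
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error over the stored keys: every lookup searches this far.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Clamp into valid positions; this never moves a stored key's guess further away.
        guess = round(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None


linear = LinearLearnedIndex([3 * i for i in range(1000)])  # evenly spaced keys
skewed = LinearLearnedIndex(list(range(999)) + [10**12])    # one far outlier
print(linear.max_err, skewed.max_err)
```

With evenly spaced keys the error bound is essentially zero, so each lookup inspects a window of about one slot. Adding a single huge outlier flattens the fitted line and pushes the bound to roughly half the array, so the "learned" part stops helping. A B-tree over the same keys would have the same height in both cases, because its height depends only on the number of keys and the fanout.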




So, would you take a bet that this won't have been used in any obvious way within the next 5 years?

Because I would take the other side of that bet. :)


I'm not in the business of predicting the future, which is beyond my control. What other people do with learned indices is up to them.

I'm just stating my concerns.
