Hacker News new | comments | ask | show | jobs | submit login
Index Search Algorithms for Databases and Modern CPUs (2010) [pdf] (arxiv.org)
58 points by tjalfi on July 8, 2017 | hide | past | web | favorite | 11 comments

I can't wait for for spatial databases with GPU acceleration. Things like massively parallel collision checking for complex shapes become relatively fast when you have a GPU.

Original author here; was quite surprised to find this on the front page today.

Let me know if you have any questions. Happy to answer them!

Looks like the algorithm 2.1 has a flaw: after low becomes equal to high , we need to take keys[low] and vals[low] instead of midKey and vals[mid] in last "IF", in other case midKey / mid may remain not actual and the function can return nil.

I put it coded in golang there https://play.golang.org/p/hXdA0CY755

PS I like the article - I've never realized that such low level thing as L1/L2 cache are so important for query optimization!

Foremost, thank you for your work on this subject and your contributions.

As this is work from 2010, and, to cite from your paper:

> In this work, we will not consider finding multiple records at the same time. The algorithms given in pseudo-code will not directly support range queries. In case there is multiple matching records for the same key, the algorithms will only return one of the matching records. For most of the algorithms given here, it is however simple to extend them for range queries, or returning all of the records matching a key. We also will not consider set queries, where all records matching a set of keys are returned from one search operation. There is some potential for follow-up work in this area ...[snip]... Current commercial database systems already know when to switch from index probing to a full table scan. There might be some potential for performing better than both of those when explicitly considering this case in index structures.

The precise reason for using b-tree(-ish) data structures in databases and search engines like Lucene is precisely those two issues: to enable set and range retrievals. Have you published any follow-up of your work that extends the algorithms presented to cope with these two issues?

Last I checked, Lucene doesn't use B-trees. It uses a variation on skip lists to implement efficient inverted indices.

As I understand it, its performance comes from a column-store-like approach where large document lists can be efficiently intersected. Lucene also uses a LSM-style approach when writing index files; instead of updating a single large index in place, it incrementally appends to new partial index files, and to execute a query it queries every partial index and merges the results. (These partial index files are then incrementally merged in the background to reduce the number of files that need to be aggregated at query time.)

Lucene is slower than an RDBMS on narrowly targeted queries where an RDBMS can use a B-tree, but is more efficient when a query matches many documents because its index data structures are more efficient than a classical "sequential scan" across row pages.

Not sure I understand your reply - which of it explains how NitroGen (i.e. the indexing algorithm described in the linked paper) has evolved?

I was responding to the assertion that Lucene uses B-trees: "The precise reason for using b-tree(-ish) data structures in databases and search engines like Lucene is..."

Oh, that's why I had mentioned tree-ish structures, sort of referring to skip-lists & friends on a broad level. Sorry for the misleading expression. Rather, I'm interested if any ranged/set retrieval approach can be found that scales better than log n. (n.b. ranged, so LSH doesn't count either ;-))

My question: why did your advisor stop updating his fantastic youtube channel[1]? I'm a huge fan of his ability to communicate the concepts.

[1] https://www.youtube.com/channel/UCC9zrtAkl6yY4dpcnWrCHjA

Have you encountered anything else since the paper was published that is worth mentioning?

What tool did you use to create diagrams? They look nice.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact