In-Memory Performance for Big Data (2014) [pdf] (vldb.org)
29 points by lichtenberger 20 days ago | 7 comments

Has this pointer swizzling concept been adapted by any big data-focused databases?

Would be interesting to see how this works with bigger pools of volatile and persistent memory (disclaimer: I work at Intel on Optane DC SSD/PM-related things).

BTW: What's your take on random read speed? I think with SSDs now, and maybe NVRAM in the future, we can address physical storage at a finer granularity than with HDDs. For instance, with http://sirix.io data is only clustered during writes (batched and synced to the flash drive), and it versions variable-length in-memory and physical pages such that not every record has to be copied and written again. To reconstruct a page in memory, we have to read from random places (maybe in parallel). Thus Sirix relies on fast random read speed and on fine-granular access to the physically stored records.
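To make the reconstruction step concrete, here is a minimal sketch of rebuilding a page from several versioned fragments. It is not Sirix's actual code: the `PageFragments` class, the slot-number-to-record maps, and the newest-first ordering are all assumptions made for illustration, and the reads that would fetch each fragment from a different place on the device are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: merges page fragments into one in-memory page. */
final class PageFragments {
    /**
     * Each fragment maps a slot number to a record and holds only the
     * records written in that version. Fragments are assumed to be
     * ordered newest first, so the first value seen for a slot wins.
     */
    static Map<Integer, String> reconstruct(List<Map<Integer, String>> fragmentsNewestFirst) {
        Map<Integer, String> page = new TreeMap<>();
        for (Map<Integer, String> fragment : fragmentsNewestFirst) {
            for (Map.Entry<Integer, String> entry : fragment.entrySet()) {
                // Keep the newer record if an older fragment has the same slot.
                page.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return page;
    }
}
```

In a real system each fragment would come from its own random read, which is why this approach depends on cheap random access.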

My take on random read speed? Check out this paper from Lenovo testing the performance of Optane DC Persistent Memory on their servers: https://lenovopress.com/lp1083.pdf.

There's another good paper from a team at UCSD testing all sorts of usages and file systems with NVRAM here too: https://arxiv.org/pdf/1903.05714.pdf.

Sirix sounds like a good fit to take advantage of App Direct access on Optane DCPM, but you're way more of an expert than me. Perhaps make a request for alpha access at GCP to test Sirix with the new tech? https://docs.google.com/forms/d/e/1FAIpQLSeX1tN6Qt-aQUK2iVVi...

I'm not entirely sure, but I'll apply the technique in my open source storage system. I have reference instances with a log-key and a persistent-key (a file offset, for instance), and will re-add the Java object references. In order to set them to null when a page is about to be evicted from the buffer cache, I'll add a simple in-memory map with the proposed child => parent mapping, such that I can set the in-memory page reference to null again.
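A rough sketch of what that could look like follows. Every name here (`Page`, `PageReference`, `BufferCache`, `resolve`, `evict`) is hypothetical, the "disk" is just a map standing in for real storage, and the child-to-parent map is keyed by object identity; it illustrates the idea and is not the storage system's real code.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical in-memory page, identified by its persistent key (e.g. a file offset). */
final class Page {
    final long persistentKey;
    Page(long persistentKey) { this.persistentKey = persistentKey; }
}

/** Reference held by a parent page: always a persistent key, sometimes a swizzled pointer. */
final class PageReference {
    final long persistentKey;
    Page inMemory; // null while the child page is not in the buffer cache
    PageReference(long persistentKey) { this.persistentKey = persistentKey; }
}

final class BufferCache {
    private final Map<Long, Page> disk; // stand-in for the storage device
    // Child page -> the reference inside its parent that currently points at it.
    private final Map<Page, PageReference> parentRefOf = new IdentityHashMap<>();
    int loads = 0; // counts simulated reads from storage

    BufferCache(Map<Long, Page> disk) { this.disk = disk; }

    /** Follows a reference, swizzling it on first use. */
    Page resolve(PageReference ref) {
        if (ref.inMemory == null) {
            Page page = disk.get(ref.persistentKey);
            if (page == null) {
                throw new IllegalArgumentException("no page at key " + ref.persistentKey);
            }
            loads++;
            ref.inMemory = page;        // swizzle: store the direct object reference
            parentRefOf.put(page, ref); // remember where to unswizzle on eviction
        }
        return ref.inMemory;
    }

    /** Evicts a page and sets the parent's in-memory reference back to null. */
    void evict(Page page) {
        PageReference ref = parentRefOf.remove(page);
        if (ref != null) {
            ref.inMemory = null; // unswizzle: only the persistent key remains
        }
    }
}
```

After `evict`, the next `resolve` on the same reference falls back to the persistent key and loads the page again.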

The term pointer swizzling isn't familiar to me, but it seems to be the same technique by which variable-width tuples on a page are accessed via a fixed-width item pointer array (sometimes called an indirection vector) within most database systems.
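For anyone who hasn't seen an item pointer array, here is a minimal sketch of such a slotted page. The layout is an assumption chosen for illustration (a 4-byte header, 4-byte slots growing from the front, tuple bytes growing from the back, page size at most 65535 bytes) and is not taken from any particular database; `SlottedPage` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical slotted page: fixed-width slots point at variable-width tuples. */
final class SlottedPage {
    private static final int HEADER = 4; // slot count (2 bytes) + start of tuple area (2 bytes)
    private static final int SLOT = 4;   // tuple offset (2 bytes) + tuple length (2 bytes)
    private final ByteBuffer buf;

    SlottedPage(int size) {
        if (size < HEADER || size > 0xFFFF) {
            throw new IllegalArgumentException("page size must be between 4 and 65535");
        }
        buf = ByteBuffer.allocate(size);
        buf.putShort(0, (short) 0);    // no slots yet
        buf.putShort(2, (short) size); // tuple area starts at the end of the page
    }

    int slotCount() { return buf.getShort(0); }

    private int tupleStart() { return buf.getShort(2) & 0xFFFF; }

    /** Appends a tuple and returns its slot number. */
    int insert(byte[] tuple) {
        int n = slotCount();
        int slotPos = HEADER + n * SLOT;
        int newStart = tupleStart() - tuple.length;
        if (newStart < slotPos + SLOT) {
            throw new IllegalStateException("page full");
        }
        for (int i = 0; i < tuple.length; i++) {
            buf.put(newStart + i, tuple[i]);
        }
        buf.putShort(slotPos, (short) newStart);
        buf.putShort(slotPos + 2, (short) tuple.length);
        buf.putShort(0, (short) (n + 1));
        buf.putShort(2, (short) newStart);
        return n;
    }

    /** Reads a tuple through its fixed-width slot entry. */
    byte[] read(int slot) {
        if (slot < 0 || slot >= slotCount()) {
            throw new IndexOutOfBoundsException("no such slot: " + slot);
        }
        int slotPos = HEADER + slot * SLOT;
        int offset = buf.getShort(slotPos) & 0xFFFF;
        int length = buf.getShort(slotPos + 2) & 0xFFFF;
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buf.get(offset + i);
        }
        return out;
    }
}
```

Callers only ever hold a slot number, so tuples can be moved within the page by rewriting the slot entry alone.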

Interesting, sounds a bit like the way write-barriers made the combination of copying garbage collectors and direct object pointers feasible.
