
YouTube fed me a video I'd watched a couple of years ago about a log-searching tool. It compressed data, decompressed it into CPU cache, and then scanned it. Almost no indexing at all; it relied on the CPU cache for speed.

We spend a lot of time arguing with the CPU. If we give it something it's actually happy doing, it almost doesn't matter how stupid that thing is, because it's stupid fast.




Interesting. I would really appreciate it if you could share the video with us. Thanks.


This is one of the main selling points of Blosc https://blosc.org/pages/blosc-in-depth/


I couldn't find the talk hinted at, but this blog post seems to touch on some of the same ideas: https://www.humio.com/whats-new/blog/how-fast-can-you-grep/


Brute-force search is how Scalyr does it. Here's an article they wrote about their implementation.

https://www.scalyr.com/blog/searching-1tb-sec-systems-engine...


> Compressed data, decompressed into cpu cache, and then scanned. Almost no indexing at all, just cpu cache for speed.

Where are these other programs not using the CPU cache?


Indexes and tree structures involve pointer chasing. Nothing is even guaranteed to be in main memory, never mind L2 cache. These guys apparently went straight from disk (linear reads) to L1/L2.


I assume they meant decompressing a block small enough to fit in the CPU cache, then doing in-memory scans over it.
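A minimal sketch of that idea, assuming cache-sized blocks and zlib as the codec (the block size and helper names here are illustrative, not what the tool in the video actually used): logs are compressed in independent blocks small enough to fit in L2, and a search decompresses one block at a time and brute-force scans it in memory.

```python
import zlib

BLOCK_SIZE = 256 * 1024  # hypothetical: roughly L2-cache-sized blocks

def compress_blocks(lines):
    """Compress log lines into independent, cache-sized blocks."""
    blocks, buf, size = [], [], 0
    for line in lines:
        buf.append(line)
        size += len(line)
        if size >= BLOCK_SIZE:
            blocks.append(zlib.compress(b"\n".join(buf)))
            buf, size = [], 0
    if buf:
        blocks.append(zlib.compress(b"\n".join(buf)))
    return blocks

def grep_blocks(blocks, needle):
    """Decompress each block and brute-force scan it in memory.

    No index: each block is inflated into a hot buffer and
    scanned linearly, which is exactly the access pattern
    CPUs and prefetchers are good at.
    """
    hits = []
    for block in blocks:
        data = zlib.decompress(block)
        for line in data.split(b"\n"):
            if needle in line:
                hits.append(line)
    return hits

# Example: find all error lines without building any index.
logs = [b"error: disk full", b"info: heartbeat", b"warn: retry"] * 1000
hits = grep_blocks(compress_blocks(logs), b"error")
```

Because the blocks are independent, a real implementation can also fan the scan out across cores, one block per worker.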



