Why and how to search two years back in your Elastic search logs [pdf] (cloudvyzor.com)
54 points by cloudvyzor 33 days ago | 8 comments

I don't get many 'new toy' moments with the Unix CLI, and writing those words down makes me want to change that.

A number of years ago we had your typical low-disk-space-server problem and someone had bought us time by shortening the log rotation interval and compressing the logs. This is how I (re-)discovered zless and zgrep.
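For anyone who hasn't used zgrep, the idea can be sketched in a few lines of Python: decompress the file as a stream and test each line against a pattern, without ever writing the uncompressed log to disk. This is only an illustration of the idea, not how zgrep itself is implemented; the function name `grep_gzip` and its parameters are made up for this example.

```python
import gzip
import re
from typing import Iterator


def grep_gzip(path: str, pattern: str) -> Iterator[str]:
    """Yield the lines of a gzip-compressed text file that match a regex.

    Hypothetical helper for illustration; roughly the effect of
    `zgrep PATTERN file.gz`, minus all of zgrep's options.
    """
    regex = re.compile(pattern)
    # In text mode, gzip.open decompresses incrementally as lines are read,
    # so memory use stays small even for large rotated logs.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if regex.search(line):
                yield line.rstrip("\n")
```

Something like `for hit in grep_gzip("app.log.1.gz", r"ERROR"): print(hit)` would then print the matching lines (the file name here is a placeholder).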

Streaming compression dovetails nicely with a number of kinds of tools, but seems to work particularly well on anything with pipe semantics. I'm certain that phenomenon informed the rather long tenure of the tgz file format.

I was trying to refresh my memory of the BWT algorithm for compression the other day and stumbled on a guy doing a tutorial on how they use it for gene analysis/searches. One of his assertions was that suffix trees and BWT aren't that far apart, and it has me wondering.

Compressing text and searching text are both about identifying patterns. How much R&D have we done on trying to do both at the same time? Is searching for text in a compressed file in log(n) to sqrt(n) time a solved problem?
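To make the BWT/suffix connection concrete, here is a toy sketch: the BWT can be read straight off a suffix array, and the transformed string alone is enough to count occurrences of a pattern via backward search (the core idea behind the FM-index). Everything here is an assumption for illustration: the function names, the NUL sentinel, the naive sort-based suffix array, and the linear-scan rank function. A real index would replace the linear scans with sampled rank tables or similar to get fast queries, so this shows the mechanism, not the performance.

```python
from collections import Counter
from typing import List

SENTINEL = "\0"  # assumed end marker; must sort before every other character


def suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: List[int]) -> str:
    """BWT = the character preceding each suffix, taken in suffix-array order."""
    # For i == 0, text[-1] wraps around to the sentinel, which is what we want.
    return "".join(text[i - 1] for i in sa)


def count_occurrences(bwt: str, pattern: str) -> int:
    """Count occurrences of a non-empty pattern using only the BWT string."""
    # first_row[c]: number of characters in the text strictly smaller than c,
    # i.e. where the block of suffixes starting with c begins.
    first_row = {}
    total = 0
    for ch, n in sorted(Counter(bwt).items()):
        first_row[ch] = total
        total += n

    def rank(ch: str, i: int) -> int:
        # Occurrences of ch in bwt[:i]; a linear scan here, for clarity only.
        return bwt.count(ch, 0, i)

    lo, hi = 0, len(bwt)  # current range of matching suffixes, half-open
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + rank(ch, lo)
        hi = first_row[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


text = "banana" + SENTINEL
bwt = bwt_from_suffix_array(text, suffix_array(text))
print(count_occurrences(bwt, "ana"))  # prints 2
```

The search loop never touches the original text, only the BWT plus a small table of character counts, which is why the same transform shows up in both compressors and sequence-search tools.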

There's literature on this topic; the field is called succinct data structures. Wikipedia has you covered as usual: https://en.wikipedia.org/wiki/Succinct_data_structure

The presentation seems to rely on the software from https://cloudvyzor.com/downloads.html. Is it Windows-only?

Is this an ad?

S3 Select

One issue with the title: the presentation says CSV and not JSON.

We changed the title to that of the article, as the site guidelines ask.

(Submitted title was "Convert: Elastic Search snapshots to zipped JSONs. 60TB to 3TB searchable [pdf]")
