
Why and how to search two years back in your Elastic search logs [pdf] - cloudvyzor
https://cloudvyzor.com/How%20to%20search%20in%20your%20ES%20snapshots%20-%20v1.2.pdf
======
hinkley
I don't get many 'new toy' moments with the unix CLI, and writing those words
down makes me want to change that situation.

A number of years ago we had your typical low-disk-space-server problem and
someone had bought us time by shortening the log rotation interval and
compressing the logs. This is how I (re-)discovered zless and zgrep.

Streaming compression dovetails nicely with a number of kinds of tools, but
seems to work particularly well on anything with pipe semantics. I'm certain
that phenomenon informed the rather long tenure of the tgz file format.
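
For anyone who wants the same streaming behavior outside the shell, here is a minimal sketch of a zgrep-like search in Python. It assumes a gzip-compressed, line-oriented text log; the function name `zgrep` and the sample path are illustrative only and are not taken from the article.

```python
import gzip
import re
from typing import Iterator


def zgrep(pattern: str, path: str) -> Iterator[str]:
    """Yield lines of a gzip-compressed text file that match `pattern`."""
    regex = re.compile(pattern)
    # gzip.open in text mode decompresses incrementally as lines are read,
    # so the whole file is never held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if regex.search(line):
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    for hit in zgrep(r"ERROR", "app.log.gz"):
        print(hit)
```

Because the function is a generator, it composes with other consumers much as a pipe stage would: nothing is decompressed until the caller asks for the next match.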

------
hinkley
I was trying to refresh my memory of the BWT algorithm for compression the
other day and stumbled on a guy doing a tutorial on how they use it for gene
analysis/searches. One of his assertions was that suffix trees and BWT aren't
that far apart, and it has me wondering.

Compressing text and searching text are both about identifying patterns. How
much R&D have we done on trying to do both at the same time? Is searching for
text in a compressed file in log(n) to sqrt(n) time a solved problem?

~~~
rolling_roland
There's literature on this topic; the area is called succinct data structures.
Wikipedia has you covered as usual:
[https://en.wikipedia.org/wiki/Succinct_data_structure](https://en.wikipedia.org/wiki/Succinct_data_structure)
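
One concrete example of searching directly on BWT output is the backward search used by FM-index style structures. The sketch below is a toy version under stated assumptions: it builds the transform by sorting all rotations (far too slow for real logs) and keeps plain, uncompressed count tables, so it is not itself succinct. The names `bwt` and `count_occurrences` are illustrative.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transform via sorted rotations (demo only, not efficient)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in the original text by backward search."""
    counts = Counter(bwt_text)
    # first[c] = number of characters in the text strictly smaller than c,
    # i.e. the row where the block of rotations starting with c begins.
    first = {}
    total = 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt_text[:i]
    occ = {c: [0] * (len(bwt_text) + 1) for c in counts}
    for i, ch in enumerate(bwt_text):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(count_occurrences(transformed, "ana"))
```

Once the tables exist, the search loop takes one step per pattern character, independent of the text length. Real implementations replace the `occ` table with compressed rank structures, which is where the succinct data structure literature comes in.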

------
bleonard
The presentation seems to rely on the tool available here:
[https://cloudvyzor.com/downloads.html](https://cloudvyzor.com/downloads.html)
Is it only on Windows?

------
gdm85
Is this an ad?

------
kevrone
S3 Select

------
marcinzm
One issue with the title: the presentation says CSV and not JSON.

~~~
dang
We changed the title to that of the article, as the site guidelines ask.

(Submitted title was "Convert: Elastic Search snapshots to zipped JSONs. 60TB
to 3TB searchable [pdf]")

