Needless to say, having experienced all of this, I'd rather build my own solution for logging than use a database that's not in-house.
Oh wow, now it says 64GB of RAM is the sweet spot... What the heck is this thing doing that couldn't have been accomplished with MongoDB or PostgreSQL? I've got busier data sets that don't need 16GB of RAM, and yes, we pound the database with logs of sorts and query in all sorts of ways, and still I don't get it... I wouldn't recommend this stack to a friend unless they've got plenty of hardware to spare.
Part of the problem is what it's promoted for. It's a great drop-in, horizontally scalable, full-text search engine that's inexplicably become popular for log ingestion and analysis.
To those ends, I hate it, I hate every bit of it, from the atrocious JSON-based query DSL (seriously thought it was a joke at first) to its unpredictable timeouts, shard storms, mapping conflicts and other problems at scale. Elementary SQL concepts aren't possible ('select someone_elses_poorly_named_key as first_name', nope, you gotta reindex). High-cardinality aggregations fail in spectacular ways. High-anything aggregations fail in spectacular ways. The scroll API returns results unordered. There's no way to properly spec your cluster; the docs explicitly take a trial-and-error approach to design.
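To make the DSL complaint concrete, here is a rough sketch (field names are made up) of what a one-line SQL filter looks like when expressed as an Elasticsearch bool query:

```python
import json

# Hypothetical example: the SQL query
#   SELECT * FROM logs WHERE level = 'error' AND ts >= '2018-01-01'
# expressed as an Elasticsearch bool/filter query. The "level" and "ts"
# field names are assumptions, not from any real mapping.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"level": "error"}},
                {"range": {"ts": {"gte": "2018-01-01"}}},
            ]
        }
    }
}

# This JSON body would be POSTed to an index's _search endpoint.
print(json.dumps(query, indent=2))
```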
It's not just you. Elasticsearch does me no favors with the task of log analysis. I'd sooner normalize and grep a pile of gzipped log files than keep dealing with this mess, but this is the second job I've been at that's built their logging infrastructure on top of it.
> I do not understand why they built Elasticsearch.
"You know, for search."
It's great for searching proper text. Documents, comments, blog posts, etc.
> I've read in a couple places you need like 32GB of RAM just to run this thing to do queries, and having crashed Kibana / Elasticsearch a dozen times I believe it's designed poorly.
You don't necessarily need 32GB to do anything. The required heap size scales with your intended workload, but it's not like you can do a back-of-napkin calculation to figure out the relation. I run a 1GB instance for development.
It uses the memory to do a lot of caching so the queries you throw at it are lightning fast. Mongo does something similar (but crappier, IMO) with its concept of working sets.
I could not agree more, I found this in my codebase https://i.imgur.com/44stiHv.png
I've never seen an odder choice of data structure.
Is there a way to know how much RAM you are going to need for your dataset? I think I was using it for looking up restaurant names from a database of 100,000 names, and I wanted to factor in misspellings and partial matches.
I think Elasticsearch's policy on sizing is "pay us or a partner to have a look and give you a guesstimate", which is pretty standard.
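For the restaurant-name use case above, it's worth noting how far you can get without a search engine at all. This is a hypothetical sketch using only the Python standard library; for ~100,000 names the whole list fits in a few megabytes of RAM, though each lookup is a linear scan:

```python
import difflib

# Made-up sample data standing in for the 100,000-name database.
restaurants = ["Luigi's Pizzeria", "Thai Orchid", "Burger Barn", "Cafe Roma"]

def fuzzy_lookup(query, names, n=3, cutoff=0.6):
    # get_close_matches ranks candidates by SequenceMatcher similarity,
    # which tolerates misspellings and partial matches.
    return difflib.get_close_matches(query, names, n=n, cutoff=cutoff)

print(fuzzy_lookup("Luigis Pizza", restaurants))  # misspelled, partial query
```

This obviously doesn't replace ES's inverted index for large corpora or high query volume, but it's a useful baseline before provisioning a cluster.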
Unless there are hundreds of GB/day to index it's much simpler to forward the logs using syslog or journald and then use grep on the collector.
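The grep-on-the-collector approach can be sketched in a few lines; this hypothetical example writes a tiny gzipped log file and scans it the way you would with zgrep (paths and log lines are made up):

```python
import gzip
import os
import re
import tempfile

# Made-up log data standing in for logs forwarded by syslog/journald.
log_lines = b"jan 1 app[1]: ok\njan 1 app[1]: ERROR disk full\n"
path = os.path.join(tempfile.mkdtemp(), "app.log.gz")
with gzip.open(path, "wb") as f:
    f.write(log_lines)

# Equivalent of `zgrep ERROR app.log.gz` on the collector.
pattern = re.compile(r"ERROR")
with gzip.open(path, "rt") as f:
    hits = [line.rstrip() for line in f if pattern.search(line)]

print(hits)
```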
Edit: And yes, I originally used it for log processing, but really people should definitely try it out for replacing very expensive BA stacks. For log analysis, use something called Graylog, which actually uses Elasticsearch internally, or just go with Splunk, which gives you out-of-the-box primitives for session length calculations. In ELK, if you want to do something that sounds as simple as session length, you'll end up having to reprocess documents using Kafka or Spark, with background jobs that reprocess documents more than 1 hour old (or something to that effect).
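The session-length calculation mentioned above reduces to tracking first-seen and last-seen timestamps per session; here is a minimal sketch over already-parsed log events, where the "session" and "ts" fields are assumptions:

```python
# Made-up events standing in for parsed log documents.
events = [
    {"session": "a", "ts": 100},
    {"session": "b", "ts": 110},
    {"session": "a", "ts": 160},
    {"session": "a", "ts": 220},
]

# Track (first_seen, last_seen) per session id in one pass.
bounds = {}
for e in events:
    first, last = bounds.get(e["session"], (e["ts"], e["ts"]))
    bounds[e["session"]] = (min(first, e["ts"]), max(last, e["ts"]))

# Session length = last_seen - first_seen.
lengths = {sid: last - first for sid, (first, last) in bounds.items()}
print(lengths)  # a: 120, b: 0
```

The pain with ES is that this trivial pass over events has no good home: aggregations can approximate it, but late-arriving events are why people end up bolting on Kafka/Spark reprocessing jobs.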
My realization was also what others have mentioned, that I'm trying to use a search engine backend for log storage.
Specifically 3 months, which is the law here. That meant that ES had to keep 3 months of logs readily searchable. It's just not feasible when you're generating 25-30GB of logs each day.
I still use ES for search engines but I've stopped using ELK for logs unless it's for small environments.
For some reason, though, over the past few years companies and people have become obsessed with GUIs, and Nagios/Cacti were falling out of favor, so people started just dropping in ELK/Graylog so randoms could quickly and easily get that GUI... without considering the resource requirements and lack of scalability. (Graylog2 fixed a lot of the Graylog problems.)
It seems a lot of managers also started wanting GUI dashboards, which is probably the big reason behind the push. Regardless, I don't think the trend is going away, so the market is ripe for disruption by something that does the same thing but faster and with fewer resources. The real problem is that most of the competitors for some reason decided proprietary was the way to go, and the people using these tools don't want more proprietary bullshit in their stack.
The feature that makes them different from other web-gui-graphing tools though is the search/query customization.
But at its core, Elasticsearch is meant for full-text search and analytics. Not logging. Logging is not even an interesting use case.
I think ES's Java heap default is 1 or 2 GB, and that is more than enough for many use cases. Heap-heavy operations like sorting and aggregations may need more RAM depending on the index size. As far as I know, search isn't heap-heavy so you only need more RAM as query volume increases or index size increases.
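For reference, the heap is configured in Elasticsearch's `config/jvm.options` file; a sketch of raising it from the default (the docs advise keeping min and max equal, and the 4g value here is just an example):

```
# config/jvm.options: set min and max heap to the same size
-Xms4g
-Xmx4g
```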
For crashes, what version? Version 6.x has never crashed on me, but previous versions did have a tendency to crash for me.
It's good at text but can be used as a similarity search system, so you can index and find similar images, audio waveforms, binary data, etc.
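The core of that kind of similarity search is comparing feature vectors; images or audio would first be reduced to vectors by some feature extractor. A minimal sketch of the comparison step, with made-up vectors:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical feature vectors for two indexed images.
index = {"img1": [1.0, 0.0, 0.0], "img2": [0.0, 1.0, 0.0]}
query = [0.9, 0.1, 0.0]

# Nearest neighbor by cosine similarity.
best = max(index, key=lambda k: cosine(query, index[k]))
print(best)  # img1
```

A real deployment would index these vectors in ES (or a dedicated vector store) rather than scanning them linearly, but the ranking principle is the same.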
Over the years, search queries have become useful for log analytics, and so the ELK stack was developed into a single solution that does both: full-text search as well as log ingestion with all kinds of queries, aggregations, and even machine learning.
I think that's what fundamentally hurts it as they are very separate use-cases trying to be served by the same system. We still use ES for search but wish it just focused on that with much better performance, reliable clustering and proper transactions instead.
While you can do all of that with plain pg, having no indexes and doing table scans will be slow when you search (it depends on how often you need to search and the type of queries).
So ES is very good at some stuff compared to every other type of DB. For logging, yeah, there are some companies that just do a full scan on each query and are fine.
But you can have indexes in pg...
Rendering HTML will, however, tend to be faster than having a client parse and interpret JSON, and then manipulate the (shadow) DOM accordingly.
And then it always felt wrong to me that I have to force countless clients to do the exact same view transformations, which could have been done once, on the server, and cached.
For a new project, I'm prerendering HTML fragments via AJAX with IntercoolerJS. It's a pleasure to work with, and both the developer and the user get the best of both worlds: a dynamic, blazingly fast UI that feels like an SPA, but one that degrades gracefully even for the Emacs-w3m user and Googlebot.
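The prerendered-fragment pattern can be sketched roughly like this (the URL, ids, and content are made up; a real page would also load the intercooler.js script):

```html
<!-- Hypothetical sketch: a link that fetches a server-rendered HTML
     fragment and swaps it into the target element. With JavaScript
     disabled, the plain href still works as a normal page load. -->
<a href="/fragments/orders"
   ic-get-from="/fragments/orders"
   ic-target="#orders">Refresh orders</a>
<div id="orders"><!-- server-rendered fragment lands here --></div>
```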
I also found it's really worth it to move servers or reverse proxies closer to the client - independent of client/server rendering. Users shouldn't wait more than 200ms for an Ajax request to finish.
They were the ones who insisted I use Tableau. I had originally planned and suggested a nice server-side rendered dashboard that they could have used on any grade of laptop, but no, they wanted something _right now_, and any kind of proper solution was simply "too hard".
The real irony being that they'll waste more time doing and redoing these Excel reports every time they want one than it would have taken me and my teammate to build a solution for them.
Fun fact: we actually started out with elasticsearch-head, but realized it was really hard to hack on (this was ~2 years ago) and decided to create Dejavu. IMO, the key difference is that we have focused a lot on how to get the data indexed (mappings, bringing your own CSV/JSON files) and on visualizing the raw data (via UI-based filters or DSL queries).
So less so "missing" UI and more "outdated" UI.
I personally like Kibana but I am happy to see some innovation in the space. Competition can only help.
I love that this can be run as a Docker container or as a Chrome extension. With the quick setup I'll probably give it a try. It looks nice.
Kibana is more for dashboarding and visualization, Dejavu gives you control of the raw data views and indexing operations.
I'm not sure if this is actually accurate:
On a cursory look, it still seems to use a Node.js-based server. One of the key side effects of needing a server is a more involved distribution and installation process: you just can't run it on a static server like GitHub Pages or as a browser extension.
Is websockets a requirement just for the real-time updates, or for the entire UI?
Edit: We need to update the GIFs and screenshots; the app has come a long way UI/UX-wise since these were originally taken.
You might say I had a case of, ah, never mind, bad pun.
On a serious note, I understand there's only a finite number of words but a limitless number of software projects, and yet surely there are still enough names that aren't already used for something well-known and widely used. Yes, DjVu and Dejavu are spelled differently, but it's the same word.