
Does anyone know of any other prod-ready Elasticsearch alternatives? I'm working on a logging-infrastructure project for shipping syslog, and it seems no one these days just uses plain central syslog anymore; ES is the standard, but it seems bloated.

I've been tempted to just ship straight to a DB and skip all these crazy shippers and parsers and all the other middlemen in the equation.

Also, why has no product unified monitoring and logging? AFAIK that's what makes Splunk worth it, if you have the budget (I don't).




Grafana's new Loki [1] looks very promising. It bills itself as "Prometheus for logging". It doesn't index the text of messages, but it does allow fast filtering predicates on labels and time.

[1] https://grafana.com/loki
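To make the "labels, not full-text" model concrete, here's a minimal sketch of building a query against Loki's HTTP range-query endpoint. The endpoint path and the example selector are illustrative; check them against your Loki version's API docs.

```python
from urllib.parse import urlencode

def loki_query_url(base, selector, start_ns, end_ns):
    """Build a Loki range-query URL for a label selector.

    Loki selects streams by labels (like Prometheus), then greps the
    matching log lines; it does not full-text index message bodies.
    """
    # /loki/api/v1/query_range is the path in current Loki releases;
    # adjust if your version exposes a different API prefix.
    params = urlencode({
        "query": selector,   # e.g. '{job="syslog"} |= "error"'
        "start": start_ns,   # nanosecond timestamps
        "end": end_ns,
    })
    return f"{base}/loki/api/v1/query_range?{params}"

url = loki_query_url("http://localhost:3100",
                     '{job="syslog"} |= "error"', 0, 1)
```

The cheap part is the label match (`{job="syslog"}`); the `|= "error"` filter is a scan over only the matching streams.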


ES is not "bloated"; it does require decent hardware, but so does anything else that's taking in gigabytes of data an hour and making it all searchable and indexed. Why take on some weird, less-supported mechanism because of a lack of confidence in Elasticsearch?

> I've been tempted to just ship straight to a dB and skip all these crazy shippers and parsers and all the other middle men in the equation.

You need the parsers. You want to find a needle in a haystack? Good luck without having things broken down into proper fields with metadata.
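A tiny sketch of what "broken down into proper fields" buys you: even a rough RFC 3164-style syslog parser turns a raw line into queryable metadata. (Hypothetical regex for illustration; real shippers handle far more edge cases.)

```python
import re

# Rough RFC 3164 shape: "<PRI>Mmm dd hh:mm:ss host tag[pid]: message".
# A real shipper handles many more variants; this just shows why
# structured fields beat grepping raw text.
SYSLOG_RE = re.compile(
    r"<(?P<pri>\d+)>"
    r"(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[\w\-/\.]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)"
)

def parse_syslog(line):
    m = SYSLOG_RE.match(line)
    if not m:
        return None
    fields = m.groupdict()
    # PRI encodes facility and severity in one integer.
    fields["facility"], fields["severity"] = divmod(int(fields["pri"]), 8)
    return fields

rec = parse_syslog("<34>Jan  5 10:15:32 host1 sshd[212]: Failed password for root")
```

Once `host`, `tag`, and `severity` are real fields, "all critical sshd events on host1 last night" is a filter, not a grep.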

You need the shippers. Elastic Beats has full backpressure support so that when your cluster is busy, it can intelligently back off. Otherwise, you'll drop logs, or overwhelm the system to the point of uselessness....

> why has no product unified monitoring and logging?

Metricbeat from Elastic, with Grafana pointed at Elasticsearch, gets you better dashboards and alerting.

Please don't reinvent the wheel on this one. Deploying ELK + Beats + Grafana is not that hard, there's tons of documentation, and it is a very stable product.


Back when I worked at Google, the standard log processing tool was Dremel. You could get exactly the same thing by shipping your logs to BigQuery.

I haven't checked, but I bet it's cheaper than ES for data that's mostly cold, like logs. You will need a separate monitoring solution though.


If you want streaming inserts to BQ, those become the biggest cost. Dataflow could be used to turn inserts into batches and to gather interesting metrics that you don't want to hit BQ for, but I don't think anyone has open-sourced anything in this space. I've implemented streaming inserts to BQ for logs "at scale" and it was still at least an order of magnitude cheaper than Splunk. Happy to talk via email.
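The cost-saving idea above, batching rows instead of streaming them one at a time, can be sketched like this. `flush_fn` stands in for whatever actually loads the batch (e.g. a BigQuery load job or `insert_rows_json` call); the class name and thresholds are illustrative.

```python
import time

class LogBatcher:
    """Buffer log rows and flush in batches, so each load/insert call
    carries many rows instead of one row per streaming insert."""

    def __init__(self, flush_fn, max_rows=500, max_age_s=5.0):
        self.flush_fn = flush_fn    # called with a list of rows
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows = []
        self.oldest = None          # monotonic time of first buffered row

    def add(self, row):
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.rows.append(row)
        # Flush on size or on age, whichever trips first.
        if (len(self.rows) >= self.max_rows
                or time.monotonic() - self.oldest >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)
            self.rows, self.oldest = [], None

batches = []
b = LogBatcher(batches.append, max_rows=3)
for i in range(7):
    b.add({"line": i})
b.flush()  # drain the remainder
```

The age bound keeps tail latency predictable when traffic is light; the size bound keeps per-call overhead amortized when it's heavy.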


What was the monitoring solution in the end? Where were the BQ results going?


It was generic log aggregation used mostly for incident response and forensics, as well as some offline metrics. There were a bunch of metrics being created (in about 3 different ways) on-box with parsing, which we were looking at moving into the log-processing stream. We had a chat bot people could use to run common queries, as well as standard SQL interaction via UI and API, auth'd by Google IAM.


We use Spark on HDFS.


IMO it is not as widespread because it isn't a good practice. Health monitoring, metrics, and logging are orthogonal and really should be handled separately. Monitoring makes sure everything is working properly, metrics are about understanding how it is being used, and logging is about peering inside at what is happening. Conflating them hinders their application and makes them less useful.


I like the separation of concerns. Do you have an example where conflating these leads to issues?


When a customer complains that something's not right, and you check your logs, and they've been spewing millions of alarming messages for hours that you wish you had seen before the customer noticed, that's when you wish the programmer who wrote those log lines had used the health monitoring framework instead of the logging framework.
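The difference in code is small but the operational effect is large. A toy sketch (the dict counter stands in for a real metrics client like prometheus_client or statsd; names are illustrative):

```python
import logging

logger = logging.getLogger("payments")

# Toy stand-in for a real metrics client; the point is that an
# alerting system watches this counter and pages on a threshold,
# instead of someone grepping log volume after the fact.
ERROR_COUNT = {"payment_failures": 0}

def handle_payment_error(exc):
    # Log line: for the human reconstructing the story later.
    logger.error("payment failed: %s", exc)
    # Metric: for the alerting system watching a threshold now.
    ERROR_COUNT["payment_failures"] += 1

for _ in range(3):
    handle_payment_error(RuntimeError("card declined"))
```

If the error path only logs, the millions of alarming messages sit unread; if it also bumps a counter, the alert fires at failure number fifty, not at the customer complaint.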


> health monitoring framework instead of the logging framework.

What is the difference?

Can't you just extract metrics from logs via ES? Logstash even has a prebuilt JAVASTACKTRACEPART pattern for Java exceptions.


That's user/dev error, not a fault of the system, nor a problem with handling it all in the same framework.

Metrics can always be generated from logs, especially structured payloads.
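For structured payloads this really is just an aggregation over fields. A minimal sketch (field names are illustrative):

```python
import json
from collections import Counter

# Structured log lines as newline-delimited JSON, e.g. from a shipper.
lines = [
    '{"service": "api", "level": "error", "msg": "timeout"}',
    '{"service": "api", "level": "info",  "msg": "ok"}',
    '{"service": "db",  "level": "error", "msg": "deadlock"}',
]

# A metric derived from logs: error count per service.
# No grok patterns or full-text search needed once fields exist.
errors = Counter(
    rec["service"]
    for rec in map(json.loads, lines)
    if rec["level"] == "error"
)
```

Divide by a time window and you have an error rate; the same aggregation is what an ES terms query or a SQL GROUP BY would do server-side.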


You might take a look at https://blevesearch.com/, written in Go.


If you're looking for a SaaS solution, check out DataDog. They offer monitoring and logging (and a bit more) as a packaged service.


As of 6.5, the Elastic Stack ships with a logging app and an infrastructure monitoring app out of the box in Kibana. They are both new, so expect a bunch of new features in 6.6 onward.

The docs have more info about both: https://www.elastic.co/guide/en/infrastructure/guide/current...

Disclaimer: I work on Kibana at Elastic


Clickhouse is a good (self hosted) alternative to Elasticsearch for log storage: it saves a lot of space due to better compression, it supports sql (with regex search instead of useless by-word indexing), and ingestion speed is great.


> (with regex search instead of useless by-word indexing)

Perhaps I misunderstand your situation, but I don't see any "CREATE INDEX" available in Clickhouse, and thus won't "SELECT * FROM logs WHERE match(message, '(?i)error.*database')" require a full column-scan (including, as you mentioned, decompressing it)? Versus the very idea of an indexer like ES is "give me all documents that have the token 'ERROR' and the token 'database'" which need not tablescan anything

I only learned about the project 9 minutes ago so any experiences you can share about the actual performance of those queries would be enlightening -- maybe it's so fast that my concern isn't relevant


Clickhouse is designed for full table scans. It allows one index per table, usually a compound key including the date as the leftmost part of the key. This allows it to eliminate blocks of data that don’t contain relevant time ranges. It is also a column store, so the data being read is only the columns used in the query.

If your query is conceptually linearly scalable, Clickhouse is also linearly scalable. Per-core performance is also pretty good (tens of millions of rows per second on good hardware with simple queries, which most log-aggregation queries are).
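Putting the two points together, a date-leading key plus `match()` over one column, a query against a hypothetical `logs` table might be built like this. Table and column names are assumptions for illustration; a real client would POST the string to ClickHouse's HTTP interface (port 8123 by default).

```python
from datetime import date

def build_log_query(day: date, pattern: str) -> str:
    """Query against a hypothetical `logs` table whose primary key
    starts with `date`.

    Filtering on the leading key column lets ClickHouse skip whole
    data blocks; match() then regex-scans only the surviving values
    of the single `message` column (column store: other columns are
    never read).
    """
    return (
        "SELECT timestamp, host, message FROM logs "
        f"WHERE date = '{day.isoformat()}' "
        f"AND match(message, '{pattern}') "
        "ORDER BY timestamp"
    )

sql = build_log_query(date(2019, 1, 15), "(?i)error.*database")
# e.g. urllib.request.urlopen("http://localhost:8123/", data=sql.encode())
```

So the earlier concern about full column scans is real, but the scan is bounded by the date range and by reading only one column, which is why it stays fast in practice.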


Clickhouse (like any other SQL DB) would work great if you could chop up your log files into fields and store one type per DB. Elasticsearch is great for this because you don't have to worry about the schema; with ClickHouse you will... unless you use two arrays: one for the field name, and one for the field value.

If you value being able to store arbitrary log files, ClickHouse is not for you. If you want to build your system to generate tables on the fly- ClickHouse might work.

See: https://github.com/flant/loghouse
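The "two arrays" trick amounts to flattening an arbitrary JSON record into parallel name/value arrays so one fixed table schema can hold any log shape. A minimal sketch (the loghouse-style column names mentioned in the comment are an assumption):

```python
def to_kv_arrays(record, prefix=""):
    """Flatten an arbitrary JSON-ish dict into parallel key/value
    arrays, the layout a fixed ClickHouse table with e.g.
    `names Array(String), values Array(String)` columns can hold
    regardless of the log's shape."""
    keys, values = [], []
    for k, v in record.items():
        name = f"{prefix}{k}"
        if isinstance(v, dict):
            # Recurse into nested objects, dotting the path.
            sub_k, sub_v = to_kv_arrays(v, prefix=f"{name}.")
            keys += sub_k
            values += sub_v
        else:
            keys.append(name)
            values.append(str(v))
    return keys, values

keys, values = to_kv_arrays(
    {"level": "error", "http": {"status": 500, "path": "/login"}}
)
```

The price is that everything is stringly typed and queries go through array functions instead of plain columns, which is the schema-flexibility trade-off the comment is pointing at.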


You can use materialized views in ClickHouse to simulate secondary indexes. See https://www.percona.com/blog/2019/01/14/should-you-use-click... for an example of this usage. It's about half-way through the article.

Disclaimer: I work for Altinity which is commercializing ClickHouse.


Yes. The TICK stack.

Namely InfluxDB and friends


Since when can you send logs (not just time series) to Influx? That would certainly be news to me, as a huge TICK stack user.


You can send logs to Influx no problem. Telegraf (the collection agent from the Influx team) has a logparser input plugin that can parse logs via patterns; it understands Apache logs by default. The catch is that there is no full-text search. It's still useful, as you can quickly find logs to view by searching the fields that have been parsed.


That's really funny. My employer is one of the larger users of InfluxDB I'm aware of, so much so that we had to write some software[1] to overcome scaling limitations (in synthetic benchmarks it held up just fine at 500k metrics per second), and I didn't know this. Thanks for the cluebat! TIL!

[1] https://jumptrading.github.io/influxdb/metrics/monitoring/de...


Try https://getseq.net/

It was built for .NET devs using Serilog but can accept structured logs/json over HTTP.
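Seq's ingestion format is compact newline-delimited JSON events (CLEF), where reserved fields start with `@` and everything else becomes a searchable property. A hedged sketch; field names follow the CLEF convention, but check the endpoint and format against your Seq version before relying on this.

```python
import json
from datetime import datetime, timezone

def clef_event(template, level="Information", **props):
    """One event in Seq's compact JSON (CLEF) shape: '@'-prefixed
    fields are reserved (@t timestamp, @mt message template, @l
    level); all other keys become structured, searchable properties."""
    event = {
        "@t": datetime.now(timezone.utc).isoformat(),
        "@mt": template,
        "@l": level,
    }
    event.update(props)
    return json.dumps(event)

# Newline-delimited events, POSTed to something like
# http://seq-host:5341/api/events/raw (endpoint is an assumption).
payload = "\n".join([
    clef_event("User {user} logged in", user="alice"),
    clef_event("Disk {disk} nearly full", level="Warning", disk="/dev/sda1"),
])
```

Because `{user}` stays a template rather than being baked into the string, Seq can index `user` as a first-class property, which is the structured-logging payoff.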


Loki from Grafana.



Check out Vespa: https://vespa.ai


> Also, why has no product unified monitoring and logging?

In my experience SumoLogic is excellent for this.


just ship to spark?



