Show HN: LogsQL – opinionated query language for logs (victoriametrics.com)
61 points by valyala 11 months ago | 29 comments
I don't like the existing query languages for Elasticsearch and Grafana Loki, because they are too awkward to use for typical log investigation cases. So I designed a new query language - LogsQL - and wrote a reference implementation for it as part of VictoriaLogs - an open source database for logs. LogsQL is based on the following principles:

- Simplicity. Typical queries over logs are easy to write. For example, a single word `error` is a valid LogsQL query, which returns all the logs containing the `error` word. Another example is `_time:5m error`, which returns all the logs with the `error` word over the last 5 minutes.

- Composable building blocks similar to Unix pipes, which allow filtering, transforming and calculating stats over the selected logs. For example, `_time:5m error | stats count() as rows` returns the number of logs with the `error` word over the last 5 minutes (see the fuller sketch after this list).

- Readability. Typical LogsQL queries must be easy to read and understand even for people unfamiliar with the language.
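
To give a sense of how these blocks compose, here is a slightly fuller sketch based on the docs in [1] (the `host` field name is an assumption; check [1] for the authoritative pipe syntax):

  _time:5m error | stats by (host) count() as errors | sort by (errors) desc | limit 3

This selects the logs with the `error` word over the last 5 minutes, counts them per host, and keeps the three noisiest hosts.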

Take a look at the LogsQL docs [1] and try using VictoriaLogs [2] in production. If you like the Unix way and the KISS design principle, then you'll enjoy LogsQL :)

[1] https://docs.victoriametrics.com/victorialogs/logsql/

[2] https://docs.victoriametrics.com/victorialogs/




Am I the only one who feels that EVERYTHING is wrong in this ELK, Splunk, Grafana etc. world? The user interfaces these monstrosities present us with are barely usable, everyone has their own query language, they force us to install their own agents on our hosts and servers, and when I upload logs, many can't even take random JSON logs and ingest them in a structured way without defining pipeline rules. And did I mention that the Logstashes and Promtails and Vectors and whatnot pipeline tools, with their Grok etc. filters, feel like somebody really wanted to make busywork cool?

I am happy that in my day-to-day work I can dump my mostly Linux logs to rsyslog, and eventually forward them to S3 Glacier for a few years.

So I guess the question I am asking is: what in the world are you doing with these observability or SIEM platforms, and is anyone actually deriving some REAL value from using them?


Genuine question: does your job involve troubleshooting from logs on a regular basis? Because if it does, I would be surprised that you feel the way you do.

My experience is with ELK, but at least the Kibana interface is pretty decent for applying filter combinations to find the needle in a haystack of logs.

And in terms of ingestion, if you are in a container environment you can just configure stdout from the container to be ingested - no agent required.

Building a system that can ingest a few GB of logs a day, index them in near real time and keep them around for a few months while keeping search speed usable is not as easy as it might seem at a cursory glance.

But the real challenge is to get developers to write software that outputs structured logs that don’t suck. :)

And don’t even get me started on all the snowflake non-json log formats I’ve had to deal with …


I used to spend a lot of time looking at logs from a complex state machine. I would pull up a half day of logs in less (maybe a few GB) and search for something I was interested in, like an id from an error message - this could be slow (tricks were disabling line numbers and searching backwards from the end) - and then answer questions of the form ‘how long from this line until the next line matching x?’ or ‘what events of type y happened after the first event of type x after this log line?’ or ‘what events of type x containing y happened between these log lines?’ and suchlike. This is annoying to do with less; useful tricks were learning the shortcuts, taking advantage of regexes to highlight things or advance past long wrapped lines with /^, and copying interesting messages into an editor and writing notes.
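
For anyone unfamiliar with that workflow, a rough sketch of such a less session (the file name and search patterns are hypothetical):

  less -n statemachine.log   # -n disables line numbers, which speeds up huge files
  # inside less:
  #   G                jump to the end of the file
  #   ?some-error-id   search backwards from the current position
  #   n / N            repeat the search in the same / opposite direction
  #   /^               search for the next line start, to jump past a long wrapped line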

ELK/Grafana don’t give great solutions here. Elastic/Kibana already struggles with the first part because it doesn’t make it ergonomic to separate the part of your query that is ‘finding the log file’ from the part that is ‘searching within the file’. For the rest, the UI tends to be more clunky and less information-dense than less, though if your data is sufficiently structured the Kibana tables help. In particular, you’re still copying notes somewhere, but the next/before/after queries aren’t easy/quick/ergonomic to express (I think), and changing the query and waiting is pretty slow. In less, the typical search described above would be fast because the result wouldn’t be far away, and pressing n or N brings the next result exactly where you expect it on the screen, so you don’t need to hunt for it on the page.

I think SQL-like queries aren’t great here either, because they are verbose to write and require either very complicated self-joins/CTEs to find the rows to search between/after, or copying a bunch of timestamps back and forth.

Something people sometimes talk about is LTL (linear temporal logic) for log analysis. I think it can be useful for certain kinds of analysis but is less useful for interactive exploratory log querying. I don’t know how to do LTL queries against data in Loki or Elasticsearch.

To be clear, most of the reasons that ELK/Grafana don’t work for the use case I described vaguely above are problems with the frontend UI rather than fundamental issues with the backend. It may just be the kind of logs, too – if your logs all look similar to logs of incoming HTTP requests, the existing UI may work great.


In the past I spent a lot of time cutting up logs with grep, less, jq and Perl. It was an amazing UX that Kibana can't beat in terms of performance, assuming you already know the time window you are interested in (although I never learned enough gnuplot to be able to do visualisations, so Kibana wins there). However, all that went out of the window when I moved into a world of multi-instance micro-services in the cloud and SOC2 compliance. No more downloading of logs to my local machine and cutting them up with sensible tools. :(

That said, nothing that you outlined above is particularly difficult in Kibana, the main annoyance being the response time of the query API (somewhat mitigated by indexing). Based on your vague description my vague workflow would be:

  - filter for type x
  - limit time range to between occurrences of x
  - change filter to type y
  - at any point pick out the interesting fields of the log message to reduce noise in the UI
  - save and reuse this query if it is something I do regularly
  - if your state machine has a concept of a flow then filter by a relevant correlation ID
Not sure what you mean by "finding the log file" since Elasticsearch is a document database where each log line is a document.


VictoriaLogs could fit your use case:

- It natively supports the 'stream' concept [1] - this is basically the logs received from a single application instance.

- It allows efficiently querying all the logs that belong to a single stream on the given time range via 'curl', and passing them to 'less' or to any other Unix command for further processing in a streaming manner [2] (see the sketch below).

[1] https://docs.victoriametrics.com/victorialogs/keyconcepts/#s...

[2] https://docs.victoriametrics.com/victorialogs/querying/#comm...
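
A minimal sketch of that workflow, assuming a local VictoriaLogs instance on its default port (the stream labels `app` and `instance` are made-up examples):

  # page through all the logs of one stream over the last hour
  curl -s http://localhost:9428/select/logsql/query \
      -d 'query=_time:1h _stream:{app="my-app",instance="host-1:1234"}' | less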


I am indeed excited about VictoriaLogs.


Take a look at Quickwit. It's basically a clone of Elasticsearch, but in Rust.

I have around 380 TB of logs currently in S3 and get sub-1s responses for needle-in-the-haystack searches. It handles all that with just 5 search nodes running on Kubernetes with 6 GB of RAM each.

I'm ingesting around 20 TB of logs a day on it.


Yes, there's real value there. That everyone's got their own flavor - a bunch of extra work because we haven't solved the coordination problem yet - is annoying, but that's solved by choosing one and sticking with it. Learn that query language really, really well, and don't touch anything else. Splunk is useful as all hell once you get over the hump of learning their proprietary query language. It's really friggin' useful - useful to the tune of Cisco buying them for $28 billion. People are deriving real value from these platforms; the question is what problems of yours they can solve, and whether you even have those problems in the first place. If you've not found them useful, then why are you stuffing logs into S3? just send them to /dev/null instead


>> just send them to /dev/null instead

I wish. But 'regulatory compliance'. And 'we might need them' - just not sure for what - but we'll try another data analyst next quarter. Thankfully, because of the GDPR (and maybe other reasons), there is healthy pressure to also cleanse ourselves of the logs we've collected.

That said, I agree, based on my trials (and mostly errors), Splunk seems one of the better ones. Not considering the cost anyway. My trouble is that I am not a data analyst, but I get asked more than I would like about these things.


As other commenters already suggested I think it just comes down to what your actual day-to-day job is. In some companies you have dedicated data engineers whose job it is to understand these complex logging systems. But despite the complexity they may still derive value from it since they are deeply involved in writing SomeQL queries pretty much all day.

At my place of work we do not derive much value from our ELK installation, because we do not interact with it every day. But since everyone is used to grep and awk to some extent from other activities, these are the tools that get used when incidents/bugs happen and the cause has to be found.

As with all the other stuff out there, ELK etc. may simply not be for everyone.


I actively use grep, cat, uniq, sort, less and other Unix commands every day. That's why I added good integration with Unix pipes into VictoriaLogs, since this integration may cover missing functionality [1].

[1] https://docs.victoriametrics.com/victorialogs/querying/#comm...
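
For example, a rough sketch of combining a LogsQL query with classic Unix tools (assuming a local VictoriaLogs instance on its default port; the "level" field in the grep pattern is only an illustration):

  # count the most frequent log levels among the error logs from the last hour
  curl -s http://localhost:9428/select/logsql/query -d 'query=_time:1h error' \
      | grep -o '"level":"[a-z]*"' | sort | uniq -c | sort -rn | head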


> And did I say that the Logstashes and Promtails and Vectors and what not pipeline tools with their Grok etc. filters feel like somebody wanted to really make busywork cool.

The worst part about Promtail/Vector is that you have to write code in YAML. Why.


Vector at least supports TOML, not just YAML [0]

That, plus built-in support for "unit testing" processing pipelines [1], are two features that made me immediately want to ditch our existing Promtail configs and switch to Vector.

0: https://vector.dev/docs/reference/configuration/#formats

1: https://vector.dev/docs/reference/configuration/unit-tests/


None of those require agents?

But for most people, an agent is simpler.


Interesting to see a new approach!

You wrote that you don't like Loki's LogQL, but it looks quite similar (VictoriaLogs' LogsQL first):

  log.level:error _stream:{app!~"buggy_app|foobar"}
  {app!~"buggy_app|foobar"} | "log.level:error"

The pipes are arguably a bit noisy in Loki queries (compared to spaces in Victoria's), but I find they do make the queries a bit more readable, and it's easier to understand under the hood how the queriers will handle the query. Coming from PromQL, I found Loki's approach quite intuitive: https://grafana.com/docs/loki/latest/query/.

Maybe I missed something fundamental though, interested to hear more about the differences, since I only read the couple links you shared!


As an addendum, I think the main flaw with Loki is log presentation in the Grafana UI, rather than necessarily the query syntax (extracting data from log lines with regex/pattern matching is gross, and if you're doing that you have a bigger problem).

With the Grafana Loki UI the main issue is that even if you log perfect JSON / logfmt and Loki parses it, the Grafana UI can struggle a bit with rendering views of logs the way you'd want. Kibana's columnar views with nice filtering UX are much easier to use. I think Elastic still has this over Loki (that, and the capacity to build big expensive indexes, if you have the stomach to manage them).

I wrote a custom CLI for Loki to work around this, because the Loki CLI has a similar problem. All that said, I'd still recommend Loki, it's really good IME.


- Loki doesn't allow queries without stream filters. This may be very inconvenient. For example, try selecting all the logs with the 'error' word in Loki. In LogsQL you just type 'error' and that's it!

- As far as I know, Loki doesn't allow selecting all the logs on the given time range. For example, try selecting all the logs for the last 5 minutes in Loki's query language. In LogsQL this is just '_time:5m'.

- Loki has unreadable syntax for calculating analytics over the selected logs. For example, counting the number of logs with the 'error' word over the last 5 minutes in Loki looks like:

  count_over_time({required="stream_filter"} |= "error" [5m])

Compare this to LogsQL:

  _time:5m error | stats count() as errors

- Loki doesn't allow calculating multiple stats in a single query. For example, try calculating the number of logs with the 'error' word, plus the total number of logs, over the last 5 minutes. In LogsQL this is easy:

  _time:5m | stats count() if(error) as errors, count() as total

- As far as I know, Loki doesn't provide functionality for sorting the returned logs.

- Loki can return only up to 5000 logs from a single query by default. VictoriaLogs allows returning billions of logs from a single query, without performance or resource usage issues [1].

[1] https://docs.victoriametrics.com/victorialogs/querying/#comm...


I think we are conflating Loki/VictoriaLogs (the UI, API, and log databases) and LogsQL vs LogQL (the syntax).

> Loki doesn't allow calculating multiple stats in a single query

Victoria's support for projecting multiple stats in one query is nice; with Loki's LogQL you need to write two separate queries.

> Loki doesn't allow queries without stream filters

True! I don't mind filtering for e.g. `{env: "production"}` but each to their own :-)

---

RE Loki vs Victoria, which I don't really want to get into since I am not a vendor and have not tried Victoria...

> Loki doesn't allow selecting all the logs on the given time range

True! TBH I don't mind selecting a date from a date picker. One issue is that I guess it's going to be a bit slower to write precise date/time ranges if they're baked into the query language?

> Loki doesn't provide functionality for sorting of the returned logs

> Loki can return only up to 5000 logs from a single query by default

I think these are Grafana UI restrictions -- as I wrote above there are indeed limitations with the Grafana UI, but I don't think they're necessarily syntax related.


I believe pipes for logs were invented by SumoLogic 10+ years ago. Or maybe someone before that.


Canonical pipes were invented ~50 years ago by the Unix creators. They are successfully used for log analysis to this day.

Unix pipes are based on a simple idea - connecting the output of one program to the input of another program. This allows building arbitrarily complex data processing pipelines by combining simple programs like grep, cat, awk, cut, head, etc., via pipes. These pipelines have nice properties:

- The data is processed in a streaming manner. This allows processing unlimited data streams without excess resource usage.

- The data processing speed is limited by the slowest program in the pipeline. This allows pausing and resuming data processing at any time - for example, by putting 'less' at the end of the pipeline.

- The data processing pipeline terminates instantly as soon as some program in the pipeline terminates. For example, if you put 'head -100' at the end of the pipeline, then only the first 100 lines of data will be processed.

VictoriaLogs adapts these properties to log processing with LogsQL pipes. The difference is that the data flowing between pipes represents structured logs [1].

[1] https://docs.victoriametrics.com/victorialogs/querying/#comm...
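
A rough side-by-side illustration of the analogy (the field positions and the `level` field name are assumptions made up for the example):

  # classic Unix pipeline: count occurrences of the third field among error lines
  grep error /var/log/app.log | awk '{print $3}' | sort | uniq -c

  # a similar aggregation as a LogsQL pipeline over structured logs
  _time:1h error | stats by (level) count() as errors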


You have to feed your logs into the VictoriaLogs database in order to use LogsQL, right?

"LogsQL is a simple yet powerful query language for VictoriaLogs."


Correct for now. I'm sure LogsQL support will be added to other log management systems over time.


Recently there seems to be a bunch of SPL (Splunk)-like query languages popping up: PQL, PRQL, Grafana Explore Logs syntax, Kusto... probably others as well. Does yet another similar-but-slightly-different language make sense? Why not leverage an existing one?


Diversity is great! Let's see which query language for logs will survive.


True. My bet is that arbitrary text queries converted to SQL via an LLM will become an increasingly popular alternative. Not sure if it will completely replace the custom query languages though...


I'm the author of https://logdy.dev (a logs-to-UI interface) [1] and have recently been thinking about how to enable users to use a query language to search through logs beyond the usual filters. I was looking at LogsQL, but then I felt that it is just another QL a user will need to learn. My next thought was SQL, but it was not designed for this purpose. Any ideas? I would appreciate any recommendations (peter at logd.dev)

[1] https://github.com/logdyhq/logdy-core


How about allowing users to create pipelines with good old grep, awk, cut, etc., directly in the web UI?


IMO SQL is great for logs; ClickHouse and DuckDB SQL in particular are awesome.


I find SQL nice for a happy path - highly structured, non-nested, and simple log scenarios, such as some heavily curated app log subset like an app-level authentication audit log

When we have arbitrary logs from all over, and especially system.. I appreciate the sloppy and dynamic nature of SPL and its derivatives

We have ported our app tier to OTel metrics/logs/traces, and I've been curious what will be more 'right' in this new era. Likewise, with louie.ai, we have genAI do the text2query for us by default, which shifts things a bit as well.



