Lnav – An advanced log file viewer for the small-scale (lnav.org)
179 points by thunderbong on Jan 5, 2023 | 58 comments



Say I'm building a logging library, is there any sort of generally agreed-upon standard for what a log file should look like? I know there are several formats that lnav supports, but I'm not familiar with them.

I've always had trouble figuring out what a good compromise between log format and flexibility looks like, especially wrt newlines in the log message itself (eg if I want to log a stack trace).

Any thoughts for a format that would generally work ootb in lnav or other log viewers?


Not sure about lnav, but most log aggregation systems support JSON- and logfmt-formatted logs, and there are many standard logging libraries that support emitting those formats.

Of those, JSON is better if you want to do more advanced stuff (nest dictionaries, use lists, ...) and logfmt is better if you also want human-readable logs without external tools. An example line can look like:

    msg="Request finished" tag=request_finish status=200 user=brandur@mutelight.org user_id=1234 app=mutelight app_id=1234
Some more info here: https://www.brandur.org/logfmt
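
For comparison, the same record rendered as a single JSON line (a hypothetical equivalent, not output from any particular library) might look like:

    {"msg": "Request finished", "tag": "request_finish", "status": 200, "user": "brandur@mutelight.org", "user_id": 1234, "app": "mutelight", "app_id": 1234}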


Agree with logfmt. I wrote the logfmter python library: https://github.com/jteppinette/python-logfmter. You can quickly have all of your logs (including 3rd party) converted to this style.
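
A minimal setup looks roughly like this (see the README for the exact details):

    import logging
    from logfmter import Logfmter

    handler = logging.StreamHandler()
    handler.setFormatter(Logfmter())
    logging.basicConfig(handlers=[handler], level=logging.INFO)

    # Extra fields become logfmt key=value pairs.
    logging.info("Request finished", extra={"status": 200, "user_id": 1234})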


Has anybody written an actual spec for logfmt? I noticed that different implementations handle escaping of quoted strings subtly differently


Thank you! Now when you say JSON logging, are there any common patterns you can point me at? Just looking around for JSON logging gives me results like [1], which talks about JSONL (object per line, presumably the JSON serializer makes sure not to emit literal newlines).

But some other places describe this a bit more liberally. And [2] notes that I should include a time stamp in each log entry which makes sense.

[1]: https://stackoverflow.com/questions/10699953/format-for-writ... [2]: https://www.papertrail.com/solution/tips/8-essential-tips-fo...


In general, if you use things like Filebeat or Promtail to do log ingestion into centralized log search systems (Elasticsearch or Loki in these examples), they prefer one object per line.

It makes parsing and keeping track of the "state" of the file a lot easier. Say that your application crashes/gets killed halfway through writing a log message / JSON dict and then gets restarted and appends to the log file. How should the log reader handle that case if the result suddenly becomes a valid nested object? And even if it doesn't, should it throw away the first new log message as well because that was embedded in the invalid JSON object? Much easier to just say "one line is one JSON object; a literal newline is the delimiter to start a new parse".

And yes, in any case it's good to have a timestamp on your log message no matter the format, unless you're logging somewhere you know it gets added immediately (like the systemd journal). Your log parser/forwarder can add a timestamp for when it reads your log message, but that is not necessarily the same as when your application emits it.
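
As a rough sketch of what that looks like in practice (hypothetical class name, Python standard library only), a formatter that emits one JSON object per line with its own timestamp could be:

    import json
    import logging
    from datetime import datetime, timezone

    class JsonLineFormatter(logging.Formatter):
        # One JSON object per line; json.dumps escapes embedded newlines,
        # so stack traces stay inside a single line.
        def format(self, record):
            entry = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "level": record.levelname,
                "msg": record.getMessage(),
            }
            if record.exc_info:
                entry["exc"] = self.formatException(record.exc_info)
            return json.dumps(entry)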


Another benefit of JSON/logfmt that bears mentioning explicitly: it has structure.

This means that you shouldn't just write (to reuse the previous example):

    msg="Request for brandur@mutelight.org finished with status 200"
you should do it like

    msg="Request finished" status=200 user=brandur@mutelight.org
and not put any variables into the msg key (and not really do advanced formatting for any of the keys, for that matter). That way, once it's ingested into a log system that understands your format, you can do searches like "all log messages where user=foo" or "all statuses that are >=500 and <600", or search on specific messages, all without having to craft elaborate regular expressions and with better performance, since the log search system can do indexing and various optimizations so it doesn't have to do a full-text search every time.
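
With Python's standard logging and a structured formatter installed (like the ones discussed above), that boils down to something like this (field names are just examples):

    import logging

    logger = logging.getLogger(__name__)

    # Keep msg constant; put the variables in separate, indexable fields...
    logger.info("Request finished", extra={"status": 200, "user": "brandur@mutelight.org"})
    # ...rather than: logger.info("Request for %s finished with status %s", user, status)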


lnav does support JSON-lines and logfmt logs. For JSON-lines, it will pretty-print the log messages to make them human readable.

For logfmt, I seem to remember the spec not being very clear on quoting semantics (maybe I'm wrong). Anyhow, I would suggest using JSON since it has pretty broad support at this point.


Hmmm, it doesn't say in https://lnav.org/features#automatic-log-format-detection but I see that at least JSON (and XML) is mentioned under the pretty-printing header.

You would know what is supported, being the author; just saying that the docs aren't super clear from a quick glance :).

And yes, I second the suggestion to focus on JSON. The main benefit of logfmt is that it's simpler for a human to parse directly, but in general you probably shouldn't aim for that, so...


> Hmmm, it doesn't say in https://lnav.org/features#automatic-log-format-detection but I see that at least JSON (and XML) is mentioned under the pretty-printing header.

Yes, I should mention it on the features page. It's currently only mentioned in the main docs:

https://docs.lnav.org/en/latest/formats.html


You can use logfmt with Serilog on dotnet too:

https://github.com/serilog-contrib/Serilog.Logfmt


There are a bunch of them (log formats). I don't think any is ISO/ANSI standardized, but many tools work with many of the different formats.

Probably the most common one (or at least it was the most common, maybe not anymore) would be the "NCSA Common Log Format" (or just CLF), which looks something like this:

    127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
https://en.wikipedia.org/wiki/Common_Log_Format

Many tools for getting analytics out of server-side logs can work with CLF and the various variations. But probably today there are more "modern" formats as well.
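
To illustrate how regular these lines are, a rough (and deliberately non-exhaustive) Python regex for the example above could look like:

    import re

    CLF = re.compile(
        r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
        r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
        r'(?P<status>\d{3}) (?P<size>\S+)'
    )

    m = CLF.match('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
                  '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(m.group("status"))  # prints 200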


Take a look at the docs linked on the main page of Serilog[1] and the various videos[2] and blog posts[3] by the devs who make Seq.

[1]: https://serilog.net/

[2]: https://datalust.co/

[3]: https://blog.datalust.co/


Structured logs. Probably JSON.


The package is available in Alpine so I tried it out. It seems to work best if the terminal is using en_US.UTF-8; my default is C but I have a function to set UTF-8. I was curious if it could handle stdin and it does. It saves the output in ~/.config/lnav/, something to probably add to my cleanup script. It is convenient that it can prepend timestamps to logs. I could see this being nice for people who do QA on build logs all day or who have to debug access logs often. If used with sensitive data, one should probably add a shell function and log rotate scripts to clean up ~/.config/lnav/ using the shred command.


This will depend on the specific threat model for the sensitive data you are dealing with, but generally files on an SSD can't be destroyed securely (and shred specifically does not help, due to wear leveling).

Instead, you may want to encrypt your disk so that as soon as the key is gone it all becomes unreadable. For a bigger threat, maybe you need to follow the NIST destroy guidelines [1] to "Disintegrate, Pulverize, Melt, and Incinerate" the media.

1: https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.s...


The data consumed from stdin is stored in a file to support scrollback.

> something to probably add to my clean up script.

These capture files are deleted if they are older than a day the next time lnav is run.

(I realize they should've been stored in ~/.local/state instead of ~/.config ...)


I was under the impression that the shred command was pretty hard on SSDs. Did that change?


It would be as hard as writing the file {n} times again. So if one created a 1MB log file and then did a 3-pass wipe, that would write 3MB. The number of passes would really just depend on the system being worked on. If PCI compliance is involved, I would expect 7 passes, and I am sure that would just be factored into the cost of the servers for that environment.

Shred also isn't perfect, as it has no concept of the file system's journal and does not clean that data, but I think it is still good practice for sensitive data, in addition to filesystem- or file-level encryption on systems with highly sensitive data.

With sensitive data one must weigh the value of the data against the value of the SSD. If I go out of my way to extend the life of my SSD, do I risk losing $5 billion? That is how I factor in whether or not extra wear-and-tear on an SSD makes sense.


Shred doesn't work at all on SSDs because wear leveling will spray the writes all over the drive, rather than overwriting the blocks you intended to. Your random bytes will end up in new blocks, not overwriting the original blocks. Trying to shred individual files is a totally pointless exercise on SSDs. It isn't doing what you want it to do.


I've also seen people say that and recommend "blkdiscard -sz" instead, but that also does not write to specific blocks. I still cannot find a definitive source saying that shred is not able to write to specific blocks on an SSD, and it is not clear to me that wear leveling actually prevents shred from writing to specific blocks. But I agree that I have seen people say this repeatedly on StackExchange, ServerFault and Reddit.

Either way, one should also use filesystem and file-level encryption for sensitive files and encrypt sensitive attributes inside databases. Swap must also be encrypted, as it may contain sensitive data. Datacenter bad-drive mishaps do happen, as in failing to physically shred the drive, as a few government and military agencies have recently been embarrassed to learn.

I do agree with pinkorchid that ultimately drives should be destroyed physically.


Wear leveling prevents anyone from writing to specific blocks. It decouples the OS's block numbering from the physical blocks. That indirection allows it to distribute writes evenly across the drive even if the OS is writing to consecutive blocks (or the same block).

This behavior is fundamental; it's what wear leveling is and why it exists. Wear leveling exists to prevent you from writing to the same block repeatedly, which is exactly what you're trying to do with shred.

I'm not sure what other source you're looking for, but I hope you find it. Perhaps, at least, the number of people "on StackExchange, ServerFault and Reddit" and now also HackerNews telling you the same thing is sufficient evidence to stop spreading the idea that shred might work on SSDs. Do what you want on your own systems, of course, but this is dangerously misleading guidance to be offering on an open forum.


So my concern with following this logic is that such tools that are writing to specific block locations would then be wiping unrelated inodes depending on the low level logic. I tried emailing Colin but his last known email is invalid now. He may be working for the NSA so I probably won't be able to reach him. Some people there are responsive, some are not and a few are on here.

So even if wear leveling prevents overwriting a file, such tools should in theory be a data-corruption risk. If this is the case then the tool should be updated to detect if the target is an SSD and abort with a scary message. Perhaps another route is to reach out to the coreutils team.


> writing to specific block locations would then be wiping unrelated inodes depending on the low level logic

It won't wipe unrelated files as far as the file system sees it, but it might overwrite some previously discarded internal blocks. "inodes" is too high-level a concept in this context.

You can read more about it on Wikipedia, which might be an authoritative enough source for you? https://en.wikipedia.org/wiki/Wear_leveling

Also note that most SSDs actually have more internal blocks available than what they present to the host device, so that they have a "cache" to be able to move things around internally, and also so they can mark certain positions as "bad" when writes to one internal block start failing and still operate properly.


Does that mean partitions on an SSD therefore do not affect the long-term wear on the SSD?


It's also explained in the shred manual itself, which seems like a decently authoritative source to me: https://www.gnu.org/software/coreutils/manual/html_node/shre... — halfway down the page all the caveats are listed, up to and including forensic analysis of magnetic traces of the deleted data.


For Windows users, there is LogViewPlus. (I am the author). https://www.logviewplus.com/

It has a similar feature set (tail, syntax highlighting, SQL reporting) with a focus on accessing files remotely.


I'm the author of lnav and had never heard of logviewplus before, thanks for mentioning it. It looks like we've independently arrived at a very similar feature set, that's pretty neat. I will now shamelessly steal some of your ideas :)


It looks like you have done a great job with lnav - it's impressive. I know how much work has gone into LogViewPlus and you must have made a similar commitment. I thought I was the only one crazy enough to want to solve this problem. :-)

I would be happy to chat to a like mind if you are interested. You can contact me here: https://www.logviewplus.com/contact.aspx

Steal away! :-)


I've been using https://glogg.bonnefon.org/. The mark / matches feature is really handy. However there are a few bugs with highlighting and it hasn't been updated in a while. Will have to check this out!


Glogg is a great tool. We take a bit of a different approach in that we parse the log file. This enables some more in-depth features such as SQL reporting. We also support remote file access (for example, SFTP).

Let me know if you have any questions or feedback. You can reach me on the contact page of the site.


Is LogViewPlus heavily tied to Windows? How hard would a port to Linux be?

I love that so many Windows tools get a nice-looking GUI, versus the heavy CUI lean of *nix tools.


As things currently stand - hard. LogViewPlus is written in .NET using WinForms, so it is heavily tied to Windows. However, Microsoft is evolving. The bulk of the code can now be made cross-platform easily, but the front-end remains a problem.

I would love to do a Linux port as soon as there is a viable cross-platform GUI technology with strong 3rd party vendor support. The controls in LogViewPlus are very rich.


It's cool you can SSH to demo it. More CLI tools should have that.


The SSH demo was inspired by git.charm.sh


I mean, if the goal is to read logs, the default color scheme looks pretty bad; why are there so many colors in the example?

For example, the HTTP 403/404 is red but it does not stand out. Also, why would a 404 be red in the first place?


> why are there so many colors in the example?

Identifiers, like process-names/pids/IPs, are semantically highlighted by default to make it easier to visually match up values that are the same. And, it's this way because I like it this way, so that's the default.

> For example, the HTTP 403/404 is red but it does not stand out.

To me, the red stands out. But I would also have a much wider window, so the red would stand out even more against the rest of the text.

> Also, why would a 404 be red in the first place?

Because 4XX are error codes.

Of course, lnav is pretty customizable at this point, so you can adjust it to your liking. The theme can be changed to something less colorful and the log format definitions can be patched to change their behavior.


Does anyone have a trick for connecting to remote hosts as a non-root user but having it elevate to root to read the logs?


For systemd/journald-managed logs, the best option is to add the non-privileged user to the systemd-journal group.

For logs managed by syslog daemons, most implementations allow you to set the owner and group of the log files. You could decide on a specific log group and add the desired dedicated user to this group.

In the end it is usually better to ship the logs to a dedicated machine/space/database.


I used this extensively when working in a cloud support role, incredible tool.


Related to this, is there a tool like this but that can show some kind of simple analytics, for example page views over a period of time from an access log produced by a Python HTTP server?


As mentioned in a sibling comment, https://goaccess.io is probably best for simple analytics on log files.

If you want to write more complicated queries, lnav exposes log data through SQLite vtables[1]. So, you can do a SQL query and get a simple bar chart visualization.

[1] - https://docs.lnav.org/en/latest/sqlext.html#sqlite-interface


> […] a tool like this, but that can show some kind of simple analytics

Yes, you can use → https://goaccess.io/


You can run queries on the log files and write them to files. I have used these files to feed pre-built spreadsheets with charts, etc.


I mentioned this below, but you could try LogViewPlus. It has a built in SQL based reporting feature you can use.

You can see it in action here: https://www.logviewplus.com/docs/create_a_custom_report.html


I'm using it every day on multiple servers and applications. It's a MUST.


Nice work! I've written something similar based on regexes:

https://github.com/mihaigalos/pipeview


This tool saved me weeks of work once. You gotta try it.


Works fine on my pi.. thx

With caddy server


Do we need lnav thread every 3 weeks?


First I'd heard of it. I have wanted a tool like this for a while.


Same, and I'm working on an IoT project that precisely needs this (superhigh$ per device, don't need scale or efficiency)


Yes /s

Not everyone checks HN daily to see all the reposts. Pulling numbers out of the air, I'd say that 5-10% of daily content on HN is a repost; you should get used to it. Just hide it and carry on is what I would suggest.


Keep a log of how often it occurs and we can review it... in lnav


lnav is pretty great. I wish it had a library that could be pulled out into an app instead of a terminal. There are a few things that make more sense with a GUI, like standard keyboard shortcuts, filtering, saving exports, etc.


Not saying there's something wrong with lnav - just that spamming HN every 3 weeks isn't great.

There were submissions with links to lnav.org main site:

- 21 days ago

- 22 days ago

- 46 days ago

- 68 days ago

It's a spam campaign from my pov.


Do you browse new? None of these have had more than a couple points since https://news.ycombinator.com/item?id=9294622 in 2015.

It seems very reasonable that most visitors simply never saw any of those submissions.


Five submissions in 70 days is rather less than a spam campaign in my view.



