Does it store timestamps for those repeated occurrences? I wouldn't want my logs to "helpfully" coalesce multiple identical messages into a single one. For example, if I saw the equivalent of the text below in a program/OS while facing some issues, I'd be remarkably pissed.
[2023-05-18 07:12:21] [E] Invalid event 0x1AF2 received from ...
[2023-05-18 09:44:01] [I] Last message repeated 10 times
I care less about how many times the message was repeated - I care about timestamps, which I might want to correlate to other activities.
No, it's just plain syslogd dating back to the 4.2BSD days. According to the source, it aggregates at 30-, 120-, and 600-second intervals. Within those thresholds I wouldn't care too much. If I really needed timestamps with better than thirty-second precision, I probably wouldn't be using syslogd in the first place.
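For the curious, the coalescing logic is simple enough to sketch. This is a toy Python approximation, not syslogd's actual code: it emits the first occurrence immediately, suppresses duplicates, and flushes a "repeated N times" summary line, using the 30/120/600-second thresholds mentioned above (the class and method names are made up for illustration):

```python
import time

class Coalescer:
    """Toy sketch of syslogd-style "last message repeated N times"
    coalescing. Not real syslogd code; intervals mirror the 30/120/600
    second thresholds mentioned in the thread."""

    FLUSH_INTERVALS = (30, 120, 600)

    def __init__(self, sink, now=time.time):
        self.sink = sink    # list (or file-like) receiving output lines
        self.now = now      # injectable clock, handy for testing
        self.last = None    # last message seen
        self.count = 0      # duplicates suppressed since last flush
        self.since = 0.0    # when the current run of duplicates began

    def _interval(self):
        # Back off: the longer a message keeps repeating, the longer
        # we wait before emitting another summary line.
        idx = min(len(self.FLUSH_INTERVALS) - 1, self.count // 10)
        return self.FLUSH_INTERVALS[idx]

    def log(self, msg):
        if msg == self.last:
            if self.count == 0:
                self.since = self.now()
            self.count += 1
            if self.now() - self.since >= self._interval():
                self.flush()
            return
        # Different message: summarize any pending run, then emit it.
        self.flush()
        self.last = msg
        self.sink.append(msg)

    def flush(self):
        if self.count:
            self.sink.append(f"last message repeated {self.count} times")
        self.count = 0

# Usage: four identical errors collapse into one line plus a summary.
sink = []
c = Coalescer(sink, now=lambda: 0.0)
for msg in ["err", "err", "err", "err", "ok"]:
    c.log(msg)
# sink == ["err", "last message repeated 3 times", "ok"]
```

Note that the per-occurrence timestamps really are gone after this: only the first occurrence's time and the repeat count survive, which is exactly the complaint upthread.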
In any newish production system I'd probably want to use something other than syslogd anyway.
That's... good to know. I never realized anyone was doing something like this. It breaks my trust in software logs in general - from now on I'll make sure I understand how any given program handles logging before making assumptions relevant to troubleshooting.
In many cases, logging is asynchronous, so depending on many factors (host utilization among them) messages may be written with a delay, and the ordering of events as read from the logs may not make sense. If you need that precision, you can certainly engineer or configure your system for it.
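A tiny illustration of the point: if each record is stamped when the event happens, the order the writer flushed them in stops mattering. The component names and messages below are made up:

```python
# Two components stamp records at emit time; an async writer drains
# component B's buffer before A's, so file order lies about event order.
# Sorting on the per-record timestamps recovers the true sequence.
a = [(1, "A: started"), (3, "A: retried"), (5, "A: gave up")]
b = [(2, "B: lock taken"), (4, "B: lock released")]

file_order = b + a               # what you'd see reading the log file
true_order = sorted(file_order)  # timestamps restore causal order
```

Which is one more reason to want real timestamps on every line rather than "last message repeated N times".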
If I'm running my own distributed system for some business reasons, sure.
If I'm dealing with equipment failures, bugs in third-party software, or other such random tech bullshit, as an individual or a team, then I don't know in advance when and what precision I'll need.
In a reasonable system, you might be able to change some function in one place, perhaps even in the running system, and get the precision or detail you need. Or you might bring out the big guns, e.g. dynamic tracing with BPF, and attach probes at the right places.
Most of the time, I'd assume I can store all of them (deduplication is fine, as long as I can recreate the raw data afterwards). Which is a lot of data, but repeated messages should compress well (in the limit, approaching the same size as a deduplicating solution).
Sometimes those data points don't matter, like when they're generated by some program stuck in an infinite loop. But in other cases they do, e.g. when each message is caused by some event, like another program doing some processing or a user pressing a key. Then timestamps are useful for identifying the exact cause (e.g. logs only appear when process X is processing mouse input, or when the user presses one of 20 specific keys, or only when my microwave oven is running).
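The "should compress well" claim is easy to check. A quick Python sketch using the stdlib zlib (the log line is invented; the point is the ratio):

```python
import zlib

# A log consisting of one line repeated 10,000 times: the worst case
# for storage, the best case for a general-purpose compressor.
line = b"[E] Invalid event 0x1AF2 received from host-42\n"
raw = line * 10_000
packed = zlib.compress(raw, level=9)
ratio = len(packed) / len(raw)  # a small fraction of a percent

# The raw stream is fully recoverable, unlike "repeated N times".
assert zlib.decompress(packed) == raw
```

Real logs with varying timestamps compress a bit worse than this, but highly repetitive lines still shrink enormously, so storing every occurrence (with its timestamp) costs far less than it might seem.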
-c Disable the compression of repeated instances of the same line
into a single line of the form "last message repeated N times"
when the output is a pipe to another program. If specified
twice, disable this compression in all cases.