μMon: Stupid simple monitoring (2022) (sig7.se)
290 points by g0xA52A2A on Sept 24, 2023 | 106 comments



I like the focus on simplicity of uMon, and agree with the author's criticism of behemoths like Grafana.

But looking at the installation instructions[1], I can't help but think that their reluctance to use Docker feels contrarian for no reason (and the quip about it being "out of fashion" completely misguided). This whole procedure could be automated in a Dockerfile, and actually running uMon would be vastly simplified. Docker itself is not much more than a wrapper around Linux primitives, and if they dislike it specifically for e.g. having to run a server and run containers as root, there are plenty of other lighter-weight container alternatives.
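
For illustration, the kind of Dockerfile I have in mind would look roughly like this (a hypothetical sketch: the dependency list, build command and binary name are my assumptions, not taken from the actual instructions):

    FROM debian:bookworm AS build
    # Assumed build dependencies; the real list is in the install docs
    RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential librrd-dev
    COPY . /src
    WORKDIR /src
    RUN make    # assumed build entry point

    FROM debian:bookworm-slim
    RUN apt-get update && apt-get install -y --no-install-recommends rrdtool
    COPY --from=build /src/umon /usr/local/bin/umon    # binary name assumed
    CMD ["/usr/local/bin/umon"]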

There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.

I'm all for choosing "simple" tools and stacks over "complex" ones, for whatever definition of those terms one chooses to use, and I strive to do that in my own projects whenever possible, but simplicity is not an inherent property of old and battle-tested technologies. We should be careful not to be biased toward technology we happen to be familiar with, but be pragmatic about picking the right tool for the job that fits our requirements, regardless of its age or familiarity.

[1]: https://tomscii.sig7.se/umon/#Installation%20and%20getting%2...

[2]: I have a pet peeve about tools or protocols with "simple" or "trivial" in their name. They almost always end up being the opposite of that as they mature, and the name becomes an alluring mirage tricking you into its abyss of hidden complexity. I'm looking at you, SMTP, TFTP...


> I can't help but think that their reluctance to use Docker feels contrarian for no reason

My impression is that it's less about contrarianism and more about

1. the developer opting for installation instructions being consistent across both of the intended targets (Linux and (Open)BSD); and/or

2. μMon being allegedly tiny and simple enough that you're probably best off stuffing it into every one of your containers anyway (i.e. so that said containers can expose their own monitoring interfaces) instead of having a dedicated container for it

> There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.

Agreed here. If SNMP is supposed to be "simple", I'd hate to see what the Complex Network Management Protocol looks like!

(I have a similar peeve about LDAP, on that note; I guess compared to wrappers around it like Active Directory and FreeIPA it's "lightweight", but I dread imagining what a heavyweight directory access protocol would entail)


Both of those are based on old OSI protocols, which were terrifying in their complexity. LDAP mostly subsetted DAP and X.500 and added Internet concepts. SNMP leveraged ASN.1 with some ideas from CMIP, but using Internet concepts and with attention paid to operations when the network is marginal (unlike CMIP).

"What do you get when you cross a mobster with an OSI standard?

You get someone who makes you an offer you can't understand."


> There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.

Oh, you sweet summer child, that's the easiest fucking part of this utter abomination!

Adding anything that's not covered by standard OIDs is a fucking chore. And it is just a string of numbers, so you either have to write your own custom OID files and distribute them everywhere OR operate by numbers alone.

And there is no fucking key-value, or even fucking labels. If you want to, say, distribute a list of arbitrary keys with arbitrary values under a hierarchy, the OIDs will look like

    .1.1.1.1.1: key1
    .1.1.1.1.2: key2
    .1.1.1.1.3: key3
    .1.1.1.2.1: val1
    .1.1.1.2.2: val2
    .1.1.1.2.3: val3
It was created for devices with tens or hundreds of kilobytes of RAM that couldn't handle a more descriptive protocol, but it should have died 20 years ago.


This is what MIBs are for. Like, yeah, MIBs are probably too verbose for what they're used for, but once you write yours, you only need it on the host you'll make SNMP gets from (i.e. your monitoring host); the SNMP client will then use it to translate names to OIDs, and Bob's your uncle.
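
Concretely, that looks something like this on the monitoring host (MIB module and object names are placeholders):

    # load the custom MIB on top of the defaults, then query by name
    snmpget -v2c -c public -m +MY-CUSTOM-MIB target-host MY-CUSTOM-MIB::myValue.0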

What I do is use a custom private OID in the appropriate range; sub-OIDs in that range get passed to a shell script that finds and executes the appropriate command (generally a one-liner) for that sub-OID. As soon as I add a sub-OID, I also add it to the MIB, which has support for descriptions upon descriptions on top of type info.
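
A sketch of that wiring with net-snmp's `pass` directive (the OID is net-snmp's example subtree; the script path is a placeholder):

    # /etc/snmp/snmpd.conf: hand a private subtree to an external script.
    # snmpd calls the script as "-g OID" for a GET (or "-n OID" for a GETNEXT);
    # for a GET the script prints three lines: the OID, the type, the value.
    pass .1.3.6.1.4.1.8072.2.255 /usr/local/bin/my-snmp-handler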


When I read that they thought Docker was outdated, I figured I would open the repo to find Go code and pre-built binaries… when I found C++, I was left thinking the same as you.


If it's simple, it doesn't need Docker, and as someone else said, you might even want to stuff it into your Docker containers.. so a minimal thing doesn't need Docker, IMO. The "simple" landscape is huge, same as for whether one believes that C++ is outdated or not ;) Guess what is easier to get up and running on many platforms even today: simple C/C++, or anything else?


Docker can be used to provide a plug-and-play development environment, even if the result isn't deployed into production as a container.

C++ projects, I expect, would benefit even more than most from such configurations, reducing the barrier to entry for new contributors by eliminating environment bootstrap headaches.


> Docker feels contrarian for no reason

No, Docker is a mess.


What is it more specifically that you think is a mess? Do you mean the OCI Image Format, Docker Inc, Docker Engine, something else…?


Not sure if I'd count "OCI Image Format" under the Docker umbrella, since Docker doesn't actually follow it. Also, slight nitpick, but "OCI Image Format" is itself a bit of an umbrella, since there are images, manifests, layers, etc. It's easy enough to make a standards-compliant image using `tar`, `sha256sum` and `jq`, but it's rather hit-and-miss which tools will support it (e.g. AWS seems easy to please, but nerdctl rejects certain things, etc.)

Personally, my main problems with Docker are:

- Dockerfiles: these are basically just shell scripts, which throws away decades of improvements and leads to all sorts of insanity (e.g. running `apt install -y foo bar baz`, rather than making a .deb which depends on those). It also causes everything to happen "inside-out", with our compiler toolchains, etc. getting installed inside the container (requiring even more containers to try and extricate the build products, and so on)

- Docker Inc: specifically, their over-complication of basic shit, as a way to funnel everything through themselves. Want to u̵p̵l̵o̵a̵d̵ push your t̵a̵r̵ ̵f̵i̵l̵e̵ image to a r̵e̵m̵o̵t̵e̵ ̵d̵i̵r̵e̵c̵t̵o̵r̵y̵ registry? No rsync for you: not only will you need to run the `docker` command, but it must be "logged in" first (??!); oh, and you'll need to pass credentials over stdio (hooray for the /proc filesystem!). Note that this is just my experience from using private f̵o̵l̵d̵e̵r̵s̵ registries (e.g. like https://docs.aws.amazon.com/AmazonECR/latest/userguide/docke... ). Fun fact: AWS provide a multipart upload API for u̵p̵l̵o̵a̵d̵i̵n̵g̵ pushing to a b̵u̵c̵k̵e̵t̵ registry, which uses the normal AWS credentials chain; so you can just whack a loop around that to u̵p̵l̵o̵a̵d̵ ̵f̵i̵l̵e̵s̵ push images without any `docker login` bullshit ;)

Oh also, Docker Desktop for Mac is the only software I've used which makes the "ignore" button on update nags a "premium feature" (whilst simultaneously making it hard to actually update, since they only publish new binaries to a mutable "latest" URL, hence breaking its SHA256 and hoping people don't mind downloading random ever-changing binaries; the only stable URLs they provide are for "archived" versions, so no wonder I keep getting update nags.... urgh, I eventually just nuked the lot)


I couldn’t get Docker to run with WSL backend on a freshly installed Windows 11 Pro, a very common use case.

I’m not sure if they even test their own software.


You could just opt for the regular Docker for Windows approach instead. Honestly, though, I've only ever run Docker in WSL on Windows, and haven't tried it with Windows 11.


Windows and WSL are a mess. Run docker on Linux instead. There is no reason to run docker on windows, except if you are planning to build windows containers.


Incorrect. Portable .NET development natively uses Docker. Works pretty seamlessly with Visual Studio actually, though slower than I'd like.


Funny, that: WSL2 on Windows 11 with Linux containers works nearly perfectly for me (I think the only thing that'd make it better is another 8 GB or so of RAM), using devcontainers, building images and running Docker Compose. It's the Windows containers that cause trouble, mostly with networking.


If you're a Windows user, running Docker in WSL _is_ running it in Linux.


Yes, Linux wrapped into countless unnecessary abstraction layers.


You enumerated it yourself.


I agree that the naming of the different entities is not very clear. That doesn’t imply that each of these entities is a mess though.


Docker may be a terrible mess, sure, but something something VC.


I like the concept of simple monitoring. Simple means it is simple to install, simple to maintain and simple to use. For me, this is netdata. Netdata could be much more, but I just install it on whatever machine and never think about it again. And when something is strange on that machine, I go to http://localhost:19999 and look around.


Not sure how I haven't run across it before, but this is the first time I've tried using Netdata. It looks very good for metrics, at least in the 10 minutes I've spent installing it on my local desktop and poking around the UI.

I'm not seeing anything in it for logs, though. I'm guessing it doesn't aggregate or do anything with logs? What do you use for log aggregation and analysis?

I'm very interested because I've been getting frustrated with the ELK Stack, and the Prometheus/Grafana/Loki stack has never worked for me. I'm really close to trying to reinvent the wheel...


If you want an easy-to-install, easy-to-maintain and easy-to-use system for logs, then take a look at VictoriaLogs [1], which I'm working on. It is just a single, relatively small binary (around 10MB) without external dependencies. It supports both structured and unstructured logs. It provides an intuitive query language - LogsQL [2]. It integrates well with good old command-line tools (such as grep, head, jq, wc, sort, etc.) via unix pipes [3].

[1] https://docs.victoriametrics.com/VictoriaLogs/

[2] https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html

[3] https://docs.victoriametrics.com/VictoriaLogs/querying/#comm...


Prometheus has become ubiquitous for a reason. Exporting metrics on a basic http endpoint for scraping is as simple as you can get.
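
As a minimal sketch of that model, using only Go's standard library (the metric name is made up; real exporters typically use the prometheus/client_golang library instead):

    package main

    import (
        "fmt"
        "net/http"
        "runtime"
    )

    func main() {
        http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            // Prometheus text exposition format: "# TYPE <name> <kind>", then "<name> <value>".
            fmt.Fprintf(w, "# TYPE myapp_heap_bytes gauge\n")
            fmt.Fprintf(w, "myapp_heap_bytes %d\n", m.HeapAlloc)
        })
        http.ListenAndServe(":9100", nil)
    }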

Service discovery adds some complexity, but if you’re operating with any amount of scale that involves dynamically scaling machines then it’s also the simplest model available so far.

What about it doesn’t work for you?

Edit: I didn’t touch on logging because the post is about metrics. Personally I’ve enjoyed using Loki better than ELK/EFK, but it does have tradeoffs. I’d still be interested to hear why it doesn’t work, so I can keep that in mind when recommending solutions in the future.


The last time I tried Prometheus was years ago, so I don't know how much might have changed... I gave it a good month or two of effort trying to get the stack to do what I needed and never really succeeded.

Just my opinion, but I honestly don't think the scraping model makes much sense. It requires you to expose extra ports and paths on your servers that the push model doesn't, and I'm not a fan of the extra effort required to keep those ports and paths secure.

Beyond that, PromQL is an extra learning curve that I didn't like. I still ran into disk space issues when I used a proper data backend (TimescaleDB). Configuring all the scrapers was overly complicated, and making sure to deploy all the collectors with the needed configuration was rather complicated too.

In comparison, deploying Filebeat and Metricbeat is super simple: just configure the YAML file via something like Ansible and you're done. Elastic Agent is annoying in that you can't do that when using Fleet, or at least I have yet to figure out how to automate it. But it's still way easier than the Prometheus stack.

I've tried to get Loki to work two or three times and never really succeeded. I think I was able to browse a few log lines during one attempt; I don't think I even got that far in the other attempts... The impression I came away with was that it was designed to be run by people with lots of experience with it. Either that, or it just wasn't actually ready to be used by anyone not actively developing it.

So, yeah, while I figure a lot of people do well with the Prometheus/Grafana/Loki stack, it just isn't for me.


The most basic setup, and the one typically used until you need something more advanced, is using Prometheus for scraping and as the TSDB backend. If you ever decide to revisit Prometheus, you'll likely have better luck starting with this approach, rather than implementing your own scraping or involving TimescaleDB at all (at least until you have a working monitoring stack).

There used to be a connector called Promscale that was for sending metrics data from Prometheus to Timescale (using Prometheus’ remote_write) but it was deprecated earlier this year.


Also important to add: using Prometheus as the TSDB is good for short-term use (on the order of days to months). For longer retention you could offload it elsewhere, like another Prometheus-based backend or something SQL-based, etc.


Hey - I work on ML at Netdata (disclaimer).

We have a big PR open and under review at the moment that brings in a lot more logs capabilities: https://github.com/netdata/netdata/pull/13291

We also have some specific log collectors too. I think the link below might be the best place to look around at the moment; it should take you to the logs part of the integrations section in our demo space (no login needed; sorry for the long horrible URL, we're adding this section to our docs soon but at the moment it only lives in the app).

https://app.netdata.cloud/spaces/netdata-demo/rooms/all-node...


Nice to see that the log analysis is being worked on.

I'll see if I can figure out the integrations you pointed out. They look more like they are aimed at monitoring the metrics of the tools, not using the tools to aggregate logs. Right?

The way most ops systems treat logs and metrics as completely separate areas has always struck me as odd. Both are related to each other, and having them in the same system should be default. That's why I've put as much effort into the ELK Stack as I have. They've seemed to be the only ones who have really grasped that idea. (Though it's been a year or two since I've really surveyed the space...)

One question, not log-related: is it required to sign up for a cloud account to get multiple nodes displaying on the same screen? From the docs on streaming, I think you can configure nodes to send data to a parent node without a cloud account, but either I haven't configured it properly yet, or something else is in the way, since the node I'm trying to set up as a parent isn't showing anything from the child node.


FYI, you need to add the api-key config section to the stream.conf file on the parent node in order to enable the API key and allow child nodes to send data to the parent. I thought it went into the netdata.conf file... I also kinda wonder why it matters which file has which config, since the different config sections all have section headings like `[stream]` or `[web]`.
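
For anyone else who trips over this, the shape of it is roughly as follows (the key is a placeholder UUID; check netdata's streaming docs for the authoritative version):

    # parent's stream.conf: accept children that present this API key
    [11111111-2222-3333-4444-555555555555]
        enabled = yes

    # child's stream.conf: stream everything to the parent
    [stream]
        enabled = yes
        destination = parent-host:19999
        api key = 11111111-2222-3333-4444-555555555555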

So, the answer to my question is that you can get multiple nodes showing up without a cloud account. You just have to configure it correctly.


I have used https://github.com/openobserve/openobserve in several hobby projects and liked it. It's an all-in-one solution. It's likely less featureful than many others, but a single binary with everything in one place pulled me in and has worked for me so far.

Not affiliated, I just like the tool.


I'm not sure if the version in use at $workplace is out of date or incorrectly configured, but it is a dreadful Prometheus client in that it doesn't use labels; it just shovels all the metadata into the metric name like a 1935-style Graphite install, making most of the typical Prometheus goodness impossible to use.

The little dashboard thing is nice, though.


From my experience, there are no silver bullets. Let metrics software do metrics and log software do logs.

At the very least at the database level. Maybe we will get a visualisation engine that merges both nicely, but database-wise, the types of data couldn't be any more different.


Back in 2017, when I had a bunch of physical machines and unmanaged VMs, we ended up putting netdata on the servers. The reason was that most of the team was used to manually logging onto servers and diagnosing issues by hand.

The reason I liked it was that it exposes a standard Prometheus endpoint I can scrape and then view using something like Grafana. There are only about 20,000 Grafana dashboard modules available for netdata, but generally you can find one that works for you. Having that Prometheus endpoint allows you to springboard into the cloud and get like-metrics out of your cloud stuff as well, with a nice long historical data trail from your older machines.
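
The scrape job for that is short; roughly this in prometheus.yml (the target host is a placeholder; the path and format parameter are from netdata's Prometheus integration docs, as far as I recall):

    scrape_configs:
      - job_name: 'netdata'
        metrics_path: '/api/v1/allmetrics'
        params:
          format: [prometheus]
        static_configs:
          - targets: ['your-host:19999']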


You don't need to scrape anything if you use Netdata Cloud; see https://blog.netdata.cloud/introducing-netdata-source-plugin...


We are in the process of getting the plugin signed at the moment: https://github.com/netdata/netdata-grafana-datasource-plugin


I've been struggling with Grafana, and netdata looks so much better.

Is this a tool where you can boot up the Docker app and then connect a bunch of servers into a centralized dashboard? Or is it better to think of netdata as a dashboard for a single server that permits monitoring of a bunch of processes only on that machine?

I'm not sure I understand whether agents can be configured to talk to a dashboard, or if you don't need to do that configuration because they expect to talk to localhost. I have a bunch of VMs running on a bunch of different random hardware and want a way to monitor those VMs (and perhaps the hosts as well).


If you connect your servers to the Netdata cloud, you can manage all of them there (put them into groups, etc.). As far as I know, there is no self-hosted solution for this.

https://learn.netdata.cloud/docs/configuring/connect-agent-t...


Hey - I work for Netdata on ML.

We have recently created enterprise self hosted options for bigger customers who can't use cloud etc. (prob not as relevant here)

For self-hosting at a smaller scale, you can have your own parent with multiple children streaming to it.

This is an example demo node which is also a parent for some other demo nodes. None of these need to be claimed by, or signed in to, the cloud:

https://sanfrancisco.my-netdata.io/

It uses the same dashboard as the cloud so that we only have one dashboard to maintain; you basically get the cloud dashboard locally, and the parent can then kind of act like its own little Netdata Cloud.

A handful of features are not available this way, since they depend on the metadata being stored in the cloud as opposed to on a parent node, but we are trying to bridge that gap where possible so that the metadata could actually live on a parent.


Drat. I'm only interested in things I can self host. Back to the drawing board. Thanks for the clarification!


You can self-host and centralize configuration with netdata parents [1]. It's extremely lightweight and efficient for metrics collection, and the UI is very good as well. I recommend giving it a more in-depth look.

[1] https://community.netdata.cloud/t/advice-on-self-hosted-self...


Apparently this is possible. I didn't know. Didn't mean to mislead you. Sorry.


Maybe what I want is nachos?

https://www.nagios.org/


Or Zabbix. I’m assuming Nachos is a funny typo.

https://www.zabbix.com/


Zabbix was cool till 2015; now it's better to use https://gitlab.com/mikler/glaber/ or https://signoz.io/.


mmm nachos


They have a concept called "Parents":

> A “Parent” is a Netdata Agent, like the ones we install on all our systems, but is configured as a central node that receives, stores and processes metrics data from other Netdata “Child” nodes in our infrastructure...

https://learn.netdata.cloud/docs/streaming/


Hey - I work at Netdata on ML.

Just to mention, there is this doc too that tries to explain the various deployment strategies.

e.g. stand alone: https://learn.netdata.cloud/docs/architecture/deployment-str...


Actually, sorry, in this case it's more like parent-child:

https://learn.netdata.cloud/docs/architecture/deployment-str...

and you just don't have to claim the nodes to Netdata Cloud if you don't want to.


Netdata deserves way more attention. It automatically configures itself with all relevant modules, runs very lean and has more information available than most people will ever need.


The article's complaints include the complexity of JS web interfaces and "eye candy", while netdata's UI requires JS and is quite laggy, jerky and very interactive. I think munin fits better (it uses the same RRDtool graphs, too), though possibly its configuration is too lengthy for the requirements.


I like the idea of simplicity and doing exactly what you need. And the single executable.

The words "stupid simple" and "C++" together make me scratch my head though. C++ itself is not simple, and you have to recompile if you need to change something (and sometimes you inevitably do), which is slow. I'd likely go with a relatively simple C program that embeds the FFI for RRDtool and other stuff, and embeds Lua, or, better yet, Janet. Then most of the thing could be written in these languages, and would be easy to tweak when need be. That would still allow for a single executable + a single config file, on top of the logic already embedded. (But the author went and built the thing, and I did not, so their solution so far outperforms mine.)


Simple means simple for the end users. Not necessarily simple for the developers. The end users do not care if the dev has to spend a couple of minutes recompiling, as long as the result is fast and simple to use.


This is fair.

But the developer and the user is one and the same here :)


C++ can be relatively simple if you don't throw in all the bells and whistles and stay away from template metaprogramming. I use it all the time. Sure, compiles can be slow, but with proper file partitioning and creating libraries, 95% of that can be controlled for.


I was also overwhelmed by Grafana and co. In the time required to install it, I coded a simple monitoring alternative, DMSR ("Does My Shit Run"), in Python. Each agent has plugins which basically just send a data structure to the monitoring server, which displays it as YAML. No persistence, history, graphs or similar. uMon looks like a behemoth in comparison.

Github: https://github.com/dobin/dmsr

Live: https://mon.yookiterm.ch


I really like the idea behind μMon. It reminds me of when software was simpler. I remember using a program called "Everything" by voidtools. It was small but could search a lot of files quickly. Nowadays, some projects use big tools like Elasticsearch just to search a few things. Some even use PostgreSQL, a big database, for small tasks. I wish more software would keep things simple.


“Everything” is a must have for me. It’s shocking that Windows doesn’t come with local search that works.


Microsoft has added basically the same functionality to PowerToys, called Run[0], taking heavy inspiration from Everything. But yeah, this should be a built-in utility.

[0]: https://learn.microsoft.com/en-us/windows/powertoys/run


Does it read the NTFS journal? Because that's really what makes Everything, Everything.

It makes me wish I used a journaling filesystem on Linux too.


I'm pretty certain that Everything doesn't use the NTFS journal[1].

My understanding is that Everything uses the file-open hooks provided for antivirus to maintain the index, which is why it appears instant.

Adding a millisecond to each open call is imperceptible to the user, and it takes less time than that if you return immediately and process the index update in the background.

[1] happy to be proven wrong.


You don't? What do you run Linux on? The default ext4 is journaled.


Not quite. Run has plugins, but the default file search still uses Windows Search. However, there is a third-party plugin for PowerToys Run that integrates the Everything search service.


Yep, if you run Windows and don't have that installed, you're just suffering for no reason. Fastest file search you will ever find.


What's wrong with Postgres? It is very simple yet very powerful, and can run on very minimal resources without hogging your CPU.


I think GP meant more that people shouldn't use Postgres for small systems, over SQLite or something.


I was recently looking for an ultra minimal monitoring solution for OpenWrt and other lightweight systems (Pi’s, etc.) and was disappointed not to find one that met my needs (negligible CPU, disk space and RAM).

I ended up hacking together a shell script to send data to Home Assistant (via MQTT) which runs on pretty much any system that has at least netcat: https://github.com/roger-/hass-sysmon


Why not use https://collectd.org/, which is written in C and already used by OpenWrt's LuCI along with RRDtool? It's small in size, low on resources, and already has so many plugins.


I wanted something that would run anywhere with no dependencies. This might work on OpenWrt, but it would be trickier to set up on Debian.


You can compile with just the plugins and features you need via ./configure, and that cuts out most of the dependencies.


collectd + minimal plugins would still be 100+ kB installed on OpenWrt. Plus, I'd have to write a solution to get the data into Home Assistant.


Did you try VictoriaMetrics [1] and vmagent [2]? It is a single self-contained binary without external dependencies. It requires relatively low amounts of CPU, RAM, disk space and disk IO, and it runs on ARM.

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/

[2] https://docs.victoriametrics.com/vmagent.html


Did you have a look at https://www.monitorix.org/ ? I'm not sure if it works on OpenWRT, but otherwise checks all the boxes.


It requires Perl, which ruled out OpenWrt devices with ~10 Kb of space.


Why's everyone hating on Grafana? I find it fairly easy to use; it has a good balance between power & simplicity. And with Docker you can have it running in seconds.


No one is "hating" on Grafana. But if you used tools of the past, you'd know how light they were compared to the Grafana/Kibana/Elastic mongrel.

But I should mention that there is nothing "stupid simple" about the solution of the OP. Just look at the install procedure... https://tomscii.sig7.se/umon/


what in the... what is the point?? I've used Grafana many times and it's always, always a simple docker start -v, ..., and modifying some configs to get started. This tool is completely useless.


The difficult thing for me was understanding how to think about and then configure agents. I never could reason about getting the central server up and then connecting nodes to it, whether those nodes were applications or entire servers. The terminology didn't make sense to me, and it made me question whether I had the right framing for the purpose of Grafana.


In the author's case, he used Prometheus, and Grafana was just the "frontend" for it: basically a way to build pretty dashboards out of your PromQL queries. There's no Grafana agent you have to configure.


Using Docker to get it running in seconds is like using a Windows VM in QEMU and saying you got Office running in seconds on FreeBSD.

It’s still a mess, but it’s hidden now.


This doesn't seem much simpler than Prometheus+Grafana, if at all.

Some pushback:

- SNMP sucks. It's very limited, difficult to secure, etc. I've spent a lot of time with it, and it's more complex than Prometheus' simple HTTP metrics model. I use it where I have to (non-server devices), but I prefer dealing with Prometheus.

- Grafana is not necessarily complex. It's powerful, and you can waste a lot of time overinvesting in dashboard design, but that's not required. It can be used quite elegantly.

μMon does seem like "old school for the sake of old school". SNMP and RRDTool were designed when memory & bandwidth were much more limited. I will happily trade the overheads of HTTP and static Go binaries for the much superior UX they offer.


I do run Grafana + InfluxDB at home, and I agree it's not trivial to set up. Grafana in particular makes creating informative, easy-to-read graphs/dashboards a PhD-worthy endeavor.

Yet I run some hobby projects that collect data, and this setup is absolutely perfect for them. I even challenged myself to use SSL for the InfluxDB server (running a small CA).

Also, I use slack-based alerting through Grafana, for example if a disk would fill up, or something is down.

So it’s really about what your needs are.

And often, basic metrics about systems like CPU usage, load or network traffic don't tell you anything useful or actionable.


Not bad. I like KISS concepts. I personally run an old Cacti instance for monitoring here. Not as simple as uMon, but not very complicated either. And I even wrote cacti_con, a CLI graph viewer, to see a specific port of those fat 100+ port campus switches I had at work :)


https://collectd.org/ does the gathering (and writing to an RRDtool database, if you so desire) part very well. Many plugins, and it's easy to add more (a plugin just returns one line of text).
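
For example, a script run by the exec plugin just prints PUTVAL lines to stdout, one per reading (the identifier and value here are made up):

    PUTVAL "myhost/exec-example/gauge-temperature" interval=60 N:42.5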

You still need an RRD viewer, but that's not a huge stack.

And it scales all the way to hundreds of hosts, since on top of network send/receive of stats it supports a few other write formats aside from just RRD files.


I used it for years, but somehow it went out of fashion. It is now missing from the Ubuntu and Arch Linux repos.


Looks more or less like munin...


RRDtool is still my go-to for a lot of things. The only functionality I'd like added is an SVG version of the charts that allows panning and zooming into particular points in the past.


This really speaks to me - rrdtool is criminally underutilized. Great work!

I did something different but in a similar vein for one server network. We had Seq already deployed for log monitoring, so instead of setting up a separate network/node/app health monitoring interface, I configured everything to regularly ping Seq with a structured log message containing the needed data, which could be extracted and graphed with Seq's limited OOB charting abilities in the dashboard. Not perfect, but simpler.


Looks nice. I would like to use something like this to remotely monitor machines. Currently I use Prometheus (but without Grafana), since the alerting and built-in graphing are sufficient.

But I agree with OP that Prometheus feels more complex than it needs to be for simple use cases. But so does sendmail ;)


VictoriaMetrics has an all-in-one binary that is a pretty easy setup for a simple one-node install.


VictoriaMetrics is a layer on top of Prometheus from a quick read.

I managed simple alerts just using Prometheus's Alertmanager and AWS's simple mail system; I prefer this simpler approach.


> VictoriaMetrics is a layer on top of Prometheus from a quick read.

Sorry, but you're wrong! VictoriaMetrics was built from scratch with its own ideas [1][2][3] and was never built on top of Prometheus. Yes, it uses some libs which are also used in Prometheus/InfluxDB/other open source projects, but that's all. Moreover, the VictoriaMetrics team has created its own query language, named MetricsQL [4], which is inspired by PromQL [5].

> I managed simple alerts just using Prometheus's alert manager and aws's simple mail system, I prefer this simpler approach.

In the VictoriaMetrics stack, alerting is handled by a separate utility, vmalert [6], which is responsible for alerts and works with Alertmanager just as Prometheus does.

[1] https://faun.pub/victoriametrics-creating-the-best-remote-st...

[2] https://valyala.medium.com/open-sourcing-victoriametrics-f31...

[3] https://www.youtube.com/watch?v=-DbbIZzFHIY

[4] https://docs.victoriametrics.com/MetricsQL.html

[5] https://medium.com/@romanhavronenko/victoriametrics-promql-c...

[6] https://docs.victoriametrics.com/vmalert.html#vmalert


Nice work!

I often think about the "reinventing the wheel" argument. Isn't open source about diversity? There are so many forks, clones and "Yet another..."s (yacc, yaml, ...).

So many times I'm looking for suitable Go libraries that solve a certain problem. There might be a few out there, but every lib has its own pros and cons. Having the possibility to choose is great. Nothing sucks more than depending on an unmaintained C lib nobody cares about, with no alternatives.

The only counter-example that comes to mind is crypto. You don't want to do your own crypto.


I think it mostly depends on purpose. Crypto is being reinvented several times a day by students needing to understand the mechanics of various algorithms.

Personally, I find it rewarding to reimplement something known. There is always a solution when you are stuck, and who knows, maybe one will develop a better API for the system or something else.


FYI, here is a very similar project:

https://github.com/pommi/CGP


Monitoring should be: a "central" location with GUI/graphs + agents per bunch of servers. Let me choose from a dropdown what I want to see.

If I have to deploy this on each machine, then it makes no sense. I know SNMP can be used like this, but can μMon?


You can easily put together something that will send UDP packets with stats at regular intervals. I’ve done that a number of times - https://github.com/rcarmo/raspi-cluster/blob/master/tools/se...
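
A minimal Go sketch of the same idea (the collector address and the line format are invented for illustration):

    package main

    import (
        "fmt"
        "net"
        "os"
        "time"
    )

    // readLoad returns the 1-minute load average from /proc/loadavg (Linux only).
    func readLoad() float64 {
        data, err := os.ReadFile("/proc/loadavg")
        if err != nil {
            return -1
        }
        var load float64
        fmt.Sscanf(string(data), "%f", &load)
        return load
    }

    func main() {
        // "collector.example:9999" is a placeholder for wherever stats are gathered.
        conn, err := net.Dial("udp", "collector.example:9999")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer conn.Close()

        host, _ := os.Hostname()
        for {
            // One small datagram per interval; fire-and-forget, no handshake.
            fmt.Fprintf(conn, "%s load=%.2f ts=%d\n", host, readLoad(), time.Now().Unix())
            time.Sleep(30 * time.Second)
        }
    }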


Is there any simple monitoring system for Kubernetes that will monitor memory and CPU usage for each deployment and node? Prometheus and Grafana are good, but require too much configuration. I also like HAProxy's stats page. Something like that, but per service?


Nice, nifty project; he had me until the "no alerting" part.

Anyway, I might still deploy this in a Proxmox homelab where I don't want to fight with the complexity of a Grafana dashboard.


What’s missing from the Proxmox charts that you would need a dashboard for?


Off the top of my head (I'm not near the deployment right now):

- Temperatures

- Fan speeds

- Centralized metrics for all VMs


It's 2023. Tracing has been around for 20 years (DTrace, X-Trace: https://cs.brown.edu/~rfonseca/pubs/xtr-nsdi07.pdf).

Very simple logging, if not structured, is not completely useless, but it's not very useful either, except for maybe showing some nice charts.

Any serious monitoring tool is useful when it can explain things, and only tracing gives you causal information.


Not to be confused with http://umonfw.com/


I honestly don't get the criticisms of the Prometheus + Grafana stack.

> A full-blown time-series database (with gigabytes of rolling on-disk data).

Prometheus has settings that allow you to limit the space used by the database. I'm not sure, however, how one can do monitoring without a time-series database.
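
Concretely, retention is bounded with startup flags (the values here are illustrative):

    # keep at most 15 days or 2GB of TSDB data, whichever limit is hit first
    prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=2GB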

> Several Go binaries dozens of megabytes each, also consuming runtime resources.

Compared to most monitoring tools I've tested, the Prometheus exporters are usually fairly lightweight relative to the amount of metrics they generate. Also, "dozens of megabytes" doesn't seem like too much when we're usually talking about disk space in the gigabytes...

> Lengthy configuration files and lengthy argument lists to said binaries.

Configuration files: yes, if you want to change all the defaults. Argument lists: not really. In reality, a Docker deployment of Grafana + Prometheus is 20 lines in a docker-compose.yml file. Configuration files come with defaults if you install it to the system.
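
Roughly this (a sketch; the prometheus.yml contents and any version pinning are left out):

    services:
      prometheus:
        image: prom/prometheus
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
          - "9090:9090"
      grafana:
        image: grafana/grafana
        ports:
          - "3000:3000"
        volumes:
          - grafana-data:/var/lib/grafana
    volumes:
      grafana-data: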

By the way, I'm not sure that configuring a FastCGI server will be easier than configuring a Docker compose file...

> Systems continuously talking to each other over the network (even when nobody is looking at any dashboard), periodically pulling metrics from nodes into Prometheus, which in turn runs all sorts of consolidation routines on that data. A constant source of noise in otherwise idling systems.

Not necessarily. Systems talk to each other over the network if you configure them to do so. You can always install Prometheus + Grafana on every node if you don't want to do central monitoring, and you'll have no network noise.

> A mind-boggingly complex web front-end (Grafana) with its own database, tons of JavaScript running in my browser, and role-based access control over multiple users.

Grafana, complex? I think dragging and dropping panels with query builders that don't even require you to know the query language are far better than defining graphs in shell scripts.

> A bespoke query language to pull metrics into dashboards, and lots of specialized knowledge in how to build useful dashboards. It is all meant to be intuitive, but man, is it complicated!

Again, this is not a problem of the stack. Building useful dashboards is complicated no matter what tool you use.

> maintenance: ongoing upgrades & migrations

Not really. Both Prometheus and Grafana are usually very stable, and you don't need to upgrade if you don't want to. I have a monitoring stack built with them in my homelab that I haven't updated in two years, and it still works. Of course I don't have the new shiny features, but it works.

To me, it seems that the author is conflating the complexity of the tool with the complexity of monitoring itself. Yes, monitoring is hard. Knowing which metrics to show, which to pull, how long to retain them: it's hard. Knowing how to present those metrics to users is also hard. But this tool doesn't solve that. In the end, I don't know how useful it is to make a custom tool that collects very limited metrics based on other ancient, limited, buggy tools (SNMP, RRD, FastCGI...) and that is missing even basic UX features like being able to zoom or pan on charts.


What about netdata?



