μMon: Stupid simple monitoring (2022) (sig7.se)
290 points by g0xA52A2A on Sept 24, 2023 | 106 comments



I like the focus on simplicity of uMon, and agree with the author's criticism of behemoths like Grafana.

But looking at the installation instructions[1], I can't help but think that their reluctance to use Docker feels contrarian for no reason (and the quip about it being "out of fashion" completely misguided). This whole procedure could be automated in a Dockerfile, and actually running uMon would be vastly simplified. Docker itself is not much more than a wrapper around Linux primitives, and if they dislike it specifically for e.g. having to run a server and run containers as root, there are plenty of other lighter-weight container alternatives.
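
For illustration, the kind of Dockerfile I have in mind would look roughly like this (a hypothetical sketch: the dependency list, build command and binary name are my assumptions, not taken from the actual instructions):

    FROM debian:bookworm AS build
    # Assumed build dependencies; the real list is in the install docs
    RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential librrd-dev
    COPY . /src
    WORKDIR /src
    RUN make    # assumed build entry point

    FROM debian:bookworm-slim
    RUN apt-get update && apt-get install -y --no-install-recommends rrdtool
    COPY --from=build /src/umon /usr/local/bin/umon    # binary name assumed
    CMD ["/usr/local/bin/umon"]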

There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.

I'm all for choosing "simple" tools and stacks over "complex" ones, for whatever definition of those terms one chooses to use, and I strive to do that in my own projects whenever possible, but simplicity is not an inherent property of old and battle-tested technologies. We should be careful not to be biased toward technology we happen to be familiar with, but be pragmatic about picking the right tool for the job that fits our requirements, regardless of its age or familiarity.

[1]: https://tomscii.sig7.se/umon/#Installation%20and%20getting%2...

[2]: I have a pet peeve about tools or protocols with "simple" or "trivial" in their name. They almost always end up being the opposite of that as they mature, and the name becomes an alluring mirage tricking you into its abyss of hidden complexity. I'm looking at you, SMTP, TFTP...


> I can't help but think that their reluctance to use Docker feels contrarian for no reason

My impression is that it's less about contrarianism and more about

1. the developer opting for installation instructions being consistent across both of the intended targets (Linux and (Open)BSD); and/or

2. μMon being allegedly tiny and simple enough that you're probably best off stuffing it into every one of your containers anyway (i.e. so that said containers can expose their own monitoring interfaces) instead of having a dedicated container for it

> There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.

Agreed here. If SNMP is supposed to be "simple", I'd hate to see what the Complex Network Management Protocol looks like!

(I have a similar peeve about LDAP, on that note; I guess compared to wrappers around it like Active Directory and FreeIPA it's "lightweight", but I dread imagining what a heavyweight directory access protocol would entail)


Both of those are based on old OSI protocols, which were terrifying in their complexity. LDAP mostly subsetted DAP and X.500 and added Internet concepts. SNMP leveraged ASN.1 with some ideas from CMIP, but using Internet concepts and with attention paid to operations when the network is marginal (unlike CMIP).

"What do you get when you cross a mobster with an OSI standard?

You get someone who makes you an offer you can't understand."


> There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.

Oh, you sweet summer child, that's the easiest fucking part of this utter abomination!

Adding anything that's not covered by standard OIDs is a fucking chore. And it is just a string of numbers, so you either have to write your own custom OID files and distribute them everywhere OR operate by numbers alone.

And there is no fucking key-value, or even fucking labels. If you want to, say, distribute a list of arbitrary keys with arbitrary values under a hierarchy, the OIDs will look like

    .1.1.1.1.1: key1
    .1.1.1.1.2: key2
    .1.1.1.1.3: key3
    .1.1.1.2.1: val1
    .1.1.1.2.2: val2
    .1.1.1.2.3: val3
It was created for devices with tens or hundreds of kilobytes of RAM that couldn't handle a more descriptive protocol, but it should have died 20 years ago.


This is what MIBs are for. Like, yeah, MIBs are probably too verbose for what they're used for, but once you write yours, you only need it on the host you'll make SNMP gets from (i.e. your monitoring host); the SNMP client will then use it to translate names to OIDs, and Bob's your uncle.
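
Concretely, that looks something like this on the monitoring host (MIB module and object names are placeholders):

    # load the custom MIB on top of the defaults, then query by name
    snmpget -v2c -c public -m +MY-CUSTOM-MIB target-host MY-CUSTOM-MIB::myValue.0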

What I do is use a custom private OID in the appropriate range; sub-OIDs in that range get passed to a shell script that finds and executes the appropriate command (generally a one-liner) for that sub-OID. As soon as I add a sub-OID, I also add it to the MIB, which has support for descriptions upon descriptions on top of type info.
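
A sketch of that wiring with net-snmp's `pass` directive (the OID is net-snmp's example subtree; the script path is a placeholder):

    # /etc/snmp/snmpd.conf: hand a private subtree to an external script.
    # snmpd calls the script as "-g OID" for a GET (or "-n OID" for a GETNEXT);
    # for a GET the script prints three lines: the OID, the type, the value.
    pass .1.3.6.1.4.1.8072.2.255 /usr/local/bin/my-snmp-handler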


When I read that they thought Docker was outdated, I figured I would open the repo to find Go code and pre-built binaries… when I found C++, I was left thinking the same as you.


If it's simple, it doesn't need Docker, and as someone else said, you might even want to stuff it into your Docker containers.. so a minimal thing doesn't need Docker, IMO. The "simple" landscape is huge, same as for whether one believes that C++ is outdated or not ;) Guess what is easier to get up and running on many platforms even today: simple C/C++, or anything else?


Docker can be used to provide a plug-and-play development environment, even if the result isn't deployed into production as a container.

C++ projects, I expect, would benefit even more than most from such configurations, reducing the barrier to entry for new contributors by eliminating environment bootstrap headaches.


> Docker feels contrarian for no reason

No, Docker is a mess.


What is it more specifically that you think is a mess? Do you mean the OCI Image Format, Docker Inc, Docker Engine, something else…?


Not sure if I'd count "OCI Image Format" under the Docker umbrella, since Docker doesn't actually follow it. Also, slight nitpick, but "OCI Image Format" is itself a bit of an umbrella, since there are images, manifests, layers, etc. It's easy enough to make a standards-compliant image using `tar`, `sha256sum` and `jq`, but it's rather hit-and-miss which tools will support it (e.g. AWS seems easy to please, but nerdctl rejects certain things, etc.)

Personally, my main problems with Docker are:

- Dockerfiles: these are basically just shell scripts, which throws away decades of improvements and leads to all sorts of insanity (e.g. running `apt install -y foo bar baz`, rather than making a .deb which depends on those). It also causes everything to happen "inside-out", with our compiler toolchains, etc. getting installed inside the container (requiring even more containers to try and extricate the build products, and so on)

- Docker Inc: specifically, their over-complication of basic shit, as a way to funnel everything through themselves. Want to u̵p̵l̵o̵a̵d̵ push your t̵a̵r̵ ̵f̵i̵l̵e̵ image to a r̵e̵m̵o̵t̵e̵ ̵d̵i̵r̵e̵c̵t̵o̵r̵y̵ registry? No rsync for you: not only will you need to run the `docker` command, but it must be "logged in" first (??!); oh, and you'll need to pass credentials over stdio (hooray for the /proc filesystem!). Note that this is just my experience from using private f̵o̵l̵d̵e̵r̵s̵ registries (e.g. like https://docs.aws.amazon.com/AmazonECR/latest/userguide/docke... ). Fun fact: AWS provide a multipart upload API for u̵p̵l̵o̵a̵d̵i̵n̵g̵ pushing to a b̵u̵c̵k̵e̵t̵ registry, which uses the normal AWS credentials chain; so you can just whack a loop around that to u̵p̵l̵o̵a̵d̵ ̵f̵i̵l̵e̵s̵ push images without any `docker login` bullshit ;)

Oh also, Docker Desktop for Mac is the only software I've used which makes the "ignore" button on update nags a "premium feature" (whilst simultaneously making it hard to actually update, since they only publish new binaries to a mutable "latest" URL, hence breaking its SHA256 and hoping people don't mind downloading random ever-changing binaries; the only stable URLs they provide are for "archived" versions, so no wonder I keep getting update nags.... urgh, I eventually just nuked the lot)


I couldn’t get Docker to run with WSL backend on a freshly installed Windows 11 Pro, a very common use case.

I’m not sure if they even test their own software.


You could just opt for the regular Docker for Windows approach instead. Honestly, though, I've only ever run Docker in WSL on Windows, and haven't tried it with Windows 11.


Windows and WSL are a mess. Run docker on Linux instead. There is no reason to run docker on windows, except if you are planning to build windows containers.


Incorrect. Portable .NET development natively uses Docker. Works pretty seamlessly with Visual Studio actually, though slower than I'd like.


Funny, that: WSL2 on Windows 11 with Linux containers works nearly perfectly for me (I think the only thing that'd make it better is another 8 GB or so of RAM), using devcontainers, building images and running Docker Compose. It's the Windows containers that cause trouble, mostly with networking.


If you're a Windows user, running Docker in WSL _is_ running it in Linux.


Yes, Linux wrapped into countless unnecessary abstraction layers.


You enumerated it yourself.


I agree that the naming of the different entities is not very clear. That doesn’t imply that each of these entities is a mess though.


Docker may be a terrible mess, sure, but something something VC.


I like the concept of simple monitoring. Simple means it is simple to install, simple to maintain and simple to use. For me, this is netdata. Netdata could be much more, but I just install it on whatever machine and never think about it again. And when something is strange on that machine, I go to http://localhost:19999 and look around.


Not sure how I haven't run across it before, but this is the first time I've tried using Netdata. It looks very good for metrics, at least in the 10 minutes I've spent installing it on my local desktop and poking around the UI.

I'm not seeing anything in it for logs, though. I'm guessing it doesn't aggregate or do anything with logs? What do you use for log aggregation and analysis?

I'm very interested because I've been getting frustrated with the ELK Stack, and the Prometheus/Grafana/Loki stack has never worked for me. I'm really close to trying to reinvent the wheel...


If you want an easy-to-install, easy-to-maintain and easy-to-use system for logs, then take a look at VictoriaLogs [1], which I'm working on. It is just a single, relatively small binary (around 10MB) without external dependencies. It supports both structured and unstructured logs. It provides an intuitive query language - LogsQL [2]. It integrates well with good old command-line tools (such as grep, head, jq, wc, sort, etc.) via unix pipes [3].

[1] https://docs.victoriametrics.com/VictoriaLogs/

[2] https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html

[3] https://docs.victoriametrics.com/VictoriaLogs/querying/#comm...


Prometheus has become ubiquitous for a reason. Exporting metrics on a basic http endpoint for scraping is as simple as you can get.
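
As a minimal sketch of that model, using only Go's standard library (the metric name is made up; real exporters typically use the prometheus/client_golang library instead):

    package main

    import (
        "fmt"
        "net/http"
        "runtime"
    )

    func main() {
        http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            // Prometheus text exposition format: "# TYPE <name> <kind>", then "<name> <value>".
            fmt.Fprintf(w, "# TYPE myapp_heap_bytes gauge\n")
            fmt.Fprintf(w, "myapp_heap_bytes %d\n", m.HeapAlloc)
        })
        http.ListenAndServe(":9100", nil)
    }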

Service discovery adds some complexity, but if you’re operating with any amount of scale that involves dynamically scaling machines then it’s also the simplest model available so far.

What about it doesn’t work for you?

Edit: I didn’t touch on logging because the post is about metrics. Personally I’ve enjoyed using Loki better than ELK/EFK, but it does have tradeoffs. I’d still be interested to hear why it doesn’t work, so I can keep that in mind when recommending solutions in the future.


The last time I tried Prometheus was years ago, so I don't know how much might have changed... I gave it a good month or two of effort trying to get the stack to do what I needed and never really succeeded.

Just my opinion, but I honestly don't think the scraping model makes much sense. It requires you to expose extra ports and paths on your servers that the push model doesn't, and I'm not a fan of the extra effort required to keep those ports and paths secure.

Beyond that, PromQL is an extra learning curve that I didn't like. I still ran into disk space issues when I used a proper data backend (TimescaleDB). Configuring all the scrapers was overly complicated, and making sure to deploy all the collectors with the needed configuration was rather complicated too.

In comparison, deploying Filebeat and Metricbeat is super simple: just configure the YAML file via something like Ansible and you're done. Elastic Agent is annoying in that you can't do that when using Fleet, or at least I have yet to figure out how to automate it. But it's still way easier than the Prometheus stack.

I've tried to get Loki to work two or three times and never really succeeded. I think I was able to browse a few log lines during one attempt; I don't think I even got that far in the other attempts... The impression I came away with was that it was designed to be run by people with lots of experience with it. Either that, or it just wasn't actually ready to be used by anyone not actively developing it.

So, yeah, while I figure a lot of people do well with the Prometheus/Grafana/Loki stack, it just isn't for me.


The most basic setup, and the one typically used until you need something more advanced, is using Prometheus for scraping and as the TSDB backend. If you ever decide to revisit Prometheus, you'll likely have better luck starting with this approach, rather than implementing your own scraping or involving TimescaleDB at all (at least until you have a working monitoring stack).

There used to be a connector called Promscale that was for sending metrics data from Prometheus to Timescale (using Prometheus’ remote_write) but it was deprecated earlier this year.


Also important to add: using Prometheus as the TSDB is good for short-term use (on the order of days to months). For longer retention you could offload it elsewhere, like another Prometheus-based backend or something SQL-based, etc.


Hey - I work on ML at Netdata (disclaimer).

We have a big PR open and under review at the moment that brings in a lot more logs capabilities: https://github.com/netdata/netdata/pull/13291

We also have some specific log collectors too. I think the link below might be the best place to look around at the moment; it should take you to the logs part of the integrations section in our demo space (no login needed; sorry for the long horrible URL, we're adding this section to our docs soon but at the moment it only lives in the app).

https://app.netdata.cloud/spaces/netdata-demo/rooms/all-node...


Nice to see that the log analysis is being worked on.

I'll see if I can figure out the integrations you pointed out. They look more like they are aimed at monitoring the metrics of the tools, not using the tools to aggregate logs. Right?

The way most ops systems treat logs and metrics as completely separate areas has always struck me as odd. Both are related to each other, and having them in the same system should be default. That's why I've put as much effort into the ELK Stack as I have. They've seemed to be the only ones who have really grasped that idea. (Though it's been a year or two since I've really surveyed the space...)

One question, not log-related: is it required to sign up for a cloud account to get multiple nodes displaying on the same screen? From the docs on streaming, I think you can configure nodes to send data to a parent node without a cloud account, but either I haven't configured it properly yet, or something else is in the way, since the node I'm trying to set up as a parent isn't showing anything from the child node.


FYI, you need to add the api-key config section to the stream.conf file on the parent node in order to enable the API key and allow child nodes to send data to the parent. I thought it went into the netdata.conf file... I also kinda wonder why it matters which file has which config, since the different config sections all have section headings like `[stream]` or `[web]`.
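
For anyone else who trips over this, the shape of it is roughly as follows (the key is a placeholder UUID; check netdata's streaming docs for the authoritative version):

    # parent's stream.conf: accept children that present this API key
    [11111111-2222-3333-4444-555555555555]
        enabled = yes

    # child's stream.conf: stream everything to the parent
    [stream]
        enabled = yes
        destination = parent-host:19999
        api key = 11111111-2222-3333-4444-555555555555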

So, the answer to my question is that you can get multiple nodes showing up without a cloud account. You just have to configure it correctly.


I have used https://github.com/openobserve/openobserve in several hobby projects and liked it. It's an all-in-one solution. It's likely less featureful than many others, but a single binary with everything in one place pulled me in and has worked for me so far.

Not affiliated, I just like the tool.


I'm not sure if the version in use at $workplace is out of date or incorrectly configured, but it is a dreadful Prometheus client in that it doesn't use labels; it just shovels all the metadata into the metric name like a 1935-style Graphite install, making most of the typical Prometheus goodness impossible to use.

The little dashboard thing is nice, though.


From my experience, there are no silver bullets. Let metrics software do metrics and log software do logs.

At the very least at the database level. Maybe we will get a visualisation engine that merges both nicely, but database-wise, the types of data couldn't be any more different.


Back in 2017, when I had a bunch of physical machines and unmanaged VMs, we ended up putting netdata on the servers. The reason was that most of the team was used to manually logging onto servers and diagnosing issues by hand.

The reason I liked it was that it exposes a standard Prometheus endpoint I can scrape and then view using something like Grafana. There are only about 20,000 Grafana dashboard modules available for netdata, but generally you can find one that works for you. Having that Prometheus endpoint allows you to springboard into the cloud and get like-metrics out of your cloud stuff as well, with a nice long historical data trail from your older machines.
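
The scrape job for that is short; roughly this in prometheus.yml (the target host is a placeholder; the path and format parameter are from netdata's Prometheus integration docs, as far as I recall):

    scrape_configs:
      - job_name: 'netdata'
        metrics_path: '/api/v1/allmetrics'
        params:
          format: [prometheus]
        static_configs:
          - targets: ['your-host:19999']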


You don't need to scrape anything if you use Netdata Cloud; see https://blog.netdata.cloud/introducing-netdata-source-plugin...


We are in the process of getting the plugin signed at the moment: https://github.com/netdata/netdata-grafana-datasource-plugin


I've been struggling with Grafana, and netdata looks so much better.

Is this a tool where you can boot up the Docker app and then connect a bunch of servers into a centralized dashboard? Or is it better to think of netdata as a dashboard for a single server that permits monitoring of a bunch of processes only on that machine?

I'm not sure I understand whether agents can be configured to talk to a dashboard, or if you don't need to do that configuration because they expect to talk to localhost. I have a bunch of VMs running on a bunch of different random hardware and want a way to monitor those VMs (and perhaps the hosts as well).


If you connect your servers to the Netdata cloud, you can manage all of them there (put them into groups, etc.). As far as I know, there is no self-hosted solution for this.

https://learn.netdata.cloud/docs/configuring/connect-agent-t...


Hey - I work for Netdata on ML.

We have recently created enterprise self hosted options for bigger customers who can't use cloud etc. (prob not as relevant here)

For self-hosting at a smaller scale, you can have your own parent with multiple children streaming to it.

This is an example demo node which is also a parent for some other demo nodes. None of these need to be claimed by, or signed in to, the cloud:

https://sanfrancisco.my-netdata.io/

It uses the same dashboard as the cloud so that we only have one dashboard to maintain; you basically get the cloud dashboard locally, and the parent can then kind of act like its own little Netdata Cloud.

A handful of features are not available this way, since they depend on the metadata being stored in the cloud as opposed to on a parent node, but we are trying to bridge that gap where possible so that the metadata could actually live on a parent.


Drat. I'm only interested in things I can self host. Back to the drawing board. Thanks for the clarification!


You can self-host and centralize configuration with netdata parents [1]. It's extremely lightweight and efficient for metrics collection, and the UI is very good as well. I recommend giving it a more in-depth look.

[1] https://community.netdata.cloud/t/advice-on-self-hosted-self...


Apparently this is possible. I didn't know. Didn't mean to mislead you. Sorry.


Maybe what I want is nachos?

https://www.nagios.org/


Or Zabbix. I’m assuming Nachos is a funny typo.

https://www.zabbix.com/


Zabbix was cool till 2015; now it's better to use https://gitlab.com/mikler/glaber/ or https://signoz.io/.


mmm nachos


They have a concept called "Parents":

> A “Parent” is a Netdata Agent, like the ones we install on all our systems, but is configured as a central node that receives, stores and processes metrics data from other Netdata “Child” nodes in our infrastructure...

https://learn.netdata.cloud/docs/streaming/


Hey - I work at Netdata on ML.

Just to mention, there is this doc too that tries to explain the various deployment strategies.

e.g. stand alone: https://learn.netdata.cloud/docs/architecture/deployment-str...


Actually, sorry, in this case it's more like parent-child:

https://learn.netdata.cloud/docs/architecture/deployment-str...

and you just don't have to claim the nodes to Netdata Cloud if you don't want to.


Netdata deserves way more attention. It automatically configures itself with all relevant modules, runs very lean and has more information available than most people will ever need.


The article's complaints include the complexity of JS web interfaces and "eye candy", while netdata's UI requires JS and is quite laggy, jerky and very interactive. I think munin fits better (it uses the same RRDtool graphs, too), though possibly its configuration is too lengthy for the requirements.


I like the idea of simplicity and doing exactly what you need. And the single executable.

The words "stupid simple" and "C++" together make me scratch my head though. C++ itself is not simple, and you have to recompile if you need to change something (and sometimes you inevitably do), which is slow. I'd likely go with a relatively simple C program that embeds the FFI for RRDtool and other stuff, and embeds Lua, or, better yet, Janet. Then most of the thing could be written in these languages, and would be easy to tweak when need be. That would still allow for a single executable + a single config file, on top of the logic already embedded. (But the author went and built the thing, and I did not, so their solution so far outperforms mine.)


Simple means simple for the end users. Not necessarily simple for the developers. The end users do not care if the dev has to spend a couple of minutes recompiling, as long as the result is fast and simple to use.


This is fair.

But the developer and the user is one and the same here :)


C++ can be relatively simple if you don't throw in all the bells and whistles and stay away from template metaprogramming. I use it all the time. Sure, compiles can be slow, but with proper file partitioning and creating libraries, 95% of that can be controlled for.


I was also overwhelmed by Grafana and co. In the time required to install it, I coded a simple monitoring alternative, DMSR ("Does My Shit Run"), in Python. Each agent has plugins which basically just send a data structure to the monitoring server, which displays it as YAML. No persistence, history, graphs or similar. uMon looks like a behemoth in comparison.

Github: https://github.com/dobin/dmsr

Live: https://mon.yookiterm.ch


I really like the idea behind μMon. It reminds me of when software was simpler. I remember using a program called "Everything" by voidtools. It was small but could search a lot of files quickly. Nowadays, some projects use big tools like Elasticsearch just to search a few things. Some even use PostgreSQL, a big database, for small tasks. I wish more software would keep things simple.


“Everything” is a must have for me. It’s shocking that Windows doesn’t come with local search that works.


Microsoft has added basically the same functionality to PowerToys, called Run[0], taking heavy inspiration from Everything. But yeah, this should be a built-in utility.

[0]: https://learn.microsoft.com/en-us/windows/powertoys/run


Does it read the NTFS journal? Because that's really what makes Everything, Everything.

It makes me wish I used a journaling filesystem on Linux too.


I'm pretty certain that Everything doesn't use the NTFS journal[1].

My understanding is that Everything uses the file-open hooks provided for antivirus to maintain the index, which is why it appears instant.

Adding a millisecond to each open call is imperceptible to the user, and it takes less time than that if you return immediately and process the index update in the background.

[1] happy to be proven wrong.


You don't? What do you run Linux on? The default ext4 is journaled.


Not quite. Run has plugins, but the default file search still uses Windows Search. However, there is a third-party plugin for PowerToys Run that integrates the Everything search service.


Yep, if you run Windows and don't have that installed, you're just suffering for no reason. Fastest file search you will ever find.


What's wrong with Postgres? It is very simple yet very powerful, and can run on very minimal resources without hogging your CPU.


I think GP meant more that people shouldn't use Postgres for small systems, over SQLite or something.


I was recently looking for an ultra minimal monitoring solution for OpenWrt and other lightweight systems (Pi’s, etc.) and was disappointed not to find one that met my needs (negligible CPU, disk space and RAM).

I ended up hacking together a shell script to send data to Home Assistant (via MQTT) which runs on pretty much any system that has at least netcat: https://github.com/roger-/hass-sysmon


Why not use https://collectd.org/, which is written in C and already used by OpenWrt's LuCI along with RRDtool? It's small in size, low on resources, and already has so many plugins.


I wanted something that would run anywhere with no dependencies. This might work on OpenWrt, but it would be trickier to set up on Debian.


You can compile with just the plugins and features you need via ./configure, and that cuts out most of the dependencies.


collectd + minimal plugins would still be 100+ kB installed on OpenWrt. Plus, I'd have to write a solution to get the data into Home Assistant.


Did you try VictoriaMetrics [1] and vmagent [2]? It is a single self-contained binary without external dependencies. It requires relatively low amounts of CPU, RAM, disk space and disk IO, and it runs on ARM.

[1] https://github.com/VictoriaMetrics/VictoriaMetrics/

[2] https://docs.victoriametrics.com/vmagent.html


Did you have a look at https://www.monitorix.org/ ? I'm not sure if it works on OpenWRT, but otherwise checks all the boxes.


It requires Perl, which ruled out OpenWrt devices with ~10 Kb of space.


Why's everyone hating on Grafana? I find it fairly easy to use; it has a good balance between power & simplicity. And with Docker you can have it running in seconds.


No one is "hating" on Grafana. But if you used tools of the past, you'd know how light they were compared to the Grafana/Kibana/Elastic mongrel.

But I should mention that there is nothing "stupid simple" about the solution of the OP. Just look at the install procedure... https://tomscii.sig7.se/umon/


what in the... what is the point?? I've used Grafana many times and it's always, always a simple docker start -v, ..., and modifying some configs to get started. This tool is completely useless.


The difficult thing for me was understanding how to think about and then configure agents. I never could reason about getting the central server up and then connecting nodes to it, whether those nodes were applications or entire servers. The terminology didn't make sense to me, and it made me question whether I had the right framing for the purpose of Grafana.


In the author's case, he used Prometheus, and Grafana was just the "frontend" for it: basically a way to build pretty dashboards out of your PromQL queries. There's no Grafana agent you have to configure.


Using Docker to get it running in seconds is like using a Windows VM in QEMU and saying you got Office running in seconds on FreeBSD.

It’s still a mess, but it’s hidden now.


This doesn't seem much simpler than Prometheus+Grafana, if at all.

Some pushback:

- SNMP sucks. It's very limited, difficult to secure, etc. I've spent a lot of time with it, and it's more complex than Prometheus' simple HTTP metrics model. I use it where I have to (non-server devices), but I prefer dealing with Prometheus.

- Grafana is not necessarily complex. It's powerful, and you can waste a lot of time overinvesting in dashboard design, but that's not required. It can be used quite elegantly.

μMon does seem like "old school for the sake of old school". SNMP and RRDTool were designed when memory & bandwidth were much more limited. I will happily trade the overheads of HTTP and static Go binaries for the much superior UX they offer.


I do run Grafana + InfluxDB at home, and I agree it's not trivial to set up. Grafana in particular makes creating informative, easy-to-read graphs/dashboards a PhD-worthy endeavor.

Yet I run some hobby projects that collect data, and this setup is absolutely perfect for them. I even challenged myself to use SSL for the InfluxDB server (running a small CA).

Also, I use slack-based alerting through Grafana, for example if a disk would fill up, or something is down.

So it’s really about what your needs are.

And often, basic metrics about systems like CPU usage, load or network traffic don't tell you anything useful or actionable.


Not bad. I like KISS concepts. I personally run an old Cacti instance for monitoring here. Not as simple as uMon, but not very complicated either. And I even wrote cacti_con, a CLI graph viewer, to see a specific port of those fat 100+ port campus switches I had at work :)


https://collectd.org/ does the gathering (and writing to an RRDtool database, if you so desire) part very well. Many plugins, and it's easy to add more (a plugin just returns one line of text).
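
For example, a script run by the exec plugin just prints PUTVAL lines to stdout, one per reading (the identifier and value here are made up):

    PUTVAL "myhost/exec-example/gauge-temperature" interval=60 N:42.5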

You still need an RRD viewer, but that's not a huge stack.

And it scales all the way to hundreds of hosts, since on top of network send/receive of stats it supports a few other write formats aside from just RRD files.


I used it for years, but somehow it went out of fashion. It is now missing from the Ubuntu and Arch Linux repos.


Looks more or less like munin...


RRDtool is still my go-to for a lot of things. The only functionality I'd like added is an SVG version of the charts that allows panning and zooming into particular points in the past.


This really speaks to me - rrdtool is criminally underutilized. Great work!

I did something different but in a similar vein for one server network. We had Seq already deployed for log monitoring, so instead of setting up a separate network/node/app health monitoring interface, I configured everything to regularly ping Seq with a structured log message containing the needed data, which could be extracted and graphed with Seq's limited OOB charting abilities in the dashboard. Not perfect, but simpler.


Looks nice. I would like to use something like this to remotely monitor machines. Currently I use Prometheus (but without Grafana), since the alerting and built-in graphing are sufficient.

But I agree with OP that Prometheus feels more complex than it needs to be for simple use cases. But so does sendmail ;)


VictoriaMetrics has an all-in-one binary that is a pretty easy setup for a simple one-node install.


VictoriaMetrics is a layer on top of Prometheus from a quick read.

I managed simple alerts just using Prometheus's Alertmanager and AWS's simple mail system; I prefer this simpler approach.


> VictoriaMetrics is a layer on top of Prometheus from a quick read.

Sorry, but you're wrong! VictoriaMetrics was built from scratch with its own ideas [1][2][3] and was never built on top of Prometheus. Yes, it uses some libs which are also used in Prometheus/InfluxDB/other open source projects, but that's all. Moreover, the VictoriaMetrics team has created its own query language, named MetricsQL [4], which is inspired by PromQL [5].

> I managed simple alerts just using Prometheus's alert manager and aws's simple mail system, I prefer this simpler approach.

In the VictoriaMetrics stack, alerting is handled by a separate utility, vmalert [6], which is responsible for alerts and works with Alertmanager just as Prometheus does.

[1] https://faun.pub/victoriametrics-creating-the-best-remote-st...

[2] https://valyala.medium.com/open-sourcing-victoriametrics-f31...

[3] https://www.youtube.com/watch?v=-DbbIZzFHIY

[4] https://docs.victoriametrics.com/MetricsQL.html

[5] https://medium.com/@romanhavronenko/victoriametrics-promql-c...

[6] https://docs.victoriametrics.com/vmalert.html#vmalert


Nice work!

I often think about the "reinventing the wheel" argument. Isn't open source about diversity? There are so many forks, clones and "Yet another..."s (yacc, yaml, ...).

So many times I'm looking for suitable Go libraries that solve a certain problem. There might be a few out there, but every lib has its own pros and cons. Having the possibility to choose is great. Nothing sucks more than depending on an unmaintained C lib nobody cares about, with no alternatives.

The only counter-example that comes to mind is crypto. You don't want to do your own crypto.


I think it mostly depends on purpose. Crypto is being reinvented several times a day by students needing to understand the mechanics of various algorithms.

Personally, I find it rewarding to reimplement something known. There is always a solution when you are stuck, and who knows, maybe one will develop a better API for the system or something else.


FYI, here is a very similar project:

https://github.com/pommi/CGP


Monitoring should be: a "central" location with GUI/graphs + agents per bunch of servers. Let me choose from a dropdown what I want to see.

If I have to deploy this on each machine, then it makes no sense. I know SNMP can be used like this, but can μMon?


You can easily put together something that will send UDP packets with stats at regular intervals. I’ve done that a number of times - https://github.com/rcarmo/raspi-cluster/blob/master/tools/se...
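
A minimal Go sketch of the same idea (the collector address and the line format are invented for illustration):

    package main

    import (
        "fmt"
        "net"
        "os"
        "time"
    )

    // readLoad returns the 1-minute load average from /proc/loadavg (Linux only).
    func readLoad() float64 {
        data, err := os.ReadFile("/proc/loadavg")
        if err != nil {
            return -1
        }
        var load float64
        fmt.Sscanf(string(data), "%f", &load)
        return load
    }

    func main() {
        // "collector.example:9999" is a placeholder for wherever stats are gathered.
        conn, err := net.Dial("udp", "collector.example:9999")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer conn.Close()

        host, _ := os.Hostname()
        for {
            // One small datagram per interval; fire-and-forget, no handshake.
            fmt.Fprintf(conn, "%s load=%.2f ts=%d\n", host, readLoad(), time.Now().Unix())
            time.Sleep(30 * time.Second)
        }
    }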


Is there any simple monitoring system for Kubernetes that will monitor memory and CPU usage for each deployment and node? Prometheus and Grafana are good, but require too much configuration. I also like HAProxy's stats page. Something like that, but per service?


Nice, nifty project; he had me until the "no alerting" part.

Anyway, I might still deploy this in a Proxmox homelab where I don't want to fight with the complexity of a Grafana dashboard.


What’s missing from the Proxmox charts that you would need a dashboard for?


Off the top of my head (I'm not near the deployment right now):

- Temperatures

- Fan speeds

- Centralized metrics for all VMs


It's 2023. Tracing has been around for 20 years (DTrace, X-Trace: https://cs.brown.edu/~rfonseca/pubs/xtr-nsdi07.pdf).

Very simple logging, if not structured, is not completely useless, but it's not very useful either, except for maybe showing some nice charts.

Any serious monitoring tool is useful when it can explain things, and only tracing gives you causal information.


Not to be confused with http://umonfw.com/


I honestly don't get the criticisms of the Prometheus + Grafana stack.

> A full-blown time-series database (with gigabytes of rolling on-disk data).

Prometheus has settings that allow you to limit the space used by the database. I'm not sure, however, how one can do monitoring without a time-series database.
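
Concretely, retention is bounded with startup flags (the values here are illustrative):

    # keep at most 15 days or 2GB of TSDB data, whichever limit is hit first
    prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=2GB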

> Several Go binaries dozens of megabytes each, also consuming runtime resources.

Compared to most monitoring tools I've tested, the Prometheus exporters are usually fairly lightweight relative to the amount of metrics they generate. Also, "dozens of megabytes" doesn't seem like too much when we're usually talking about disk space in the gigabytes...

> Lengthy configuration files and lengthy argument lists to said binaries.

Configuration files: yes, if you want to change all the defaults. Argument lists: not really. In reality, a Docker deployment of Grafana + Prometheus is 20 lines in a docker-compose.yml file. Configuration files come with defaults if you install it to the system.
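
Roughly this (a sketch; the prometheus.yml contents and any version pinning are left out):

    services:
      prometheus:
        image: prom/prometheus
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
          - "9090:9090"
      grafana:
        image: grafana/grafana
        ports:
          - "3000:3000"
        volumes:
          - grafana-data:/var/lib/grafana
    volumes:
      grafana-data: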

By the way, I'm not sure that configuring a FastCGI server will be easier than configuring a Docker compose file...

> Systems continuously talking to each other over the network (even when nobody is looking at any dashboard), periodically pulling metrics from nodes into Prometheus, which in turn runs all sorts of consolidation routines on that data. A constant source of noise in otherwise idling systems.

Not necessarily. Systems talk to each other over the network if you configure them to do so. You can always install Prometheus + Grafana on every node if you don't want to do central monitoring, and you'll have no network noise.

> A mind-boggingly complex web front-end (Grafana) with its own database, tons of JavaScript running in my browser, and role-based access control over multiple users.

Grafana, complex? I think dragging and dropping panels with query builders that don't even require you to know the query language are far better than defining graphs in shell scripts.

> A bespoke query language to pull metrics into dashboards, and lots of specialized knowledge in how to build useful dashboards. It is all meant to be intuitive, but man, is it complicated!

Again, this is not a problem of the stack. Building useful dashboards is complicated no matter what tool you use.

> maintenance: ongoing upgrades & migrations

Not really. Both Prometheus and Grafana are usually very stable, and you don't need to upgrade if you don't want to. I have a monitoring stack built with them in my homelab that I haven't updated in two years, and it still works. Of course I don't have the new shiny features, but it works.

To me, it seems that the author is conflating the complexity of the tool with the complexity of monitoring itself. Yes, monitoring is hard. Knowing which metrics to show, which to pull, how long to retain them: it's hard. Knowing how to present those metrics to users is also hard. But this tool doesn't solve that. In the end, I don't know how useful it is to make a custom tool that collects very limited metrics based on other ancient, limited, buggy tools (SNMP, RRD, FastCGI...) and that is missing even basic UX features like being able to zoom or pan on charts.


What about netdata?



