Show HN: Top, but for Nginx (github.com)
219 points by squiguy7 33 days ago | hide | past | favorite | 67 comments



Monitoring capabilities are missing from Nginx on purpose: they are not, and likely never will be, available for free, because there is "NGINX Plus".

This is why I recommend switching to HAProxy.


I'd love to just "switch to X", but there is no X which provides all of the above in one great package: static file serving, load-balanced proxying (TCP/HTTP), fine-grained caching, automatic Let's Encrypt renewal, API-based configuration (for dynamic upstreams etc.), and monitoring. Maybe there shouldn't be such a tool. For all other use cases I go with nginx, since it at least provides decent proxying, caching and static file serving.


Correct me if I’m wrong, but doesn’t Caddy 2 do almost all of that?


Had to check: caching is still missing from Caddy 2, but everything else seems to be there [1]. If there's no other minor feature I rely on in nginx that's missing, I might be able to switch eventually, fingers crossed.

[1] https://github.com/caddyserver/cache-handler/issues/1


You mean the software which wouldn't start when Let's Encrypt's ACME server was offline, and whose developers said this was working as intended?

I mean, I'd definitely encourage people to use it for hobby projects, but if that's how the developers see their software, I would never trust them with anything serious.


Someone's a little out of the loop.


I know it was "fixed" after thousands of people chimed in.

Nonetheless, I still wouldn't be able to trust developers who think that's reasonable.

If it had been an error and unintentional I wouldn't have been worried; mistakes happen to everyone. But it was an actual design decision. Without serious code review I'd be too worried about what other bright ideas the developers have had.


You're responding to caddy's author.


Nginx does automatic let's encrypt? Since when?


certbot --nginx -d foo.bar.com works like a charm


X == Apache httpd 2.4


I was thinking the exact same thing ironically.

Static file serving? Sure!

Load balanced proxying? mod_proxy_balancer is great!

Fine grained caching? mod_cache_disk (mod_disk_cache in 2.2) is also great

Updating loadbalancer bits via the api?

mod_proxy_balancer supports a balancer-manager endpoint for that to do live updates

monitoring? mod_status + prometheus exporter or

mod_prometheus_status

native LE support? https://github.com/icing/mod_md is going to be rolled into upstream apache


Have you actually seen Apache in the wild in the last few years? No one picks it anymore, and I'm not sure why... Well, besides the fact that nginx is now nginx-ingress-controller and we all use k8s.. :/


Why does no one pick it anymore? The reason is twofold: (1) the amount of FUD that surrounds it, based on old comparisons of nginx and Apache httpd 1.3 or 2.x using Prefork and (2) cool-kid syndrome. This thread itself is a perfect example.


Apache still runs some really big websites that likely have more requests than many of these startups. Ticketmaster has used Apache for almost 15 years as their primary webserver (but they're fronted by layers of varnish / Akamai). They also maxed out dual 10G links with web traffic in 2007 or so when I worked for them.

That said, netcraft says Apache still runs almost 25% of the internet, which is no small stake: https://news.netcraft.com/archives/category/web-server-surve...


I still use it to have basic auth connected to LDAP.

The weakness of nginx is that many modules can't be loaded dynamically, and if a module isn't compiled in, you need to roll your own build, which I won't do due to the maintenance burden.


Maybe it's due to guilt-by-association with PHP and the LAMP stack...


Yes, but I know some of the apache.org SRE so maybe my view is biased.


The most amazing guy who wrote the book on mod_rewrite (Rich Bowen) is from the same tiny town where I grew up. The Apache Software Foundation upstream folks are super good people.



Do you know if it's the same with Openresty?


Wish all distros shipped https://github.com/vozlt/nginx-module-vts by default. It's a minor pain to self-build


This is cool!


HAProxy also has an 'enterprise' offering[1], what makes this different from nginx plus?

[1] https://www.haproxy.com/products/haproxy-enterprise-edition/


Haproxy's full monitoring capabilities are available in the open source version. Nginx's are not. The stub_status module is very limited. Compare https://www.haproxy.com/blog/exploring-the-haproxy-stats-pag... with https://nginx.org/en/docs/http/ngx_http_stub_status_...


What exactly do you need to monitor on your nginxes? We collect logs, scrape metrics from the nginx pods and... that's enough..


I always wonder: why has no one from the open source community created a better stats module? Is there something in the license that prohibits creating modules that overlap with Nginx Plus?


I would assume it's the lack of dynamic module support in nginx: even if someone creates a module, you need to compile your own build.


That's true for any functionality provided by modules, and there's a plethora of them. Also, Nginx does have support for dynamic modules. Recompiling Nginx has always worked out of the box for me too, so it's not a big issue.

One thing that comes to my mind is that maybe this can't be solved by a module due to missing API in open source Nginx.
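
For what it's worth, loading a third-party module that was built with `--add-dynamic-module` takes a single directive; the filename below is illustrative and varies by module and distro:

  # Top of nginx.conf — load a dynamically built third-party module.
  # The .so name here is illustrative; check your build's objs/ output.
  load_module modules/ngx_http_vhost_traffic_status_module.so;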


And why others would recommend Nginx Plus.


If you can afford to pay four figures per instance per year, yeah


Maybe not as lightweight, but GoAccess (https://github.com/allinurl/goaccess) does an awesome job at parsing the logs and displaying statistics, works for nginx and other webservers too


I hadn't heard of this tool but it looks much more complete! Thanks for sharing.


GoAccess used to work perfectly, but recently whenever I try to run the real-time HTML command, it exits without any error message after ~2 million records. Maybe out of memory... any ideas?


Doubt very much that ~2M will be a memory issue (unless you got less than ~130MB). https://goaccess.io/faq#performance

We're running v1.4 in production and it has been working pretty nice for us.


Last time I tried it was a pain to set it up to keep up with log rotations and retain all the records.


For us it was literally an `apt install goaccess`; we picked the COMBINED log format and it's been running for over a month in prod without issues (rotating weekly).

You should post the issue on their tracker at https://github.com/allinurl/goaccess; they may be able to help you.


This is the tool I've wanted (and half written 3-4 times) my whole career. From reading the github it looks lightweight, not a big infrastructure addition, and that it helps you figure out wtf is going on with the web server.

Regarding the branding, for me top is a real-time tool rather than a logging tool. I was picturing something that may have been more useful for older style Apache httpd installs where you have several virtual hosts on a server and you'd want to know who is hogging the resources or causing the problems.


Pro tip: you can make any command into a (nearly) real-time one with `watch`:

  watch 'topngx < /path/to/access.log'

will run `topngx` against access.log every two seconds and display the output (the quotes keep the redirection from applying to `watch` itself).


That's not "real time"! And it definitely won't behave well if the processing takes more than two seconds (imagine log files with many millions of rows).


It'll just wait two seconds after the command returns, not spawn a new one every two seconds.


Thanks for the feedback. I wanted to capture the same idea as the original tool I listed in the README. In the future I hope to add functionality to tail a log file in real time and do live updates of the stats. This is where the idea of top comes from.


Agreed. The defining feature of `top` is that it updates in real time.


My last company had something like that and included response time percentiles (50th, 90th, 95th, 99th) and we had these values graphed and displayed on a big screen in our office. Along with a ton of other performance stats: queries per second, various measures of system load, etc.

Averages can lie, especially when something like an empty query can take close to zero time compared to a non-trivial transaction. If some robot or other artifact of your site is generating a stream of null queries, that will make your average response time look better than it actually is. Percentiles, particularly at the tail of 90th or above, tell a better story of how well and consistently you're responding to traffic under load.
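
As a toy illustration of that point (made-up request times, nearest-rank percentile — not anyone's production numbers):

```shell
# Made-up request times in seconds (e.g. nginx's $request_time field).
printf '%s\n' 0.01 0.02 0.02 0.03 0.05 0.05 0.08 0.10 0.90 2.00 > times.txt

# The mean looks comfortable:
awk '{ s += $1 } END { printf "mean=%.3f\n", s / NR }' times.txt
# mean=0.326

# The 95th percentile (nearest-rank method) exposes the tail:
sort -n times.txt | awk '{ a[NR] = $1 }
  END { i = int(0.95 * NR + 0.5); if (i < 1) i = 1; printf "p95=%.3f\n", a[i] }'
# p95=2.000
```

One slow request in ten barely moves the mean but completely dominates the p95.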


How "recent" are your percentiles? I have found that calculating percentiles is a pretty CPU-heavy task, and if you have a giant Grafana dashboard querying every 30s it can stress out your Prometheus/Graphite/whatever. But if you take a small data size, like the 95th percentile of latencies in the last 2 minutes, it's not a very accurate representation either.

And of course there is the other problem of storing all your latencies accurately, which becomes pretty hard if you are using something like Prometheus.


I wouldn't mind a screenshot before installing it.


I thought the same thing. I was expecting to see some kind of screenshot so I could have a glimpse of the software.

EDIT:

#1: I'm gonna compile it and provide a screenshot via a pull request.

#2: Compilation failed because it needed the sqlite3 headers, and this is not mentioned anywhere. I'm gonna add that to the README too :)

This year (especially with the whole covid thingy) I set a goal to contribute more to open source. I'm trying to fix every little issue I can find :P


Thanks for submitting a patch! I have been throwing around the idea of using the "bundled" feature flag listed here: https://github.com/rusqlite/rusqlite#optional-features. I was hesitant to initially because it would force users to have a specific version of SQLite.



It seems to be a rewrite of ngxtop

https://github.com/lebinh/ngxtop


It does seem to be that, except it doesn't "tail log files" like the original (see the limitations at the bottom of the README).


I can work on adding that into the README now.


Hmmm... looks like nothing more than a weblog analyzer. Someone correct me if I'm wrong. It's not "real time", since it can only report on what the web server has done, not what it is doing. AFAIK, nginx has nothing like Apache httpd's mod_status... at least, nothing open source.
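
For a sense of scale, the core of such an analyzer boils down to an aggregation over the access log. A toy sketch with made-up combined-format lines (field 7 is the request path):

```shell
# Two made-up combined-format access log lines:
cat > access.log <<'EOF'
127.0.0.1 - - [01/Jan/2021:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.68.0"
127.0.0.1 - - [01/Jan/2021:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.68.0"
127.0.0.1 - - [01/Jan/2021:00:00:03 +0000] "GET /favicon.ico HTTP/1.1" 404 0 "-" "curl/7.68.0"
EOF

# Count requests per path, busiest first:
awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' access.log | sort -rn
# 2 /index.html
# 1 /favicon.ico
```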


If you're in a Kubernetes environment, the NGINX Ingress Controller has a pretty decent set of realtime metrics: https://kubernetes.github.io/ingress-nginx/user-guide/monito...

Which presumably means those metrics are available in the OSS edition somehow...


Open source nginx has stub_status. It's just not very (very not) featureful.

https://nginx.org/en/docs/http/ngx_http_stub_status_module.h...
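
For reference, enabling it takes only a small config fragment (the listen address and location path here are arbitrary):

  server {
      listen 127.0.0.1:8080;

      location = /stub_status {
          stub_status;    # active connections, accepts/handled/requests,
                          # reading/writing/waiting — and that's all you get
          allow 127.0.0.1;
          deny all;
      }
  }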


What frustrates me about Apache's mod_status is that it's powered by a normal request handler. If all the child workers are busy, your request for a status report will timeout, even if you're running it locally from the command line. Not super helpful when you're troubleshooting in real time.

Anyone know if there's a "deeper" way to get the same stats info about what Apache is doing without having to basically wait in line with all the other incoming requests?


The latency between the web log and the web server is a lot more "real time" than most things.


> This tool is a rewrite of ngxtop to make it more easily installed and hopefully quicker.

Why make a whole new tool with limitations instead of improving the existing one?


Original maintainers of a project probably wouldn't just accept a PR of a complete rewrite.


Might also be an exercise in learning.


Interesting, but I would have thought "top" for nginx would be a tool that shows you all the connections, paths, and resource usage live, like the "top" command. Is there a tool that does that?


How does this compare to goaccess? Similar tool that I've used briefly. One issue I had was how complicated it was, I'm assuming since this is nginx specific it's simpler.


I made something similar in Python [0], but for parsing the error_log directive. Just for the odd time you need to parse that.

[0] https://github.com/madsmtm/nginx-error-log


>a rewrite of ngxtop to make it more easily installed and hopefully quicker.

What world does this guy live in where a program in Rust is easier to get running on any random machine than a Python script?


I've never had issues shipping Rust binaries to people, unlike in Python where they usually have to install dependencies.


I had the vague sense that it was as easy to cross-compile rust binaries as it is for go.

Distributing a single binary would be easier, but `cargo install xyz` seems harder than running a Python script.


I made a new GitHub release with binaries for Mac and Linux pretty recently. In this sense, you can just download the binary and get up and running.



