Hacker News
How to run Valgrind on a CGI program in C (conman.org)
32 points by ingve 44 days ago | 22 comments



To me it would have been simpler to wrap the program in a small shell script, e.g.

    #!/bin/sh
    exec valgrind --log-file=/tmp/valgrind.%p.log /path/to/my/app

and have the web server execute that instead of the original executable.


Being able to repeat the experiment at your leisure without digging through web server logs is valuable. Still, a shell script does indeed sound like the way to go: `set` should already dump the environment as required here, and then you can exec your CGI program. (Unfortunately, it doesn’t in bash; `set -o posix` first to fix that.)


I want to know how to run Valgrind on a Python program that uses lots of different modules. The amount of noise I usually get is so large that Valgrind becomes useless.


You might want to take a look at https://github.com/nyfix/memleaktest

We've open-sourced the tools we use to run valgrind (and ASAN) on large mixed C++/Java code bases. The JVM in particular triggers a slew of errors which can make filtering valgrind output impractical, but the scripts we developed can handle that. FWIW, we use these tools every day on the code that goes into NYFIX Marketplace (https://www.broadridge.com/financial-services/capital-market...).


That's very interesting, thanks!



Thanks, but I think I looked into that when I ran into the problem.

IIRC, the main problem was that most Python modules are not designed with Valgrind in mind, so they generate a lot of noisy/irrelevant warnings and errors.

I wish module maintainers would make running Valgrind a part of their QC process.
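In the meantime, two things cut most of the noise when valgrinding Python code; both come from CPython's own Misc/README.valgrind, though the checkout path and script name below are placeholders:

```shell
# PYTHONMALLOC=malloc disables pymalloc's arena allocator (the source of
# most of the false positives); the suppression file shipped with CPython
# covers much of the rest.
PYTHONMALLOC=malloc valgrind \
    --suppressions=/path/to/cpython/Misc/valgrind-python.supp \
    --leak-check=full \
    python3 my_script.py
```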


Understandable. I don't think there's a good way around this, though. Even if you run valgrind on a C++ repo that hasn't seen much valgrind use, you'll get a lot of output. But that output exists because memory management is usually broken in any larger C++ repo that isn't developed by a small team of experts.


Why not run valgrind on the web server, wait until it crashes on your POST request, and then inspect the logs?


A decent server wouldn't crash if a CGI binary crashes.


One, CGI scripts are run as a separate process, so running valgrind on the web server won't get status about the CGI program. Two, it would probably generate a ton of output not related to the CGI program even if it did work. Three, the CGI program wasn't crashing; the goal was to check for memory errors.
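As an aside on point one, valgrind can in principle follow forked children; a sketch of what that invocation would look like (the server binary and its flags here are hypothetical, `-X` being Apache httpd's single-process debug mode):

```shell
# --trace-children=yes makes valgrind instrument fork+exec'd children,
# and %p in --log-file gives each process its own log, so the CGI
# program's report lands in a separate file from the server's.
valgrind --trace-children=yes \
         --log-file=/tmp/vg.%p.log \
         /usr/local/sbin/httpd -X
```

Points two and three still stand, though: you'd be wading through logs for every helper process the server spawns.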


I don’t get why people use (or used) CGI programs for web servers. This seems crazy inefficient and prone to break


CGI was the late 90s equivalent of "function as a service". It was especially useful on multi-tenant web hosts, in a way that hasn't really been replicated since. The main advantage is absolute simplicity: you upload an executable, the webserver runs it under your uid/gid, it handles your request, and then it exits and doesn't consume any more resources.

The downside was the cost of starting an executable from scratch (which isn't that bad on Linux), plus the cost of whatever interpreter was handling the request if you didn't write it in C, such as PHP or Perl (`use CGI` was very popular).

(what better solution would you choose in 2024 for a web server where you have 1,000 customers each of which has one or two CGI scripts that get on average one request every few minutes?)


You don’t get, when web servers were a new thing, why anyone continued using the same kind of executables they always had to handle workloads that required executable code.

Probably because there was not yet any data AT ALL about demand from the network, the patterns that random visitors could cause, etc.

“I don’t get why the Romans didn’t build suspension bridges with concrete, steel, and cables.”


It's actually pretty fast if written in a lean language like C or Rust; see cgit for a good example. And if you want to distribute a program that talks HTTP, it's kind of the obvious choice - the alternative is each program packing its own HTTP server, which needs a reverse proxy/SSL in front of it anyway.


The bottleneck for most CGI requests wasn’t the language, it was the fork()ing for each request.

Granted there will be some CPU intensive services that will benefit from C but I still wouldn’t call CGI “pretty fast”.

That’s not to say I dislike CGI. I made a fair amount of money in my early career on services written to use CGI. But the reason we had things like FastCGI, mod_perl, and the HTTP servers included in language frameworks was to mitigate the overhead of expensive process forking.


Interesting, I've never found fork particularly expensive; usually the bottleneck turned out to be setting up language-specific runtimes. I would certainly never write CGI in a language like Python, where startup takes significant time.


Yeah, that’s true too.


Because back when CGI programs were the norm, application servers weren't really a thing outside of maybe Java and similar ecosystems (by which I mean dotnet).

When you wrote a web application before the ~2010s, you wouldn't start off with a library of webserver building blocks to which you attach routing and behaviour. That sounded crazy at the time, especially when you were building websites with a scripting language like PHP/Perl/Python/Ruby. It sorta became the norm with the rise of event-driven IO, through projects like NodeJS and the rising popularity of Go.

Even today, at least in the PHP world, we still mostly use the CGI-based approach to running code (though through a more modern interface, FastCGI), with an in-between service that handles processes (FPM, the FastCGI process manager). I'm sure other dynamic languages have something similar, though there is a large wave of people pushing for more application-server-style deployments in those ecosystems as well (which is more feasible today).


I have no idea why someone would use one now. In the mid-90s it was pretty much the only practical cross-language way to create dynamic web content. Someone certainly could have written their CGI program as a separate web server, but:

1. proxy support was limited or non-existent in the main webservers that existed at the time, so routing the request over would have been a problem.

2. there were no frameworks to speak of to make any of this easier. CGI.pm would help parse and decode GET and POST data, and that was just about all there was. There was no PHP (as it exists now), no Rails, no HTTP frameworks like actix-web. Nothing.

“Inefficient and prone to break” is an excellent description of CGI. Most practical people moved away from it as quickly as possible.


CGI is a simple interface. It's simple to implement and simple to use. As a result, it's much less prone to break than most modern web architectures IMHO.

It does leave some performance on the table, which was addressed by things like mod_perl or FastCGI (incl. PHP-FPM). There's also the middle ground solution of mod_cgid, which has a specialized daemon fork the CGI process instead (presumably the specialized daemon has a much smaller memory image than the full server).


What should they have been using instead?



