What Were CGI Scripts? (rickcarlino.com)
130 points by UkiahSmith 3 months ago | 119 comments

To answer the title: what we called FaaS back in the day. And back in those days, the suspender-wearing UNIX sysadmins said that it was just a more restricted version of inetd.

Wonder what the next generation will call it?

And yes, I am being sarcastic. There are differences. But do recall that when AWS Lambda came out, it had the exact same limitations w.r.t. one process per call, needing one fresh connection to a DB per request to handle, etc.

Lambda is still single threaded, right? You will only be sent one request at a time per process. So now it's at best FastCGI.

compare inetd in wait mode

This makes me nostalgic. Spent inordinate amounts of time writing terrible, unmaintainable Perl scripts and FTPing them to cgi-bin on free 5MB hosting plans on F2S and Netfirms. Most hosts wouldn't provide error logs, so there was nothing more than a 500 page from httpd to tell you that something had gone wrong. Debugging anything was an endeavor. Then of course, PHP came along and killed cgi-bin.

If I had to pick one thing to represent the CGI era, it would be Matt's FormMail [1].

[1] https://www.scriptarchive.com/formmail.html

> terrible, unmaintainable Perl scripts

Lucky you. At a previous workplace I had to deal with actual binaries produced from C source code that contained strings that contained script tags that contained JS code that generated HTML. I'll let that sink in while you process the escaping.

Apparently it never occurred to some people that they could read those from disk, even from the C program.

> PHP came along and killed cgi-bin

PHP was using CGI like Perl. And then PHP had Apache's mod_php, which I believe is still widely used today.

Perl, of course, had mod_perl. And later, PSGI/Plack followed by Catalyst, Mojolicious, Dancer, etc.

mod_perl had the issue that it hooked deep into the webserver, which wasn't "shared hosting safe". mod_php had fewer hooks and some basic cross-vhost-access mitigations (safe_mode, open_basedir, etc.), which allowed independent users to put PHP scripts on the same server, which in turn made web hosting cheap.

When I last used PHP (about 5 years ago) it was common to use it via FastCGI "for performance". I'm not sure how true that reasoning was, or how the speedups of 7.x have changed things, but (Fast/S/whatever)CGI still seems prevalent.

Matt's Script Archive was a great resource.

Matt's Script Archive was a great source of security holes.

Oh man I remember using the Netfirms free tier! The internet was so different back then, a company wasn't afraid of giving free compute resources on subdomains of their main site.

More recently, the Netfirms team started another free service: https://www.sync.com/about/

To be fair, there are a LOT more companies today giving free compute resources on their sites :-)

So many problems solved by formmail.pl

When PHP used to mean "Powerful Heavenly Perl".

Rumor had it the first version was written in Perl.

No, it was Personal HomePage tools. It was written in C. Rasmus even explicitly said you didn't need Perl when he first announced it on Usenet, here: https://groups.google.com/forum/#!msg/comp.infosystems.www.a...

I know, that's why I used the word `rumor`. Powerful Heavenly Perl was a geek joke (like PlentyofHorsePower).

CGI is still a reasonable choice for low query rate and simple pages, like an AJAX responder.

Especially when there's no framework already in use on this particular server.

Why add <framework of the day> to your setup when every 10 minutes or so, it's time to read the temperature sensor and return a string with the reading.

Also, since it's served by a transient process, there is no problem of memory leaks, file handle leaks, other resource capture by long running processes.
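For that kind of endpoint, the whole responder fits in a few lines. A minimal sketch in Python (the sysfs thermal path is an assumption, common on Linux; adjust for your hardware):

```python
#!/usr/bin/env python3
# Minimal CGI "AJAX responder": print headers, a blank line, then the body.
# The process starts, answers one request, and exits -- no leaks to worry about.

SENSOR_PATH = "/sys/class/thermal/thermal_zone0/temp"  # hypothetical sensor

def read_temp_c(path=SENSOR_PATH):
    """Return the temperature in degrees Celsius, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0  # sysfs reports millidegrees
    except OSError:
        return None

def cgi_response(temp):
    """Format a complete CGI response: header, blank line, body."""
    body = "unavailable" if temp is None else f"{temp:.1f} C"
    return f"Content-Type: text/plain\r\n\r\n{body}"

if __name__ == "__main__":
    print(cgi_response(read_temp_c()), end="")
```

Drop it in cgi-bin, mark it executable, and you're done; there's nothing to restart or monitor between requests.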

Per Murphy's Law, of course, everything is fine until one day the internal app goes public and suddenly there are 500 queries a second.

> CGI is still a reasonable choice for low query rate and simple pages

By which you mean, "Almost every website in the Universe which isn't actively being DDoSed" with the understanding that essentially nobody can stand up to a modern, competent DDoS.

> Per Murphy's Law, of course, everything is fine until one day the internal app goes public and suddenly there are 500 queries a second.

Omae wa mou shindeiru. You are already dead.

It's humorous to think every random person on HN is working on a site that gets more traffic than Slashdot did in 1999. A site that ran on mod_perl and, by modern standards, fairly basic hardware.

And for the most part, CGI went on the decline the day mod_perl came out. Which was 1996. So we really didn't use basic CGI for that long. The popularity of putting Perl right in Apache eclipsed running standalone Perl via CGI. It would be at least another decade before the world started to move on from this arrangement (mod_php or mod_perl).

agreed. I launched a cgi-based thingy a few months ago for my job to render a really basic template across a ton of sites with very low volume. Had it out in less than an hour.

Honestly, sensors is a really bad example.

Each CGI request has to create a new process, that can easily mean entire seconds of CPU per request.

A few sensors sending information from time to time will quickly turn into a massive resource hog. Having used CGI extensively, one of the biggest caveats is resource exhaustion and the ensuing DDoS.

That's why I explicitly chose every 10 minutes.

There are situations like that.

I have seen CGIs where usage started very slow and then grew to the point of friendly DDoS, and it's not fun because the CGI is usually built deep into the architecture by that point.

Also, sensors was the first thing that came to mine.

Also some sensors read very fast, like in Linux "sensors -f" returns the CPU temperatures.

came to minD


came to mine

I think one of the overlooked things of the process-per-request CGI scripts we ran back in the day is how you might run out of PIDs. I remember this happening once, but I think (50/50) it was because the PID was a 16-bit value at the time and the system had an uptime of about 6 months.

PIDs will recycle rather than exhausting. As an aside, that's also the reason why it's unsafe to use a PID from any process other than the parent process. As soon as the parent calls wait(2), the PID is released and could be reused for a fresh process at any moment.

The only way to run out of PIDs is to not call wait() properly, leading to zombie processes, or else to have some CGI processes time out and never exit, so they are never reaped by a wait().
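The reaping behaviour is easy to demonstrate. A small POSIX-only sketch in Python (the exit code is arbitrary):

```python
# A forked child that has exited stays a zombie until the parent reaps
# it with wait()/waitpid(); only then is its PID released for reuse.
import os

def spawn_and_reap():
    """Fork a child, let it exit, and reap it; return (reaped_ok, exit_code)."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with an arbitrary code
    # Until waitpid() returns, the child lingers as a zombie in the table.
    reaped, status = os.waitpid(pid, 0)
    return reaped == pid, os.WEXITSTATUS(status)
```

A server that forks per request but forgets the waitpid() call accumulates one zombie per request, which is exactly the PID-exhaustion scenario described above.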

In my experience PHP is the CGI of 2019.

One of the bits of CGI (I wrote a lot of perl back in the day) is that parts of it are still there, under the covers.

Looking at RFC 3875 https://tools.ietf.org/html/rfc3875 - you see things like PATH_INFO, PATH_TRANSLATED, QUERY_STRING...

Pull up Java HttpServletRequest, https://javaee.github.io/javaee-spec/javadocs/javax/servlet/... and there is getPathInfo(), getPathTranslated(), and getQueryString() along with many of the other parameters that would be familiar to someone writing a CGI.

You can find them in C# - https://docs.microsoft.com/en-us/dotnet/api/system.web.httpr... - PathInfo, QueryString and the like.

You can find them in Haskell too - https://hackage.haskell.org/package/happstack-server- routes by pathInfo, and the QUERY_STRING is clear in https://hackage.haskell.org/package/happstack-server-

CGI scripts aren't dead... they just got better plumbing.

OP does a poor job explaining what CGI scripts were.

From the RFC linked in the OP: https://tools.ietf.org/html/rfc3875#page-23

CGI defined an interface between a web server and an executable that would provide a response.

- Request meta-data i.e. path, query string, and other headers passed as environment variables.

- Request body passed via stdin

- Response header and body passed via stdout
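Those three conventions are the entire interface. A toy CGI program in Python, sketched against RFC 3875 (the variable names are the standard ones; the handler itself is made up):

```python
#!/usr/bin/env python3
# Toy CGI program per RFC 3875: request metadata arrives in the environment,
# the request body on stdin, and the response leaves on stdout.
import os
import sys

def handle(environ, body):
    """Build a CGI response echoing the request metadata and body."""
    lines = [
        f"method: {environ.get('REQUEST_METHOD', 'GET')}",
        f"path: {environ.get('PATH_INFO', '/')}",
        f"query: {environ.get('QUERY_STRING', '')}",
        f"body: {body}",
    ]
    return "Content-Type: text/plain\r\n\r\n" + "\n".join(lines)

if __name__ == "__main__":
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    sys.stdout.write(handle(os.environ, sys.stdin.read(length)))
```

Nothing here is language-specific: anything that can read environment variables and stdin and write stdout qualifies, which is the whole point of the interface.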

In this way, a webserver like Apache could provide a platform for a wide array of languages. Yes there were security and scaling concerns, but it also was an opportunity to rapidly release and iterate on a product.

I’m really grateful I had to cut my teeth on cgi because it forced me to understand the whole http request/response cycle in detail. There were libraries to help (CGI.pm, anyone?) but they stayed down at a pretty low level (helping parse params from an url or POST, for example). To learn how to implement a web login form, I had to understand how cookies worked, how to set and then read a cookie header, how to send the right headers with the response, how to encode the cookie payload, how to store the password locally.
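That grunt work was mostly string handling. A rough Python sketch of the two cookie-facing halves (not CGI.pm's actual API, just the shape of the task):

```python
# Hand-rolled cookie handling, the kind of grunt work CGI forced on you.
# The web server hands the browser's Cookie header to the script as the
# HTTP_COOKIE environment variable; the script sets one by emitting a
# Set-Cookie line among its response headers.

def parse_cookies(http_cookie):
    """Parse an HTTP_COOKIE string like 'a=1; session=xyz' into a dict."""
    cookies = {}
    for pair in http_cookie.split(";"):
        if "=" in pair:
            name, _, value = pair.strip().partition("=")
            cookies[name] = value
    return cookies

def set_cookie_header(name, value):
    """Build a Set-Cookie response header line."""
    return f"Set-Cookie: {name}={value}; HttpOnly; Path=/"
```

Doing this by hand once makes it obvious why "the cookie didn't get set" bugs are almost always a headers problem.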

Today, a web framework will do this grunt work for you. Which honestly, if you’re handling passwords, is in many ways a good thing. But people are less likely to learn the basics of how their app talks to a browser.

> I’m really grateful I had to cut my teeth on cgi because it forced me to understand the whole http request/response cycle in detail.

That's kind of what I like Golang webdev with the stdlib only, it's sort of similar in that way.

>> But people are less likely to learn the basics of how their app talks to a browser.

Do they need to? Do you yourself understand every miniscule detail of radio wave propagation before making a cell call to your friend?

It certainly helps knowing enough to go outside if you have no signal, rather than just thinking your phone is broken.

That's the same as knowing that "there's a thing called cookie that needs to be 'set' for my user to be authenticated".

Speaking as somebody who regularly provides help to developers building webapps: this comparison is nonsense, and yes, understanding how HTTP works is actually necessary to build robust web applications.

The problem with almost every abstraction built on HTTP, is that it's leaky. There are a lot of details about how your application works that are only explained (and understandable) in the context of HTTP as a protocol. This can be fine in and of itself, if you understand that this is the case, and learn about HTTP.

I have quite often ended up in frustration for hours trying to explain to somebody how to approach a problem in their application, because they refused to think outside of the framework they were using, and didn't want to "waste time" (their words) learning HTTP. These are the same people who you see getting stuck on the same problem for months.

Making a call is different because it is not a leaky abstraction. You genuinely do not need to know anything about radio waves to do it or to understand how calls work conceptually; it's an implementation detail, the only relevant part of which is the signal strength, which itself has a dedicated and well-understood abstraction that's separate from the underlying technology (namely, signal bars).

The chances of a 'perfect abstraction' ever existing over HTTP are tiny at best. As we're talking about tools for developers here (ie. the people building things, not end users), any sufficiently capable general-purpose abstraction would approximate or even exceed the complexity of HTTP itself.

(Incidentally, this exact same problem is why general-purpose CMSes are invariably chaotic piles of badly-interoperable plugins, and why it is more difficult to build a genuinely good application with a general-purpose CMS, than without one. In practice, virtually everybody either drops some of their business requirements to stay within the CMS' native capabilities, or just piles hacks upon hacks.)

A more accurate comparison would be to compare web applications and PBX setups. Sure, PBX systems are an abstraction, and will paper over a lot of technicalities of the underlying protocol - but if you want to build a robust PBX setup with arbitrary capabilities, you will need to learn about telecom protocols and radio waves at some point, at the very least to understand the possible failure modes.

TL;DR: You can ignore low-level details if you're using a non-leaky abstraction. You need to understand them if you're using a leaky abstraction. Web frameworks are, almost without exception, leaky abstractions (by design).

Edit: And this can actually be further generalized as "any abstraction that is not very limited in scope, will be leaky".

CGI and derived protocols actually got one thing right compared to the reverse-http-proxying of today: they pass request headers in protocol variables, not the other way around. In CGI, no amount of accidental misconfiguration would permit a request to overwrite custom request variables on the way from the server to the script.

I mean, it's probably possible to configure a server to escape the original request headers as ‘HTTP-<Header-Name>: value’ and add custom ones on top, but I haven't seen it done, and frameworks depend on the headers being there intact.

That happened with the HTTP_PROXY environment variable and the Proxy request header: https://httpoxy.org/

Since the CGI "protocol variables" are actually environment variables, it creates a namespace collision and an injection opportunity for environment variables beginning with "HTTP_".
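The collision is a direct consequence of the header-to-variable mapping RFC 3875 prescribes. A one-function sketch in Python:

```python
# The httpoxy collision in miniature: RFC 3875 maps each request header
# to an environment variable named HTTP_ plus the upper-cased header name,
# with dashes turned into underscores. A client-supplied "Proxy" header
# therefore lands on HTTP_PROXY -- the very variable many HTTP client
# libraries consult to find an outbound proxy.

def header_to_env(name):
    """Map a request header name to its CGI meta-variable name."""
    return "HTTP_" + name.upper().replace("-", "_")
```

So any subprocess or HTTP library inside the CGI script that honours HTTP_PROXY can be redirected by an attacker-supplied header.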

Well, putting protocol fields in environment variables, un-namespaced, is one thing CGI hasn't gotten right. That's where FastCGI and SCGI come in.

Also, once in a while I begin thinking that PHP is a pretty nice language, certainly doing its job and very performant compared to Python or Ruby, even if PHP code resembles Java more and more. And I forget the numerous questionable semantic-breaking decisions. But then, bam:

> Warning: if PHP is running in a SAPI such as Fast CGI, this function will always return the value of an environment variable set by the SAPI, even if putenv() has been used to set a local environment variable of the same name. Use the local_only parameter to return the value of locally-set environment variables.

You still have to be careful though. CGI scripts were (are?) susceptible to the shellshock exploit [1] if they were written in shell (or used the C function `system()`). I had to work around that for a few of my scripts at the time.

[1] https://en.wikipedia.org/wiki/Shellshock_%28software_bug%29#...

Written as if it's a 25 year old tech no one uses now?

I still use cgi scripts. Though these aren't "scripts", rather compiled binaries written in C.

That made some of the pages with the heaviest calculations load in under a second, when they were taking more than 10 seconds in PHP.

I still use CGI in most of my web apps, some of them SaaS applications, and they perform pretty well.

The author mentions CGI overhead. Well, that's relative, please read this: http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Myths%20About%20...

One important side effect of CGI is the fact that if one route crashes (in a REST api for example) the CGI finishes its execution and nobody is harmed. On the other side, with server daemons, one bug can kill the whole app.

Thank God I thought I was the only one left.

I dabbled in PHP to create a blog[1], learnt C at work in the meantime, stumbled upon (F)CGI and am in the midst of (voluntarily) redoing my blog in C.

Does anyone have any links to a nice FCGI library? The official site went down ages ago AFAIK, and whatever I have[2] isn't working out of the box, at least not on Windows...

(Yes, I have a Linux server and I will deploy the FCGI on it too, but I develop on a Windows machine...)

[1]: https://morgen.ist/blog.php

[2]: https://github.com/toshic/libfcgi

i still run my blog [1] via CGI [2]. I've never really had a problem with it in 20 years.

[1] http://boston.conman.org/

[2] https://github.com/spc476/mod_blog

Wow! Why C? For fun? Practice? Just what you know?

When I started writing it back in 1999, it was what I knew. I also had a C library for dealing with CGI scripts I wrote from the mid-90s. Also, for several years my website was run off a 486 I had colocated at a company my friend was running, so there was a performance issue as well.

Because it's still the best language for writing library routines along with a header file, which then can be used from most other languages with least runtime overhead.

Do you have any more information available on this? I'm curious about the use case for using compiled binaries directly over PHP.

It sounds as though the compiled binaries optimize the compute portion, which the interpreted PHP code was taking 10 secs to run. It's still likely running as a CGI process, that is, as a fork/exec in a separate address space. If you're using mod_php or one of the many PHP engines available for modern web servers, you're already enjoying a scalability and execution time improvement over standard CGI. Except with compute code, apparently.

Here is the C library that I have used in the past:


Astronomical calculations which can be very calculation heavy and depend on the geographical location of the requester, therefore can't be cached.

Isn’t modern PHP much faster though?

Correct me if I'm wrong, but PHP is interpreted at runtime, whereas a binary file... is directly executed?

Essentially, for the same task, PHP could never be as fast as a binary, because of the overhead of processing the language itself.

Creating a binary in place of PHP code is essentially reducing the calculatory load of a webpage to a minimum during runtime.

It won't be faster, but could be fast enough™.

Mind also: with a CGI process for each request, the process has to be created, the program loaded, the runtime linked, and then executed. With any semi-modern PHP setup the PHP process is running already (be it mod_php, fastcgi, fpm, ...) and the "compiled" version of the script is in memory ready to be executed (opcode caching), thus PHP can win on startup time.

If you then put the heavy calculation in a C extension module, you get the simple templating around it quite fast.

Future versions of PHP might also receive some form of a JIT (experiments are progressing well). An ideal JIT might even beat a C program (as it can compile for the specific CPU and can analyse and optimize the hot path; see also HotSpot and similar Java runtimes). PHP won't be there initially (as the JIT has to call into the PHP runtime too often), but there's room for future iterations.

> Mind also: With a CGI process for each request the process has to be created, program loaded, runtime linked and executed.

Yeah, that's CGI though, FCGI alleviates this by persistently running a single process/service/application that listens to the port and acts according to the received HTTP headers.

Whatever PHP can optimize can thus be done directly with FCGI as well. You can load everything the webpage needs into memory and just pipe that through to the user, and it would only need to read disk once at startup (or if you update the sources, but that's another issue).

The downside is of course the increased development time and complexity you'd have to invest...

EDIT2: Another downside specific to FCGI is that it is OS-dependent, because it needs to interact with the host in order to listen to a port, something that doesn't affect normal CGI because all you need there is stdout.

And if you scrap it and build some custom webserver or nginx plugin or Apache module it can be even faster, or maybe something like IncludeOS which makes your application logic part of a minimal kernel ... if that's your aim. In most cases development time however is more valuable than pure execution time. (Even Facebook survived quite long on PHP, till it was worthwhile for them to move to their own thing.)

Anyways comparison by GP was between a custom CGI program and PHP and not other things.

Thanks for IncludeOS, that sounds interesting...

On Facebook: I'm willing to bet a switch to binary from PHP is one magnitude less of processing power required, and an early switch would've cut their server requirements by 90%. Not sure which would've been cheaper, the C devs or all those extra servers burning electricity to interpret a language during runtime.

EDIT: And that's just an economic incentive. From an ecological perspective, it's borderline criminal for a website as massive as Facebook to be running PHP.

By switching to HipHop for PHP and later HHVM they saved lots of servers. 90% is a bit too much (consider: they went "services" first, where different tasks were done in different languages in independent services) And yes, for the ecological argument that is true (even more ecological it would be to shut down Facebook ... )

However using PHP allowed them to hire lots of staff quickly and adapt to changed requirements quickly. Which is economical valuable. Finding the right time for a switch is simpler in a retrospective though ;-)

(I have no insight to Facebook, but was heavily involved in the PHP project and talked to different Facebook engineers privately and professionally)

For many purposes, it is not, and some of these use cases will be addressed in PHP 8.


Not for "heavy calculation loads."

I built a billing site for my software business and had to decide which tech to use. SPA? React + Rails? Elixir? Keep in mind we are talking about single digit req/min.

I went with CGI. It has some drawbacks but consider the advantages:

  * Requires nothing but Apache running, minimizes attack surface area
  * Deploy is a simple `git pull`, no services to restart
  * No app server running 24/7 so I don't have to monitor memory usage or anything else.
I love it. Takes little to no maintenance because it never changes. Runs on a $5/mo droplet.


I'm not about to dictate to someone that their choices are better nor worse than anybody else's but I don't think you make a particularly convincing argument in favour of CGI:

> Requires nothing but Apache running, minimizes attack surface area

Sure, but the attack surface remaining (CGI) is far less secure than pretty much everything else out there.

> Deploy is a simple `git pull`, no services to restart

Same could be said for a dozen other service side frameworks.

> No app server running 24/7 so I don't have to monitor memory usage or anything else.

I mean, if you exclude the server you're talking about, then sure. But you still have a server running 24/7 that you need to monitor so the framework becomes somewhat irrelevant from that perspective.


Personally I still use CGI for personal projects I hack together. But none of them are directly open to the internet.

The thttpd server is efficient in the extreme, can constrain itself to a chroot(), and is able to execute CGIs.

In that environment, even the largest attack surface will have great difficulty escaping confinement.

Apache httpd 2.4 can be chrooted (maybe 2.2 as well? Or at the very least some distros back ported that feature to 2.2) but it's not enabled by default and can take some set up to get working properly if you're not a seasoned sysadmin.

chroot is definitely not common amongst Apache installations, let alone common amongst CGI usage (which likely comprise of more than just Apache httpd users).

However one blessing is at least the execution directory is limited to /cgi-bin (even if software running inside cgi-bin can fork their own processes outside of that directory).

It's also worth noting that "efficient" is a poor choice of words for CGI - be it in the context of security or its more common usage in terms of performance. CGI works by forking processes (which is slooooow). In fact CGI is ostensibly a $SHELL for HTTP (if you look at the mechanics of how shells are written vs how CGI works). Someone elsewhere in this discussion described CGI as a "protocol" and I wouldn't even go that far because all the HTTP headers and such like are passed to the forked process via environment variables. In fact you could literally set the same environment variables in Bash (for example) and then run your CGI-enabled executable in the command line and get the same output as if you hit httpd first.
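That last point is easy to try: set the RFC 3875 variables yourself and invoke the program, with no web server involved. A Python sketch (the executable path you pass in is whatever your own CGI program is):

```python
# Exercise a CGI program without any web server: populate the RFC 3875
# environment variables by hand and run the executable, exactly as the
# web server would before exec()ing it.
import os
import subprocess

def run_cgi(executable, path_info="/", query=""):
    """Invoke a CGI executable by hand and return its raw response."""
    env = dict(os.environ,
               GATEWAY_INTERFACE="CGI/1.1",
               REQUEST_METHOD="GET",
               PATH_INFO=path_info,
               QUERY_STRING=query,
               CONTENT_LENGTH="0")
    return subprocess.run([executable], env=env,
                          capture_output=True, text=True).stdout
```

This is also a handy way to debug a CGI script that only returns a bare 500 from httpd: run it by hand and the error lands on your terminal instead of a missing error log.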

But as I said in my first post: I don't hate CGI. Far from it, it's been a great tool over the years and I still use it now for hacking personal stuff together. But it's also not something I'd trust on the open internet any longer. It's one of those cool pieces of tech that simply doesn't make sense on the modern internet (like VRML, Shockwave and Flash, the web 1.0 method of incremental page loads (I forget the name), FTP (that protocol really needs to die!), etc.).

CGI security depends on the framework you use after. It's just different from reverse proxy.

True, but considering most people's use of CGI is without a framework (otherwise you might as well just use that framework as the HTTP server), you're then having to place a lot of trust in the developer not to accidentally foot-gun themselves.

It's like the Rust argument. Sure, a skilled developer could write good code in C++ but languages like Rust make it harder to accidentally foot-gun yourself and more obvious in the code when you do happen to do it.

> Sure, but the attack surface remaining (CGI) is far less secure than pretty much everything else out there.

That is completely false. CGI has basically no attack surface. And personally, I trust Perl or C or even bash (which do have an attack surface) more than I trust all these random frameworks.

> I mean, if you exclude the server you're talking about, then sure. But you still have a server running 24/7 that you need to monitor so the framework becomes somewhat irrelevant from that perspective.

Except oftentimes these days you need both an HTTP server and an app server (as mentioned). CGI only needs an HTTP server.

> That is completely false. CGI has basically no attack surface. And personally, I trust Perl or C or even bash (which do have an attack surface) more than I trust all these random frameworks.

While you're technically right that CGI doesn't include Perl/C/Bash, it feels like you're hand-waving somewhat to avoid discussing the real crux of the problem. Having spent a significant amount of the last 30 years writing software in Perl, C and Bash (languages that I genuinely do enjoy coding in too), I honestly don't trust anyone's ability to knock their own framework out as securely as many of the established frameworks already out there.

There's all sorts of hidden traps in those languages themselves, hidden traps in the way the the web behaves as well as bugs you could introduce just through human error.

CGI is fun for hacking stuff together but if you're building anything for public consumption - even if it's only going to be a low hit count - then you have to consider what damage could be done if that machine was compromised (though it's prudent to follow that train of thought regardless of the framework you end up on).

> Except oftentimes these days you need both an HTTP server and an app server (as mentioned). CGI only needs an HTTP server.

Except oftentimes you also don't. And even in the instances where you do, often something like S3 or some CDNs will fill the role (as often it's just static content you need hosting, and some CDNs allow you to manually transfer content). Or if a CDN isn't an option and you do need a webserver + app server (e.g. nginx + php-fpm), then you can run them as two docker containers (for example) on the same server... and even if that isn't an option, it's really not that much more work monitoring 2 servers than it is 1 (if you were talking about a farm of dozens or more then you'd have a point. But then CGI also becomes a clear no-go because of its performance penalties).

My point is there are a breadth of options out there these days and CGI rarely stacks up well against them.

That is why I use ngx_lua for some (small) projects, on deploy just git pull, reload nginx, and now your code is updated.

I've been meaning to try out the mruby nginx bindings, and maybe rewrite the Lua code to Ruby(ish) code but I haven't had the time yet.

> Deploy is a simple `git pull`

Did you have to do anything to lockdown the .git folder?

Why do you think the file system is accessible at all? Due to possible errors in the script?

It is common for noobs to put index.cgi (or whatever) in the root of the git repo, and to point Apache at that by cutting and pasting a hello world example from google.

A better approach is to put it in a subdirectory, and RTFM of Apache/nginx.

> Requires nothing but Apache running, minimizes attack surface area

Shellshock comes to mind as a counter point.

It's only a concern if the CGI script is written in shell or uses the `system()` function.

CGI scripts don't have to run with web server privileges. Nor should they. They should be set-UID to some other user.

I still use FCGI with Go programs. FCGI launches a service process when there's a request, but keeps it alive for a while, for later requests. It can fire up multiple copies of the service process if there's sufficient demand. If there are no requests for a while, the service processes are told to exit. Until you get big enough to need multiple machines and load balancers, that's enough.

Set-UID (by itself, at least) is not a feature to drop privileges. On Linux, you must also use setresuid to set all IDs to the EUID. And then you must hope that you were able to execute setresuid before any vulnerabilities can be triggered.

You should tell your web server to run CGI processes as a different user, instead (f.e. suexec).

Arguably, fcgi is still enough even after you add load balancer, front end cache, cdn, etc.

Python 3.8 is expected to include PEP 594 "Removing dead batteries from the standard library"... one of the modules scheduled to be deprecated is cgi.


As out of date as the module is, reading the PEP made me nostalgic for the days of hammering out a quick CGI script, and I've probably got a few of those scripts still chugging away.

That module isn't actually for the CGI protocol. It dates from a time when "CGI" was a reasonable shorthand for "working with http requests". Mostly what's in it is stuff for handling HTML form values.

The Python standard library's CGI-protocol support is in the wsgiref module, and nobody seems to be suggesting removing it.

Thanks — fixed

It's worth noting that the RHCE (Red Hat Certified Engineer) used to require candidates to at least be able to write a simple CGI script and make it available via Apache.

I haven't checked how this changed in the last RHCE update, but still.

I had the opportunity/necessity to write a couple of CGI scripts. While it's not comfortable at all, the nice thing is that the HTTP server makes no assumptions about what language/runtime you're using. Literally, any out-of-the-box Apache will be able to run CGI scripts, no matter what language you used to write them.

If for any reason you cannot install other runtimes, you can still use CGI.

Whether you should, that's a different matter.

I still do a fair amount of CGI; it's just fine for low traffic simple web services. Startup time for the script isn't awesome, to fix that you want FCGI or SCGI or whatever. Apache's default MPM these days is event (or worker) which scales CGI pretty well. Beware that it's not thread safe, so if your CGI script is doing something multithreaded it will break. (Related: if you're doing something multithreaded, it's time to graduate from CGI).

Multiple CGIs under worker and event get their own processes; they don't run in the same process as Apache, nor as each other.

Author is speaking in the past tense, showing a lack of knowledge. CGI programs are in use; nothing "passed". All major web servers have CGI support, and it sees plenty of use in web applications. It is a standard protocol that is not going to go away.

CGI is still "widely" (I don't know how to quantify) used in web interfaces for IoT devices that did not drink the Lua kool-aid. That's a lot of CGI out there. Not dead yet.

One of my most popular sites was CGI until 2014. Now it's just static files, rendered from the same cgi program. And the reason it changed wasn't related to performance.

I cut my web teeth on CGI. I was a TA instructing physics undergrads in C (I think around 2007). I only really knew C and Python myself, and was tasked with building a page which students could upload programs and results to. It was the perfect abstraction given the tools I had and my primitive understanding of HTTP at the time.

It took a few years to realise I was hooked, and while I don't use CGI nowadays I'm really glad I started out with it.

I'd like to point out that CGI (and its descendents) can be used with any language, not just C (e.g. there are various false dichotomies in these comments between PHP vs CGI+C).

There are also many fast, compiled languages which, unlike C, are memory-safe, make string handling easier, are higher-level, have stronger types, etc. In particular, ML-family languages like Rust/OCaml/ReasonML/StandardML/Haskell are really good as meta-languages for safely generating other languages like HTML (that's what the "ML" stands for ;) )

It would be a shame if an inexperienced developer got the impression that their pages would load faster if they taught themselves C. Yes, it's possible to write reasonably-safe C; no, an Internet-facing CGI application isn't a good idea for a first C project.
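To illustrate the "safely generating HTML" point in a language-neutral way (Python is used here for brevity; the argument applies to any host language): untrusted input has to be escaped before it is interpolated into markup, and stronger type systems mainly make this harder to forget. The page-building function below is invented for the example.

```python
# Sketch: escaping untrusted input before interpolating it into HTML.
# The stdlib's html.escape handles text content and, with quote=True,
# attribute values as well.
import html

def greeting_page(user_input: str) -> str:
    safe = html.escape(user_input, quote=True)
    return f"<html><body><h1>Hello, {safe}!</h1></body></html>"
```

A CGI script that forgets this step is an XSS hole regardless of whether it was written in C, Perl, or Haskell; the ML-family advantage is that HTML can be modeled as a typed value rather than a string, so the unescaped path doesn't typecheck.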

Anyone else noticed that the page is constantly making requests to log user behavior, e.g., on scroll? Also, some of these requests (always?) fail.

It was annoying, as the page jumped around on these requests and the icon indicated loading activity. If you have to do it, do it in the background.

This weirdness also polluted my history with the page at different scroll intervals. I had to click back about 20 times to get back to HN because every scroll action was in my history.

> since the server was directly executing the script, security issues could easily creep in (the script shares the permissions of the HTTP server).

This is still the case. PHP often runs as www-data, Rails or Django or Node or whatever often run as a normal user (usual guess, Ubuntu user id 1000) with read/write access to all the files in that user home directory. Running in a container gives some isolation now.

Anyway, writing my first CGI script in C back in 1994 was quite hellish (not a very convenient language for string processing); then Perl and CGI.pm got the upper hand for a while.

What were CGI scripts? Not sure that people have stopped using them. Old things don't always need replacing; sometimes they still work fine (though you have to watch your security).

How do CGI scripts differ from Amazon Lambda and other serverless solutions?

Not really at all. When you have been in the industry a few decades you see the same things coming around every few years, just with new jargon. If you recognise it and can embrace it you might be able to leverage it!

For CGI you have to maintain a server running Apache or some other web server software and persistent storage for a copy of your script along with a means of deploying them. You are limited to concurrency equal to the number of cores/threads and memory on your server or you have to manage some means of load balancing between multiple servers which you maintain along with deploying your script to each of them.

Certainly you could run a cluster of servers or even an auto-scale group of servers with all of your cgi programs and handle lots of http requests. Serverless/Lambda means you don't have to do this.

Also, not all Lambdas handle HTTP requests, so you'll need to figure out the best way to deal with other events, monitor queues, etc.

The real difference is amazon is doing it for you, not that it isn't happening.

I believe I framed my entire response from the perspective of how it is different for you. That is the whole point of Lambda: you do less.

Acting like CGI is dead was the reason I (feel like I) wasted three years with PHP. It's not, and if the thought of "programming" webpages in C/C++ sounds appealing to you, you should definitely check it out.

The obvious advantages: Total (binary) control of data streams and structures and execution as well as (nigh-)zero runtime overhead.

The obvious disadvantages: Static and rigid, not well suited for rapidly changing requirements (unless they are coded in), longer development times and increased complexity.

Isn't FastCGI (ok, I guess it's not really true CGI) still the canonical way to set up PHP with a webserver?

FastCGI is a binary (vs. text) and enhanced version of the CGI protocol, so it is less expensive to parse. CGI (the protocol) doesn't specify execute-per-request; that's just a convention.

In the wild you will find php-cgi, php-cgid, mod_php, and php-fpm (fastcgi)

Find the path to Perl and the magic will be guaranteed. I installed a lot of Perl CGI scripts.

Used to do mine in Perl... a tangled web of Perl scripts. They worked at the time!

Remember Lincoln Stein and CGI.pm?


I wrote a Perl module (Smil.pm: https://metacpan.org/pod/Smil) that mimics his, wrote him about it, and it was thrilling when he responded.

Remember mod_perl? Is it still supported? Built a large e-commerce site in 2002 using Embperl + mod_perl. Unfortunately, it was time for a rewrite in 2017, and a new platform had to be selected since support is virtually non-existent.

This makes me ask the question: How much knowledge are we going to forget in the next 50 years...


We have forgotten so much. But our mental velocity has been so vast in the last ~100 years... we are going to forget key knowledge soon...

I wonder -- are there benchmarks out there of fastcgi vs reverse proxy via http server for various languages? I'd expect that fastcgi would be faster, but everyone uses reverse proxying nowadays.

What would be the main benefits of today's serverless against classic CGI? I can think of auto scaling, but I'm not sure it wasn't possible (or even used?) in the past.

What’s old is new again! Many of the benefits (and drawbacks) live on in webasm and serverless execution.


CGI is still the best choice for this. It's easier to guarantee your server's Perl installation has all the latest security patches than it is to guarantee your users' browsers all do. AJAXy nonsense is mostly too error-prone, too insecure, and too slow.

I miss the days when I had shell on a unix box and could just drop a .pl file into my ~public_web directory and have random people on the internet run the script with the permission of the webserver user. Back before there was any concern with doing such a thing.

crap I feel old now


There, I fixed it for you.

I wonder if we've reached the point where many of the people who would have been using CGI scripts (if they had not been succeeded by newer technologies to provide dynamic functionality) no longer know what it is.

I am pretty sure we reached this point quite a few years ago. From what I know, most people learning web programming nowadays start with Rails, Express JS or whatever framework of the day they have been recommended, and use the integrated HTTP server for development, I would assume, often without even really worrying about what is actually happening. CGI scripts are definitely not on the list of things to learn there.

CGI scripts sound like a very simple idea when described here, but I avoided learning about them at the time because the name was so arcane. Is there a good reason why these weren’t simply called “executable pages”, or “response programs”, etc.?

Because Perl scripts were only one application of CGI, which was intended to be a generic interface between the web and any sort of backend. The thing that became Oracle Web Application Server for example started life as a direct interface between a web server and an Oracle database called WOW for Web-Oracle-Web. There were others. Pretty soon the “application server” took over that role as a thing in its own right, and instead of a Perl script directly invoked by the web server you would have a JSP or something.
