Wonder what the next generation will call it?
And yes, I am being sarcastic. There are differences. But do recall that when AWS Lambda came out, it had the exact same limitations w.r.t. one process per call, needing a fresh DB connection per request, etc.
If I had to pick one thing to represent the CGI era, it would be Matt's FormMail.
Lucky you. At a previous workplace I had to contend with actual binaries produced from C source code that contained strings that contained script tags that contained JS code that generated HTML. I'll let that sink in while you process the escaping.
Apparently it never occurred to some people that they could read those from disk, even from the C program.
PHP started out running via CGI, just like Perl. And then PHP got Apache's mod_php, which I believe is still widely used today.
Perl, of course, had mod_perl. And later, PSGI/Plack followed by Catalyst, Mojolicious, Dancer, etc.
Rumor had it the first version was written in Perl.
Especially when there isn't a framework already in use on this particular server.
Why add <framework of the day> to your setup when, every 10 minutes or so, it's time to read the temperature sensor and return a string with the reading?
Also, since it's served by a transient process, there is no problem of memory leaks, file handle leaks, or other resource capture by long-running processes.
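And the whole thing can be one tiny binary. A minimal sketch in Go (untested; the hwmon path is just an example, adjust for your sensor):

    package main

    import (
        "fmt"
        "net/http"
        "net/http/cgi"
        "os"
    )

    func main() {
        // cgi.Serve reads the CGI environment set up by the web server and
        // writes the response to stdout, one process per request.
        err := cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            // Example sensor path; real hardware will differ.
            raw, err := os.ReadFile("/sys/class/thermal/thermal_zone0/temp")
            if err != nil {
                http.Error(w, "sensor read failed", http.StatusInternalServerError)
                return
            }
            w.Header().Set("Content-Type", "text/plain")
            fmt.Fprintf(w, "cpu temp (millidegrees C): %s", raw)
        }))
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }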
Per Murphy's Law, of course, everything is fine until one day the internal app goes public and suddenly there are 500 queries a second.
By which you mean, "Almost every website in the Universe which isn't actively being DDoSed" with the understanding that essentially nobody can stand up to a modern, competent DDoS.
> Per Murphy's Law, of course, everything is fine until one day the internal app goes public and suddenly there are 500 queries a second.
Omae wa mou shindeiru. You are already dead.
And for the most part, CGI went into decline the day mod_perl came out. Which was 1996. So we really didn't use basic CGI for that long. The popularity of putting Perl right in Apache eclipsed running standalone Perl via CGI. It would be at least another decade before the world started to move on from this arrangement (mod_php or mod_perl).
Each CGI request has to create a new process, which can easily mean whole seconds of CPU time per request.
A few sensors sending information from time to time will quickly turn into a massive resource hog. Having used CGI extensively, I'd say one of the biggest caveats is resource exhaustion and the ensuing DDoS.
There are situations like that.
I have seen CGIs where usage started very slow and then grew to the point of friendly DDoS, and it's not fun, because the CGI is usually built deep into the architecture by that point.
Also, some sensors read very fast; for example, on Linux "sensors -f" returns the CPU temperatures.
Came to mind for me, too.
Looking at RFC 3875 https://tools.ietf.org/html/rfc3875 - you see things like PATH_INFO, PATH_TRANSLATED, QUERY_STRING...
Pull up Java HttpServletRequest, https://javaee.github.io/javaee-spec/javadocs/javax/servlet/... and there is getPathInfo(), getPathTranslated(), and getQueryString() along with many of the other parameters that would be familiar to someone writing a CGI.
You can find them in C# - https://docs.microsoft.com/en-us/dotnet/api/system.web.httpr... - PathInfo, QueryString and the like.
You can find them in Haskell - https://hackage.haskell.org/package/happstack-server-... Route by pathInfo and the QUERY_STRING is clear in https://hackage.haskell.org/package/happstack-server-...
CGI scripts aren't dead... they just got better plumbing.
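You can see the plumbing directly by writing one by hand. A minimal sketch in Go of what those accessors wrap: the RFC 3875 names are just plain environment variables.

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // A CGI response is headers, a blank line, then the body, on stdout.
        fmt.Println("Content-Type: text/plain")
        fmt.Println()
        // The same values getPathInfo()/getQueryString() expose elsewhere.
        for _, name := range []string{"PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING"} {
            fmt.Printf("%s=%s\n", name, os.Getenv(name))
        }
    }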
From the RFC linked in the OP: https://tools.ietf.org/html/rfc3875#page-23
CGI defined an interface between a web server and an executable that would provide a response.
- Request meta-data, i.e. path, query string, and request headers, passed as environment variables
- Request body passed via stdin
- Response header and body passed via stdout
In this way, a webserver like Apache could provide a platform for a wide array of languages. Yes there were security and scaling concerns, but it also was an opportunity to rapidly release and iterate on a product.
Today, a web framework will do this grunt work for you. Which honestly, if you’re handling passwords, is in many ways a good thing. But people are less likely to learn the basics of how their app talks to a browser.
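For anyone who hasn't seen it, the whole contract fits in a few lines. A minimal sketch in Go; any language that can read environment variables and stdio qualifies, which was the point:

    package main

    import (
        "fmt"
        "io"
        "os"
        "strconv"
    )

    func main() {
        // Meta-data arrives as environment variables.
        method := os.Getenv("REQUEST_METHOD")
        // The request body arrives on stdin; CONTENT_LENGTH says how much.
        n, _ := strconv.Atoi(os.Getenv("CONTENT_LENGTH"))
        body, _ := io.ReadAll(io.LimitReader(os.Stdin, int64(n)))
        // The response (headers, blank line, body) goes to stdout.
        fmt.Println("Content-Type: text/plain")
        fmt.Println()
        fmt.Printf("you sent a %s request with %d body bytes\n", method, len(body))
        os.Stdout.Write(body)
    }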
That's kind of why I like Golang webdev with the stdlib only; it's sort of similar in that way.
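Compare the raw CGI sketch above with the stdlib-only equivalent; same request data, just parsed for you (the port is an example):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            // The same meta-data CGI passes via the environment.
            fmt.Fprintf(w, "path=%s query=%s\n", r.URL.Path, r.URL.RawQuery)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }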
Do they need to? Do you yourself understand every minuscule detail of radio wave propagation before making a cell call to your friend?
The problem with almost every abstraction built on HTTP, is that it's leaky. There are a lot of details about how your application works that are only explained (and understandable) in the context of HTTP as a protocol. This can be fine in and of itself, if you understand that this is the case, and learn about HTTP.
I have quite often ended up in frustration for hours trying to explain to somebody how to approach a problem in their application, because they refused to think outside of the framework they were using, and didn't want to "waste time" (their words) learning HTTP. These are the same people who you see getting stuck on the same problem for months.
Making a call is different because it is not a leaky abstraction. You genuinely do not need to know anything about radio waves to do it or to understand how calls work conceptually; it's an implementation detail, the only relevant part of which is the signal strength, which itself has a dedicated and well-understood abstraction that's separate from the underlying technology (namely, signal bars).
The chances of a 'perfect abstraction' ever existing over HTTP are tiny at best. As we're talking about tools for developers here (ie. the people building things, not end users), any sufficiently capable general-purpose abstraction would approximate or even exceed the complexity of HTTP itself.
(Incidentally, this exact same problem is why general-purpose CMSes are invariably chaotic piles of badly-interoperable plugins, and why it is more difficult to build a genuinely good application with a general-purpose CMS, than without one. In practice, virtually everybody either drops some of their business requirements to stay within the CMS' native capabilities, or just piles hacks upon hacks.)
A more accurate comparison would be to compare web applications and PBX setups. Sure, PBX systems are an abstraction, and will paper over a lot of technicalities of the underlying protocol - but if you want to build a robust PBX setup with arbitrary capabilities, you will need to learn about telecom protocols and radio waves at some point, at the very least to understand the possible failure modes.
TL;DR: You can ignore low-level details if you're using a non-leaky abstraction. You need to understand them if you're using a leaky abstraction. Web frameworks are, almost without exception, leaky abstractions (by design).
Edit: And this can actually be further generalized as "any abstraction that is not very limited in scope, will be leaky".
I mean, it's probably possible to configure a server to escape the original request headers as ‘HTTP-<Header-Name>: value’ and add custom ones on top, but I haven't seen it done, and frameworks depend on the headers being there intact.
Since the CGI "protocol variables" are actually environment variables, it creates a namespace collision and an injection opportunity for environment variables beginning with "HTTP_".
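To make the collision concrete, a sketch of the RFC 3875 mapping rule; this is the mechanism behind the httpoxy class of bugs, where a client-supplied Proxy: header lands in HTTP_PROXY:

    package main

    import (
        "fmt"
        "strings"
    )

    // cgiEnvName applies the RFC 3875 rule: uppercase the header name,
    // replace "-" with "_", and prefix with "HTTP_".
    func cgiEnvName(header string) string {
        return "HTTP_" + strings.ToUpper(strings.ReplaceAll(header, "-", "_"))
    }

    func main() {
        fmt.Println(cgiEnvName("Proxy"))     // HTTP_PROXY, read by many libraries
        fmt.Println(cgiEnvName("X-Real-IP")) // HTTP_X_REAL_IP
    }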
Also, once in a while I begin thinking that PHP is a pretty nice language, certainly doing its job and very performant compared to Python or Ruby, even if PHP code resembles Java more and more. And I forget the numerous questionable semantic-breaking decisions. But then, bam:
> Warning: if PHP is running in a SAPI such as Fast CGI, this function will always return the value of an environment variable set by the SAPI, even if putenv() has been used to set a local environment variable of the same name. Use the local_only parameter to return the value of locally-set environment variables.
I still use cgi scripts. Though these aren't "scripts", rather compiled binaries written in C.
That made some of the pages with the heaviest calculations load in under a second, when they had been taking more than 10 seconds in PHP.
The author mentions CGI overhead. Well, that's relative, please read this: http://z505.com/cgi-bin/qkcont/qkcont.cgi?p=Myths%20About%20...
One important side effect of CGI is that if one route crashes (in a REST API, for example), the CGI finishes its execution and nobody is harmed. On the other hand, with server daemons, one bug can kill the whole app.
Does anyone have any links to a nice FCGI library? The official site went down ages ago AFAIK, and whatever I have isn't working out of the box, at least not on Windows...
(Yes, I have a Linux server and I will deploy the FCGI on it too, but I develop on a Windows machine...)
Essentially, for the same task, PHP could never be as fast as a binary, because of the overhead of processing the language itself.
Creating a binary in place of PHP code essentially reduces the computational load of a webpage to a minimum at runtime.
Mind also: with a CGI process per request, the process has to be created, the program loaded, the runtime linked, and only then executed. With any semi-modern PHP setup the PHP process is already running (be it mod_php, fastcgi, fpm, ...) and the "compiled" version of the script is in memory ready to be executed (opcode caching), thus PHP can win on startup time.
If you then put the heavy calculation in a C extension module, you get the simple templating around it quite fast.
Future versions of PHP might also receive some form of JIT (experiments are progressing well). An ideal JIT might even beat a C program, as it can compile for the specific CPU and can analyse and optimize the hot path (see also HotSpot and similar Java runtimes). PHP won't be there (as the JIT has to call into the PHP runtime too often), but there's room for future iterations.
Yeah, that's CGI though; FCGI alleviates this by persistently running a single process/service/application that listens on a socket and handles requests as the web server passes them over.
Whatever PHP can optimize can thus be done directly with FCGI as well. You can load everything the webpage needs into memory and just pipe that through to the user, and it would only need to read disk once at startup (or if you update the sources, but that's another issue).
The downside is of course the increased development time and complexity you'd have to invest...
EDIT2: Another downside specific to FCGI is that it is OS-dependent, because it needs to interact with the host in order to listen to a port, something that doesn't affect normal CGI because all you need there is stdout.
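If anyone wants to see how small the persistent-process version is, a sketch using Go's stdlib net/http/fcgi (the address is just an example; the front web server speaks FastCGI to it):

    package main

    import (
        "fmt"
        "log"
        "net"
        "net/http"
        "net/http/fcgi"
    )

    func main() {
        // Expensive setup (config, caches, connections) happens once here.
        ln, err := net.Listen("tcp", "127.0.0.1:9000")
        if err != nil {
            log.Fatal(err)
        }
        // One long-lived process then serves every request on the socket.
        log.Fatal(fcgi.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "served from a long-lived process")
        })))
    }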
Anyway, the comparison by GP was between a custom CGI program and PHP, not other things.
On Facebook: I'm willing to bet a switch from PHP to binaries would mean an order of magnitude less processing power required, and an early switch would've cut their server requirements by 90%. Not sure which would've been cheaper, the C devs or all those extra servers burning electricity to interpret a language at runtime.
EDIT: And that's just an economic incentive. From an ecological perspective, it's borderline criminal for a website as massive as Facebook to be running PHP.
However, using PHP allowed them to hire lots of staff quickly and adapt to changing requirements quickly, which is economically valuable. Finding the right time for a switch is simpler in retrospect, though ;-)
(I have no insight to Facebook, but was heavily involved in the PHP project and talked to different Facebook engineers privately and professionally)
I went with CGI. It has some drawbacks but consider the advantages:
* Requires nothing but Apache running, minimizes attack surface area
* Deploy is a simple `git pull`, no services to restart
* No app server running 24/7 so I don't have to monitor memory usage or anything else.
> Requires nothing but Apache running, minimizes attack surface area
Sure, but the attack surface remaining (CGI) is far less secure than pretty much everything else out there.
> Deploy is a simple `git pull`, no services to restart
Same could be said for a dozen other service side frameworks.
> No app server running 24/7 so I don't have to monitor memory usage or anything else.
I mean, if you exclude the server you're talking about, then sure. But you still have a server running 24/7 that you need to monitor so the framework becomes somewhat irrelevant from that perspective.
Personally I still use CGI for personal projects I hack together. But none of them are directly open to the internet.
In that environment, even the largest attack surface will have great difficulty escaping confinement.
chroot is definitely not common amongst Apache installations, let alone amongst CGI usage (which likely comprises more than just Apache httpd users).
However, one blessing is that at least the execution directory is limited to /cgi-bin (even if software running inside cgi-bin can fork its own processes outside of that directory).
It's also worth noting that "efficient" is a poor choice of words for CGI, be it in the context of security or its more common usage in terms of performance. CGI works by forking processes (which is slooooow). In fact, CGI is ostensibly a $SHELL for HTTP (if you look at the mechanics of how shells are written vs how CGI works). Someone elsewhere in this discussion described CGI as a "protocol" and I wouldn't even go that far, because all the HTTP headers and such like are passed to the forked process via environment variables. You could literally set the same environment variables in Bash (for example), run your CGI-enabled executable on the command line, and get the same output as if you'd hit httpd first.
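To make that concrete, a sketch of driving a CGI binary with no httpd in sight (hello.cgi is a hypothetical binary; the same works from plain Bash with VAR=value ./hello.cgi):

    package main

    import (
        "os"
        "os/exec"
    )

    func main() {
        cmd := exec.Command("./hello.cgi") // hypothetical CGI binary
        cmd.Env = append(os.Environ(),
            "REQUEST_METHOD=GET",
            "QUERY_STRING=name=world",
            "SERVER_PROTOCOL=HTTP/1.1",
        )
        // The same headers + body httpd would relay appear on stdout.
        cmd.Stdout = os.Stdout
        if err := cmd.Run(); err != nil {
            panic(err)
        }
    }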
But as I said in my first post: I don't hate CGI. Far from it, it's been a great tool over the years and I still use it now for hacking personal stuff together. But it's not something I'd trust on the open internet any longer. It's one of those cool pieces of tech that simply doesn't make sense on the modern internet (like VRML, Shockwave and Flash, the web 1.0 method of incremental page loads (I forget the name), FTP (that protocol really needs to die!), etc.).
It's like the Rust argument. Sure, a skilled developer could write good code in C++ but languages like Rust make it harder to accidentally foot-gun yourself and more obvious in the code when you do happen to do it.
That is completely false. CGI has basically no attack surface. And personally, I trust Perl or C or even bash (which do have an attack surface) more than I trust all these random frameworks.
> I mean, if you exclude the server you're talking about, then sure. But you still have a server running 24/7 that you need to monitor so the framework becomes somewhat irrelevant from that perspective.
Except oftentimes these days you need both an HTTP server and an app server (as mentioned). CGI only needs an HTTP server.
While you're technically right that CGI doesn't include Perl/C/Bash, it feels like you're hand-waving somewhat to avoid discussing the real crux of the problem. Having spent a significant amount of the last 30 years writing software in Perl, C and Bash (languages that I genuinely do enjoy coding in too), I honestly don't trust anyone's ability to knock their own framework out as securely as many of the established frameworks already out there.
There's all sorts of hidden traps in those languages themselves, hidden traps in the way the web behaves, as well as bugs you could introduce just through human error.
CGI is fun for hacking stuff together but if you're building anything for public consumption - even if it's only going to be a low hit count - then you have to consider what damage could be done if that machine was compromised (though it's prudent to follow that train of thought regardless of the framework you end up on).
> Except oftentimes these days you need both an HTTP server and an app server (as mentioned). CGI only needs an HTTP server.
Except oftentimes you also don't. And even in the instances where you do, often something like S3 or some CDNs will fill the role (as often it's just static content you need hosting, and some CDNs allow you to manually transfer content). Or if a CDN isn't an option and you do need a webserver + app server (e.g. nginx + php-fpm), then you can run them as two docker containers (for example) on the same server... and even if that isn't an option, it's really not that much more work monitoring 2 servers than it is 1 (if you were talking about a farm of dozens or more you'd have a point, but then CGI also becomes a clear no-go because of its performance penalties).
My point is there are a breadth of options out there these days and CGI rarely stacks up well against them.
I've been meaning to try out the mruby nginx bindings, and maybe rewrite the Lua code to Ruby(ish) code but I haven't had the time yet.
Did you have to do anything to lockdown the .git folder?
A better approach is to put it in a subdirectory, and RTFM of Apache/nginx.
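For example, a common nginx pattern (a sketch; Apache can do the equivalent with a DirectoryMatch block):

    location ~ /\.git {
        deny all;
    }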
Shellshock comes to mind as a counter point.
I still use FCGI with Go programs. FCGI launches a service process when there's a request, but keeps it alive for a while, for later requests. It can fire up multiple copies of the service process if there's sufficient demand. If there are no requests for a while, the service processes are told to exit. Until you get big enough to need multiple machines and load balancers, that's enough.
You should tell your web server to run CGI processes as a different user, instead (f.e. suexec).
As out of date as the module is, reading the PEP made me nostalgic for the days of hammering out a quick CGI script, and I've probably got a few of those scripts still chugging away.
The Python standard library's CGI-protocol support is in the wsgiref module, and nobody seems to be suggesting removing it.
I haven't checked how this changed in the last RHCE update, but still.
I had the opportunity/necessity to write a couple of CGI scripts. While it's not comfortable at all, the nice thing is that the HTTP server makes no assumptions about what language/runtime you're using. Literally, any out-of-the-box Apache will be able to run CGI scripts, no matter what language you used to write them.
If for any reason you cannot install other runtimes, you can still use CGI.
Whether you should, that's a different matter.
It took a few years to realise I was hooked, and while I don't use CGI nowadays I'm really glad I started out with it.
There are also many fast, compiled languages which, unlike C, are memory-safe, make string handling easier, are higher-level, have stronger types, etc. In particular, ML-family languages like Rust/Ocaml/ReasonML/StandardML/Haskell are really good as meta-languages for safely generating other languages like HTML (that's what the "ML" stands for ;) )
It would be a shame if an inexperienced developer got the impression that their pages would load faster if they taught themselves C. Yes, it's possible to write reasonably-safe C; no, an Internet-facing CGI application isn't a good idea for a first C project.
It was annoying, as the page did some jumps on these requests and the icon indicated loading activity. If you have to do it, do it in the background.
This is still the case. PHP often runs as www-data, Rails or Django or Node or whatever often run as a normal user (usual guess, Ubuntu user id 1000) with read/write access to all the files in that user home directory. Running in a container gives some isolation now.
Anyway, writing my first CGI script in C back in 1994 was quite a hell (not a very convenient language for string processing), then Perl and CGI.pm got the upper hand for a while.
Certainly you could run a cluster of servers or even an auto-scale group of servers with all of your cgi programs and handle lots of http requests. Serverless/Lambda means you don't have to do this.
Also, not all Lambdas are handling HTTP requests, so you'll need to figure out the best way to deal with other events, monitor queues, etc.
The obvious advantages: Total (binary) control of data streams and structures and execution as well as (nigh-)zero runtime overhead.
The obvious disadvantages: Static and rigid, not well suited for rapidly changing requirements (unless they are coded in), longer development times and increased complexity.
In the wild you will find php-cgi, php-cgid, mod_php, and php-fpm (fastcgi)
I wrote a Perl module (Smil.pm: https://metacpan.org/pod/Smil) that mimics his, wrote him about it, and it was thrilling when he responded.
We have forgotten so much. But our mental velocity has been so vast in the last ~100 years... we are going to forget key knowledge soon....
CGI is still the best choice for this. It's easier to guarantee your server's Perl installation has all the latest security patches than it is to guarantee your users' browsers all do. AJAXy nonsense is mostly too error-prone, too insecure, and too slow.
There, I fixed it for you.