Recently I enjoyed some "retrofuturistic" development with WASM and CGI. Spin[1], a webserver written in Rust, can execute WASI[2] binaries that speak CGI. You can then deploy it to Fermyon Cloud or your own server and it "just works". It's a wonderful mix of old and new. I used it for PHP (Prolog Home Page): https://github.com/guregu/php
[1]: https://spin.fermyon.dev/
[2]: WASI is a POSIX-ish standard for WASM that gives you all the low-level stuff like standard input and output. It includes all the bits and pieces needed for CGI to work.
Kind of feels like the AssemblyScript guys were right: WASI is becoming a standard for WASM modules that foists Unix on WebAssembly, including stuff that isn't really portable, like symlinks.
That makes sense if you want to compile existing software to WASM, but that isn't the case here, is it?
This does use existing software. The impetus behind this project was to test my port of Trealla Prolog (which was written by someone else) to WASM. Without a Unixy standard environment it would have been a lot harder. In theory, WASI should enable many CGI-like usages of pre-existing software, but in my experience the quirks between WASI implementations push you towards using nonstandard things anyway. For example, trealla-js is mostly WASI but uses a couple of extra imported functions to coordinate calling into JS. If I wanted to let PHP make outgoing requests, I'd have to implement Spin's custom nonstandard components. I hope the WASM Components proposal makes this easier in the future. Personally, I might have preferred a more Plan 9-style approach that uses special files to do things like make outgoing connections, purely because it would be trivial to implement client-side as a simple printf vs. integrating wit-bindgen into the build system and all of that. However, I can see many benefits of the strongly typed component model as well.
I don't have a strong opinion on the AssemblyScript controversy[1]. I agree that WASI is imperfect, especially in browsers, but I'm also glad that _something_ exists with not-horrible runtime support. WASI was pretty nice to use as someone who just wanted to get a C program running on WASM.
Wasm allows you to consume as much or as little of the API surface area as you want. You are free to construct Wasm modules in whatever way you see fit for your application.
CGI was abandoned due to performance problems, but it is surprisingly scalable on modern OSes: the kernel does a good job of caching the code, so it's not really a full round trip to the file system and a cold start on each request. Obviously you still have the overhead of an interpreter for many languages, but those also tend to do a fairly good job of caching their bytecode. I absolutely wouldn't recommend it for new systems, but for a quick and dirty script it's fine.
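For a sense of how little a quick-and-dirty CGI program involves, here's a minimal sketch in Go using the standard library's net/http/cgi package (the handler and its output are made up for illustration):

    package main

    import (
        "fmt"
        "net/http"
        "net/http/cgi"
    )

    func main() {
        // cgi.Serve reads the CGI environment variables and request body
        // handed over by the webserver, runs the handler once, and exits.
        err := cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "text/plain")
            fmt.Fprintf(w, "Hello, %s\n", r.URL.Query().Get("name"))
        }))
        if err != nil {
            // If CGI setup itself failed, emit a bare CGI error response.
            fmt.Println("Status: 500 Internal Server Error")
        }
    }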
The minute you want to do URL routing and have middleware, a persistent process makes everything much simpler and keeps everything in one place. I much prefer routing inside a framework to using mod_rewrite.
> a persistent process makes everything much simpler ...
I started with PHP and never trusted persistent processes for anything.
What if your persistent process hangs or crashes for whatever reason?
Is your whole site down then, not just the part with the bug?
The only site I ever made in the persistent-process paradigm was in ASP.NET, and it didn't cause too many problems, but it was more of an app than a website, with a limited number of logged-in users and pretty much no anonymous/guest part.
> I started with PHP and never trusted persistent processes for anything.
Well, it's (maybe was; it did get a bit nicer over the years, but the old shit code is still there) a terrible language, so no wonder you have no trust.
> What if your persistent process hangs or crashes for whatever reason?
Depends on the language. For example, in Go (in most frameworks at least) a panic() in the goroutine handling a request will just... nuke that request and nothing else (see the sketch below). Sure, memory leaks can be an issue if the code is terrible, but essentially restarting the app after every request is just a shitty workaround for shoddy code.
That's how most other languages handle it. The ones that do multi-core processing badly, like Ruby, just spawn multiple copies of the server, so it isn't really that different from the PHP model: you get a "supervisor" that spawns a few processes of the app and feeds them requests, and if one dies it just gets restarted.
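A minimal sketch of that per-request isolation in Go (Go's net/http already does something similar for handler panics; the middleware shape here just makes the mechanism visible):

    package main

    import (
        "log"
        "net/http"
    )

    // recoverMiddleware catches a panic raised while serving one request,
    // turns it into a 500, and leaves all other requests untouched.
    func recoverMiddleware(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            defer func() {
                if err := recover(); err != nil {
                    log.Printf("panic serving %s: %v", r.URL.Path, err)
                    http.Error(w, "internal server error", http.StatusInternalServerError)
                }
            }()
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        buggy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            panic("bug in this one handler") // only this request dies
        })
        log.Fatal(http.ListenAndServe(":8080", recoverMiddleware(buggy)))
    }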
> Well it's/twas terrible language so no wonder you have no trust
The reason someone who was shaped by working in PHP might not trust long-running processes probably has less to do with any specific shortcomings of PHP as a language and more to do with the fact that its overwhelmingly dominant execution model is single-request, shared-nothing.
This conceptual model works well enough (especially for the web) that there are entire cloud services built around recreating it for other languages, called "serverless," and they do well because it's a useful simplification, which means the issues associated with persistent processes become somebody else's problem.
Devs being what we are, of course there's always those of us who prefer certain things to remain our problem. This is fortunate, as someone needs to take on problems like long-running processes in order for others to make them somebody else's problem. Software development has some things in common with comparative advantage.
> What if your persistent process hangs or crashes for whatever reason?
You've got the same problem with your web server, and your operating system kernel. Simply write a program that can't hang or crash – at least, not due to an application bug. It takes attention to detail, but this is a lot easier than writing a program that always behaves correctly; if you limit the syscalls you make, use timeouts appropriately, and don't segfault, it's often achievable.
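As one concrete piece of the "use timeouts appropriately" advice, here's a sketch using Go's net/http server knobs (the field values are arbitrary examples, not recommendations):

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    func main() {
        // Bound every phase of a request so a single slow or malicious
        // client can't wedge the whole process.
        srv := &http.Server{
            Addr:              ":8080",
            ReadHeaderTimeout: 5 * time.Second,  // slowloris protection
            ReadTimeout:       10 * time.Second, // whole request read
            WriteTimeout:      30 * time.Second, // whole response write
            IdleTimeout:       60 * time.Second, // keep-alive connections
        }
        log.Fatal(srv.ListenAndServe())
    }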
I don't think any mainstream framework in any mainstream language has these kinds of precautions.
Rails? Django? Express? .NET? Do they kill long-running requests so they don't affect the whole application?
Can the application still mostly work despite a syntax error in one of the files?
Another thing is the level of trust. I trust the Linux kernel, I trust the Apache webserver. But some webserver written in a scripting language a few years ago? Not so much.
Well, try putting a `while(true);` in a request handler in a Node.js server. Do you know the answer? The whole process blocks, since Node.js runs all your JavaScript on a single thread.
I trust the code that I wrote more than code written by others... not because I'm better than others, but because I know what it does, and I know how to fix it if it breaks.
Fault tolerance was how NeXT branded it, IIRC. If an instance of a long-running process doesn't communicate with an external monitor, it's assumed to have hung or crashed, at which point it is restarted by the external tool. Or at least that's how things were done in the early/mid 90s. Similar techniques are employed by Node apps these days, and undoubtedly many others.
My point really was that when you have URL routing, such as IDs/slugs in the URL, it is much easier to implement in your application code than in the web server configuration. It keeps application logic in one place.
Caddy doesn't support CGI out of the box. You need to use a module (which requires a custom build of Caddy and the hope that the module author continues to maintain it), or you need to run something like fcgiwrap to convert FCGI (which Caddy does support) to CGI (and now you need a process manager to make sure your fcgiwrap and caddy processes are kept running, restarted on failure, etc.).
I'm having fun with django + htmx and going to deploy on elastic beanstalk.
It hits that sweet spot of development speed and responsiveness. You also have the benefit of staying in one language.
Django has so much out of the box that you don't have to decide for yourself after the fact. So much accumulated knowledge of what works too.
Sure, go ahead and rewrite it if it starts making money and you can afford the development time/salaries. But these older technologies are easier to fit inside one head and let you get enough speed to take off.
This is also a very sane combination. HTMX is great, and Django is also very impressive :) We need tools that work for us, not us working to make them work.
Wasn’t that what Microsoft used to give us back in the days of Visual Studio, .NET and especially LightSwitch? Amazing technology, more than 10 years later we don’t have anything coming close to it for developer productivity.
I wrote Eksi Sozluk, the most popular Turkish social platform to date, back in 1999 using Delphi as CGI executables[1]. Delphi was the tool that I knew best, and CGI worked for me at the beginning. I almost immediately started having problems with EXE files being constantly locked, making updates impossible (as it was Windows based), which required restarting the server for any kind of update.
CGI executables also caused problems when I switched to an Alpha AXP RISC server which emulated x86, bringing the performance to a crawl. That made me switch to classic ASP, and about 10 years later, to ASP.NET MVC with routing, unit tests, abstractions, jQuery, all the shiny things at the time.
The web didn't stop there, of course; then came SPAs, React, Vue and whatnot.
Now, seeing the yearning for CGI in 2023 feels funny. Have we come full circle? :)
> Zero external configuration, other than telling your webserver to enable CGI on your file
This is only true if you've done it before and know what you're doing. In reality, it looks like a mess of `mod_cgi` configuration, trying different combinations of file permissions, finding the magic `cgi-bin` directory, finding the right obscure log files when there are inevitably errors, wrestling with CORS and other subtleties of HTTP headers, and other complexities that are only easy to navigate if you're already an experienced CGI user.
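For reference, the pieces involved usually boil down to something like this in Apache (a sketch only; the paths and the chmod are illustrative, and mod_cgi must already be loaded):

    ScriptAlias "/cgi-bin/" "/usr/lib/cgi-bin/"
    <Directory "/usr/lib/cgi-bin">
        Options +ExecCGI
        Require all granted
    </Directory>

    # ...and each script must be executable by the server:
    # chmod 755 /usr/lib/cgi-bin/guestbook.cgi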
That being said, I love the philosophy of using CGI for scripts. Instead of using CGI itself, though, I wrote a (single-file, statically-linked) web server called "QuickServ" to bring this philosophy into the twenty-first century. It has all of the upside of CGI, but is much easier to set up and run, especially for beginners.
One of its benefits is that it automatically parses submitted HTTP forms, and converts the form fields to command line arguments. That means it's extremely easy to put existing CLIs on the web with minimal changes other than writing an HTML form front-end.
If you like CGI, I will (shamelessly) ask that you check it out!
I thought about that for a few personal things, but in the end I just opted for a single binary that's an HTTP router plus an abstracted-away way to handle requests. So when I need a new feature I just add a module and deploy the whole blob (static files included). No need to create a new service, add it to the load balancer, etc.
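A sketch of that single-binary shape in Go, using the standard embed package (the static directory and the feature route are placeholders; a directory named "static" must exist at build time):

    package main

    import (
        "embed"
        "fmt"
        "net/http"
    )

    //go:embed static
    var staticFiles embed.FS // static assets compiled into the binary

    func main() {
        mux := http.NewServeMux()
        // Everything ships in one blob: assets and routes alike.
        mux.Handle("/static/", http.FileServer(http.FS(staticFiles)))
        mux.HandleFunc("/feature", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "each new feature is just another route")
        })
        http.ListenAndServe(":8080", mux)
    }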
I do contract work for a company that has its pretty customer-facing site as a mix of Angular/React/React Native and .NET Core APIs, and its less pretty internal staff site, which is vanilla HTML, JS and C++ CGI APIs, so I get to work in both on the same day. The internal C++ site has been running for ~25 years. The customer site seems to need to be rewritten every time we add a new feature, because all the tech is out of support and there are breaking changes to libraries.
I started writing my dissertation project in Perl CGI back in like 2001 or whatever it was. I moved to PHP pretty quickly when I realised I could do the same things in 5 lines that took 70 in Perl. I'd been making websites for about 6 years at that point and PHP just absolutely changed my life. It was, arguably, exactly like this post talks about CGI - put some script in a file, FTP to your host and you're good to go. Web development was simple and exciting - you could get a long way very quickly.
I was talking to a friend the other day who wants to learn some web dev for fun - he had no clue where to start because there's now SO much, so many 'best practice' posts, so much 'you must learn like this'. I know the world changes (and often for the better), but it was a magic time when you could get so far so quickly.
As an aside, it's why I have huge amounts of respect for people like Pieter Levels and his 'just get it done' approach to building things.
To me PHP is still a great tool to do this sort of stuff. Not to build a whole application; that's not what it was designed for. But for simple stuff it's great!
And PHP has improved a lot; nowadays it's not that bad a language...
For web apps, PHP is still a great candidate. Check out Laravel or Symfony. It's not like the old days of PHP files mixed with procedural code. Although those were good days :)
I don't look at them for one reason: to me the great thing about PHP is that it's super easy and fast to add some dynamic generation to a mostly static website. If I'm using a framework, I might as well go to Next.js, for example.
The thing is that PHP is unbeatable for doing simple stuff, for example a simple contact form on a website, or a page where you have to book slots, and similar things: you add a few PHP instructions to the site, upload it to a webserver, enable PHP (which is super easy to install), and done. Need to make a modification? You can connect through SSH to the server and edit the PHP files in /var/www with vim. No need to compile, no build system, no restarting services, nothing like that!
A lot of that goes away if you avoid Javascript/Node as your first ecosystem. The unnecessary complexity and cargo-culting there is off the charts. It's overwhelming even for the experienced engineers.
I got so jaded dealing with this JS nonsense just to get a simple front-end going for my pet project that I just said "screw it" and started using straight up ES6 in the browser. No libraries, no transpilers, no minifiers.
Perhaps we should start making tutorials that follow that idea from PHP: start with HTML, add an extension (e.g. .eex, .erb) and make it dynamic. This would lower the barrier for beginners.
By the way, the best HTML tutorial I found so far is Scrimba. The idea of having interactive videos where you can pause and edit the code is amazing.
I wrote Perl CGI scripts for a self-serve customer admin interface for an ISP in the early 2000s. As users grew, we started hitting the limits of what 1-2 GB of RAM could do. Thank goodness for mod_perl and HTML::Mason, which could REALLY speed up performance. I remain a huge fan of Perl to this day.
I used Perl's CGI::Application to transform an Excel spreadsheet of teachers, used for fielding telephone enquiries, into a PostgreSQL-backed web service used by 33 household brands. Everything ran on Apache CGI without a hitch for 15 years.
I built a bunch of "contact us" pages for folks over the years and always used python with the built in CGI library. The python folks thought so little of CGI that they ripped it out of the standard library. I wish I had just used Perl like god had intended CGI scripts to be written.
On that note, I recently revived some nearly 20 year old perl CGI stuff. Had some minor syntax issues (language ambiguities which got fixed), but it was running just fine after having fixed these. There was also some C code with it (some tooling), needed header include fixing + function renaming, otherwise still worked. Small project all in all, not reliant on a lot of external things.
Recently I built a CGI script using Perl for my own pleasure. It was quite easy indeed: just a case of importing Template Toolkit, GD and CGI.pm, then printing the result to stdout. Since I'm only using CGI.pm to send the HTTP header, it isn't really necessary.
I always felt that the best software should be designed like an onion. Very easy to get started (no need to peel too much to get to the outer layer), but there should always be the possibility to go deeper and peel more layers. Easy to get started is not great (as a general purpose tool) if there is no decent upgrade path.
What replaced CGI scripts is slightly more complex, but also more naturally grows to other use-cases.
CGI was also a security nightmare full of footguns. Not that modern software isn't, but CGI was especially bad.
- Take untrusted input and pass it directly to shell scripts on your filesystem. What could go wrong?
- Executables in cgi-bin were nearly always included in your document root on shared hosting servers.
- File drops and web shells were rampant
- A trivial misconfiguration (not setting +x?) could result in the web server just serving up the source code of your scripts directly to the client. There's your database credentials hard-coded right there at the top of your file now exposed for all to see.
- You had to pay very close attention to resource limits on the server itself to prevent malicious things like fork-bombs
- Apache was a nightmare to configure securely for CGI
You didn’t have to hard-code database credentials into your app, though. That’s a design flaw, not a requirement of CGI.
Who passes non-sanitized or otherwise untrusted input to executables, any more than is necessary? Why do you (seem to) believe modern approaches to dynamism are immune, when the bulk of user input still comes from HTML forms? How do modern web applications avoid input from untrusted sources?
The same applies to the directory layout of the servers they were hosted on. That's more a lack of server admin competence than a fault of CGI. It was far easier than configuring Tomcat to safely host Java web apps. I recall it being far easier to configure than WebObjects, PHP, Rails, etc.
CGI programs were far from the only form of web application affected by sloppy local security practices. In my experience, a common gateway interface application consists of fewer files than something written against Spring, WebObjects, or 99% of server-side web app frameworks. Fewer files means less system configuration and less attack surface, no?
As to forking leading to DoS, that was something that could happen to any forking server, which was pretty much all servers when CGI first came into existence. FastCGI helped address that issue.
In 2023 I don't think CGI is as easy as it was back in the day. The major proxies/servers most people use, like nginx, only support FastCGI, which isn't directly compatible with CGI--you need to run helpers like fcgiwrap or other new layers of complexity. The only major servers that still support plain old CGI are Apache and lighttpd, and IMHO properly configuring Apache is a mess.
I think what people really want when they pine for the days of CGI is simple file-based routing for dynamic content. Look around and you can see new frameworks are coming back around to this... or consider plain old PHP.
CGI is slow [it spawns an OS process for every request]. But FastCGI is as fast as any other solution, like compiling the whole app into a single binary and running it as an HTTP server.
CGI is _computationally_ slow, but in practice it's faster than it needs to be for most sites; i'm aware of several sites that still run on CGI.
Many people confuse "X is slower than Y" with "X is slow in some absolute sense". Starting a new process takes well under a millisecond on most modern machines, so that would be completely lost in the latency of most web requests. The real benefits of persistent processes are in getting to reuse database connections and the like.
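If you want to check the claim on your own machine, here's a crude sketch (it assumes /bin/true exists, and measures spawn plus exit, not CGI overhead as a whole):

    package main

    import (
        "fmt"
        "os/exec"
        "time"
    )

    func main() {
        const n = 200
        start := time.Now()
        for i := 0; i < n; i++ {
            // Each iteration pays the full fork+exec+exit cost, roughly
            // what a CGI request adds over a persistent process.
            if err := exec.Command("/bin/true").Run(); err != nil {
                panic(err)
            }
        }
        fmt.Printf("avg spawn+exit: %v\n", time.Since(start)/n)
    }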
YMMV, but that's why I love Forth: it compiles the full app (no interpretation) in milliseconds. Some think it's dead or only for the embedded world, but for me and my projects it's a deadly combination + ultra fun.
Yeah, I don't think the performance difference between FCGI and CGI is really about spawning in 2023: the benefit lies in the fact that the app can keep state and resources between requests. In particular, it can maintain a database connection all the time (with retry logic for when it gets disconnected).
However, perhaps 15+ years ago I converted an application from a cgi-bin binary to an Apache module, and the performance benefit was immense on an embedded Linux device. We certainly were not talking about even tens of requests per second in the CGI version.
> the benefit lies in the fact that the app can keep state and resources between requests.
You can share state between processes using temporary files/shared memory. Like CGI itself, it doesn’t scale as well, but it’s not even a blip on resource usage unless you’re dealing with thousands of requests per second.
> In particular, it can maintain a database connection all the time (with a retry logic for the case when it gets disconnected).
You can run a daemon that performs the connection pooling for you, and have each request connect to the daemon instead.
Temporary files/shared memory won't solve the cost of starting up a Python interpreter or a JVM or whatever and importing all your code with its dependencies. As for the connection pooling daemon, you still need to connect to it. It might be faster than talking directly to the database on a different server, but no new connection at all still beats a new local connection.
> Temporary files/shared memory won’t solve the cost of starting up a Python interpreter or a JVM or whatever and importing all your code with its dependencies.
This is true: CGI is sensitive to startup latency. It can be addressed with pre-forking, but at the cost of increased memory usage when idle.
> As for the connection pooling daemon, you still need to connect to it. It might be faster than talking to the database directly on a different server, but no new connection still beats new local connection.
It's a nothingburger. A connection pool's bottleneck is the TCP connection to the database. A UNIX socket has orders of magnitude less latency (a couple of μs) and supports orders of magnitude more throughput (a few GB/s). Compared to spawning new processes running interpreters, it's noise.
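A self-contained sketch for measuring that UNIX-socket round trip yourself (the echo server stands in for a pooling daemon; the socket path is arbitrary):

    package main

    import (
        "fmt"
        "net"
        "os"
        "time"
    )

    func main() {
        path := "/tmp/bench.sock"
        os.Remove(path) // clean up any stale socket file
        l, err := net.Listen("unix", path)
        if err != nil {
            panic(err)
        }
        go func() { // trivial echo server standing in for a pooling daemon
            c, _ := l.Accept()
            buf := make([]byte, 1)
            for {
                if _, err := c.Read(buf); err != nil {
                    return
                }
                c.Write(buf)
            }
        }()
        c, err := net.Dial("unix", path)
        if err != nil {
            panic(err)
        }
        const n = 10000
        buf := make([]byte, 1)
        start := time.Now()
        for i := 0; i < n; i++ { // one byte out, one byte back
            c.Write(buf)
            c.Read(buf)
        }
        fmt.Printf("avg round trip: %v\n", time.Since(start)/n)
    }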
Some context: in 2017, I found by informal testing that I could run (just on a dev laptop) about 180 lua no-op programs per second, 140 perl5, 35 py2, 30 py3, 15 node, 12 ruby; /usr/bin/true, as a benchmark, ran at a bit over 500 Hz.
Doesn't necessarily extrapolate back another 20 years, but perhaps sheds some light on how "slow" CGI would've been.
I would also try benchmarking the new io_uring_spawn interface, which is said to be significantly faster than the traditional fork+exec way of spawning processes.
Spawning a process is not that slow on modern hardware. CGI is considered slow, because back in the 90's, everything was slow and resource constrained. Slow processors, limited RAM, slow disks for swap, slow internet connections... I remember developing Perl CGI scripts on a Sparc 10 with 32 megs of RAM connected to a T1 that was nearly always maxed out.
On a related note, "modern" cloud-based application development strategies (like AWS Lambda) seem no faster than CGI. We've come full circle. Though now things are more distributed, of course.
It might interest readers to know that the Fossil SCM's primary mode of operation is CGI, and has been since it began life in 2007: https://fossil-scm.org/home
When i first discovered fossil, in December 2007, its ability to run as a CGI was one of its two "killer features" for me, and it's still right at the top of its list of killer SCM features for me.
(Disclaimer: i'm a long-time fossil contributor, but its CGI support pre-dates my joining the project.)
I remember CGI very fondly. I'd still use it to this day in my personal projects if it was not for the fact that I'm running vanilla Caddy 2 that comes without support for it.
Now, CGI and FastCGI are cool, but who else here built CGIs for pre-OS X Macs? I remember AppleScript/AppleEvents-based CGI. The web server would be a GUI program running on the Mac, and the CGI would be one too. They'd communicate with one another by sending AppleEvents. I once made good money with a system for running school exams built on it.
I still use FCGI. FCGI works a lot like CGI; it loads programs when a request comes in, and the programs deal with the request. But it's much higher performance, because it doesn't have to freshly load the target program each time.
FCGI is an orchestration system. It starts up more worker processes if there are more requests, up to some limit it figures out by itself. If there are few requests, some of the worker processes get an end of file notification and exit. If a worker process crashes, it loads a fresh copy. Little or no human attention required. One of my systems has been up for 1788 days now.
FCGI works fine with Go programs, Python programs, etc. If you use Go, you can get performance reasonably close to what the hardware can do. Until you get to the point that you're renting additional servers as load increases, you don't need more than that for most applications.
You can run FCGI on some cheap shared hosting systems. Why pay more?
I'm really interested in this. How do you handle DB connections? I guess each FCGI program becomes stateful because it's not reloaded all the time, and there's a connection handle there.
Each FCGI program has a database connection. That database connection persists as long as the program instance is running. So it's not establishing a new database connection for each transaction.
    package main

    import (
        "database/sql"
        "net/http"
        "net/http/fcgi"
    )

    // db would hold the persistent database connection described above,
    // opened once at startup and reused across requests.
    var db *sql.DB

    // FastCGIServer handles the requests the web server passes to us.
    type FastCGIServer struct{}

    //
    // Called by FCGI for each request
    //
    func (sv FastCGIServer) ServeHTTP(w http.ResponseWriter, req *http.Request) {
        body := make([]byte, 5000) // buffer for body, which should not be too big
        if req.Body != nil {
            n, _ := req.Body.Read(body)            // body of HTTP request
            bodycontent := body[0:n]               // take correct part of buffer
            Handlerequest(sv, w, bodycontent, req) // handle request
        }
    }

    // Run FCGI server
    func main() {
        sv := new(FastCGIServer)
        fcgi.Serve(nil, sv) // nil listener: accept FCGI connections on stdin
    }

    // Handlerequest -- handle one request from a client
    func Handlerequest(sv FastCGIServer, w http.ResponseWriter, bodycontent []byte,
        req *http.Request) {
        // GENERATE REPLY CONTENT HERE
        statuscode := http.StatusOK
        content := []byte("hello from FastCGI\n") // placeholder reply ***TEMP***
        w.WriteHeader(statuscode)
        w.Write(content)
    }
That will run on low-end hosting at Dreamhost.
So there you are, in a modern language, with hard-compiled code with good performance, and reasonably good fault tolerance. Next step up would be a load balancer and redundant databases.
I really like Haserl (https://haserl.sourceforge.net/).
It acts as a middleman for CGI scripts and makes it easy to write them in a shell or Lua while keeping the output looking like an HTTP response with an HTML body.
"""
Haserl is a small cgi wrapper that allows "PHP" style cgi programming, but uses a UNIX bash-like shell or Lua as the programming language. It is very small, so it can be used in embedded environments, or where something like PHP is too big.
It combines three features into a small cgi engine:
It parses POST and GET requests, placing form-elements as name=value pairs into the environment for the CGI script to use. This is somewhat like the uncgi wrapper.
It opens a shell, and translates all text into printable statements. All text within <% ... %> constructs are passed verbatim to the shell. This is somewhat like writing PHP scripts.
It can optionally be installed to drop its permissions to the owner of the script, giving it some of the security features of suexec or cgiwrapper.
"""
> No dependencies or libraries to install, upgrade, maintain, test, and secure.
So this program won't actually do anything useful unless you write it all yourself.
> Zero external configuration, other than telling your webserver to enable CGI on your file.
So now you have an external webserver dependency, and you also have to configure it. Configuring Apache for CGI securely was no fun unless you were an expert webmaster.
People really put their rose-colored glasses on when they reminisce about this stuff. But there's a reason why we don't use it anymore. We don't need to make up nonsense about why it was so great.
With all due respect, I think you missed some points there.
> > No dependencies or libraries to install, upgrade, maintain, test, and secure.
This is mentioned as opposed to what we have today with Node.js, Pip and so on. Old CGI programs surely had dependencies. But dependency management was dramatically simpler. Of course there are pros and cons in the comparison. But simplicity is the key reminiscence here.
> But there's a reason why we don't use it anymore.
However I wonder: are all those reasons still valid? To what extent were they motivated by rational decisions rather than pure trend-following? We shall not fool ourselves: not every technical decision we make is grounded in reason. I suspect most of them are not.
> So you don't actually want this program to do nearly anything useful unless you write it all yourself.
I think the idea is that there's no required dependencies. Unless I misunderstand, you can take on whatever dependencies you want from your chosen language.
You can use Flask with CGI too ;)
I did this last month for a website that only needed to live for a few weeks, so it could piggyback off an existing Apache setup and require less work to set up and run.
I found CGI combined quite well with bleeding-edge JavaScript: async/await (compared to ES6 promises, which left me confused as a JavaScript newbie).
I am more interested to know: in which situations is it a bad idea to use CGI?
In the old days (1990s), it was a "bad idea" to use CGI for heavily trafficked sites because there was a lot of system overhead involved in spawning a new CGI process for every request, firing up an interpreter (usually Perl), parsing the code, then executing it.
As one of the previous comments mentions, operating systems do a better job of caching now, so there's less overhead there. FastCGI also gets rid of that problem by running your CGI script as a single persistent process.
CGI may not be the best solution for really huge applications that have hundreds of routes or very complex business logic.
That said, it's still incredibly useful. I use it on some personal sites, and it's nice to know that I can drop it into any new site without having to worry about dependencies and without having to do much configuration. It's simple and it works. Those are two big pluses in a world where things have gotten so complex.
There is something elegant about PHP and CGI. Just put a file somewhere and query it from a browser.
How would you fix the design of PHP and CGI to be scalable?
I wrote a userspace scheduler that multiplexes lightweight threads onto kernel threads. It is a 1:M:N scheduler, since there is a scheduler thread that preempts the other threads' loops. I think I could write a runtime that uses files. I also have an epoll server. If I merge them together and write an HTTP router, then in theory you could have a modern server runner that executes in-process with threads.
I would need to think about how to execute Ruby, Lua, or JavaScript code in a thread. Perhaps that's similar to Node.js, but as a file runner rather than a server application.
>How would you fix the design of PHP and CGI to be scalable?
A lot of sites get pretty far with just FastCGI. If that's not enough, then a user-space/mmap cache like APCu. Then, if that's not enough, horizontal scaling and Redis or similar.
I developed a need recently to gather some answers to a few simple questions from people all around the internet. Maybe a few hundred at most.
I don't want to make a Google survey or a Doodle or any of that. It's not for me, not off the clock anyway.
I built a simple HTML form to start with. Just typed it out like it was 1997 again (and then added a little Javascript so it remembers that you've submitted the form to soft-block people from submitting more than once; and some CSS to make it look a little nicer and work on phones and such).
Then I thought I had to build a small backend with a database... I'd originally planned to do a simple CGI thing that stuck the form in SQLite or whatever, but then, when I was doing the frontend, I saw that the web server that I use to serve the file actually puts the entire query string in the access log on disk.
So that's it. That's the backend. I can just grep the log file when I want the answers.
I love CGI and still use it all the time. It's naturally MVC too, which is great.
Someone should make a framework like Electron or Tauri or NeutralinoJS that works like CGI. Neutralino is probably the closest but... it could be simpler. Just push everything over a pipe.
> Someone should make a framework like Electron or Tauri or WebKit that works like CGI. Don't need to use TypeScript or Rust or whatever. Just push between backend and frontend over a file stream.
I’d argue that already exists in the form of your $SHELL.
You can get graphical UIs in shells, albeit usually TUIs. But there's nothing stopping someone from creating a more "attractive" UI using sixel or even embedding web components via proprietary ANSI escape sequences.
Personally, though, I don't think HTML and JavaScript are a particularly nice abstraction for writing UIs. But to each their own.
CGI is fine for infrequently accessed pages on a machine that has some other purpose. Say, a stats/monitoring page on a file server, or something similar.
[FastCGI] keeps most of CGI's interface simplicity, while keeping the worker processes running instead of starting a fresh copy every time. It's very widely used to this day.
CGI can be used for high-performance sites/routes when combined with HTTP caching (see the sketch below). Though CGI leaves response time on the table by taking arguments via the environment, such that you can't start/pool CGI processes before requests arrive.
FastCGI needs multiplexed transport of HTTP traffic into and out of a single process, changing code complexity drastically and taxing memory management. It's far from keeping CGI's simplicity.
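A sketch of the caching idea mentioned above, reusing Go's net/http/cgi from earlier (the header values are arbitrary examples):

    package main

    import (
        "io"
        "net/http"
        "net/http/cgi"
    )

    func main() {
        cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            // Let an upstream cache (or the browser) reuse the response,
            // so the per-request process spawn only happens on cache misses.
            w.Header().Set("Cache-Control", "public, max-age=300")
            w.Header().Set("Content-Type", "text/html")
            io.WriteString(w, "<h1>regenerated at most every five minutes</h1>")
        }))
    }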
I once got a term sheet on the basis of a "website" which was a quick-n-dirty lash-up of CGI, cron, and email. Low-code can be done not only with new tech but also with old.
IIRC from when I wrote a basic HTTP server, the problem with CGI is that you have to store the output of the script in memory in order to insert the Content-Length field value in the HTTP response header. It seems like a rather inefficient operation, especially if repeated a million times on a server. Otherwise you could just read from your CGI process in a non-blocking manner, writing back to the socket whenever output is available. Am I correct?
Use transfer-encoding: chunked, and then send the data in, say, 64 KB chunks. You need to state the size of each chunk in its header. Even then, that's an optional good practice, not required by the RFC.
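For illustration, Go's net/http does this automatically: if a handler starts writing before any Content-Length is set, the server switches to chunked encoding, so nothing needs to be buffered (a minimal sketch):

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    func main() {
        http.HandleFunc("/stream", func(w http.ResponseWriter, r *http.Request) {
            f, ok := w.(http.Flusher) // lets us push each chunk out immediately
            for i := 0; i < 5; i++ {
                fmt.Fprintf(w, "line %d\n", i) // no Content-Length: chunked encoding
                if ok {
                    f.Flush()
                }
                time.Sleep(time.Second)
            }
        })
        http.ListenAndServe(":8080", nil)
    }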
I've been doing a good bit of research and planning for self-hosted Functions as a Service (also marketed as Serverless). Now, right off the bat, setting up one of the frameworks in kubernetes to accomplish this is going to be more complicated than setting up a cgi-bin capable system. So I wouldn't quite recommend self hosting it for a personal project.
But if someone else set it up, or you're using one of the cloud ones, FaaS sort of reinvents the cgi-bin mindset. Here is a talk by one of the Fission.io devs showing the similarities.
I would occasionally see dynamic web pages with .exe extensions before the 2000s. I assume Windows IIS was also CGI-capable and people were building dynamic web handlers in whatever language compiled down to an executable.
It makes perfect sense but always felt out of place to me for some reason.
I recall eBay used to have ISAPI.dll in most of its request URLs. It bewildered my younger self who thought DLLs were only for a Windows machine, not a web server.
A previous employer of mine built a CMS for newspapers, written in Delphi, that ran on Windows under IIS; it too had a .dll extension. I believe it was the same IIS API (ISAPI) that made it work. It was an insane product that had simply failed to move forward.
It's not that you can't do what you need to do using ISAPI, but the ecosystem is just too small for things like the Delphi stuff, and keeping up with customers' expectations becomes too much work.
It's impressive that anything that could produce the correct DLL could be leveraged by it.
But I assume ISAPI didn't have anything like CPAN or pip. It seems likely you'd have to write everything yourself - perhaps worthwhile if you were a very large corporation with the manpower. But something as simple as PHP would swamp it for the rest of us.
IIS was and still is CGI-capable, but starting a process on Windows is a very heavyweight operation (compared to starting a process on Linux), so it doesn't scale well and there's a noticeable initial delay on every execution. To solve that, MS made ISAPI, which runs inside the webserver, similar to an Apache httpd module.
In the CGI RFC, IIRC, they say that what they call a script can totally be a binary. The fact that execve on Linux can take a script with a #! shebang makes it easy to implement: you don't have to care whether you are exec'ing a binary or a script.
I started implementing a FastCGI client library for Racket, and (though I have experience getting into the bytes and bits of protocol implementation) it seemed harder than it needed to be. So I decided to try SCGI first, and SCGI ended up working great in production.
You don't even need CGI. Just write your programs to tail the webserver access log file and generate the appropriate pages/changes. :) I'm not making fun. I actually do this to implement my comment system.
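A toy sketch of that idea in Go: follow the access log and react to hits on a magic path (the log path and URL pattern are made up, and a real log parser is left out):

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "os"
        "strings"
        "time"
    )

    const logPath = "/var/log/nginx/access.log" // assumption: your server's log

    func main() {
        f, err := os.Open(logPath)
        if err != nil {
            panic(err)
        }
        f.Seek(0, io.SeekEnd) // start at the end, like tail -f
        r := bufio.NewReader(f)
        var pending string // holds a partially written line across reads
        for {
            chunk, err := r.ReadString('\n')
            pending += chunk
            if err == io.EOF {
                time.Sleep(time.Second) // wait for the log to grow
                continue
            }
            line := pending
            pending = ""
            if strings.Contains(line, "GET /comment?") {
                fmt.Print("new comment hit: ", line) // regenerate the page here
            }
        }
    }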
"If I'm ever in the situation again of helping new people learn web technology then I'm going to get or convert them to use CGI right off the bat. It's easier to teach, it's easier to understand, easier to get working on most webservers and isn't locked in to any particular language or framework.
The only downside of CGI that I know about is the fact it starts a new process to handle each user request. Yes that's a problem in big sites handling hundreds or thousands of visitors per second. But by the time a new student gets to running a big site they will have already encountered many, many other scalability issues in their code and backend/storage. Let alone teaching them database and security concepts. There's a reason we have quotes like "premature optimisation is the root of all evil".
I don't think students new to webdev should be started on anything other than CGI. They can use any language they want. They can actually understand what they're doing. And they're not hitting any artificial barriers or limits set by frameworks or libraries."
I wrote a web app that launched from inetd to control my cd player. It worked great but wasn't super practical since I tended to use it from the same computer that it ran from. Everything web was fun in the 90s though.
As someone who wrote a large, commercial social network for schools in Perl + CGI.pm in ~ 2000, here were the problems:
* Perl was very memory-inefficient, so you could often only serve 4 users at the same time given the limited RAM available (e.g. 512 MB - 1 GB). Note that Perl's speed was not a problem. I suspect this one has probably gone away with modern servers with massive amounts of RAM, plus the cloud allows you to quickly scale up and down with demand. Nevertheless, using a compiled language would be better. (I later wrote a vastly more efficient framework in C.)
* Every action on the page had to round-trip to the server, and always involved a database check (at minimum to map the user's cookie to a user ID). This is sort of solved now that JavaScript is widely supported, although that brings its own issues along. Also, memcached neatly solves the database access problem; our website was developed a little before memcached.
* It's very clumsy and time-consuming to write any non-trivial CRUD-style action using CGI. E.g. just having a table with update/delete buttons involves writing paging code, an update form, an update submit handler (repeat for every possible action). This is the kind of thing that Ruby on Rails solved quite nicely.
* Organizational problems interfacing with the database. We had to negotiate with the DBA for every schema change, which is a problem when the DBA is a sociopath supported by management. Devops sort of solves this (but also I appreciated not being on call).
* The general hassle of dealing with the web, like setting all the non-obvious HTTP headers to make it secure. I think modern frameworks just deal with this, although I've not used them very much.
However if I was to go back and change anything, it wouldn't have been to change the technology. It would have been to extend the service first to university students and later to the whole world :-)
"Perl was very memory inefficient so you could often only serve 4 users at the same time, given the limited RAM available (eg. 512M - 1G). Note that Perl's speed was not a problem. I suspect this one has probably gone away with modern servers with massive amounts of RAM. Nevertheless using a compiled language would be better. (I later wrote a vastly more efficient framework in C)."
I cannot even imagine the strange complexities of this circa-2000 web application if it could only fit 4 simultaneous page loads in 1 GB of RAM. It's baffling, in fact. Perl isn't really memory-hungry these days. Was it really that bad 20 years ago? I would be surprised if the Perl 5 interpreter wasn't in fact much leaner back then. My immediate impression of this bullet point is that it wasn't a programming language problem but a software design problem.
It was definitely a software problem. A small Perl app would take a few tens of MB max back then (and now).
Also, Perl FCGI existed well before 2000 (looking at CPAN, ~'96-97 appear to be the first versions), although obviously the code would be a bit more complex than just "dump HTML on stdout".
Well, it depends what you're doing, of course. Part of the application was an email interface, which I remember was a particular problem - it was likely loading whole emails into memory to do MIME parsing. But the point here is that large, slow processes gradually take over your system, crowding out smaller and simpler requests. We had our own cage in a Docklands data centre, so "spinning up another server" was a weeks-long process involving many people.
In modern terms Perl isn't a problem, especially when you have servers with hundreds of gigabytes.
I can safely say that it's not really an immediately impeding problem with 1 GB of RAM either. I have a bit of experience in this context, as I develop and manage a very large Perl software stack that runs on a fleet of servers ranging from itty-bitty to small/medium. Some of it is inefficient CGI, some of it is FastCGI, some of it is resident in the background, and some of it fires up on intervals. All of it is heavily trafficked.
But, certainly, I'll concede that if the platform is completely CGI-based and for some reason is written such that there's a minimum footprint of 50-100 MB for any single invocation, then the servers will need a bit of memory in lieu of a software rethink.
One of the strengths of building web apps in PHP is that you have this nice "Every pageload starts from scratch" system. Much easier to develop and reason about.
It's a pity that in Python, long-running threads are the norm.
I hope the Python ecosystem can slowly convert to a CGI style approach.
Nothing prevents you from spawning a new worker in every endpoint.
It's a nice way to orchestrate long-running workers, actually: the main process exists exclusively to keep the app up, talk to gunicorn/uvicorn, and register the PIDs of worker processes in a queue.
But we're all in on WSGI; I don't see that changing.
I first saw PHP around that time, too, and you could see it set up on CGI on some hosting providers; I think it was usually older ones who had Perl originally and then just tacked PHP on to follow demand.
If all the tools you need are in the shell, and you don't need to maintain state, CGI is extremely simple and powerful: all you need to do is write a shell script.
I found this useful when I created an HTTP service to check a remote service's status over SSH.
I felt exactly the same as the author. That's why I created https://trusted-cgi.reddec.net/ (currently on the way to v1 and under heavy refactoring).
I built everything from CGI guestbooks to webrings in the '90s, but IMO the sweet spot for solo projects in terms of time to market and scalability/stability is the humble combo of Flask & Caddy.
Is there a way to maintain a sort of process pool so CGI doesn't have to start a new process with each request? It's also important that a CGI crash doesn't kill the web server.
It is, since C has horribly deficient string handling facilities in its stdlib. But a compiled language is not necessarily C.
And, well, simplicity can be relative: if you already know language X, using it will likely be simpler for you than some other better language that you would have to learn first. The beauty of CGI is that its simplicity makes it a language-agnostic API, letting you pick the simplest language for yourself.
We had a CGI program written in C in production in 1999 or thereabouts. It worked fine. The problem was that there was only one person who could understand it and update it.
It was ditched at some point in favour of PHP, I think.
I don't necessarily find that adding a compiler step defeats the simplicity, if simplicity is the goal. It might be easier to use a language with better string handling than C, though.
> Personal projects remain unfinished or abandoned because I’ve wanted to do things the “right” way. So I start building it the “right” way with complicated tools and processes, then raise my hands and think it’s all too much for what I want to do. It puts all sorts of things out of reach.
> Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. If you do, you'll just overdesign and generally think it is more important than it likely is at that stage. Or worse, you might be scared away by the sheer size of the work you envision. So start small, and think about the details. Don't think about some big picture and fancy design. If it doesn't solve some fairly immediate need, it's almost certainly over-designed. And don't expect people to jump in and help you. That's not how these things work. You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project.
A quote from Linus Torvalds someone posted on HN and I saved almost a year ago
"A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system."
The thing that nobody tells junior programmers, but which you really have to pick up from experience, is this:
"The right final architecture" is never achieved by just immediately going out and building that architecture, i.e. by hooking up all the tools required to support that architecture. That's cargo-culting the architecture.
Facebook's use of Cassandra and CI lint-checks and blue-green deployments is just like military cargo planes' use of radio towers — they didn't build those first; they scaled the thing they were doing to the point that these things became necessary support structures, and then they built them.
The "right way" — the right process for engineering a solution — has very little to do with up-front architectural design. The "right way" — the way that'll be most likely to get you to that "right final architecture" eventually — is really the tenable way: the iterative approach that allows you to build up your solution while keeping only one change or consideration in your head at a time. Which means that engineering "the right way" involves not doing all those cargo-cult practices unless/until they become necessary, and even then, only adopting them one at a time. Just like you wouldn't try to make ten different refactorings in a codebase in one patch-set.
Or, to put that another way: YAGNI applies to processes and tools just as much as it does to code. Some projects never exceed 1000 lines. Do those projects need CI cyclomatic-complexity checkers? No.
Only introduce support structures to a project as the pain of not having them starts to outweigh the pain of adding them.
> Facebook's use of Cassandra and CI lint-checks and blue-green deployments is just like military cargo planes' use of radio towers — they didn't build those first; they scaled the thing they were doing to the point that these things became necessary support structures, and then they built them.
This is so true. In fact, a lot of the technology choices companies at the scale of Facebook use are only necessary because they're so large/popular, and were added later to stop everything falling over.
Because it's so unlikely your project will ever need to contend with a billion users or whatever, you really shouldn't be designing under the assumption that's likely to happen.
I feel this. The other day I decided to just use Rails for one of these personal projects that I’ve been over-engineering for a while and it’s making progress.
Completely forgot how fast Rails makes everything. Based on the experience I’m probably going to force myself to just use Rails for any personal stuff now and figure if any of them actually take off I’ll rebuild if I need to.
This.
Find a tech (NOT the fotm shenanigans) that will help you get a product up and running. Then you can always go back and swap out parts as needed.
Try not to be influenced by the "omg, that tech is SOOO old!!!" crowd.
This is the exact opposite of the point the comment you replied to was making. Rails is not simple. We are talking about SIMPLE, like CGI. Rails is a classic example of a slow, bloated framework.
Rails makes a lot of very complicated things, very simple.
Doing what I'm trying to do with CGI would be a huge headache by comparison. When I said "how fast rails makes things" I meant from a standpoint of productivity.
Rails will be significantly faster than a CGI script that spawns a whole new process, starts up your script interpreter, etc. with every single web request.
Yeah, I like the analogy of a march, as a contrast to the sprints typically employed in commercial development. Like my search engine project is 2 years in now, and still not anywhere near "done". Yet it keeps getting better and more capable at a slow but steady pace.
What's funny is this is how Agile works in a functioning organization. You validate the idea first and quick, and only then iterate with new layers of complexity. Building something the "right" way before ever having seen the product live is called waterfall.
100%. I finish projects with Rails... Can't say that about things I start in Go with Clean Architecture and a distributed frontend written in SvelteKit, etc.
That could also be reduced down to "JUST USE RAILS".
Why not get the best of both worlds where you finish projects AND be happy about it? The "for now" makes you think it's a bad decision and you hacked something together to get it done.
But you can finish, stick with Rails and be insanely successful based on what's important to you (start, finish, maintain, profit, IPO, etc.).
Some of the "right" tools are right only in context of a professional team anyway. Especially true when deciding how to organise a project.
If it is just me, for me, I will write it simply and from scratch or use a familiar general purpose framework like rails or laravel. One I have lots of experience with so the project actually gets done, instead of spending all my energy learning another new tool.
This brings to mind some of Paul Virilio's writings, which discuss this idea of speed (including not only velocity but also the sensory assault that comes with complexity and speed combined). This has been paraphrased as "speed is a continuation of fear by other means." How many framework decisions are made in this mindset?
I find it kind of interesting that people frowned upon CGI because it was not efficient. And yet here we are using lambdas on AWS like it's the best invention ever.
If there is another request "in the queue", a lambda that is done with a request will just pick that one up instead of shutting down and letting a new lambda spin up to deal with the next request. So as soon as you have enough traffic that you have $CONCURRENCY_LIMIT lambdas running (otherwise you would never build up a queue, of course), the execution model of lambdas approximates FastCGI much more than regular CGI.
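That FastCGI-like shape is visible in the handler model itself. A sketch using the real github.com/aws/aws-lambda-go package (the cache is a made-up stand-in for any per-instance state, like a DB connection):

    package main

    import (
        "context"

        "github.com/aws/aws-lambda-go/lambda"
    )

    // Package-level state survives across warm invocations of the same
    // lambda instance, much like an FCGI worker's persistent state.
    var cache = map[string]string{}

    func handle(ctx context.Context, name string) (string, error) {
        if v, ok := cache[name]; ok {
            return v, nil // served from the warm instance's memory
        }
        v := "hello " + name
        cache[name] = v
        return v, nil
    }

    func main() {
        lambda.Start(handle) // one instance serves many requests in sequence
    }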
Personally I think the enthusiasm for lambdas stems from several things:
- A (legitimate) interest in outsourcing non-essential things like server maintenance to somewhere else with more ops experience. This frees up the devs to work on business logic instead of worrying about backups and log rotations.
- Some people are REALLY enthusiastic about the whole scale-down-to-zero thing. I personally think that that is somewhat shortsighted, since it means trading (expensive!) engineering time to save a few hundred USD per month on your AWS bill. Sometimes it is definitely worth it to cut costs, but usually it's engineers overoptimizing the one aspect of profit they have significant influence over (i.e. hosting costs).
- Some people are very enthusiastic about how you can instantly get to "web scale" with lambdas by just putting a very large max concurrency number. That is somewhat true, but massive overkill for the vast majority of companies. It's "engineer it like Google" syndrome but with Amazon instead.
No web dev here, but if memory serves, the main concerns about CGI were about safety, not performance or efficiency. If that's the case, I wonder if newer, safer languages could make its use viable again.
One issue was that people configured CGI wrong. You had executable scripts that had free rein over the entire filesystem. Ideally you'd want to limit the CGI scripts to a chroot, but that made it difficult to use Perl, Python and PHP. OpenBSD shipped/ships a chrooted Perl for use with CGI, but other operating systems just left that bit as an exercise to the programmer.
More modern languages like Go or Rust provide a lot of safety and are pretty easy to statically compile, so they're easily chrooted. It would however be weird to do a CGI program in Go, because that language already has a really good built-in webserver and more powerful, more easily accessible features for interacting with requests than what CGI provides.
> It would however be weird to do a CGI program in Go, because that language already has a really good built-in webserver and more powerful, more easily accessible features for interacting with requests than what CGI provides.
Noting that not everyone has the ability to run standalone servers at their web hoster. Shared hosters generally enable CGI but cannot support per-user standalone servers, in particular not on common ports like 80 or 443.
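It doesn't have to be either/or, though. Here's a sketch of a Go program that serves the same handler via CGI when launched by a webserver (the CGI spec sets GATEWAY_INTERFACE) and as a standalone server otherwise:

    package main

    import (
        "fmt"
        "net/http"
        "net/http/cgi"
        "os"
    )

    func hello(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "hello from", r.URL.Path)
    }

    func main() {
        h := http.HandlerFunc(hello)
        if os.Getenv("GATEWAY_INTERFACE") != "" {
            cgi.Serve(h) // launched as a CGI child by the webserver
        } else {
            http.ListenAndServe(":8080", h) // running standalone
        }
    }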
With people reaching for kqueue, epoll and io_uring to handle the C10K problem, it seems that spawning processes wouldn't be a great idea, what with the latency and memory overhead of spawning itself, as well as the meagre limits on the number of concurrent processes in a system.
What safety concerns are you referring to? CGI scripts often had issues with SQL injection and shell command injection, but that was because of badly designed (or poorly used) libraries and unrelated to CGI itself. Or if CGI programs were written in C (as a result of performance concerns) those would of course have the usual memory safety issues of C programs.
You could write this off as a library implementation problem, but it comes about because of unfortunate mapping of HTTP parameters to well-known environment variables in CGI, and has tripped up multiple library authors.
Usually poor ops hygiene, really the same as the PHP problems:
* Code runs as the web server user
* Code is deployed as the web server user, because the web monkey doesn't understand unix permissions (or alternatively chmod 777 everything for the same effect)
* Code gets hacked; the attacker can modify any file, so they just leave a backdoor in the code
If you deploy code as a different user, the attacker can still of course steal all the data, but at the very least they can't modify the code that is running.
On top of that, a lot of poor security practices when it comes to quoting stuff. Perl had a kinda interesting feature regarding that, where in a special mode (recommended for apps dealing with untrusted inputs) a variable was considered tainted[1] till something "cleaned" it by passing it through a regexp. So, say, taking an argument directly from the environment (so headers, for the web) or stdin and passing it with no processing to system() would trigger it.
I had to click and read the article to see whether CGI in this context was "Computer Generated Imagery" or "Common Gateway Interface". I've always been a fan of simple CGI interfaces. More middle-ware always means more bugs.