> I had the chance to stumble upon this beautiful article, titled: my website is one binary.
> It was an idea crazy enough (in a positive way) to try it myself in C.
Doing it in C is pretty crazy, but it's pretty easy to do in either Go or Rust. No more difficult than writing a website in any other language. It's a great option if you want a high-performance (or low resource utilisation) website but with a little more dynamism than a static-site generator will easily allow.
If you want to do something like this, I'd highly recommend using a package like rs-bindata[0] (Rust) or go-bindata[1] (Go) to essentially embed the static data into your .data section automatically, rather than having to paste it into constants. (Of course, writing it out manually in source code does exactly the same thing, and that's all these packages do, albeit in a preprocessing phase, but this way is less clunky, and I suspect more performant and footgun-free than your own implementation is likely to be.)
It's possible that you could even use this to enumerate all the files in some directory, such that you wouldn't require source code changes in order to add a new post - you'd simply need to run the preprocessor again in a different build environment.
Also, in the spirit of the post: it should be relatively easy to implement this oneself with a build script in Rust. In Go I don't think it's as easy, and iirc the author of the package uses the semi-public 'build tags' API[2] (which always gave me a sense of being grudgingly released, in one of the countless "we core developers mustn't allow the unwashed hordes of lowly application developers to have these dangerous, powerful tools" fights that seem to characterise Go development).
Embed is fairly new; not new this release, but not that old. If you're not keeping up with every new release it is easy to not yet be aware of the new features.
But, yes, to samhw's issues, it is now trivial in Go and built into the standard library to embed arbitrary files and/or directories into a Go executable. I also have no idea what the "semi-grudging" comment about build tags is about; they've been there since the beginning and are heavily used by the Go standard library for many things. Go, of all languages lately, is the most comfortable with reserving things for the compiler/runtime; if they wanted to, they would have. It was always obvious that build tags were necessary and I don't see what's "grudging" about them at all.
Yeah, this is the right answer in my case. I wrote Go professionally for a few years, a few years ago, and I haven’t really touched it since. All my knowledge will definitely be a few years out of date.
The bindata package that samhw linked goes further by walking a whole directory and embedding it all, supplying a function that maps from pathname to file contents. Looks like it returns an Option<Vec<u8>> though [1] which means it must be copying the file contents each time. I don't know why they didn't just return a &'static [u8] instead.
There's another crate includedir which looks more popular. It also supports compression. But if the file is stored in compressed form, looks like it also will (decompress/)copy into a fresh Vec, whether you request the uncompressed or compressed forms. [2] That's not what I would like. I'd prefer (lazily or eagerly) decompressing once and keeping it around in RAM for next time. YMMV.
I could use a high-quality implementation of this same idea. Personally I don't get using it for a personal blog (I'd rather be able to change the content without recompiling/restarting), but on my todo list is producing a zero-dependency, single-binary form of software I'm working on, including its web interface. I might end up writing my own.
Oh nice, thank you! I’d like to pretend that I intentionally linked the crate for the reasons the other person gave, but to be perfectly honest I wasn’t aware of that. Much appreciated!
God, seeing C like this again throws me right back to college. I'm only in my 20s, but this feels nostalgic.
While I wouldn't do something like this for production (as most of us wouldn't), it's fun to do in C, especially just with the standard library. Feels like you have the entire world in your hands and you can do anything you want, as long as you don't segfault.
I know you can embed binary blobs into compiled executables. I also (often) use the trick to include a shell script header in Java JAR files to make them self executing (assuming Java is installed).
But, would it be possible to do something like this where new posts are appended to a binary? You have one executable, but then to update the content, new data is appended to that file?
I assume it’s possible, but I’m not familiar enough with executable formats to know where to start. I’m also not sure how this works security wise — but I again assume that you can update a file, even if the memory holding the code loaded from that file is protected from writes.
I've not used Windows for a long time, but in the .exe format it was possible to include additional data at the end of the file and read it during program execution. I am relatively certain writing was not possible while the executable was running, but I guess that could be bypassed by a secondary executable that does the writing and restarts the original. Not as nice, but I'm not sure anything else can be done. Maybe a nicer option would be to always copy the original executable and spawn the copy while writing to the original. The running copy could be keeping the data in memory anyway.
I like your project a lot and adhere 100% to your minimalist views. Using #include in this way is neat and I suspect it may be possible to further this 'brutal' approach with pre-processor tricks and struct permissiveness.
To help code legibility you could maybe put the socket functions in a separate file? Or even make it a lib. (Though I admit it may go against your compactness goal.)
Same for HTTP stuff.
This is how many sites were written in the mid-90s. I remember writing a whole webmail portal as a single C file in about 1995. It was really cool and pretty revolutionary and me and my friends used it a lot. I never even remotely thought about releasing it to the world. Then HoTMaiL came along and I thought "Oh." And then they got bought by Microsoft for $400m and I thought "Oh, dear."
An idea for going further still would be a blog, including a content management system, that is a single, self-contained, self-modifying binary. Write a new blog post, save it and the executable modifies itself to contain the new post and be able to output it.
I’ve been playing around with wrapping SQLite in a similar way for applications. Suppose folks would call it “low code/no code” but I find the title a bit cringe.
Just been crappy little experiments so far but I’ll prob push something out when I get some more time for it.
The Von Neumann architecture sounded good in the beginning when computers were hulking beasts and had to be timeshared to be economical. Scaling this to individual global-Internet-connected network PCs proved to be a disaster, as is evidenced by all the "expert C programmers" over the years that couldn't keep their buffer overflows in their pants, and by things like NX support in CPUs and pointer authentication that just wouldn't have to exist if the CPU fetched instructions from an address space 100% separate from its data (kinda like that old 8051 that's still kicking).
You are not executing data. You are just wrapping it up in your binary and "forcing it" into memory. Video games used to do this in the old days. I remember the Deltaforce exe being tens of megs. Is this a good approach? Probably not for a bigger site, but it can theoretically work for a small blog that is mostly text. Is C the right language to do this? It sure is a little bit masochistic to work with C strings.
At the end of the article I said this should be taken lightly, as an exercise in minimalism and a joke.
Most of my products lately are some backend servers that respond to remote requests from SPAs or various sister backends. All servers are written in modern C++ and each one is a single executable. The performance is stellar. Modern C++ along with some libraries helps to keep application specific code small - about the same size as if written in more "traditional" web languages.
Interesting choice. I'm now wondering whether the opposite approach might also be completely viable and surprisingly fast: it looks like the overhead of starting a process is single-digit milliseconds, so having a static binary that looks at PATH_INFO and returns content would work. Of course that's difficult to distinguish from just having a static site...
Even on older systems that approach could scale quite a ways. Things like Perl with an expensive startup needed to do fancy preloading and preforking, and it was easy to mentally attribute that to the cost of starting a process per CGI request, but the vast majority of the difficulty scaling that approach came from the incredibly heavyweight process of starting one of those processes. (Perl was convenient for CGI back in the day, but was a bad choice in almost every other way. And I'm not talking about syntax at all here, I'm talking about all the characteristics of its startup and execution model. Python has almost identical problems, for instance.)
If you've got a tiny little C program that doesn't do much DLL loading and runtime linking, is likely already in cache to be mmapped directly into a process, and is shoveling out content with little-to-no computation, you'll have a website that'll take an HN'ing without noticing, even on old hardware.
"A tiny little C program" worries me on other fronts, but there's a reasonable selection of things that are safe enough to put on the internet and still compile to single fast binaries that you could use in 2022. Alas, the options were not so nice in 2005.
I consider "C", and "C armed with the best of static and dynamic analysis tools" to be two different languages for this purpose. I freely acknowledge this is a personal opinion not necessarily shared or understood by the programming community as a whole. I'm not a fan of the latter, but it is at least acceptable. I prefer something where the base language is less of a mess, and my analysis tools can spend less effort cleaning up those obvious messes, but technically, it can work. The former should not be exposed to the internet. This particular article appears to be the former.
I didn't understand that quote. Why is CGI cheating? Because the author wants to do himself whatever it is that (e.g.) Apache does under the hood to run the CGI script?