Ask HN: How were video games from the 90s so efficient?
286 points by eezurr on Nov 6, 2021 | 232 comments
Title says it all. I am interested in discovering how games like Roller Coaster Tycoon, SimCity 2000, Warcraft II, and Descent (blown away by this one) managed to be created and function on computers that had a 500 MB HDD, 300 MHz CPU, and 4 MB RAM.

I'd even broaden the question and ask: how did Windows 95 stay so small?

Is it possible to recreate this level of efficiency on modern systems? I'm curious because I'm interested in creating simulation video games. Dwarf Fortress and RimWorld eventually both suffer from the same problem: CPU death.

If I create a window with C++ and SFML, 60 MB of RAM is used (not impressive at all). If I put 3,000,000 tiles on the screen (using Vertex Arrays), 1 GB of RAM is used (admittedly, that is impressive) and I can pan all the tiles around smoothly.

What other tricks are available?




I built games in the 90s. Graphics was obviously the hardest part.

We thought about things in terms of how many instructions per pixel per frame we could afford to spend. Before the 90s it was hard to even update all pixels on a 320x200x8-bit (i.e. mode 13h) display at 30 fps. So you had to do stuff like only redraw the part of the screen that moved. That led to games like Donkey Kong where there was a static world and only a few elements updated.

In the 90s we got to the point where you had a Pentium processor at 66 MHz (woo!). At that point your 66 MHz / 320 (width) / 200 (height) / 30 (fps) gave you 34 clocks per pixel. 34 clocks was way more than needed for 2D bitblt (e.g. memcpy'ing each line of a sprite), so we could go beyond 2D Mario-like games to 3D ones.
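For anyone who hasn't written one: a per-line sprite blit into a mode 13h style framebuffer looks roughly like this in C (a minimal sketch, nobody's actual code; the in-memory buffer stands in for the real framebuffer at 0xA000:0000, and clipping/transparency are omitted):

    #include <stdint.h>
    #include <string.h>

    #define SCREEN_W 320
    #define SCREEN_H 200

    /* stand-in for the VGA framebuffer at 0xA000:0000 in mode 13h */
    static uint8_t framebuffer[SCREEN_W * SCREEN_H];

    /* copy a sprite one row at a time; with ~34 clocks/pixel to spend,
       a straight memcpy per line leaves plenty of budget for game logic */
    void blit(const uint8_t *sprite, int sw, int sh, int x, int y)
    {
        for (int row = 0; row < sh; row++)
            memcpy(&framebuffer[(y + row) * SCREEN_W + x],
                   &sprite[row * sw],
                   (size_t)sw);
    }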

With 34 clocks, you could write a texture mapper (in assembly) that was around 10-15 clocks per pixel (if memory serves) and have a few cycles left over for everything else. You also had to keep overdraw low (meaning, each part of the screen was only drawn once or maybe two times). With those techniques, you could make a game where the graphics were 3D and redrawn from scratch every frame.

The other big challenge was that floating point was slow back then (and certain processors did or didn't have floating-point coprocessors, etc.) so we used a lot of fixed point math and approximations. The hard part was dividing, which is required for perspective calculations in a 3D game, but was super slow and not amenable to fixed-point techniques. A single divide per pixel would blow your entire clock budget! "Perspective correct" texture mappers were not common in the 90s, and games like Descent that relied on them used lots of approximations to make it fast enough.
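If you've never seen it, 16.16 fixed point looks something like this in C (a sketch only; the reciprocal table is just one illustration of the kind of divide-avoiding approximation, not any particular engine's code, and the 64-bit intermediates stand in for the 32x32->64 multiply you'd get directly in assembly):

    #include <stdint.h>

    typedef int32_t fix16;                 /* 16.16 fixed point */
    #define FIX_ONE (1 << 16)

    static fix16 fix_from_int(int x)       { return (fix16)x << 16; }
    static fix16 fix_mul(fix16 a, fix16 b) { return (fix16)(((int64_t)a * b) >> 16); }
    static fix16 fix_div(fix16 a, fix16 b) { return (fix16)(((int64_t)a << 16) / b); }

    /* precomputed 1/z, so a hot loop can do a lookup + multiply
       instead of an actual divide */
    static fix16 recip[1024];
    static void init_recip(void)
    {
        for (int z = 1; z < 1024; z++)
            recip[z] = FIX_ONE / z;        /* 65536/z == 1/z in 16.16 */
    }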


Agree with everything you said. We used x87 though and paid extreme attention to FPU stalls to ensure nothing wasted clocks.

As developers, we were also forced to give the graphics guys a really hard time: "no, that texture is too big! 128x128" and "you need to do it again with fewer polygons". We used various levels of detail in textures and models to minimise calcs and rendering issues, e.g. a tank with only 12 vertices when it would only be a pixel or three on screen. I think it only used 2x2 texels as part of a 32x32 texture (or thereabouts)...

This was around the mid '90s.


Ha. Yeah, there was not a lot of memory. The toolchain we built at the time automatically packed all individual game texture maps into a single 256x256 texture (IIRC). If the artists made too many textures, everything started to look bad because everything got downsampled too much.
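The idea, very roughly (a sketch of the concept only, nothing like the original toolchain; the halving loop and the 256x256 size are just taken from the description above):

    #define ATLAS_DIM 256

    typedef struct { int w, h; /* pixel data omitted */ } Tex;

    /* keep halving every source texture until the combined area fits in
       the single atlas; more art submitted -> more downsampling for everyone */
    void fit_into_atlas(Tex *tex, int count)
    {
        for (;;) {
            long area = 0;
            for (int i = 0; i < count; i++)
                area += (long)tex[i].w * tex[i].h;
            if (area <= (long)ATLAS_DIM * ATLAS_DIM)
                break;                        /* fits (actual rect packing not shown) */
            for (int i = 0; i < count; i++) { /* halve everything and try again */
                if (tex[i].w > 1) tex[i].w /= 2;
                if (tex[i].h > 1) tex[i].h /= 2;
            }
        }
    }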

And yeah, the design of the game content was absolutely affected by things like polygon count concerns: "Say, wouldn't it be cool to have a ship that's shaped like a torus with a bunch of fins sticking out? Actually, on second thought... How about one that looks like a big spike? :)"


A fantastic early example of memory limitations in games is that the clouds and bushes in Mario 1 are the same sprite with a different palette.

Those were certainly different times. :) It was so much cooler to see what developers and artists did within those limitations than what we are doing today. The entire game dev community was like a demoscene in a way.


> If the artists made too many textures, everything started to look bad because everything got downsampled too much.

That's really clever. On the surface of it, it's "just" about dynamically adapting fidelity to fit in the available memory.

But really, it's a deeper organisational solution: it pushes the breadth vs fidelity of assets trade-off back to the designers, who are the ones that should decide along that curve anyway. It provides a cheap way for them to visually evaluate various points on that trade-off curve. Very clever.


Honestly, though the numbers are bigger and floating point arithmetic is fast on a modern GPU, this still sounds a lot like how we work nowadays.

I recently spent two years on the performance team for a large indie title and a huge portion of it was asking artists to simplify meshes, improve LODs, increase the amount of culling we could do on the CPU, etc.

My own work was mainly optimising the rendering itself.


The 1990s were a time when we tried all sorts of tricks to get the last bit of performance out of the hardware. One book which stands out is the one Michael Abrash wrote:

Michael Abrash's "Graphics Programming Black Book" https://github.com/jagregory/abrash-black-book

https://www.drdobbs.com/parallel/graphics-programming-black-...

As CPU power, core counts, RAM sizes, HDD sizes, and graphics card capabilities have increased, developers are no longer as careful to squeeze out the performance.


I may be naïve/out of the loop here, but it’s fun to imagine what would be possible with the same gusto and creativity applied to today’s hardware. I imagine that a significant amount of modern hardware, even in games, is eaten up by several layers of abstraction that make it easier for developers to crank out games faster. What would a 90’s developer do with a 16-core CPU and an RTX3080?


You can check out some demoscene demos [0], which usually do this (albeit to save executable size instead of just to run fast). These days you don't even have to run them yourself; most have YouTube recordings.

[0]: https://www.pouet.net/prodlist.php?platform%5B%5D=Windows&pa...


Well, to be in the spirit of the parent, it should also try to use terabytes of storage to push storage to the limit.


You see a bit of this sort of optimization at the end of a game console's life cycle.

I expect the PS3 still has some headroom since few games adequately exploited the Cell processor's SPUs.

Since the Switch is underpowered compared to PS5/Xbox Series X/PC perhaps we'll see some aggressive optimization as developers try to fit current-gen games onto it.


I remember reading the comp.graphics.algorithms news group back when Descent was just out. People were going a little bit crazy trying to figure out how the hell that thing worked. I found this page that talks about some of the things done to do texture mapping: https://www.lysator.liu.se/%7Ezap/speedy.html


Descent blew my mind, as well. IIRC it predated Quake and was the first ’true’ 3DoF FPS?

The source code has since been released on GitHub, if you’re ever interested in seeing it!


6DoF, as there are six degrees of freedom. Three correspond to rotational movement around the x, y, and z axes, commonly termed pitch, yaw, and roll. The other three correspond to translational movement along those axes, moving forward or backward, left or right, up or down.


Yep, close. I think it was even more constrained than that though... https://news.ycombinator.com/item?id=20973306


I dabbled with graphics using mode 13h and later with VGA. It was orders of magnitude simpler than using Vulkan or DX12.

CPUs were simpler, and DOS and Windows 95 were very simple compared to Windows 10.

That means that writing optimized C or even assembler routines was pretty easy.

If we go 10 years back in time, programming Z80 or MOS Technology 6510 or Motorola 68k was even simpler.


Yes the instruction sets were simpler, but the developers at the time had invented a lot of clever solutions to solve hard problems.

I think the most innovative timespan was roughly 1950–1997, and I hope we get back to treating getting the most out of hardware as common sense.


Developers at that time were great. But it is harder to be great today when complexity is 1000x.


Some interesting related stuff in this talk:

HandmadeCon 2016 - History of Software Texture Mapping in Games

https://www.youtube.com/watch?v=xn76r0JxqNM

I think they say at one point it went from 14 to 8 instructions, and then the Duke Nukem guy (Ken Silverman) got it down to around 4.

Quake would do something where it only issued a divide every 8 pixels or something, and then only interpolate in between, and the out-of-order execution on the Pentium Pro (I think?) would let it all work out.


For perspective-correct texture mapping, Quake did the distance divide every 8 pixels on the FPU, and the affine texture mapping on the integer side of the CPU in between (you can actually see a little bit of “bending” if you stand right next to a wall surface in low res like 320x200).

Since the FPU could work in parallel with the integer instructions on the Pentium, this was almost as fast as just doing affine texture mapping.

This worked even on the basic Pentium.

It was likely also the reason Quake was unplayable on the 486 and Cyrix 586.
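In C, the span trick looks roughly like this (a sketch of the idea only, not Quake's code; the 8-pixel span and the 64x64 texture size are assumptions, and the real win came from overlapping the FPU divide with the integer affine inner loop, which a plain C version can't express):

    #include <stdint.h>

    #define SPAN 8

    /* perspective-correct u,v only every SPAN pixels; plain affine steps in
       between, which is where the slight "bending" near walls comes from */
    void draw_span(uint8_t *dst, int count,
                   float uz, float vz, float iz,     /* u/z, v/z, 1/z at left edge */
                   float duz, float dvz, float diz,  /* per-pixel gradients        */
                   const uint8_t *tex)               /* 64x64 texture (assumed)    */
    {
        float u0 = uz / iz, v0 = vz / iz;            /* one true divide per span end */
        while (count > 0) {
            int n = count < SPAN ? count : SPAN;
            float uz1 = uz + duz * n, vz1 = vz + dvz * n, iz1 = iz + diz * n;
            float u1 = uz1 / iz1, v1 = vz1 / iz1;    /* divide at the far end only   */
            float du = (u1 - u0) / n, dv = (v1 - v0) / n;
            for (int i = 0; i < n; i++) {            /* cheap affine inner loop      */
                *dst++ = tex[(((int)v0 & 63) << 6) + ((int)u0 & 63)];
                u0 += du; v0 += dv;
            }
            u0 = u1; v0 = v1;                        /* resync at the span boundary  */
            uz = uz1; vz = vz1; iz = iz1;
            count -= n;
        }
    }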


Great discussion! Early on, they mention what I think is the book Applied Concepts in Microcomputer Graphics. It sounds like it would be right up my alley, but I can only find it with very expensive shipping.

Does anyone know of a book like it? I'm very interested in getting started with software rendering from the ground up, mainly to scratch an intellectual itch of mine. I learn better from well-curated books than online material in general.


This brings back a lot of good memories. In addition to the cool tricks you mentioned, I recall the limit of 256 or 16 colors also created some innovative ways to use color palettes dynamically. The limited built-in PC speaker also pushed the boundary for sound fx and music implementations.


Out of curiosity, how did you know how many clock cycles your rendering code took?


You look at the assembly code, grab an Intel Programmer Reference manual (they were about 1000 pages), look up each instruction opcode and that would tell you the clock cycles. For memory operations it is much more difficult due to caching. However, for many hot regions of code, the data is already in the L1s so manual counting is sufficient. (At the time there was a book called The Black Art of ... Assembly? I can't recall, and I should be flogged for forgetting it... but it was directed at game programmers and covered all sorts of assembly tricks for Intel CPUs.)

Also, a little later in the 90's: VTune. When VTune dropped it was a game changer. Intel started adding performance counters to the CPUs that could be queried in real-time so you could literally see what code was missing branches or stalling, etc. Source: I worked with Blizzard (pre-WoW!) developing VTune, feeding back requirements for new performance counter requests from them and developers.


It was common back in the day for machines to ship with detailed technical documentation.

I spent many an hour as a young child reading the C64 programmers reference guide, calculating the op speed and drawing memory maps.

https://archive.org/details/c64-programmer-ref


What were you drawing memory maps for?


Mainly making cheats for games. Back then you had devices which allowed you to view memory, so I would play games and look for changes made in memory when you did certain actions. Made a map of them. Most of the time a life counter was good enough.

Also for my own games - you don't have pointers as such, you have your memory, and you need to know what lives where (and when it lives where).


You might be thinking of "Zen of Assembly Language" or "Zen of Code Optimization" by the brilliant Michael Abrash. I own the latter and in addition to plenty of low-level code optimization advice for the microprocessors available at the time it also includes timer code for DOS that lets you measure code performance with high precision.


Yep! Thanks! I was also thinking of his "Graphics Programming Black Book". Black Art ... hah! My bad memory. That dude abused the hell out of int 13! Kinda surprised he's with Oculus under Facebook. I wish his brain wasn't owned by Zuck. Maybe he's in that phase that musicians hit, when they get old and gray and start performing their classics at Vegas for a million bucks a night to audiences of boomers.


> (At the time there was a book called The Black Art of ... Assembly? I can't recall, and I should be flogged for forgetting it...

Probably not the book you're thinking of, but Michael Abrash's Graphics Programming Black Book was a mix of reprinted old articles and new content that had a bunch of highly optimized (for 486) graphics routines. IIRC there was a 9-cycle-per-texel texture mapping routine.


> You look at the assembly code, grab an Intel Programmer Reference manual (they were about 1000 pages), look up each instruction opcode and that would tell you the clock cycles.

Wouldn't this "just" tell you how many "cycles of code" there are, and not how many cycles actually run? Branches etc. will of course cause some cycles to be double-counted and others to be skipped, in the dynamic view.


Then you just calculate them all.

If Branch A is taken ... total X clock-cycles.

If Branch B is taken ... total X clock-cycles.

If Branch C is taken ... total X clock-cycles.

And so on. And then make sure that the "longest" branch fits into the clock-cycle budget.


Were games at this point of low enough complexity that the combinatorial explosion of branches could be contained and reasoned about by humans? Or did you have software doing this?


There is no such thing as combinatorial explosion here, just take the longest branch every time.


But this means if you have a situation like

    if (a) { ... }
    ...
    if (b) { ... }
Where the branches are long, and b = !a, you significantly overestimate the amount of code. I guess that was considered good enough, then?


Yes.

However, this is part of the reason why you always try to avoid performing if/then in critical loops. Obviously the index counter's branch cannot be hoisted, but if you are doing 1000 iterations, the misprediction cost with 2-bit Yeh prediction (which was common at the time) is amortized.

Later CPU architectures speculatively executed and then re-executed instructions that were incorrect due to branching, and VLIW allowed you to "shut off" instructions in a loop rather than have to predict.
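A trivial example of what "keep if/then out of critical loops" means in practice (a sketch; `shade` is a made-up per-pixel callback, not anything from a real codebase):

    #include <string.h>

    /* naive: the test runs (and can mispredict) on every pixel */
    void fill_branchy(unsigned char *dst, const unsigned char *src, int n,
                      int use_shading, unsigned char (*shade)(unsigned char))
    {
        for (int i = 0; i < n; i++) {
            if (use_shading) dst[i] = shade(src[i]);   /* tested every iteration */
            else             dst[i] = src[i];
        }
    }

    /* hoisted: decide once, then run a tight loop with no data-dependent branch */
    void fill_hoisted(unsigned char *dst, const unsigned char *src, int n,
                      int use_shading, unsigned char (*shade)(unsigned char))
    {
        if (use_shading) {
            for (int i = 0; i < n; i++)
                dst[i] = shade(src[i]);
        } else {
            memcpy(dst, src, (size_t)n);
        }
    }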


You just need your code to finish before the next frame. Then you wait and start the next cycle when you receive the start-of-frame interrupt from your graphics hardware.

Of course once you start writing for heterogeneous hardware like PCs with very different performance between models, you may use adaptive approaches to use all the available resources rather than just providing the most basic experience for everyone.


You could just measure the isolated inner loop with accurate timers and figure out down to the clock how many cycles it was taking.

You also basically knew how many cycles each instruction took (add, multiply, bit shift, memory reference, etc.) so you just added up the clock counts. (Though things got a bit harder to predict starting with Pentium as it had separate pipelines called U and V that could sometimes overlap instructions.)


Yeah, it's helpful to remember that games in the early 90s at least would have been expected to run on a 486, which was still very widespread, and the 486 was neither superscalar nor out of order. It was pipelined (the main difference between a 486 and a 386) but it was still at that time simple to reason about how long your instructions would take. And there was no SpeedStep or any of that stuff yet.


This is so different from the current state of affairs.

Code for the CPU might get optimized beyond recognition, vectorized, executed out of order, ...

The shader code I write these days is translated to a whole list of intermediate formats across different platforms, maybe scheduled in a jobs system, then handed to graphics drivers which translate it to yet another format...


On the C64 I would change the border color when my frame routine started and change it back once my routine finished. This would tell me how much of a fraction of the total frame time I used, in quite literal terms. I wonder if similar tricks were used for VGA. I think you could change color index 0 to the same effect.
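On VGA under DOS you could do much the same thing (a sketch, assuming a DOS-era compiler that provides inp()/outp() such as Watcom or Borland; it recolours DAC entry 0 around the frame work and spins on the vertical retrace bit of port 0x3DA):

    #include <conio.h>   /* inp()/outp() on DOS-era compilers (assumed) */

    static void set_color0(unsigned char r, unsigned char g, unsigned char b)
    {
        outp(0x3C8, 0);          /* DAC write index 0               */
        outp(0x3C9, r);          /* 6-bit R, G, B components (0-63) */
        outp(0x3C9, g);
        outp(0x3C9, b);
    }

    static void wait_vretrace(void)
    {
        while (inp(0x3DA) & 0x08)  ;    /* wait for any current retrace to end */
        while (!(inp(0x3DA) & 0x08));   /* wait for the next retrace to begin  */
    }

    void frame(void)
    {
        wait_vretrace();
        set_color0(63, 0, 0);    /* "timer on": background turns red        */
        /* ... game logic and drawing for this frame ... */
        set_color0(0, 0, 0);     /* "timer off": how far the red got down
                                    the screen shows the frame's cost       */
    }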


Speaking of the C64, exact instruction timing was also key to the two rather clever hacks to put sprites outside the nominally drawable 320x200 pixel screen area.

First, in the top and bottom border. This was done by waiting for the GPU to start drawing the bottom line of text on the screen and then switching to a mode with one less line of text. The sprite hardware would then not get the signal to stop drawing after the last line, since that presumably already happened with fewer lines to display. This would cause it to keep drawing sprites below the bottom border. Then in the vertical blanking, before the top line was drawn, you would switch it back from 24 to 25 lines of text.

The side border is a variation on this where you time your code to wait for the 39th character of a scan line, then switch from 40 characters wide to 38 wide. Again the sprite engine would not get the signal to stop drawing and would continue drawing in the side border, outside the 320x200 pixel nominal capabilities of the hardware.

For the side border it was necessary to do this for every scan line (and every 8th line would have different timing, probably related to fetching the next line of text), so timing was critical down to the exact CPU cycle.

These machines were not much by modern standards but for the hacker mind they were an amazing playground where you had just your assembly code and the hardware to exploit without the layers and layers of abstractions of modern gear.

Edit: spelling


I did that for a game in the DOS days. It needed a virtual retrace interrupt: a timer interrupt set to trigger shortly before retrace, busy-waiting until retrace, then recalibrating the timer for the next frame.

Pretty soon after that, operating systems managed memory and interrupts themselves. The added latency on timers made precision tricks like that impractical.



Good write-up. I'd also add that modern techniques are written to use the modern hardware.

It's easy to forget that each upgrade to graphics is an exponential jump. Going from 8 colours to 256 colours on screen. Jumps in resolution. Jumps in sound, in the number of sprites the hardware can track. Etc.

When we look at graphics now, with the tangible visible difference between 8K, 4K and HD, and Moore's Law no longer in effect, it is easy to forget just how significant the jumps in tech were in the 80s and 90s if you hadn't lived through it and/or developed code during it.


A game like Donkey Kong uses a static tile mapped background and some small dynamic objects that move around, and the hardware combines them.

These machines don't even have enough memory to store one frame buffer; you can't program like a modern game where everything is customizable and you can just do whatever you want as long as it's fast enough.

In a game like Donkey Kong you do what the hardware allows you to do (and of course the hardware is designed to allow you to do what you want to do).


I've created a 4K UHD video editor for Haiku OS (https://github.com/smallstepforman/Medo). It's a C++17 native app, with over 30 OpenGL GLSL effect plugins and addons, a multi-threaded Actor model, over 10 user languages, and the entire package file fits on a 1.44 MB floppy disk with space to spare. If I were really concerned about space, I could probably replace all .png resources with WebP and save another 200 KB.

How is it so small? No external dependencies (uses stock Haiku packages), uses the standard C++ system API, and written by a developer who learned their trade on constrained systems from the '80s. Look at the old Amiga stuff from that era.


Nice, this may be worth a Show HN. I'm not a Haiku user, although I try it from time to time to see whether it has become interesting for my uses. Also I agree on the value of being exposed to the old way of doing things. I started coding on the Amiga, and the way its OS worked (no memory management) forced me to grow sane habits when, for example, dealing with memory allocation: if I didn't free a buffer before my program exited, that buffer would remain allocated until the next reboot. I once had to debug a small assembly program of mine that lost a longword (4 bytes) at every run; it turned out I missed it when doing pointer calculations with registers, and the journey to monitor the program's activity, find the problem and correct the error was of tremendous help years later.


> Nice, this may be worth a Show HN.

smallstepforman has submitted it before: https://news.ycombinator.com/item?id=25513557. Unfortunately, it didn't garner much attention.


This looks amazing. Thank you for sharing.


Lower resolutions on smaller monitors. 256 colors at a time, no truecolor yet.

Lower-res samples, if any.

Lower framerate expectations - try playing Descent 2 on an emulated computer with similar specs to the lowest specs suggested on the box. Even one in the middle of the spec range probably didn't get a constant 60fps.

More hand-tuned assembly (RCT was famously 99% assembly, according to Wikipedia; this was starting to be unusual, but people who'd been in the industry a while probably did at least one game in 100% assembly, and would have been pretty comfortable with hand-optimizing stuff as needed).

Simpler worlds, with simpler AI. Victory conditions designed to be reached before the number of entities in the game overwhelmed the CPU completely.

Simpler models. Most 3d games you play now probably have more polygons in your character's weapon than Descent would have in the entire level and all the active entities; they certainly have more polys in the overall character.

I mean, really, three million tiles? That's insane by that day's standards, that's a bit more than 1700x1700 tiles, a quick search tells me the maximum map size in Roller Coaster Tycoon 3 was 256x256, and it's a fairly safe assumption that RCT1 was, at best, the same size, if not probably smaller. I can't find anything similar for Sim City 2000 but I would be surprised if it was much bigger.


Quite simply: No frameworks.

Some games nowadays are built using Electron, which means they include a full web browser which will then run the game logic in JavaScript. That alone can cause +1000% CPU usage.

Unity (e.g. RimWorld) wastes quite a lot of CPU cycles on things that you'll probably never use or need, but since it's a one-size-fits-all solution, they need to include everything.

For Unreal Engine, advanced studios will actually configure compile-time flags to remove features that they don't need, and it's C++ and in general well designed, so that one can become quite efficient if you use it correctly.

And then there are script marketplaces. They will save you a lot of time getting your game ready for release quickly, but they are usually coded by motivated amateurs and are super inefficient. But if CPUs are generally fast enough and the budgets for game developers are generally low, many people will trade lower release performance for a faster time to market.

=> Modern games are slow because it's cheaper and more convenient that way.

But there still are tech demos where people push efficiency to the limit. pouet.net comes to mind. And of course the UE5 demo which runs on an AMD GPU and an 8-core AMD CPU:

https://www.youtube.com/watch?v=iIDzZJpDlpA


That demo is eerily beautiful. Honestly the first time I’ve seen photorealistic detail in a game engine.


These tricks are things that you could still pull off today. The difference is, outside of competitions or the demoscene, nobody _needs_ to pull them off - it greatly decreases your time-to-market to do things in a straightforward way, rather than a novel way. That wasn't true 20+ years ago - the straightforward way often brought you up against hardware limitations.


Having been a game programmer back in the day (C64, Amiga, Atari Jaguar, N64, and beyond to newer machines) I fully agree with your point.

My earliest days had the most fantastic tricks. So called game engines were useless. Now it’s the complete opposite.


What did you program on the N64? And which among those platforms was your favorite?


> it greatly decreases your time-to-market to do things in a straightforward way

I mean, it's hard to disagree when you phrase it this way, but... really? In the old days (mid-'90s) studios like id released many games per year, some of them with completely new technology.

Modern studios and indie developers (!) who "do things in a straightforward way" can be happy to release even one game per year, and that's with a lot of reuse. Forget novel technology once a year!

So maybe these tricks don't really increase time to market that much, compared to other variables that are also in play?


I have zero information about (and, admittedly, little interest in) computer games, so this is just wild speculation: maybe the visuals, in terms of "levels/textures/objects/mesh/characters/voice/audio", dominate the planning now?


The visuals are crap. Greed dominates the game industry.


I wrote "planning", meaning that the development of the fame assets maybe now takes more time than the actual engine/logic of the game itself.


Greed does dominate the industry, but there are more indie games than ever.


> ... it greatly decreases your time-to-market to do things in a straightforward way ...

I'd actually like to link the YouTube channel of a person who is writing their own game engine and game at the same time: https://www.youtube.com/c/RandallThomas/videos

You can see how using Godot, Unity or Unreal (or most other engines) would have been much faster in regards to time to market.

Similar differences show up when you try to build the same project once with an engine and once without it: the performance can be much better if you write your own optimized code (supposing that you can do so in the first place), but the development takes much longer. For example: https://www.youtube.com/watch?v=tInaI3pU19Y

Now, whether that matters to you or not is a different matter entirely: some care about learning a lot more about the lower level stuff, others just want to develop games and care more about the art/story/etc., while others care about selling them ASAP.


The simple answer is you have to work within the constraints you're confined to. I used to work for an ecommerce company, a very early Amazon competitor, and because the Internet was so slow in the early years, we had a rule that our homepage had to be less than 100k, including image icons. Every 1k squeezed was a success and celebrated. Even today Amazon's homepage is less than 1MB, go ahead and check.

Now with CSS frameworks, JS frameworks, ad frameworks and cross-linking, pages take forever to load with even less actual content.


> Even today Amazon's homepage is less than 1MB, go ahead and check.

Pingdom reports 4.8mb (3.3mb of images), 661ms to load, and 298 requests.

GT Metrix reports 2.65mb (2mb of images), and 307 requests.

An incognito window with Firefox on my system says ~3mb and ~265 requests. Just the top six or seven media assets combined that loaded initially weigh in at about 1mb.

Certainly not the worst page ever, granted.


Nitpick: MB = Megabyte. Mb = Megabit. mb = millibit?


Related, in my opinion binary prefixes are better, e.g. MiB > MB (ha)


Hit the nail on the head. Constraints fuel creativity and innovation. If I give you everything, then what do you have left to do?

It’s like the General that wins every battle because he has the most elite soldiers. Could the General win with a rag tag shoddy group of soldiers?



Yup. I am currently developing a game for African players and I have to build my own engine and do old school tricks to get the game under 1MB on a website. CSS sprites and all


I was just thinking about some console game that launched with a 60GB day one patch, and how disappointing it must have been for players without good internet to put in the disc and not be able to play or missing half the content.


The major difference is probably the kind of game being produced. There is a reason DF starts out running smoothly and eventually hits CPU death. I'm not certain what the reason is these days, but it used to be something like pathfinding combined with polynomial explosions in how a dwarf chooses their next job.

Old-school games would have been forced to redesign the game so that that wasn't a problem that gets faced. For example, none of the games you list try to simulate a deep 3d space with complex pathfinding. Warcraft II I know quite well and the maps are small.

One of the reasons systems used to be highly efficient was evolutionary - it was impossible to make resource-heavy games, so people kept iterating until they found good concepts that didn't require polynomial scaling algorithms with moderate-large N inputs.


I recommend watching this: https://youtu.be/izxXGuVL21o

Naughty Dog Co-founder Andy Gavin discusses various hacks that were used on the Playstation to get Crash Bandicoot to run smoothly. The fuller version is also worth watching.


The art of working with constraints does not seem to be lost; just look at late cycle console games.

A PS4 was/is effectively unchanged hardware between Nov 2013 and today, yet the late lifecycle games look great. Upon release of the hardware devs had plenty of performance to play with, then 5 years in they have honed their craft and are able to use all the tricks at their disposal to squeeze as much graphical and performance life out of a limited resource budget.


For those who like text, his series of blog entries is also extremely good. https://all-things-andy-gavin.com/video-games/making-crash/


This one is my favourite. I would call this kind of programming both art and beauty.


If I remember correctly, the two main hacks were making draw lists such that only certain polygon faces were rendered based on Crash's xyz position (the theory being that it isn't possible for other faces to be seen from that location), and also that he removed functions/files from Sony's standard C libs in the PS1 SDK?


They realized that untextured polygons were way faster to draw than textured ones, so rather than texture the Crash model, they cranked up the polygon count and simply colored them. Oftentimes the triangles were on the same order of size as the pixels. It also avoided the PlayStation's lack of perspective correction on textures. It both made the character look gorgeous for the day and made the code run faster.


Yes, I played many many hours of crash as a kid and he looked extremely vivid and bright. When I code in Three.js these days I also skip loading textures and just go for hex colors on polygons :)


I believe two other huge wins were how they padded the disc for faster reads (though maybe this was common) and how they could trivially cull polys based on the constraint that the game is on a rail. By knowing your position on the rail you just look up what polys can be culled via precalculated tables.
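Conceptually something like this (entirely hypothetical code, not Naughty Dog's; `RAIL_STEPS`, `VisCell` and `draw_polygon` are made up to illustrate a precalculated visibility table indexed by position along the rail):

    #define RAIL_STEPS 1024                /* how finely the rail is quantized (assumed) */

    typedef struct {
        int count;
        const unsigned short *polys;       /* indices of polygons possibly visible here  */
    } VisCell;

    extern const VisCell vis_table[RAIL_STEPS];  /* baked offline from the level data    */
    void draw_polygon(unsigned short index);

    /* at runtime there is no visibility computation at all: quantize the
       position along the rail and draw only what the table says */
    void draw_visible(float rail_pos)      /* 0..1 along the level */
    {
        int step = (int)(rail_pos * (RAIL_STEPS - 1));
        const VisCell *cell = &vis_table[step];
        for (int i = 0; i < cell->count; i++)
            draw_polygon(cell->polys[i]);
    }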


Right, for the polys I think we are talking about the same thing :)

For the disk, I guess they viewed it as merely an extension of RAM? If the read/write speeds were sufficient for them... page files galore :)

I also remember that they used to write bytes to disk in specific orders so that they would maximize the correct bytes being read into the buffer every microsecond


I'm gonna stray a little from the norm here.

Most people seem to think programmers of yore were smarter, and generally, that's probably true -on average-. I mean, there weren't boot camps back then.

That aside though, the scene has changed. Write your super efficient perfect game, and nobody will play it. Look at Ludum Dare, and all the JSxK competitions. Those smart people still exist today.

But the landscape has changed so much, that consumers want the pretty graphics, heavy resources, social aspects. They don't care if it's 1kb or 1tb.

In short, people of yesterday were resource constrained so had to write smart hacks. People today have many more resources available, and use those. Both are using every bit of what they have available.


Players have always cared a lot about graphics. Descent looked fckin amazing in 1995


The last game I was big into was Q1. When GLQuake came out, it felt like I was in the future. Compare GLQuake to even some indie game today and it looks awful.

You're right - expectations change and most everything else is nostalgia.


I had an Xbox hooked up around 2002 at a friend's house (high school years for me) and their father commented that Dead or Alive looked like a movie to them. They didn't know how games could look more realistic.

Crazy how far we have come since then.


Graphics programmer has got to be up there with chef and prostitute for evergreen, recession-proof careers. And the nice thing is, the tech doesn't even change that quickly. People are still using OpenGL, the original version of which was released before DOOM!


Whether it's recession-proof or not is a fairly open question; one thing for sure is that it's not layoff-proof. Game companies are notorious for doing mass layoffs or shutting down with little to no advance notice, and as a game developer you can never be too sure about your job security at any one company, not to mention it's kind of just accepted that you will be working longer hours for less pay.

As for the notion that the tech doesn't change that quickly, that is simply false. Graphics programming is unbelievably cutthroat, competitive, and advances very fast. Not only are there advances in technology, but different games also have different aesthetics that often require very niche or customized development to get just right so as to avoid your game feeling generic.

Graphics APIs and hardware change almost every other year and the major titles have to adapt to the latest features. You can certainly make good games without focusing on graphics, plenty of good indie games or games that don't focus on graphics, but if you are a graphics programmer and your game does focus on that, you are constantly having to keep up with advances in technology.


I stand corrected! You sound like you actually know what you're talking about.

Sigh. The search for a field where you can learn some stuff once and coast continues....


I'm sure this is covered elsewhere in this large thread but I also worked on a triple A game in the nineties, one of the first 3D FPS games with a huge story. Every aspect of the game's memory footprint was tuned. I spent over two months at one point just going in and reducing every structure to remove unneeded waste and to quantize variables down to the absolute minimum. Bools were held in a single bit. If a variable only needed less than 256 values it would be held in a single byte. I changed a lot of data structures to be highly optimized and crushed down to just what they needed to still work. The non technical producer couldn't understand that I spent two months and was so joyful that the game was running and _nothing_ looked different.
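For anyone curious what that kind of pass looks like, here's a toy before/after in C (a sketch with made-up fields, not the actual game's structures):

    /* before: comfortable but wasteful (20 bytes per entity) */
    struct EntityLoose {
        int   alive;          /* only ever 0 or 1              */
        int   hit_points;     /* never exceeds 200             */
        int   frame;          /* 0..63                         */
        float x, y;           /* positions on a 4096x4096 grid */
    };

    /* after: same information, quantized and bit-packed */
    struct EntityPacked {
        unsigned short x, y;          /* grid positions fit in 16 bits */
        unsigned char  hit_points;    /* < 256, so one byte            */
        unsigned char  alive : 1;     /* bools held in a single bit    */
        unsigned char  frame : 6;     /* 0..63 in six bits             */
    };                                /* 6 bytes instead of 20         */

Multiply that kind of saving by tens of thousands of instances and it's easy to see how two months of "nothing looks different" work could make the game fit.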


The graphics (and sound) were much lower res, smaller worlds, fewer characters, fewer motion poses per character, prebaked character frames, the floors and ceilings were flat, the textures had far fewer colors, the lighting and shadows were prebaked onto static surfaces, game code was performance-tuned at the assembly level, symmetrical textures, low-contrast textures … and we all needed to pay for 2MB more RAM to play Doom.


Hard disagree with many comments here.

For enemy AI, OK: we made progress. But for human-vs-human gameplay, we basically had everything in the nineties.

Warcraft II over Kali (to simulate a LAN over the Internet, since Warcraft II didn't have battle.net) was basically the gameplay of Warcraft III Reforged. Half-Life and then its Counter-Strike mod were basically the FPS of today.

Diablo II (development for Windows 95 started in 1997 and the game came out in 2000, so not the '90s per se, but nearly) was basically perfect from a collaborative gameplay point of view and was the perfect hack 'n' slash game.

It was so perfect that a few weeks ago Blizzard released the exact same game, with identical gameplay, even down to most bugs (some were fixed, like the eth armor upgrade bug). It's the same game, but with modern-looking graphics.

Games were so good that more than two decades later you can re-release the exact same gameplay with updated graphics.

That's how perfect things already were.

So very hard disagree that "games had simpler gameplay".

Many game genres back then had already reached perfection from the gameplay point of view.

Since then we added shiny.


Not to mention some of the games back then had vastly more complexity because it wasn't as necessary for every game to have universal appeal on account of much smaller budgets.

I think the golden age of games is behind us but I'm looking forward to a new golden age of VR/AR sometime in my lifetime.


> Not to mention some of the games back then had vastly more complexity because it wasn't as necessary for every game to have universal appeal on account of much smaller budgets.

Indeed...

Ultima IV, Ultima V and then "SunDog: The Frozen Legacy" (on the Atari ST) come to mind in my case.

These were, not arguably but factually, more complex games than 99.5% of the games produced today.


Warcraft II is a bad example here - it's completely unplayable if you're used to any kind of modern RTS. Terrible unit AI (not computer AI), selection limits, no rally points or queuing. It also has just 2 factions. Starcraft, which only came out just over 2 years later, blows it away from gameplay perspective. Warcraft 3 came out in 2002 (6 and half years after Warcraft 2) and represents an even larger improvement in gameplay and a huge increase in complexity (all of the improvements from Starcraft carried over, 3D terrain, perspective and unit rendering, 4 factions). Arguably that's when things started to slow down in terms of improvements - the gap between Starcraft 2 (2010) and Warcraft 3 (2002) is much smaller than the gap between Warcraft 3 and Starcraft 1 (1998), which in turn is much smaller than the gap between Starcraft 1 and Warcraft 2 (1995).


Not to mention that Warcraft 3 was Blizzard's first ever 3D engine. It was actually carried over from an engine developed for Descent (https://news.ycombinator.com/item?id=20973306), effectively a simplified perspective shooter. Constraint sometimes bubbles up to the surface the best solutions. A bit late to this thread. I was the QA Graphics Lead for WarCraft 3.


I agree: it's usually less about simpler gameplay and more about simpler simulation. For some genres, notably RTS games, they're actually simpler in their modern incarnations. There's basically a complexity limit for gameplay that we had definitely already hit in the 90s. These days we can do much more complex simulation, but usually the surfaced complexity isn't much different.


Roller Coaster Tycoon was famously written in assembly by hand. There are some cool blog posts about it that have been posted to HN in the past iirc.


I love this video: https://youtu.be/ZWQ0591PAxM

As part of a Kickstarter campaign, Morphcat Games made this video explaining how they eked out a really incredible game in only 40 KB (a lot like Super Mario Bros 2). I definitely recommend checking this out as they go over interesting compression methods and more general thought processes.


Thank you for sharing. These tricks are really clever!


In addition, there was an extremely rapid increase in PC specs in the 1990s, and if you had a 3-4 year old PC you probably could not play those games. "The PC you really want always costs $3500," as a famous columnist said.

People bought bleeding-edge games to show off their conspicuous consumption. Everyone else played them years later. People like me would stay late and play Descent, Quake, Warcraft, etc. (RCT was a trailing-edge game) on our work computers, because we certainly didn't have those kinds of specs at home until much later.


My impression (from being a teen at the time) is that this was still true at least until sometime in the 2000s.


Yea, I stopped paying attention, but the last "show off" game I remember was Crysis in 2007. But at that point Valve games, the Sims, etc. were very lower-end friendly.

The premise is just bizarre where OP thinks these bleeding-edge, mind-blowing graphical games were somehow "efficient" and conservative in how they abused your hardware.


There was no other alternative. They had to be efficient. These were the only resources they had to work with.

Reading about the tricks that were used to make the most out of the limited resources is fascinating to me as well! We should not take our current hardware for granted!


So many great responses in this thread, but I’ll just throw it out there that the original Pokémon games for GameBoy were in hand-written assembly specific to the GameBoy. I would imagine “back then” the programmers had limited memory and thus more obvious constraints. It’s probably possible to recreate this level of efficiency nowadays but both the number of platforms and the expectations of gaming have exploded since the 90s. Not to mention the question if it is even profitable to do so. Back then, it wasn’t a matter of “efficient,” it was “will this run at all on the amount of memory we have?”


As far as I remember, Pokemon Gold/Silver were given to some special assembly optimization guru after the game was pretty much finished.

He was able to squeeze out so much memory that the Kanto region of the original games was more or less added as an afterthought.

And it was totally worth it, having the entire first game available in the sequel, just set a couple of years in the future, seeing how the world changed by "your" actions in the first games, blew the minds of many ten-year olds at the time.


Wow, thank you for this info, I didn’t know that. It’s fascinating what they did back then.


I am glad you found that interesting.

If you want to look the story up: The "assembly optimization guru" was Satoru Iwata, who would later become CEO of Nintendo of America, and who unfortunately died of cancer a couple of years ago.


Also the gameboy is effectively a game engine in hardware form. It's way easier to write the little bit of code to glue the hardware objects to your internal entities/game logic than to write engines from scratch on modern hardware.


Disagree. The Game Boy will take sections of RAM and render them to backgrounds and/or sprites. That's it.

Any connection between those hardware elements and objects in a game's state, as well as updates to that state, must be maintained and implemented by software, and that software is the game engine.

For example, on the NES, the hardware can display 64 8x8x2bpp sprites, on a roughly 64x64 tile background where 1/4 of it is visible at any given time. Mario Bros. uses the hardware to "draw" Mario as 8 sprites by adjusting registers per frame, but things like what happens when movement commands are received, or when Mario hits bricks and other objects, do not involve the PPU at all.


This holds true for 99% of Game Boy games fyi. Afaik not many games were written in C during the actual release years.


This is also the reason there are so many weird bugs in the original Pokemon games; they reused storage and parts of code for different things, but if you exercise that in a specific order it starts doing weird things instead.


Broadly speaking, modern developers don’t have to deal with resource limitations (neither computing nor financial) so they instead devote that extra energy on completely unnecessary orchestration systems and build toolchains to feel like they’re doing something complex and challenging.


> Broadly speaking, modern developers don’t have to deal with resource limitations (neither computing nor financial) so they instead devote that extra energy on completely unnecessary orchestration systems and build toolchains to feel like they’re doing something complex and challenging.

That is massively far from truth. Rendering graphics to the level modern games require is driving even current powerful hardware to its limits and it's far from "not having to deal with resource limitations".

Your post seems to be horribly condescending, not realising just how much work goes into squeezing performance from modern GPUs and consoles.


Sure, but fewer than 1% of the games being produced need to tackle those problems. Look at most of the 10,000+ games released on Steam or the endless mobile games released over the past year.

For every Naughty Dog, Riot, Take-Two, CDPR, or Bethesda, there are thousands of garbage studios. Thus my “Broadly speaking” caveat.


I’d hazard a guess the VR development for platforms like Quest, where you’re effectively trying to render stereo at 90fps on mobile chipsets AND deal with the physical environment, is bringing back a lot of those “get every last cycle out of the hardware” skills.


There are definitely startups with enough cash to spend time polishing their CI systems, but I suspect the pressure in the game industry means this doesn't happen. That said, I have zero experience in the game industry.


Developers didn't just "feel like" it; they were actually doing something complex, given the limitations of that time and context.


I'm not sure I understand, but I can tell you it felt good when one learned that setting the stack pointer and using push and pop was faster than using register indirect addressing, bit-twiddling to avoid multiplying or dividing, and fitting a mm/dd/yy into 14 bits. Constraints fuel a certain type of creativity. This is not absent today, but it is less frequent.


Right. I think it's mostly subconscious. If you crank out a CRUD app or some basic FPS in Unity, you're left wanting more and turn to the toolchain to find that complexity. 25 years ago, devs had no unmet craving after optimizing netcode for an 8-player game over 28.8k modems or implementing another soundcard API by hand.


A game like RimWorld is also a lot more complex; the "AI" for your park's visitors in Rollercoaster Tycoon was quite simple; the AI of your RimWorld colonists is much more complex. Add to this the "time speedup" which is quite resource intensive (events happen faster → more CPU needed).

I'm sure RimWorld can be made more efficient; but it actually runs fairly well on my cheap laptop. There is, essentially, no real need. And any time spent on making the game run faster is taken away from all other development tasks (fixing bugs, adding features, etc.)


RimWorld, especially with the runtime garbage collector mod, is super super well optimized. Massive mod lists with 0 bugs or performance issues and the only thing that cripples performance at all is the "perfect pathfinding" mod, and we can't really blame them for not being able to improve on A*!


Dyson Sphere Program is maybe a more interesting Unity game that can do impressive visuals and scale with what is available.


Some suggestions. Go check out PICO-8. All games take 32k or less (discounting the system runtime)

https://www.lexaloffle.com/pico-8.php

Also JS13k

https://js13kgames.com/

I know that's not the same but the concepts are similar. The core of most of those games is only a few k, everything else is graphics and/or interfacing with modern OSes.

Older systems had much lower resolutions, and they also used languages that were arguably harder to use. In the 80s it was BASIC (no functions, only global variables and "GOSUB"). Variable names generally were 1 or 2 letters, making them harder to read. Or they used assembly.

In the 90s (or late 80s) C started doing more, but still, most games ran at 320x240, sprites were small, many games used tiles, and a lot of hardware only supported tiles and sprites (NES, Sega Master System, SNES, Genesis). It wasn't really until the 3DO/PS1 that consoles had non-tiled graphics. The PC and Amiga always had bitmapped graphics, but on the Atari 800 and C64 the majority of games used the hardware tiled modes.


https://www.quora.com/Why-was-Roller-Coaster-Tycoon-written-...

Funny you should mention Rollercoaster Tycoon because it was actually written in Assembly for performance reasons.


For a rube like me who views assembly coders as masters of occult incantations, this is incredibly impressive


You should consider a mental framework of lean objectivity in the processes you code: what is to be done, step by step, operating with the real thing: memory. You want to turn that pixel on, so you set that memory address to that value; structures become sequences of mentally labelled memory cells; your tools are the basic, sort of elementary ones: copy data, store the results of arithmetic and logic operations somewhere, branch conditionally, etc. You reason in practical terms close to the actual low-level reality instead of through abstractions.

I think it helps develop a very good mindset, lean and faithful to base facts.


Assembly language is actually much simpler than even C, much less Lua or JS or Python. It just takes a lot more work to get anything done: five times as much code that's twice as hard to debug per line. Roller Coaster Tycoon is an incredibly impressive achievement, but not because writing it required memorizing a lot of occult trivia. It didn't. It just required effectively using the stuff you have available.


It’s still just programming and there are many assemblers that feature high level constructs to help out. I’d recommend trying out some Z80, 6502 or something more esoteric like uxn. Lots of emulators exist for old consoles. Then you’ll demystify it for yourself and look like a wizard to everyone else.


No XML parsers. Limited i18n/l10n support in the OS. File formats that were serialized structs. Limited art (I remember looking at the time zone map in the resources of some Win 9x DLL and it was a palettized bitmap; the palette index for the selected time zone would be set to the highlight colors and the others to the unselected color). Less diversity in hardware meant simpler and fewer drivers, or just writing directly to video memory. Caring about the working set of programs, because on a cooperatively multitasked system like Windows 3.1 with 4 MB of RAM if you're lucky, it counts.
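"File formats that were serialized structs" means something like this (a minimal sketch; it assumes writer and reader share the same compiler, endianness and struct packing, which was a safe bet on a single platform back then):

    #include <stdio.h>
    #include <stdint.h>

    #pragma pack(push, 1)
    typedef struct {
        char     magic[4];    /* e.g. "SAV1" */
        uint16_t level;
        uint32_t score;
        uint8_t  lives;
    } SaveGame;
    #pragma pack(pop)

    /* no parser, no allocations: the on-disk layout *is* the in-memory layout */
    int load_save(const char *path, SaveGame *out)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return 0;
        size_t ok = fread(out, sizeof *out, 1, f);
        fclose(f);
        return ok == 1;
    }

    int write_save(const char *path, const SaveGame *save)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return 0;
        size_t ok = fwrite(save, sizeof *save, 1, f);
        fclose(f);
        return ok == 1;
    }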


Software was simpler, so it was easier to optimize. Also, hardware was very limited and that meant you were forced to optimize if you wanted to sell anything.

The public was accustomed to software that was nimble and ran fast. If you had given a regular Electron app to the '90s public, they would have been displeased.

Hardware limits and developers being more thoughtful and less lazy are pretty much the answer.

In the 90s most software was C/C++ running natively on the hardware. Now we have layers upon layers of abstractions: VMs, Kubernetes, Docker, JITted languages, OOP codebases written against Uncle Bob's preachings and GoF patterns. And a JITted language is the best scenario. A lot of software is written in interpreted languages.

If anyone cares to make a trivial benchmark, it would be telling: write a simple C program and time it. Write the same in Java or .NET, but use a class for everything and introduce some abstractions like builders, factories, etc. Run the program in a container in a virtualized Kubernetes cluster. Time it.


Michael Abrash wrote a book or two about it, and a lot of code, and was one of the experts in tweaking performance out of hardware back then.


Michael Abrash's Graphics Programming Black Book Special Edition

https://www.phatcode.net/res/224/files/html/index.html


I have this book; it was such an important and informative book at the time. It's so interesting that a book filled with such a wealth of knowledge doesn't have much use these days.


This is probably an unsatisfying answer, but simply put, a lot of systems these days are just bloated. I'm not only talking about the OS/programs running (though it often contributes), but rather the creature comforts developers have taken on over the years in the name of productivity. In fairness, the scale of recent games wouldn't be quite so possible without our superfluous tooling, but it definitely comes at a cost. A great example is Minecraft, which was originally written in Java but later ported to C++ for more consistent performance. Java's extensive library support, object-orientation and quick-and-dirty graphics access made it a great tool for prototyping, but when it came to delivering a high-performance product it often fell short. There are many such anecdotes in the gamedev community, so choosing the right tool for the job is hugely important.

Dwarf Fortress and RimWorld are both interesting topics on their own though, and while I'm dreadfully underfamiliar with their codebases I do love the games to death. I'd guess that if you profiled them, the heaviest slowdown would be accessing/manipulating memory with hundreds of values across thousands of instances. Both games are extremely heavy-handed in their approach to simulation, so it wouldn't surprise me if that constituted the bulk of their usage. Dwarf Fortress itself is an interesting case study though, because its world generation can take hours. As more years pass, it takes longer to generate, which is probably a testament to how many interconnected pieces it's weaving.


> Java...a great tool for prototyping, but when it came to delivering a high-performance product it often fell short.

I think this has only been said in the context of games.


It really depends. Java isn't bad for server-side software, and there are benefits to using its runtime. For client-side software though (particularly in 2012-2020), not many consumer PCs could play Minecraft at a decent framerate. Even now, feeding the Java version huge amounts of high-bandwidth memory is the only way to mitigate slowdown, and that still doesn't account for the micro-stuttering that you get when world generation occurs. In the context of Minecraft, it was a pretty obvious mistake. YMMV, but I'd still highly recommend against writing Java software for client-side stuff.


It's more understandable considering that it started as a Java applet which could be run in the browser.


The JetBrains IDEs are written in Java and they run very nicely.

I'd wager the micro stuttering is GC?


We used to prototype game ideas in Java, and I know a large embedded software firm that uses Java for prototyping. After it has been shown to work, they port to C/asm.


How about Valheim, which takes "only" 1 GB and has a great world, graphics and gameplay?

Most games these days weigh so much due to high-res assets. 4K textures weigh a lot; that's why a game with a single map like COD: Warzone takes 200 GB -.-

As we move more into realistic graphics I think this trend will reverse, since this can already be simulated with ML in real time. So you will get away with a custom ML compressor and low-poly assets that will be extrapolated at run time.


> managed to be created and function on computers that had 500 MB HDD, 300 MHz CPU, and 4 MB RAM.

Proceed from the other direction: you have those resources, what creatively can you do with them?

I think the three big differences are:

- nothing was "responsive", everything targeted a (pretty small) fixed resolution. Fonts were bitmap, ASCII, and fixed-width. No need to run a constraint solver to do layout.

- object orientation was rare. Data would be kept in big arrays (see the sketch after this list). No virtual dispatch, minimal pointer-chasing (which is bad for pipelines), no reflection, no JITs, no GC pauses.

- immediate-mode GUIs rendering on a single thread. Draw everything in order, once. If something's off screen, ignore it as much as possible.
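On the second point, "data kept in big arrays" in practice looks something like this (a sketch, not any particular game's code; the names and sizes are made up):

    #define MAX_UNITS 4096

    /* everything preallocated in flat parallel arrays: no per-object
       allocation, no vtables, and each update pass walks memory linearly */
    static int   unit_count;
    static short unit_x[MAX_UNITS],  unit_y[MAX_UNITS];
    static short unit_vx[MAX_UNITS], unit_vy[MAX_UNITS];
    static unsigned char unit_hp[MAX_UNITS];

    void move_units(void)
    {
        for (int i = 0; i < unit_count; i++) {   /* tight, cache-friendly loop */
            unit_x[i] += unit_vx[i];
            unit_y[i] += unit_vy[i];
        }
    }

    /* removing unit i: swap the last unit into its slot, no free() anywhere */
    void kill_unit(int i)
    {
        int last = --unit_count;
        unit_x[i]  = unit_x[last];   unit_y[i]  = unit_y[last];
        unit_vx[i] = unit_vx[last];  unit_vy[i] = unit_vy[last];
        unit_hp[i] = unit_hp[last];
    }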

You can see all these in play in the Windows 3.0 -> 95 era. Classic "Windows Forms" apps render in response to a WM_PAINT message, in the (immediate) thread of the message loop, directly into the framebuffer. This could be seen by moving a window on top of a crashed app or part thereof - moving it away would leave a white space. Classic Windows apps are also not responsive - all the positions for controls are fixed pixel locations by the designer, so they look stupidly tiny on high resolution monitors. ( https://docs.microsoft.com/en-us/previous-versions/windows/d... )

Microsoft tried to supersede this with WPF/XAML and the compositing window manager Aero, but it's not been entirely successful, as you can see from the number of obvious Forms dialogs you can still find from Control Panel.

> Dwarf Fortress and Rimworld eventually both suffer from the same problem: CPU death.

Simulation games tend to expand to fill the space available. It's very easy to have runaway O(n^2) complexity if you're not careful, which gets much worse at high numbers of objects. The trick there is to constrain the range of interactions, both which interactions are possible and over what range, so you can keep them in a quadtree/octree/chunk system and reduce the N input to each tick of the simulation.
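
As a rough illustration of the chunk idea, here is a minimal C++ sketch using a uniform grid keyed by cell coordinates (cell size, entity fields, and names are all invented; it assumes the cell size is at least the maximum interaction range):

    // Range-limited interactions via a uniform grid ("chunk system").
    // Instead of testing every pair (O(n^2)), each entity only interacts
    // with entities in its own and the 8 neighbouring cells.
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct Entity { float x, y; /* ... */ };

    constexpr float kCellSize = 32.0f;   // must be >= max interaction range

    static uint64_t CellKey(int cx, int cy) {
        return (static_cast<uint64_t>(static_cast<uint32_t>(cx)) << 32) |
               static_cast<uint32_t>(cy);
    }

    // Rebuilt each tick (or updated incrementally); entities are inserted
    // under CellKey(int(x / kCellSize), int(y / kCellSize)).
    using Grid = std::unordered_map<uint64_t, std::vector<Entity*>>;

    std::vector<Entity*> Neighbours(const Grid& grid, const Entity& e) {
        int cx = static_cast<int>(e.x / kCellSize);
        int cy = static_cast<int>(e.y / kCellSize);
        std::vector<Entity*> out;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                auto it = grid.find(CellKey(cx + dx, cy + dy));
                if (it != grid.end())
                    out.insert(out.end(), it->second.begin(), it->second.end());
            }
        return out;   // each tick now touches a handful of cells, not the whole world
    }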

Further reading: not just Masters of Doom, but Racing The Beam.


yes, it is possible to be that efficient now, but there is little incentive to do so.

it is far easier, and faster, to use a library that is bad than to write what you need efficiently.

"CPU is cheap." "RAM is cheap." two of the many cancerous sayings that developers use to excuse their poor efforts.

we don't do this now because we simply do not care enough as an industry. we want the software written immediately and for as little money as possible.


It's not just money.

Security is another reason. Buffer overflows are a thing of the past in application code except in very special cases (or silly tech stack choices).

But even if it were just money, so what? Time is money, buddy. I can be home to my family sooner by spending a few MB of your RAM. Sorry not sorry.


> we don't do this now because we simply do not care enough as an industry. we want the software written immediately and for as little money as possible.

It's not about "as little money as possible", it's about priorities. Above, someone mentioned spending 2 months individually packing structures for a game to make the memory fit. Given the choice between having a developer spend 2 months on that, or spend 2 months localising the game or adding accessibility options, the choice today is features.


I think the games that are pushing graphical boundaries today are probably just as efficient relative to how high-fidelity they are, and (have to) use just as many clever optimization tricks. It's just that there's so much room now to make something compelling from a gameplay and/or aesthetics POV while being technically basic / cookie-cutter / off-the-shelf.


For modern-ish examples of very smart optimization-minded programmers still having to work to fit exceptional experiences into hardware limitations, I recommend Jonathan Blow’s talk about fitting Braid’s time rewind system onto the Xbox, Natalya Tatarchuk’s talk on Destiny’s architecture, and Factorio’s blog posts on optimization.


Thanks! There was a good GDC talk from a Forza guy too that came to mind after reading your comment. Hmm lemme see if I can dig it up...aahh yup, here it is:

https://youtu.be/COqSiGFnK50


The games from the '90s that are still remembered are the best ones. There were a lot of games that were slow and bloated for their time. In the '90s we wondered why the '80s games were so efficient. Games that used to run in 64KB of RAM or fit on a floppy now needed CD-ROMs and megs of RAM. This hardware was not considered small — it was huge compared to the 8- and 16-bit computers and consoles before it.

At that time the PC did not have hardware-accelerated scrolling and sprites like consoles and the Amiga did, so many PC ports of console games had higher hardware requirements and lower fps.

Windows 95 did not have a reputation of a small OS. It was seen as a complete waste of RAM for MS-DOS games. Microsoft had to invest a lot of effort to turn that around (with DirectX).

It's a matter of perspective. You can already wonder how Portal 2 is so small compared to multi-GB games. In 30 years, when GTA 6 comes out with a 7TB day-one patch, we're going to marvel at how GTA 5 could fit in a mere 30GB.


A nice talk on this: https://m.youtube.com/watch?v=kZRE7HIO3vk

The short of it is that game devs used to write code for the hardware as well. Most code now is written for a layer in between, but there are so many layers now that code is unreliable and slow.


For a very deep and specific dive, Fabien Sanglard has explained the technical implementation of the Wolfenstein and Doom engines in detail in his books Game Engine Black Book: Wolfenstein and Game Engine Black Book: DOOM. The writing is from the point of view of reverse engineering and understanding why they were implemented as they were.


It’s not that modern devs have forgotten techniques so much as they now spend their time on techniques that operate at much higher levels of abstraction.

Every abstraction layer is designed to be multi-purpose, meaning not optimized for a single purpose. As these layers accumulate, opportunities for optimization are lost.

A 90s game like Rollercoaster Tycoon was coded in assembly language on bare metal, with limited OS services.

A modern game might have game logic coded in Lua, within a game engine framework coded in C++, which calls a graphics API like OpenGL, which all runs in an OS that does a lot more.

These modern layers are still clever and efficient and optimized using tricks as far as they can be, and the increased resource requirements come mostly from the demand for higher resolutions, higher polygon counts, and more graphical special effects.


Because people worked hard to do it. You also used a lot of frowned-upon patterns. You bit-twiddled. Today you use a GUID, a 128-bit number, in string format, as an ID. Back then you used as few bits as possible, as bits. As computers got faster and gained more memory, we used better techniques to make our code more robust and do more things. But we also lost sight of what we now dismiss as premature optimization. Some of it would do no good today: it usually isn't worth trying to trim a few extra megabytes now, whereas back then you worried about every bit. I guess there are tricks, but the majority of them would probably get you fired if you tried to use them today.
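
To make the "as few bits as possible, as bits" point concrete, here's a small C++ sketch contrasting the two habits (the field widths, names, and the GUID value are invented for the example):

    #include <cstdint>
    #include <string>

    // A GUID as a string: ~36 bytes of text per id, plus string overhead.
    std::string modern_id = "b5c7d9e1-1234-4f00-9abc-0123456789ab";  // made-up value

    // The 90s habit: pack everything about a unit into one 16-bit word.
    struct PackedUnit {
        uint16_t type     : 4;   // 16 unit types is plenty
        uint16_t health   : 6;   // 0..63 hit points
        uint16_t facing   : 3;   // 8 directions
        uint16_t selected : 1;
        uint16_t team     : 2;   // up to 4 players
    };                           // sizeof(PackedUnit) == 2 on typical compilers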

That being said, better art is also very expensive. Consider how expensive it is to display to a 4K screen versus a VGA screen with 256 colors.


Computers aren't that much faster if we consider there's only an order of magnitude difference between 300MHz and 3GHz. In order to benefit from the full speed of our modern ~3GHz processors, your problem needs to fit into about 32KB of memory, due to physics and locality. So the size of RAM today is kind of a red herring. I think if you compare the sheer size and scale of what a Factorio game looks like once it turns into a struggle for resources, versus an old SimCity game, while taking into consideration how much more complicated the mechanics of Factorio pieces are, then that's clear evidence there's been a lot of progress in software engineering since the '90s.


> Computers aren't that much faster.

I don't know about that. I grew up in the 80's and was always trying to make games, first in BASIC, then in Turbo Pascal, and it was always a struggle to get the required speed. You always had to resort to ridiculous assembly language hacks, etc. At some point in the early 90s, I kind of quit trying to make games. Then in 2007, I was thinking about how much faster computers were "now", and it occurred to me to try to make a game. And so I wrote a little game kind of like Williams Defender in C[1], and I didn't have to do anything special to make it fast enough. I just wrote it in a straightforward way. No weird assembly language hacks, no opengl, nothing but just plain old GTK. "Only" an order of magnitude faster is a lot faster.

And you haven't even considered the GPU. If you start using OpenGL, suddenly you can do so much more. I think you're underestimating how hard it was just to even get the simplest thing to be fast enough back in the days of the 8086 or even a 486DX.

[1] https://smcameron.github.io/wordwarvi/


It's not a question of difficulty. It's not a question of skill. The issue is belief. The moment you stop believing that assembly is just some weird hack is the moment it gets easy. In my experience, when it comes to simple functions, plain code normally processes data at 400 Mbps, and if you use the weird hacks it goes 40 Gbps, which is two orders of magnitude. With algorithms it gets even better, since you can make things exponentially faster. Even something as constrained as an old i8086 is easy when we believe that nothing is beyond our ingenuity. People have built things like fighter jets using those chips and sent humans to the moon with less.


The source code for Doom is available, and much has been written about how it works.[1] Go look.

[1] https://doomwiki.org/wiki/Doom_source_code


For Doom, I’ve heard this book is highly recommended:

https://fabiensanglard.net/gebbdoom/


Fabien's Game Engine Black Book: DOOM is so good if you're interested in how it was possible to do something like Doom on the hardware of the time.

https://fabiensanglard.net/gebbdoom/

He also wrote one for Wolfenstein which is good as well.


Most of his books are, I'd say. Him and Holm, I buy whatever they produce.


The biggest thing is lower resolution. 640x480 at 256 colors was a high-color, high-resolution mode at the time.

Small bitmaps, with indexed palettes and run-length-encoding to further shrink them.

Caches were tiny, memory was slower, hard disks massively slower, and if you had to hit a CD to load data, forget about it. So packing data was important, and organizing data for cache locality, and writing algorithms to take advantage of instruction pipelining and avoid branches.
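
A minimal sketch of the run-length-encoding half of that, assuming the simplest possible scheme of (run length, palette index) byte pairs; real formats of the era (PCX, BMP RLE8, etc.) differ in the details:

    // Decode run-length-encoded, palettized pixel data: each pair of bytes
    // is (count, palette index). One byte per pixel comes out; the actual
    // RGB values live in a separate 256-entry palette consulted at draw time.
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> DecodeRLE(const std::vector<uint8_t>& packed) {
        std::vector<uint8_t> pixels;
        for (size_t i = 0; i + 1 < packed.size(); i += 2) {
            uint8_t count = packed[i];
            uint8_t index = packed[i + 1];
            pixels.insert(pixels.end(), count, index);
        }
        return pixels;
    }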

Fabien Sanglard has a great blog that goes into the nitty gritty of a lot of these techniques as used in different games.

https://fabiensanglard.net/


There are no tricks, you just have to give up mountains of levels of abstraction accumulated over the years.

https://www.youtube.com/watch?v=Ge3aKEmZcqY


My old boss was in the video game industry from the late 90's until 2017 or so and would talk about this a lot.

It's less about old games being optimised so much as modern non-game software being mind-blowingly wasteful.

Modern, well optimised AAA stuff like Doom 2016 or The Last of Us 2 is as much a work of genius design (if not more) as Rollercoaster Tycoon or Warcraft II. If anything, Vulkan is bringing us closer to the metal than ever before.

It's just a general shift in consumer expectation over time. There is no longer any pressure for regular apps to be performant, so they aren't.


Before the late 90's it WAS about games being optimized. The earlier, the more optimization was required.

An example I like to use is Elite on the BBC Micro computer. They fit 8 galaxies of 250 stars each with a planet with its own economy, and a space station, along with real-time 3D wire-frame space flight with hidden line removal in less than 32KB of RAM on a 2 MHz 8-bit 6502-based system. It was incredible for its time. For me, it was more incredible than DOOM when it came out. I found Quake a lot more impressive than DOOM.
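
The key was that the galaxies weren't stored at all; they were generated deterministically from a seed. The sketch below shows the general idea in C++ with an ordinary xorshift PRNG; it is not Elite's actual generator, and the Star fields are invented:

    // Derive a star system from (galaxy, index) instead of storing it.
    #include <cstdint>

    struct Star { uint8_t x, y, economy, techLevel; };

    static uint32_t Next(uint32_t& s) {          // minimal xorshift32
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        return s;
    }

    // Same inputs always yield the same star, so nothing needs to be stored.
    Star StarAt(uint8_t galaxy, uint8_t index) {
        uint32_t s = (uint32_t(galaxy) << 8 | index) + 0x9E3779B9u;  // never zero
        Next(s);                                  // stir the seed once
        Star st;
        st.x         = uint8_t(Next(s));
        st.y         = uint8_t(Next(s));
        st.economy   = uint8_t(Next(s) % 8);
        st.techLevel = uint8_t(Next(s) % 15);
        return st;
    }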

The second version written for the IBM PC was coded by hand in assembly language by the same programmer who coded Rollercoaster Tycoon, mentioned by the OP above.

https://en.wikipedia.org/wiki/Elite_(video_game)


One reason I'm not seeing mentioned when glancing over the comments is that outside the generic PC world, the hardware basically was the 'game engine API' for assembly code, i.e. the hardware designers built the hardware to be easily programmed by the 'application programmer' instead of a 'driver programmer'.

For instance checking if a mouse button is pressed on the Amiga is one assembly instruction to test a bit in a memory-mapped register. Compare that to all the code that's executed today between the physical mouse button and some event handler in the game code.
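
In C++ terms, that "one instruction" is essentially a volatile read of a memory-mapped register plus a bit test. The address and bit below are the Amiga's CIAA PRA register as I remember it, so treat them as illustrative rather than authoritative:

    // Test the left mouse button by reading a memory-mapped register.
    #include <cstdint>

    inline bool LeftMouseButtonDown() {
        volatile uint8_t* ciaa_pra = reinterpret_cast<volatile uint8_t*>(0xBFE001);
        return (*ciaa_pra & 0x40) == 0;   // bit 6, active low: clear = pressed
    }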

On 8-bit home computers like the C64, the video hardware was basically a realtime image decompressor which decoded a single byte into an 8x8 pixel graphics block, without CPU intervention. Scrolling involved just setting a hardware registers to tell the video hardware where the framebuffer starts instead of 'physically' moving bytes around. For audio output the CPU just had to write some register values to control how the sound chip creates audio samples.

On the PC the video and sound hardware is much more generic, much less integrated with the CPU, and there's a driver layer to hide hardware differences and mediate hardware access between different processes and on top of that are even more operating system and game engine layers, and that's how we got to where we are today :)


They used very low-level programming and direct access to the machine's low-level interfaces.

You could do the same thing today, but you'd spend at least 4-5 times as long developing your software.

For example, as a kid I used to fill the screen using lookup tables for colors. You had access to the framebuffer and could do things like shift the entries in the table in a cycle; the entire image would change, giving the impression of movement. The work was done by the electronics of the framebuffer.

In the past, people who had access to real terminals did the same: change the text on screen and the hardware would render it beautifully (even prettier than today's terminals). But those terminals were super expensive and proprietary (tens of thousands of dollars). Today shells are standardized, free, and open source (and usually use much uglier fonts), with fonts from all over the world (those terminals had very limited character sets).

Game consoles did the same. The hardware drew tiles that had been prepared beforehand: you told the console which tiles to draw and the hardware did most of the work.

>Is it possible to recreate this level of efficiency on modern systems?

Of course, you just need to pay the price: development is slow and painful, and low-level programming is prone to very hard-to-find bugs.

Programming things like FPGAs or GPUs is hellish: you work super hard and get little reward.


Dwarf Fortress mostly has two bottlenecks, for historical reasons: it's single-threaded and it's a 32-bit application, meaning it can use less than 3GB of RAM. So, interestingly, a modern gaming PC could be worse at running Dwarf Fortress than an older one, as nowadays multithreaded performance is prioritized and anything beyond 4GB of RAM doesn't help you.

And I guess at a certain world size good performance would need less detailed simulation in other parts of the game world.


64-bit support was added in Dwarf Fortress 0.43.05, released in July 2016.


I was once working for a startup formed by ex-Microsoft employees from the DirectX team. I was part of a team porting Mortal Kombat: Deadly Alliance from the Nintendo Game Boy Advance (GBA) to a very innovative Linux phone developed by the startup. I (as well as most of the team members) had no game development experience. We ported the game to the Linux mobile phone. The phone had no graphics acceleration and had a very puny CPU and 16MB RAM :) The GBA, though puny, was custom-built for games, so obviously our port was dismal in performance. We were getting 15fps and were given a target of at least 25fps so that it could be demoed to investors and at E3. We were able to pull it up to 23fps with various optimizations, but the hard requirement was 25fps. So the team met with senior management to discuss the status. In the meeting one of the team members asked, "How did the game on the GBA achieve 60fps with such low-spec hardware?" and one of the management members (who was technical) said, "Game developers (at the company that originally developed the game) have devilish minds." What he was saying was that game developers are highly resourceful in getting performance out of every last bit.

Some of the current game developers have lost that art because most of the games that are developed by these developers are on top of some massive framework.

One classic example of the level of understanding earlier game developers had of their platforms is the well-known inverse square root function from Quake: https://en.wikipedia.org/wiki/Fast_inverse_square_root
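
For reference, here it is, lightly modernized (std::memcpy instead of the original pointer type-punning, which is undefined behaviour in standard C++); the magic constant and the single Newton step are the same trick:

    // Quake III's fast inverse square root, modernized slightly.
    #include <cstdint>
    #include <cstring>

    float Q_rsqrt(float number) {
        float x2 = number * 0.5f;
        float y  = number;
        uint32_t i;
        std::memcpy(&i, &y, sizeof(i));     // reinterpret the float's bits
        i = 0x5f3759df - (i >> 1);          // the famous magic constant
        std::memcpy(&y, &i, sizeof(y));
        y = y * (1.5f - x2 * y * y);        // one Newton-Raphson iteration
        return y;
    }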


> management member(who was technical) "Game developers(at the company that originally developed the game) have devilish minds."

Doesn't sound all that technical. TL;DR is that the GBA has a hardware sprite engine (128 sprites) with scaling and rotation support, plus 4 individual background layers with full hardware scrolling. The CPU doesn't have to move pixels around, just update sprite indexes. You can be a total idiot and still get 60fps on the GBA.


If you wanna know about this, you should watch the YouTube videos about the making of Crash Bandicoot. They were trying to make the game as efficient as possible while making sure the user experience was not affected by it. For example, you only need to load data per level; anything big that only needs to be loaded once per level gets pulled from the disc.

Another big problem they had to solve was how to make the Crash character fit into the system with too many "models".


My thoughts:

- The difference between 8- or 16-bit graphics at 320x200, 30 fps and modern HDR 2K at 120 fps is orders of magnitude less data to manage.

- Software developers were not abstracted as far away from the hardware as they are today.

- Most games ran on DOS which was basically “bare metal” vs windows/multitasking/background processes..

- And you HAD to make efficient use of the compute and ram when dealing with limited resources, which honed the skills of serious game devs.


> Most games ran on DOS which was basically “bare metal”

Remember how games made you pick your sound card because the OS didn't provide an API for this?


Remember EMM386 and HIMEM configs?


Sadly.


I've been working in games since the early/mid 90's, from before we had 3d accelerators right at the start of my career to today. Many of the comments I've read so far from other devs ring true, but seem to miss an important point.

3D accelerators, to a large extent, changed what we (game developers) needed to focus on. If I'm doing a software rasterization of a bunch of triangles, then the speed at which I sit in that tight loop rendering each pixel is of utmost importance. Getting a texture mapping routine down to 9 clock cycles matters. The moment I worked on a game that used 3d accelerators, that all changed. I submit a list of triangles to the 3D accelerator, and it draws them. It rarely matters how much time I spend getting everything ready for the 3D accelerator, since it is rendering while I am processing, and my processing for all intents and purposes never takes longer than the rendering.

Once we had that architectural paradigm shift, the clock cycles it takes for me to prepare the drawing list are entirely irrelevant. The entire burden of performance is in getting the 3d accelerator to draw faster. This means that optimization is in reducing the work it has to do, either through fancy shading tricks to simplify what it takes to make something look right, or reducing the amount of geometry I send to the 3d accelerator to draw.

The neat thing is that we've gone full circle in a way. When it used to be important to hand optimize the inner loop of a rasterisation function, now we have pixel, vertex, and fragment shaders. The pixel and fragment shaders are getting run millions of times, and we used to have to code shaders in shader assembly. Now they're generally written in higher level shading language, but nonetheless, optimizing shader code often becomes quite important. It feels like the same work as assembly optimization to me.


This is just a terrific question and discussion all around. Game dev is a hot topic on HN. And the world needs a dedicated Game Forum to discourse on topics of the moment ;)

I don't have any answer. But I am interested in the transition that took place mid-90s from 2D to 3D. The latter SNES era pseudo-2.5D sprite animations are masterpieces of digital art. And then: discontinuity. Low-poly first gen 3D game design arrives. And no one knows where to put the camera.

One thing I have noticed is that dev cycles are roughly the same. It takes roughly the same time to create a game for N64 or PS5: 18 months. But one is 50 megs, the other 50 gigs. They took the time and care back then to get things right, although many releases were still unplayable disasters.

"What were the worst video game excesses of the 90s?" It's another question to learn from!


In short, "necessity is a helluva thing." It is not a technical explanation, but there is a great amount of truth in saying they were that way because they had to be to make it work.

Coming from another angle, the developers at that time considered those resources to be an amazing amount of plenty. Compared to, say, an 8086 with 128k or 256k of RAM, no HDD and maybe 512k or 1.44MB of floppy storage, those specs were huge.

If you're looking for other kinds of minimal examples, I recommend glancing through the demoscene (https://en.wikipedia.org/wiki/Demoscene) stuff. To some degree, the "thing" there was amazing graphical output at minimal size.


I finally got around to reading "Masters of Doom" a few years ago, and it gave some color on how they thought about things, where they learned things, what hardware they were targeting, etc.

IIRC assembly language expert Michael Abrash makes an appearance.

I 100% recommend it!


There's a channel called GameHut on youtube that has some really interesting videos, eg https://www.youtube.com/watch?v=gRzKAe9UtoU


If you think those feats are impressive, the first game I remember playing is 3D Monster Maze on a ZX81, these are the specs Wikipedia lists:

> CPU: Z80 @ 3.25 MHz; Memory: 1 KB (64 KB max., 56 KB usable)

Now that boggles my mind.


When needed, even today programmers achieve high efficiency levels. As an example, look at John Carmack's write-ups about the first Oculus headset they developed. It was supposed to run entirely on Galaxy S7 phones. They had to devise an overclocked mode for the screen to be able to maintain a consistent 90Hz in order to avoid nausea. I used the Gear VR and it was true that no matter what content was being displayed, the headset never lagged or fluctuated. Really impressive considering it is only now that flagship phones have started offering 90Hz and 120Hz screens.


People were creative but worked with constraints; no need to do that anymore. At least a lot less, which, imho, is a shame. It is all very wasteful, but logically so.

Constraints can fuel creativity as well; the first Metal Gear, for the MSX, was supposed to be a Commando competitor, but because of the constraints of the hardware, it turned into a stealth game [0] (there is a better story from an interview with Hideo but I cannot find it now).

[0] https://metalgear.fandom.com/wiki/Metal_Gear


One of my favorite techniques from that era, which you don't see at all today but which looked amazing, is color cycling (or palette shifting). It's more of an artistic technique, but it shows how creatively people can think with and around constraints.

http://www.effectgames.com/demos/canvascycle/?sound=1

https://en.wikipedia.org/wiki/Color_cycling
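
The mechanic is tiny: the framebuffer pixels never change, only a range of palette entries rotates each frame, so water or glowing signs animate "for free". A minimal C++ sketch (the palette layout is invented, and re-uploading the changed entries to the display hardware is not shown):

    #include <algorithm>
    #include <array>
    #include <cstdint>

    struct RGB { uint8_t r, g, b; };
    std::array<RGB, 256> palette;            // colors for an indexed-color screen

    // Rotate one contiguous range of palette entries by one slot per frame,
    // e.g. the handful of blues used by animated water in the scene.
    void CyclePalette(int first, int count) {
        std::rotate(palette.begin() + first,
                    palette.begin() + first + 1,
                    palette.begin() + first + count);
    }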


The biggest trick was not caring for hardware other than the requirements you print on the box. VGA is dead simple to render on: you write bytes to RAM to turn pixels on. To play a sound, you write a pointer to your sound to a specific memory location and call an interrupt. Network support? What’s that? It’s amazing how efficient your code can be when you know the exact hardware layout you’re targeting and can ignore all the nice abstraction layers we have now.
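
For the curious, "write bytes to RAM to turn pixels on" really was this simple in VGA mode 13h (320x200, one byte per pixel at segment A000). The sketch below only means anything under real-mode DOS with direct hardware access; it's here to show how thin the "graphics API" was:

    #include <cstdint>

    // Mode 13h: the screen is a flat 64000-byte array of palette indices.
    inline void PutPixel(int x, int y, uint8_t color) {
        volatile uint8_t* vga = reinterpret_cast<volatile uint8_t*>(0xA0000);
        vga[y * 320 + x] = color;           // a palette index, not RGB
    }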


There is no incentive to optimize code right now. Like, if I had an employee doing some academic low-level stuff I would fire them for wasting my time, so unlike many tech shops that pretend developers will do this, I don't interview for that skillset either.

Just copy and paste the solution but conform it to our networking or state storage system.

The bottlenecks for most apps I work on (not games) are somewhere else, like the network.

So there's a lot of crap code that makes a lot of money.


Layers and layers of abstraction are created in step with assumptions of abundant memory and processing power. Some people call it bloat. Others call it efficient.



nostalgia neurons activated, I can't see this and not think of Terminal Velocity aka Microsoft Fury3

oh, right, I own that on GOG (I buy more than I can play unfortunately). Guess I know what I'm doing tonight.


Personally I think the reason they were faster than today's games is abstractions. Today, developing a video game for a browser doesn't require you to know anything about the details of graphics or pixels or the GPU. Such abstractions - such as the V8 engine or a video game library - allow ease of development, and hence make it easier to hire people who can manage and understand the code.

Unfortunately the abstraction comes at a cost - in CPU cycles.


I highly recommend looking through the flipcode archives here https://flipcode.com/archives/ to see some techniques from back in the day. I also recommend anything by Fabien Sanglard here https://fabiensanglard.net/


One point to make is that back then, consoles were the thing -- and while their specs were pitiful by today's standards, they had one huge advantage over PC's -- predictability. You knew exactly what hardware was going to be available. You could count CPU cycles by looking at assembly code.

The gist of programming console games of that period is, I suspect, somewhere between programming an Arduino and a Raspberry Pi.


Yes, it’s possible. Obviously things won’t get back to the VGA levels of size because we have higher resolution everything. But you can get similar results by thinking in terms of a memory budget, CPU budget, etc. and then measuring to make sure you stay within that. Once you’ve determined your constraints, then you’ll start optimizing when you bump up against them.


You are not calibrated for the actual hardware we had at the time. Nobody was running Descent on a 300MHz computer. It ran perfectly well on the 486DX2-66, or any Pentium. 300MHz would have been an impossible luxury, something you could just barely get in very high end workstations near the end of 1996. The first 300MHz Pentium didn't come out until 1999.


nitpicking your nitpicking :)

1997 https://en.wikipedia.org/wiki/List_of_Intel_Pentium_II_micro...

with everyone being able to afford fast 450MHz in 1998 https://en.wikipedia.org/wiki/List_of_Intel_Celeron_micropro...


Optimizing software, 99% of the time, means not pessimizing it. That is, don't write code that isn't necessary; don't make the CPU run code which isn't needed to accomplish the task.

Casey Muratori explains it well: https://youtube.com/watch?v=pgoetgxecw8


YT link doesn't work


Thanks, I fixed it.


I got started in the 8-bit era. You had no option but to get down to the metal. Even then you often had to rely on “tricks” or quirks of the hardware to cut down the work required. 3D graphics and point and click GUIs in 1985: https://micro-editor.github.io/


Write C, get good


Honestly - a lot of the best early games were written in Assembler. C is kinda just one extra layer of abstraction. Even up through the PS1/N64 days there was still a lot of assembly stuff.


I was gonna suggest assembly but modern instruction sets are too complicated nowadays


It’s best if you follow examples for developing for - say - the SEGA Genesis - that’s how I got my start on Assembly. :)


For me it was Gameboy


I think the get good part really takes a meaningful job (that asks for better C skills) plus a good number of years.


You don't need a job to get good.


As with any software project, optimization is part of the trade-off equation. In those days, that type of optimization was a necessity. You just couldn't build a decent game without doing it.

It can still be done today of course, but it is less of a necessity. Teams choose to spend that time working on other aspects of the game instead.


Slightly off-topic but check out https://js13kgames.com/

"js13kGames is a JavaScript coding competition for HTML5 Game Developers running yearly since 2012. The fun part of the compo is the file size limit set to 13 kilobytes."

The results are really impressive !


It helped that your program was the only task and that it was running on bare metal.

There weren't a zillion layers of OS underneath.


I wonder how many years ahead a game using optimization tricks would look compared to a game without that focus.


Modern game libraries very much use “tricks” in critical paths as far as resources are concerned.


I'll count this as modern: GTA San Andreas not having any zones was pretty impressive.


Factorio’s visuals may be merely “nice”, but Factorio running a megabase in real time is pretty mind blowing, logically.


The games were of low resolution and low details which meant that games could render with limited resources.


take a look here : https://github.com/OpenSourcedGames/Descent-2/tree/master/SO...

esp the 3d dir, tons of assembly.


This recent Strangeloop talk on 8-bit game development covers some of the tricks used to squeeze performance out of a limited platform https://youtu.be/TPbroUDHG0s


A lot of the gameplay, AI, etc was far more simplistic. Descent for instance is quite barren.


Sometimes it feels that people just don't care about performance. In 2021, if you ship a game that doesn't weigh gigabytes of assets and take half a minute to load, people will think you didn't put much effort into the game.


For a start, don't try to draw 3 million tiles on a screen that's likely to have around 2 million pixels.

I'm guessing most of those tiles are off-screen, and you're creating a mesh for an entire game level, not just the visible area?
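
A sketch of the usual fix, assuming SFML 2.x (tile size, map size, and the flat-colored quads are made up for the example): rebuild a small vertex array each frame covering only the tiles that intersect the camera view, so the cost tracks the screen, not the map.

    #include <SFML/Graphics.hpp>
    #include <algorithm>

    constexpr int   kMapW = 4096, kMapH = 4096;   // map size in tiles
    constexpr float kTile = 16.f;                 // tile size in pixels

    sf::VertexArray BuildVisible(const sf::View& view) {
        sf::Vector2f c = view.getCenter(), s = view.getSize();
        int x0 = std::max(0, int((c.x - s.x / 2) / kTile));
        int y0 = std::max(0, int((c.y - s.y / 2) / kTile));
        int x1 = std::min(kMapW, int((c.x + s.x / 2) / kTile) + 1);
        int y1 = std::min(kMapH, int((c.y + s.y / 2) / kTile) + 1);

        sf::VertexArray va(sf::Quads);            // a few thousand quads at most
        for (int y = y0; y < y1; ++y)
            for (int x = x0; x < x1; ++x) {
                sf::Vector2f p(x * kTile, y * kTile);
                va.append(sf::Vertex(p,                              sf::Color::Green));
                va.append(sf::Vertex(p + sf::Vector2f(kTile, 0.f),   sf::Color::Green));
                va.append(sf::Vertex(p + sf::Vector2f(kTile, kTile), sf::Color::Green));
                va.append(sf::Vertex(p + sf::Vector2f(0.f, kTile),   sf::Color::Green));
            }
        return va;
    }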


This is a cool story on how the Crash Bandicoot developers hacked the PlayStation to free up more memory for their game.

https://youtu.be/izxXGuVL21o


There were no frameworks or high-level languages to add a bunch of bloat, maybe?


Considering the average spring boot application sucks up 300MB of ram upon launching, I am really impressed that the recommended system specs for the OG roller coaster tycoon is 16MB of dedotated wam...


Look into the 64k demoscene, lots of tricks and tips that still work today.


They were efficient because they had to do less compared to a modern game.


It helps to give some context to '90s game coding by looking at the predecessors. On the earliest, most RAM-starved systems, you couldn't afford to have memory-intensive algorithms, for the most part. Therefore the game state was correspondingly simple, typically in the form of some global variables describing a fixed number of slots for player and NPC data, and then the bulk of the interesting stuff actually being static (graphical assets and behaviors stored in a LUT) and often compressed (tilemap data would use either large meta-tile chunks or be composited from premade shapes, and often streamed off ROM on cartridge systems). Using those approaches and coding tightly in assembly gets you to something like Mario 3: you can have lots of different visuals and behaviors, but not all of them at the same time, and not in a generalized-algorithm sense.

The thing that changed with the shift to 16 and 32-bit platforms was the opening of doing things more generally, with bigger simulations or more elaborate approaches to real-time rendering. Games on the computers available circa 1990, like Carrier Command, Midwinter, Powermonger, and Elite II: Frontier, were examples of where things could be taken by combining simple 3D rasterizers with some more in-depth simulation.

But in each case there was an element of knowing that you could fall back on the old tricks: instead of actually simulating the thing, make more elements global, rely on some scripting and lookup tables, let the AI be dumb but cheat, and call a separate piece of rendering to do your wall/floor/ceiling and bake that limit into the design instead of generalizing it. SimCity pulled off one of the greatest sleights of hand by making the map data describe a cellular automaton, and therefore behave in complex ways without allocating anything to agent-based AI.
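
The shape of that cellular-automata trick in C++, with an invented grid size and an invented smoothing rule (not SimCity's actual rules): the whole "simulation" is one pass over a tile grid, each tile's next value computed from its neighbours, with no agents or per-entity allocations at all.

    #include <array>
    #include <cstdint>

    constexpr int W = 128, H = 128;
    using Grid = std::array<std::array<uint8_t, W>, H>;   // e.g. land value 0..255

    // One tick: every interior tile blends toward the average of its four neighbours.
    void Tick(const Grid& cur, Grid& next) {
        for (int y = 1; y < H - 1; ++y)
            for (int x = 1; x < W - 1; ++x) {
                int sum = cur[y-1][x] + cur[y+1][x] + cur[y][x-1] + cur[y][x+1];
                next[y][x] = static_cast<uint8_t>((cur[y][x] * 3 + sum / 4) / 4);
            }
    }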

So what was happening by the time you reach the mid-90s was a crossing of the threshold into an era where you could attempt to generalize more of these things and not tank your framerates or memory budget. This is the era where both real-time strategy and texture-mapped 3D arose. There was still tons of compromise in fidelity - most things were still 256-color with assets using a subset of that palette. There were also plenty of gameplay compromises in terms of level size or complexity.

Can you be that efficient now? Yes and no. You can write something literally the same, but you give up lots of features in the process. It will not be "efficient Dwarf Fortress", but "braindead Dwarf Fortress". And you can write it to a modern environment, but the 64-bit memory model alone inflates your runtime sizes(both executable binary and allocated memory). You can render 3 million tiles more cheaply, but you have to give up on actually tracking all of them and do some kind of approximation instead. And so on.


Please research the theory of constraints and how they drive creative solutions.


You should see some of the stuff that can be done with 64k of memory (c64)


(Disclaimer: I've only ever written extremely simple games and 3-D programs, and no 3-D games.)

Sure, you can recreate this level of efficiency on modern systems. But you have to throw most of the modern systems away.

httpdito is 2K and comes close enough to complying with the HTTP spec that you can serve web apps to any browser from it: http://canonical.org/~kragen/sw/dev3/server.s. Each child process uses two unshared 4K memory pages, one stack and one global variables, and there are three other shared memory pages. On my laptop, it can handle more load than all the HTTP servers on the entire WWW had when I started using it in 01992. It's only 710 lines of assembly language. I feel confident that no HTTP/2 implementation can be smaller than 30 times this size.

BubbleOS's Yeso is an experiment to see how far you can get wasting CPU to simplify doing graphics by not trying to incrementally update the screen or use the GPU. It turns out you can get pretty far. I have an image viewer (121 lines of C), a terminal emulator (328 lines of C), a Tetris game (263 lines of C), a real-time SDF raymarcher (51 lines of Lua), a death clock (864 lines of C, mostly actuarial tables), and a Mandelbrot browser (75 lines of Lua or 21 lines of Python), among other things. Most of these run in X-windows, on the Linux frame buffer, or in BubbleOS's own windowing protocol, Wercam. I haven't gotten around to the Win32 GDI and Android SurfaceFlinger ports yet.

On Linux, if you strip BubbleOS's terminal emulator executable, admu-shell https://gitlab.com/kragen/bubbleos/blob/master/yeso/admu-she..., it's only 35 kilobytes, though its glyph atlas is a JPEG which is another 81K. About a quarter of that is the overhead of linking with glibc. If you statically link it, it ends up being 1.8 megabytes because the resulting executable contains libjpeg, libpng, zlib, and a good chunk of glibc, including lots of stuff about locales which is never useful, just subtle bugs waiting to happen. There's a huge chunk of code that's just pixel slinging routines from these various libraries, optimized for every possible CPU.

Linked with shared libraries instead, an admu-shell process on this laptop has a virtual memory size (VSZ in ps u) of 11.5 megabytes, four megabytes of which are the pixmap it shares with the X server, containing the pixels it's showing on the screen. Several megabytes of the rest are memory maps for libc, libm (!), libX11, libjpeg, and libpng, which are in some sense not real because they're mostly shared with this browser process and most of the other processes on the system. There's a relatively unexplained 1.1-megabyte heap segment which might be the font glyph atlas (which is a quarter of a megapixel). If not I assume I can blame it on libX11.

The prototype "windowing system" in https://gitlab.com/kragen/bubbleos/blob/master/yeso/wercaμ.c only does alpha-blending of an internally generated sprite on an internally generated background so far. But it does it at 230 frames per second (in a 512x828 X window, though) without even using SSE. The prototype client/server protocol in wercamini.c and yeso-wercam.c is 650 lines of C, about 7K of executable code.

Speaking of SSE, nowadays you have not only MMX, but also SSE, AVX, and the GPU to sling your pixels around. This potentially gives you a big leg up on the stuff people were doing back then.

In the 01990s programs usually used ASCII and supported a small number of image file formats, and the screen might be 1280x1024 with a 256-color palette; but a lot of games used 640x480 or even 320x240. Nowadays you likely have a Unicode font with more characters than the BMP, a 32-bit-deep screen containing several megapixels, and more libraries than you can shake a stick at; ImageMagick supports 200 image file formats. And you probably have XML libraries, HTML libraries, CSS libraries, etc., before you even get to the 3-D stuff. The OS has lots of complexity to deal with things like audio stream mixing (PulseAudio), USB (systemd), and ACPI, which is all terribly botched, one broken kludge on top of another.

The underlying problems are not really that complicated, but organizationally the people solving them are all working at cross-purposes, creating extra complexity that doesn't need to be there, and then hiding it like Easter eggs for people at other companies to discover through experimentation. Vertical integration is the only escape, and RISC-V is probably the key. Until then, we have to suck it up.

Most of this doesn't really affect you, except as a startup cost of however many hundreds of megs of wasted RAM. Once you have a window on the screen, you've disabled popup notifications, and you're successfully talking to the input devices, you don't really need to worry about whether Wi-Fi roaming changes the IP address the fileserver sees and invalidates your file locks. You can live in a world of your own choosing (the "bubble" in "BubbleOS"), and it can be as complex or as simple as you figure out how to make it. Except for the part which deals with talking to the GPU, I guess. Hopefully OpenCL 3.0 and Vulkan Compute, especially with RADV and WGSL, will have cleaned that up. And maybe if the underlying OS steals too much CPU from you for too long, it could tank your framerate.

To avoid CPU death, use anytime algorithms; when you can't use anytime algorithms, strictly limit your problem size to something that your algorithms can handle in a reasonable amount of time. I think GPGPU is still dramatically underexploited for game simulation and AI.

Unreal Engine 5's "Nanite" system is a really interesting approach to maintaining good worst-case performance for arbitrarily complex worlds, although it doesn't scale to the kind of aggregate geometry riddled with holes that iq's SDF hacking excels at. That kind of angle seems really promising, but it's not the way games were efficient 30 years ago.

Most "modern systems" are built on top of Blink, V8, the JVM, Linux, MS-Windoze, AMD64, NVIDIA drivers, and whatnot, so they're saddled with this huge complexity under the covers before the horse is out of the barn door. These systems can give you really good average-case throughput, but they're not very good at guaranteeing worst-case anything, and because they are very complex, the particular cases in which they fall over are hard to understand and predict. Why does my Slack client need a gibibyte of RAM? Nobody in the world knows, and nobody can find out.


Well, how about this: how come Turbo C++ could run on 486s, when Clang uses both way more RAM and cycles than you could ever hope to have on a 486 computer?

The answer is: Everything.

If you are cynical, you could say that all of the resources are just being wasted on bad software that is bloated and coded lazily by less skilled people who only care about money. If you are optimistic, you might point to advancements in both the C++ language that requires more resources as well as improvements to compilers, like ridiculously powerful optimizers, static analysis, and language features, as well as the modularization of the compiler.

I think the truth is basically a bit of both, though probably mostly the latter. Software naturally expanded to require more because the hardware offered more. When computers ran 486s and Pentiums, spending 10x more resources on compile times was probably not a popular proposition. But computers becoming faster made compilers faster too, so compilers could afford to use more of those resources. At the same time, they could clean up their codebases to remove hacks that reduced memory or CPU utilization at the cost of making code difficult to read, debug, and understand, and modularize the code to allow for compilers that are easier to extend, can provide language servers, support multiple language frontends, and support compiling to multiple architectures transparently.

What does this have to do with games? Well, a DOS game would more or less write directly to VGA registers and memory, but doing so is a sort-of fraught-with-peril situation. It’s hard to get 100% right, and it exposes applications to hardware implementation details. Plenty of games behaved strangely on some VGA cards or soundblaster clones, for example. Using an abstraction layer like DirectX can be a dramatic improvement as it can provide a higher level interface that shields the application from having to worry about hardware details, and the interface can do its best to try to prevent misuse, and drivers for it can be tested with extensive test suites. If a VGA card behaves differently from what a game expects and it doesn’t work, you’re SOL unless the game is updated. If a Direct3D game doesn’t work properly, even if it is ultimately due to a hardware quirk, it can usually be fixed in software because of the abstraction.

SFML is even further up the stack. It is a library that abstracts the abstractions that abstract the hardware, to allow your application to run across more platforms. There could be three or four layers of abstraction between you and bare metal. They all do important things, so we want them. But they also introduce places where efficiency suffers in order to present a more uniform interface. Also, a modern app running on a modern OS incurs costs from other, similar improvements: language runtimes, OS security hardening, etc. come at nonzero CPU and memory costs that we were willing to eat because improving hardware more than offset them.

Programmers today have different skills and programs today have different challenges. It’s only natural that the kinds of things that made Sim City 2000 run well are not being applied to make modern games on modern computers.


RCT and other Chris Sawyer games would be an excellent example for a case study. They are both a product of the limitations of their time and of Chris Sawyer's self-chosen handicap of writing the entire game in assembly.

In addition, the source code for both RollerCoaster Tycoon and Transport Tycoon Deluxe has been decompiled by volunteers and released as OpenRCT2 and OpenTTD respectively, so we can actually get a glimpse of how the games worked originally.

Disclaimer: I am not an expert on either of these games, and the following examples may be wrong. In any case, you can just take these as hypothetical made-up examples.

As far as I remember, both games have a very fast heuristic for wayfinding.

In RCT the agents ("peeps") just wander most of the time (at the very least 50%) and pick a path at random at each intersection. This is obviously very cheap even for many agents. Peeps also have the possibility of seeking out an actual goal (like a specific ride in your park), but even then they do not employ global pathfinding to reach that target; they just check at each intersection which path would lead them closest to the target and move there.
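
Based on that description (not on OpenRCT2's actual code), the heuristic is something like the following sketch: no global search, just "which exit of this junction brings me nearest my target?", which costs O(number of exits) per decision.

    #include <vector>

    struct Vec2 { float x, y; };

    static float Dist2(Vec2 a, Vec2 b) {
        float dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;        // squared distance is enough for comparison
    }

    // exits: the path tiles reachable from the junction the peep stands on.
    int PickExit(const std::vector<Vec2>& exits, Vec2 target) {
        int best = 0;
        for (int i = 1; i < static_cast<int>(exits.size()); ++i)
            if (Dist2(exits[i], target) < Dist2(exits[best], target))
                best = i;
        return best;   // greedy and cheap; can loop forever on pathological layouts
    }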

This works well in most cases, but it is well known to RCT players that certain patterns of paths can lead to peeps getting stuck in cycles when the heuristic fails. In addition, at least in RCT1, double paths were completely broken for that reason: every path segment becomes an intersection, and it becomes evident that the peeps wander completely aimlessly.

The thing is: players usually see this as a challenge rather than an annoyance. The game design even incorporates it, for example by giving you a negative "award" for a "confusing park layout", or scenarios which are built around the gimmick that the pre-existing path layout is actively horrible and should be reworked asap. The problems with double paths actually make the game better, as they are a naive and overly cheap solution to overcrowding (another mechanic, which penalizes you for having too many peeps on a stretch of path at the same time).

Another example of technical limitations turning into important gameplay elements can be seen e.g. in Starcraft. Only being able to select 12 units at the same time, and having wonky pathfinding, especially in chokepoints, is completely outdated. Still, having to deal with these limitations is actually considered part of required skill in high-level play.

In addition, some of these limitations are actually realistic, as people in a real amusement park do not have a perfect map of the park in their heads and wander around much like the peeps do. In game you also have the possibility to sell park maps, which are consulted by the peeps when they look for something. Theoretically, even if the game had implemented a perfect pathfinding algorithm, you could still cut down on how often you need to run it by tuning how often the peeps check that map. Peeps also have a mechanic where they have a "favorite ride" which they seem to visit all the time. When they stay in the close vicinity of that ride, even targeted pathfinding gets very easy.

Transport Tycoon actually had pretty much the same pathfinding algorithm as RCT. One of the first things that OpenTTD did was reworking the pathfinding. As you are building a railway network in that game, splitting the network into an actual graph, keeping that structure in memory, and running A* or something on it, is actually not that inefficient.

It seems that it would have been possible even on slower PCs, but remember that Chris Sawyer wrote the game in assembly, and having a simple pathfinding algorithm really helps keep the complexity manageable. After decompiling to a higher-level language like C++, resolving these issues became much easier.

I also remember playing The Settlers 2 a lot as a kid. This game had a weird mechanic where you had to place flags on the map and build paths between those flags. Each of those path segments would then be manned by one of your infinite supply of basic settlers. Those settlers would never leave their segment; they just carried items from one flag to the other, handing them off to the next settler. I have never found an explanation for this design decision, but I am pretty sure the reason is that you don't have to derive a graph from your roadway system, since the player is pretty much building the graph of the road network themselves.


The graphics were terrible.


Low resolution screens, 1 byte pixels, …


In DOS there were very specific memory configurations that you had to set with EMM386 and HIMEM in order to run certain games, better or at all.

Windows 95 took all that away for the better.


When that software was written, developers were expected to understand physics, mathematics, etc. Nowadays an online degree makes a developer. The so-called next generation of programming tools hides the low-level optimizations, and developers often cannot bypass them. Those tools were written for business applications, but they get used in games. Assuming unlimited resources, and losing operating system efficiency (the OS suffers from the same problems, and the extra security code added to it becomes another bottleneck), are some of the other things that affect efficiency.



