The clever bit is that it's two lookup tables here: the big one stores the lighting details for a circle with a configurable radius around the player (that's one full lookup table per radius), but the second one is a pseudo-random ordering for the background rows. I only have time to actually update 1/20th of the screen each time the torchlight routine is called, but by randomizing the order a little bit, I can give it a sort of "soft edge" and hide the raster scan that you'd otherwise see. I use a table because the random order is a grab bag (to ensure rows aren't starved of updates) and that bit is too slow to calculate in realtime.
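A minimal sketch of the second table, the "grab bag" row ordering. This is not the actual NES routine; the row count, fraction per call, and the particular permutation are all made up for illustration. The key property is that the table is a permutation, so over one full cycle every row is updated exactly once and none are starved:

```c
#include <assert.h>

#define ROWS 30           /* NES nametable height in tiles */
#define ROWS_PER_CALL 6   /* illustrative fraction; the post updates 1/20th */

/* Precomputed pseudo-random permutation of the row indices.  Because
   it's a permutation, every row is visited exactly once per cycle,
   and the shuffled order hides the raster scan you'd otherwise see. */
static const unsigned char row_order[ROWS] = {
    17,  3, 24,  9, 28,  0, 14, 21,  6, 11,
    26,  1, 19,  8, 29,  4, 15, 22, 12, 27,
     2, 18,  7, 25, 10, 20,  5, 16, 23, 13
};

static unsigned cursor = 0;      /* position in the permutation */
unsigned update_count[ROWS];     /* stands in for the real redraw work */

/* Each call touches only ROWS_PER_CALL rows, in shuffled order. */
void torchlight_tick(void) {
    for (int i = 0; i < ROWS_PER_CALL; i++) {
        update_count[row_order[cursor]]++;   /* redraw this row's lighting */
        cursor = (cursor + 1) % ROWS;
    }
}
```

Generating a good-looking shuffle like this is exactly the part that's too slow for a 6502 at runtime, which is why it lives in ROM as a table.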
That looks pretty awesome, and the graphics are way better than any NES game I can remember; it looks more like SNES-level graphics. Since you are running on an emulator there, does the emulator lock you to NES-level performance?
Yes, though full disclaimer, the cartridge is a tiny bit more powerful than your average NES game, and that is definitely contributing. The capabilities of the cartridge I'm using are something like Nintendo's MMC5 (used for Castlevania 3) but with a slightly more useful ExRAM implementation. The main benefit for this effect is that I can write to the nametable outside of vblank, and that helps with the double-buffer setup. The NES CPU isn't fast enough to process all of the enemies at once, so I batch the changes over time, and then present the updated board when the musical beat advances.
This should run on real hardware, I'm just waiting on a dev board from my cart supplier before I'll be able to test that properly. Mesen is what I use in the meantime. It is pretty darn accurate, and I haven't overclocked it or cheated any of the console's constraints away.
That looks like a pretty cool game! Were you inspired by Crypt of the Necrodancer? That's one of my favorite games. I was thinking about whether an NES (or another older console) would have been able to handle a game with those mechanics if they had been invented back then (and if so, to what extent), and it looks like you're proving it possible.
The 1/20 update works well because it gives the illusion of tiles popping into view naturally as the lamp moves. This is a cool effect to get away with.
Heh, cool. I always find it fascinating how games for the same console could get so much prettier as people learned to code for them. E.g. Mario 1 and 3 or the two Zeldas for N64.
The first look-up table effect that really impressed me was to use them for making a textured tunnel.
You have a look-up table such that for each pixel on the screen, you know its angle and distance from the center of the screen. With this you can pick which texel to put at each pixel location.
At first I thought the game Stardust must have used this effect, but reading up on it just now they actually just play a repeating 6 frame animation for the background! https://codetapper.com/amiga/sprite-tricks/stardust/
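The per-pixel mapping described above can be sketched like this. This is a hedged, from-scratch illustration (not Stardust's code, and the screen/texture sizes are arbitrary); it sticks to integer math in the retro spirit, using an integer square root and a cheap "diamond angle" in place of atan2:

```c
#include <assert.h>

#define W 64
#define H 48
#define TEX 16        /* texture is TEX x TEX texels, power of two */

unsigned char tunnel_u[H][W];  /* texel column per pixel (from angle) */
unsigned char tunnel_v[H][W];  /* texel row per pixel (from distance) */

/* Integer square root, good enough for screen-sized distances. */
static int isqrt(int n) {
    int r = 0;
    while ((r + 1) * (r + 1) <= n) r++;
    return r;
}

/* Cheap integer "diamond angle": maps (dx,dy) to 0..255 without atan2. */
static int pseudo_angle(int dx, int dy) {
    int ax = dx < 0 ? -dx : dx, ay = dy < 0 ? -dy : dy;
    if (ax + ay == 0) return 0;
    int t = (64 * ay) / (ax + ay);
    if (dx < 0) t = 128 - t;
    if (dy < 0) t = 256 - t;
    return t & 255;
}

/* One-time build: for every screen pixel, precompute which texel the
   tunnel shows there.  Angle picks the texture column, distance the row. */
void build_tunnel_lut(void) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int dx = x - W / 2, dy = y - H / 2;
            int dist = isqrt(dx * dx + dy * dy);
            tunnel_u[y][x] = (unsigned char)(pseudo_angle(dx, dy) >> 4);
            tunnel_v[y][x] = (unsigned char)((512 / (dist + 1)) % TEX);
        }
}

/* Per frame: two lookups and a texture fetch per pixel.  Adding an
   offset to v scrolls "into" the tunnel essentially for free. */
unsigned char tunnel_pixel(unsigned char tex[TEX][TEX], int x, int y, int t) {
    return tex[(tunnel_v[y][x] + t) % TEX][tunnel_u[y][x]];
}
```

All the expensive math happens once in `build_tunnel_lut`; the per-frame loop is pure table lookups, which is what made the effect feasible on old hardware.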
He gave one of my favourite GDC talks of all time [1]! I recommend this talk to anyone who is interested in the details and history of working on limited colour palette games!
I remember games that used 8-bit color palette cycling to do tear-free renders in an interesting way.
They would draw the next frame of the game direct to the screen in either the high nybble (4 bits) or the low nybble (alternating each frame).
The color palette was one of two: in one, the displayed color depended only on the low nybble (whatever was in the high nybble was invisible); in the other, it depended only on the high nybble.
To be sure, the game gave up 256 possible colors to instead have only 16. But while the next frame was being rendered, the user was never aware of it until, at the very last instant, the next palette was slammed in, revealing the new frame (and of course rendering began for the one after).
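The two palettes can be built like this (a hedged sketch: entries here are indices into some 16-colour master set; real code would store RGB values instead, and the function name is made up):

```c
#include <assert.h>

/* Build the two complementary palettes for nybble double-buffering. */
void build_flip_palettes(unsigned char show_low[256],
                         unsigned char show_high[256]) {
    for (int i = 0; i < 256; i++) {
        /* With this palette loaded, only the low nybble is visible, so
           the frame being drawn into the high nybbles stays hidden. */
        show_low[i] = (unsigned char)(i & 0x0F);
        /* And vice versa: one palette upload flips which frame shows. */
        show_high[i] = (unsigned char)(i >> 4);
    }
}
```

Presenting a frame is then a single palette upload rather than a framebuffer copy, which is what makes the flip tear-free.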
Yes, you could for example achieve the tunnel effect described above with a 16x16 or 32x8 texture using a 256-entry palette.
Unreal by Future Crew has an effect that comes to mind that is probably implemented exactly like that (on VGA mode 0x13), just more of a wormhole than a tunnel: https://www.youtube.com/watch?v=InrGJ7C9B3s&t=4m40s
These uses of LUTs are great and I'm happy to see such a neat article describing them. Also love the use of webgl and the ability to upload your own data!
I was a bit surprised however by the way colour grading using LUTs was presented. The article makes it sound like a cool niche solution used by L4D2, but it's been the industry standard for a long while and was used on every game I've worked on, from AAA games like NFS (2015) to larger indie games like Lost in Random.
Glad you liked it! That's quite the incredible career. Lost in Random seems just like my taste. Guess who just bought it and is enjoying it tonight...
> but it's been the industry standard for a long while
Yeah, I guess I muddled the message with my colorful writing style; I thought the passage "Using 3D LUTs to style your game’s colors via outside tools is a very well known workflow." was enough to make clear that it's not a niche use. Clarified now with an extra sentence.
Loved your article! I just wanted to expand a bit on your last section. The Game Boy Advance did offer some support for integer division, but it's not hardware-accelerated [1]. I suspect the LUT approach was for performance reasons (and rightfully so!).
The GBA does provide integer division (via a BIOS call, since the ARM7TDMI has no divide instruction), but it has no hardware support for floating point arithmetic at all: compilers fall back to software IEEE 754 routines that take hundreds of CPU cycles for most operations. I believe the LUT you refer to is for floats.
All these animated gifs are annoying and distracting, and I can do without the unfunny boomer memes from 2010. It's just annoying to try and read text with all these colorful gifs being spammed.
Battery-saving optimizations pause the muted video element at the top once you scroll past it, which stops the WebGL samples that use it as input. Landing in the middle of the article from another page also prevents the video-driven WebGL samples from playing, which is a shame since they're supposed to illustrate the realtime processing aspect. And unmuted video cannot be autoplayed.
I'll rethink how I approach this for my next article. Maybe a pause button under every WebGL sample?
Even in boring old business process land, it’s surprising how useful this is.
Often you can simplify a code base full of conditional statements with a tidy lookup table.
I think the reason this isn’t always thought of is that a lookup table might end up with 50k rows even for something that seems simple. That might seem like a lot to end users, but thankfully the computer doesn’t mind.
Also, the look up tables are so repetitive that the tables are usually pretty easy to manage, and still worth it even when they aren’t.
On top of that, they can be configured by end users without code changes.
All in all, a really useful concept, especially for a lot of scenarios that always pop up when coding business logic.
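As a concrete sketch of the conditional-statements point: a hypothetical pricing rule that would otherwise be nested if/else becomes one flat table that a non-programmer could review, or that could be loaded from a file so end users can change it without code changes. (The rule, names, and fees here are all invented for illustration.)

```c
#include <assert.h>
#include <string.h>

/* Hypothetical business rule: shipping fee by region and service tier. */
struct fee_rule {
    const char *region;
    const char *tier;
    int fee_cents;
};

/* The whole decision lives in data, not branching code. */
static const struct fee_rule fee_table[] = {
    { "US", "standard",  499 },
    { "US", "express",  1499 },
    { "EU", "standard",  699 },
    { "EU", "express",  1999 },
};

int shipping_fee(const char *region, const char *tier) {
    for (size_t i = 0; i < sizeof fee_table / sizeof fee_table[0]; i++)
        if (strcmp(fee_table[i].region, region) == 0 &&
            strcmp(fee_table[i].tier, tier) == 0)
            return fee_table[i].fee_cents;
    return -1;  /* no matching rule */
}
```

Adding a new region is a one-line table entry instead of another branch, which is where the maintainability win comes from.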
Retro games used a ton of tables. Back then memory speeds were blazing fast, but processors were sluggish, so it made sense to pack as much computation as possible into tables. The more clever you were about it, the fancier games you could make.
Not always, though it might depend on what platform you mean with retro. Kaze Emanuar on YT does a lot of development for the N64 and it feels like half the time he talks about how the memory bus affects all kinds of optimizations. In Mario 64 he replaced the original lookup table for the sine function because an approximation was faster and accurate enough (or rather two approximations for two different purposes).
I love that channel, he reworked the entire Mario 64 code [0] to make it run at stable 60FPS...because he wanted his mods to run faster.
Back when I started I thought I would make games. I used a lookup table for cos/sin kept as integers. I only needed enough precision for rotation at 320x240. It was something like 20-30 cycles faster per pixel, even more if you didn't have an FP co-processor.
By retro platform GP meant Atari ST and Commodore Amiga and the like: LUTs were the name of the game for everything back then. Games, intros/cracktros/demos.
Heck, even sprite movements often weren't done using math but using precomputed tables: storing movements in "pixels per frame".
It worked particularly well on some platforms because they had relatively big amounts of RAM compared to their slow CPUs, and RAM accesses weren't as taxing as they are today (these CPUs didn't have L1/L2/L3 caches).
The speedup from the approximation wasn't that much, if anything. He made his real improvements elsewhere. But your point stands that memory speed really has moved the goalposts on what is feasible to speed up through precalculated tables, and if you can do it with math, that is often much faster.
I remember the days of restricting rotation accuracy to 360/256ths of a degree so it would fit in a byte, which then would be an index into trig lookup tables :)
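A sketch of that byte-angle scheme. The table here is generated with Bhaskara I's old integer-friendly sine approximation rather than loaded from precomputed data, which is an assumption on my part; the payoff is the same either way: a `uint8_t` angle wraps for free, and with a power-of-two table, cosine is just a quarter-turn offset and a single AND.

```c
#include <assert.h>

#define N 256                 /* one full turn = 256 "binary degrees" */
static int sine[N];           /* sin scaled to -127..127 */

/* Build the table with Bhaskara I's approximation (about 1.8%
   worst-case error -- plenty for rotating sprites). */
void build_sine(void) {
    for (int j = 0; j < 128; j++) {
        int q = j * (128 - j);
        sine[j]       =  (127 * 4 * q) / (20480 - q);
        sine[j + 128] = -sine[j];     /* second half is the mirror */
    }
}

/* A byte angle wraps modulo 256 automatically: no range reduction. */
int sin_b(unsigned char angle) { return sine[angle]; }

/* Cosine is sine shifted a quarter turn; power-of-two size makes the
   wrap a single AND instead of a divide. */
int cos_b(unsigned char angle) { return sine[(angle + 64) & (N - 1)]; }
```

This is also why so many old engines measure angles in 256ths of a turn instead of degrees or radians: the representation does the modular arithmetic for you.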
I miss the days when professional software development was more akin to this sort of thing, rather than gluing together 20 Javascript frameworks on top of someone else's cloud infrastructure.
If you work with LUTs a lot, I work on a Mac app that handles all kinds of advanced color science and conversion tasks: https://videovillage.com/lattice
A little while back I got into the NES homebrew scene, and I recall seeing a game (Bobl) that had incredible water physics, way more than the system would have been capable of calculating. It turns out that it was a lookup table, and it really opened my eyes as to what you can accomplish with them and how simple tools can look like very complex processes.
> (Bobl) that had incredible water physics - way more than system would have been capable of calculating.
I'm not denying the effect is impressive, but assuming you're referring to the surface ripples/capillary waves, would a 1D cellular automaton be beyond the abilities of the NES to calculate in real time?
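For scale, here is roughly what a 1D surface-wave simulation costs. This is my own generic heightfield/velocity sketch in integer math (I have no idea how Bobl actually implements its water; per the comment above it's a LUT): each cell accelerates toward the average of its neighbours, which propagates ripples outward, and damping is a cheap shift.

```c
#include <assert.h>

#define W 32

int height[W];   /* water surface height per column (fixed-point-ish) */
int vel[W];      /* vertical velocity per column */

void ripple_step(void) {
    /* accelerate each cell toward the mean of its neighbours */
    for (int i = 1; i < W - 1; i++)
        vel[i] += (height[i - 1] + height[i + 1]) / 2 - height[i];
    /* integrate, with light damping via a divide-by-16 (a shift) */
    for (int i = 1; i < W - 1; i++) {
        height[i] += vel[i];
        vel[i] -= vel[i] / 16;
    }
}
```

That's a couple of adds, a halving, and a shift per column per frame, so whether it's within reach of an NES depends mostly on how many columns you simulate and what else the frame budget holds; the LUT sidesteps the question entirely.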
A Podcast Of Unnecessary Detail just did an episode talking about the SNES Doom port and how it used LUTs for trigonometry as the SNES didn't have a graphics processor.
It absolutely did! Even the NES had one. In addition to the SNES CPU (a Ricoh chip built around the WDC 65C816), the PPU (picture processing unit) was a custom chip that offered several different layouts for spitting out tile-based backgrounds and sprites as each scanline physically drew across the CRT TV. It was even capable of fancy things like background rotation and scaling (Mode 7, think Mario Kart) and hardware transparencies, which you can see in many games.
Furthermore, in the case of Doom (and games like Star Fox) the cart had the "Super FX" chip baked in to handle the 3D calculations. So you were actually dealing with two graphics processors for that specific title, heh.
Here's a wonderfully (and painfully) detailed series about the SNES hardware by Retro Game Mechanics Explained:
It fell out of favor because the OG Xbox was bad at it and Photoshop doesn’t support it. But modern hardware can totally handle palette swaps made in Aseprite.
I assume you mean the move to 16 bit color framebuffers, not systems with 16 bit CPUs? That was about the point where palettes and sprite and tile based 2D acceleration stopped being relevant.
The SNES and Genesis definitely had hardware sprites. The PS1 and PS2 worked with palettized textures 99% of the time and had hardware quads. The N64 could do palettized textures and quads. The same for the OG Xbox, but it was a big speed hit vs DXTC. With the Xbox360 and PS3 you could implement palettized textures as a shader. But, only the most special cases bothered.
It fell out of favor when we could just store 1000x more sprites. It was memory, not instruction-set. Tinting then happened as a pre-build step until finally shaders replaced the practice entirely.
Instead of “fell out of favor” I shoulda said “artists stopped being informed of its existence and occasionally invented poor approximations to it from first principles”. Such as tinting sprites to approximate palette swaps of the retro style they are referencing.
You can do so much more than tint with palettes. The animations in the link I posted are each composed of a single 8-bit image and a function that rotates specific sections of the palette.
I’m intimately familiar with palette animations. You’re right though, it’s a technique that was lost due to hardware no longer limiting the artists. I still find it to be far superior to sprite tile replacement animations.
what: the most complicated thing (that I've actually used in other projects) was a data-driven precedence-climbing parser (which, since sed is usually lousy at dealing with place-value numerics, required sussing out sed-friendly representations of the core data types).
why: because given a "weird machine" that is nearly omnipresent (in the back end, anyway) and is also a Perlis language, who among us would not wish to program it? (and who among us would willingly admit that 1959's https://en.wikipedia.org/wiki/IBM_1620#Model_I programmers had had better code-fu?)
I must go down to the sed(1) again, to the pattern space and the hold,
And all I ask is a t branch and a s/ub/st/ to map and fold;
And the Jolt’s kick and the keys' clack and the write lines breaking,
And a green glint in mirror shades, and a grey* dawn waking.
* "the color of television, tuned to a dead channel"; no Ἠὼς Ῥοδοδάκτυλος here!
If I had a dollar for every time someone who had 30 seconds of accumulated knowledge about Dynamic Programming tried to argue with me that caching and memoization are the same thing, I'd have enough for a nerf bat to hit them with.
(Briefly: memoization is local shared state, caches are global shared state, which is a whole other set of problems. And caching is hoping you'll need something eventually, whereas memoization is knowing you will need it immediately.)
A couple years ago I watched a very long video on Dynamic Programming and learned a new term: Tabulation. If memoization is encountering a common sub-problem and holding onto it, tabulation is seeking them out, answering the sub-problems before they can come up organically.
It's much harder to derive the tabulation answer, but I find it's usually easier to reason about after the fact. Which is enormously helpful when you go back to add new features or fix old bugs.
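The distinction is easy to see side by side on the textbook example. This is just generic Fibonacci, not anything from the video: memoization recurses and caches sub-answers as they come up organically; tabulation fills in every sub-problem in order before it can ever be asked for.

```c
#include <assert.h>

/* Top-down memoization: recurse, cache sub-answers as you hit them. */
static long memo[64];
long fib_memo(int n) {
    if (n < 2) return n;
    if (memo[n] == 0)                        /* not computed yet */
        memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return memo[n];
}

/* Bottom-up tabulation: answer every sub-problem in order, before it
   can ever come up "organically". */
long fib_tab(int n) {
    long table[64];
    table[0] = 0;
    table[1] = 1;
    for (int i = 2; i <= n; i++)
        table[i] = table[i - 1] + table[i - 2];
    return table[n];
}
```

The tabulated version took more thought up front (you have to know the dependency order), but there's no recursion left to trace through afterward, which matches the "easier to reason about after the fact" point.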
Look-up tables are fixed-size tabulations. The space complexity of tabulation is often ≥ O(n), while LUTs are fixed-size, but big enough to reduce the computation time by a constant factor, like 10 or 100.
It sounds like your distinction of tabulation vs memoization is similar to the concept of bottom-up vs top-down DP. Bottom-up: build the table up eagerly, top-down: recurse down and fill in the table lazily.
Yeah I'm not sure how widespread his nomenclature was. I was just glad to have more vocabulary for people who have trivialized the entire discipline down to 'caching'.
This reminds me: we used a command line tool (originally made for Warcraft 2?) to take full-color art assets and find a palette of 200 or so colors (the rest reserved for UI elements and the player characters) for each of the different areas in Diablo 2, then remapped all the art assets to these small palettes, and then reused the monster art with other colors within those palettes for varieties of monsters. I don't remember spending more than a day on putting this into our art pipeline; the guy who wrote the tool did all the hard work.
Yes, if you feed in the exact same inputs you will get the exact same outcome. Demos and multiplayer rely on this: they just record player inputs, nothing about enemy movement or anything like that. This is also why you can leave a multiplayer game in the middle of a match, but you can't join one in the middle.
Other games use seeded random number generators to be entirely deterministic even though random; Factorio does this to allow multiplayer without having to sync the entire game state all the time.
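A sketch of the seeded-determinism idea. This is a plain LCG with well-known constants, not whatever Factorio actually uses, and the spawn-column helper is invented for illustration: as long as every client seeds the same generator with the same value, "random" behaviour agrees everywhere and only player inputs need to go over the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic PRNG shared by every client in a lockstep simulation. */
typedef struct { uint32_t state; } game_rng;

void rng_seed(game_rng *r, uint32_t seed) { r->state = seed; }

/* Linear congruential step; fixed-width uint32_t wraps identically on
   every platform, which is essential for lockstep determinism. */
uint32_t rng_next(game_rng *r) {
    r->state = r->state * 1664525u + 1013904223u;
    return r->state;
}

/* Hypothetical use: pick an enemy spawn column on a 32-tile-wide map
   (top 5 bits of the state, which are the best-mixed bits of an LCG). */
int spawn_column(game_rng *r) { return (int)(rng_next(r) >> 27); }
```

Note that floating point is usually avoided in these simulations for the same reason: integer arithmetic is bit-identical across machines, so the game states can never silently drift apart.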
Yeah, developer commentaries are a goldmine in general for small nuggets of knowledge like that. I never would have come up with such a use case either.
Basically, all of the Valve ones.
(Half-life 2, Half-life 2 Episode 1, Half-life 2 Episode 2, Team Fortress 2, Left 4 Dead, Left 4 Dead 2, Half-Life Alyx)
Each one of them is a time capsule of the techniques and game design philosophies of its time of development. Pretty sure there is a walkthrough on YouTube for every single one of them, if you don't want to get the games yourself.
They are extra useful to precalculate other visual effects. In the past, following a magazine tutorial, I coded a very cheap raytraced "lens effect" on a 286. I recently (5 years ago!) ported it to js [0].
That magazine got me into fractals and raytracing, BTW.
Lookup tables are great to produce impressive graphics effects under strict compute limits. I use them in many FPGA projects. On FPGA there is a balance between limited memory (often BRAM) and limited compute, but they can be used to great effect:
Tunnel effect (exactly as user bemmu described below):
Sine tables are also very typical, for animating things on screen or plain trigonometry; for instance my small fixed integer raytracer uses a sine table to animate the spheres (and the cos is easy to get from the sin, especially if the table has a power of two size ;) ):
I was the author of that piece. This blog is about graphics LUTs, which fall into point (4) of the "when to use lookup tables" section. GPUs have dedicated constant memories that were specifically made for holding lookup tables so that they can be accessed quickly and deterministically. Graphics LUTs are a really powerful tool because of that, and have none of the performance shortcomings.
Can confirm the CPU part. I once tried to replace 2D stepwise-linear function interpolation (basically, a loop finding the right segment, then linear interpolation on a double; the function's purpose is to rescale inputs before they become a regression's arguments) with LUTs.
All of this looping was happening in another tight loop, and there were multiple LUTs.
It was a failure with overall decreased performance and increased cache misses. Plus, I could never get the same precision.
It could be added and it was a feature on the Gamecube CPU for example - you were able to lock the cache and then, IIRC, half the cache was used as normal and half was at your disposal for very fast access to look up tables or data you were using over and over again. Some code to deform 3D meshes when animating them using a skeleton used the locked cache to speed up that operation greatly by having the bone matrices at hand.
A problem I imagine on modern processors is the cache might be shared amongst several cores and interrupts etc.
Back in the nineties, 3DFX made a speed of light rendering demo for the Voodoo1 where they took over the entire machine and through intimate knowledge of cache behaviour effectively set aside an area in the L1 cache that would contain some data they could stream in and speed up their triangle setup.
You absolutely can tell the CPU what to cache, and also what not to cache (such as PCIe memory-mapped registers); it's just that those are privileged instructions which no sensible multitasking operating system will let you have access to, because they would have to be reset on every task switch.
I’m still disappointed that you can’t boot a modern processor with its 8MB of L2 cache and run Windows 95 entirely in cache. Stupid requirements to have a backing store.
I think if you flip the right MSRs, you can actually run in this mode on Intel. BIOSes used to do DDR3 calibration in this mode, and needed to run a core DRAM-less to do it.
AMD Zen platforms have always put that functionality (ironically) in the secret platform security coprocessor.
The real probable reason is that you will do it badly. The PREFETCH instructions are the closest things, which are mostly just a hint to the prefetcher. If you had actual cache control (on multi-tenant systems especially), programmers would probably overuse it.
https://twitter.com/zeta0134/status/1756988843851383181