How video games use lookup tables (frost.kiwi)
673 points by todsacerdoti on Feb 29, 2024 | 106 comments



Lookup tables are the only reason I was able to get this effect working at all:

https://twitter.com/zeta0134/status/1756988843851383181

The clever bit is that it's two lookup tables here: the big one stores the lighting details for a circle with a configurable radius around the player (that's one full lookup table per radius), but the second one is a pseudo-random ordering for the background rows. I only have time to actually update 1/20th of the screen each time the torchlight routine is called, but by randomizing the order a little bit, I can give it a sort of "soft edge" and hide the raster scan that you'd otherwise see. I use a table because the random order is a grab bag (to ensure rows aren't starved of updates) and that bit is too slow to calculate in realtime.
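
For the curious, the second table is just a pre-shuffled list of row indices walked a slice at a time. A minimal sketch of the idea (constants are placeholders, not the game's actual code):

    #include <stdint.h>

    #define ROWS 30          /* NES nametable is 30 tile rows tall */
    #define ROWS_PER_CALL 2  /* whatever fits the per-frame budget */

    /* Precomputed "grab bag": every row index appears exactly once,
       in a scrambled order, so no row is starved of updates. */
    static const uint8_t row_order[ROWS] = {
        17,  3, 24,  9, 28,  0, 12, 21,  6, 15,
        26,  1, 19,  8, 29, 11,  4, 22, 14, 27,
         2, 18,  7, 25, 10, 23,  5, 16, 13, 20
    };

    static uint8_t cursor = 0;

    /* Called once per frame: refreshes the next slice of rows. */
    void update_torchlight(void (*redraw_row)(uint8_t row))
    {
        for (uint8_t i = 0; i < ROWS_PER_CALL; i++) {
            redraw_row(row_order[cursor]);
            cursor = (cursor + 1) % ROWS;
        }
    }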


That looks pretty awesome, and the graphics are way better than any NES game I can remember; it looks more like SNES-level graphics. Since you are running on an emulator there, does the emulator lock you to NES-level performance?


Yes, though full disclaimer, the cartridge is a tiny bit more powerful than your average NES game, and that is definitely contributing. The capabilities of the cartridge I'm using are something like Nintendo's MMC5 (used for Castlevania 3) but with a slightly more useful ExRAM implementation. The main benefit for this effect is that I can write to the nametable outside of vblank, and that helps with the double-buffer setup. The NES CPU isn't fast enough to process all of the enemies at once, so I batch the changes over time, and then present the updated board when the musical beat advances.

This should run on real hardware, I'm just waiting on a dev board from my cart supplier before I'll be able to test that properly. Mesen is what I use in the meantime. It is pretty darn accurate, and I haven't overclocked it or cheated any of the console's constraints away.


Where's good to follow for news of a release? Patreon? I don't twitter.


That looks like a pretty cool game! Were you inspired by Crypt of the Necrodancer? That's one of my favorite games. I was thinking about whether an NES (or another older console) would have been able to handle a game with those mechanics if they had been invented back then (and if so, to what extent), and it looks like you're proving it possible.


The 1/20 update works well because it gives the illusion of tiles popping into view naturally as the lamp moves. This is a cool effect to get away with.


Heh, cool. I always find it fascinating how games for the same console could get so much prettier as people learned to code for them. E.g. Mario 1 and 3 or the two Zeldas for N64.


Also, 2D games have one significant advantage over many 3D titles: you aren't burdened as much with the extra "camera operator" task.


> I use a table because the random order is a grab bag (to ensure rows aren't starved of updates)

I think the search keyword for this is "quasirandom", FWIW.


Wow! Impressive! At first I thought it was just pretty cool. And then I saw you're doing it on a NES game. Bravo.


Very nice work! Definitely looks way better than the contemporary NES attempts at doing it (thinking of MMC1 Ultima Exodus here)


The first look-up table effect that really impressed me was to use them for making a textured tunnel.

You have a look-up table such that for each pixel on the screen, you know its angle and distance from the center of the screen. With this you can pick which texel to put at each pixel location.

It looks as if you are moving in a tunnel with 3D geometry, but it's so cheap you can even do it on a PICO-8: https://www.lexaloffle.com/bbs/?pid=63818

At first I thought the game Stardust must have used this effect, but reading up on it just now they actually just play a repeating 6 frame animation for the background! https://codetapper.com/amiga/sprite-tricks/stardust/
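
Back to the table itself: a rough sketch of how it can be built and used (assuming a 64x64 texture and a 320x200 screen; details vary per implementation):

    #include <math.h>
    #include <stdint.h>

    #define W   320
    #define H   200
    #define TEX 64              /* texture is TEX x TEX texels */

    static uint16_t angle_lut[W * H];   /* texture u from pixel's angle    */
    static uint16_t depth_lut[W * H];   /* texture v from pixel's distance */

    void build_tunnel_luts(void)
    {
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++) {
                double dx = x - W / 2.0, dy = y - H / 2.0;
                double dist = sqrt(dx * dx + dy * dy) + 1.0;
                double ang  = (atan2(dy, dx) + M_PI) / (2.0 * M_PI);  /* 0..1 */
                angle_lut[y * W + x] = (uint16_t)((int)(ang * TEX) % TEX);
                depth_lut[y * W + x] = (uint16_t)((int)(32.0 * TEX / dist) % TEX);
            }
    }

    /* Per frame: two lookups and one texture fetch per pixel.
       Offsetting u and v over time gives the illusion of flying through. */
    void draw_tunnel(uint8_t *screen, const uint8_t *texture, int t)
    {
        for (int i = 0; i < W * H; i++) {
            int u = (angle_lut[i] + t)     % TEX;
            int v = (depth_lut[i] + t * 2) % TEX;
            screen[i] = texture[v * TEX + u];
        }
    }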


If you view the color palette as a lookup table, palette cycling was very common and feels very related.
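
For anyone who hasn't seen it spelled out: the framebuffer stores palette indices, and "cycling" just rotates a slice of the palette each frame, so pixels animate without the framebuffer ever changing. A minimal sketch:

    #include <stdint.h>
    #include <string.h>

    typedef struct { uint8_t r, g, b; } rgb_t;

    static rgb_t palette[256];   /* the lookup table the screen is drawn through */

    /* Rotate palette entries [first, first + count) by one slot.
       Called every few frames, e.g. to make water "flow" while the
       indexed framebuffer never changes. */
    void cycle_palette(int first, int count)
    {
        rgb_t last = palette[first + count - 1];
        memmove(&palette[first + 1], &palette[first], (count - 1) * sizeof(rgb_t));
        palette[first] = last;
    }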


Mark J. Ferrari was incredible at utilising palette cycling: http://www.effectgames.com/demos/canvascycle/


He gave one of my favourite GDC talks of all time [1]! I recommend this talk to anyone who is interested in the details and history of working on limited colour palette games!

[1] https://youtu.be/aMcJ1Jvtef0


Love that video, his passion for the art form really comes through.


Makes me think about the area 5150 demo

https://www.youtube.com/watch?v=fWDxdoRTZPc

edit: This comment really summarizes just how impressive the demo is https://old.reddit.com/r/Demoscene/comments/wjxqve/area_5150...


Witchcraft!


And goes to the next level: http://www.effectgames.com/demos/worlds/

(Make sure you're properly seated, then click Show Options »)



I remember games that used 8-bit color palette cycling to do tear-free renders in an interesting way.

They would draw the next frame of the game directly to the screen in either the high nybble (4 bits) or the low nybble (alternating each frame).

The color palette was one of two: one where the displayed color ignored the high nybble of each pixel, and another where it ignored the low nybble.

To be sure, the game threw away 256 possible colors to have only 16 instead. But while the next frame was being rendered, the user was never aware of it until, at the very last instant, the other palette was slammed in, revealing the new frame (and of course rendering began for the next one).
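
If I've understood the trick correctly, it amounts to something like this sketch (assuming an 8-bit indexed display whose palette can be swapped instantly):

    #include <stdint.h>

    /* Two 16-colour palettes over one 8-bit framebuffer:
       palette_show_low ignores the high nybble of every pixel,
       palette_show_high ignores the low nybble. */
    static uint32_t palette_show_low[256];
    static uint32_t palette_show_high[256];
    static uint32_t colours16[16];          /* the 16 real colours */

    void build_palettes(void)
    {
        for (int i = 0; i < 256; i++) {
            palette_show_low[i]  = colours16[i & 0x0F];  /* high nybble invisible */
            palette_show_high[i] = colours16[i >> 4];    /* low nybble invisible  */
        }
    }

    /* Draw frame N into the nybble the current palette hides, then
       "slam in" the other palette to reveal it, tear-free. */
    void plot(uint8_t *fb, int idx, uint8_t colour, int drawing_high_nybble)
    {
        if (drawing_high_nybble)
            fb[idx] = (uint8_t)((fb[idx] & 0x0F) | (colour << 4));
        else
            fb[idx] = (uint8_t)((fb[idx] & 0xF0) | (colour & 0x0F));
    }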


Color palette definitely counts. I remember the lookup table Quake used to map its 256 color VGA palette to itself to implement light levels: https://quakewiki.org/wiki/Quake_palette#Colormap

You now basically have a two step lookup: palette_rgb[colormap[color, light]]
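
In code that amounts to something like this sketch (not Quake's actual source; assuming the 64 light levels described at that link):

    #include <stdint.h>

    #define LIGHT_LEVELS 64

    typedef struct { uint8_t r, g, b; } rgb_t;

    static rgb_t   palette_rgb[256];               /* the 256-colour VGA palette      */
    static uint8_t colormap[LIGHT_LEVELS][256];    /* (light, colour) -> colour index */

    /* Two chained lookups: first remap the palette index by light level,
       then resolve the final index to RGB. */
    rgb_t shade(uint8_t color, uint8_t light)
    {
        return palette_rgb[colormap[light][color]];
    }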


Yes, you could for example achieve the tunnel effect described above with a 16x16 or 32x8 texture using a 256-entry palette.

Unreal by Future Crew has an effect that comes to mind that is probably implemented exactly like that (on VGA mode 0x13), just more of a wormhole than a tunnel: https://www.youtube.com/watch?v=InrGJ7C9B3s&t=4m40s


An advanced use of this principle is POM (parallax occlusion mapping). You can even do self shadowing!

https://web.engr.oregonstate.edu/~mjb/cs557/Projects/Papers/...


There is a little button "Code ▽" under the demo if you want to see how it's working!


Looking at the symmetries of specific cases, you may find convenient solutions.


Here is a video on how Wind Waker uses many LUTs to achieve its unique look (BotW and TotK use the same techniques): https://www.youtube.com/watch?v=mnxs6CR6Zrk


Many thanks for submitting <3 Author here, in case of any questions.


Hello and thank you for the great article!

These uses of LUTs are great and I'm happy to see such a neat article describing them. Also love the use of webgl and the ability to upload your own data!

I was a bit surprised however by the way colour grading using LUTs was presented. The article makes it sound like a cool niche solution used by L4D2, but it's been the industry standard for a long while and was used on every game I've worked on, from AAA games like NFS (2015) to larger indie games like Lost in Random.
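
For readers who haven't met them: a grading LUT is just a small 3D table indexed by the original (r, g, b), holding whatever colour the artist graded that cell to. A nearest-neighbour sketch (real implementations sample a 3D texture with trilinear filtering; the 32-per-side size is only a common choice):

    #include <stdint.h>

    #define LUT_SIZE 32   /* a 32x32x32 grading table */

    typedef struct { float r, g, b; } rgbf_t;

    static rgbf_t grading_lut[LUT_SIZE][LUT_SIZE][LUT_SIZE];

    /* Nearest-neighbour lookup: quantise each channel (assumed 0..1)
       to a LUT cell and return the graded colour stored there. */
    rgbf_t grade(rgbf_t in)
    {
        int r = (int)(in.r * (LUT_SIZE - 1) + 0.5f);
        int g = (int)(in.g * (LUT_SIZE - 1) + 0.5f);
        int b = (int)(in.b * (LUT_SIZE - 1) + 0.5f);
        return grading_lut[r][g][b];
    }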


Glad you liked it! That's quite the incredible career. Lost in Random seems just like my taste. Guess who just bought it and is enjoying it tonight...

> but it's been the industry standard for a long while

Yeah, I mixed messages with my colorful writing style I guess; I thought the passage "Using 3D LUTs to style your game’s colors via outside tools is a very well known workflow." was enough to state that it's not a niche use. Clarified now with an extra sentence.


I really hope you'll enjoy the game! It's honestly my favourite project I've been on so far.

If you want a little insight into my corner of the making of LiR, check out this post I wrote about the fog: https://agentlien.github.io/fog


Pre-shaping before the 3D LUT, such as with a 1D LUT, is a common way to handle HDR tone mapping as well. See for example https://gpuopen.com/wp-content/uploads/2016/03/GdcVdrLottes....


Loved your article! I just wanted to expand a bit on your last section. The Game Boy Advance did offer some support for integer division, but it's not hardware-accelerated [1]. I suspect the LUT model was for performance reasons (and rightfully so!).

[1] https://www.copetti.org/writings/consoles/game-boy-advance/#...


The GBA has native support for integer division, but it does not have any hardware support for floating point arithmetic at all — compilers will insert a virtual floating point unit that does IEEE 754 in software, which takes hundreds of CPU cycles for most operations. I believe the LUT you refer to is for floats.


There was a typo that was melting my brain until I confirmed that this was not some alternate plotting library.

"Here is every single colormap that matlibplot supports,..."


Ohh, that's funny. I just realized, I always read it the wrong way around in my brain up until now! Corrected the typo.


Amazing write-up. Thanks for sharing.


All these animated gifs are annoying and distracting, and I can do without the unfunny boomer memes from 2010. It's just annoying to try and read text with all these colorful gifs being spammed.


May I suggest reader view as a solution if you use Firefox?

But this is a technical topic about a visual concept. I appreciate visualizations and absolutely praise shaders in contexts like these.


Usually I'd agree but I can only see embedded videos which respect the disabled autoplay setting.

Edit: nevermind, once started it's impossible to pause the videos, they always continue playing after scrolling.


I was not sure how to handle that.

Battery optimization disables a muted video element once you scroll past it, which stops the WebGL samples from playing. Landing in the middle of the article from another page also prevents the WebGL samples from working with video, which is supposed to illustrate the realtime-processing aspect. But unmuted videos can't be autoplayed.

I'll rethink how I approach this for my next article. Maybe a pause button under every WebGL sample?


Even in boring old business process land, it’s surprising how useful this is.

Often you can simplify a code base full of conditional statements with a tidy lookup table.

I think the reason this isn’t always thought of is that a lookup table might end up with 50k rows even for something that seems simple. This might seem like a lot to end users, but thankfully the computer doesn’t mind.

Also, the lookup tables are so repetitive that they're usually pretty easy to manage, and still worth it even when they aren’t.

On top of that, they can be configured by end users without code changes.

All in all, a really useful concept, especially for a lot of scenarios that always pop up when coding business logic.
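
For what it's worth, the pattern is usually nothing fancier than this made-up example (names and columns are invented):

    #include <string.h>

    /* One row per (region, customer tier) instead of nested if/else. */
    struct rate_rule { const char *region; const char *tier; double discount; };

    static const struct rate_rule rules[] = {
        { "EU", "gold",   0.15 },
        { "EU", "silver", 0.10 },
        { "US", "gold",   0.12 },
        { "US", "silver", 0.08 },
        /* ...generated or user-maintained rows continue here... */
    };

    double lookup_discount(const char *region, const char *tier)
    {
        for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
            if (!strcmp(rules[i].region, region) && !strcmp(rules[i].tier, tier))
                return rules[i].discount;
        return 0.0;   /* default when no row matches */
    }

Swap the hardcoded array for rows loaded from a database and you get the "configured by end users without code changes" property mentioned above.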


> On top of that, they can be configured by end users without code changes.

and that is how you end up with a DSL.


Here we go, knew someone would jump in here with their opinions on why this is bad etc.

You’re reading way too much into this, this simply solves a particular set of problems well in this context and that’s that.

Anything you use can get turned into something ugly given enough time and lack of supervision.


> their opinions on why this is bad etc.

where did i say it's bad? DSLs aren't inherently bad (or good for that matter).

> You’re reading way too much into this

i think you're projecting your own ideas onto me!


Ha, sorry, guess I was a bit defensive. I forget sometimes that not everyone is a jerk by default online.


Double buffered lookup tables!


Retro games used a ton of tables. Back then memory speeds were blazing fast, but processors were sluggish, so it made sense to pack as much computation as possible into tables. The more clever you were about it, the fancier games you could make.


Not always, though it might depend on what platform you mean with retro. Kaze Emanuar on YT does a lot of development for the N64 and it feels like half the time he talks about how the memory bus affects all kinds of optimizations. In Mario 64 he replaced the original lookup table for the sine function because an approximation was faster and accurate enough (or rather two approximations for two different purposes).

I love that channel, he reworked the entire Mario 64 code [0] to make it run at stable 60FPS...because he wanted his mods to run faster.

[0] https://www.youtube.com/watch?v=t_rzYnXEQlE


Back when I started I thought I would make games. I used a lookup table for cos/sin kept as an integer. I only needed enough precision for rotation on 320x240. It was something like 20-30 cycles faster per pixel. Even more if you didn't have a FP co-processor.


By retro platforms GP meant the Atari ST and Commodore Amiga and the like: LUTs were the name of the game for everything back then. Games, intros/cracktros/demos.

Heck, even sprite movements often weren't done using math but using precomputed tables: storing movements in "pixels per frame".

It worked particularly well on some platforms because they had relatively big RAM amounts compared to the slow CPU, and RAM accesses weren't as taxing as today (these CPUs didn't have L1/L2/L3 caches).

The N64 is already a more modern machine.


The speedup from the approximation wasn't that much, if anything. He made his real improvements elsewhere. But your point stands that memory speed really has moved the goalposts on what is feasible to speed up with precalculated tables, and if you can do it with math then that is often much faster.


I remember the days of restricting rotation accuracy to 360/256ths of a degree so it would fit in a byte, which then would be an index into trig lookup tables :)
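
The classic setup was roughly this (a sketch in C rather than the original asm; 8.8 fixed point, angle as a single byte):

    #include <math.h>
    #include <stdint.h>

    /* 256 angles around the circle, values in 8.8 fixed point (-256..256). */
    static int16_t sin_lut[256];

    void build_sin_lut(void)
    {
        for (int a = 0; a < 256; a++)
            sin_lut[a] = (int16_t)(sin(a * 2.0 * M_PI / 256.0) * 256.0);
    }

    /* The angle fits in a byte, so wrap-around is free; cos is a phase shift. */
    static int16_t fsin(uint8_t angle) { return sin_lut[angle]; }
    static int16_t fcos(uint8_t angle) { return sin_lut[(uint8_t)(angle + 64)]; }

    /* Rotate a point: (x*cos - y*sin, x*sin + y*cos), dropping the 8.8 scale. */
    void rotate(int16_t *x, int16_t *y, uint8_t angle)
    {
        int32_t c = fcos(angle), s = fsin(angle);
        int32_t nx = (*x * c - *y * s) / 256;
        int32_t ny = (*x * s + *y * c) / 256;
        *x = (int16_t)nx;
        *y = (int16_t)ny;
    }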


I miss the days when professional software development was more akin to this sort of thing, rather than gluing together 20 Javascript frameworks on top of someone else's cloud infrastructure.


I'm with you. The current thing really isn't fun anymore :(


haha yes, we used 2π / 256 and we called them brads (binary radians).


If you work with LUTs a lot, I work on a Mac app that handles all kinds of advanced color science and conversion tasks: https://videovillage.com/lattice


Those tools look awesome! Makes me wanna make a movie :)


Do it!


A little while back I got into the NES homebrew scene, and I recall seeing a game (Bobl) that had incredible water physics - way more than the system would have been capable of calculating. It turns out that it was a lookup table, and it really opened my eyes as to what you can accomplish with them and how simple tools can look like very complex processes.

https://morphcatgames.itch.io/bobl


> (Bobl) that had incredible water physics - way more than the system would have been capable of calculating.

I'm not denying the effect is impressive, but assuming you're referring to the surface ripples/capillary waves, would a 1D cellular automaton be beyond the abilities of the NES to calculate in real time?
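
For comparison, the cheapest 1D surface simulation I know of is something like this sketch (a few adds and divides per column; whether Bobl does anything along these lines, I don't know):

    #include <stdint.h>

    #define COLS 32   /* one surface height per tile column, say */

    static int16_t height[COLS];    /* displacement of the surface */
    static int16_t velocity[COLS];  /* rate of change per column   */

    /* One step: each column is pulled toward the average of its
       neighbours, with a little damping so ripples die out. */
    void ripple_step(void)
    {
        for (int i = 0; i < COLS; i++) {
            int left  = height[i > 0        ? i - 1 : i];
            int right = height[i < COLS - 1 ? i + 1 : i];
            velocity[i] += (int16_t)(((left + right) / 2) - height[i]);
            velocity[i] -= velocity[i] / 16;           /* damping */
        }
        for (int i = 0; i < COLS; i++)
            height[i] += velocity[i];
    }

    /* Drop something into the water at a column. */
    void splash(int column, int16_t strength) { velocity[column] -= strength; }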


A Podcast Of Unnecessary Detail just did an episode talking about the SNES Doom port and how it used LUTs for trigonometry as the SNES didn't have a graphics processor.

https://festivalofthespokennerd.com/podcast/series-3-episode...

https://github.com/RandalLinden/DOOM-FX


> the SNES didn't have a graphics processor

It absolutely did! Even the NES had one. In addition to the SNES CPU (a Ricoh chip based on the WDC 65C816), the PPU (picture processing unit) was a custom chip that offered several different modes for spitting out tile-based backgrounds and sprites as each scanline physically drew across the CRT TV. It was even capable of fancy things like background rotation and scaling (Mode 7, think Mario Kart) and hardware transparencies, which you can see in many games. Furthermore, in the case of Doom (and games like Star Fox), the cart had the "Super FX" chip baked in to handle the 3D calculations. So you were actually dealing with two graphics processors for that specific title, heh.

Here's a wonderfully (and painfully) detailed series about the SNES hardware by Retro Game Mechanics Explained:

https://www.youtube.com/watch?v=57ibhDU2SAI&list=PLHQ0utQyFw...


Off the top of my head, my uses of LUTs have been:

   - Atmospheric scattering.
   - Tinting Sprites.
   - Nightvision Scopes.
   - FLIR scopes.
   - B&W “video feed” effect.
   - Glitch effects.
   - Heightmap Shading.
   - Alpha dot factor for spaceship exhaust plumes.
   - Heatmap of website visitors mouse dwellings.
   - Crystalline Effects.
   - And finally, post processing colorization in raw color space.
LUTs are a visualization of an array of values known to you, and they’re amazingly useful.


> Tinting Sprites

Kids these days have forgotten all about palettized sprites! Where do ya think the term “palette swap” comes from? ;)

http://www.effectgames.com/demos/canvascycle/

It fell out of favor because the OG Xbox was bad at it and Photoshop doesn’t support it. But modern hardware can totally handle palette swaps made in Aseprite.


>It fell out of favor because OG Xbox was bad at it

It fell out of favor because hardware no longer needed to use palettes as they had enough memory to store the actual color.


It fell out of favour when the move to 16-bit systems meant no more hardware sprites.


I assume you mean the move to 16 bit color framebuffers, not systems with 16 bit CPUs? That was about the point where palettes and sprite and tile based 2D acceleration stopped being relevant.


The SNES and Genesis definitely had hardware sprites. The PS1 and PS2 worked with palettized textures 99% of the time and had hardware quads. The N64 could do palettized textures and quads. The same for the OG Xbox, but it was a big speed hit vs DXTC. With the Xbox360 and PS3 you could implement palettized textures as a shader. But, only the most special cases bothered.


It fell out of favor when we could just store 1000x more sprites. It was memory, not instruction-set. Tinting then happened as a pre-build step until finally shaders replaced the practice entirely.


Instead of “fell out of favor” I shoulda said “artists stopped being informed of its existence and occasionally invented poor approximations to it from first principles”. Such as tinting sprites to approximate palette swaps of the retro style they are referencing.

You can do so much more than tint with palettes. The animations in the link I posted are each composed of a single 8-bit image and a function to rotate specific sections of the palette.


I’m intimately familiar with palette animations. You’re right though, it’s a technique that was lost due to hardware no longer limiting the artists. I still find it to be far superior to sprite tile replacement animations.


If you try to program just about anything non-trivial in sed(1), LUTs are the way to go.


I'm both fascinated and horrified to ask what non-trivial sed programming you might be doing, and why ;-)


what: the most complicated thing (that I've actually used in other projects) was a data-driven precedence-climbing parser (which, since sed is usually lousy at dealing with place-value numerics, required sussing out sed-friendly representations of the core data types).

why: because given a "weird machine" that is nearly omnipresent (in the back end, anyway) and is also a Perlis language, who among us would not wish to program it? (and who among us would willingly admit that 1959's https://en.wikipedia.org/wiki/IBM_1620#Model_I programmers had had better code-fu?)

    I must go down to the sed(1) again, to the pattern space and the hold,
    And all I ask is a t branch and a s/ub/st/ to map and fold;
    And the Jolt’s kick and the keys' clack and the write lines breaking,
    And a green glint in mirror shades, and a grey* dawn waking.
* "the color of television, tuned to a dead channel"; no Ἠὼς Ῥοδοδάκτυλος here!


indeed! though you've had more success with that than i have

i'd add: or fpgas, or eproms, or m6


If I had a dollar for every time someone who had 30 seconds of accumulated knowledge about Dynamic Programming tried to argue with me that caching and memoization are the same thing, I'd have enough for a nerf bat to hit them with.

(Briefly: memoization is local shared state, caches are global shared state, which is a whole other set of problems. And caching is hoping you'll need something eventually, whereas memoization is knowing you will need it immediately.)

A couple years ago I watched a very long video on Dynamic Programming and learned a new term: Tabulation. If memoization is encountering a common sub-problem and holding onto it, tabulation is seeking them out, answering the sub-problems before they can come up organically.

It's much harder to derive the tabulation answer, but I find it's usually easier to reason about after the fact. Which is enormously helpful when you go back to add new features or fix old bugs.

Look-Up Tables are fixed-size tabulations. The space complexity of tabulation is often ≥ O(n), while LUTs are a fixed size, but big enough to reduce the computation time by a constant factor, like 10 or 100.
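
A concrete pairing of the two, in case it helps (Fibonacci, the canonical toy example):

    #include <stdint.h>

    #define N 64

    /* Top-down memoization: compute on demand, remember what came up. */
    static uint64_t memo[N];
    uint64_t fib_memo(int n)
    {
        if (n < 2) return (uint64_t)n;
        if (memo[n] == 0)
            memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
        return memo[n];
    }

    /* Bottom-up tabulation: answer every sub-problem before it is asked for. */
    static uint64_t table[N];
    void fib_tabulate(void)
    {
        table[0] = 0;
        table[1] = 1;
        for (int i = 2; i < N; i++)
            table[i] = table[i - 1] + table[i - 2];
    }
    uint64_t fib_table(int n) { return table[n]; }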


It sounds like your distinction of tabulation vs memoization is similar to the concept of bottom-up vs top-down DP. Bottom-up: build the table up eagerly, top-down: recurse down and fill in the table lazily.


Yeah I'm not sure how widespread his nomenclature was. I was just glad to have more vocabulary for people who have trivialized the entire discipline down to 'caching'.


Back in the day, assemblers (like a patched Seka) had tools to generate sine-wave lookup tables.


Handy for vector graphics if you include a table for TAN.


"They came to TAN / I taught them how to SIN" (COS my slipstick brings all the girls to the yard): https://img-9gag-fun.9cache.com/photo/a5bbB2r_700bwp.webp


This is brilliant - I've used plenty of LUTs for post-processing but it never occurred to me to use them for recolouring assets as well.


This reminds me: we used a command line tool (originally made for Warcraft 2?) to take full-color art assets and find a palette of 200 or so colors (the rest reserved for UI elements and the player characters) for the different areas in Diablo 2. We then remapped all the art assets to these small palettes, and reused the monster art with other colors within these palettes for varieties of monsters. I don't remember spending more than a day putting this in our art pipeline; the guy who wrote the tool did all the hard work.


That's really impressive!


Doom used a lookup table for random numbers: https://doom.fandom.com/wiki/Pseudorandom_number_generator
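
Roughly, it's a 256-byte table walked with a wrapping index (a paraphrase, with the table abbreviated; the real values are listed on that wiki page):

    #include <stdint.h>

    /* 256 fixed "random" bytes baked into the executable (abbreviated here). */
    static const uint8_t rndtable[256] = { 0, 8, 109, 220 /* ...and so on... */ };

    static int rndindex = 0;

    /* Every call steps the index and returns the next table entry,
       wrapping at 256. */
    int pseudo_random(void)
    {
        rndindex = (rndindex + 1) & 0xFF;
        return rndtable[rndindex];
    }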


Does this mean the engine is fully deterministic?


Yes, if you supply the exact same inputs you will get the exact same outcome. Demos and multiplayer rely on this: they just record player inputs, nothing about enemy movement or anything like that. This is also why you can leave a multiplayer game in the middle of a match, but you can't join one in the middle.

https://www.youtube.com/watch?v=pq3x1Jy8pYM


Yes, and it's why speed runs can be “seed runs” where you do things to make the right LUT number come up at the right time.

The DooM Engine Black Book goes into details.

https://fabiensanglard.net/doom_fire_psx/ also

Other games use seeded random number generators to be entirely deterministic even though random; Factorio does this to allow multiplayer without having to sync the entire game state all the time.


Yeah, developer commentaries are a goldmine in general for small gold nuggets of knowledge like that. I never would have come up with such a use case either.


Are there any other dev commentaries you recommend (other than the mentions in your post)?


Basically, all of the Valve ones. (Half-life 2, Half-life 2 Episode 1, Half-life 2 Episode 2, Team Fortress 2, Left 4 Dead, Left 4 Dead 2, Half-Life Alyx)

Each one of them is a time capsule of the techniques and game design philosophies at the time of development. Pretty sure there is a walk through on youtube for every single one of them, if you don't want to get the games yourself.


They are extra useful for precalculating other visual effects. In the past, following a magazine tutorial, I coded a very cheap raytraced "lens effect" on a 286. I recently (5 years ago!) ported it to JS [0].

That magazine got me into fractals and raytracing, BTW.

[0] https://github.com/luismedel/jslens


Awesome demonstration!


Lookup tables are great for producing impressive graphics effects under strict compute limits. I use them in many FPGA projects. On FPGAs there is a balance between limited memory (often BRAM) and limited compute, but lookup tables can be used to great effect:

Tunnel effect (exactly as user bemmu described below):

- see it live: https://htmlpreview.github.io/?https://github.com/sylefeb/gf...

- lookup table: https://github.com/sylefeb/gfxcat/blob/main/tunnel/tunnel.h

- how it is computed: https://github.com/sylefeb/Silice/blob/master/projects/ice-v...

Julia fractal, with a table to do integer multiplies! (2ab = (a+b)^2 - a^2 - b^2, so just precompute all x^2 in a table; there's a small sketch of this at the end of this comment.)

- see it live: https://htmlpreview.github.io/?https://github.com/sylefeb/gf...

- code (see 'sq' table): https://github.com/sylefeb/gfxcat/blob/main/julia/julia.c

- credits (mul trick): http://cowlark.com/2018-05-26-bogomandel/index.html

In-hardware lookup division for perspective correct texturing!

- doom-chip on-ice (rC3 2021 talk): https://www.youtube.com/watch?v=2ZAIIDXoBis

- detailed write up: https://github.com/sylefeb/tinygpus?tab=readme-ov-file#preco...

- actual lookup table generation: https://github.com/sylefeb/tinygpus/blob/498be1b803d0950328a...

Sine tables are also very typical, for animating things on screen or plain trigonometry, for instance my small fixed integer raytracer uses a sine table to animate the spheres (and the cos is easy to get from the sin, especially if the table has a power of two size ;) ):

- see it live: https://htmlpreview.github.io/?https://github.com/sylefeb/gf...

- source code: https://github.com/sylefeb/gfxcat/blob/main/raytrace/raytrac...

And of course procedural textures!! Perlin noise is made of lookup tables, and often multiple lookups are combined to then look up a colormap (https://redirect.cs.umbc.edu/~ebert/691/Au00/Notes/procedura...)

There are so many other examples. Also, FPGAs are quite literally made of LookUp Tables, or LUTs:

- https://github.com/sylefeb/Silice/tree/master/learn-silice#f...

- https://github.com/sylefeb/silixel

(edit: formatting)
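
Spelling out the multiply trick mentioned above, for anyone skimming: since 2ab = (a+b)^2 - a^2 - b^2, one table of squares turns a multiply into three lookups, an add, two subtracts and a halving. A minimal sketch in C:

    #include <stdint.h>

    /* Table of squares: sq[i] = i*i for i in 0..511, so (a + b) stays
       in range for two 8-bit operands. */
    static uint32_t sq[512];

    void build_sq(void)
    {
        for (uint32_t i = 0; i < 512; i++)
            sq[i] = i * i;
    }

    /* a*b via 2ab = (a+b)^2 - a^2 - b^2: three lookups, no multiplier needed. */
    uint32_t mul8(uint8_t a, uint8_t b)
    {
        return (sq[a + b] - sq[a] - sq[b]) / 2;
    }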


Also: https://news.ycombinator.com/item?id=37823805 (You probably shouldn't use a lookup table (2022))


That's about a different, less specific meaning of 'look up table' (vs. 'LUT')


I was the author of that piece. This blog is about graphics LUTs, which fall into point (4) of the "when to use lookup tables" section. GPUs have dedicated constant memories that were specifically made for holding lookup tables so that they can be accessed quickly and deterministically. Graphics LUTs are a really powerful tool because of that, and have none of the performance shortcomings.


Can confirm the CPU part. I once tried to replace a 2D stepwise linear function interpolation (basically, a loop finding the right step, then linear interpolation, on a double; the function's purpose is to rescale inputs before they become regression arguments) with LUTs.

All of this looping was happening in another tight loop, and there were multiple LUTs.

It was a failure with overall decreased performance and increased cache misses. Plus, I could never get the same precision.


Why can't CPUs be told what to cache? Is it because the instruction sets are obsolete, or is there a technical reason why they can't be added?


It could be added and it was a feature on the Gamecube CPU for example - you were able to lock the cache and then, IIRC, half the cache was used as normal and half was at your disposal for very fast access to look up tables or data you were using over and over again. Some code to deform 3D meshes when animating them using a skeleton used the locked cache to speed up that operation greatly by having the bone matrices at hand.

A problem I imagine on modern processors is the cache might be shared amongst several cores and interrupts etc.

Back in the nineties, 3DFX made a speed of light rendering demo for the Voodoo1 where they took over the entire machine and through intimate knowledge of cache behaviour effectively set aside an area in the L1 cache that would contain some data they could stream in and speed up their triangle setup.


You absolutely can tell the CPU what to cache, and also what not to cache (such as PCIe memory-mapped registers); it's just that those are privileged instructions which no sensible multitasking operating system will let you have access to, because they would have to be reset on every task switch.


I’m still disappointed that you can’t boot a modern processor with its 8MB of L2 cache and run Windows 95 entirely in cache. Stupid requirements to have a backing store.


I think if you flip the right MSRs, you can actually run in this mode on Intel. BIOSes used to do DDR3 calibration in this mode, and needed to run a core DRAM-less to do it.

AMD Zen platforms have always put that functionality (ironically) in the secret platform security coprocessor.


The real probable reason is that you will do it badly. The PREFETCH instructions are the closest things, which are mostly just a hint to the prefetcher. If you had actual cache control (on multi-tenant systems especially), programmers would probably overuse it.


TLDR?



