Tough Times on the Road to StarCraft

jokoon · on Jan 24, 2015

What's the point of using linked lists ? Low memory ? I'll never understand why the usage of linked lists is so common.

I mean you don't have fast random access with linked list, aren't hash maps just better for fast insertion/deletion ? Object pools are also pretty great too.

I'll never forget the day a classmate in some game programming school argued that linked list were like vectors (std::vector).

Linked lists only seems viable for insertions, they're not viable for fast deletion since you need to search for the pointer first. Maybe you need to index linked lists instead, but since the heap already allows itself to be fragmented like a linked list can, wouldn't it be better to just use vectors as often as possible so it can be more cache friendly ?

falcolas · on Jan 24, 2015

Linked lists have more consistent performance than std::vector. And taking half a second to render one frame when you're averaging 100fps is way worse than consistent 50 FPS.

There was recently an article about intrusive linked lists, and this same question came up. I took the unreal linked list library, and wrote a similar std::vector implementation, and tested it. The test was simple, fire 50,000 bullets at a rate of 500 rounds per second, and remove them when they fall 2 meters. They existed in 3 separate lists: physics objects, renderable objects, and all objects. When they hit the ground, they were removed from all lists. Capture the clock ticks to generate each "frame", and do a few calculations on them at the end.

The average and median operation times for std::vector were indeed a fair bit faster (at -o3, at no optimizations the ILL implementation was faster). However, the worst case for std::vector was... much worse. More than 1000x worse in some runs. One test run showed the worse case taking over two millisecond for a single frame of just manipulating vectors!

    $ ./benchmark_ill
    Minimum: 0.000000
    Maximum: 0.000094
    Average: 0.000005
    Median: 0.000005
    Total Frames Rendered: 18968118
    Total Run Time: 100.663895

    $ ./benchmark_v
    Minimum: 0.000000
    Maximum: 0.004813
    Average: 0.000002
    Median: 0.000002
    Total Frames Rendered: 35859663
    Total Run Time: 100.664009

sillysaurus3 · on Jan 24, 2015

Wait a sec. You can't bust out numbers like that without being clear about a few key points:

1. What was std::vector's removal algorithm like? Are you sure that .remove() wasn't shifting all elements in the vector? The "correct" removal algorithm in that case looks like this: Swap the target element to the end, and then decrease the vector's size by 1. But if I remember correctly, std::vector's remove method doesn't work like that, so of course it will have terrible worst-case performance if it was shifting all elements of the array when you remove the first one.

2. What's causing the worst-case slowdown? Are you sure you weren't simply seeing the effects of the L2 CPU cache being invalidated?

A linked list will thrash the CPU cache constantly, whereas vectors thrash the cache much less since the elements are stored in contiguous memory. For a modern game running on a modern CPU, vectors are almost always the better choice for that reason.

EDIT: I'm not sure your numbers are trustworthy. For example, there's no way it rendered 35859663 frames in 100 seconds. That would be 358,596 FPS.

Assuming your FPS was actually something like 358, not 358,596, that's still an extremely high FPS. When your FPS is so high, it becomes difficult to accurately measure the reasons for worst-case performance. A sudden spike within a single frame could be for any of a dozen reasons, such as CPU cache eviction, a GPU pipeline stall, or some other program on your computer taking CPU resources.

ANTSANTS · on Jan 24, 2015

>Assuming your FPS was actually something like 358, not 358,596, that's still an extremely high FPS. When your FPS is so high, it becomes difficult to accurately measure the reasons for worst-case performance. A sudden spike within a single frame could be for any of a dozen reasons, such as CPU cache eviction, a GPU pipeline stall, or some other program on your computer taking CPU resources.

As an aside, programmers need to stop using FPS and other inverse units of measurement when performance tuning. It's much easier to reason about 2.79~ milliseconds than it is to reason about 358 FPS. For instance, assuming a single threaded application, let's say I know my game simulation alone can be updated at 125FPS, while my rendering code can be updated at 30FPS. How many FPS does my game run at when I run both together? You'd be hard pressed to figure that out without converting the two to 8 and 33⅓ milliseconds per tick, respectively, making the combination of the two obviously 41⅓ milliseconds, and thus about 24.2FPS.

Yes, your comment about performance measurements under lights loads not accurately reflecting performance under greater loads is true, but 2.79 milliseconds really aren't so far from the standard target of 16.66 milliseconds per frame than the 358 vs. 60FPS comparison would make one guess.

falcolas · on Jan 24, 2015

> You can't bust out numbers like that

Nobody is preventing you from writing your own benchmark and "busting out [your own] numbers". :)

That said, yes, my deletion algorithm is naive, and I welcome a pull request to fix it.

https://github.com/garrickp/intrusive_benchmark

> Are you sure you weren't simply seeing the effects of the L2 CPU cache being invalidated

Perhaps, but I doubt it's the cause of the worst case scenario, particularly since I'm doing identical loops over the physics and renderable objects, and as you say the linked list should have the worse performance.

EDIT: One more small point - if the programmer is relying on the order of the items at all, the "tail tuck" method of deletion would not work.

EDIT[2]: Addressing your edit - the numbers are only for the action of applying physics and adding/deleting objects - the code should reflect that. There is no actual rendering time or AI or anything else which would prevent us from seeing upwards of 100,000 "frames" per second.

The times are also fairly stable (the standard deviation of max times is around 20%) between runs.

on Jan 24, 2015

[deleted]

sillysaurus3 · on Jan 24, 2015

Well, your comment led with "Linked lists have more consistent performance than std::vector." But the tests just don't support that conclusion, and the truth is the opposite in a real-world environment: lists will kill your performance.

Before you'll see any downsides of list, you'll have to introduce memory fragmentation into your test and ensure that your list's allocations end up all over your address space, like they would be in a real engine. At that point you'll see a huge slowdown compared to vector due to CPU cache thrashing.

sillysaurus3 · on Jan 24, 2015

I apologize that my comment probably came off as a bit too upfront. I need to work on that.

I was only saying that it isn't really a good idea to present numbers like that without having a clear idea of what's causing the worst-case performance. Glancing over the code, and assuming the first bullet fired is always the first one to impact the ground, it looks like the code is always going to run std::vector::erase on the first element of the vector, which will always shift the 499 remaining elements. It's possible that's the reason for the worst-case performance, rather than vector being inherently a worse choice than list.

It was very counterintuitive and weird to realize that vector is almost always a better choice than list due to how powerful the CPU caches are. CPU caching is so good that taking full advantage of caching will usually outweigh algorithmic tweaks, like whether you're using a list or a vector. So the power of vector is that its elements are in contiguous memory, whereas the downfall of list is that they usually aren't.

It's difficult to prove that's true except in real-world scenarios. Unless a test is designed to introduce memory fragmentation, the cache performance penalty of using list won't really show up in the results. All of the list allocations will be pretty much contiguous in a test environment, so the tests will show performance on par with vector. But in the real engine, you'll be left scratching your head why everything's running so slowly, when the answer is because the elements are sprayed all across your memory space and it's causing the CPU cache to stall when they're accessed...

EDIT:

There is no actual rendering time or AI or anything else which would prevent us from seeing upwards of 100,000 "frames" per second.

When your FPS is higher than, say, 1,000, it becomes very difficult to pin down the reasons for performance fluctuations. An engine under load behaves differently than an artificial test, so you'll need to measure the performance impact in a real engine, especially when the framerates in the test are so high.

Your monitor is capped at 70 fps, or some high-end monitors can do ~130. Something like 100,000 fps isn't an accurate reflection of the behavior of the real system. Most of the discrepancy is due to memory fragmentation causing CPU thrashing, but your specific test might be running into some other discrepancies, like:

- clock() might not be a high resolution timer, so the worst-case timing might be due to discrepancies in the clock mechanism itself, especially at 100,000+ FPS. See http://stackoverflow.com/questions/6749621/high-resolution-t...

- since your test is running for 100+ seconds and your timing measurements are taking into account an entire frame, some frames are going to run interrupts. If an interrupt takes a long time to process, that load will show up in your timing measurements. The result is that vector (or list) will be unfairly pinned as the reason for the performance issue, when interrupts were the cause.

I'm not confident about the second one, but the point is that there are all kinds of conflating variables.

falcolas · on Jan 24, 2015

> So the power of vector is that its elements are in contiguous memory

In either case, you'll end jumping all over memory for the actual items in either a std::vec or a linked list, since they are both holding pointers to the actual objects (and based on how the nodes are being created, the linked list containers are closer in memory to the nodes).

That said, my premise was not that linked lists were faster to iterate over, it's that they offer a consistent addition/removal time, which is frequently better for a game engine than access with huge deviations.

There are many games out there which boast 100+ FPS on my gaming computer. However, they still look like crap because they will have intermittent drops to 1 or even .5 fps. I (and many others) would prefer a constant 50 (or 30) FPS than a stuttering 140.

Real time OSes offer the same commitments: constant gaps between program resumes, even if that constant delay is large.

sillysaurus3 · on Jan 24, 2015

Okay, there's some kind of failure to communicate, and the frustration level is rising. The failure is probably on my end, so I apologize.

In either case, you'll end jumping all over memory for the actual items in either a std::vec or a linked list, since they are both holding pointers to the actual objects (and based on how the nodes are being created, the linked list containers are closer in memory to the nodes). That said, my premise was not that linked lists were faster to iterate over, it's that they offer a consistent addition/removal time, which is frequently better for a game engine than access with huge deviations.

CPU caching doesn't work like that. When you access any pointer, the CPU cache will automatically cache 256 bytes from the access address. So if your object is at address "x", everything from [x, x+256) will be put into the CPU cache.

Your Projectile class contains 6 floats, so each Projectile is 24 bytes. When you're using std::vector, all of your projectiles are stored contiguously in memory. That means when you access a projectile, you'll automatically cache (256 bytes / sizeof(Projectile) - 1) = the next 9 projectiles. So you get a bunch of performance for free, simply by using vector.

When you use list, you lose all of that caching. The reason is because list's memory allocations are all over the place. When you use list, the memory region [x, x+256) is solely that one projectile. That means your CPU will end up caching a bunch of irrelevant bytes of memory. That also means the CPU will have to cache each projectile individually as they're accessed. You're trading automatic caching of 10 projectiles for having to cache each individual projectile. Since taking advantage of CPU caching gives you 10x or 100x speedups, that's a very bad tradeoff.

Addition/removal from a container is a tiny, tiny fraction of tininess of what a modern game engine does. It's so tiny that add/removal performance is almost always irrelevant unless you're adding/removing a huge number of elements every frame, which is rare. Iteration performance is almost the only thing that matters for a game engine, because every single frame you'll be iterating over hundreds of containers. If each iteration causes your CPU cache to thrash, then you're in for a world of pain.

The takeaway is that list is an awful choice for any realtime game, and it's dangerous to give the impression that it's better. If someone were to take your tests at face value and start using list in their own game engines, a few months later they're going to be wondering why their performance is so terrible and why their profilers aren't showing the cause. As far as performance characteristics go, memory fragmentation / CPU caching matter so much that discounting them is a recipe for disaster.

There are many games out there which boast 100+ FPS on my gaming computer. However, they still look like crap because they will have intermittent drops to 1 or even .5 fps. I (and many others) would prefer a constant 50 (or 30) FPS than a stuttering 140.

Agreed! But list vs vector isn't the cause. A lot of the stuttering has to do with loading resources from disk, flushing the GPU pipeline, or other I/O intensive actions. On a modern CPU, add/removals from containers aren't the reason for the slowdown.

Either way, good luck, and keep on making cool stuff.

quotemstr · on Jan 24, 2015

> When you're using std::vector, all of your projectiles are stored contiguously in memory

The OP's point is that we're not talking about

  std::vector<projectile>

but about

  std::vector<projectile*>

The latter thrashes the cache, but it's necessary of an object needs to belong to several lists at the same time.

In either case, enlarging the vector also has a significant cost.

sillysaurus3 · on Jan 24, 2015

That's true, but in modern game engines, objects of the same type share an allocation pool. So a bunch of Projectiles should still be in contiguous memory, since they're using the same pool allocator.

At that point, vector is still a better choice because it prevents the CPU from accessing a bunch of intermediate list nodes which will pollute the CPU cache. If you use a vector, there are no intermediate objects, so less stuff winds up in the cache. There's also less memory fragmentation overall.

As long as you're using a vector implementation that grows by a factor of 2 every time it expands, then there shouldn't be any significant cost of enlarging the vector. If it's just storing pointers, it won't invoke any copy constructors. It can even be optimized to a straight-line memcpy, which is blazingly fast on a modern CPU.

quotemstr · on Jan 24, 2015

> That's true, but in modern game engines, objects of the same type share an allocation pool.

I don't think pool allocation guarantees memory contiguity: what if we have a pool with two elements free, those elements being the first and last ones, and we allocate two elements and stick pointers to them in our vector?

> a vector implementation that grows by a factor of 2....shouldn't be any significant cost

It depends on what you mean by "cost". I agree that the enlargement cost is acceptable from an overall CPU budget sense, but each enlargement can still cause a latency spike, since the cost for enlargement is still large (especially considering the cost of taking the heap lock) for _that_ insertion. Sometimes consistency is more important than throughput.

> vector is still a better choice because it prevents the CPU from accessing a bunch of intermediate list nodes which will pollute the CPU cache

Who said anything about intermediate list nodes? The nice thing about an intrusive linked list is that once you pay for accessing an element's cache line, the rest of the accesses are nice and local. There are no intermediate nodes, and all costs are smooth and consistent. Besides: you can issue a memory prefetch instruction for node B while you process node A, further hiding latency.

While a vector is value types might be best in most situations, if you can't use that, an intrusive linked list may or may not be better than a vector of pointers.

sillysaurus3 · on Jan 24, 2015

Good points! I love tech chats like this. There are a few caveats:

A pool should be trying to make new allocations as contiguous as possible. That's accomplished by wrapping the allocations. For example, let's say we have a pool with slots A, B, C, D, E. We allocate A through D, then C is later freed. The pool shouldn't put the next allocation at C. It should be placed at E. That way, the objects are still roughly sorted by creation time: A is younger than B, which is younger than D, and so on.

(The next allocation should go in C after that, because there are no more slots. But by that time D and E might have been freed, so it's still in sorted order.)

That way, if you access any given element, it will cache the next element in memory. Due to the sorted nature of the allocations, that means even if you're iterating over them using some other container like a list or vector of pointers, accessing an element will still likely cache the next iterator's element. Not guaranteed, but likely.

About the vector enlargement: Usually the cost is nil, because as long as the vector stays under 1MB, you can simply preallocate large blocks of memory at startup for each type of vector. Therefore the cost of expansion is just a few arithmetic instructions for bookkeeping purposes, not anything expensive like a heap lock.

Good point about the intrusive list. I personally wouldn't like it because it makes the code more complicated, but as long as you're using a pool allocator, the memory characteristics of an intrusive list shouldn't be too different from the scheme I described above.

flohofwoe · on Jan 24, 2015

Dynamic arrays like std::vector must be tweaked for a special use case as well, for instance it is quite essential to pre-allocate the array to a capacity of the expected number of entries using std::vector::reserve() so that no excessive re-allocation takes place while the array grows. Also special care must be taken if the objects have an expensive copy operation (that's where the new move-semantic in C++11 may help a lot).

sillysaurus3 · on Jan 24, 2015

It's not so much that dynamic arrays are the problem as std::vector specifically is the problem. Many game engines implement their own version of vector which is designed to grow by power-of-2. That way if you add 100 elements, the vector has already reserved 128 slots. If you add 129, it'll automatically reserve 256. Etc. Automatic power-of-2 growth turns out to have very nice performance and memory characteristics.

std::vector itself might work that way, but since the implementation can be different on different platforms, it's usually a better idea to write your own when performance matters.

barrkel · on Jan 24, 2015

I doubt there are any vector implementations that don't use exponential growth. Without exponential growth, push_back doesn't have constant time, and adding n elements doesn't take O(n) time. Since push_back is required to take amortized constant time, an implementation without exponential growth (or some kind of magic) wouldn't be standard conforming.

Whether the exponential growth is in the form of doubling or not is just a constant factor. There's nothing magical about using powers of two. Depending on the accounting used by the allocator you're using, there may be synergies, or there might not be - some allocators require a few bytes more than the allocation request, and this requirement might inflate the underlying memory request more than necessary.

DCoder · on Jan 25, 2015

> I doubt there are any vector implementations that don't use exponential growth.

Trivia fact time: Westwood Studios wrote their own vector implementation for the original Command&Conquer, and reused it for later games in the same series, until Renegade/Generals. That implementation uses linear growth, capacity grows by 10 elements each time.

barrkel · on Jan 25, 2015

Indeed; however that's not conforming to the standard, so it is not an implementation of std::vector ;)

sebleblanc · on Jan 25, 2015

But it is an implementation, albeit a non-standard one.

sillysaurus3 · on Jan 24, 2015

Whether the exponential growth is in the form of doubling or not is just a constant factor. There's nothing magical about using powers of two.

Actually, powers-of-two reduce memory fragmentation for all objects of the same size. Since memory fragmentation is one of the central causes of performance problems in modern large projects, it's really important. Firefox spent a bunch of time realizing their performance problems were due to an incredible amount of fragmentation.

Exponential growth gives you most of the benefit, but specifically growing by a factor of 2 gives you the rest.

barrkel · on Jan 25, 2015

You're not saying anything I don't already know.

If the overhead of an allocation is h bytes, and your allocator stores it in e.g. the prefix of the memory allocation, then all your allocations will be for 2x+h. And this will not have nice effects for your memory fragmentation. In this case, you'd be better off allocating 2x-h each time.

Of course, modern allocators tend to store accounting information using separately reserved arrays of records to avoid exactly this problem. But if you use a language like C# or Java, you come back to around to this problem. Without using a double indirection for array data, object header and array length are stored contiguously with array data. So allocating an array with a power of two will again not be a power of two for the underlying allocator. Much less of a problem with precise GC, but not all GC'd languages use precise moving GC, and some languages let you pin arrays for interop with C. So it's something to be aware of.

flohofwoe · on Jan 24, 2015

I agree! It's just that C++ people from outside the gaming industry always give me the evil look when I start ranting about how useless the std containers actually are ;)

jokoon · on Jan 24, 2015

already read that article about ILL, did not make much sense to me.

Where do you explain why the vector is slower ?

falcolas · on Jan 24, 2015

I'm not saying they're slower, I'm saying the performance is more consistent.

99.9% of the time the std::vec implementation will out preform the linked list. However, that 0.1% seems to be wildly worse than the linked list. This is not a problem for most programs, however in gaming engines that hiccup shows up as a visible hitch in the framerate.

jokoon · on Jan 24, 2015

So to you, that 0.1% is specific to gaming engines, how ? I don't really believe it.

I think "that 0.1%" demonstrates that linked list are really irrelevant and that vectors should be used all the time.

The only useful advantage of linked lists is to be able to insert an element in the middle of the list. I don't understand why anyone would want to insert data in the middle of a list, you rarely want to do that in O(1), and even if you do, you still need to know where exactly you want to insert it.

If you want to have data sorted, use map<>, if you want a faster map, use a hash map.

Maybe lists are relevant for vertex geometry, because it might be interesting to quickly edit and add vertices, but it's such a specific case I don't think it matters. Nobody really edits vertex geometry. You could even just build an index vector instead.

falcolas · on Jan 24, 2015

I don't believe it's problem endemic only to game engines, but game engines are particularly vulnerable to such latency. Google reveals many examples of how hitching causes problems for players.

I'm not going to claim that memory is the primary, or even a remotely major cause of such hitching, but it's certainly not going to help when you have to spend one of the 16 milliseconds given to render a frame on memory allocation and re-ordering.

corysama · on Jan 24, 2015

Remember that this code was written a long time ago. It is seriously only recently that I have seen widespread discussion about how poorly linked lists perform in practice despite how theoretically nice they are.

Back in the olden days at least a few things were different compared to today: 1) your only point of reference and authority was academic writing that stressed theory over practice. Blogs weren't around yet. So, you didn't have any random blogs with hard measurements showing that in practice brute force often wins over elegance. 2) Back then, the memory/ALU gap wasn't quite as drastic as it is today. 3) Available memory was drastically smaller than it is today.

Therefore, if you had a list with a length that was highly variable, the obvious go-to container was a linked list. The idea of using an dynamic array of objects or pointers seemed highly wasteful from a memory fragmentation and copying point of view.

Today processors spend most of their time stalling on memory and only dreaming of the opportunity to copy a large, linear chunk of data. It's only now that the performance penalty for random access has exceeded 50:1 for over 10 years that general programming public has started to take the problem seriously.

Retric · on Jan 24, 2015

Until recently main memory was close to CPU speeds and RAM cache was pointless. The issue is the speed of light means the distance between RAM chips and the CPU limits you to MHz* speeds which is not an issue for a 286, but CPU's kept getting faster even if the Speed of light is fixed.

*At 1GHz light can do a six inch round trip, electricity is significantly slower than that. SDRAM is fairly high latency but cheaper and as the best case is not all that fast it's considered a good tradeoff. Adding cache increases latency even further, but tends to make CPU's faster on average.

jjoonathan · on Jan 24, 2015

I don't think speed of light is a limiting factor yet (if it will ever be) for DRAM: even as memory bus speeds have been increasing, tCAS/tRCD/tRP/tRAS/etc, the number of clock cycles it takes a RAM card to actually perform the commands sent to it, have been increasing too.

For a DDR RAM "speed" of, say, 1866 MHz you have an actual clock of 933MHz and typical timings of roughly tCAS=tRCD=tRAS=9 to tCAS=tRCD=tRAS=11 (measured in cycles of the actual clock). If you're lucky and the correct memory row is already open, then you'll wait 10ns for a memory request to complete, but you will regularly wind up waiting 20 or 30ns if you need to open or precharge+open a row -- and that's assuming that there are no refresh cycles blocking the request. This adds up to a lot of inches when you multiply by 6in/ns, not to mention a lot of CPU die area to optimally order memory requests & refreshes (you can have one in-flight per bank).

You're right that light should be the limiting factor, though. It has long seemed to me that we need to undergo a DRAM->SRAM shift analogous to the HDD->SDD shift. Drop the capacities a bit (factor of ~6 from 6 transistors/cell vs 1 transistor/cell?) in exchange for speed. I'm sure there's a reason why it's not happening (energy efficiency?), but I'm an industry outsider and have no idea what it is. If anyone reading this does, I'd love to hear.

Retric · on Jan 24, 2015

At 933 MHz light can make a round trip of Google: "1 / (966* 1 million) / 2 * speed of light * 1 second in inches" = 6.109 inches. The physical path from a CPU to the furthest part of ram on a desktop PC is about 2-3 times that so by the laws of physics your going to need to wait more than 2 cycle before you even consider that electricity is ~1/2 the speed of light.

So, ~1/2 of your total latency (~5 or 6 of 10 cycles) is just waiting electricity to travel down wires.

Anyway, my point is when physics puts hard limits in place cost / benefit trade-offs tend to be centered around that limit. http://en.wikipedia.org/wiki/Amdahl%27s_law

Note: This is purely a discussion about latency which is what matters for Linked List access. You can send a lot of data as part of a single request cycle, but that adds to your effective latency.

jjoonathan · on Jan 26, 2015

> The physical path from a CPU to the furthest part of ram on a desktop PC is [1-1.5ft]

That doesn't seem right.

http://www.bhphotovideo.com/bnh/controller/home?O=&sku=10887...

Looks like it might be 1ns away at .5c, which is 10% of the minimum latency and 3% of maximum (excluding refresh) latency -- and those are latencies I took off a top-of-the-line gaming RAM (my own computer is 11-11-11). It also looks like the RAM could trivially be moved closer if there were an advantage to doing so (e.g. square, stacked layout).

My point is not that the limits aren't physical, it's that the limiting physical factors are currently in silicon designs, not FR4 layouts.

Retric · on Jan 26, 2015

First latency is a round trip, so you need to think .25c not .5c. Second, reaching the DIMM is not enough you need to reach the worst case furthest bit on that DIMM.

At .25c 1ns = Less than 3 inches. A single DDR3 DIMM is ~6 inches wide and ~1 inch tall so that's not long enough to cover a single DIMM.

Look at that picture again, including clips and those dimm slots are 7 inches wide. The top edge of a DIMM is 1 inch over the MB. And the furthest dimm is ~2 inches further from the CPU. The trace from the CPU's on board memory controller to the MB is ~1 inch. And the CPU is ~ 2 inches from the DIMMS.

EDIT: http://en.wikipedia.org/wiki/DDR3_SDRAM#mediaviewer/File:4GB... The DIMMs are not strait paths so that can easily be another inch.

Looks close to 1 foot to me which at .25c that's ~4ns. Or 40% of the minimum latency.

As to layout, that stuff is much harder than you might think as the wires can't go over each other.

hengheng · on Jan 24, 2015

> electricity is significantly slower

As a rule of thumb, most signals travel at half the vacuum speed (√ε_r for FR4 is 2.15), with a slight variation due to the copper trace shape. So, 150 mm/ns or 6 inch/ns one-way. I love how we can say nowadays "That's not all that much, is it."

Rapzid · on Jan 25, 2015

Man, what ever happened to those fibre interconnects and micro transmitters Intel was working on?

http://www.bit-tech.net/news/bits/2010/07/28/intel-claims-op...

flohofwoe · on Jan 24, 2015

(Intrusive) lists were useful when the cost of a memory access didn't matter relative to CPU speed, so that cache thrashing wasn't an issue, and you had a scenario where list nodes had to be inserted and removed frequently, but you don't need random access. In C such intrusive lists are also easier to manage since you don't need to care about dynamic memory management since the list nodes are embedded in some other existing data structure.

Today where avoiding cache misses is the most important optimization, traditional lists have indeed become quite useless, especially the non-intrusive type like std::list.

flohofwoe · on Jan 24, 2015

...but I think linked list were not the main problem, but (1) use of inheritance instead of composition to implement unit behaviour (but we all walked into that trap, and these were the 90's when OOP was still hip and universally seen as a silver bullet), and (2) it looks like objects have been linked to each other quite ad-hoc, instead of providing some sort of 'object relationship system' which encapsulates object linkage and querying for objects. But as I said, these were the 90's and game developers have learned from these mistakes. There's a lot of good whitepapers around about entity component systems, composition, high-level architecture of games and so on. Today it's more about asset creation pipelines and production tools, at least for 'big' game projects.

davexunit · on Jan 24, 2015

>What's the point of using linked lists ?

The most important quality to me is that singly linked lists are a great simple persistent data structure, whereas the data structures you've mentioned rely on mutation. Linked lists are awesome, seriously.

rcxdude · on Jan 24, 2015

Another useful aspect of intrusive linked lists in certain situations (mostly embedded nowadays) is that they work across non-contiguous memory. When you don't have virtual memory or even malloc that can be really important.

maxerickson · on Jan 24, 2015

In a game, I would think most deletions would start from the node being deleted.

AdeptusAquinas · on Jan 24, 2015

Very interesting article, if horrifying :)

As someone whose only experience is in corporate service IT development, from the outside the game dev industry looks incredibly cowboy-ish; it would be nice to read a story where a team built a game where they met their deadlines, had no crunch times, followed excellent coding standards (CI, unit tests, peer dev etc.) and maybe even used a modern language (perhaps, heretically, a non-native language like C#!)

starmole · on Jan 24, 2015

Interesting perspective! You bring up a lot of different points:

- There are tons of positive "post mortem" [http://www.gamasutra.com/features/postmortem/] write ups on gamasutra, but the ones with problems are usually more valuable. One recent example of "doing it right" I remember was the XCOM:Enemy Unknown remake. Check their GDC talks!

- One thing that makes games special is pushing the envelope. Yes, you can make a game on a very solid stack. But it won't stand out.

- Game coders think service people are mad also: How can you use a database that you do not fully understand? You don't even have the source? Using a framework or library or cloud that you do not 100% understand seems "cowboy" to them.

- Games are "fire and forget". That means technical debt is very different. It is bad at the start of a 3 year cycle, but at the end it is totally ok.

- Not mentioned by you but important to point out: While working on games is great for learning and unlike anything else it is also a terrible industry for coders. Insane crunch time. Low pay.

ido · on Jan 24, 2015

    - Games are "fire and forget". That means technical debt
    is very different. It is bad at the start of a 3 year
    cycle, but at the end it is totally ok.

There is also a big difference between AAA, online, mobile, social, casual, etc. f2p games for example are expected to live & be actively supported/updated for many years if successful.

Clash of Clans was released 2.5 years ago and will likely continue to get active support for many more years. FarmVille was released almost 5 years ago. WoW more than 10, for a different kind of game that also isn't fire and forget.

They get a lot of flack for other reasons but engineering wise companies like zynga get a lot of stuff right.

starmole · on Jan 24, 2015

You are absolutely right. I was intentionally polarizing. But thinking about it more, one thing still stands out: Even long running mmos are planned "for the next 10 years". There is an end in sight. Facebook, Google, Uber, Bitcoins, Most Startups - it's open ended! I don't think it's easy to say one approach is better - but it is useful to understand that they are fundamentally different.

Personally as an engineer I like the coding approach of game dev a lot more.

frik · on Jan 24, 2015

> Game coders think service people are mad also: How can you use a database that you do not fully understand? You don't even have the source? Using a framework or library or cloud that you do not 100% understand seems "cowboy" to them.

Not to mention widely used on-premise enterprise software with weird upgrade-paths and hidden costs. If you look behind the curtain there is a lot of legacy cruft in COTS. Customizing such installations seems cowboy-ish to others. SaaS/cloud is better and hide ugly details but may off limits for high confidential data.

Game programmers often reinvent the wheel but learn from others through post mortems (former Game Developer Magazin). Enterprise devs often cook their own customized solution based on COTS. You can still install and play almost all PC games on modern hardware with Win 7+ 64-bit. Try that with old SharePoint, Exchange, SAP, Oracle/Sun, Novell, etc. software and not to mention the troubles to restore an old backup.

As someone who saw both camps, I would say both sides can learn from the other side.

AdeptusAquinas · on Jan 24, 2015

All true (and as someone who spent years building solutions around the various SP releases, I know about its internal problems all too well). Also, while I will use SQL Server without a second thought for most of our development simply because its easy, does its job without problems and our tooling fully supports it, I also know if I have speed issues I'll just throw another server or a ton of RAM at it until the problem goes away; something game devs obviously can't do.

gaius · on Jan 24, 2015

One thing that makes games special is pushing the envelope. Yes, you can make a game on a very solid stack. But it won't stand out.

I know what you're saying, but I wonder how true that is, and my analogy is the movie and music industries. You can make a very successful movie or album without pushing any technical envelopes, infact there are people who consciously and deliberately use "obsolete" tools and techniques.

icebraining · on Jan 24, 2015

Game coders think service people are mad also: How can you use a database that you do not fully understand? You don't even have the source? Using a framework or library or cloud that you do not 100% understand seems "cowboy" to them.

Is that really true? When a team uses an engine licensed from another company, do they really take the time to fully understand its source?

frik · on Jan 24, 2015

There are two factions, groups that license a third party engine and mainly work on the game content and groups that work on an in-house engine and on game content. If you want to make a game that stands out you are generally better of with an in-house engine. Otherwise you have to make compromises and do it a specific way that may hurt the end result. Most Unity and Unreal 1-3 engine based games can be identified as such literally from 50 feets away. In-house engines have downsides as well, and some hobby projects never advance to the gameplay content creation stage.

Teckla · on Jan 24, 2015

- Games are "fire and forget". That means technical debt is very different. It is bad at the start of a 3 year cycle, but at the end it is totally ok.

Game companies don't "copy and paste" old versions of games to jump start the development on new versions of games? cp doom1.c doom2.c

blister · on Jan 24, 2015

Take a look at the Serious Games/MilSim arena. A lot of companies in this space have standardized behind Unity3d (C# based) or as mods for existing engines like VBS2/3. Most of this work is done under government contract and has a fixed period of performance (often 18 months or less).

We often crank out several great games a year on time and under budget. There's a lot of cool stuff going on in this industry right now.

ido · on Jan 24, 2015

Please realize the story takes place almost 20 years ago! Things have improved a lot since then (in game dev specifically as well as in software dev in general).

There are still a lot of fucked up companies but you generally just don't hear about the development of >99% of projects.

EvanAnderson · on Jan 24, 2015

Prior discussion: https://news.ycombinator.com/item?id=4491216

gaius · on Jan 24, 2015

The team was incredibly invested in the project, and put in unheard of efforts to complete the project while sacrificing personal health and family life.

I wonder if this is "invested" as in, had a substantial equity stake and it was worth the sacrifices of health and relationships because they and their families were setup for life afterwards and retired to a beach in Malibu, or some other meaning...

wingerlang · on Jan 24, 2015

> .. because they and their families were setup for life afterwards and retired to a beach in Malibu ..

And if the game flopped then? Even a success doesn't guarantee the above does it?

gaius · on Jan 25, 2015

Sure, but they weren't taking that chance were they? They were just being exploited.

soup10 · on Jan 24, 2015

Ironically the horrible path finding gave the game much of it's charm and interesting micro mechanics. The default attack move would cause the units to converge into a single file march... which is suicide against a grouped enemy or defended position.

zzleeper · on Jan 24, 2015

Not that it's not interesting, but a [2012] next to the title would have been helpful

angersock · on Jan 24, 2015

Not really, as the article is covering a project near 20 years old.

Who gives a shit when it was written?

loteck · on Jan 24, 2015

Previous discussions of this entertaining series from 2012 here:

https://news.ycombinator.com/item?id=4491216

https://news.ycombinator.com/item?id=4582123

https://news.ycombinator.com/item?id=5252003

hybridtupel · on Jan 24, 2015

Does anybody know what happened with part three? Was it ever published?

realrocker · on Jan 24, 2015

Really like the story-telling style of the article. Was wondering where could I find a compilation of similar "programming tales"? Apart from http://www.folklore.org/ of course.

brainburn · on Jan 24, 2015

He says the maximum number of units is 1600.

If 6 players mindcontrol a drone and an scv and build additional armies, the total number of units could way exceed 1600. Or what am I missing?

ygra · on Jan 24, 2015

You'd get the wonderful error »Cannot create more units«. There also was a limit with certain sprites that counted towards that. It was not an uncommon error to see in certain UMS maps that had way too many units running around.

plorkyeran · on Jan 24, 2015

You'd also occasionally hit it in zerg-heavy 8 player melee games due to that larvae and overlord count against the limit without consuming any supply. It was always pretty hilarious to see a small pack of mutalisks kill a carrier doom-fleet which couldn't launch any interceptors due to the unit limit.