– Garbage collection should be the only option
but he also says:
By 2009, game developers will face...
The presentation isn't explicitly dated, but the sample game is Gears of War, which came out in 2006; later on he says "by 2009" as if 2009 hadn't arrived yet; and the file name is "sweeney06games.pdf". So I'm going to say this is almost certainly a presentation given in 2006. At the time it was given, the iPhone hadn't even come out yet.
It's an interesting presentation. I wonder what Sweeney would say today about this presentation, about how much of it still rings true, and about which things he would second-guess nowadays.
1. Completely deterministic, so no worrying about pauses or endlessly tweaking GC settings.
2. Manual ref counting is a pain, but again, ARC basically makes all that tedious and error-prone bookkeeping go away. As a developer I felt like I had to think about memory management in Obj-C about as much as I did in garbage-collected languages, namely very little.
3. True, you need to worry about stuff like ref cycles, but in my experience I had to worry about potential memory leaks in Obj-C about as much as I did in Java. I.e. I've seen Java programs crash with OOMs because weak references weren't used when they were needed, and I've seen similar in Obj-C.
1. Automatic reference counting adds a barrier every time you pass around an object, even if only for reading, unlike both manual memory management and tracing GC.
2. ARC still requires thought about ownership, since you have to ensure there are no cycles in the ownership graph, unlike tracing/copying/compacting GC (see the sketch below).
3. ARC still means that you can't look at a piece of code and tell how much work it will do, since you never know if a particular piece of code is holding the last reference to a piece of data, and must free the entire object graph, unlike manual memory management. In the presence of concurrency, cleanup is actually non-deterministic, as 'last one to drop its reference' is a race.
4. ARC doesn't solve the problem of memory fragmentation, unlike arena allocators and compacting/copying GC.
5. ARC requires expensive allocations; it can't use cheap bump-pointer allocation like copying/compacting GC can. This is related to the previous point.
6. ARC cleanup still takes time proportional to the number of unreachable objects, unlike tracing GC (proportional to number of live objects) or arena allocation (constant time).
Reference counting is a valid strategy in certain pieces of manually memory managed code (particularly in single-threaded contexts), but using it universally is almost always much worse than tracing/copying/compacting GC.
Note that there are (soft) realtime tracing GCs, but this can't be achieved with ARC.
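To make the cycle problem from point 2 concrete, here is a minimal Rust sketch (illustrative names only, using `Rc`/`Weak`, Rust's non-atomic flavour of reference counting): the back-edge has to be weak, or neither count can ever reach zero.

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    // Parent/child pair. If `parent` were a strong Rc, parent -> child -> parent
    // would form a cycle and neither refcount could ever reach zero: a leak
    // that a tracing GC would have collected without any help.
    struct Node {
        children: RefCell<Vec<Rc<Node>>>,
        parent: RefCell<Weak<Node>>, // weak back-edge breaks the cycle
    }

    fn main() {
        let parent = Rc::new(Node {
            children: RefCell::new(vec![]),
            parent: RefCell::new(Weak::new()),
        });
        let child = Rc::new(Node {
            children: RefCell::new(vec![]),
            parent: RefCell::new(Rc::downgrade(&parent)),
        });
        parent.children.borrow_mut().push(Rc::clone(&child));

        // Strong counts: parent = 1, child = 2. When both locals go out of
        // scope, parent drops to 0, its children vector is dropped, child
        // drops to 0, and everything is freed.
        println!("child strong count = {}", Rc::strong_count(&child));
    }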
1. I'm guessing you mean lock-based implementations? There are several non-lock, non-atomics-based ARC designs that still handle threading safely. With those, a refcount update is little more than a single integer operation.
2. True, but for many contexts this is easy to do and makes it easier to understand data flows. In other cases it's possible to use index based references pretty readily, like in Rust. Or add a cycle collector.
3. In theory you can't, but in practice it's often pretty easy to tell, at least in non-OOP languages. I use Nim with its ARC (1) on RTOSes, and this is really only a concern for large lists or large dynamic structures. It can be managed using the same patterns as RAII, where you call child functions that you know won't be the last ones holding the bag. Also, you can use the same trick as some Rust code does, where you pass the memory to another thread to dispose of (2); there's a sketch of that trick below.
4/5. It depends on the implementation, but you can use pools or arenas or other options. Nim provides an allocator algorithm (TLSF) with proven O(1) times and known fragmentation properties (3). Still, tracing GCs can make better use of short-lifetime arenas, true. Though with ARC you get similar benefits using stack-based objects.
6. It's tradeoffs. Tracing GCs also end up needing to scan the entire heap every so often. ARCs only need to check a root object during usage, and only touch the entire graph when deallocating.
Your last point isn't accurate, as you can use an appropriately designed ARC in a hard-realtime context. I've found it quite easy to do; granted, it takes a little bit of care, but any real-time system does. For items like interrupt handlers I ensure no memory is allocated or destroyed.
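The "pass it to another thread to dispose of" trick from point 3 looks roughly like this in Rust (a minimal sketch; the channel, thread, and type names are just for illustration): the hot path's worst case becomes a cheap channel send, and the cascade of frees happens on a thread that can afford it.

    use std::sync::mpsc;
    use std::sync::Arc;
    use std::thread;

    // Some large object graph whose teardown we don't want on the hot path.
    struct BigGraph {
        nodes: Vec<Vec<u64>>,
    }

    fn main() {
        // Dedicated "dropper" thread: anything sent down this channel is
        // deallocated over there, off the latency-sensitive thread.
        let (tx, rx) = mpsc::channel::<Arc<BigGraph>>();
        let dropper = thread::spawn(move || {
            for graph in rx {
                drop(graph); // if this was the last reference, the graph is freed here
            }
        });

        let graph = Arc::new(BigGraph {
            nodes: vec![vec![0; 1024]; 1024],
        });

        // ... hot path uses `graph` ...
        tx.send(graph).unwrap(); // hand our (possibly last) reference away

        drop(tx); // close the channel so the dropper thread exits
        dropper.join().unwrap();
    }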
Just one nitpick though:
> 6. It's tradeoffs. Tracing GCs also end up needing to scan the entire heap every so often.
This is not accurate: tracing GCs always start from the roots and only ever visit live objects. By definition, the objects that they free are not reachable from anywhere. "Full scans" typically refer to various optimization strategies that tracing GCs implement to avoid scanning all roots (e.g. generations, per-thread scans) which do rely on still occasionally doing a full scan (scanning all live objects in all generations, or in all threads).
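A minimal mark-phase sketch (illustrative Rust, not modelled on any particular collector) shows why only live objects are ever visited: the traversal starts at the roots and follows edges, so unreachable objects are simply never touched by it.

    use std::collections::HashSet;

    // Toy heap: objects[i] lists the indices of the objects that object i points to.
    struct Heap {
        objects: Vec<Vec<usize>>,
    }

    // Mark phase of a tracing collector: a depth-first walk from the roots.
    // Anything never reached (i.e. garbage) is never looked at by this loop;
    // only the sweep/evacuation step reclaims it.
    fn mark(heap: &Heap, roots: &[usize]) -> HashSet<usize> {
        let mut live = HashSet::new();
        let mut stack: Vec<usize> = roots.to_vec();
        while let Some(obj) = stack.pop() {
            if live.insert(obj) {
                for &child in &heap.objects[obj] {
                    stack.push(child);
                }
            }
        }
        live
    }

    fn main() {
        // 0 -> 1 -> 2 is reachable; 3 and 4 form an unreachable cycle.
        let heap = Heap {
            objects: vec![vec![1], vec![2], vec![], vec![4], vec![3]],
        };
        let live = mark(&heap, &[0]);
        assert!(live.contains(&2) && !live.contains(&3));
        println!("live objects: {:?}", live);
    }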
Don't think that's even doable at all, at least not portably. Do you have some examples?
- Nim's ARC only allows one thread to manage a reference count at a time, but enables an isolated graph of memory to be moved to another thread. The concept is called `Isolate`, and is very similar to Rust's single owner of mutable references. There's still WIP to have the compiler automate the checks, but it's usable now (I used it with FreeRTOS's xQueue mechanism just fine). https://github.com/nim-lang/RFCs/issues/244
- Python's new no-GIL proposal, which does this using biased reference counting: https://hackaday.com/2021/11/03/python-ditches-the-gils-and-...
- The source of Python's biased reference counting: https://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf
- Deferred reference counting: https://dl.acm.org/doi/pdf/10.1145/3453483.3454060
It's pretty cool stuff!
There are ways around this overhead at least in Metal, but this requires at least as much effort as not using ARC to begin with.
> this requires at least as much effort as not using ARC to begin with.
Designing a brand new CPU architecture certainly counts as "at least as much effort", yes. ^_^
P.S. While I'm here, your "handles are the better pointers" blog post is one of my all-time favorites. I appreciate you sharing your experiences!
> has bad cache behavior for multithreading
Why is this? Doesn't seem like that would be something inherent to ref counting.
> Also, not deterministic
I always thought that with ref counting, when you decrement a ref count that goes to 0 that it will then essentially call free on the object. Is this not the case?
Determinism: you reset a pointer variable. Once in a while, you're the last reference holder and now have to free the object. That takes more instructions and causes cache invalidation.
Thank you! This was the response that made the cache issues clearest to me.
Once I know I can read the data, it usually doesn't matter that another thread is also reading it. Reference counting changes that, because we both need to write to the count every time either of us takes or drops a reference to the data, and in the latter case we need to know what's happened on the other core, too. This means a lot more shuttling of modified data between processor cores.
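As a rough illustration (a hedged sketch; sizes and iteration counts are arbitrary), every `clone` and drop of a Rust `Arc` is an atomic read-modify-write on the same counter, so when several cores do it in a tight loop the cache line holding the count bounces between them even though the payload itself is only ever read:

    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Read-only data shared between threads.
        let data = Arc::new(vec![0u8; 1024]);

        let mut handles = Vec::new();
        for _ in 0..4 {
            let data = Arc::clone(&data); // atomic increment of the shared count
            handles.push(thread::spawn(move || {
                let mut sum = 0u64;
                for _ in 0..1_000_000 {
                    // Each clone/drop pair is two contended atomic ops on the
                    // same cache line, even though nobody mutates the Vec.
                    let local = Arc::clone(&data);
                    sum += local[0] as u64;
                } // `local` dropped each iteration: atomic decrement
                sum
            }));
        }
        let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
        println!("{total}");
    }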
> > Also, not deterministic
> I always thought that with ref counting, when you decrement a ref count that goes to 0 that it will then essentially call free on the object. Is this not the case?
That's my understanding, but is that "deterministic" as we mean it here? It's true that the same program state leads to the same behavior, but it's non-local program state, and it leads to you doing that work - potentially a lot of work, if you (eg) wind up freeing all of a huge graph structure - at hard-to-predict places in your code.
There are good workarounds (free lists, etc) but "blindly throw language level reference counting at everything" isn't a silver bullet (or maybe even a good idea) for getting low-latency from your memory management.
It is inherently so. A reference count is an atomic mutable value that must be updatable by any thread.
A significant selling point of ARC compared to traditional ObjC reference counting was that the compiler could elide retain/release calls better than the programmer could, thus preventing a ton of stalls.
Determinism: the time taken is not deterministic because (1) malloc/free, which ARC uses but other GCs usually don't, are not deterministic - both can do arbitrary amounts of work like coalescing or defragmenting allocation arenas, or performing system calls that reconfigure process virtual memory - and (2) deallocations cascade, as objects hitting refcount 0 trigger refcount decrements and deallocation of other objects.
> not deterministic perf wise.
was what parent wrote (emphasis added), I assume referring to the problem that when an object is destroyed, an arbitrarily large number of other objects -- a subset of the first object's members, recursively -- may need to be destroyed as a consequence.
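A hedged sketch of that cascade in Rust (sizes are arbitrary): dropping the last reference to the head of a reference-counted list does work proportional to the whole list, not to the single statement that triggered it.

    use std::rc::Rc;

    // A singly linked list where every link is reference counted.
    struct Node {
        _payload: [u8; 64],
        next: Option<Rc<Node>>,
    }

    fn main() {
        // Build a list of 10_000 nodes.
        let mut head: Option<Rc<Node>> = None;
        for _ in 0..10_000 {
            head = Some(Rc::new(Node { _payload: [0; 64], next: head.take() }));
        }

        // This one statement is "drop the last reference": it cascades down
        // the whole list, decrementing and freeing 10_000 nodes. (With a
        // naive recursive Drop and a long enough list it can even overflow
        // the stack, which is why such structures are often dropped
        // iteratively or handed to another thread.)
        drop(head);
    }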
The only significant cost tracing has these days is in memory footprint.
And that's not insignificant. The top-of-the-line Pixel 6 Pro has twice as much RAM as the top-of-the-line iPhone 13 Pro. Maybe the Android camp is just more eager to play the specs game, but I've long speculated that iOS simply needs less RAM because of its refusal to use tracing GC.
I used all my models until their hardware died.
It mostly means "I can know (as in "determine") how much time, or instructions, or calls, or memory an operation will take".
And this knowledge can come in degrees (be more or less fuzzy).
"Deterministic" is a double-edged sword. You get deterministic allocation and release times, but they might be bigger than what is really achievable.
When doing all the required optimizations, it turns into tracing GC optimizations under another, more politically accepted name.
The problem with GC or with reference counting is that it needs to operate on each allocated object separately. If the task for the GC can be reduced to operate on whole memory areas only, its overhead is greatly reduced.
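A minimal sketch of that idea in Rust (illustrative only; real arena allocators such as the `bumpalo` crate handle alignment, growth, and destructors properly): allocation is a pointer bump, and "freeing" is resetting one offset for the whole region, with no per-object bookkeeping at all.

    // Toy bump arena over one big buffer.
    struct Arena {
        buf: Vec<u8>,
        offset: usize,
    }

    impl Arena {
        fn with_capacity(bytes: usize) -> Self {
            Arena { buf: vec![0; bytes], offset: 0 }
        }

        // Hand out `len` bytes by bumping the offset (no alignment handling here).
        fn alloc(&mut self, len: usize) -> Option<&mut [u8]> {
            if self.offset + len > self.buf.len() {
                return None; // arena full
            }
            let start = self.offset;
            self.offset += len;
            Some(&mut self.buf[start..start + len])
        }

        // Release every allocation at once: O(1), no object graph to walk.
        fn reset(&mut self) {
            self.offset = 0;
        }
    }

    fn main() {
        let mut arena = Arena::with_capacity(1 << 20); // e.g. a per-frame megabyte
        for _ in 0..1000 {
            let chunk = arena.alloc(256).expect("arena exhausted");
            chunk[0] = 42; // use the temporary allocation
        }
        arena.reset(); // end of frame: everything "freed" in constant time
    }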
There are no GC implementations that have no pauses. You couldn't make one without having prohibitively punitive barriers on at least one of reads and writes.
There are no operating systems that have no pauses; if you don't want to share the CPU, you're going to have to take responsibility for the whole thing yourself. Most people are not even using RTOS.
There are no reference counting implementations that have no pauses. Good luck crawling that object graph. Most people are not even using deferred forms of reference counting.
As far as I know, there is one malloc implementation which runs in constant time. No one uses it.
There are tracing GCs whose pause times are bounded to 1ms. That is enough for soft-real-time video and audio (which is what matters to video games). In general, you are not going to get a completely predictable environment unless you pick up an in-order CPU with no cache and write all your code in assembly.
(I suspect this is obvious to a lot of programmers, but from working with Java programmers who just skim JVM release notes and then repeat "pauseless", it's clearly not obvious to many, either.)
Of course not! Nothing can. But—two things:
1. Per Knuth on premature optimization, 97% of your code should not care about such things. For those parts of your code that do need to effect their own allocation strategies, they can generally do so as well in a GCed language as another. Tracing GC forces minimal cognitive overhead on the remaining 97%, and allows it to interoperate smoothly with the 3%.
2. Tracing GC has better performance characteristics than other automatic memory management policies, in a few important respects. Compacting adds locality; bumping gives very fast allocation; copying gives very fast deallocation of deep graphs in the nursery.
Like I said - "of course" to me, and apparently you, and probably anyone who has worked with multiple GCs and unmanaged languages over many years - but that's not most programmers. The vast majority of Java developers I interview, up to those with 5-6y experience, have no clue how their idioms affect allocations.
> Per Knuth on premature optimization, 97% of your code should not care about such things.
No, this is the classic misstatement of Knuth. 97% of the time, I shouldn't worry about such things. But 100% of my program still might be affected by a design needed to reduce GC pressure, if that means switching core data structures to pools or SoA or something.
As a game dev I still will take a modern GC over refcounting or manual lifetime management any day. There are scenarios where you simply don't want to put up with the GC's interference, but those are rare - and modern stacks like .NET let you do manual memory management in cases where you are confident it's what you want to do. For the vast majority of the code I write, GC is the right approach - the best-case performance is good and the worst-case scenario is bad performance instead of crashes. The tooling is out there for me to do allocation and lifetime profiling and that's usually sufficient to find all the hotspots causing GC pressure and optimize them out without having to write any unsafe code.
The cost of having a GC walk a huge heap is a pain, though. I have had to go out of my way to ensure that large buffers don't have GC references in them, so that the GC isn't stuck scanning hundreds of megabytes worth of static data. Similar to what I said above though, the same problem would occur if those data structures had a smart pointer in them :/
Try-with-resources is ok, but not great.
Ultimately developers, especially those concerned with performance, prefer full control. And that means deterministic lifetime of memory and resources.
They just see C++ and don't look into the details.
Besides that, given that most engines now have a scripting layer, even if deep down on the engine GC is forbidden, the layer above it, used by the game designers, will make use of it.
It is like boasting about C and C++ performance while forgetting that 30 years ago they weren't welcome for games on consoles, or on the 8/16-bit home micros.
My phone is more powerful than any Lisp Machine or Xerox PARC workstation, it can spare some GC cycles.
Speak for yourself; most phones I see still take 2+ seconds to open some apps.
As far as desktop goes, even fewer cycles can apparently be spared - IntelliJ frequently freezes for 5s just after startup, most applications have insane startup times considering the power they have.
So, no, I don't want to add yet another slowdown because "our machines are 1000x faster than high-end digital workstations from 30 years ago", because it seems to me that all of the increases in hardware power have already been used up, and more.
The problem isn't the hardware or the language, rather how much stuff gets programmed in an age where plenty of people reach for Electron apps and don't care about algorithms and data structures.
My first applications had 64KB and 3.5 MHz at their disposal to make their users happy.
Why is that though? I've written code since the 8 bit days of BBC micros and ZX Spectrums.
Casey Muratori's talk (he has been on here a lot lately) nails it: developers just don't realise how fast their code could and should run. And we have layers upon layers of frameworks, and no one cares.
It seems like he was off by a matter of a few months at most.
I have a quad AMD Opteron(tm) Processor 6172 (Mar 2010) 2U server with 48 hardware cores. This motherboard can also take the 16-core CPUs that were introduced the following year, which would put it at 64 cores.
The six-core Opteron CPUs were introduced in June 2009, so a quad server would be 24 cores.
Of course, that's not cores in a single CPU, if that's even what the prediction was referring to, but that probably wouldn't matter very much from a game dev perspective anyway. It was probably a pretty good guess, all things considered.
That's not usually what people game on, though. Even today, 20-core CPUs are the exception for gaming rigs.
But I think the trend is clear: more cores are the future, as clock-speed growth has slowed down.
~0.31% according to the Steam hardware survey. If you are including 8-core machines with simultaneous multithreading (so 16 logical cores, 8 physical), then you would be at ~17%, however.
And of course none of these happened. GPU and CPU did not converge; they both scaled far further than anyone could have imagined.
I mean, if you had asked developers in 2006, no one would have thought we'd be doing 4K rendering with HDR at 120fps and still not doing ray tracing.
You seem to forget the GPU: how many cores does that have? (But really the question is: isn't parallelism the default mode when working with GPUs?)
Really made me see things clearer.
Haskell has a separate problem here: all of these can fail, and there's nothing in the type system to alert you of this (in the standard library); such failures just mindlessly throw exceptions like some Java monstrosity. In Rust, on the other hand, all such functions return a `Result` (essentially Haskell's `Either`), forcing the caller to check (and ideally handle) any errors. Not to mention async exceptions in Haskell, which can happen anywhere, and the fact that every value is really of type value|undefined, due to laziness. It's practically impossible to cleanly formally reason about Haskell code, because anywhere could be interrupted by an async exception.
When even C++ is considering trying to remove exceptions from the standard library, Haskell's love for untyped exceptions everywhere is seriously behind the times for a language that prides itself on correctness.
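For concreteness, a minimal Rust sketch of that contrast (the file name and helper function are hypothetical): failure is part of each signature as a `Result`, and ignoring it draws a compiler warning, rather than an exception the type never mentioned.

    use std::fs;
    use std::num::ParseIntError;

    // The possibility of failure is visible in the signature.
    fn parse_port(s: &str) -> Result<u16, ParseIntError> {
        s.trim().parse::<u16>()
    }

    fn main() {
        // Every failure surfaces as a value the caller must deal with.
        match fs::read_to_string("port.txt") {
            Ok(contents) => match parse_port(&contents) {
                Ok(port) => println!("listening on {port}"),
                Err(e) => eprintln!("bad port number: {e}"),
            },
            Err(e) => eprintln!("could not read port.txt: {e}"),
        }
        // Calling `parse_port(...)` and discarding the Result would trigger
        // the #[must_use] lint, unlike an unchecked exception.
    }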
> all of these can fail, and there's nothing in the type system to alert you of this (in the standard library), such failures just mindlessly throw exceptions like some Java monstrosity
There's `MonadThrow` in Control.Monad.Catch which hints that the monad in question can throw exceptions. Admittedly, partial functions like `undefined` and `error` are still usable...
> Not to mention async exceptions in Haskell, which can happen anywhere, [...] anywhere could be interrupted by an async exception
... and they can throw exceptions everywhere, just like asynchronous exceptions, but it's actually a strength! Haskell enforces a clean separation between impure code (typically in a monad) and pure code. You can only catch exceptions in the IO monad, which often lies outside of the core logic. Due to this unique strength, Haskell is one of the very few languages that can safely terminate running threads.
Impure code can become harder to write because of exceptions, but since you don't write everything in the IO monad, the problem is largely mitigated. Yes, exceptions are hard to get right, and that's exactly why other languages are trying to get rid of them, but Haskell makes it quite tractable (though still quite annoying). Rust uses more Maybes and Eithers in the IO monad (to borrow jargon from Haskell), but it also has panics, which are the actual counterpart of Haskell exceptions.
> and the fact that every value is really of type value|undefined, due to laziness
To be pedantic, Haskell has levity polymorphism, which gives you unlifted datatypes, like in OCaml and Idris. Even older Haskell has unboxed datatypes that are not lifted.
> ...Haskell's love for untyped exceptions everywhere...
Nope, Haskell's exceptions are typed.
> Nope, Haskell's exceptions are typed.
logicchains means that the exceptions that a function can throw are not noted in its type (and as a massive Haskell fan I agree with him/her that that is very annoying).
Do those things come under the category of unsafe? Why would we be programming if it weren't to do those kinds of things? If I buy a shovel, at some point I'll probably want to dig a hole with it.
However, the goal of Haskell (and indeed any language trying to be safe in the way SPJ means) is to have most code be safe/effect-free, and then to have effects be very carefully controlled and limited in their use. Things like the IO monad mean many parts of Haskell code can't do IO, in fact.
We obviously do want some sort of effect in the end, but the idea is it's safer to contain those effects in very limited places, and not allow effects to happen literally anywhere at any time.
Note, unsafe in the SPJ video was specifically about effects, while "unsafe" in rust terminology is mostly about memory safety, so those two terms really aren't the same word, and to be honest that can make communication less clear. I don't know what "category of unsafe" meant in your comment really.
But in any case, yes those things I mentioned can fall under the category of "unsafe". In the category of memory safety, you want to be sure (with compiler guarantees!) that no thread will secretly free some allocated memory while you are still using it. This is something the borrow checker can give you in rust. There are no builtin guarantees in Rust that this fancy new library you imported won't do `DROP TABLE users`, or open a socket to https://haxorz.com and pass on all the environment variables. There are other languages in which you can be sure that a function cannot open any sockets unless it specifically has "can open sockets" encoded in its type.
In practice that's true of Haskell as well.
Anyway, controlling side effects is really about avoiding accidental/unintentional side effects, i.e. mutating the world by accident. Of course if everything is in IO, you only get "can do anything to the world" and "can't change the world at all", so Haskellers are usually interested in more fine-grained separation of effects than just pure/impure.
Of course, you are also trusting that code you're using doesn't do crazy unsafePerformIO, etc. stuff, but at least you can grep the code for that :). And sometimes unsafePerformIO can be a good thing to do 'magical' things which are referentially transparent but require mutation for asymptotics or performance generally.
 Safe Haskell is more about that kind of thing, but AFAIUI it isn't really in much use and never really took off. IIRC, it might even be slated for removal?
 Which is the ultimate shared mutable state, after all.
In Haskell, a function like 'Int -> String' is safe (definitely not bad). A function like 'Int -> IO String' is unsafe (it might be bad; we hope not). If it were possible to specify "bad" via the type system (like the type of use-after-free) then we would want that to be a type error (like it is in Rust).
I don't think that's changing the use of language at all. For example, playing one round of russian roulette is 'unsafe'; even though (a) we don't know if it will have a bad outcome or not, and (b) the chance of a good outcome is much higher than that of a bad outcome.
    foo :: Int -> String
    foo x = x `seq` show x
Verse is more intended for Blueprint people and there is no way people working on the Blueprint code will be able to move into functional programming.
Also, I don't get the investment into the Verse language itself. Why not just fund LuaJIT? Do we really need another scripting language when Lua nailed so many things right?
I guess Epic will craft a pretty fast scripting language and try to lock game developers to their own ecosystem.
Thanks for the keyword quote. Interesting topic, as I feel spreadsheets are a very under-represented form of programming, especially when trillions of dollars rely on Excel to function properly.
It doesn't have lambdas/closures. You have to create a whole separate object in a separate file to pass something new with data that can be called later.
This is basically what Erik Meijer did isn’t it? He worked at Microsoft and produced some really nice things that had a big impact, like reactive extensions for example.
    CurrentRound^ : int = 0
    for (i = 1..NumberOfRounds):
        CurrentRound := i
Honestly just looks like a dialect of Python. I wonder what SPJ will do with it...
Feels like "the whole everything is looking old we got new keywords! Metaverse! We need new languages...".
The older I get, the more I feel like languages are generational, just like the developers coming into the market. They don't want to write Java; even Python is getting old. We need new, fresh languages every 5-10 years, which frankly just feels like the same walls with a fresh coat of paint.
Honestly, if anyone can solve that problem, SPJ can!
Self-described co-designer of TypeScript - https://www.linkedin.com/in/steve-lucco-b5084958/
The way to beat Apple is to develop better technology than them. It will take billions of dollars to do this, but Epic has the money.
So far it seems like they sort of vaguely want to turn Fortnite into it... but I haven't really seen much that I would call concrete.
I always thought their Travis Scott event was a pretty good example of something you could call early metaverse-ish, going beyond the usual scope of the game
Slide 46 and on (mentioned on 48) is largely metaverse and a bunch of stuff presaging the Apple lawsuit, even though Fortnite hadn't launched yet:
Of course that requires protocol interop (as did the Internet with TCP/IP and HTTP) and who knows if it will happen for the Metaverse.
I'm pretty sure that's not how they see it. I think the vision is of the singular Metaverse as the new narrow waist upon which all future apps are built.
I think it's been pretty clear that they're going to work on their own metaverse products. Epic seems to build on Fortnite, while Meta has Horizon.
I don't believe either have stated compatibility as a goal.
I know that the Web 3.0 cryptocurrency crowd has ideas about a decentralized web with a shared ontological understanding, so maybe that's the source of the notion of there being a single metaverse?
But neither Epic nor Meta seem interested in a shared ecosystem. The metaverse is a product and their implementations will compete. I'm not sure if people would even want to be in two separate metaverses so competition will be fierce.
Most media (books/movies) sidestep that by usually having a single metaverse creator. That's in large part because it's narratively easier to deal with a single villain and universe. I don't know of any that posit competing offerings.
The metaverses will be similar. They'll end up with many of the same apps and experiences, but devs will have to publish independently to each platform.
This sounds really interesting. I learned about STM in Haskell while working on Cardano, and it seemed so obvious - I am sure there will be some seminal outcomes of this too.
Is Epic worse than Facebook? I can't imagine much of an upside to preying on human frailties in order to make games addictive to extract money from kids.
(Yes, I know you played Mario Brothers as a child and turned out fine. So did I. But the stuff we had back in the day is extremely tame and weak compared to the hard stuff they peddle today.)
Not only that, Epic invested every single dollar they made from Fortnite back into improving Unreal Engine.
No, but that's not the point.
The "addictive" factor comes from micro-transactions with randomized returns on investment ("loot boxes"). Kids get sucked into chasing the most valuable skins for characters, weapons, etc. This either results in them spending an inordinate amount of time playing the game (increasing odds of paying money) or just paying money directly to purchase in-game loot boxes that have fractional chances of dropping the rarest gear.
It's not "all games" that are problematic in this way; it is games which are monetized through micro-transactions with loot boxes. It's essentially legal gambling for kids, and it's absolutely a problem that we should be taking more seriously.
There's still something to be said about the addictive nature of microtransactions, but loot boxes haven't been relevant to the discussion, as far as Fortnite goes, for a few years.
I'm personally pro-entertainment, pro-culture, etc... Tons of things we spend money on in life are entertainment and, frankly, useless. But it's fun.
That depends on how you define "win". As a gacha player myself (Genshin Impact), a lot of people roll because of FOMO and because they want to own a thing, not necessarily because they're going to "win".
Heck, I main Yoimiya, and she's not exactly part of the meta. Maybe I'm a sucker (no, I definitely am), but I'm not paying to "win" in the usual sense. There's plenty of exploitation to be had without attaching a competitive advantage.
Advertising and game design techniques that encourage vulnerable people to spend inordinate amounts of time and money on a game are the real problem. This leads to the current situation of most hugely successful games today, where the vast majority of players spend nothing or next to it, while a small minority of "whales" literally spend thousands of dollars monthly.