For real-time user applications such as games, newsfeeds, and mobile apps that have visual/audio feedback, I'm certain that the most important feature is the ability for applications themselves to control when collection happens and the time constraints it must operate within. A collector can be less efficient overall yet be perceived as more responsive, as long as the wasteful or less efficient work is done at times when it does not matter - and that tradeoff would be gladly welcomed.
So much of UI application time is idle. Even during fluid animations, the majority of the frame time goes wasted. But then something like Objective-C's ARC frees large trees of memory, or perhaps Java's GC collects, and causes your frame time to exceed 16ms. For UI applications, there are periods of time (during animations) where you must hit 16ms deadlines, and the difference between 1ms and 15ms is nothing, but the difference between 16ms and 17ms is everything. UI application animations are like train schedules. If you miss one train by only a microsecond, you still have to wait the entire time until the next train arrives. You don't get points for almost making the train. Furthermore, only the application can know the train schedule. It isn't something the language runtime can possibly know, so to do this right, the language must anticipate this and accept input from the application frameworks.
Then there are other times when we are not performing an animation and we know that we could block a thread for as much as 50ms without the user perceiving any delay. The latency constraints for starting a continuous interaction are larger than the constraints for continuing a continuous interaction. So in this case, our application still knows the train schedule; it's just a different train schedule that allows for more time to kill. If applications could tell the GC about this, it might decide that it's a good time to perform a major collection.
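To make that concrete, here is a rough sketch of the kind of contract I'm imagining between the app and the collector. Nothing like ScheduledCollector, collectIncrementally, or hintIdleWindow exists in any runtime I know of; the names, signatures, and numbers are made up purely to show the shape of the idea (Swift, iOS-flavored):

    import QuartzCore

    // Hypothetical interface: the application, which knows the "train
    // schedule", drives when and for how long the collector may run.
    protocol ScheduledCollector {
        // Do as much incremental work as fits before `deadline`, then return.
        func collectIncrementally(until deadline: CFTimeInterval)
        // The app promises nothing user-visible will happen for `budget`
        // seconds - a reasonable moment for a major collection.
        func hintIdleWindow(budget: CFTimeInterval)
    }

    // During an animation: hand the collector only what's left of this frame.
    func onFrameTick(_ collector: ScheduledCollector, nextVsync: CFTimeInterval) {
        collector.collectIncrementally(until: nextVsync - 0.002) // keep ~2ms of slack
    }

    // Between interactions: the app knows it can afford to block for ~50ms.
    func onInteractionEnded(_ collector: ScheduledCollector) {
        collector.hintIdleWindow(budget: 0.050)
    }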
I've found that many of the things people consider to be performance problems in UI applications aren't problems of efficiency; they're problems of scheduling.
Another anecdote that has less to do with garbage collectors, but still shows the importance of frame alignment:
I had been developing a UI interaction for a mobile app and had been experiencing the strangest stutters while dragging. For the longest time, I blamed the garbage collector (which I realize is the lazy way out - no one will argue with you). However, I eventually realized that the exact same code that executed while dragging could be driven by a screen-refresh event instead and would be as smooth as butter.
What was even stranger is that my two-year-old iPhone 5 would execute this dragging interaction much more smoothly than my brand-new iPhone 6+. Clearly, something was going on here.
So I took the garbage collector out of the equation and just measured the raw hardware events for screen refreshes and touches. I found that touches were coming in at a steady 60fps (just like the screen), but they were aligned to the middle of the frame. This was only happening on my higher-end device, not on the lower-end device, which explains why my iPhone 5 felt better than my 6+. This is really bad because if your handler takes a mere 9ms, it will end up boarding the next train, along with the next touch's handler. So zero passengers hop on the first train, and two passengers hop on the next one. What we really wanted was one passenger per train in order to create smooth interactions/animations. What looked like a performance issue is actually a simple alignment issue! What's worse is that a naive "fps counter" that just measured how many event handlers completed per second on average wouldn't drop below 60fps!
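For what it's worth, you can observe this alignment yourself with a display link and the raw touch timestamps (both are real UIKit/QuartzCore APIs and, as far as I know, share the same clock); the phase math below is just my back-of-the-envelope way of measuring it:

    import UIKit

    // Logs where each touch lands within the current display frame.
    // A phase near 0 means touches arrive right after vsync (lots of time left);
    // a phase near the middle of the frame leaves only ~8ms before the deadline.
    final class TouchPhaseProbe: UIView {
        private var displayLink: CADisplayLink?
        private var lastVsync: CFTimeInterval = 0
        private var frameDuration: CFTimeInterval = 1.0 / 60.0

        override func didMoveToWindow() {
            super.didMoveToWindow()
            displayLink = CADisplayLink(target: self, selector: #selector(tick))
            displayLink?.add(to: .main, forMode: .common)
        }

        @objc private func tick(_ link: CADisplayLink) {
            lastVsync = link.timestamp
            frameDuration = link.duration
        }

        override func touchesMoved(_ touches: Set<UITouch>, with event: UIEvent?) {
            guard let touch = touches.first else { return }
            let phase = (touch.timestamp - lastVsync)
                .truncatingRemainder(dividingBy: frameDuration)
            print(String(format: "touch at %.1fms into a %.1fms frame",
                         phase * 1000, frameDuration * 1000))
        }
    }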
So not only is it important to be able to schedule work (GC or anything really during available frame time), it's important that we properly align this work with the screen refresh rate, both of which are things that can only be done above the language layer.
> the difference between 1ms and 15ms is nothing, but the difference between 16ms and 17ms is everything. UI application animations are like train schedules. If you miss one train by only a microsecond, you still have to wait the entire time until the next train arrives.
This is why we need a better display protocol. Why are we still treating our displays like synchronous, scanline CRTs? The display should have a simple buffer into which we can do random reads/writes at any time and the affected pixels will be updated immediately. This would solve all of these problems (and eliminate video tearing)!
Yes, this would be much better than the status quo, and hopefully technologies like "gsync" will bring something like it to us soon. But this approach will still result in microstutter from frames being displayed at somewhat random times, that may or may not be perceptible (I've never used such a display). Obviously a bit of microstutter is preferable to tearing or missing frames entirely under heavy loads, but really, a synchronous refresh rate would be ideal, if only our process schedulers were designed to give strict priority, more reliable vblank timing, and some degree of safety from preemption to the foreground UI process.
This approach can still cause perceptible latency if the user tries to input something while the GC you were trying to hide is ongoing.
For really intense graphical applications, like games, there is never a resting period where you can afford to spend a frame or more of time collecting garbage.
And for simpler graphical applications where you could perhaps get away with this, it's kind of pathetic that programmers today (or rather, their programming languages) even have trouble hitting 60 FPS on modern machines, when you look back at what people were able to accomplish in the past on hardware several orders of magnitude less powerful.
> For really intense graphical applications, like games, there is never a resting period where you can afford to spend a frame or more of time collecting garbage.
Yeah, it's really interesting to hear graphics programmers' opinions on garbage collection. I forget if it was Jon Blow or Casey Muratori who said something like (paraphrasing): "The bottom line is that GC advocates just have different standards for performance."
And while this is not a GC-related thing but more a performance-related one: I find it absurd that when I use the Windows computers at school, it takes me upwards of 30 seconds to open a document in Word.
>> This approach can still cause perceptible latency if the user tries to input something while the GC you were trying to hide is ongoing.
With all due respect, I think you're missing the point on this one. For non-games, there are times when blocking for n frames will be perceptible and times when it will not be, and we know exactly when these are. Garbage collection (such as ARC) has nothing to do with that particular fact. It doesn't matter what you are doing during those times - you could be performing a longer-running computation, or garbage collection tasks (such as ARC deallocing large trees, or the JVM collecting). The way this is handled in applications is by either offloading long-running tasks to another thread or finding a way to distribute the time across several event loops (whether intentional or not). I'm proposing that the same techniques we apply to other parts of our apps be applied to garbage collectors.
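As a minimal sketch of the "distribute it across several event loops" half of that (the hard, app-specific part - actually splitting the work into slices - is assumed to have been done already):

    import Foundation

    // Run one slice of a long task per main-run-loop turn, so rendering
    // gets a chance to happen between slices instead of being blocked.
    func runIncrementally(_ slices: [() -> Void]) {
        guard let first = slices.first else { return }
        first()                                  // one bounded unit of work now
        let remaining = Array(slices.dropFirst())
        DispatchQueue.main.async {               // yield back to the event loop
            runIncrementally(remaining)
        }
    }

Offloading to a background thread is the other half, and is usually the better option when the work doesn't touch UI state.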
Also, the threshold for a continuous animation is absolutely less than the threshold for perceptible instantaneous latency, and that is determined by biology, not by me. That means you have more time to work with when no animation is ongoing but one could begin as the result of a tap (or typing). Now, the OS/platform does consume a large part of this threshold - but that's a different story.
That's a good point about games never having an entire resting frame. But that brings me back to the first desirable GC feature - providing a way for a GC to use the remaining frame time of thousands of frames, so that it can perform its work incrementally in the leftover time of each individual frame, without ever missing a display link deadline. I think both are important (sub-frame yielding to incremental tasks like collections, and yielding entire frames (or two) to longer blocking tasks for applications that can withstand it). Games might not be able to take advantage of the latter, but the majority of non-game apps that you have installed on your phone certainly would.
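Here's a rough sketch of what I mean by using the remaining time of each frame. The incrementalStep hook is hypothetical (imagine it doing one small unit of collection work), while the display link and deadline plumbing are real APIs:

    import QuartzCore

    // Give whatever is left of each frame to small, bounded units of
    // deferrable work, stopping shortly before the next vsync.
    final class FrameTimeScavenger {
        private var displayLink: CADisplayLink?
        private let slack: CFTimeInterval = 0.002   // stop ~2ms before the deadline
        var incrementalStep: () -> Void = {}        // hypothetical: one small unit of work per call

        func start() {
            displayLink = CADisplayLink(target: self, selector: #selector(onFrame))
            displayLink?.add(to: .main, forMode: .common)
            // In a real app you'd arrange for this to run after your own
            // per-frame work, so only genuinely spare time gets used.
        }

        @objc private func onFrame(_ link: CADisplayLink) {
            let deadline = link.targetTimestamp - slack  // next vsync, minus slack
            while CACurrentMediaTime() < deadline {
                incrementalStep()
            }
        }
    }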
Some would argue that "for application types X there is no available frame time". I would say that is almost always false. Otherwise application type X would drop frames on any machine that was slightly less performant.
I do think your logic holds for many combinations of users and infrequently-used applications. If your "brain loop" works like: press a button, then wait to see the result and completely mentally parse the screen before pressing anything else, then yes, there will be a few dozen frames after any action in which the program can safely run a garbage collector. This is the way we all operate when we first start using a new program. But then, for many users, once they've become familiar enough with that program, they no longer need to think about most of the screens, and can rely entirely on muscle memory to navigate.[1] At this point, any additional latency is certainly noticed. The best example would be typing; at least on HN, the vast majority of us can touch type, and any pauses between typing and characters appearing on screen (such as the incessant pauses I experience with Firefox and iOS Safari) are incredibly annoying. And since typing, which basically necessitates a never-ending "continuous interaction," is an important part of almost any application, I don't think you can discount the importance of a highly interactive feedback loop for everyday programs.
I don't really see how you were discussing incremental GC in your original post, but yes, incremental GCs designed to scan their heaps in small bursts do exist and are common in embedded languages.
I wish that modern OSes and programming paradigms had better support for signals/"software interrupts", so that, even if I don't think garbage collecting everything is the right strategy for interactive applications, at least you could say "let me run my GC/other background task up until exactly the moment that the next frame starts," instead of having to manually parcel out work into tiny units while polling a timer.
[1] This is a big part of why I prefer "tacky" 90s Windows and UNIX UIs to modern animation-heavy ones. Once I've "learned" the program, I'm not even consciously looking at the UI, so the superfluous animations just force me to add delays of my own to my muscle memory, instead of letting me work as quickly as my muscles can move. Programmers tend to get this right for keyboard-driven interfaces, but they tend to underestimate our ability to use muscle memory for mouse and touch-driven interfaces.
>> I do think your logic holds for many combinations of users and infrequently-used applications.
Do you consider the Facebook app infrequently used? Or virtually every other app in the iOS app store that has a UIControl embedded inside of a scroll view such that highlighting of elements is intentionally delayed? There are certainly a few frames' worth of work that can be done while no interruptible animations are occurring and while no interaction is occurring. If you don't believe me, then I suggest you do what I often do: use a very high-speed camera and examine your casual usage of an app. You'll find that you are much slower than you think you are.
> The latency constraints for starting a continuous interaction are larger than the constraints for continuing a continuous interaction.
I think this is true in general, but there are common cases where you do notice the delay. It's easy to notice a few frames of delay when you start scrolling a web page, for example (which often happens with pages using synchronous touch events). If you want a truly great user experience, you need to make sure your GC never adds more than a few ms of latency. Ideally you'd collect every frame so the latency is consistent.
A few frames on top of the platform/OS/hardware latency may begin to cross into the threshold of perception for the initiation of continuous movement. If your hardware already has 30ms of latency, then a few additional frames (say, 2-3 at ~16ms each) brings you to approximately 70ms. The delay you actually see with your eyes when your program consumes 30ms at the start of a scroll may be much larger than that 30ms.
However, you already have a few pixels (not time) of intentionally introduced slop in a scroll view. I'd like the ability to play with all of these constraints - slop pixel amount, remaining frame time, acceptable latency for initiating a gesture - in order to reach what I consider to be a great user experience.
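Concretely, something like this hypothetical set of per-interaction knobs is what I'd want to be able to tune; none of it exists as public API, and the defaults are just the starting points I'd experiment from:

    import QuartzCore

    // Hypothetical per-interaction tuning knobs (not a real API).
    struct InteractionBudget {
        var slopPixels: CGFloat = 10                      // movement allowed before a drag "begins"
        var frameBudget: CFTimeInterval = 1.0 / 60.0      // hard deadline during continuous motion
        var gestureStartLatency: CFTimeInterval = 0.050   // acceptable delay before motion starts
        var idleCollectionWindow: CFTimeInterval = 0.050  // time a major collection may take when idle
    }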