There are a number of features that make AV1 structurally more suited to real-time implementations than its predecessor, VP9.
For example, it does adaptive entropy coding instead of explicitly coding probabilities in the header. That means that you don't need to choose between making multiple passes over the frame (one to count symbol occurrences and one to write the bitstream using R-D optimal probabilities) or encoding with sub-optimal probabilities (which can have an overhead upwards of 5% of the bitrate). libaom has always been based on a multi-pass design, as was libvpx before it, but rav1e only needs a single pass per frame (we may add multiple passes for non-realtime use cases later).
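To make that concrete, here's a rough sketch of what per-symbol adaptation looks like (in Rust, since that's what rav1e is written in, but this is only an illustration, not rav1e's actual entropy coder): each context keeps a CDF that both encoder and decoder nudge toward whatever symbol was just coded, so the probabilities track the data without ever being written to the bitstream and without a counting pass.

```rust
// Hedged sketch of adaptive CDF coding (illustrative only; names, layout, and
// adaptation rate are made up, not AV1's or rav1e's exact scheme).
struct AdaptiveCdf {
    cdf: Vec<u16>, // cdf[i] ~ P(symbol <= i) * 32768; last entry is always 32768
}

impl AdaptiveCdf {
    fn new(nsymbols: usize) -> Self {
        // Start from a flat distribution; it converges as symbols are coded.
        let cdf = (1..=nsymbols)
            .map(|i| ((i * 32768) / nsymbols) as u16)
            .collect();
        AdaptiveCdf { cdf }
    }

    // Probability interval handed to the range coder for `symbol`.
    fn range(&self, symbol: usize) -> (u16, u16) {
        let low = if symbol == 0 { 0 } else { self.cdf[symbol - 1] };
        (low, self.cdf[symbol])
    }

    // Move the CDF a small step toward the symbol that was just coded.
    // Encoder and decoder both run this after every symbol, so they stay in
    // sync without any probabilities appearing in the header.
    fn update(&mut self, symbol: usize, rate: u32) {
        let n = self.cdf.len();
        for i in 0..n - 1 {
            if i < symbol {
                self.cdf[i] -= self.cdf[i] >> rate;
            } else {
                self.cdf[i] += (32768 - self.cdf[i]) >> rate;
            }
        }
        // A real coder also clamps so no symbol's probability collapses to zero.
    }
}
```

Because the adaptation happens as a side effect of coding, a single pass over the frame is enough: there is no separate statistics-gathering pass and no "write the counted probabilities into the header" step.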
In another example, AV1 has explicit dependencies between frames. VP9 maintained multiple banks of probabilities which could be used as a starting point for a new frame. But any frame was allowed to modify any bank. So if you lost a frame, you had no idea if it modified the bank of probabilities used by the next frame. In AV1, probabilities (and all other inter-frame state) propagate via reference frames. So you're guaranteed that if you have all of your references, you can decode a frame correctly. This is important if you want to make a low-latency interactive application that never shows a broken frame.
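The decode-side consequence is easy to state in code. This is only an illustration with made-up types, not any real decoder's API: since all the inter-frame state a frame needs travels with the references it explicitly lists, you can tell whether a frame is decodable just by checking that those references arrived.

```rust
// Hedged sketch (hypothetical types, not rav1e's or any decoder's real API).
use std::collections::HashMap;

struct DecodedFrame; // stands in for the pixels plus the adapted entropy state

struct FrameHeader {
    frame_id: u64,
    reference_ids: Vec<u64>, // the frames this one predicts from and inherits state from
}

fn can_decode(header: &FrameHeader, received: &HashMap<u64, DecodedFrame>) -> bool {
    // If every listed reference is present, the frame will decode correctly;
    // there is no hidden probability bank that a lost frame could have touched.
    header.reference_ids.iter().all(|id| received.contains_key(id))
}
```

If that check fails, an interactive receiver can hold the frame back or request a recovery frame instead of displaying corruption, which is exactly what was hard to guarantee in VP9, where the probability banks were shared state that any lost frame might have modified.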
Some of its tools also become more effective in low-complexity settings. One of the new loop filters, CDEF, gives somewhere around a 2% bitrate saving on objective metrics when tested with libaom running at its highest complexity (although subjective testing suggests the actual improvement is larger). However, when you turn down the complexity, the improvement from CDEF goes up to close to 8%. In other words, having this filter lets you take shortcuts elsewhere in the encoder.
The real reason the reference encoder is so slow is that it searches a lot of things. You can always make things run faster by searching less. Take a look at http://obe.tv/about-us/obe-blog/item/54-a-look-at-the-bbc-uh... to see how drastically people are limiting HEVC to make it run in real time today (though if you have to go up to 35 Mbps to do so, one might wonder what the point is).
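If you want a feel for what "searching less" means in practice, here's a toy sketch (made-up names, nothing taken from libaom or rav1e): the speed setting caps how many candidates get a full rate-distortion evaluation, and the loop bails out as soon as something is good enough.

```rust
// Hedged sketch of the generic speed/quality dial (hypothetical helpers, not
// any encoder's actual mode search).
#[derive(Clone, Copy)]
struct Mode(u32); // stands in for a prediction mode / partition choice

fn pick_mode<F>(candidates: &[Mode], speed: u32, good_enough: f64, rd_cost: F) -> Mode
where
    F: Fn(Mode) -> f64, // rate-distortion cost of evaluating this candidate
{
    // Higher speed levels consider fewer candidates.
    let budget = (candidates.len() >> speed.min(4)).max(1);
    let mut best = candidates[0];
    let mut best_cost = rd_cost(best);
    for &cand in candidates.iter().take(budget).skip(1) {
        let cost = rd_cost(cand);
        if cost < best_cost {
            best = cand;
            best_cost = cost;
        }
        // Early termination: stop searching once the current winner is cheap enough.
        if best_cost < good_enough {
            break;
        }
    }
    best
}
```

Every evaluation you skip costs a little compression efficiency, which is why tools like CDEF that stay effective at low complexity matter so much for real-time encoding.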