
It seems like we are coming full circle in trying to reduce latency in all parts of the chain. With the next-gen gaming consoles supporting VRR and 120 Hz, the recently announced Nvidia Reflex low-latency modes, 360 Hz monitors, and things like the Apple Pencil 2 reducing pen-to-screen latency to 9 ms, we are working our way back to something close to what we used to have.

But I feel like we have a number of years to go until we can really get back to where we used to be with vintage gaming consoles and CRT displays.




I think that with the end of Moore's law, there is a gradual unwinding of software bloat and inefficiency. Once the easy hardware gains die away, it is worth studying how software can be better, but it's a cultural transition that takes time.


It is also the case that in some/many areas software has improved performance by several multiples of the improvement in hardware performance over the last 20 years. So it makes sense that this would invigorate software performance investment.


Can you give some examples of areas where software performance improvement has been several multiples of hardware improvement?

Most of the examples I can think of are ones where the software slowdown has more than cancelled out the hardware improvements. Then there are some areas where hardware performance improvement was sufficient to overcome software slowdown. Software getting faster? Software getting faster than hardware??


Compilers are way smarter and can make old code run faster than it used to. They also parse code much faster, and would therefore compile faster if you restricted them to the level of optimization they used to do.

"Interpreters" are faster as well: Lisp runs faster, JavaScript is orders of magnitudes more efficient as well.

Algorithms have been refined. Faster paths have been uncovered for matrix multiplication [1] (though I'm unsure whether the latest improvements are actually used in practice) and for other algorithms.
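For illustration only, here is a bare-bones sketch of the kind of "faster path" I mean: a toy Strassen-style multiply in Python/NumPy for square power-of-two matrices. It is nothing like a tuned BLAS, just the divide-and-conquer idea, with names and cutoffs of my own choosing.

  # Toy Strassen multiply: 7 recursive products instead of 8,
  # giving ~O(n^2.81) instead of O(n^3). Illustrative only.
  import numpy as np

  def strassen(A, B, cutoff=64):
      n = A.shape[0]
      if n <= cutoff:                      # small blocks: fall back to the naive product
          return A @ B
      k = n // 2
      A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
      B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
      M1 = strassen(A11 + A22, B11 + B22, cutoff)
      M2 = strassen(A21 + A22, B11, cutoff)
      M3 = strassen(A11, B12 - B22, cutoff)
      M4 = strassen(A22, B21 - B11, cutoff)
      M5 = strassen(A11 + A12, B22, cutoff)
      M6 = strassen(A21 - A11, B11 + B12, cutoff)
      M7 = strassen(A12 - A22, B21 + B22, cutoff)
      C = np.empty_like(A)
      C[:k, :k] = M1 + M4 - M5 + M7
      C[:k, k:] = M3 + M5
      C[k:, :k] = M2 + M4
      C[k:, k:] = M1 - M2 + M3 + M6
      return C

  A = np.random.rand(256, 256)
  B = np.random.rand(256, 256)
  assert np.allclose(strassen(A, B), A @ B)   # matches the naive product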

Use cases that have been around for a while (say, H.264 encode/decode) have become more optimized.

We also tend to be a lot better at managing concurrency now (see Rust, OpenMP, and others), which matters with the massively parallel architectures coming out nowadays.

[1] https://en.m.wikipedia.org/wiki/Matrix_multiplication_algori...


Thanks for the examples!

However, I can't really agree that they are examples of software improvements being multiples of hardware improvements.

1. Compiler optimizations

See Proebsting's Law [1], which states that whereas Moore's Law provided a doubling of performance every 18-24 months, compiler optimizations provide a doubling every 18 years at best. More recent measurements indicate that this was optimistic.

2. Compilers getting faster

Sorry, not seeing it. Swift, for example, can take a minute before giving up on a one-line expression, and has been clocked at 16 lines/second on some codebases.

All the while producing slow code.

See also part of the motivation for Jonathan Blow's Jai programming language.

3. Matrix multiplication

No numbers given, so ¯\_(ツ)_/¯

4. h.264/h.265

The big improvements have come from moving them to hardware.

5. Concurrency

That's hardware.

[1] https://www.semanticscholar.org/paper/On-Proebsting%27%27s-L...


You are reading my examples in bad faith :) (though I originally missed your point about "multiples of")

You want examples where software has sped up at a rate faster than hardware (meaning that new software on old hardware runs faster than old software on new hardware).

JavaScript might not have been a good idea in the first place, but I bet that if you were to run V8 (given enough RAM) on 2005-era commodity hardware, it would be faster than running 2005-era SpiderMonkey on today's hardware. JIT compilers have improved (including those for Lisps, PHP, Python, etc.).

Can you give me an example of 2005-era Swift running faster on newer hardware than today's compiler on yesterday's hardware? You can't, as this is a new language, with new semantics and possibilities. Parsing isn't as simple as it seems, and you can't really compare two different languages.

These software improvements also tend to pile up along the stack. And comparing HW to SW is tricky: you can always cram in more HW to gain more performance, while adding more SW unfortunately tends to have the opposite effect. So you have to restrict yourself HW-wise: same price? Same power requirements? I'd tend to go with the latter, as HW has enjoyed economies of scale that SW can't.

Concurrency might be hardware, but in keeping with the above point, more execution cores will be useless for a multithread-unaware program. Old software might not run better on new HW, but old HW didn't have these capabilities, so the opposite is probably true as well. Keep in mind that these new HW developments were enabled by SW developments.

> No numbers given, so ¯\_(ツ)_/¯

Big-O notation should speak for itself; I am not going to try to resurrect a BLAS package from the '80s to benchmark against on a PIC just for this argument ;) Other noteworthy algorithms include the FFT [1]. (I had another one in mind but lost it.)

> The big improvements have come from moving them to hardware.

I'm talking specifically about SW implementations. Of course you can design an ASIC for most stuff, and most performance-critical applications have probably had ASICs designed for them by now, which helps prove your point. SW and HW are not isolated either, and an algorithm optimized for old HW might be extremely inefficient on new HW, and vice versa.

And in any case, HW developments were in large part enabled by SW developments with logic synthesis, place and route, etc. HW development is SW development to a large extent today, though that was not your original point.

What can't be argued against, however, is that both SW and HW improvements have made it much easier to create both HW and SW. Whether SW or HW has been most instrumental with this, I am not sure. They are tightly coupled: it's much easier to write a complex program with a modern compiler, but would you wait for it to compile on an old machine? Likewise for logic synthesis tools and HW simulators. Low-effort development can get you further, and that shows. I guess that's what you are complaining about.

[1] https://en.wikipedia.org/wiki/Fast_Fourier_transform#Algorit...


> I originally missed your point about "multiples of"

That wasn't my point, but the claim of the poster I was replying to, and it was exactly this claim that I think is unsupportable.

Has some software gotten faster? Sure. But mostly software has gotten slower, and the rarer cases of software getting faster have been significantly outpaced by HW.

> You want examples where software has sped up at a rate faster than hardware

"several multiples":

that in some/many areas software has improved performance by several multiples of the improvement in hardware performance over the last 20 years

> [JavaScript] JIT compilers have improved

The original JITs were done in the late '80s and early '90s. And their practical impact is far less than the claimed impact.

http://blog.metaobject.com/2015/10/jitterdammerung.html

As an example, the Cog VM is a JIT for Squeak. They claim a 5x speedup in bytecodes/s. Nice. However, the naive bytecode interpreter, in C, on commodity hardware in 1999 (Pentium/400) was 45 times faster than the one microcoded on a Xerox Dorado in 1984, which was a high-end, custom-built ECL machine costing many hundreds of thousands of dollars (19M bytecodes/s vs. 400K bytecodes/s).

So 5x for software, at least 45x for hardware. And the hardware kept improving afterward, nowadays at least another 10x.
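Spelling out the arithmetic with the numbers above (my own rounding, purely to make the comparison explicit):

  # Same figures as above, just made explicit.
  dorado_1984 = 400_000        # bytecodes/s, microcoded Xerox Dorado (1984)
  pentium_1999 = 19_000_000    # bytecodes/s, naive C interpreter, Pentium/400 (1999)
  cog_jit_speedup = 5          # Cog VM's claimed speedup over the interpreter

  hw_gain = pentium_1999 / dorado_1984
  # prints ~48x; the comment above rounds this to "at least 45x"
  print(f"hardware: ~{hw_gain:.0f}x, software (JIT): {cog_jit_speedup}x")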

> [compilers] Parsing isn't as simple as it seems [..]

Parsing is not where the time goes.

> 2005-era swift running faster

Swift generally has not gotten faster at all. I refer you back to Proebsting's Law and the evidence gathered in the paper: optimizer (=software) improvements achieve in decades what hardware achieves/achieved in a year.

There are several researchers who say optimization has run out of steam.

https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf

https://www.youtube.com/watch?v=r-TLSBdHe1A

(the difference between -O2 and -O3 is just noise)

> Big-O notation should speak for itself

It usually does not. Many if not most improvements in Big-O these days are purely theoretical findings that have no practical impact on the software people actually run. I remember when I was studying that "interior point methods" were making a big splash, because they were the first practical linear optimization algorithms with polynomial worst-case complexity, whereas the Simplex algorithm is exponential in the worst case. I don't know what the current state is, but at the time the reaction was a big shrug. Why? Although Simplex has exponential worst-case complexity, it typically runs in linear or close to linear time and is thus much, much faster than the interior point methods.
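If you want to poke at the "typical case beats worst case" point yourself, here is a minimal sketch. It assumes SciPy >= 1.6, whose linprog exposes the HiGHS dual-simplex ('highs-ds') and interior-point ('highs-ipm') backends; method names vary across versions, and a single random instance proves nothing by itself.

  # Time a simplex-style solver vs an interior-point solver on a random LP.
  import time
  import numpy as np
  from scipy.optimize import linprog

  rng = np.random.default_rng(0)
  m, n = 300, 600                     # 300 inequality constraints, 600 variables
  A = rng.random((m, n))
  b = A @ np.ones(n) + 1.0            # x = 1 is strictly feasible
  c = -rng.random(n)                  # minimize c.x, i.e. maximize a positive objective

  for method in ("highs-ds", "highs-ipm"):
      t0 = time.perf_counter()
      res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None), method=method)
      dt = time.perf_counter() - t0
      print(method, "status", res.status, f"{dt:.3f}s")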

Similarly for recent findings of slightly improved multiplication algorithms: the n required for the asymptotic advantage to overcome the overheads is so large that the results are purely theoretical.

> FFT

The Wikipedia link you provided goes to algorithms from the 1960s and 1940s, so not sure how applicable that is to the question of "has software performance improvement in the last 20 years outpaced hardware improvement by multiples?".

Are you perchance answering a completely different question?

> [H264/H265] I'm talking specifically about SW implementations

Right, and the improvements in SW implementations don't begin to reach the improvement that comes from moving significant parts to dedicated hardware.

And yes, you have to modify the software to actually talk to the hardware, but you're not seriously trying to argue that this means this is a software improvement??


Another example recently cropped up on HN: https://news.ycombinator.com/item?id=24544232

> Parsing is not where the time goes.

Not with the current algorithms.

But let's agree to put this argument to rest. I generally agree with you that

1. Current software practices are wasteful, and it's getting worse

2. According to 1. most performance improvements can be attributed to HW gains.

I originally just wanted to point out that while this is true in general, there are exceptions, and hot paths do get optimized. Other tendencies are at play, though, such as the end of Dennard scaling. I tend to agree with https://news.ycombinator.com/item?id=24515035 - to achieve future gains, we might need tighter coupling between HW and SW evolution, as general-purpose processors might not continue to improve as much. Feel free to disagree; this is conjecture.

> And yes, you have to modify the software to actually talk to the hardware, but you're not seriously trying to argue that this means this is a software improvement??

My point was more or less the same as the one made in the previously linked article: HW changes have made some SW faster and other SW comparatively slower. The two do not exist in isolated bubbles. I'm talking about off-the-shelf HW, obviously. HW gets to pick which algorithms are considered "efficient".


> Parsing [current algorithms]

Recursive descent has been around forever; the Wikipedia page [1] mentions a reference from 1975 [2]. What recent advances have there been in parsing performance?

> 1. Current software practices are wasteful, and it's getting worse

> 2. According to 1. most performance improvements can be attributed to HW gains.

Agreed.

3. Even when there were advances in software performance, they were typically, and almost invariably, outpaced by HW improvements.

[1] https://en.wikipedia.org/wiki/Recursive_descent_parser#Refer...

[2] https://archive.org/details/recursiveprogram0000burg


I think a better answer than I could give can be found at [1]. One pull quote they highlight:

But the White House advisory report cited research, including a study of progress over a 15-year span on a benchmark production-planning task. Over that time, the speed of completing the calculations improved by a factor of 43 million. Of the total, a factor of roughly 1,000 was attributable to faster processor speeds, according to the research by Martin Grotschel, a German scientist and mathematician. Yet a factor of 43,000 was due to improvements in the efficiency of software algorithms.

[1] https://cstheory.stackexchange.com/questions/12905/speedup-f...


Fun example, thanks for pointing it out!

I actually took Professor Grötschel's Linear Optimization course at TU Berlin, and the practical optimization task/competition we did for that course very much illustrates the point made in the answer to the stackexchange question you posted.

Our team won the competition, beating the performance not just of the other student teams' programs, but also the program of the professor's assistants, by around an order of magnitude. How? By changing a single "<" (less than) to "<=" (less than or equal), which dramatically reduced the run-time of the dominant problem of the problem-set.

This really miffed the professor quite a bit, because we were just a bunch of dumb CS majors taking a class in the far superior math department, but he was a good sport about it and we got a nice little prize in addition to our grade.

It also helped that our program was still fastest without that one change, though now with only a tiny margin.

The point being that, as the post notes, this is a single problem in a single, very specialized discipline, and this example absolutely does not generalize.


I think that your original comment is the one that risks over-generalization. The "software gets slower" perception comes from a very narrow technical niche: user interface/user experience for consumer applications. And it has a common cause: product managers and developers stuff as much into the user experience as they can, until performance suffers. But even within that over-stuffing phenomenon you can see individual components that follow the same software improvement curve.

In any area where computing performance has been critical - optimization, data management, simulation, physical modeling, statistical analysis, weather forecasting, genomics, computational medicine, imagery, etc. - there have been many, many cases of software outpacing hardware in rate of improvement. Enough so that it is normal to expect it, and something to investigate for cause if it's not seen.


First, I think your estimate of which is the niche - High Performance Computing, or all of personal computing (including PCs, laptops, smartphones, and the web) - is not quite correct.

The entire IT market is ~$3.5 trillion; HPC is ~$35 billion. Now that's nothing to sneeze at, but it's just 1% of the total. I doubt that all the pieces I mentioned also account for just 1% - and if they did, what would the other 98% be?

Second, there are actually many factors that contribute to software bloat and slowdown; what you mention is just one, and many other kinds of software are getting slower, including compilers.

Third, while I believe you that many of the HPC fields see some algorithmic performance improvements, I just don't buy your assertion that this is regularly more than the improvements gained by the massive increases in hardware capacity, and one singular example just doesn't cut it.


I also think that the end of the so called Moore's Law will create the necessary incentives to build better software.


What's with the "so called"? Moore's Law is a pretty well-established term.


It is not a law; it is merely an observation.


> But I feel like we have a number of years to go until we can really get back to where we used to be with vintage gaming consoles and CRT displays.

Interesting. CRTs also had a fixed framerate. Let's make that 60 fps for the sake of the argument.

It really depends on what you are calling "vintage". Most later consoles (with GPUs) just composited images at a fixed frame-rate.

Earlier software renderers are quite interesting, though, in that they tend to "race the beam" and produce pixel data a few microseconds before it is displayed. Does that automatically translate to low latency? I'm not sure. If the on-screen character is supposed to move by a few pixels in response to the last input, it really depends on whether you have drawn it already. Max latency is ~16 ms, min is probably around 100 µs, which gives you about 8 ms of expected latency. And I think you can still get tearing in some cases.
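Rough back-of-the-envelope for those numbers (purely illustrative figures for a 60 Hz display with ~240 visible lines):

  # "Racing the beam": input is sampled once per frame and each scanline is
  # generated just before the beam draws it, so the latency you see depends
  # on where the affected pixels sit relative to the beam. Illustrative only.
  frame_ms = 1000 / 60          # ~16.7 ms per frame at 60 Hz
  visible_lines = 240           # NTSC-ish visible line count
  line_ms = frame_ms / visible_lines

  worst_ms = frame_ms           # the object was drawn just before the input arrived
  best_ms = line_ms             # the object is about to be drawn (tens of microseconds)
  expected_ms = frame_ms / 2    # uniform assumption -> roughly 8 ms

  print(f"best ~{best_ms * 1000:.0f} us, expected ~{expected_ms:.1f} ms, worst ~{worst_ms:.1f} ms")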

There is also no reason it couldn't be done with modern hardware, except that the wire data format for HDMI/DP might need to be adjusted.

However, and I've said this for a long time, one key visible difference between CRTs and LCDs is persistence. Images on an LCD persist for a full frame instead of letting your brain interpolate between them. The result is a distinctly blurry edge on moving objects. Some technologies such as backlight strobing (aka ULMB) aim to alleviate this (you likely need triple buffering to combine it with adaptive sync, which I haven't seen done).

I wonder if rolling backlights could allow us to race the beam once again? QLED/OLED displays could theoretically bring a better experience than CRTs if the display controller allowed it: every pixel emits its own light, so low persistence is achievable. You don't have a beam with fixed timings, so you could just update what's needed in time for displaying it.


A separate latency difference is in how long it takes for the pixels to switch, i.e. the time between when the device starts trying to change the colour of a pixel and when you see that colour change. This is a relatively long time for an LCD but not so long for a CRT.


Well, isn't it an aspect of persistence?

This is usually called ghosting, and can be combated using "overdrive":

https://blurbusters.com/faq/oled-motion-blur/

https://blurbusters.com/faq/lcd-overdrive-artifacts/

Edit: regarding my older comment, I thought QLED displays were quantum dots mounted on individual LEDs. They are actually regular LCDs, with quantum dots providing the colour conversion. That makes more sense from an economic perspective, less so for performance. Maybe OLED and quantum dots could be combined to get the best of OLED for all colors?


If you draw a graph of the signal going into a monitor (above) vs. the luminosity of a pixel (below), it might look like this for a CRT:

  ^
  |   _______________
  |  /
  | |
  | |
  |_/
  |_________________> t
  
  ^
  |      /\
  |     /  \
  |    |    \
  |    |     \
  |    |      `-._
  |____/__________`-.___>t
And the luminosity looks like this for an LCD:

  ^
  |             _________
  |           .’
  |          /
  |         /
  |        /
  |______-’______________>t
Persistence is the time between when the luminosity goes up and when it goes down. It is low for CRTs and high for LCDs, which stay on until they need to switch, unless the monitor is low persistence (or low persistence is emulated with a high refresh rate and black frame insertion).

The switching time is the time between when the signal starts (or rather, when the monitor has finished working out what the signal means and has started trying to change the luminosity) and when it gets bright (or, in the case of LCDs, reaches the right luminosity).

Ghosting comes from not switching strongly enough and overdrive (switching too hard) tends to lead to poor colour accuracy.


Nice charts, did you draw them by hand?

You are right. That directly contributes to latency. This effect is called pixel response time [1], and is usually measured "grey-to-grey" [2]. Nowadays, though, I think monitors usually have a short response time (<1ms for "gamer" monitors).

> Persistence is the time it takes between when the luminosity goes up and when it goes down

Right. But that proves the two are somewhat linked, especially as pixel response isn't symmetric (0->1 and 1->0).

Reading a bit more into it, the relevant industry terms seem to be grey-to-grey (GtG) and Moving Picture Response Time (MPRT). The latter is a measurement method [3] for perceived motion blur. It directly depends on how long each pixel remains lit with the same value ("persistence" strikes again), so a slow (>1 frame) or incomplete transition can create motion blur (and thus contributes to persistence).

> low persistence is emulated with a high refresh rate and black frame insertion

It can also be achieved with backlight strobing on backlit displays: strobing for 1 ms on a "full-persistence" (pixel always on) display gives 1 ms of persistence regardless of the frame rate. A ~1000 FPS display would be necessary to get the same level of persistence with black frame insertion alone. I believe this is part of the reason Valve went with an LCD instead of an OLED screen to get the 0.33 ms persistence on the Index HMD.
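A tiny sketch of that persistence arithmetic (illustrative numbers, same reasoning as above):

  # Sample-and-hold persistence is roughly the frame time; a strobed backlight
  # caps persistence at the pulse width, independent of refresh rate.
  def sample_and_hold_persistence_ms(fps):
      return 1000 / fps

  def strobed_persistence_ms(pulse_ms):
      return pulse_ms

  print(sample_and_hold_persistence_ms(144))   # ~6.9 ms at 144 Hz
  print(strobed_persistence_ms(1.0))           # 1 ms at any refresh rate
  print(sample_and_hold_persistence_ms(1000))  # ~1 ms needs ~1000 FPS without strobing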

[1] https://blurbusters.com/gtg-versus-mprt-frequently-asked-que...

[2] (from the above) https://hal.archives-ouvertes.fr/hal-00177263/document

[3] (also from blurbusters article) https://lcd.creol.ucf.edu/Publications/2017/JAP%20121-023108...


Those charts are awesome, props for the creativity.


It seems to be a normal engineering process: first make it work, then look for those high-level things that you are ready to break in order to gain performance through optimization and micro-optimization.



