While it is true that older software is extremely snappy if you compare it with ...

VortexDream · on July 3, 2021

Honestly, I don't see why any of these things require such terrible performance characteristics.

kilburn · on July 4, 2021

I tried to use examples where some inherent performance penalties where easy to see:

- Unicode text: All text consumes more memory (because the character space is much larger). Basic text processing ("wrap this paragraph at 80 characters") becomes much harder, not just because bytes != glyphs, but also because glyphs can combine.

- Security improvements: we now have various sandboxing, isolation, execution-protection, etc.. features in OSes. The performance impact of some of those is negligible thanks to new hardware features to help with them, but others still have a significant cost. Furthermore, some performance tricks used in old systems would simply violate the current security models and are hence impossible to do anymore.

- Driver isolation: this is similar to the above. The OS is now doing more work to ensure drivers behave, the isolation forbids some more performant pathways, etc.

- Wifi with some reliability: this was an example of progress being achieved. Wifi protocols are a nightmare, and I'm still amazed that they work at all.

- Better trackpads: another progress example. The big issue here has been the development of smarter algorithms (and the hardware refinement to back them up). This is something that seems simple, but it took a very long time to get this anywhere acceptable (even after Apple showed the world it was possible). I can only assume that it is actually a pretty hard problem underneath.

- Files that don't get corrupted: we have needed years and years of iterative improvements to finally get here, both at the FS level and on the programs above (think DBs). We are now going through journal logs, memory barries, checksumming and verifying data, using copy-on-write for FSs, etc.. All these things have non-negligible runtime and camplexity costs.

In general, we have been prioritizing to make more stuff and/or making the stuff more correct, disregarding the performance aspect so long as it remains good enough (from the POV of the developers).

dmitriid · on July 4, 2021

> Security improvements: we now have

Everyone agrees that M1 is a very fast chip, much faster than the current Intel chips shipping with Macbooks.

And yet: when I trigger the native "Open File" dialog from IDEA, it still takes MacOS up to a second to verify permissions on a list of directories and mark them as available.

So, given that:

- M1 is blazingly fast

- modern SSDs pump GBs of data per second

- RAM on M1 is almost literally a part of the CPU, and the bandwidth of modern RAM is also GBs per second

why does it take up to a second to verify permissions on a list of five directories?

> Unicode ... Wifi ... Trackpads ...

None of these require GBs of RAM and 16-core processors to barely run.

bruce343434 · on July 3, 2021

> - You can have a trackpad that doesn't suck

Go on...

dmitriid · on July 3, 2021

> In any case, a lot of the slowness we pay for toady is in exchange of actually being able to deal with the complexities necessary to achieve those things in reasonable time/cost.

This is also debatable at least.

Just a few weeks ago it turned out that the new Windows Terminal can only do color output at 2fps [1].

The very infuriating discussion in the GitHub tracker ended up with a Microsoft tea member saying that you need a " an entire doctoral research project in performant terminal emulation" to do colored output. I kid you not. [2]

Of course, the entire "doctoral research" is 82 lines of code [3]. There will be a continuation of the saga [4]

And that is just a very small, but a very representative example. But do watch Casey's rant about MS Visual Studio [5]

You can see this everywhere. My personal anecdote is this: With the introduction of new M1 macs Apple put it front and center that now Macs wake up instantly. For reference: in 2008 I had the exactly same behaviour on a 2007 Macbook Pro. In the thirteen years since the software has become so bad, that you need a processor that's anywhere from 3 to 15 times more powerful to barely, just barely, do the same thing [6].

The upcoming Windows 11 will require 4GB of RAM and 64GB of storage space just for the empty, barebones operating system alone [7]. Why? No "wifi works reliably" or "trackpad doesn't suck" can justify any of this.

[1] https://twitter.com/cmuratori/status/1401761848022560771

[2] https://github.com/microsoft/terminal/issues/10362#issuecomm...

[3] https://twitter.com/cmuratori/status/1405356794495442945

[4] https://twitter.com/cmuratori/status/1406755159347130371

[5] https://www.youtube.com/watch?v=GC-0tCy4P1U

[6] https://gadgetversus.com/processor/apple-m1-vs-intel-core-2-...

[7] https://www.microsoft.com/en-us/windows/windows-11-specifica...

dmitriid · on July 3, 2021

As luck would have it, here's the continuation to Windows Terminal. Casey Muratori made a reference terminal renderer: https://github.com/cmuratori/refterm

This uses all the constraints that the Windows terminal team cited as excuses: it uses Windows subsystems etc. One person, 3k lines of code, it runs 100x the speed of Windows terminal.

See the epic demo (and stay till the end for color output): https://www.youtube.com/watch?v=hxM8QmyZXtg

kilburn · on July 4, 2021

To play devil's advocate, what I see in that demo is that:

- Yeah, some lower-level stuff seems to be broken (the windows console I/O stuff).

- Microsoft's devs linked to a very nice website that quickly and nicely explains some of the quirks of modern text rendering [1].

- Casey proceeds to demonstrate a (very fast!) approach that completely ignores some of the problems laid out in that site. Namely, his entire approach is based on caching rendered glyphs, which the document explicitly states is something you can't do naively and expect correct results (section 5).

- In his very demo some of these issues pop up (terrible-looking emojis, misaligned ascii art) and he shrugs those off as if they were easy jobs.

- Other issues are never tested in his demo. Examples include: (i) dealing with text selection, which is hard according to the linked site; (ii) how is he "chunking" non-asscii character runs (also hard to do correctly); (iii) handling ligatures (terminal programs oftentimes use ligatures between basic ascii character combinations such as => to make some source code more readable)

In other words: I see an incomplete solution that addresses only the easy parts of the problems (in a very performant way!) that fundamentally cannot be extended to a correct solution for the actual/full problem. And a lot of arrogance while presenting it.

In no way does this mean that a much more performant solution doesn't exist. But Casey's cute demo is not a proof that it does because it does not solve the actual/full problem.

PS: I don't work for MS, I don't know Casey nor any of MS's devs, and I don't even use windows. I do hold a PhD though, and I know plenty of PhD's dedicated to exploring the nitty-gritty details that some people with only cursory knowledge about the problem would dismiss as "this must be a quick job".

[1] https://gankra.github.io/blah/text-hates-you/

jiggawatts · on July 4, 2021

> caching rendered glyphs, which the document explicitly states is something you can't do naively and expect correct results

But very slightly less naively you can get correct results. One trick is to realise that ClearType only increases the effective horizontal resolution. Also, despite appearing to have three times the horizontal texels available to it, the final post-processing to reduce the colour artefacts means that the effective resolution increase is only about double the pixel count. So you can get very good results by horizontally stretching your glyph cache buffer by a small factor, typically three. This is not a huge increase in memory usage, and provides more than sufficient subpixel positioning quality. Where this breaks down is if your pipeline has very complex text special effect support, such as arbitrary transforms.

But a terminal emulator needs very few special effects. It doesn't need to be rotated smoothly through arbitrary angles, which DirectText supports. It doesn't need perspective transforms, or transparent text, or a whole range of such exotic features.

In fact, 99.99% of the characters drawn by a terminal emulator will be plain ASCII aligned to a simple grid. This is a trivial problem to solve, as demonstrated in this thread. It's not PHD work, it's homework.

dmitriid · on July 4, 2021

> what I see in that demo is that:

Remember: this is a demo written by one person over a few days, and not, you know, a full team of people at a multi-billion-dollar corporation.

> Microsoft's devs linked to a very nice website

> some of these ... hard according to the linked site ...

1. It's more excuses, and 2. we are not talking about "everything on the website".

He took all the constraints that the Windows terminal team imposed (you have to use DirectDraw, you have to go through conio, you have to use the unicode subsystem in Windows), and:

- even using those constraints he sped up the terminal 10x

- he already does things than windows terminal doesn't, for example: correct Arabic with correct ligatures despite "omg this website shows why it is so hard". So yes, he caches glyphs, and still displays proper Arabic text (something Windows terminal doesn't do) with varying widths etc.

And that 10x speedup? Instantly available to Windows terminal if only they use a cache of glyphs instead calling DirectDraw on every character. This is not rocket science. This is not "phd level research thesis". This is just junior-student-level work.

As to the issues:

- he was the very first one to point them out because he a) knows about them and b) knows how to solve them

- yes, the solutions he proposes are easy: the main "issue" is DirectDraw improperly handling font substitution, and his proposed solutions (resample glyphs after draw or skip DirectDraw and write your own glyph renderer) are easy. He literally does that for a living.

- omg text selection. Yes, text selection is a hard problem. But it's not an impossible problem, and nothing about text selection should bring the renderer from 5k fps to 2fps

> some people with only cursory knowledge about the problem would dismiss as "this must be a quick job"

These are not "nitty-gritty" details. And they are definitely not from a "person with cursory knowledge". There's nothing new in terminal rendering.

AussieWog93 · on July 3, 2021

>But do watch Casey's rant about MS Visual Studio [5]

"As an online discussion about software bloat grows longer, the probability of a Casey Muratori rant being posted approaches one."

astrange · on July 3, 2021

> In the thirteen years since the software has become so bad, that you need a processor that's anywhere from 3 to 15 times more powerful to barely, just barely, do the same thing [6].

That is not "the software" unless you count EFI, the sleep process is mostly a hardware thing and controlled by Intel.