All Sliders to the Right (acm.org)
77 points by rbanffy on April 2, 2023 | 69 comments



Yup. A lot of modern software is beyond parody now: https://twitter.com/cmuratori/status/1640827575437250561?s=2... (the "new and improved" Microsoft Teams takes 9 seconds to display under a kilobyte of text, including 3 seconds to just display a splash screen)


Ironic that all the devs being asked Leetcode algorithm optimization questions during their interviews don’t seem to be putting any of that to use where it actually matters!


I’m sure the Teams splash screen is O(1).


Algorithmic optimization is about more than asymptotic complexity; reducing the leading coefficient matters just as much. Constant time is meaningless if the constant is measured in seconds.
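A toy sketch of the point (hypothetical functions, purely for illustration): both are "correct", but the O(1) one loses for any input size you'd actually see.

    import time

    def constant_time_lookup(table, key):
        time.sleep(2)         # O(1)... with a constant of two seconds
        return table[key]

    def linear_scan(pairs, key):
        for k, v in pairs:    # O(n), but each step costs nanoseconds
            if k == key:
                return v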


That was the joke. Leetcode usually only cares about asymptotic complexity.


I thought leetcode cares about the actual run time.


It measures runtime, but not reliably: multiple runs of the same code can report very different runtimes. My impression is that the focus is mostly on getting the time complexity right.


Suggestion (not for you specifically, just a general comment): never answer leetcode questions right away. Push back and ask the interviewer to describe the problem they're actually trying to solve. Probe their constraints - memory limits, throughput. 99% of the work isn't the leetcode algorithm itself; it's the reasons for that algorithm being there. (Caveat: I haven't been in a leetcode-driven interview for some years, and this probably doesn't work great with people who love algorithms more than getting something done.)


I recently took a computer vision course taught in both Python and C++; students chose the language they used for assignments. I signed up twice and did both languages. The C++ versions of the exact same tasks are not only significantly faster, they also use dramatically less memory, and in a few tasks I was able to trivially introduce C++ threads for jaw-dropping speed improvements, while Python's situation for such optimizations is dismally convoluted. From the experience of this class taken twice, it really looks like there are serious gains to be had from an optimization pass via C++/Rust/Go for many systems that could benefit from multi-core implementations.


Python is a glue language. It is one thing to use it to string together a bunch of algorithms written in C. But somebody got the genius idea of using it as a general-purpose language, and now we have these backend frameworks that are 2 orders of magnitude slower than they should be.


Not just the speed, but the memory use.

A popular approach for Python web serving is to launch a number of "workers" (eg via gunicorn, etc), that hang around waiting to serve requests.

Each one of these workers, in recently running code here, idled using ~250MB of non-shared memory, with about 40 workers needed to handle some fairly basic load. :(

Rewrote the code in Go. No need for workers (just using goroutines), and the whole thing idles using about 20MB of memory, completely replacing all those Python workers. o_O

This doesn't seem to be all that unusual for Python either.


In a forking model that shouldn’t be the case, I guess all the workers are loading and initializing things post-fork that likely could have been accomplished pre-fork?
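For what it's worth, with gunicorn that's usually just a config flag; a minimal sketch (the myapp:app module path is a made-up placeholder):

    # gunicorn.conf.py
    preload_app = True   # import and initialize the app in the master, before forking
    workers = 4

    # run with: gunicorn -c gunicorn.conf.py myapp:app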

That said, Python devs are some of the worst engineers I encounter, so it’s not surprising things are being implemented incorrectly.


Last I heard, forking wasn’t a very effective memory-sharing technique on CPython because of the way it does reference counting: if you load things in before you fork, when the children start doing work they update the refcounts on all those pre-loaded objects and scribble all over that memory, forcing most of the pages to be copied anyway.

There seems to have been some recent-ish work on improving this, though: https://peps.python.org/pep-0683/#avoiding-copy-on-write
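A minimal CPython illustration of the mechanism (page-level copy-on-write isn't directly observable from a snippet like this, but the header rewrite is):

    import sys

    shared = [object() for _ in range(3)]   # imagine this was loaded before the fork

    print(sys.getrefcount(shared[0]))       # count stored in the object's own header
    ref = shared[0]                         # a child merely taking a reference...
    print(sys.getrefcount(shared[0]))       # ...rewrites that header, dirtying its memory page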


The ffmpeg patches of a few lines of assembly that yield 10-100x improvements over slow C amaze me.


I worked on a facial recognition system written in C++ that used SIMD optimizations to achieve 25M facial compares per core per second. On that same system I wrote an optimized ffmpeg player library that consumed only one core while doing a few hundred frames per second with 4K video.


Next time someone tells me there is no problem with the bloated state of modern software, I'm going to point them to this video, where a multimillion-dollar development team congratulates itself on reaching Windows 3.1-tier performance.


You may enjoy this article about writing a Slack client for Windows 3.1:

https://news.ycombinator.com/item?id=21831951

AFAIK part of the Teams API has been RE'd and is sufficient for writing an IM-only client, so it would certainly be an interesting exercise to do the same as above but for Teams.


As demonstrated (heh) by the demoscene, the TI calculator scene, game development, etc, restriction breeds innovation.


Yes, but developer experience!!


Don’t worry; Teams 2.0 promises to suck less. They are ditching Chromium for new Edge technology.


That's the 9s one they are advertising :)


I've always felt like software is falling prey to something akin to Jevons paradox[1]. We have more compute available, so we can be less optimised with our code and do more stuff, so we write bad code that tries to do too much (or some causal chain like this), and we end up slower than if our hardware had never improved.

The only space free from this appears to be videogames. Gamers regularly complain about "low" framerates that'd put a lot of (at least Windows-based) text renderers to shame. A load screen that you notice is points docked in the review score. I can open Steam, launch CS:GO and get into a game before Teams has finished its morning coffee. [1] https://en.wikipedia.org/wiki/Jevons_paradox


Dynamic high-quality text rendering with support for all the random fonts we throw at it is actually very hard and nuanced, with a distinct lack of HW acceleration techniques. There have been some advances toward GPU acceleration, but those are still research and haven't yet seen mainstream adoption, AFAIK. Also, using GPU acceleration can decrease battery life (usually OK these days because, if I recall correctly, the GPU is mostly on anyway for modern compositing).

Fundamentally, games are actually very different problems amenable to very different kinds of optimization (3D being easier to render than 2D). That's why all the text you see in 3D games is static menus that are hyper-optimized for the limited, specific content they ever show in a single font. And the 3D itself has a massive accelerator coprocessor that you need to run at high frame rates. And that accelerator only really works well for extremely parallelizable work, which font rendering (and 2D in general) tends not to be (although, like I said, there's research work on making 2D run on GPUs).


> Dynamic high-quality text rendering with support for all the random fonts we throw at it is actually very hard and nuanced, with a distinct lack of HW acceleration techniques. There have been some advances toward GPU acceleration, but those are still research and haven't yet seen mainstream adoption, AFAIK.

Are you perhaps on the Microsoft team responsible for the new Windows Terminal that claimed drawing text fast is a PhD research-level topic? I remember it didn't turn out well.


Nope. And it is much higher level than PhD. A PhD is akin to someone with a few years of industry experience. A very small minority is extremely talented and can do research well beyond their years, but that's not PhD level; it's professorial-level work (remember, PhD students do some of the practical grunt work, but usually a good professor acts as a sounding board, sets out the research direction, and helps unblock students).

Consider that Raph and many others like him have spent decades thinking about this problem and working on it off and on, and still haven't fully cracked it. See Pathfinder, for example, as an attempt to research new techniques to make this work. There's other work extending this and trying to put it into a real-world system. I've briefly worked in related areas, so I know a little bit about how hard it is and have talked to experts in the GPU space, but it's not my particular area of interest/expertise. I'd call myself an expert generalist, so a lot of the expert-level techniques that domain experts employ can be beyond my skill set (in general, vector processing like SIMD and GPUs aren't areas I've delved into too deeply because of how difficult it can be to get really good).


I feel like this was at least partly written by ChatGPT :)

Even if "dynamic high quality text" is hard, I would like to remid you that we're essentially running supercomputers. And most of "dynamic high quality text" on our computers is incredibly static for signinficant amounts of time.

Even on CPU we should be able to render any quantity of any quality of any dynamic text at speeds that cannot be perceived by the human eye. (Well, as it turns out, we do exactly that, even on smart phones)

And, of course, people have done that, with very little effort. Reason? They actually looked at what a modern computer is capable of, and used that instead of hiding behind high brow "oh, it's beyond PhDs my dear Watson for sho"


Nope no ChatGPT.

You may want to reread what I said. I was responding to someone talking about games and game performance in the context of text rendering. And the PhD research topic is how to use the GPU to accelerate text rendering.

You seem to perhaps be misunderstanding what I’m saying and somehow thinking I’m defending

> Microsoft Teams takes 9 seconds to display under a kilobyte of text, including 3 seconds to just display a splash screen)

I’m not. Clearly it shouldn’t take that long to render that text. But achieving a screen full of text at 120fps (or even just 60fps) on the highest-resolution displays is actually a difficult problem for CPU renderers (even for the greatly simplified case of a single monospace font). If you think otherwise, you may want to sync up with Raph and publish your contributions to this space. It’s possible I’m getting some of the specific details wrong (again - not my area of expertise), but there’s definitely a push to find HW acceleration within the GPU to achieve those frame rates (and not just text rendering - all 2D rendering is difficult to HW accelerate).

Notice in [3] how CPU renderers are still lower latency than GPU ones because it’s an area of active research that hasn’t yielded winners yet.

Also, I’m well aware we have supercomputers in our pockets. But don’t forget that the screen resolution is really frickin' high, and DPI scaling on top (so that text is actually legible) adds additional overhead (+ rendering at 60Hz instead of 120). There are also all sorts of tricks being played (e.g. it’s not going to be rendering the full screen, I think, because most UIs are retained-mode rather than immediate-mode).

[1] https://raphlinus.github.io/rust/graphics/gpu/2020/06/01/pie...

[2] https://raphlinus.github.io/rust/graphics/gpu/2020/06/13/fas...

[3] https://medium.com/@raphlinus/inside-the-fastest-font-render...


It turned out to break RTL handling.


> there’s research work on making 2D run on GPUs

I had to check the date to make sure that we weren't in the 80s. Accelerated 2D has been the norm since at least the early 90s, and every integrated GPU that's absolutely horrible at 3D performance has a perfectly adequate 2D accelerator.

The hardware isn't the problem. It's the disgustingly bloated software on top.


From the ‘See Also’ section in that Wikipedia article, it looks like for software this is called https://en.wikipedia.org/wiki/Wirth%27s_law.


Interesting post. Maxed out hardware can't make up for poorly optimized code and architecture.

The computer scientist Niklaus Wirth wrote in his 1995 article "A Plea for Lean Software" that software is getting slower more rapidly than hardware is becoming faster.

Constraints sometimes can lead to more innovation and efficiency. Limiting the level of hardware that the dev team can design and test against could be a good start.


> Constraints sometimes can lead to more innovation and efficiency. Limiting the level of hardware that the dev team can design and test against could be a good start.

As the demoscene has shown, we are still probably on average a few orders of magnitude away from pushing the limits of what even decades-old hardware is able to do.


Conversely, having developers use absolutely cutting-edge machines with 64GB of RAM and gigabit Ethernet connections ensures they will never even understand how much their app sucks.


This is true. The way I write software was greatly influenced by the fact that I always had under powered, resource constrained hardware.


I think there's more to it than that:

For one, “screens are changing too fast and I can’t keep up” is a common complaint from less tech-inclined people. Screens transitioning from one state to another, then staying frozen until the user completes their decision-making process, may not be a universally ideal mode of interaction. For me it is, but I’m likely not normal.

For another, just yesterday I came across an anecdote that some developers encounter pressure from bad managers to use phones and tablets in place of a desktop OS, or resistance against engineering workstation upgrade plans, with “computers are always slow and that’s normal” as the rationale.

There must be elements that are normalizing slowness on computers, even preferences over fast computers and software. Blaming “bad” programming won’t work if there are incentives against improving it: bad incentives must be identified and removed first.


One job back, my manager purposely got SATA drives for my workstation, claiming they would force me to write faster code. What an idiotic decision: 40+ minutes to boot the system, 5-10 minutes to load an IDE... simply impossible to use. After a while I put in an SSD I got on my own and never told him.


He heard the idea but didn't understand the execution: what he needed to do instead was prioritise testing the software on slow SATA drives.


SATA is orthogonal to SSD vs spinning rust. Also, over 40 minutes to boot sounds like there was more going on than slow disk.


Sounds like a good day to shut your computer down at the end of each shift and then spend the first 40 minutes of each day playing games on your phone [or reading this website].


++ I'd definitely make sure my manager was fully aware of how much time he was burning.


I think we need to stop saying we'll optimize later and stop shunning people every time they bring up "microoptimizations" on a forum. Let them do it right the first time. No one is coming back to that code to fix it later.


This reminds me of a weird phenomenon I've observed where users seem to have a better opinion of "slower" software. If certain, ostensibly complex operations like generating an output or saving a record take longer in one application than another, the first app is assumed to be "doing more" or "creating better results".

And I've noticed that I'm not immune to it myself. I sometimes catch myself wondering "what am I not doing right?" when I've made a thing that is perceptibly instant compared to a competitor taking 5 to 10 seconds to do the same thing.

Indeed, it seems to contradict the studies that companies like Amazon have done to show that "every 300ms of latency leads to X% of users bouncing out." I've only seen people complain about the most egregious of performance issues. People seem to have been trained to expect software to be slow and have come up with an ex post facto rationalization for why that is the case.

It's particularly frustrating because I've only ever worked on small teams, which already receive a lot of bias of the form "what could a handful of people do compared to the might of FAANG?", when often the reality is that the organizational structure of large corporations leads to the lack of cooperation necessary to make quality software.

I think a likely culprit might be the massive amounts of logging that I don't do, either of user or system activity. When using tools like Logcat to try to debug issues on-device, I notice hundreds of log entries being recorded per second from all corners of different apps and system layers before I finally re-learn how to use the arcane filtering system to get to the stuff from only my app. And that's just system level logging. Throw on top all the Google Analytics stuff some apps do for every click, every page transition, every single thing a user could do, and I suspect it adds up to quite a bit of UI latency, network bandwidth saturation (some ostensibly simple sites become unusable on bad network connections while my own apps seem barely affected), and energy usage.


> This reminds me of a weird phenomenon I've observed where users seem to have a better opinion of "slower" software. If certain, ostensibly complex operations like generating an output or saving a record take longer in one application than another, the first app is assumed to be "doing more" or "creating better results".

I'd speculate that this has more to do with a lack of feedback in the UI than anything.


I observed this phenomenon for myself recently when using igzip. Depending on the file, it can be 2-5x faster than gzip for decompressing files, even though neither is multi-threaded. I had to compute a checksum on the decompressed file and compare it to the output of gzip to be sure that the fast output is the same as the slow output.
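Something like the following is enough for that check (a rough sketch; the two file paths are placeholders for the decompressed outputs):

    import hashlib, sys

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # usage: python check.py out_from_igzip out_from_gzip
    a, b = sys.argv[1], sys.argv[2]
    print("identical" if sha256_of(a) == sha256_of(b) else "MISMATCH")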


The amount of improvement in speed and resource usage I've seen happening in the last few days since the LLaMA model release is staggering. There is a lot of room for improvement, no doubt.

Looking at the bigger picture:

- Moore's law is slowing down

- The cost for ever shrinking transistors is exploding

- Energy costs are soaring

- Planetwide, we need to get rid of non renewable power sources ASAP

Thus, efficient hardware and software should be on everyone's priority list, even if developers are expensive.


The dogma for the past few decades has been that one is supposed to wholly ignore all performance concerns, and then try to slap some profiling-guided lipstick on the pig at the very end of the development process.

You can’t hot-loop-optimize a Pinto into a Ferrari; you have to be mindful of performance from the start. Unfortunately, performance-related compromises net you bad code reviews and dismissive attitudes from your peers.


You shouldn't need to make noticeable compromises to get performance in any reasonably efficient language/tooling.


There are compromises like "do not pass everything around as JSON" that swamp any language. The cost of constant re-parsing really adds up.
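A rough way to see it (made-up payload, numbers will vary by machine): compare touching a dict directly with forcing a JSON round-trip at every hop.

    import json, timeit

    payload = {"user": 42, "items": list(range(100)), "flags": {"a": True, "b": False}}

    direct = timeit.timeit(lambda: payload["items"][50], number=100_000)
    roundtrip = timeit.timeit(lambda: json.loads(json.dumps(payload))["items"][50], number=100_000)

    print(f"direct: {direct:.3f}s   JSON round-trip: {roundtrip:.3f}s")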


> The diagrams we show computer science students that purport to illustrate how the CPU, memory, and I/O devices are connected in a computer are woefully out of date and often represent an idealized computer from about 1970 or 1975. These diagrams are, of course, complete fictions when you're looking at a modern server, but people still believe them.

And yet, we still see the common misconception that CPU speed == clock speed.

> It is true that the days of upgrading every year and getting a free—wait, not free, but expensive—performance boost are long gone, as we're not really getting single cores that are faster than about 4GHz.


"The most common big server used in a data center is a two-CPU system with a lot of memory and, hopefully, some fast I/O devices inside such as a 10-100G network card and fast flash storage"

This made me go check the date of the article. Two CPUs? But I suppose for CDN servers that makes sense; they're simply big caches.


Presumably the article means two CPU sockets rather than two cores?


Clearly. The cores also have a bus connecting one another, but the differences in IO the article mentions don't apply there.


Two sockets can allow a server to hold 32 or more cores, and a 2-socket server uses 1U of space. Going to 4 or more sockets typically requires 2U or even 4U of space, so they’re less space-efficient.

Most people dealing with hardware don’t refer to “CPUs” anymore, because there’s a lot of confusion around sockets, cores, and hyperthreads, so it’s better to be more specific in any conversation.


Look at the ROI. Any amount of performance loss that users tolerate equally, or close enough to equally, is prone to be traded away for developer productivity. It's a "supply curve" on a graph.


Developer productivity also suffers from slow software.


Luckily, developers tolerate less from other developers (sans Electron, whose existence IMO is justifiably owed to the abysmal state of desktop GUI programming).

We have a bunch of Go and Rust tooling coming out for JS, for example.


For an article that is about hardware architecture, it is amazing how many of the comments here thus far (41 comments total) seem to think that this is about "inefficient software".


Most software is poorly optimized.


No, most software is straight-up pessimized. Computers are so ridiculously fast nowadays that optimization really is unnecessary in most cases, but developers have somehow managed to make such poor decisions when building software that the result is slow anyway.

Compare a computer game from 1995 to a computer game from 2023, then a chat client from 1995 to a chat client from 2023.

One of these demonstrates many orders of magnitude improvement consistent with the improvements in the hardware, and it isn't the chat client.


Most software is poorly optimized for performance, but often optimized quite well for some other metric, be it a stable release cycle, cost of maintenance, etc.


The cost of the next instance size on AWS is often 1/20th the cost of the developer time to make similar efficiency improvements.


If you just use enough tape you can save money on all those pesky construction dudes.

If you just pull teeth instead of fixing them, you need fewer dentists. Everybody wins.

Why bother with foundations if you can just tape together whatever works and call it a day? If it crashes you just add tape.


> If you just use enough tape you can save money on all those pesky construction dudes.

But the comparison here is more like the other way around, isn't it? Fixing the problem by just throwing more resources at it is like making up for an inefficient construction workflow by hiring more construction dudes.


The "construction dudes" would be developers, not "compute".


Welcome to the future.


Mike Judge was right!


Sometimes the slowness is due to using a distributed cloud architecture in the first place.


An increasing number of sites I interact with seem to love making me wait for API calls for everything. Like the lazy-loading is meant to improve the user experience, but if all I see is blurred placeholders and ...s while you make your six trillion cascading calls to your microservice architecture, I'd rather just go back to waiting for the full content to load in a few HTTP requests.


And then a team with the same mindset writes the software that runs on my desktop or phone or light bulb. From a certain point of view, that's even better, because someone else is stuck paying for the ‘next instance size’.



