
24-core CPU and I can’t type an email – part two - MBCook
https://randomascii.wordpress.com/2018/08/22/24-core-cpu-and-i-cant-type-an-email-part-two/
======
ubermonkey
It's tangential here, but, seriously, let me ask:

For those of us here with > 15 years in the business, when's the last time you
really felt a BIG uptick in performance or responsiveness from a new computer
upgrade?

I buy a new laptop every 3-4 years, and I buy a nice one, but generally
speaking I feel like we've been kind of flat in terms of usable power for a
while. 6 or 8 years ago I switched to an SSD, and THAT was a big deal --
easily the most dramatic uptick in performance since I moved from an AT clone
to a 386 in 1991. But since then? Not so much.

My laptop is smaller. It runs longer on the battery, and is generally cooler.
The screen is better. But in terms of how long it takes to boot, or open large
data sets, or whatever? Not so much different than 5 years ago.

~~~
dsr_
Given:

- gigabit ethernet LAN and a speedy Internet connection

- doing "desktop" work -- browser, SSH, office docs, media consumption

All my machines feel snappy, ranging from the $250 NUC through the MBP2011
(with 16GB RAM and an SSD) through the midrange Intel desktop (also 16GB RAM
and an SSD).

I don't play any significant games and major computation happens on servers,
not on anything I'm typing directly on.

It's been snappy for a decade, and I don't expect a noticeable improvement the
next time I get something new. I might be pushing more pixels to a screen, but
it will respond at about the same speed.

~~~
sp332
Since everyone mentioned SSDs, I'll second getting a gigabit local network
connection. Now I store everything on the server in another room, and my
laptop, desktop, and consoles can load and share files almost like the hard
drive was connected to each directly.

------
lmilcin
I have on my desk a machine that would have been the equivalent of a large
datacenter two decades ago, and yet sometimes it can barely keep up with me
pressing keys on the keyboard.

I think the next major step will be for humanity to learn to make software
more efficient and effective rather than throwing more CPU cycles at the
problem.

~~~
JensRex
Instead we get Electron, and people use it to make text editors that take up
500 MB of space.

~~~
bluejekyll
Electron gets a lot of hate, but I think people are generally looking at the
wrong thing when they focus on memory usage. Ask yourself why it’s so
important to so many organizations.

The number one cost at most companies is humans. Before Electron, what did
you have for cross-platform development? Java was there, but it never really
took off for embedded UI inside the browser, and the browser became the
primary target for most businesses over the last ten years.

Electron allows small teams to focus their most costly investment on
developing one product across many platforms.

I use Electron apps that give a consistent experience on macOS and Linux (and
presumably Windows, though I don’t use it personally), and then the same web
app on Firefox, Chrome and Safari across those same platforms, with the
addition of my phone.

Yes, that comes at the cost of running a full browser, but it’s a logical
choice when optimizing for the diverse computing landscape of today.

~~~
boomlinde
"Looking at the wrong thing" is a matter of perspective. As a user, I couldn't
care less that you saved money by writing a bloated web app. I care about the
resources _I_ have, which includes my own time, energy, money, computer,
battery etc. In those terms, there are clear disadvantages to Electron based
software.

~~~
bluejekyll
See my response to a similar sibling comment:
[https://news.ycombinator.com/item?id=17826915](https://news.ycombinator.com/item?id=17826915)

My basic point there being that losing a small number of users due to these
concerns is probably worth it.

I don't want to come across as someone who's hyper-defensive of Electron. I
hate the fact that such a bloated piece of software has become the standard
means for supporting cross-platform development. At the same time, I really
don't see other viable options at this point in time.

I'm curious though, is there something you would recommend instead that meets
these requirements with a single codebase (~90% shared code): target all major
platforms (macOS, Windows, Linux) and target all major browsers (Chrome,
Safari, Firefox, Edge, IE), (edit: and how could I forget, all major phone
browsers) with nearly identical UI/UX?

~~~
keldaris
There is no such thing and I would argue there shouldn't be. The notion that
you could use a "nearly identical UI/UX" across a desktop application, a
mobile application and a web page invariably leads to horrible results. The
use cases are just too different. The only reason people pretend otherwise is
laziness and cost cutting - a context in which Electron seems reasonable as
well, to the horror of technically literate users everywhere.

What might be reasonable, at least for applications that aren't performance
sensitive, is asking for a cross platform solution between desktop OSes and a
different one across mobile platforms. Those exist, as you well know.

~~~
sangnoir
> There is no such thing and I would argue there shouldn't be

It's one thing to not want to use Electron apps; it's another to say _no one_
should use them. It rubs me the wrong way. There are a lot of things I'd never
use, but they don't rile me to that level.

> The only reason people pretend otherwise is laziness and cost cutting - a
> context in which Electron seems reasonable as well, to the horror of
> technically literate users everywhere.

"Laziness and cost cutting" is just your label for trade-offs you don't agree
with. That would apply to cross-platform Java/Swing apps or Gnome. The
"technically literate" users don't have to worry about business considerations
and engineering tradeoffs - the authors do. If memory usage is such a big
deal, then the market will self-correct, I was told it's a meritocracy.

~~~
keldaris
> It's one thing to not want to use Electron app, it's another to say no one
> should use them. It rubs me the wrong way. There are a lot of things I'd
> never use, but they don't rile me to that level.

The reason this trend riles me so much is that we now have companies like
Slack, which easily have enough resources to do an efficient desktop app on
any platform they choose, releasing utter garbage that, far from merely
wasting memory, takes up ridiculous amounts of CPU time (and therefore battery
time and energy) to do the simplest things (like render emoji, or even a
blinking cursor). The aggregate waste of resources is mind boggling, and we've
gone far past the point where Electron was solely used as a quick solution for
very resource constrained companies or single devs.

> "Laziness and cost cutting" is just your label for trade-offs you don't
> agree with. That would apply to cross-platform Java/Swing apps or Gnome. The
> "technically literate" users don't have to worry about business
> considerations and engineering tradeoffs - the authors do. If memory usage
> is such a big deal, then the market will self-correct, I was told it's a
> meritocracy.

It certainly does apply to Swing apps, Qt, Gnome, etc. I've always considered
all three of those frameworks ludicrously bloated, by the way, but Electron
has far exceeded my worst nightmares in that regard.

I'm not sure who told you the market is a meritocracy or why you believe it,
but I don't see much evidence for that view, personally.

------
snaky
> the lock was being acquired and released ~49,000 times and was held for, on
> average, less than one ms at a time. But for some reason, even though the
> lock was released 49,000 times the Chrome process was never able to acquire
> it.

Well, locking is hard.

> The good news is that even though there is occasional unfairness, there is
> unlikely to be persistent unfairness. In order for a thread to steal the
> lock, it needs to hit the tiny window where the lock is available. In
> practice, a thread is unlikely to be this lucky repeatedly.

[https://blogs.msdn.microsoft.com/oldnewthing/20170705-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20170705-00/?p=96535)

> The fact is, any time anybody makes up a new locking mechanism, THEY ALWAYS
> GET IT WRONG. Don't do it. Take heed. You got it wrong. Admit it. Locking is
> _hard_.

[https://yarchive.net/comp/linux/locking.html](https://yarchive.net/comp/linux/locking.html)

~~~
AstralStorm
Actually this is exactly because someone in Windows used the plain old mutex.
I'd call that "your granddad's lock". It's old, crotchety, unfair and
shouldn't be used in situations where any concurrency can be expected.

This despite the kernel having a nice RCU mechanism inside as well as waitfree
queues.

~~~
snaky
If it were a spinlock, the problem would stay.

> the rule simply is that you MUST NOT release and immediately re-acquire the
> same spinlock on the same core, because as far as other cores are concerned,
> that's basically the same as never releasing it in the first place.

[https://yarchive.net/comp/linux/spinlocks.html](https://yarchive.net/comp/linux/spinlocks.html)

Other mechanisms do exist of course.

------
snarfybarfy
You should have asked your pointy haired boss. He could have explained the
problem in much simpler terms.

Would you expect 24 employees to write ONE email without 4 team leads and one
department head?

Obviously NO!!

Your processors obviously need more management. I think Intel has the right
offering for you, aka management engine.

~~~
Scarblac
New from Intel: the Scrum processor.

~~~
bluejekyll
You joke, but seriously, this is generally a goal of distributed and/or
parallel computing. Reduce interdependencies, stop constant cross chatter and
try to do as much computing in isolation as possible.

Scrum’s not necessarily a bad analogy. Here’s a different thought this
conversation has me thinking about: what if we did think of development teams
as CPU cores? We might discover weak points in the architecture of an
organization and recognize more quickly where we need to address bottlenecks.
The bandwidth of the bus between the cores might be too limited. The pipeline
of work (backlog) might not be deep enough, and might have a ton of branch
statements (spikes) that can throw out the entire pipeline...

~~~
nextos
Why not just embrace simplicity?

I have a golden rule for my systems. After booting into X and opening one
terminal with htop and hiding kernel threads, everything should fit very
comfortably on one screen. This constraint forces you to have a very simple
setup: a few daemons, a window manager and a terminal. I do the rest of
my computing in Emacs and Firefox.

I have several Arch or NixOS setups that could work on a 128 MB RAM setup,
excluding Firefox. Plus, if something breaks, I know how to fix it.

~~~
bluejekyll
Organizations like to grow, though. Wouldn’t that imply that organizational
structure would have a maximum size?

------
jwilk
Part one:

[https://news.ycombinator.com/item?id=17780127](https://news.ycombinator.com/item?id=17780127)

------
coldcode
My MBP only has 8 cores, but often during the day it locks up and whips up a
hurricane of fan noise. Why? Idiotic Jamf management software and a virus
checker that never finds anything. Some days you only get 1 core for work when
it gets stuck, and you have to reboot to get the rest back. It's not always
the computer, the OS, or the software you use.

~~~
posixplz
Yep, Jamf and McAfee are both trash-quality software that significantly reduce
the overall security of a system. McAfee is a well-known story: their AV
unpacks and analyses potential malware via a _kernel_ module. Wtf?

Jamf's code quality and security model are abysmal. It blows my mind that
apple recommends the use of Jamf to large Mac shops. Having peeked under the
hood at previous employers, I was extremely disappointed. The product is
insecure by design - not what one wants for device management that's given
root privileges on machines containing corporate crown jewels. I highly doubt
their operational paradigm for Jamf cloud has changed in the last year,
either.

I apologize for going completely off-topic, but these products make my blood
boil. Especially because I now work for a shop that uses both Jamf and McAfee
for Mac "security". On a positive note, I removed the corporate mandated
malware with ease - no boot to single-user required.

~~~
sfink
There was a time that running Windows absolutely required you to use an
antivirus solution. It was crazy not to.

Those days are past. Now you have to be crazy to run one. Even setting
security aside, I think antivirus packages are the number one source of
instability and weird performance problems.

------
raattgift
Could you build on the always-unfair system along these lines? (C11ish, sorry)

    
    
      void
      do_work(the_work_t *w)
      {
         /* for simplicity here, rather than e.g. w->contenders */
         static _Atomic int contenders = 0;

         contenders++;
         for ( ; work_remaining(w); ) {
           take_mutex(w->m);
           do_some_work(w);
           drop_mutex(w->m);
           if (contenders > 1)
             reschedule_this_thread();
         }
         contenders--;
      }
    

This depends on the OS providing a cheap and fast reschedule_this_thread()
mechanism that _effectively_ guarantees that if there is only one other
contending thread with work, that thread will end up holding the mutex. (If
there are multiple such threads, an arbitrary one of them will end up with the
mutex, rather than the thread that just dropped the mutex.)

One could of course only check for other contenders every few times through
the for loop if reschedule_this_thread() is expensive or slow, or if
contenders is especially hot.

contenders is explicitly not a locking mechanism and should not influence the
policy of any code running while the mutex is held. It should also be a per-
mutex counter.

~~~
brucedawson
The trick is how you implement reschedule_this_thread. What you want in this
specific case is to take it off-core long enough for another thread to wake up
and take the lock. That is far too squishy a goal to be something that you can
implement. If you sleep for some number of nanoseconds then you are wasting
performance and/or not sleeping long enough.

In this particular case the lock was a kernel lock in kernel code so the OS
would have to fix this, by making the locks fair (or occasionally fair).

~~~
raattgift
I've been spoiled by a better OS. :D

How about:

    
    
      while work:
        if (contenders > 1)
           { reduce_priority; pri_reduced = true }
        take_lock
        if (pri_reduced)
           { unreduce_priority; pri_reduced = false }
        do_work_quantum
        drop_lock
      endwhile
    

(I mean empirically, although an educated guess would do).

Of course, if reducing priority is too fast, this likely doesn't help;
alternatively it could be too slow and what you get back in system latency is
taken away in lowered throughput. That's probably not OK if you don't need the
system latency to be low.

I wonder if (dramatically, even) reducing the priority of some of the original
workload exposing the problem, not just when racing for a lock but when doing
the actual work quanta, would help. My thought here is that your email sending
is higher-priority and will at least push some of the workload out of the way
in reasonable time, giving you back some responsiveness.

I'm surprised if Windows doesn't offer up a high-throughput/latency-tolerant
QOS for threads.

~~~
brucedawson
Priorities don't help. They are only relevant if there are more runnable
threads than CPUs. In my case I had lots of spare CPUs so both threads could
run, regardless of priority.

A QOS does not directly help. The only thing I am aware of that can help is
fair locks, or occasionally fair locks, so that the lock is _given_ directly
to the waiting thread, instead of being made available to all.

I have yet to hear of any other solutions.

------
agumonkey
I wonder how much cpu time has been liberated through the CFG scan patch.

------
fencepost
I'll have to go through with some of the same steps to see if a recent problem
I had is related to this.

In my case it was actually being triggered by a piece of monitor management
software that came with the LG ultrawide monitor that I use. No significant
CPU load, no memory issues, plenty of cores in an older HP workstation with a
Xeon Processor, 48 gig of RAM and a Samsung SSD. When that display management
software was running a text editor couldn't even keep up with displaying text
as it was typed.

Edit: rereading some of the original 2 articles, yeah, I'm not going to be
running the same kinds of tests - even if I did it'd take too long to develop
the knowledge base to be able to interpret my results adequately.

------
atesti
What exactly did his IT department do, using WMI to query this
Win32_PerfFormattedData_PerfProc_ProcessAddressSpace_Costly? I can't find this
in the article.

~~~
brucedawson
The actual high-level query is in part one and in the first reply, but I think
your real question is "why did they want to scan the address space of every
process on the system?"

I think that the answer is that they didn't. That was just one of the
bazillion counters that came along for the ride. I believe that that counter
has been removed in the latest OS, thus squishing this bug in another way.

I don't understand WMI, but it sounds really weird. One peculiarity is that
once some program asks for counters "WMI refreshes the list of counters every
2 minutes until the WMI helper process closes due to inactivity."

So that's great. IT asks for some data, they get memory scans that they don't
want, and those scans are repeated every two minutes for a while (ten
minutes?) even though nobody is looking at the results.

------
equalunique
I really enjoyed the first iteration of this article and am happy to see a
part two.

