
War Story – The Mystery of the Very Long GC Pauses in .NET Windows Service - matthewwarren
http://tooslowexception.com/scenario-mystery-of-the-very-long-gc-pauses-in-net-windows-service/
======
nathanaldensr
I understand why the author obfuscated the anti-virus (hereafter: malware)
vendor, but by doing that he makes the information less interesting. Now I
won't know which vendor to perform tests on or avoid altogether when it's
clear the malware's slimy tentacles are the cause of this issue. Malware
violates principles such as Least Astonishment (look at how long it took
them to figure out it was the malware), which left the poor company spending
thousands on consultants to help them diagnose the issue. In a just world, the
malware company would get the bill.

~~~
MrBuddyCasino
> Now I won't know which vendor to perform tests on or avoid altogether

Antivirus scanners? At this point, all of them.

------
ufmace
What seems crazy to me is running this type of antivirus on servers. It's kind
of a hacky drag on the desktop, but I can at least see the tradeoff - you're
going to be visiting websites that could be hacked, downloading and testing
software that could have been modified or malicious, etc. There have been
plenty of cases of perfectly safe, mainstream websites being made to
serve ads that install malware.

Servers, though. I get defense in depth and all that, but running antivirus
seems like a step too far. Your servers should be locked down and only running
well-tested software. Least privilege for all web processes, etc. Why risk
slowing down or corrupting your server processes for a small increase in
protection in an unlikely scenario?

~~~
amaccuish
For web I'd say no, for print and file servers I'd say yes to antivirus.

------
cpeterso
Mozilla is prototyping a Firefox feature (called "Inject Eject") to sandbox
the browser process, not to protect the system from web content exploits, but
to protect Firefox itself _from the system_ (antivirus software injecting
crashy DLLs into the Firefox process). Here is the bug:

[https://bugzilla.mozilla.org/show_bug.cgi?id=1435780](https://bugzilla.mozilla.org/show_bug.cgi?id=1435780)

~~~
leeter
This sort of thing is why MS implemented 'PatchGuard' on Windows 10 to stop AV
vendors from patching OS routines and paths. The system will now bugcheck if
the checksums of certain pages change. It also allegedly does a few other things
that MS has deliberately declined to go into detail about. Regardless, even MS
has been fighting back against this garbage.

~~~
kristianp
Not just Windows 10; it's apparently been around since 2005:

[https://en.wikipedia.org/wiki/Kernel_Patch_Protection](https://en.wikipedia.org/wiki/Kernel_Patch_Protection)

------
nissimk
Antivirus causes so many problems on Windows systems, even the built-in
Defender product. My laptop experiences random pauses a couple of times per
day, and I've correlated them with high CPU in Antimalware Service Executable.
It's difficult to cure without uninstalling the antivirus.

------
indemnity
We had a long-running bug in McAfee that crashed our software whenever it
was working with SSL: some system .NET code deep in the bowels would somehow
trigger an AD lookup, which got intercepted by an injected McAfee DLL, which
crashed and took our process with it.

We found it by attaching WinDbg and finding this DLL on the crashing threads'
call stacks.

Disabled McAfee, problem solved.

We opened a ticket, they never fixed it while I worked on the project.

In the end we dumped Windows as a platform since customer IT would never sign
off on deploying a server image without AV, and rewrote everything in Java,
deploying using Linux containers, where IT has no power.

Cheaper, faster, less crashy.

------
orf
Name and shame the AV, please. Is there a good reason not to?

~~~
bunderbunder
The author gave his reason at the bottom of the article.

I think I'm inclined to agree with him. The underlying problem isn't a symptom
of something unique and crazy that one particular AV vendor is doing. It's a
symptom of something standard (but crazy) that enterprise AV solutions do.

If monkey patching every process on the system at run-time is something you
view as evil, then he's already named and shamed the entire antivirus software
industry, and now you know to steer clear of antivirus. Which you were
probably already doing, anyway, to the extent that IT lets you.

If you see it as an inevitable part of running AV, and view AV as essential,
then there's not really anything to shame. It was an honest bug relating to an
obscure corner case. I hope they've already filed a bug report, so just keep
your AV up to date and it should get fixed.

~~~
lostmsu
I don't think what you call "monkey patching" is always problematic. In fact,
launching an idle thread in another process is a pretty safe operation. You'd
have to do something more than that to break things. So it might still be a
particular vendor's issue.

I am not affiliated with AV software.

------
mewse-hn
Antivirus sucks

~~~
sebazzz
It really does.

I work at a big accountancy firm as a software developer, so IT is quite
security-minded. Our ThinkPad T470p notebooks aren't top of the line, but
they should at least be faster than my 2012 Dell Latitude E6520; both have
an i7 CPU.

We run Symantec Endpoint Protection, and this antimalware software is quite
intrusive. For a week I had an exception to run with Symantec completely
disabled, and I've never had a better-working laptop. Still quite slow, but a
good bit faster than usual.

Since the Spectre microcode updates, performance has only been getting worse.

~~~
vidanay
Software developer here. My corporate IT also runs Symantec Endpoint
Protection. For a period of about three months last year, SEP would delete my
executables and DLLs AS I WAS COMPILING THEM in Visual Studio. This would of
course result in a failure to build the project. Usually compiling a second
time would succeed, but sometimes it would take three or four tries.

Needless to say, this resulted in an email from IT informing me of the unusual
activity on my system, and when I told them it was deleting software that I
was writing, they had to get approval from two levels of management that it
was actually my job to be writing software and it should be allowed by SEP.

~~~
CWuestefeld
I don't think you can blame this on AV in general, or Symantec specifically.
What you're describing is an overzealous InfoSec department that's unable to
consider the actual needs of different kinds of users.

------
starik36
> it was happening only on one of the server clusters and no others

So did anyone figure out why the anti-virus was causing problems on a single
box only and not the others?

------
jve
And did you report that to the antivirus vendor, at least?

------
polskibus
Surely they could've just excluded that executable from the antivirus instead
of uninstalling it altogether?

~~~
ninju
Per the article

> In fact, only completely uninstalling the antivirus was a proper fix –
> excluding only .NET assemblies was not enough.

------
kristianp
How can the antivirus patch a signed .NET DLL? I'm assuming they didn't use
.snk files.

------
CyanLite4
TLDR: anti-virus programs can negatively affect server performance.

------
inmemorydotnet
We have an InMemory Dot Net database, and we've seen cases of hour-long GC
pauses. Typically this happens on 64-bit systems that are low on memory and
have started swapping to disk. The swap file being located on network
storage doesn't help.

~~~
orf
Why on earth would you configure your swap file on a network path?!

~~~
sciurus
Why on earth would you configure swap on a system running an in-memory
database? Presumably you use an in-memory database because you need extremely
fast response times from it, but those are going to go through the roof when
it starts swapping, potentially causing a cascading failure throughout systems
that directly or indirectly rely on it.

Better to configure the database to allocate less than the available system
memory and refuse writes when it's out of free memory.
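A minimal sketch of that approach, assuming a simple key/value store whose memory use is approximated by summing key and value byte lengths. `BoundedStore` and its methods are hypothetical and don't correspond to any particular product's API:

```python
from typing import Dict, Optional


class BoundedStore:
    """Hypothetical in-memory key/value store with a fixed byte budget.

    Instead of letting the process grow until the OS starts paging,
    writes that would exceed the budget are refused.
    """

    def __init__(self, budget_bytes: int) -> None:
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self._data: Dict[str, bytes] = {}

    @staticmethod
    def _cost(key: str, value: bytes) -> int:
        # Rough size estimate: encoded key length plus value length.
        # Real per-object overhead (headers, dict slots) is ignored here.
        return len(key.encode("utf-8")) + len(value)

    def put(self, key: str, value: bytes) -> bool:
        old = self._data.get(key)
        old_cost = self._cost(key, old) if old is not None else 0
        delta = self._cost(key, value) - old_cost
        if self.used_bytes + delta > self.budget_bytes:
            return False  # refuse the write; the caller can retry or shed load
        self._data[key] = value
        self.used_bytes += delta
        return True

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


store = BoundedStore(budget_bytes=16)
print(store.put("a", b"x" * 10))  # True: 11 of 16 bytes used
print(store.put("b", b"y" * 10))  # False: would need 22 bytes
```

The point of the design is that a full store fails fast and visibly rather than degrading everything that depends on it once paging starts.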

~~~
vardump
Windows requires some swap to work correctly. Unlike Linux, it never
overcommits. IIRC, too little swap can make some parts of Windows work
incorrectly.

~~~
raptorfactor
Do you have an example of which "parts of Windows work incorrectly"? Windows
runs just fine without a pagefile. Of course if you have a piece of software
which then tries to use more memory than is physically available those
allocations will fail, which in 99.99% of software results in termination or a
crash, but the OS itself works just fine (as does all the software as long as
you're not hitting your physical memory limits).

~~~
vardump
I don't remember exactly what it was, but I do remember you needed a page
file of about 200-300 MB to avoid it.

Of course, due to Windows' no-overcommit policy, it's better to have a large
page file almost always. That ensures your RAM is used efficiently and not as
a simple backing store that's never actually referenced.
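To make the no-overcommit point concrete, here is a toy model. It assumes the commit limit is roughly physical RAM plus total page file size, which is a simplification of the real Windows memory manager. `CommitModel` and `demo` are hypothetical names used only for illustration:

```python
GIB = 1024 ** 3


class CommitModel:
    """Toy model of no-overcommit accounting (not the real Windows memory
    manager). Every commit must fit under a fixed limit, and memory that is
    committed but never touched still counts against that limit."""

    def __init__(self, ram_bytes: int, pagefile_bytes: int) -> None:
        # Assumption: commit limit is approximated as RAM + total page file size.
        self.limit = ram_bytes + pagefile_bytes
        self.charge = 0   # bytes committed so far
        self.touched = 0  # bytes actually written to (would need RAM)

    def commit(self, size: int, touch: int = 0) -> bool:
        """Try to commit `size` bytes, of which `touch` are actually used."""
        if self.charge + size > self.limit:
            return False  # no overcommit: the allocation fails up front
        self.charge += size
        self.touched += min(touch, size)
        return True


def demo(pagefile_gib: int) -> bool:
    m = CommitModel(ram_bytes=16 * GIB, pagefile_bytes=pagefile_gib * GIB)
    # A process commits a large heap up front but only touches 2 GiB of it.
    m.commit(12 * GIB, touch=2 * GIB)
    # A second 6 GiB request: it fits in physical RAM, but does it fit the limit?
    return m.commit(6 * GIB, touch=6 * GIB)


print(demo(pagefile_gib=0))   # False
print(demo(pagefile_gib=16))  # True
```

With no page file, the second request fails even though only 8 GiB of the 16 GiB of RAM would ever be touched. With a 16 GiB page file, it succeeds without the untouched pages ever having to occupy RAM.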

