
The logging framework isn't a bottleneck, and other lies your laptop tells you - todsacerdoti
https://tech.davis-hansson.com/p/tower/
======
fmakunbound
Many years ago I worked with a team that couldn't figure out why the code ran so
fast on Bob's laptop but was slow as shit on production server hardware. The
production hardware was state of the art at the time and they were kind of in
a pinch deadline-wise. They ended up literally deploying Bob's laptop to the
ISP data center.

~~~
duxup
>They ended up literally deploying Bob's laptop to the ISP data center.

I guess in a pinch but that is a solution that I just don't think I would
like.... I would feel like we're going to figure this out, not surrender to
the absurd...

~~~
vbezhenar
Sometimes that requires expertise in very low-level topics. For example, I've
read a similar story, but that developer dug out the truth. Production was
running on a VMware cluster with live migration or something like that. The
servers used slightly different CPUs (the frequency differed a bit), so VMware
emulated some important instruction (something related to getting the current
timestamp). MSSQL used that instruction a lot in one specific scenario, so it
ran much slower on the virtualized production server.

They were able to figure it out and rewrote one stored procedure so it ran
fast again. But I'm sure that I, for example, don't possess the necessary
expertise to figure this out. I wouldn't be surprised if in many small-to-
medium companies there are no experts of that level either.

~~~
xxs
The Time Stamp Counter (TSC) used to be trapped to make sure it returned the
same value across all cores. Intel fixed that part, though.

~~~
zorked
In a live migration scenario the TSC on the source and destination machines
will necessarily be different, so emulation is required to make it not warp.

That said, because of past TSC instability, all operating systems deal with a
TSC warp pretty well. Whether applications deal well with it is probably the
reason emulation is the default, but it may be possible to disable it.

(Everything AFAIRC)
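
To make the failure mode concrete, here is a minimal sketch (my addition, not
from the comment; it assumes x86 and GCC/Clang's __rdtsc intrinsic) of the
kind of raw TSC timing that only stays sane if the counter does not warp
across a migration:

    /* Sketch: timing with the raw TSC (x86, GCC/Clang).
     * If the hypervisor did not trap/emulate RDTSC, a live migration to a
     * host with a different TSC base could make `end - start` negative or
     * wildly wrong. */
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtsc() */

    int main(void) {
        uint64_t start = __rdtsc();
        volatile uint64_t sink = 0;           /* the work being measured */
        for (int i = 0; i < 1000000; i++) sink += i;
        uint64_t end = __rdtsc();
        printf("elapsed cycles: %llu\n", (unsigned long long)(end - start));
        return 0;
    }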

------
godot
The article talks about the difference between laptop vs server, but no other
comments here so far have really touched on that topic. I think this post
isn't about how good or bad cloud/VPS servers are, but really about the
difference between running on a laptop vs on servers.

It seems that starting around maybe 2012~13, devs in the tech industry started
moving from Windows laptops to Macbooks at large scale, and this was when it
became much more common to run your server stack locally on the laptop. This
was really strange to me at first, since prior to this I was very used to
having "dev servers". I came from the LAMP world where we'd write code on
Windows, upload the code to a dev server, run the app on it and test. This
seems very amateur in retrospect, but I also worked in a relatively large
unicorn at the time (from 2008 to 2013) and I don't think we were even the
only company doing that -- from what I knew, this was pretty much standard
practice, especially if you ran a LAMP stack. Which is to say, it was way more
common back in the day to dev on a dev server than on your own laptop. (Maybe
that's a point against Windows at the time; about how difficult it was to have
a reasonable local dev environment on Windows.)

So when I joined a startup later, and found out that everyone on the team had
Macbooks (with no option of going Windows), and everyone ran the entire server
app on their laptops (and this was the nodejs world, no longer LAMP), I was
almost in disbelief -- are we not to worry about any difference between
developing, testing and running this server code on a Mac laptop, vs deploying
it and having it actually run on a Linux server later? After a short while it
seemed like it was not really a concern; perhaps macOS is close enough to
Linux servers, or node is compatible/high-level enough between Mac and Linux.

Reading this article is kind of a wake-up call -- no, it's really not exactly
the same.

~~~
fetbaffe
The thing is that the dev servers of the past were never equal in setup to the
prod servers: differences in software versions, failovers, proxies, crons,
backups, etc. That was just a lie that was told.

In my opinion it is much better to develop a cross-platform/environment
product that can work everywhere, and the key to that is to avoid assumptions
about where the product is executing. In the long run, this makes your
software much more flexible.

~~~
steerablesafe
There is a significant difference between "works everywhere" and "works
everywhere efficiently". For the former you can get away with just writing
portable code. For the latter you basically have to benchmark everywhere.
Performance is a hell of a leaky abstraction.

~~~
fetbaffe
Yes, that is why it is futile to try to have an exact copy of the production
server: the smallest difference will ruin your performance profiling. To be
absolutely sure, you have to do the profiling on the production server itself.

Usually every developer has their own login user on the development machine,
but even that is enough to skew performance profiling, because the username
populates environment variables, and that changes the memory layout of any
program the shell executes.
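
As a tiny illustration of that effect (my sketch, not part of the original
comment): the environment strings are copied above the initial stack, so a
longer $USER or any extra variable shifts the address, and thus the alignment,
of everything below it.

    /* Sketch: print the address of a stack local. The environment block
     * sits above the initial stack, so changing its size (e.g. a longer
     * username in $USER/$LOGNAME) shifts this address and its alignment. */
    #include <stdio.h>

    int main(void) {
        int local = 0;
        printf("address of a stack local: %p\n", (void *)&local);
        return 0;
    }

With address space randomization disabled (e.g. running under
setarch "$(uname -m)" -R), adding one long dummy environment variable visibly
moves the printed address.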

~~~
steerablesafe
If memory layout changing due to environment variables significantly changes
performance characteristics, then it looks like an opportunity to optimize
this aspect of loading a binary. I assume cache alignment is in play here.

~~~
fetbaffe
Good video on the topic

[https://youtu.be/r-TLSBdHe1A](https://youtu.be/r-TLSBdHe1A)

~~~
steerablesafe
Thank you, this is a fantastic talk.

It's great that they can average out the random effect of layout, but what I
meant by an optimization opportunity is to deliberately aim for the "good"
memory layouts. Something like fixing the alignment of the stack at the
program entry point, or marking specific functions as "stack aligned".
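
For what it's worth, compilers already expose a few knobs for pinning layout
down deliberately; a hedged sketch using GCC/Clang extensions (whether any of
this actually helps would still have to be measured):

    /* Sketch: fix some layout decisions instead of leaving them to link
     * order and environment size (GCC/Clang extensions). */
    #include <stdint.h>

    /* Align the function's entry point to a 64-byte (cache line) boundary;
     * gcc can also do this globally with -falign-functions=64. */
    __attribute__((aligned(64)))
    static void hot_loop(uint64_t *data, int n) {
        for (int i = 0; i < n; i++)
            data[i] *= 3;
    }

    /* Align a hot buffer to a cache line so it never straddles two lines. */
    _Alignas(64) static uint64_t buffer[1024];

    int main(void) {
        hot_loop(buffer, 1024);
        return (int)buffer[0];
    }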

------
bdcravens
Worried that your laptop is too fast?

Run your code in Docker for Mac, that'll fix the problem.

~~~
ssmiler
I actually experienced the opposite. A natively compiled Mac application was
running slower than the same application running in an Ubuntu Docker image.

~~~
Radle
Wherever I worked, getting a bigger server or sharding the users better was
always the easiest solution.

------
Rapzid
m.2 (NVMe) is so fast and cheap it's absurd. I blinked (a few years) on
hardware, and when I went shopping for a new disk I was blown away by the
price to performance.

I was equally shocked to discover the crap-show that is the m.2 controller
space. I felt for sure Supermicro would have chassis you could load up with
those suckers, hot-swappable and changing the face of storage. Instead I found
proprietary (Intel/AMD) RAID drivers coupled to specific CPU models, and
expansion cards that could take a measly 4 drives (no hot-swap).

~~~
simcop2387
The lack of hot-swap is unfortunate, and likely a result of server/consumer
space differentiation among all kinds of vendors (OS, CPU, motherboard, and
even the NVMe manufacturers all have to play along). But the limit of 4 is at
least a real technical limitation: each one of those add-in cards uses a PCIe
16x slot (either real or "DIMM.2"), and each drive needs a 4x from it. You
could use a mux to add more, but they're already getting to the point of
being able to saturate the links. PCIe 4.0 and 5.0 will give a lot of headroom
for more drives on a system.

~~~
derefr
Sounds like we might need to go back to the kind of mainframe architecture
that has IO offload. Split the PCIe bus into NUMA-like zones; give each zone
its own (probably ARM) CPU, running its own kernel; then use "application
processors" (probably x86) to command-and-control the IO zones, allocating
e.g. IOMMU-subvirtualized ethernet channels to them. Control plane/data plane
separation.

~~~
pg-gadfly
A bunch of less powerful servers with expansion cards over a network should be
much easier to manage and to scale horizontally.

~~~
simcop2387
Sort of; the network part of that ends up being a huge bottleneck then too.
With 16 drives at 5GB/s each (max I've seen so far), you've got 80GB/s that
you need from the network to each server. You start getting into the really
expensive side of things speed-wise.

------
gumby
This is grossly underappreciated. And this is why server machines and
sustained performance remain one of Intel’s strengths, and why, despite the
number of ARM developers out there, ARM hasn’t managed more than a toehold in
the server market. Nobody has designed an ARM CPU for that kind of workload.

Intel hardly has an impregnable lead in this regard, but I expect AMD to get
noticeable server market share before any ARM part does.

------
tyingq
There's also the similar problem where a developer usually has a much better
laptop, phone, and internet or LTE connection, etc., than the average end
user.

~~~
Yizahi
I remember the YouTube dev story posted here. They optimized the YouTube page
at some point, deployed it, and saw a much worse average load time for users.
As it turned out, a whole new batch of users saw their load time drop from
unusable tens of seconds or minutes on slow connections to a tolerable 10-20
seconds or so, and they started using the site, which dragged the average up
by a lot :) All these modern developers usually ignore how their websites,
overloaded with usually unneeded scripts and trackers, load on slow networks
and PCs.

~~~
bcrosby95
> All these modern developers usually ignore how their websites, overloaded
> with usually unneeded scripts and trackers, load on slow networks and PCs.

Both Firefox and Chrome developer tools even have settings for throttling
network speed. I wonder how many developers use them.

------
userbinator
_For server workloads, it often makes sense to give up clock frequency to
“fit” more cores in the same heat / power space._

In other words, server hardware is usually designed for parallelism/maximum
throughput, while client hardware is designed for single-threaded performance
and decreasing latency.

~~~
qubex
To rehash the old and tired metaphor: one’s a sprinter carrying a baton and
the other’s a herd of marathon runners dragging a sled.

~~~
abstractbarista
This is brilliant and I love it.

------
bitcharmer
I come from a field where it would be unthinkable to draw conclusions about
application performance by running it on a developer's laptop instead of the
target platform.

Judging by the comments here this is not as obvious as I previously thought.

~~~
pdimitar
Most businesses will never allow you to actually develop and test, for a full
day (or weeks on end), on a dedicated server machine.

It's about costs.

------
sdflhasjd
We've had software perform wildly differently on our dev servers vs prod
servers (which are owned by customers). One time it was VM overprovisioning;
another time we were just getting completely different performance depending
on which hypervisor was being used.

------
mehrdadn
A bit off-topic, but can someone tell me why server _firmware_ takes so long
to _initialize_ compared to laptop firmware? And why does even laptop firmware
take longer to initialize now than on laptops from, say, 7 years ago? Why is
firmware initialization getting so darn slow when computers are getting
faster?

~~~
fennecfoxen
Servers never reboot, so no one cares.

~~~
yjftsjthsd-h
I have no idea why you're getting downvoted; this is very much my experience.
An actual physical server is usually either a database server that can
literally rack up years of uptime (no, I didn't like it, but they were on an
isolated network and the Oracle DBAs were very particular about updates), or
else a VM host that supports live migration, so nobody cares if we happen to
have one host offline for an hour or two while it's patched and rebooted.

------
Thaxll
The main error here is letting people load test on their own machines; you
should always do that on the same platform that will run in production.

I think the article is missing many key elements; for example, you should
always throttle your CPU when benchmarking.
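
One cheap way to make a laptop benchmark less flattering (a sketch of my own,
not from the comment): pin the process to a single core, and separately cap
the clock (e.g. disable turbo boost or set a fixed cpufreq governor) so it
resembles the sustained clocks a loaded server actually runs at.

    /* Sketch (Linux/glibc): pin the benchmark to one core so the scheduler
     * does not bounce it between cores mid-run. Capping the clock itself is
     * done outside the program, e.g. via the cpufreq governor or by
     * disabling turbo boost. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                     /* run only on core 0 */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        /* ... run the workload being load tested here ... */
        puts("pinned to core 0");
        return 0;
    }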

------
duxup
I remember when I was building my first PC with some friends doing the same.

One managed to get hold of a 'server' and we just assumed it had to be fast,
right? Very much learned the opposite... and the lack of a sound card was a
big turn-off...

------
senderista
LOL "the mechanical sympathy community"

Good article though. I'm sick of seeing comparative benchmarks on a MacBook.
If you don't have real servers you should always benchmark your code in the
cloud (bare metal instances if you can afford them).

~~~
saurik
Well, you should benchmark on setups similar to what you will deploy on:
benchmarking on "real servers" or "bare metal instances" is also going to be
highly misleading if you expect most of your deployments to be on virtualized
hardware under KVM or Xen.

------
_bxg1
Interesting, but I'd think that just running your profiling in a server
environment would have the same effect as doing _everything_ in one, no? And
for much less hassle

~~~
Analemma_
You'd be amazed how few developers even know about profiling, or they think
it's a method of last resort when other attempts to track down performance
issues have failed. One of my standard interview questions is, "A customer
comes to you and says that [app] 'is slow', with no additional information.
You look at the code and there are no glaring inefficiencies, no O(n!)
algorithms, etc. What's your next step?", and less than a quarter of
candidates ever mention profiling. "Sympathy for the hardware" is, IMO, sort
of a hack that serves as an imperfect solution to this problem.

~~~
devonkim
The reality is that a lot of profilers cause a sort of Heisenberg effect when
running in production, where they can slow down your code so much that the
results aren't as meaningful anymore. This is when ops engineers like me will
be wagging their fingers, saying the application should have had
instrumentation or APM support built in (e.g. via OpenTracing), or that eBPF
and friends could be useful if your production machines were reasonably up to
date. On most occasions I've seen, the majority of engineering teams outside
massive-scale companies with gobs of resources still wind up debugging
performance problems like it's 1999: application-external hypotheses and
checking for smoking guns like high page fault rates, dropped packets, etc.
(e.g. the USE methodology).

~~~
jeffbee
Sampling cycle counts or LBR with Linux perf events is almost invisible to
performance, maybe a 1% hit to throughput as a general rule. The problems come
from languages where the PC is irrelevant, like Python, but nobody uses Python
because it's fast, so using a Python profiler like pyflame should be fine,
even in production.

~~~
devonkim
Profiling in production on the JVM using something like YourKit or JProfiler
is the typical case for me. Ironically, I've found profiling with Python in
production easier, for the reasons you've mentioned. If something is down or
running slowly already, adding another 3%+ latency is hardly going to be an
issue. Architecturally, with big monolithic programs that do too many things,
attaching a profiler to try to analyze 1% of the program's responsibilities or
surface area becomes a risk to other production operations, unfortunately. In
most cases slowdowns happen because of resource saturation, things timing out,
or blocking on shared resources. In the first scenario, trying to run a
profiler can exacerbate the problem, or the profiler can even fail to start,
so the only way forensics can be done there is by emitting observability data
prior to the failure point.

Other approaches taken have been the more Erlang-style "let it fail"
methodology, which is fine for newer projects but represents a rewrite for
most systems in practice, and is thus far, far beyond profiling discussions.

------
ineedasername
I have two computers at work: a Xeon workstation that was low-to-mid range
when new, 7 years ago, and still has an HDD, and a new mid-range i5 laptop
with an SSD. The old Xeon performs much better for everything except boot
time. Even program load times.

------
speedgoose
It's a bit obvious that some major cloud providers are very expensive, when
consumer hardware has offered so much better performance for the price for
years.

But they can have such a pricing model. What they offer is good and valuable
anyway. And you need to pay Jeff Bezos.

------
throwaway_pdp09
Seems a bit naive.

"Hanging out in the old data center, rendering web pages for a thousand people
simultaneously? A different story. For server workloads, it often makes sense
to give up clock frequency to “fit” more cores in the same heat / power
space."

Servers have larger caches, and I'd expect better IPC than a laptop CPU from
a better microarchitecture, at least double the memory channels, etc. I guess
that was his point though.

~~~
zokier
> Servers have larger caches, and I'd expect better IPC than a laptop CPU
> from a better microarchitecture, at least double the memory channels, etc.

In aggregate, yes, the server CPU numbers are great. But looking at them per
core makes them much worse. I checked some random Xeon Platinum CPU vs a
mobile i7, both being Skylake (so essentially the same microarch). The Xeon
had 28 cores, 38.5M cache (=1.4M/core) and 6 memory channels (=0.2/core). In
comparison the i7 had 4 cores, 8M cache (=2M/core) and 2 memory channels
(=0.5/core). The Xeon uses 25% faster memory, but that doesn't really make up
the difference.

~~~
throwaway_pdp09
My understanding kind of matches yours, but I assume it will depend. If you
have a memory-bound problem then yes, I'd expect a laptop _perhaps_ to be
competitive, but if the workload substantially hits cache on the server, then
the laptop, with its smaller cache, may be waiting longer.

> But looking at [the server cpu numbers] per core makes them much worse

But for what benchmark? (Sorry if you said and I missed it.)

~~~
inetknght
If you have a memory-bound problem running on a Xeon with a couple dozen
cores... you either have a couple dozen memory-bound problems (which brings
into scope the cache per core concept) or you're using the CPU wrong.

------
jbverschoor
Conclusion, which you should already know: cloud providers offer obsolete
hardware for a premium price

~~~
Jgrubb
Wrong conclusion, it's actually that people who build web applications and
people who know how computers work are often not the same people.

~~~
jbverschoor
Then why should you even allow them to access any server - vps or bare - at
all? Also, please remove all ops-code from the two of someone who doesn’t
understand ops is calling himself devops

------
vmception
I synced an Ethereum full node on one of Linode's most powerful instances; it
was fast, but it seemed like it should be faster. I was posting issues on
GitHub about how it wasn't working fast enough.

A lot of tutorials online are about people trying to get their Raspberry Pi to
sync blockchains, so I was prepared for the worst.

But then I put together an 8-year-old desktop and it synced in 2 days.

Kind of underwhelmed at "cloud" options. But they work.

------
qubex
Good article, but one thing that annoys me deeply is the author’s intimation
that a server ‘renders’ a web-page. A server does not ‘render’ a web-page,
that’s a client-side (browser) task (unless he’s referring to running a
browser in a Remote Desktop environment, which seems to be definitely not the
case).

~~~
mdpye
render verb

1\. provide or give (a service, help, etc.). "money serves as a reward for
services rendered"

2\. cause to be or become; make. "the rains rendered his escape impossible"

3\. represent or depict artistically. "the eyes and the cheeks are
exceptionally well rendered"

You seem to be stuck on the 3rd definition (there are more, but 1, 2 and 3
cover everything we need here to see how it's appropriate based on both the
server-side and client-side actions).

The other way of looking at it is that the server renders [something] to HTML,
and the client renders HTML to pixels.

It's quite common to refer to things being rendered into other forms; the term
is not owned by processes with a visual output.

~~~
qubex
> _You seem to be stuck on the 3rd definition_

Quite possibly. I concede the point.

> _It 's quite common to refer to things being rendered to another forms_

I think it’s my mathematical background. I tend to think of these kinds of
processes as ‘ _transformations_ ’.

~~~
mdpye
Heh, sorry to beat you over the head with it. You're certainly right that they
are transformations. That feels like a more general term (within this context,
obvs not in the context of render as "to give"), but I can't tease out what
the difference is in my understanding of them.

------
julianeon
"It works on my machine" remains a problem to this day.

While you can make various workarounds to rectify this, it still seems like
"so make your machine the machine it runs on" is a very effective solution.
Unlike the alternatives, once you commit to it, you can't forget or revert
back to bad old patterns.

I've had this thought before: What if Amazon, one day, just decreed that all
employees' laptops had to run Amazon Linux?

Maybe it's a good thing I'm not Jeff Bezos, but I think it would have a lot of
positive effects.

~~~
cowsandmilk
Why the focus on Amazon here? Why assume developers at Amazon don’t use Amazon
Linux for development?

~~~
sharpy
A lot of them do. In fact, a laptop plus a cloud desktop running Amazon Linux
is the standard setup now.

