
Knuth's Challenge: Analyze everything your computer does in one second - gaxun
https://www.gaxun.net/commentary/knuth-challenge/
======
khedoros
The code running on the CPU isn't the only thing the computer is doing in that
one second; the x86{,_64} opcodes we could capture aren't necessarily exactly
what the CPU is doing, and code being run by extra controllers and processors
isn't likely to be accessible to anyone but the manufacturer.

In 1989, we'd've also been looking at code running on a single core with a
single-task or cooperative-multitasking OS (for most home computers, anyhow),
with simpler hardware that an individual could completely understand, and it
would run at a speed where analyzing a second of output wouldn't be completely
beyond the pale.

I've analyzed CPU logs from DOS-era programs and NES games. I certainly
haven't analyzed a full second of the code's execution; I'm usually focused on
understanding some particular set of operations.

~~~
gaxun
> simpler hardware that an individual could completely understand

Our hardware now is so much more complex; what has it gained us? The quick
answer is performance, but is that true? What about correctness? It's hard to
prove either way, but my guess is we've gained a little bit on performance and
lost on correctness.

~~~
khedoros
My computer certainly does more than the one from 27 years ago did: more
operations performed in a second, more data storage capacity, more
intercommunication options, a larger selection of devices I can interface
with it, and so on. Some of these are just changes in magnitude, and the
changes in the software are just as important as the changes in the hardware.

There are more parts of the computer (again, both hardware and software) that
are undocumented. Taken as a whole, the system is more capable, but closed
hardware and software make me wonder about capabilities in my computer that
serve someone other than me.

~~~
drvdevd
Personally, I like to occasionally remember that there's not a single
computing device in my _life_ I really _own_. It can be either a comforting
piece of knowledge or a frightening one depending on your perspective. I like
to take the former perspective.

~~~
27182818284
Ehhh. In a sense, sure, but lots of people would consider that they've owned
a loaf of bread without knowing Wonder's recipe.

~~~
drvdevd
I see what you mean, but also no one ever owned a loaf of bread that could be
cut, toasted and buttered remotely by a totally unknown 3rd party either...

~~~
24gttghh
Unless you go to the local diner for breakfast and don't know the line-cook...

~~~
drvdevd
Good point. You walk into the diner implicitly trusting the line cook and the
bread baker and so on. Very similar to the chain of trust implicit in the tech
we use. Only I believe the chain of trust in the tech to be much longer and
proven to have been repeatedly broken in recent decades. Also, line cooks have
the luxury of tossing out loaves of bread that have moulded over.

~~~
24gttghh
:) They can even cut off the mouldy bits, preserving the rest of the
breakfast.

------
moyix
This is totally doable. PANDA [1], e.g., takes recordings of full-system
execution by recording non-deterministic hardware inputs in a modified QEMU.
This is much more compact than a full instruction trace, but you can then
replay the recording to produce instruction traces, memory traces, whatever.

We recently added some machinery [2] for using debug symbols to map each
instruction back to its source line. So, assuming you can get debug symbols
installed for every userspace program, every library, and the kernel, I think
you could come very close to tying every instruction back to a source line.
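
For a flavor of what that mapping step looks like in the simplest case, here
is a rough sketch (generic tooling, not PANDA's plugin API; the trace file,
binary name, and the no-ASLR assumption are all hypothetical) that resolves
raw instruction addresses against a binary's debug info with binutils'
addr2line:

    # Hypothetical sketch: map instruction addresses from a trace file
    # back to source lines using binutils' addr2line. Assumes the binary
    # was built with debug info (-g) and that trace addresses match the
    # binary's load addresses (no ASLR, or already rebased).
    import subprocess

    def addr_to_source(binary, address):
        # addr2line -f prints the function name, then file:line;
        # -C demangles C++ names.
        out = subprocess.run(
            ["addr2line", "-e", binary, "-f", "-C", hex(address)],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return f"{out[0]} at {out[1]}"

    # "trace.txt" (one hex program counter per line) and "myprog" are
    # made-up names for illustration.
    with open("trace.txt") as trace:
        for line in trace:
            print(addr_to_source("myprog", int(line, 16)))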

There are caveats, though – the overhead of QEMU and the recording
infrastructure mean you only end up getting around 200 million instructions /
second, which is nothing compared to modern bare metal. You could capture a
longer trace, though, and do the same thing to get up to the same amount of
code executed as on real hardware.
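
In concrete terms (taking ~10^10 instructions/second as a rough ballpark for
modern bare metal; both figures below are assumptions for illustration):

    # How long to record under emulation to cover as much code as one
    # second of bare-metal execution. Ballpark figures only.
    emulated_rate = 2e8  # instructions/second under the modified QEMU
    native_rate = 1e10   # rough modern bare-metal figure

    print(native_rate / emulated_rate, "seconds of recording")  # 50.0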

If someone wants to try this I'd be very interested to see the results and
happy to help answer any questions that come up!

[1] [https://github.com/moyix/panda](https://github.com/moyix/panda)

[2]
[https://github.com/moyix/panda/blob/master/qemu/panda_plugin...](https://github.com/moyix/panda/blob/master/qemu/panda_plugins/pri_trace/pri_trace.cpp)

~~~
pm215
This is getting perilously close to "attempting to extract performance
information from a program running under a simulator/CPU model", which is
really tricky to do in a way that gives you results that apply to actual
hardware. In particular, the performance characteristics of QEMU are wildly
different from real hardware in several significant ways: (1) the emulated FPU
is incredibly slow, so the results will overweight anything that's FP-
intensive (2) there is no modelling of CPU caches, TLBs or branch predictors,
so these major influences on real-hardware performance won't be visible (3)
because the emulated CPU is very slow, interrupt handler and similar code
which runs triggered by timers or other external events will take up more time
in the trace than it would on hardware.

You'd probably find something interesting in the general sense from looking at
what's going on in a QEMU emulation for a fixed time period, but you'd need to
be rather wary about how applicable what you saw might be to a real hardware
run. At minimum you'd want to cross-check against what perf on real hardware
revealed.

------
roel_v
This line stood out to me:

"It might be shocking if an erroneous program was discovered, but it could
certainly happen."

Is anyone actually under the impression that the thousands or tens of
thousands of components that interact with each other at any given time on a
typical desktop computer would be "correct", even for generous definitions of
that word (well, that are less generous than "most of the time generally do
what the designers set out to, generally, accomplish")?

It seems to me that all components in computers of at least the last decade,
not to mention their interactions, are so complex that they almost certainly
are full of small and not so small errors; they are deployed as soon as the
most obvious and obnoxious errors have been removed but there must be heaps of
things going on that most people would agree are "errors". I'm continually
amazed that we manage to build (or rather, 'assemble') systems that most of
the time work at all.

------
EdwardCoffin
I think perhaps some alterations would really be necessary to make this
analysis tractable. He wrote this in 1989, so how many doublings in
performance have we had since then? The duration should be adjusted
accordingly: analyze everything the computer does in, say, a thousandth of a
second, or even less.
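
As a back-of-envelope answer (the instruction rates below are ballpark
assumptions, not measurements), the adjusted window comes out even smaller
than a thousandth of a second:

    # Shrink Knuth's one-second window so it covers roughly the same
    # number of instructions as in 1989. Ballpark figures only.
    insns_1989 = 5e5  # "several hundred thousand instructions" per second
    insns_now = 1e10  # rough modern figure, all cores combined

    window = insns_1989 / insns_now
    print(f"equivalent window today: {window} s")  # 5e-05 s, ~50 microseconds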

Another thing is that he was surely envisioning a program written in C or the
like - a straightforward translation from the program to executable with some
optimization, and all of the program optimized to the same level. With JITs,
one would have to take a few steps back to determine whether the code had been
optimized because the JIT had determined it would be a good idea, not
optimized because it just wasn't important, or possibly not optimized _yet_
simply because the JIT had not seen enough of the program to decide whether to
bother.

There's also the idea of memory hierarchies influencing how things are done in
ways not necessarily obvious to someone focusing on the code being executed at
the moment. I think memory hierarchies (I'm thinking here of the L1, L2, and
L3 caches in modern processors) have a much greater impact on how optimal code
is written now than they did back then. Perhaps the code one examined could
have been written better in isolation, but was deliberately made less optimal
to avoid detrimental effects on more important code elsewhere in the program
(like displacing data that other code had cached).

I'm not really sure that this exercise would be worth it today, except in
special cases, trying to wring every last bit of performance out of a program
after less tedious avenues had been exhausted. I can't say that I have had a
whole lot of need for such performance.

This idea of close analysis of a part of one's program does remind me of
something else though: the idea that one should run one's program in a source-
level debugger and step through every line of code one can get to, trying to
get 100% code coverage, and contemplate the state of the program at every
step. I think this would uncover many latent bugs, hidden assumptions and the
like in most programs. I guess what I am trying to say here is that
correctness is more important than performance, and perhaps easier to achieve
in today's world.

~~~
gaxun
> I think perhaps some alterations would really be necessary to make this
> analysis tractable.

In this post, I presented the idea as if it was "easy", but Knuth seemed to be
proposing it as a rather large undertaking. I skipped some parts of his
original prompt for brevity, but since you bring this up, I can summarize a
bit more here. I also found a copy of the address in PDF form online [0], if
you want to read the whole thing. This is from the last few pages.

He compared this task to researchers who documented every square meter of a
large tract of land to see how different factors affected plant growth. He
also mentioned a study of 250,000 individual trees in a rain forest. It's not
supposed to be easy.

Yes, we've doubled many times since then, but our power to analyze large piles
of data has also improved dramatically.

> I'm not really sure that this exercise would be worth it today

I think it really depends on what kind of system you are going to analyze. He
was probably thinking of big systems running a school or business back then.
These days there are just so many more types of machines. Most are probably
not interesting at all. Maybe some kind of life-or-death devices, though?

> correctness is more important than performance

One neat thing about this kind of lowest-level analysis is that you can
probably check on both at the same time.

[0]
[http://www.sciencedirect.com/science/article/pii/03043975919...](http://www.sciencedirect.com/science/article/pii/030439759190295D)

~~~
EdwardCoffin
You might find the following interesting.

In Carl De Marcken's Inside Orbitz email [1] he has the following item:

> 10. ... We disassemble most every Lisp function looking for inefficiencies
> and have had both CMUCL and Franz enhanced to compile our code better.

In 2001 there was a series of three panel discussions on dynamic languages [2]
that are an absolute goldmine: about six hours worth of listening, with
various luminaries discussing deep ideas and fielding questions from the
audience. Knuth is cited several times on different topics. This is also where
I learned about the idea of stepping through every line of code you can get to
(Scott McKay brought this up in the panel on runtime [3]); you ought to be
able to find the other two panels (compilation and language design) from that
one. Anyway, they discuss a lot of ideas behind performance, for example:

a) code that is locally bad but globally good

b) optimizing for locality and predictability of memory access (David Moon, in
the Compilation panel, I think)

c) speculation that performance improvement could be gained via having an
efficient interpreter residing in cache, over optimized compiled code (Scott
McKay again, in the panel on runtime - incidentally I think this idea is
proven in Kdb+ - at least, I understand that is their secret to performance,
or one of them)

[1] [http://www.paulgraham.com/carl.html](http://www.paulgraham.com/carl.html)

[2] [http://www.ai.mit.edu/projects/dynlangs/wizards-
panels.html](http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html)

[3] [https://www.youtube.com/watch?v=4LG-
RtcSYUQ](https://www.youtube.com/watch?v=4LG-RtcSYUQ)

Edit: cleaned up formatting Edit 2: and more clean up

------
Someone
_" The computer will execute several hundred thousand instructions during that
second; I'd like you to study them all."_

That _" several hundred thousand"_ has grown by about four orders of magnitude
since then, even more if we consider multi-threading, GPU, CPUs in the network
controller, etc.

Because of that, that task has become a lot more work. It is doable for
10^5-10^6 instructions (certainly for someone with Knuth's work ethic), but
for 10^9-10^10 instructions, I guess even he would need to write tooling to do
it.

A problem with tooling is that it may lump interesting behavior that the
programmer writing the tooling isn't aware of into a 'misc' category of
executed code. I fear that may defeat the purpose of this exercise.

~~~
Klathmon
Couldn't you adjust the amount of time that you look at to preserve the
"purpose" of the exercise?

Make it 1/10th of a second, or 1/1000th if needed.

~~~
hderms
Interesting question; I'm guessing the mix of instructions would probably be
different, owing at the very least to the gap between CPU power and I/O that
has grown over time.

------
ideonexus
Fascinating question, reminds me of this one that made HN almost two years
ago: “What happens when you type Google.com into your browser and press
enter?”

[https://github.com/alex/what-happens-when](https://github.com/alex/what-
happens-when)

Would love to see Knuth's Challenge set up in a repo for collaboration.

HN Discussion on the above link:

[https://news.ycombinator.com/item?id=8902105](https://news.ycombinator.com/item?id=8902105)

~~~
contingencies
Sad that it has 57 issues and 27 pull requests, and begins with _When you just
press "g" the browser receives the event and the entire auto-complete
machinery kicks into high gear..._ Not exactly a low-level explanation.

I was expecting non-keyboard input methods, keyboard scan codes, keymaps,
input modes, fonts, glyph selection, screen resolution and rendering. Scan
codes _are_ mentioned in the next paragraph, but the rest are ignored (OK,
they finally mention DOM rendering, but leave out the earlier screen updates).
Two seconds later they are discussing ARP... really? What about
virtualization, sandboxing, firewalling, link layer selection, use of multiple
concurrent link layers for a given network-layer address range, non-ARP link
layers, existing transport-layer sessions to the relevant hosts, VPNs,
proxies, offline mode, caches and load balancers (possibly multiple layers),
non DNS-based name resolution, differences between available IPv4 and IPv6
address classes (mobile IP), layer 2 transparent proxying (eg. proxy
ARP/heartbeat), switch ARP caches, etc.

I also think this is an excellent interview question for any internet-related
engineering role, as it really gives people a chance to show their degree of
comprehension of many layers.

~~~
drchickensalad
Know anything else that actually lists all that? I'd love to research each
thing on its own independently but it's so hard to come up with the list
without already knowing.

~~~
contingencies
_I'd love to research each thing on its own independently but it's so hard to
come up with the list without already knowing_

I would recommend examining from different perspectives: network activity or
options at each layer (wireshark), browser activity (firebug, debugger, or
read the code), OS activity (debugger, or read the code), program activity
(debugger, or read the code, or learn various system monitoring tools like
filesystem monitors, kernel or library-based tracers, etc), virtualization
systems (their hardware and network emulation), network systems (proxies, load
balancers and caches of all configurations), and security systems of all
kinds. You
are right that there is probably no holistic resource, because the question is
sort of ridiculously specific.

------
drdrey
TEMU is a tool based on QEMU that is (was?) aimed at doing exactly that, a
dynamic analysis that captures instructions executed in every process
(including the kernel):
[http://bitblaze.cs.berkeley.edu/temu.html](http://bitblaze.cs.berkeley.edu/temu.html)

The problem of course is that this is not analyzing the physical machine but
rather the behavior of programs in a virtual environment.

------
wyldfire
> There are definitely people who profile their applications ... the entire
> system, though? Is there a utility I can use to log every instruction
> performed in one second for an operating system running on bare metal?

Kinda. oprofile and later perf use the hardware's performance counters. So,
it's sampled and therefore it's not "every instruction performed" but the
scope is indeed the entire system. If your hardware doesn't support it, then
the kernel has a sampling feature on a timer. It's IMO representative and
probably the sanest way to accept the challenge.
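
For the curious, a minimal sketch of that system-wide sampling approach on
Linux looks something like this (assuming perf is installed and you have the
privileges the -a flag requires):

    # Take a one-second, system-wide sample with Linux perf and print the
    # summary. Sampled, not "every instruction performed", but the scope
    # is the whole machine (all CPUs, kernel and userspace).
    import subprocess

    # -a = all CPUs, -g = record call graphs; sample while sleep(1) runs.
    subprocess.run(["perf", "record", "-a", "-g", "--", "sleep", "1"],
                   check=True)

    # Summarize the recorded perf.data as text.
    subprocess.run(["perf", "report", "--stdio"], check=True)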

But Knuth's questions make me wonder: how could we possibly reason about
correctness from the samples? We'd be better off following the trail of
breadcrumbs back to the source unless he means something terribly specific
beyond "correct or erroneous".

> Can it be done with an emulator like QEMU?

Yes, that's probably an alternative if sampling isn't sufficient.

~~~
no_protocol
> We'd be better off following the trail of breadcrumbs back to the source

I'm thinking this is what was intended. Did something get lost along the way
with all the layers of translation from "human problem" to machine language?
Here's another quote from the source passage, linked elsewhere in this thread:

    
      The sequence of operations will be too difficult
      to decipher unless you have access to the source
      code from which the instructions were compiled.
      University researchers who wish to carry out such
      an experiment would probably have to sign
      nondisclosure agreements in order to get a look
      at the relevant source code.

------
shauncrampton
I like the idea of making a trace through a complex app (such as a browser)
and the kernel and listing off all the open source developers whose code it
passes through. Just how many people contributed to my 1s of YouTube cat
video watching pleasure?

------
CalChris
Scalar CPU clock rates were maybe 25 MHz in 1989.

Out-of-order, speculative, superscalar, hyperthreaded, multicore CPUs now run
at 2.5 GHz.
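
Putting very rough numbers on that growth (every factor below is a ballpark
assumption, for illustration only):

    # Back-of-envelope decomposition of the growth in instructions per
    # second since 1989. All factors are rough assumptions.
    clock_ratio = 2.5e9 / 25e6  # ~2.5 GHz now vs ~25 MHz then: 100x
    ipc_gain = 4                # superscalar/OOO vs ~1 instruction/cycle
    cores = 8                   # typical modern desktop, vs a single core

    print(f"~{clock_ratio * ipc_gain * cores:,.0f}x")  # ~3,200x

SIMD width and GPU offload push it further still, consistent with the "four
orders of magnitude" estimate elsewhere in this thread.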

------
ultramancool
This just makes me miss SoftICE. Just hit Ctrl+D and be immediately dropped
into exactly what your machine is doing at any time.

Of course, just doing this would be illegal on most proprietary operating
systems.

------
je42
This and similar questions would make very good job-interview material (as a
theoretical thought experiment in the interview).

------
citrin_ru
Analyzing all CPU instructions even for one second is very hard. A good place
to start is DTrace's FBT provider: capture and analyze all functions called
during one second.
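
A minimal sketch of that starting point (assuming a DTrace-capable system
such as FreeBSD, illumos, or macOS, and root privileges; note that FBT covers
kernel functions only):

    # Count every kernel function entry seen during one second using
    # DTrace's FBT provider. The aggregation prints when tick-1s fires
    # and the script exits.
    import subprocess

    d_script = """
    fbt:::entry { @calls[probefunc] = count(); }
    tick-1s     { exit(0); }
    """

    subprocess.run(["dtrace", "-n", d_script], check=True)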

Perhaps it would turn up a lot of useless work that should not be done at
all. But CPU time is cheap and a programmer's time is expensive.

------
DonHopkins
Reminds me of how Stanisław Lem went virtual and wrote fictitious reviews of
non-existent books that were far too vast to actually exist in the real world,
including one called "One Human Minute" [1]:

"One Minute", for a faux book by J. Johnson and S. Johnson: One human minute,
Moon Publishers, London - Mare Imbrium - New York 1985. The book is alleged to
be a collection of statistical tables, a compilation that includes everything
that happens to human life on the planet within any given 60 second period.

Here are some real reviews of a real book of fictitious reviews of fictitious
books [2]:

KristenR rated it really liked it. Shelves: science-fiction, short-stories,
male-author.

This volume had 3 essays, each with an interesting concept.

One Human Minute: Lem has styled this piece as a book review...of a book that
hasn't been written. One Human Minute is apparently a Guinness Book of World
Records that is completely mundane, yet also amped up on steroids. Imagine a
book that is full of tables upon tables and graphs and charts about everything
that happens on earth per minute. How many babies are born, how many people
get struck by lightning, how many people are tortured by electricity, how many
orgasms are achieved per minute...

Definitely a philosophical piece, but seemed to be musing about the depravity
of the human race. I'm not sure if I missed the point.

The Upside-Down Revolution: The evolution of military and warfare...written
under the premise that the author has a history book from the future and
publishes it in the present as science fiction. I lost interest partway
through this one.

The World as Cataclysm: I have a fascination with astrophysics. I am fully
aware that the bulk of it goes over my head and I have near zero retention,
but that doesn't stop me from reading/watching anything on the subject that is
remotely geared towards the layman. Simply Fascinating. This piece goes into
the probabilities of extraterrestrial life. I don't know what Stanislaw Lem's
qualifications are, but as I was reading this I was nodding...uh huh, that
makes sense...hmmm, I sense a little research project on Lem.

Rich Meyer rated it: really liked it. Shelves: read-in-2014.

An interesting trio of science fiction-tinged essays by one of the great
science fiction writers. The title refers to one about a book that covers what
happens on the planet every minute, and how the book is updated and
computerized until it becomes a power unto itself. The other essays follow the
search for intelligent life and the chances for finding it, and a look at the
history of warfare from the point-of-view of a book from 2150, and manages to
make some pretty accurate predictions.

[1]
[https://en.wikipedia.org/wiki/Stanis%C5%82aw_Lem%27s_fictiti...](https://en.wikipedia.org/wiki/Stanis%C5%82aw_Lem%27s_fictitious_criticism_of_nonexisting_books#Provocation_and_One_Human_Minute)

[2]
[http://www.goodreads.com/book/show/28771.One_Human_Minute](http://www.goodreads.com/book/show/28771.One_Human_Minute)

------
joakleaf
... I wonder what the human brain and body does in one second.

------
betimsl
How about using an FPGA for this? Choose an architecture and ALU, and
implement software that would monitor and output the opcode and ALU state at
any given time.

------
dmfdmf
I'd like to see this applied to Windows 7 and find out why it has become
mysteriously slow, even on fast computers, after Windows 10 came out.

