Alan Kay on Donald Knuth (twitter.com/fermatslibrary)
71 points by tosh 10 months ago | 39 comments



This seems to be quoting from https://softpanorama.org/People/Knuth/index.shtml

> When I was at Stanford with the AI project [in the late 1960s] one of the things we used to do every Thanksgiving is have a computer programming contest with people on research projects in the Bay area. The prize I think was a turkey.

> [John] McCarthy used to make up the problems. The one year that Knuth entered this, he won both the fastest time getting the program running and he also won the fastest execution of the algorithm. He did it on the worst system with remote batch called the Wilbur system. And he basically beat the shit out of everyone.

> And they asked him, "How could you possibly do this?" And he answered,

> "When I learned to program, you were lucky if you got five minutes with the machine a day. If you wanted to get the program going, it just had to be written right. So people just learned to program like it was carving stone. You sort of have to sidle up to it. That's how I learned to program."


I heard a similar story from a professor who did his graduate work in the mainframe days.

(Paraphrasing) 'When you learn to program on punch cards with batch jobs, you get really good at writing it correctly the first time. Because if you write it wrong, it's (a) an overnight run to find out, (b) another day to fix it, and (c) another overnight run to get the updated results.'


I learned to program using Turbo Pascal and kept going with Delphi. I then spent about 10 years doing mainly C++.

With compile times suddenly 10-100x longer, I found myself being much more careful before compiling.

I don't think I spent that much more time, just an extra pass looking things over, making sure I wasn't doing something obviously wrong, etc.

For some reason, it's hard to maintain that discipline now that I'm back to "instant" compilation.


On one hand, high compile times make you think about your code more deeply. No one will dispute that this is a good discipline.

On the other hand, computers can check so many more cases more quickly and more correctly. I think there's a point at which you say, "I'm going to get better results by letting the machine do the work."


I had to work 3 years to buy a computer in 2001, and in the meantime I would volunteer to do networking for internet cafes to get a few minutes of coding time after hours. When I finally bought a computer, my parents only allowed me to touch it for 24 hours each weekend, on a slow dial-up internet connection. I made those 24 hours count. After school, I would read every programming book in the bookstore, then plan, design, and sketch the UX of my projects, so by the time my 24 hours rolled around, I could just build. By the time I had finished college, I could tell you all the browser misbehaviors your code would trigger in IE and the others just by reading the code.

The 10X+ developers got there by iterating so many times that they know how to skip the traps. The more layers of abstraction pile up on top of fundamental systems, the more variable new developers' productivity becomes by default: they become attached to abstractions that can and do change, and the permutations of misguided possibilities on top of flawed foundations become so abundant that a single person can't cut through the noise. That's why additional abstraction and libraries that automatically update on top of your existing code are a recipe for short-lived work, and thus a portfolio that degrades quickly. Understandable code that doesn't move under your feet is key, because you will forget what you did long before what you did becomes outdated.


Donald Knuth on Donald Knuth:

‘The story you heard is typical of legends that are based on only a small kernel of truth. Here’s what actually happened: John McCarthy decided in 1971 to have a Memorial Day Programming Race. All of the contestants except me worked at his AI Lab up in the hills above Stanford, using the WAITS time-sharing system; I was down on the main campus, where the only computer available to me was a mainframe for which I had to punch cards and submit them for processing in batch mode. I used Wirth’s ALGOL W system (the predecessor of Pascal). My program didn’t work the first time, but fortunately I could use Ed Satterthwaite’s excellent offline debugging system for ALGOL W, so I needed only two runs. Meanwhile, the folks using WAITS couldn’t get enough machine cycles because their machine was so overloaded. (I think that the second-place finisher, using that "modern" approach, came in about an hour after I had submitted the winning entry with old-fangled methods.) It wasn’t a fair contest.

As to your real question, the idea of immediate compilation and "unit tests" appeals to me only rarely, when I’m feeling my way in a totally unknown environment and need feedback about what works and what doesn’t. Otherwise, lots of time is wasted on activities that I simply never need to perform or even think about. Nothing needs to be “mocked up.”’

https://www.informit.com/articles/article.aspx?p=1193856

(Link shared by sp332 in sibling comment)


> "unit tests" appeals to me only rarely

Give this man a node.js application with 100000 dependencies where an update from package version 34.2.4 to 34.2.5 breaks everything because the developers decided to change the order of parameters in one function just for fun.


> node.js application with 100000 dependencies where an update from package version 34.2.4 to 34.2.5 breaks everything because the developers decided to change the order of parameters in one function just for fun.

aka "totally unknown environment" which you "need feedback about what works and what doesn’t"


he was dealing with that situation already in the 01970s—he was using an extensible programming language that people kept extending, so the programs he wrote one day would break the next—which is why he designed τεχ in such a way that you can take any τεχ document from 40 years ago and render it in exactly the same way in current τεχ

one of the drop-dead showstopper tests in the τεχ release process since that time has been the 'τεχ torture test'; it's an enormous random τεχ document which has to produce byte-identical output on each new version of τεχ for it to be released, unless he can justify each difference. so he's not opposed to automated testing, he just tends to do it at a larger granularity
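
a minimal sketch of that kind of coarse-grained "golden output" test, in python; render_document, torture-test.input, and golden.out are hypothetical placeholders here, not τεχ's actual trip-test machinery:

    # compare the tool's whole output byte-for-byte against a stored reference,
    # instead of unit-testing individual pieces; any unexplained difference fails
    # (the command and file names below are made up for illustration)
    import hashlib
    import subprocess
    import sys

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # run the current build on a fixed, known input
    subprocess.run(["./render_document", "torture-test.input", "-o", "candidate.out"],
                   check=True)

    if digest("candidate.out") != digest("golden.out"):
        sys.exit("output differs from the golden file; every difference must be justified")
    print("byte-identical with the golden output")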


I think that says more about programming today than it does about Knuth.


Ok, that's fine but that's not what he does.


> fortunately I could use Ed Satterthwaite’s excellent offline debugging system for ALGOL W

What is he referring to here? What's an "offline" debugger?


The earliest reference I can find for this story is https://softpanorama.org/People/Knuth/index.shtml

However, Knuth disputed it: "My program didn't work the first time." https://www.informit.com/articles/article.aspx?p=1193856


It's an interesting idea, that scarcity and inconvenience could make one a more thoughtful practitioner.

A similar situation applies to writing. When people wrote with quills on parchment under candlelight, they did the structuring and editing in their minds beforehand. Now, word processors encourage us to vomit words onto the page and rarely do we edit them down to a similar level of quality.


It's cool that scarcity can breed innovation as well. An example I like is how the Swiss went from manufacturing general watches to very successful luxury ones, because labour was scarce and they decided to pivot given their constraints at the time.


Constraints (to the degree they are not debilitating) can trigger creativity, unlock efficiency and sharpen problem solving skills.

Abundance is better than scarcity, but it can lead to bloat and waste. With strong discipline and a value system that promotes lean approaches, one might be able to have the best of both worlds.


Unix/Berkeley vs. Lisp/MIT.


The "worse is better" vs. The Right Thing™ dichotomy encompasses more than availability of resources. It's perfectionism vs. "don't let the perfect be the enemy of the good" 80/20 expediency.


I’m 39 now, and I started to code when I was 8. There were a few things that I think helped a lot:

Having no internet forced me to think for myself a lot when I got stuck, instead of immediately googling.

Immediately googling a problem is obviously better for productivity, BUT not having that ability grew my working memory a lot during my formative years. I still have a very capable working memory, which allows me to hold large amounts of code (deep call stacks and state) in my head and therefore understand unknown code a lot more fully and a lot faster than others. Being able to hold that much state means I can work on compilers, kernels, and deep learning frameworks with a holistic view, which leads to more consistent interface design and data structures.

Fewer distractions also grew my focus and attention. I am shocked at how little programmers can focus these days; they can hardly hold attention for more than 20 seconds without jumping to a website, their phone, or another task.

The system I started with (VIC-20) booted into a BASIC interpreter - you had no choice but to learn to code!


You might just be gifted with great memory. I'm not sure working memory can really be trained in a significant way. Happy to be proven wrong, of course.


One pretty strong argument against that claim comes from an interesting experiment with world-class chess players. When you give top players positions from real games, they can often memorize dozens of boards, which makes for pretty impressive blind simultaneous matches (I think the world record is in the dozens). But when you hand them illogical or random positions that never occur in real games, they can maybe memorize one or two boards, only a little better than an amateur with reasonable memory.

You could even try this with programming. Have a programmer memorize a page of code, then try it with someone who has never coded. Pattern recognition, which is learned, has a huge impact on the ability to memorize. It's sort of intuitive why: if I see three functions and can recognize that they're fizzbuzz, recursive fibonacci, and bubble sort, all I need to remember is the variable names. My dad, who has never seen code, would have to recall every symbol.
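
For illustration, here is roughly what such a page might contain, as a Python sketch; someone who has seen these patterns before only needs to remember three function names and a few variable names, while a newcomer has to recall every token:

    # three textbook patterns an experienced reader "chunks" at a glance

    def fizzbuzz(n):
        for i in range(1, n + 1):
            if i % 15 == 0:
                print("FizzBuzz")
            elif i % 3 == 0:
                print("Fizz")
            elif i % 5 == 0:
                print("Buzz")
            else:
                print(i)

    def fib(n):
        # the classic (exponential-time) recursive Fibonacci
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    def bubble_sort(xs):
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs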


Great point. That sounds a bit like compression.


It is a lossy form of compression. You don't remember exact copies of all of your sensory inputs; your brain abstracts and composes at every level of abstraction.


They also mentioned a lack of distractions, allowing them to grow their attention and focus. I'm pretty sure all of those will affect one's memory.

Sometimes the wonders of modern computers are as much of a trap as they are a benefit. It was far easier to separate yourself from distractions when the tools were not a source of distraction in their own right. If the television was a distraction, you simply turned it off or went to another room. You can't simply turn off the computer or walk away from it when the task you are trying to complete is on the computer. It also helped that computers weren't quite the communications and entertainment devices that they are today. That is especially true of personal computers until the early '90s, when one had to quit a task, rather than momentarily switch away from it, to do something else.

If you have that attention or focus, modern computers are absolutely amazing productivity tools. If you don't, it is an uphill battle to stay on task. Older computers may not have been nearly as amazing as productivity tools, but at least you didn't have to face that battle.


it can definitely be enhanced by drugs; i've found that modafinil boosts my reverse-digit-span score from about 7 digits to about 11 digits


Dijkstra told similar stories. Working in the immediate post-war Netherlands, he got something like 30 minutes of machine time a week. I may be wrong on the exact amount of time, but it was extremely limited.

I believe that, at least until recently, quite a lot of the continental European approach to computing science was due to the culture those early pressures created.


In the USA, in the '70s, I had an account on a university computer that was good for 4 hours in a month. I think that was just a fairly normal thing in the old days, when almost nobody had an Apple ][.


As I look at the thoughtful comments, I think I omitted a detail that would help: the computer in question was a PDP-11 with interactive timesharing.


also, i think it's a pretty large amount of computer time! it was enough to do things every month that would have required decades or centuries with a pocket calculator

if you were using an ibm 370 model 168 http://www.bitsavers.org/pdf/ibm/370/funcChar/GA22-7010-4_37..., you were running on a machine with 1 to 8 mebibytes of ram (dedicated entirely to your program if in batch mode), 16 mebibytes of virtual memory, an l1 cache which i think was 8 kibibytes by default, 1.3 megabytes per second of i/o per channel (with up to 12 channels), and a fixed-point (32-bit) multiply took 780 nanoseconds, so you could do about 1.3 million multiplies per second. your data would often be on magtape, which moved 2.9 meters per second, which would be 700 kilobytes per second at 6250 characters per inch (introduced 01973). 4 hours would be 18 billion multiplies

the apple ][, by comparison, could have 64 kibibytes of ram, and read or write a floppy at about two kilobytes per second and run about 300'000 instructions per second, delivering 0.021 dhrystone mips on an apple //e, according to https://netlib.org/performance/html/dhrystone.data.col0.html. so, coming at it from a batch processing perspective, this particular 370 model had about 64 times as much ram as a maxed-out apple ][, about 64 times as much computing power (maybe 1024× for floating-point stuff), and i/o on the order of 2048 times faster

so those 4 hours of computer would allow you to process a quantity of data on magtape that would take about a year swapping floppies on an apple ][, or do an amount of calculation the apple would need weeks or months for

suppose instead of the apple your alternative was an hp-35 non-programmable pocket calculator, introduced in 01972, or more or less equivalently, a slide rule. you need about 2 seconds to enter each new operand and press an operation key. we can derate those 18 billion multiplies in the 4-hour monthly allocation to about 5 billion total operations if we include cosines, square roots, etc. this would take you 300 years to do by hand on the hp-35. on a programmable hp-65 (introduced 01974) maybe only 30 years
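
a quick python sketch to reproduce the arithmetic above, using only the figures already quoted (780 ns per multiply, 2.9 m/s tape at 6250 chars per inch, 2 s per keyed operation on the hp-35):

    seconds = 4 * 3600                        # the 4-hour monthly allocation
    multiplies = seconds / 780e-9             # 780 ns per 32-bit multiply
    print(f"{multiplies:.2e} multiplies")     # ~1.85e10, i.e. about 18 billion

    tape_bytes_per_s = (2.9 / 0.0254) * 6250  # 2.9 m/s at 6250 chars per inch
    print(f"{tape_bytes_per_s:.0f} bytes/s")  # ~714000, i.e. about 700 kB/s

    hand_ops = 5e9                            # the derated operation count above
    years = hand_ops * 2 / (365.25 * 86400)   # 2 seconds per keyed operation
    print(f"{years:.0f} years by hand")       # ~317, i.e. about 300 years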

the enormous benefit of the apple ][ was that you got millisecond turnaround instead of hundred-kilosecond turnaround. for many purposes, those eight orders of magnitude less latency more than compensated for the two or three orders of magnitude you were sacrificing in throughput

when i started using time-shared unix machines in the 01990s (according to the dhrystone link, the decstation 3100 was 13.4 dhrystone mips, so roughly 16 times the mainframe described above) i would log into a shell server with 16-64 other concurrent users. the default cpu time limit was, i think, 10 minutes. that is, when you launched a program, if it used 10 minutes of cpu, the os would kill it. before netscape, the only time i ever saw this happen was when i accidentally wrote infinite loops. i could use an interactive program like trn or tcsh or emacs for hours and hours without hitting that limit


looking at the docs for the ibm 370 model 165 (https://bitsavers.org/pdf/ibm/370/systemGuide/GC20-1730-0_37... from 01970, several years before the model 168 mentioned above) i came across these more detailed notes on cpu performance

> The CPU has an 80-nanosecond cycle time and an 8-byte-wide data path. (...) Among the major elements in the CPU are the instruction unit, the execution unit, local storage, and control storage.

> (...) The faster internal performance of the Model 165 is due in part to the use of more concurrence in CPU operations than is implemented in the Model 65. The Model 165 CPU contains an instruction unit and an execution unit that overlap instruction fetching and preparation with instruction execution. The Model 165 instruction unit is controlled by logic circuits and can process several instructions concurrently while the execution unit is executing a single instruction. The instruction unit prefetches instructions (maintaining them in sequence), decodes instructions, calculates addresses, prefetches instruction operands, and makes estimates of the success of conditional branches. When a conditional branch is encountered, the instructions immediately following the branch and those located at the branch address are prefetched and placed in separate instruction buffers within the instruction unit. Two 16-byte instruction buffers are used. This insures the availability of prefetched instructions whether the branch is taken or not.

> The execution unit is microprogram controlled and can execute one instruction at a time. It has the capability of processing a new instruction every cycle. Emphasis is placed on optimizing fixed binary and floating-point arithmetic operations. A 64-bit parallel adder is used to perform binary and floating-point arithmetic, while an 8-bit serial adder is used in the execution of packed decimal arithmetic.

processing a new instruction every 80-nanosecond cycle would be a peak performance of 12.5 mips, although presumably many instructions, in particular including multiplication, apparently took longer than that; i think the average was over 2 cycles per instruction. the 12.5 megahertz clock was the same on the model 168 i described above

i'm not an ibm fan. but ibm mainframes might be the central example of 'the kind of large computer a person at a university in the 01970s might be allocated a few hours a month on'

there's a wikipedia article about this particular model, mentioning it was discontinued in 01977: https://en.wikipedia.org/wiki/IBM_System/370_Model_165

there's a table at the end of https://www.ece.ucdavis.edu/~vojin/CLASSES/EEC272/S2005/Pape... describing the clock speed, cache size and associativity, cache line size, tlb size, etc., of numerous 360 and 370 models, including the 165 and 168, unfortunately lacking year of introduction. if someone else is doing this kind of investigation they may find it a useful reference

ken shirriff has an overview of the 360 line at http://www.righto.com/2019/04/iconic-consoles-of-ibm-system3...

nyt published an article on the introduction of the model 165 at https://www.nytimes.com/1970/07/01/archives/ibm-shows-2-new-... ('i.b.m. shows 2 new computers', gene smith, 01970-07-01) with the crucial detail:

> A Model 165 with one million bytes would rent for $98,715 a month and sell for $4,674,160.

this allows us to calculate that, at the rental price and assuming 80% capacity utilization, each hour of effective computation cost 169 dollars
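
a tiny python check of those two numbers (the 12.5 mips peak mentioned earlier and the 169 dollars per effective hour), again using only the figures quoted above:

    peak_mips = 1 / 80e-9 / 1e6           # one instruction per 80 ns cycle
    print(peak_mips)                      # 12.5

    monthly_rent = 98_715                 # dollars, for the 1-mebibyte model 165
    hours = 365.25 * 24 / 12 * 0.80       # hours in a month at 80% utilization
    print(round(monthly_rent / hours))    # 169 dollars per effective hour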

lynn wheeler's memoirs https://www.garlic.com/~lynn/subtopic.html are an irreplaceable source for information about this kind of thing; https://www.garlic.com/~lynn/2023b.html#0 discusses the history of development of the 370

in https://narkive.com/9nl6cj2Q.3 (which is by wheeler but which i can't find in his own web pages) he says the model 168 was 3.0 mips, while the 370/195 (which i think never shipped) reached 10 mips peak

http://www.beagle-ears.com/lars/engineer/comphist/model360.h... says the 370 model 168 was 1.6 cycles per instruction, reaching 3.5 mips on the model 168-3, although 12.5÷1.6 = 7.8, not 3.5

(why am i not an ibm fan? consider this quote from chris date about why his seminal book on databases was delayed for 2 crucial years from https://archive.computerhistory.org/resources/access/text/20... :

> I wrote it very quickly, but I was in IBM. And for someone in IBM in those days to publish something, it had to go through the clearance procedures in IBM. And so I submitted it to the clearance procedure, and of course the problem with the book from an IBM point of view was that it was not coming out saying that IMS was the greatest product ever invented. So everybody who reviewed it in the clearance procedure did two things. First, they found something I had to change. And second, they found somebody else who had to review it. This process was clearly going to go on for a long time! (...) I did actually hawk a proposal and sample chapters around to several publishers, and I got some proposed contracts, but I wasn’t getting the clearance.

)


Transcript with tesseract:

When I was at Stanford with the AI project [in the late 1960s] one of the things we used to do every Thanksgiving is have a programming contest with people on research projects in the Bay area. The prize I think was a turkey.

McCarthy used to make up the problems. The one year that Knuth entered this, he won both the fastest time getting the program running and he also won the fastest execution of the algorithm. He did it on the worst system with remote batch called the Wilbur system. And he basically beat the shit out of everyone.

And they asked him, "How could you possibly do this?" And he answered, "When I learned how to program, you were lucky if you got five minutes with the machine a day. If you wanted to get the program going, it just had to be written right. So people just learned to program like it was carving in stone. You sort of have to sidle up to it. That’s how I learned to program."

#--

Made with:

      wget -c "https://nitter.esmailelbob.xyz/pic/orig/media%2FGM0SuezXkAANlXk.jpg" -O a.jpg
      tesseract -l eng a.jpg a   # tesseract appends ".txt", so this writes a.txt

https://twiiit.com will redirect any Twitter URL to a Nitter instance, so you can read any post even with Dillo, Lynx, or JS-less browsers. The image is then one click away; just download it.


this is great, thank you so much

i didn't realize that tesseract had improved so much!


is fermatslibrary alan kay's twitter account? if not, is there a better source for this?


I think he has an HN account.


He has two.

I have the honor of being the person that incited him to create his first account, which he only used once, to improve something that I said.

https://news.ycombinator.com/user?id=alanone1




that's irrelevant if he hasn't posted whatever this is (apparently https://news.ycombinator.com/item?id=40269201) on hn


Every tweet should be rendered with TeX like this one.



