> When I was at Stanford with the AI project [in the late 1960s] one of the things we used to do every Thanksgiving is have a computer programming contest with people on research projects in the Bay area. The prize I think was a turkey.
> [John] McCarthy used to make up the problems. The one year that Knuth entered this, he won both the fastest time getting the program running and he also won the fastest execution of the algorithm. He did it on the worst system with remote batch called the Wilbur system. And he basically beat the shit out of everyone.
> And they asked him, "How could you possibly do this?" And he answered,
> "When I learned to program, you were lucky if you got five minutes with the machine a day. If you wanted to get the program going, it just had to be written right. So people just learned to program like it was carving stone. You sort of have to sidle up to it. That's how I learned to program."
I heard a similar story from a professor who was doing his graduate work in mainframe days.
(Paraphrasing) 'When you learn to program on punch cards with batch jobs, you get really good at writing it correctly the first time. Because if you write it wrong, it's (a) overnight processing to find out, (b) another day to fix it, and (c) another overnight run to get the updated results.'
On one hand, long compile times make you think about your code more deeply. No one will dispute that this is a good discipline.
On the other hand, computers can check so many more cases more quickly and more correctly. I think there's a point at which you say, "I'm going to get better results by letting the machine do the work."
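For a concrete (if contrived) sketch of what "letting the machine do the work" can look like, here's a toy brute-force checker; my_sort is a made-up stand-in for whatever function you'd actually be testing:

    import random

    def my_sort(xs):
        # made-up stand-in for the code under test
        return sorted(xs)

    # check a couple hundred thousand random cases; the machine does in
    # seconds what no amount of desk-checking could cover
    for _ in range(200_000):
        xs = [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]
        out = my_sort(xs)
        assert sorted(out) == sorted(xs), (xs, out)              # same elements
        assert all(a <= b for a, b in zip(out, out[1:])), out    # nondecreasing
    print("no counterexample found")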
I had to work 3 years to buy a computer in 2001, and in the meantime I would volunteer to do networking for internet cafes to get a few minutes of coding time after hours. When I finally bought a computer, my parents only allowed me to touch it for 24 hours each weekend, over a slow dial-up internet connection. I made those 24 hours count. After school, I would read every programming book in the bookstore and plan, design, and work out the UX of my projects, so by the time my 24 hours rolled around, I could just build. By the time I had finished college, I could tell you every browser misbehavior your code would trigger in IE and the others just by reading the code.
The 10X+ developers got there by iterating so many times that they know how to skip the traps. The more layers of abstraction pile up on top of fundamental systems, the more variable new developers' productivity becomes by default, because they become attached to abstractions that can and do change, and because the permutations of misguided possibilities on top of flawed foundations become so abundant that a single person can't cut through the noise. That's why additional abstraction and libraries that automatically update on top of your existing code are a recipe for short-lived work and thus a portfolio that degrades quickly. Understandable code that doesn't move under your feet is key, because you will forget what you did long before what you did becomes outdated.
‘The story you heard is typical of legends that are based on only a small kernel of truth. Here’s what actually happened: John McCarthy decided in 1971 to have a Memorial Day Programming Race. All of the contestants except me worked at his AI Lab up in the hills above Stanford, using the WAITS time-sharing system; I was down on the main campus, where the only computer available to me was a mainframe for which I had to punch cards and submit them for processing in batch mode. I used Wirth’s ALGOL W system (the predecessor of Pascal). My program didn’t work the first time, but fortunately I could use Ed Satterthwaite’s excellent offline debugging system for ALGOL W, so I needed only two runs. Meanwhile, the folks using WAITS couldn’t get enough machine cycles because their machine was so overloaded. (I think that the second-place finisher, using that "modern" approach, came in about an hour after I had submitted the winning entry with old-fangled methods.) It wasn’t a fair contest.
As to your real question, the idea of immediate compilation and "unit tests" appeals to me only rarely, when I’m feeling my way in a totally unknown environment and need feedback about what works and what doesn’t. Otherwise, lots of time is wasted on activities that I simply never need to perform or even think about. Nothing needs to be “mocked up.”’
Give this man a node.js application with 100000 dependencies where an update from package version 34.2.4 to 34.2.5 breaks everything because the developers decided to change the order of parameters in one function just for fun.
> node.js application with 100000 dependencies where an update from package version 34.2.4 to 34.2.5 breaks everything because the developers decided to change the order of parameters in one function just for fun.
aka "totally unknown environment" which you "need feedback about what works and what doesn’t"
he was dealing with that situation already in the 01970s—he was using an extensible programming language that people kept extending, so the programs he wrote one day would break the next—which is why he designed τεχ in such a way that you can take any τεχ document from 40 years ago and render it in exactly the same way in current τεχ
one of the drop-dead showstopper tests in the τεχ release process since that time has been the 'τεχ torture test'; it's an enormous random τεχ document which has to produce byte-identical output on each new version of τεχ for it to be released, unless he can justify each difference. so he's not opposed to automated testing, he just tends to do it at a larger granularity
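the same coarse-grained idea is easy to sketch for any program with deterministic output: keep a known-good reference file around and refuse to ship if a new build's output differs by even one byte. a minimal python sketch (the renderer command, input document, and reference path are all placeholders, not anything from τεχ's actual release process):

    import subprocess
    import sys
    from pathlib import Path

    # placeholders: substitute the real renderer, torture input, and stored reference
    CMD = ["./render", "torture-input.txt"]
    REFERENCE = Path("golden/torture-output.bin")

    new_output = subprocess.run(CMD, capture_output=True, check=True).stdout

    if new_output != REFERENCE.read_bytes():
        sys.exit("output differs from the golden reference; justify every byte or fix it")
    print("byte-identical with the reference, ok to release")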
It's an interesting idea, that scarcity and inconvenience could make one a more thoughtful practitioner.
A similar situation applies to writing. When people wrote with quills on parchment under candlelight, they did the structuring and editing in their minds beforehand. Now, word processors encourage us to vomit words onto the page and rarely do we edit them down to a similar level of quality.
It's cool that scarcity can breed innovation as well. An example I like is how the Swiss went from manufacturing ordinary watches to very successful luxury ones: they were short on labour and decided to pivot given their constraints at the time.
Constraints (to the degree they are not debilitating) can trigger creativity, unlock efficiency and sharpen problem solving skills.
Abundance is better than scarcity but it can lead to bloat and waste. With strong discipline and a value system that promotes lean approaches one might be able to have the best of the two worlds.
The "worse is better" vs. The Right Thing™ dichotomy encompasses more than the availability of resources. It's perfectionism vs. "don't let the perfect be the enemy of the good" 80/20 expediency.
I’m 39 now, and I started to code when I was 8. There were a few things that I think helped a lot:
No internet forced me to think for myself a lot when I got stuck, instead of immediately googling.
Immediately Googling a problem is obviously better for productivity, BUT not having that option grew my working memory a lot during my formative years. I still have a very capable working memory, which allows me to hold large amounts of code (deep call stacks and state) and therefore understand unknown code a lot more fully and a lot faster than others. Being able to hold a lot of state in memory means I can work on compilers, kernels, and deep learning frameworks with a holistic view, which leads to more consistent interface design and data structures.
Fewer distractions also grew my focus and attention abilities. I am shocked at how poorly programmers can focus these days; they can hardly hold their attention for more than 20 seconds without jumping to a website, their phone, or another task.
The system I started with (VIC20) booted into a BASIC interpreter - you had no choice but to learn to code!
You might just be gifted with great memory. I’m not sure working memory can really be trained in a significant way. Happy to be proven wrong of course.
One pretty strong argument against that claim comes from an interesting experiment with world-class chess players. When you give top players positions from real games, they can often memorize dozens of boards, which makes for pretty impressive blind simultaneous matches; I think the world record is in the dozens. But when you hand them illogical or random positions that never occur in real games, they can maybe memorize one or two boards, only a little better than an amateur with a reasonable memory.
You could even try this with programming. Have a programmer memorize a page of code, then try it with someone who has never coded. Pattern recognition, which is learned, has a huge impact on the ability to memorize. It's sort of intuitive why: if I see three functions and can recognize that they're fizzbuzz, recursive Fibonacci, and bubble sort, all I need to remember is the variable names. My dad, who has never seen code, would have to recall every symbol.
It is a lossy form of compression. You don't remember exact copies of all of your sensory inputs - your brain abstracts and composes, at every level of abstraction.
They also mentioned a lack of distractions, allowing them to grow their attention and focus. I'm pretty sure all of those will affect one's memory.
Sometimes the wonders of modern computers are as much of a trap as they are a benefit. It was far easier to separate yourself from distractions when the tools were not a source of distraction in their own right. If the television was a distraction, you simply turned it off or went to another room. You can't simply turn off the computer or walk away from it when the task you are trying to complete is on it. It also helped that computers weren't quite the communications and entertainment devices that they are today. That was especially true of personal computers until the early '90s, when one had to quit a task, rather than momentarily switch away from it, to do something else.
If you have that attention or focus, modern computers are absolutely amazing productivity tools. If you don't, it is an uphill battle to stay on task. Older computers may not have been nearly as amazing as productivity tools, but at least you didn't have to face that battle.
Dijkstra told similar stories. Working in the immediate post-war Netherlands he got something like 30 minutes of machine time a week. I may be wrong on the exact amount of time, but it was extremely limited.
I believe that, at least until recently, quite a lot of the continental European approach to computing science was due to the culture those early pressures created.
In the USA, in the '70s, I had an account on a university computer that was good for 4 hours in a month. I think that was just a fairly normal thing in the old days, when almost nobody had an Apple ][.
also, i think it's a pretty large amount of computer time! it was enough to do things every month that would have required decades or centuries with a pocket calculator
if you were using an ibm 370 model 168 http://www.bitsavers.org/pdf/ibm/370/funcChar/GA22-7010-4_37..., you were running on a machine with 1 to 8 mebibytes of ram (dedicated entirely to your program if in batch mode), 16 mebibytes of virtual memory, an l1 cache which i think was 8 kibibytes by default, 1.3 megabytes per second of i/o per channel (with up to 12 channels), and a fixed-point (32-bit) multiply took 780 nanoseconds, so you could do about 1.3 million multiplies per second. your data would often be on magtape, which moved 2.9 meters per second, which would be 700 kilobytes per second at 6250 characters per inch (introduced 01973). 4 hours would be 18 billion multiplies
the apple ][, by comparison, could have 64 kibibytes of ram, and read or write a floppy at about two kilobytes per second and run about 300'000 instructions per second, delivering 0.021 dhrystone mips on an apple //e, according to https://netlib.org/performance/html/dhrystone.data.col0.html. so, coming at it from a batch processing perspective, this particular 370 model had about 64 times as much ram as a maxed-out apple ][, about 64 times as much computing power (maybe 1024× for floating-point stuff), and i/o on the order of 2048 times faster
so those 4 hours of computer time would allow you to process a quantity of data on magtape that would take about a year of swapping floppies on an apple ][, or do an amount of calculation the apple would need weeks or months for
suppose instead of the apple your alternative was an hp-35 non-programmable pocket calculator, introduced in 01972, or more or less equivalently, a slide rule. you need about 2 seconds to enter each new operand and press an operation key. we can derate those 18 billion multiplies in the 4-hour monthly allocation to about 5 billion total operations if we include cosines, square roots, etc. this would take you 300 years to do by hand on the hp-35. on a programmable hp-65 (introduced 01974) maybe only 30 years
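here's a quick sanity check of those figures as a short python sketch; it only re-derives the arithmetic from the numbers quoted above (780 ns multiply, 2.9 m/s tape at 6250 chars/inch, ~2 s per keyed operation, 5 billion derated operations), nothing else:

    SECONDS = 4 * 3600                       # the monthly 4-hour allocation
    multiplies = SECONDS / 780e-9            # 780 ns per fixed-point multiply
    print(f"multiplies in 4 h: {multiplies:.1e}")                    # ~1.8e10, i.e. ~18 billion

    tape_bytes_per_s = 2.9 / 0.0254 * 6250   # 2.9 m/s at 6250 chars/inch
    print(f"magtape read in 4 h: {tape_bytes_per_s * SECONDS / 1e9:.0f} GB")   # ~10 GB

    hand_ops = 5e9                           # the derated operation count above
    hand_years = hand_ops * 2 / (365 * 24 * 3600)    # ~2 s per keyed operation
    print(f"same work on an hp-35: {hand_years:.0f} years")          # ~300 years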
the enormous benefit of the apple ][ was that you got millisecond turnaround instead of hundred-kilosecond turnaround. for many purposes, those eight orders of magnitude less latency more than compensated for the two or three orders of magnitude you were sacrificing in throughput
when i started using time-shared unix machines in the 01990s (according to the dhrystone link, the decstation 3100 was 13.4 dhrystone mips, so roughly 16 times the mainframe described above) i would log into a shell server with 16-64 other concurrent users. the default cpu time limit was, i think, 10 minutes. that is, when you launched a program, if it used 10 minutes of cpu, the os would kill it. before netscape, the only time i ever saw this happen was when i accidentally wrote infinite loops. i could use an interactive program like trn or tcsh or emacs for hours and hours without hitting that limit
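that per-process cap is presumably the classic RLIMIT_CPU (ulimit -t) mechanism, which is still there today; here's a minimal sketch of what hitting it looks like from inside a program, using python's stdlib resource module (the 10-minute figure is just the default remembered above):

    import resource
    import signal
    import sys

    def on_cpu_limit(signum, frame):
        # the kernel sends SIGXCPU once the soft cpu-time limit is exceeded
        sys.exit("cpu time limit exceeded")

    limit = 10 * 60                                    # 10 minutes of cpu time
    resource.setrlimit(resource.RLIMIT_CPU, (limit, limit + 5))
    signal.signal(signal.SIGXCPU, on_cpu_limit)

    while True:    # the accidental infinite loop that usually triggered it
        pass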
> The CPU has an 80-nanosecond cycle time and an 8-byte-wide data path. (...) Among the major elements in the CPU are the instruction unit, the execution unit, local storage, and control storage.
> (...) The faster internal performance of the Model 165 is due in part to the use of more concurrence in CPU operations than is implemented in the Model 65. The Model 165 CPU contains an instruction unit and an execution unit that overlap instruction fetching and preparation with instruction execution. The Model 165 instruction unit is controlled by logic circuits and can process several instructions concurrently while the execution unit is executing a single instruction. The instruction unit prefetches instructions (maintaining them in sequence), decodes instructions, calculates addresses, prefetches instruction operands, and makes estimates of the success of conditional branches. When a conditional branch is encountered, the instructions immediately following the branch and those located at the branch address are prefetched and placed in separate instruction buffers within the instruction unit. Two 16-byte instruction buffers are used. This insures the availability of prefetched instructions whether the branch is taken or not.
> The execution unit is microprogram controlled and can execute one instruction at a time. It has the capability of processing a new instruction every cycle. Emphasis is placed on optimizing fixed binary and floating-point arithmetic operations. A 64-bit parallel adder is used to perform binary and floating-point arithmetic, while an 8-bit serial adder is used in the execution of packed decimal arithmetic.
processing a new instruction every 80-nanosecond cycle would be a peak performance of 12.5 mips, although many instructions, multiplication in particular, apparently took longer than that; i think the average was over 2 cycles per instruction. the 12.5 megahertz clock was the same on the model 168 i described above
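spelling that arithmetic out in a couple of lines of python, with the only inputs being the 80 ns cycle time from the quote and the rough 2-cycles-per-instruction guess above:

    cycle_ns = 80                       # model 165/168 cycle time from the quote
    peak_mips = 1000 / cycle_ns         # one instruction per cycle -> 12.5 mips peak
    avg_cycles = 2                      # rough average guessed above
    print(peak_mips, peak_mips / avg_cycles)    # prints 12.5 and 6.25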
i'm not an ibm fan. but ibm mainframes might be the central example of 'the kind of large computer a person at a university in the 01970s might be allocated a few hours a month on'
there's a table at the end of https://www.ece.ucdavis.edu/~vojin/CLASSES/EEC272/S2005/Pape... describing the clock speed, cache size and associativity, cache line size, tlb size, etc., of numerous 360 and 370 models, including the 165 and 168, unfortunately lacking year of introduction. if someone else is doing this kind of investigation they may find it a useful reference
in https://narkive.com/9nl6cj2Q.3 (which is by wheeler but which i can't find in his own web pages) he says the model 168 was 3.0 mips, while the 370/195 (which i think never shipped) reached 10 mips peak
> I wrote it very quickly, but I was in IBM. And for someone in IBM in those days to publish something, it had to go through the clearance procedures in IBM. And so I submitted it to the clearance procedure, and of course the problem with the book from an IBM point of view was that it was not coming out saying that IMS was the greatest product ever invented. So everybody who reviewed it in the clearance procedure did two things. First, they found something I had to change. And second, they found somebody else who had to review it. This process was clearly going to go on for a long time! (...) I did actually hawk a proposal and sample chapters around to several publishers, and I got some proposed contracts, but I wasn’t getting the clearance.
When I was at Stanford with the AI project [in the late 1960s] one of the things we used to do every Thanksgiving is have a programming contest with people on research projects in the Bay area. The prize I think was a turkey.
McCarthy used to make up the problems. The one year that Knuth entered this, he won both the fastest time getting the program running and he also won the fastest execution of the algorithm. He did it on the worst system with remote batch called the Wilbur system. And he basically beat the shit out of everyone.
And they asked him, "How could you possibly do this?" And he answered, "When I learned how to program, you were lucky if you got five minutes with the machine a day. If you wanted to get the program going, it just had to be written right. So people just learned to program like it was carving in stone. You sort of have to sidle up to it. That’s how I learned to program."
#--
Made with:
wget -c "https://nitter.esmailelbob.xyz/pic/orig/media%2FGM0SuezXkAANlXk.jpg" -O a.jpg
tesseract -l eng a.jpg a.txt
https://twiiit.com will redirect any Twitter URL to Nitter, so you can read any post even with Dillo, Lynx, or other JS-less browsers. Then the image is one click away; just download it.