Here's my alternative history steampunk novel proposal:
What if Turing was such a minimalist engineer instead of a great mathematician, that he decided not to include a "halt" instruction, because you could always just terminate a program with an infinite loop: Bang! One fewer instruction! Less is more, worse is better, RISC is more efficient, right? Just emulate halt in software! The Woz would be proud.
But as it turned out in this universe, halt was the one instruction you needed to express the halting problem, the foundation of modern computer science! The looping infinitely problem would be moot, because all programs would always loop infinitely!
His genius was to add that one little magical thing that otherwise could have been missing: the halt instruction.
Kind of like Hamilton, who fruitlessly tried to multiply triples, which his children tormented him relentlessly about, until he finally realized he was missing a magical dimension, invented quaternions, and vandalized a bridge, all in the same day.
In case you think you've figured out the 2-state solution and you want to check that you can't do better, I've uploaded the number of 1s your machine should output at https://modalduality.org/static/busy-beaver-2.txt, but did not include the actual configuration.
Ah, so Aaronson is talking about the Busy beaver shifts function of 2 there. I gave the answer to the busy beaver function counting the maximum number of 1s.
Just some food for thought. The reason that the busy beaver challenges are enormously difficult to do manually, for higher number of allowed states, is because of two things:
self-modification and fixed points
A turing machine has no understanding of code vs memory. You can scratch out turing machines in their tape form, and simply write a program in C and restrict the memory to a couple of ints and also use labels to write to the code section. This is more practical and still an excercise in how you can use the code as memory once you are certain it is past execution.
But even more clever is having the fixed points of your printing routine be actual equal to NEW instructions written to the code section. Then you begin to truly treat the pc program like a turing machine and push the boundaries of how many operations you can perform before halting.
Now..about fixed points. By fixed point I really just mean a point at which the behavior of a program switches into a different context or ultimately the point at which the program finally halts (necessary for the challenge.)
Lets assume we have byte variables, A and B
We can print 1s quite naively doing this:
A = 0
B = 0
while A++
while B++
print "1";
We would get 256*256 = 65536 prints
But we can extract way more if we use the code section. And we can extract even more if we come up with fixed points / conditions other than just all our variables wrapping back around to 0. Ie. using number theory with each variable as a modulus, we can stop on much rarer conditions.. such as primes that are 1 mod 5, 2 mod 7, 3 mod 11, etc..Essentially, the entire world of coincidental structure and things that appear to be true but may just have very very large outliers..can be exploited to keep on looping. For some examples, take a look here: https://www.quora.com/What-are-some-mathematical-theories-th...
Literally if you can write a function that counts which is greater, li(x) or pi(x), and returns true or false. Then you just do while(liCmpPi(BigNum(allMyMemory).increment()) print(1)
you will have more prints than the universe has atoms. more than the seconds in the universe until heat death. math is fun :-(
There were a couple points in the explanation which were kind of unclear to me, though:
What makes it okay to identify the naming of a large number with producing that large number of things (e.g. 1's)? This felt like a disconnect to me since in the introduction we are asked the largest number we can write down in 15 seconds, but my understanding is that the Busy Beaver functions aren't 'naming a number', but instead we're counting the number of 1's output by them (or maybe the function returns the number of ones rather than the string itself?). Actually, I think I'm starting to see the connection now: if we're considering the mathematical function rather than an executing program, then it just 'instantaneously' gives back a single number, and that single number is whatever sequence of ones the Turing machine would produce, which we could interpret as a reference to a numeric value rather than counting the digits? Maybe still a bit lost. Another way of stating my confusion: a person could easily write down a number larger than the number of 1's output by the 7-state solution (102*10^10^10^1870532) within 15 seconds (I could write that number down in maybe 5 seconds or so)—so in what sense is the Busy Beaver function naming a larger number?
There was another thing that tripped me up for a while, though I think I get it now, which was: what characterizes the Busy Beaver function? Is it that it "returns the maximum of the running times of all programs of length n that do eventually halt"—or is it that it "counts the number of 1's possible to be written by an n-state Turing machine before it halts"? It's pretty clear to me now that it's the latter, and that the former (the BBS function) is just one implementation of the general concept, but it gave me some trouble to figure that out.
> What makes it okay to identify the naming of a large number with producing that large number of things (e.g. 1's)?
Nice catch! I glossed over this but this is pretty important. The issue is that for any _specific number_, it takes a constant amount of time to write a number larger than that. Related: solving chess is technically O(1) since there are a constant number of positions we have to check.
What we need instead for this concept is a series of numbers, those can be compared with the question "Does one eventually get to a point where it will always be larger than the second?". We can parametrize the "biggest number" by the number of seconds given, say we can write down one character per second. Then a person's largest number function might be something like "f(n) = 9 ^ n", which is easy to see _is_ a computable function, and thus smaller than BB(n) for the same number of seconds, eventually.
In practice, just something like BB(100) - or BB(BB(100)), even can't really be written out by any computable method in a finite amount of time.
> which was: what characterizes the Busy Beaver function?
The proper one is just the number of 1s, but I talked about the shift function because it's easier to see why it's uncomputable. Tibor Rado proved that the Busy beaver function is also uncomputable in his 1962 paper, if you're interested you could check out his reduction.
I have a feeling you would enjoy this paper by Scott Aaronson. It outlines some interesting issues at the intersection of philosophy and computation. Page 9 is “known integers” and touches on your argument.
Thanks! I should probably include some more details on how to program the simulator.
Turing machines are just one simple way for formalizing programs, without all the complexities of modern programming languages. The core idea is that whatever a supercomputer (or any future computer) can compute, so can a Turing machine (it might just take more time): this is called the Church-Turing thesis.
The specifics on how to program Turing machines don't actually matter too much in modern complexity theory, but the rigorous basis is there when needed.
--
A Turing machine is always in some "state," (starts at A) and it's head is always over some cell of the tape, initially all zeros. Then the program, which is the grid in angelic hierarchy, specifies what the Turing machine should do. If it's in state A and the cell the head is at is 0, just go to the corresponding cell (marked in bright red), it will tell you the next state to go to, a number to write down on the tape, and a direction to step toward (Left or Right).
For intuition about why more states allow for more complex programs, try writing a 1-state program that writes two 1s and then halts (it's not possible). Then try it with 2 states.
Besides being able to store 1's and 0's on the tape, a Turing Machine program can encode data by the state that it's in (which is kind of the whole point). I think that's what goldenkey is getting at when he said "you can use the code as memory". The fact that you're in this code or that code is how it remembers.
So to "pick up" a bit from the tape, move it somewhere else, and place it down, first you'd design a state machine to "move somewhere else" (whatever that means, let's say simply move 10 steps to the left), and then you'd duplicate it so you have twice as many states, one for "carrying a 1" and another for "carrying a 0".
When the read head sees the 1 or 0 on the tape (the source data), it branches to either the first state of the "carrying a 1" or the "carrying a 0" copy of the "move somewhere else" program, to move to the destination location on the tape. Each of those sub-programs would then have a final state that respectively writes a 1 or a 0 onto the tape, and then they can (but don't have to) merge back together into the same state (continuing the main "calling" program).
So the more bits you need to temporarily store in the imaginary Turing machine head that moves around (similar but not identical to a register), the more states you need (which can explode exponentially for complex programs)! So you could move two bits at a time by making four copies of the "move somewhere else" program for "carrying a 00 .. 11", and so on.
Instead of using a loop counter to loop four times, you could effectively just duplicate the states of your program four times, linking the end of one to the beginning of the next.
Making a conditional loop that loops a bounded number of times is a lot harder (and that's kind of what the game is about), requiring a whole lot more states, perhaps linked together in a switch structure like some sort of "Duff's device", or using some splice of tape for temporary storage.
It's easiest to simply represent numbers in unary (7 base 10 = 1111111 base 1) instead of binary, using 0 as a "stop bit", because memory efficiency isn't a concern (you have infinite tape, might as well use it!).
Manipulating binary numbers with Turing machines is a big deal since there is no direct support for it, so you have to implement it in "software". If you thought software floating point was slow, you haven't tried using a software binary library for Turing machines!
So to pick up and copy a unary number (as opposed to a single bit), bounded from 0 to n, you would have to make an initial state that looked at the first bit. If the first bit was 0 then it branches to the 0'th copy of the "move to somewhere else" program, which at the end branches to the "write the 0 stop bit" state. If the first bit was 1, then it would move right and test the next bit. If the second bit was 0, it would do the same as before, but the "move to somewhere else" program would be modified to account for having already moved one to the right, then it would write 1, move right, and merge back with the "write 0 stop bit" of the 0'th branch. And so on to n, so you have n modified copies of the "move to somewhere else" program, one for each possible value of the unary number, all merging back together into the final "write 0 stop bit" state...
Or you could encoded unary numbers as their value plus n consecutive 1's, then you could reserve patterns of fewer than n consecutive 1's as special control patterns or tags, which the "move somewhere else" functions could scan for, and not confuse them with unary numbers. Anything's possible, and you can use different encodings in different parts of the tape, just like normal computers. There is no one right way to program a Turing machine!
It's pretty tedious, and it would be easier to program with a macro assembler or vhdl-like tools (or FORTH, which was used for programming the CAM6 cellular automata rules). I wonder what high level Turing machine programming and debugging environments people have developed?
In the Philosophy of Computing course I took at UMD we used "PCOMP" TM simulator that ran on a PC, which was pretty good for its time but was like trying to construct a mnemonic memory circuit using stone knives and bearskins by today's standards.
The TM simulator in this Angelic Hierarchy game has an elegant interface, simple enough to make it an easily understandable and playable game, and it would be great to see how far it could go towards developing a full blown Turing machine visualization, experimentation, programming and debugging environment!
Thanks to hga's impeccable hoarding instincts, here is Marvin Minsky's enigmatic Universal Turing Machine written in TECO:
MSG: APL 1
DISTRIB: *BBOARD
EXPIRES: 03/17/81 23:08:54
MINSKY@MIT-MC 03/11/81 23:08:54 Re: too-short programs
APL is compact, I suppose. So is TECO. When I wrote the following
Universal Turing Machine, which works, I actually understood it.
i1Aul qq+^^0:iqm^[29iiq\356y0L1 00L1 11L2 A1L1
y0L1 0yR2 1AR2 AyR6 yyL3 00L0 1AL3 A1L4 yyL4 0yR5 11L7 A1L4
yyR5 0yL3 1AR5 A1R5 yyR6 0AL3 1AR6 A1R6 y0R7 0yR6 11R7 A0R2
^[j<sR^[;-d-2ciql-^^^[ci"ed^^^[cii^[ciuq'^[>
j<sL^[;-d-2ciql-^^^[ci"ed^^^[cii-2c^[ciuq'^[>jxblx1lx2lx3lx4lx5lx6lx7hk
iyyAyyAyy^[32<i0^[>ji110101110000010011011^[ 1uq<htmbqq=>
I do not advise attempting to understand this code, which is
almost as bad as that for the Universal Turing machine.
What if Turing was such a minimalist engineer instead of a great mathematician, that he decided not to include a "halt" instruction, because you could always just terminate a program with an infinite loop: Bang! One fewer instruction! Less is more, worse is better, RISC is more efficient, right? Just emulate halt in software! The Woz would be proud.
But as it turned out in this universe, halt was the one instruction you needed to express the halting problem, the foundation of modern computer science! The looping infinitely problem would be moot, because all programs would always loop infinitely!
His genius was to add that one little magical thing that otherwise could have been missing: the halt instruction.
Kind of like Hamilton, who fruitlessly tried to multiply triples, which his children tormented him relentlessly about, until he finally realized he was missing a magical dimension, invented quaternions, and vandalized a bridge, all in the same day.
https://en.wikipedia.org/wiki/History_of_quaternions
https://en.wikipedia.org/wiki/Broom_Bridge