Hehe funny that you mention phiX174 in the context of programming, that virus is amazing... Part of the reason phiX174 is so small is that it is "compressed" by having overlapping genes; in one area, three genes overlap in the same place! This is possible because there are 6 valid reading frames (direction and start point) for reading DNA: {forward or backward} x {address % 3 == 0, 1, or 2}.
In college I actually got to take part in refactoring that virus' genome into a decompressed version with no gene overlaps. And it worked! The decompressed version is still a functioning phage, and since there are no longer gene overlaps, future genetic engineers will have a much easier time modifying the phage as they see fit.
I have no doubt that viruses of the computer kind also have made use of such techniques; and overlapping for obfuscation, not size-optimisation, is also a commonly seen trick in malware.
Hm, I used to work for Clyde Hutchison (and Ham Smith)... They told me that it didn't work when they made a naive decompression. Was there something they missed?
Did their naive decompression make the genome longer? If so, it probably didn't fit in the capsid. In our version, we had to provide one of the genes (gene F) in a plasmid in the host cell, in order to reduce the length of the decompressed genome to fit inside the capsid. See this section of our paper:
[The naive] decompression added 909 nucleotides to the wild-type genome. We next addressed practical constraints arising from the length of DNA that can be physically packaged within a øX174 capsid without impacts to reproductive fitness. Previous work has shown that the length of a øX174 genome, when packaged in vitro, must be kept within a few percent of the 5386 nucleotide wild-type length in order to avoid any significant fitness decrease ( Aoyama and Hayashi, 1985). Similar results were shown in vivo ( Russell and Muller, 1984). To reduce the decompressed genome length we removed the first 916 nucleotides of gene F, encoding the coat protein ( Air et al., 1978). We chose gene F because a plasmid containing a restriction fragment encoding wild-type gene F was able to complement two conditional gene F mutations ( Avoort et al., 1983). Additionally, the gene F coding sequence is greater than the total of the combined increases needed to implement the øX174.1 genome design. The truncated gene F version of the decompressed genome was named øX174.1f. To complement øX174.1f when transformed into host cells we designed a medium copy vector expressing gene F under control of a rhamnose-inducible promoter ( Fig. S1).
That's pretty amazing. I'm guessing DNA compactness is a survival advantage for viruses, and that's how this emerged? Is gene overlap common in other species?
Gene overlap is extremely rare actually! When phiX174 was first discovered, some researchers wondered if such a complexly intertwined system could even have evolved naturally, or if it might suggest that the virus was hand-engineered: http://adsabs.harvard.edu/abs/1979Icar...38..148Y
No file on common unix platforms is interpreted directly as machine code. Machine code executables have headers which contain information on the environment needed, memory mappings, program entry points etc. Binary executables are not just immediately jumped into, they must be opened by a loader which will parse these headers and do the required work before continuing execution in the program itself.
A zero-length binary fed into either ld.so (a.out loader) or ld-linux.so (elf loader) will not produce a zero-length output, but a parsing error.
Correct. A file will only be considered a binary executable if it has a proper header (ELF, Mach-O, etc) The default if no header and no shebang is to treat it like a shell script.
In fact, an empty executable file has been used to implement /bin/true on some old systems. Why take the added cost of a disk seek when just the information in the inode will do?
I think this may only work if the executable is run by the shell, at least on Linux. If I recall correctly using exec() to execute it directly won't work, and there's special fallback code in the shell itself to run executables with neither a header nor shebang as shell scripts. It's generally safest to include the shebang.
>In fact, an empty executable file has been used to implement /bin/true on some old systems. Why take the added cost of a disk seek when just the information in the inode will do?
why is this only on old systems? it seems very efficient.
What's more efficient is that most shells don't bother running /bin/true at all. "true" is just a shell builtin, so it doesn't need to spawn a process just to get an obvious return value out of it.
I'm not sure I understand how it's "self replicating"? An empty file, compiled or not, won't replicate itself. The Makefile to compile the empty c file is also rather large, but even it must be executed in a loop.
Can someone explain the basis of this contest and how this c file is "self-replicating"?
It's abuse of the contest rules, which encouraged at the IOCCC. You just have to be absolutely certain that it really IS "technically valid"[1] or it risks disqualification.
This one abuses the idea that they specify how the contest will be judged, which this "program" satisfies, as long as you don't look at the source.
If you also look into the Makefile accompanied, it's just copying itself to smr file and giving it executable rights. So smr.c when built outputs it's source code.
The makefile is not large. It's a makefile for all the entries for 1994 competition. And it's only 4 lines for smr.c
There is PDP-11 command (one word, 16 bits) which makes a copy of itself into a previous word (with address, in bytes, less by two) and passes the control to the new copy. Octal 103737... or something like that.
Doesn't require make :) or cp and is pretty active - fills up all the memory space below the starting point in no time.
I hang out on /r/dailyprogrammer sometimes - they had a challenge about these types of programs the other day and called them "quines." Check through some of these if you're interested, a few are 2 characters long (not sure how many bytes that is): http://www.reddit.com/r/dailyprogrammer/comments/2n11w8/2014...
I'll be darned if I can find the URL to the story, but it is a brilliant application of giving the customer what they wanted.
The short of it is that when you dropped back to the command line, early versions of DOS didn't clear the memory. That meant that the program really just went into stasis, but in practice it meant restarting; see, the program ran off the floppy then, so to save or load to another disk was tricky. Because memory would be executed from the same address, you could insert this disk with an empty executable that would get DOS to send execution to that address, which happens to be the program already in memory.
This had the huge boon of allowing you to drop to the command line, do something, and the resume working! Apparently they sold the disks for like $5 (or pounds) each, and the customers were extremely pleased. And the programmer was got a kick out of having rather impressive return per byte written :)
I once wrote an implementation of the classic Redcode "Imp" program in x86 assembly. It does not satisfy the requirements of this contest, but it does act as the Imp program, perpetually copying itself through RAM. :-)
Speaking of rule abuse, I love this recent entry, which uses the IOCCC's own tool to calculate the size... and abuses a bug in it! It's mainly for a joke, though there IS a huge program-size abuse in the custom BIOS the program uses.
*RULE 2 ABUSE DISCLAIMER*
- cable3.c is 4043 bytes in length (half an 8086)
- iocccsize -i < cable3.c returns 1977 (the year the
4.77 MHz 8086 CPU was announced)
- Therefore, any suspicions the judges may have regarding
rule 2 non-compliance may be well-intentioned but
are groundless.
- Nonetheless, the author would like to apologise to the
judges for the one-big-block-of-code nature of this entry,
which turned out to be unavoidable. Hopefully the joys
of this entry will make up for its shortcomings.
Meh. Empty program doesn't replicate itself. It's similar to recent rule-nitpick discussion about selling for 0$. Selling for 0$ isn't selling and nothing isn't replicating itself.
Definition of replicate is:
to repeat or copy (something) exactly
Empty program doesn't copy or repeat anything.
Entry rejected.
EDIT:
It's not reddit, downvotes doesn't go here for disagreeing.
The problem is posed in natural language and it has its own rules. In natural language (at least in English) doing nothing isn't copying or replicating. You may check the dictionaries.
Next time someone will claim claim 1 million sales: million nothings for 0 dollars to nobodies.
EDIT (to other child)
Yes, empty set is subset of other sets but that's because the definition of subset. Copy or replicate in natural language (in which the problem is posed) mean something entirely different.
I disagree. You are copying something exactly. Just because that something is empty does not mean it's not being copied. Using cp, by definition of the command, is copying something.
Isn't this a quine, rather than a self-replicating program? A quine produces the source, and a self-replicating program should produce an executable - no? This seems to be a fallacy of equivocation.
http://www.ncbi.nlm.nih.gov/nuccore/NC_001422.1
Enterobacteria phage phiX174 sensu lato, complete genome
5386 bp ss-DNA