The world's smallest self-replicating program

corwinbad · on Nov 29, 2014

phiX174 virus - one of the world's smallest self-replicating programs :-)

http://www.ncbi.nlm.nih.gov/nuccore/NC_001422.1

Enterobacteria phage phiX174 sensu lato, complete genome

5386 bp ss-DNA

upwardbound · on Nov 29, 2014

Hehe funny that you mention phiX174 in the context of programming, that virus is amazing... Part of the reason phiX174 is so small is that it is "compressed" by having overlapping genes; in one area, three genes overlap in the same place! This is possible because there are 6 valid reading frames (direction and start point) for reading DNA: {forward or backward} x {address % 3 == 0, 1, or 2}.

In college I actually got to take part in refactoring that virus' genome into a decompressed version with no gene overlaps. And it worked! The decompressed version is still a functioning phage, and since there are no longer gene overlaps, future genetic engineers will have a much easier time modifying the phage as they see fit.

http://www.sciencedirect.com/science/article/pii/S0042682212...

userbinator · on Nov 30, 2014

Overlapping instructions/data is something also commonly encountered in tiny demos (512b and below), e.g. http://meatfighter.com/puls/ and http://finalpatch.blogspot.ca/2014/06/dissecting-128-byte-ra...

I have no doubt that viruses of the computer kind also have made use of such techniques; and overlapping for obfuscation, not size-optimisation, is also a commonly seen trick in malware.

dnautics · on Nov 29, 2014

Hm, I used to work for Clyde Hutchison (and Ham Smith)... They told me that it didn't work when they made a naive decompression. Was there something they missed?

upwardbound · on Nov 29, 2014

Did their naive decompression make the genome longer? If so, it probably didn't fit in the capsid. In our version, we had to provide one of the genes (gene F) in a plasmid in the host cell, in order to reduce the length of the decompressed genome to fit inside the capsid. See this section of our paper:

[The naive] decompression added 909 nucleotides to the wild-type genome. We next addressed practical constraints arising from the length of DNA that can be physically packaged within a øX174 capsid without impacts to reproductive fitness. Previous work has shown that the length of a øX174 genome, when packaged in vitro, must be kept within a few percent of the 5386 nucleotide wild-type length in order to avoid any significant fitness decrease ( Aoyama and Hayashi, 1985). Similar results were shown in vivo ( Russell and Muller, 1984). To reduce the decompressed genome length we removed the first 916 nucleotides of gene F, encoding the coat protein ( Air et al., 1978). We chose gene F because a plasmid containing a restriction fragment encoding wild-type gene F was able to complement two conditional gene F mutations ( Avoort et al., 1983). Additionally, the gene F coding sequence is greater than the total of the combined increases needed to implement the øX174.1 genome design. The truncated gene F version of the decompressed genome was named øX174.1f. To complement øX174.1f when transformed into host cells we designed a medium copy vector expressing gene F under control of a rhamnose-inducible promoter ( Fig. S1).

dnautics · on Nov 30, 2014

ah, ok, had no idea that the capsid was that sensitive to genome length! Thanks! So it really did compress it for a useful purpose.

corwinbad · on Nov 30, 2014

Why funny? biology is coding. 1,0 or A,T,G,C - all the same :-)

Get your genome compiler here: http://genomecompiler.com

GuiA · on Nov 29, 2014

That's pretty amazing. I'm guessing DNA compactness is a survival advantage for viruses, and that's how this emerged? Is gene overlap common in other species?

upwardbound · on Nov 29, 2014

Gene overlap is extremely rare actually! When phiX174 was first discovered, some researchers wondered if such a complexly intertwined system could even have evolved naturally, or if it might suggest that the virus was hand-engineered: http://adsabs.harvard.edu/abs/1979Icar...38..148Y

wamatt · on Nov 30, 2014

I would think the opposite may be true, given that compressed information tends to be more sensitive to error, than uncompressed.

TazeTSchnitzel · on Nov 30, 2014

Redundancy helps resistance to error, yeah. This thing could be quite easily damaged in transmission.

TazeTSchnitzel · on Nov 29, 2014

The Makefile is the most interesting part: http://www.ioccc.org/1994/Makefile

  smr: smr.c
  	@${RM} -rf smr
  	${CP} smr.c smr
  	${CHMOD} +x smr

Apparently, an empty file marked as executable will execute!

TillE · on Nov 29, 2014

Because it's being interpreted as a shell script, I assume.

jhardy54 · on Nov 29, 2014

Without a hashbang it's actually interpreted as machine code, it just doesn't have any machine instructions.

Tuna-Fish · on Nov 29, 2014

No file on common unix platforms is interpreted directly as machine code. Machine code executables have headers which contain information on the environment needed, memory mappings, program entry points etc. Binary executables are not just immediately jumped into, they must be opened by a loader which will parse these headers and do the required work before continuing execution in the program itself.

A zero-length binary fed into either ld.so (a.out loader) or ld-linux.so (elf loader) will not produce a zero-length output, but a parsing error.

delroth · on Nov 29, 2014

No, without a shebang it is interpreted as a shell script.

bodyfour · on Nov 29, 2014

Correct. A file will only be considered a binary executable if it has a proper header (ELF, Mach-O, etc) The default if no header and no shebang is to treat it like a shell script.

In fact, an empty executable file has been used to implement /bin/true on some old systems. Why take the added cost of a disk seek when just the information in the inode will do?

makomk · on Nov 30, 2014

I think this may only work if the executable is run by the shell, at least on Linux. If I recall correctly using exec() to execute it directly won't work, and there's special fallback code in the shell itself to run executables with neither a header nor shebang as shell scripts. It's generally safest to include the shebang.

TheSoftwareGuy · on Nov 30, 2014

>In fact, an empty executable file has been used to implement /bin/true on some old systems. Why take the added cost of a disk seek when just the information in the inode will do?

why is this only on old systems? it seems very efficient.

rspeer · on Nov 30, 2014

What's more efficient is that most shells don't bother running /bin/true at all. "true" is just a shell builtin, so it doesn't need to spawn a process just to get an obvious return value out of it.

TazeTSchnitzel · on Nov 30, 2014

It's not as efficient as it seems, as that means the shell is invoked.

fit2rule · on Nov 30, 2014

Filesystems come and go. Not everyone has an executable bit.

TazeTSchnitzel · on Nov 29, 2014

Interesting that this works at all. Perhaps it's special-cased?

Alupis · on Nov 29, 2014

I'm not sure I understand how it's "self replicating"? An empty file, compiled or not, won't replicate itself. The Makefile to compile the empty c file is also rather large, but even it must be executed in a loop.

Can someone explain the basis of this contest and how this c file is "self-replicating"?

pdkl95 · on Nov 30, 2014

It's abuse of the contest rules, which encouraged at the IOCCC. You just have to be absolutely certain that it really IS "technically valid"[1] or it risks disqualification.

This one abuses the idea that they specify how the contest will be judged, which this "program" satisfies, as long as you don't look at the source.

    $ make smr
    cp smr.c smr
    chmod +x smr
    $ ./smr > smr.dup
    $ diff smr smr.dup
    $ echo $?
    0

Also, it is a valid C file!

    $ stat -c "%s" smr.c
    0
    $ gcc -Wall -c smr.c
    $ echo $?
    0
    $ stat -c "%s" smr.o
    927

The judge's comments for this one explain this in a bit more detail:

http://www.ioccc.org/1994/smr.hint

[1] the best kind of "valid" :)

fsniper · on Nov 29, 2014

If you also look into the Makefile accompanied, it's just copying itself to smr file and giving it executable rights. So smr.c when built outputs it's source code.

The makefile is not large. It's a makefile for all the entries for 1994 competition. And it's only 4 lines for smr.c

michael_fine · on Nov 29, 2014

I think what they were referring to was a quine

Alupis · on Nov 29, 2014

yes, but is using an external program (make and cp) really count? I wouldn't consider that really as "self-replicating".

fsniper · on Nov 29, 2014

every self replicating entry in the competition is compiled with other binaries.

maaku · on Nov 30, 2014

And most of biology (including you).

avmich · on Nov 29, 2014

There is PDP-11 command (one word, 16 bits) which makes a copy of itself into a previous word (with address, in bytes, less by two) and passes the control to the new copy. Octal 103737... or something like that.

Doesn't require make :) or cp and is pretty active - fills up all the memory space below the starting point in no time.

51Cards · on Nov 29, 2014

For those not understanding and wanting to see the "source" for this entry... it was a Zero byte file.

TazeTSchnitzel · on Nov 29, 2014

Download here: http://www.ioccc.org/1994/smr.c

JetSpiegel · on Nov 30, 2014

Just post the source code here:

HaseebR7 · on Nov 30, 2014

Done.

silverfox17 · on Nov 29, 2014

I hang out on /r/dailyprogrammer sometimes - they had a challenge about these types of programs the other day and called them "quines." Check through some of these if you're interested, a few are 2 characters long (not sure how many bytes that is): http://www.reddit.com/r/dailyprogrammer/comments/2n11w8/2014...

TheSoftwareGuy · on Nov 30, 2014

2 characters is usually 2 bytes, unless they are using some weird unicode format, in which case it shouldn't be longer than 8 i think.

sysk · on Nov 29, 2014

Is the copyright notice supposed to apply to the empty program?

mojuba · on Nov 29, 2014

Not sure about the copyright, but I wouldn't be surprised if someone patented it in the US.

IsTom · on Nov 29, 2014

Actually, there was a story about somebody selling 0-byte executable for DOS allowing you to resume programs because of some peculiarity.

HCIdivision17 · on Nov 29, 2014

I'll be darned if I can find the URL to the story, but it is a brilliant application of giving the customer what they wanted.

The short of it is that when you dropped back to the command line, early versions of DOS didn't clear the memory. That meant that the program really just went into stasis, but in practice it meant restarting; see, the program ran off the floppy then, so to save or load to another disk was tricky. Because memory would be executed from the same address, you could insert this disk with an empty executable that would get DOS to send execution to that address, which happens to be the program already in memory.

This had the huge boon of allowing you to drop to the command line, do something, and the resume working! Apparently they sold the disks for like $5 (or pounds) each, and the customers were extremely pleased. And the programmer was got a kick out of having rather impressive return per byte written :)

abecedarius · on Nov 29, 2014

Here you go: http://peetm.com/blog/?p=55

HCIdivision17 · on Nov 30, 2014

I wish I could edit my comment - it wasn't even MS-DOS! But, yep, that is the story. Hopefully I won't forget GO.COM in the future.

Thanks a mil for finding the link! I had a feeling I read the story from HN, and so it was:

https://news.ycombinator.com/item?id=5920732

mct · on Nov 30, 2014

I once wrote an implementation of the classic Redcode "Imp" program in x86 assembly. It does not satisfy the requirements of this contest, but it does act as the Imp program, perpetually copying itself through RAM. :-)

I just uploaded it to https://github.com/mct/assembly-toys/tree/master/imp

ismailmechbal · on Nov 29, 2014

Related to this article is "Ken Thompson" reflections on Trusting Trust http://cm.bell-labs.com/who/ken/trust.html

kremlin · on Nov 29, 2014

lol not quite in the spirit of the challenge.

pdkl95 · on Nov 29, 2014

Forcing the creation of a new contest rule is a tradition in the IOCCC.

kremlin · on Nov 29, 2014

That tradition goes back a long, long way. When Plato defined Man as "A featherless biped," someone showed him a plucked chicken.

A great tradition.

ismailmechbal · on Nov 29, 2014

Exactly, and that someone is the humble "Diogenes of Sinope"

pdkl95 · on Nov 30, 2014

Speaking of rule abuse, I love this recent entry, which uses the IOCCC's own tool to calculate the size... and abuses a bug in it! It's mainly for a joke, though there IS a huge program-size abuse in the custom BIOS the program uses.

http://www.ioccc.org/2013/cable3/hint.html

    *RULE 2 ABUSE DISCLAIMER*

    - cable3.c is 4043 bytes in length (half an 8086)
    - iocccsize -i < cable3.c returns 1977 (the year the
      4.77 MHz 8086 CPU was announced)
    - Therefore, any suspicions the judges may have regarding
      rule 2 non-compliance may be well-intentioned but
      are groundless.
    - Nonetheless, the author would like to apologise to the
      judges for the one-big-block-of-code nature of this entry,
      which turned out to be unavoidable. Hopefully the joys
      of this entry will make up for its shortcomings.

bluecalm · on Nov 29, 2014

Meh. Empty program doesn't replicate itself. It's similar to recent rule-nitpick discussion about selling for 0$. Selling for 0$ isn't selling and nothing isn't replicating itself.

Definition of replicate is: to repeat or copy (something) exactly

Empty program doesn't copy or repeat anything. Entry rejected.

EDIT: It's not reddit, downvotes doesn't go here for disagreeing. The problem is posed in natural language and it has its own rules. In natural language (at least in English) doing nothing isn't copying or replicating. You may check the dictionaries.

Next time someone will claim claim 1 million sales: million nothings for 0 dollars to nobodies.

EDIT (to other child)

Yes, empty set is subset of other sets but that's because the definition of subset. Copy or replicate in natural language (in which the problem is posed) mean something entirely different.

mynameisvlad · on Nov 29, 2014

I disagree. You are copying something exactly. Just because that something is empty does not mean it's not being copied. Using cp, by definition of the command, is copying something.

jamesrom · on Nov 29, 2014

The empty set is a subset of the empty set.

eximius · on Nov 29, 2014

Don't forget that the empty set is also the subset of the empty set, making them equal. ;)

FireRabbit · on Nov 30, 2014

In redcode

    mov 0,1

is the smallest self-replicating program

NaNaN · on Nov 30, 2014

Really a good joke for “Every TEXT is a quine.”

Varcht · on Nov 29, 2014

Thanks, I was pretty messed up just now after watching the movie Pi, now I'm even worse....

mjcohen · on Nov 29, 2014

In "Pi", the math is from the Dover book by Jahnke and Emde and the formula for the ratio of the sides of the rectangle is wrong.

disputin · on Nov 29, 2014

Isn't this a quine, rather than a self-replicating program? A quine produces the source, and a self-replicating program should produce an executable - no? This seems to be a fallacy of equivocation.

palunon · on Nov 30, 2014

If you run it, it will replicate itself...

  $ ./smr >smr.out
  $ diff smr smr.out

TazeTSchnitzel · on Nov 29, 2014

quines are a subset of self-replicating programs.

chaosfox · on Nov 29, 2014

  smr: smr.c
  	@${RM} -rf smr
  	${CP} smr.c smr
  	${CHMOD} +x smr

this is bullshit if you ask me..

simarpreet007 · on Nov 29, 2014

Agreed, WTF.