Hacker News new | past | comments | ask | show | jobs | submit login
The world's smallest self-replicating program (ioccc.org)
79 points by ismailmechbal on Nov 29, 2014 | hide | past | favorite | 63 comments



phiX174 virus - one of the world's smallest self-replicating programs :-)

http://www.ncbi.nlm.nih.gov/nuccore/NC_001422.1

Enterobacteria phage phiX174 sensu lato, complete genome

5386 bp ss-DNA


Hehe funny that you mention phiX174 in the context of programming, that virus is amazing... Part of the reason phiX174 is so small is that it is "compressed" by having overlapping genes; in one area, three genes overlap in the same place! This is possible because there are 6 valid reading frames (direction and start point) for reading DNA: {forward or backward} x {address % 3 == 0, 1, or 2}.

In college I actually got to take part in refactoring that virus' genome into a decompressed version with no gene overlaps. And it worked! The decompressed version is still a functioning phage, and since there are no longer gene overlaps, future genetic engineers will have a much easier time modifying the phage as they see fit.

http://www.sciencedirect.com/science/article/pii/S0042682212...


Overlapping instructions/data is something also commonly encountered in tiny demos (512b and below), e.g. http://meatfighter.com/puls/ and http://finalpatch.blogspot.ca/2014/06/dissecting-128-byte-ra...

I have no doubt that viruses of the computer kind also have made use of such techniques; and overlapping for obfuscation, not size-optimisation, is also a commonly seen trick in malware.


Hm, I used to work for Clyde Hutchison (and Ham Smith)... They told me that it didn't work when they made a naive decompression. Was there something they missed?


Did their naive decompression make the genome longer? If so, it probably didn't fit in the capsid. In our version, we had to provide one of the genes (gene F) in a plasmid in the host cell, in order to reduce the length of the decompressed genome to fit inside the capsid. See this section of our paper:

[The naive] decompression added 909 nucleotides to the wild-type genome. We next addressed practical constraints arising from the length of DNA that can be physically packaged within a øX174 capsid without impacts to reproductive fitness. Previous work has shown that the length of a øX174 genome, when packaged in vitro, must be kept within a few percent of the 5386 nucleotide wild-type length in order to avoid any significant fitness decrease ( Aoyama and Hayashi, 1985). Similar results were shown in vivo ( Russell and Muller, 1984). To reduce the decompressed genome length we removed the first 916 nucleotides of gene F, encoding the coat protein ( Air et al., 1978). We chose gene F because a plasmid containing a restriction fragment encoding wild-type gene F was able to complement two conditional gene F mutations ( Avoort et al., 1983). Additionally, the gene F coding sequence is greater than the total of the combined increases needed to implement the øX174.1 genome design. The truncated gene F version of the decompressed genome was named øX174.1f. To complement øX174.1f when transformed into host cells we designed a medium copy vector expressing gene F under control of a rhamnose-inducible promoter ( Fig. S1).


ah, ok, had no idea that the capsid was that sensitive to genome length! Thanks! So it really did compress it for a useful purpose.


Why funny? biology is coding. 1,0 or A,T,G,C - all the same :-)

Get your genome compiler here: http://genomecompiler.com


That's pretty amazing. I'm guessing DNA compactness is a survival advantage for viruses, and that's how this emerged? Is gene overlap common in other species?


Gene overlap is extremely rare actually! When phiX174 was first discovered, some researchers wondered if such a complexly intertwined system could even have evolved naturally, or if it might suggest that the virus was hand-engineered: http://adsabs.harvard.edu/abs/1979Icar...38..148Y


I would think the opposite may be true, given that compressed information tends to be more sensitive to error, than uncompressed.


Redundancy helps resistance to error, yeah. This thing could be quite easily damaged in transmission.


The Makefile is the most interesting part: http://www.ioccc.org/1994/Makefile

  smr: smr.c
  	@${RM} -rf smr
  	${CP} smr.c smr
  	${CHMOD} +x smr
Apparently, an empty file marked as executable will execute!


Because it's being interpreted as a shell script, I assume.


Without a hashbang it's actually interpreted as machine code, it just doesn't have any machine instructions.


No file on common unix platforms is interpreted directly as machine code. Machine code executables have headers which contain information on the environment needed, memory mappings, program entry points etc. Binary executables are not just immediately jumped into, they must be opened by a loader which will parse these headers and do the required work before continuing execution in the program itself.

A zero-length binary fed into either ld.so (a.out loader) or ld-linux.so (elf loader) will not produce a zero-length output, but a parsing error.


No, without a shebang it is interpreted as a shell script.


Correct. A file will only be considered a binary executable if it has a proper header (ELF, Mach-O, etc) The default if no header and no shebang is to treat it like a shell script.

In fact, an empty executable file has been used to implement /bin/true on some old systems. Why take the added cost of a disk seek when just the information in the inode will do?


I think this may only work if the executable is run by the shell, at least on Linux. If I recall correctly using exec() to execute it directly won't work, and there's special fallback code in the shell itself to run executables with neither a header nor shebang as shell scripts. It's generally safest to include the shebang.


>In fact, an empty executable file has been used to implement /bin/true on some old systems. Why take the added cost of a disk seek when just the information in the inode will do?

why is this only on old systems? it seems very efficient.


What's more efficient is that most shells don't bother running /bin/true at all. "true" is just a shell builtin, so it doesn't need to spawn a process just to get an obvious return value out of it.


It's not as efficient as it seems, as that means the shell is invoked.


Filesystems come and go. Not everyone has an executable bit.


Interesting that this works at all. Perhaps it's special-cased?


I'm not sure I understand how it's "self replicating"? An empty file, compiled or not, won't replicate itself. The Makefile to compile the empty c file is also rather large, but even it must be executed in a loop.

Can someone explain the basis of this contest and how this c file is "self-replicating"?


It's abuse of the contest rules, which encouraged at the IOCCC. You just have to be absolutely certain that it really IS "technically valid"[1] or it risks disqualification.

This one abuses the idea that they specify how the contest will be judged, which this "program" satisfies, as long as you don't look at the source.

    $ make smr
    cp smr.c smr
    chmod +x smr
    $ ./smr > smr.dup
    $ diff smr smr.dup
    $ echo $?
    0
Also, it is a valid C file!

    $ stat -c "%s" smr.c
    0
    $ gcc -Wall -c smr.c
    $ echo $?
    0
    $ stat -c "%s" smr.o
    927
The judge's comments for this one explain this in a bit more detail:

http://www.ioccc.org/1994/smr.hint

[1] the best kind of "valid" :)


If you also look into the Makefile accompanied, it's just copying itself to smr file and giving it executable rights. So smr.c when built outputs it's source code.

The makefile is not large. It's a makefile for all the entries for 1994 competition. And it's only 4 lines for smr.c


I think what they were referring to was a quine


yes, but is using an external program (make and cp) really count? I wouldn't consider that really as "self-replicating".


every self replicating entry in the competition is compiled with other binaries.


And most of biology (including you).


There is PDP-11 command (one word, 16 bits) which makes a copy of itself into a previous word (with address, in bytes, less by two) and passes the control to the new copy. Octal 103737... or something like that.

Doesn't require make :) or cp and is pretty active - fills up all the memory space below the starting point in no time.


For those not understanding and wanting to see the "source" for this entry... it was a Zero byte file.



Just post the source code here:


Done.


I hang out on /r/dailyprogrammer sometimes - they had a challenge about these types of programs the other day and called them "quines." Check through some of these if you're interested, a few are 2 characters long (not sure how many bytes that is): http://www.reddit.com/r/dailyprogrammer/comments/2n11w8/2014...


2 characters is usually 2 bytes, unless they are using some weird unicode format, in which case it shouldn't be longer than 8 i think.


Is the copyright notice supposed to apply to the empty program?


Not sure about the copyright, but I wouldn't be surprised if someone patented it in the US.


Actually, there was a story about somebody selling 0-byte executable for DOS allowing you to resume programs because of some peculiarity.


I'll be darned if I can find the URL to the story, but it is a brilliant application of giving the customer what they wanted.

The short of it is that when you dropped back to the command line, early versions of DOS didn't clear the memory. That meant that the program really just went into stasis, but in practice it meant restarting; see, the program ran off the floppy then, so to save or load to another disk was tricky. Because memory would be executed from the same address, you could insert this disk with an empty executable that would get DOS to send execution to that address, which happens to be the program already in memory.

This had the huge boon of allowing you to drop to the command line, do something, and the resume working! Apparently they sold the disks for like $5 (or pounds) each, and the customers were extremely pleased. And the programmer was got a kick out of having rather impressive return per byte written :)



I wish I could edit my comment - it wasn't even MS-DOS! But, yep, that is the story. Hopefully I won't forget GO.COM in the future.

Thanks a mil for finding the link! I had a feeling I read the story from HN, and so it was:

https://news.ycombinator.com/item?id=5920732


I once wrote an implementation of the classic Redcode "Imp" program in x86 assembly. It does not satisfy the requirements of this contest, but it does act as the Imp program, perpetually copying itself through RAM. :-)

I just uploaded it to https://github.com/mct/assembly-toys/tree/master/imp


Related to this article is "Ken Thompson" reflections on Trusting Trust http://cm.bell-labs.com/who/ken/trust.html


lol not quite in the spirit of the challenge.


Forcing the creation of a new contest rule is a tradition in the IOCCC.


That tradition goes back a long, long way. When Plato defined Man as "A featherless biped," someone showed him a plucked chicken.

A great tradition.


Exactly, and that someone is the humble "Diogenes of Sinope"


Speaking of rule abuse, I love this recent entry, which uses the IOCCC's own tool to calculate the size... and abuses a bug in it! It's mainly for a joke, though there IS a huge program-size abuse in the custom BIOS the program uses.

http://www.ioccc.org/2013/cable3/hint.html

    *RULE 2 ABUSE DISCLAIMER*

    - cable3.c is 4043 bytes in length (half an 8086)
    - iocccsize -i < cable3.c returns 1977 (the year the
      4.77 MHz 8086 CPU was announced)
    - Therefore, any suspicions the judges may have regarding
      rule 2 non-compliance may be well-intentioned but
      are groundless.
    - Nonetheless, the author would like to apologise to the
      judges for the one-big-block-of-code nature of this entry,
      which turned out to be unavoidable. Hopefully the joys
      of this entry will make up for its shortcomings.


Meh. Empty program doesn't replicate itself. It's similar to recent rule-nitpick discussion about selling for 0$. Selling for 0$ isn't selling and nothing isn't replicating itself.

Definition of replicate is: to repeat or copy (something) exactly

Empty program doesn't copy or repeat anything. Entry rejected.

EDIT: It's not reddit, downvotes doesn't go here for disagreeing. The problem is posed in natural language and it has its own rules. In natural language (at least in English) doing nothing isn't copying or replicating. You may check the dictionaries.

Next time someone will claim claim 1 million sales: million nothings for 0 dollars to nobodies.

EDIT (to other child)

Yes, empty set is subset of other sets but that's because the definition of subset. Copy or replicate in natural language (in which the problem is posed) mean something entirely different.


I disagree. You are copying something exactly. Just because that something is empty does not mean it's not being copied. Using cp, by definition of the command, is copying something.


The empty set is a subset of the empty set.


Don't forget that the empty set is also the subset of the empty set, making them equal. ;)


In redcode

    mov 0,1
is the smallest self-replicating program


Really a good joke for “Every TEXT is a quine.”


Thanks, I was pretty messed up just now after watching the movie Pi, now I'm even worse....


In "Pi", the math is from the Dover book by Jahnke and Emde and the formula for the ratio of the sides of the rectangle is wrong.


Isn't this a quine, rather than a self-replicating program? A quine produces the source, and a self-replicating program should produce an executable - no? This seems to be a fallacy of equivocation.


If you run it, it will replicate itself...

  $ ./smr >smr.out
  $ diff smr smr.out


quines are a subset of self-replicating programs.


  smr: smr.c
  	@${RM} -rf smr
  	${CP} smr.c smr
  	${CHMOD} +x smr
this is bullshit if you ask me..


Agreed, WTF.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: