Hacker News new | past | comments | ask | show | jobs | submit login
Xmas.c (1988) (udel.edu)
559 points by evah 11 months ago | hide | past | favorite | 95 comments



The equivalent from the TeX world is `xii.tex`:

    \let~\catcode~`76~`A13~`F1~`j00~`P2jdefA71F~`7113jdefPALLF
    PA''FwPA;;FPAZZFLaLPA//71F71iPAHHFLPAzzFenPASSFthP;A$$FevP
    A@@FfPARR717273F737271P;ADDFRgniPAWW71FPATTFvePA**FstRsamP
    AGGFRruoPAqq71.72.F717271PAYY7172F727171PA??Fi*LmPA&&71jfi
    Fjfi71PAVVFjbigskipRPWGAUU71727374 75,76Fjpar71727375Djifx
    :76jelse&U76jfiPLAKK7172F71l7271PAXX71FVLnOSeL71SLRyadR@oL
    RrhC?yLRurtKFeLPFovPgaTLtReRomL;PABB71 72,73:Fjif.73.jelse
    B73:jfiXF71PU71 72,73:PWs;AMM71F71diPAJJFRdriPAQQFRsreLPAI
    I71Fo71dPA!!FRgiePBt'el@ lTLqdrYmu.Q.,Ke;vz vzLqpip.Q.,tz;
    ;Lql.IrsZ.eap,qn.i. i.eLlMaesLdRcna,;!;h htLqm.MRasZ.ilk,%
    s$;z zLqs'.ansZ.Ymi,/sx ;LYegseZRyal,@i;@ TLRlogdLrDsW,@;G
    LcYlaDLbJsW,SWXJW ree @rzchLhzsW,;WERcesInW qt.'oL.Rtrul;e
    doTsW,Wk;Rri@stW aHAHHFndZPpqar.tridgeLinZpe.LtYer.W,:jbye
Put that in a .tex file and run `pdftex` on it, then look at the resulting PDF file, which will look like this: https://shreevatsa.net/post/xii/


Yeah but that's not obfuscated, it just looks like normal TeX to me.


There's a reason there's no IOCCC for Perl or TeX. :-)

BTW there's a 2017 followup called xii-lat.tex: https://github.com/davidcarlisle/dpctex/tree/main/xii-lat (https://mirrors.ctan.org/macros/plain/contrib/xii-lat/xii-la...) — having been done nearly 20 years later, it has even more tricks, so is more "fun" to unpack.


It's like a form of especially logical compression


I grabbed this when it was originally published, but somehow the file name I have is different from the one in this article. Mine is called "carol.c" and I just compiled and ran it on a modern system. The compiler spat out the following warnings:

gcc -o carol carol.c

carol.c:2:1: warning: return type defaults to ‘int’ [-Wimplicit-int]

    2 | main(t,_,a )

      | ^~~~
carol.c: In function ‘main’:

carol.c:2:1: warning: type of ‘t’ defaults to ‘int’ [-Wimplicit-int]

carol.c:2:1: warning: type of ‘_’ defaults to ‘int’ [-Wimplicit-int]


GCC 14 won't permit implicit int any more ... https://fedoraproject.org/wiki/Changes/PortingToModernC#Remo...


I guess I got lucky then. This Ubuntu 23.10 install has gcc 13.2.0-4ubuntu3 installed from the repo. Given that the code is pre-ANSI C, and that it won in the category of "Least likely to compile successfully", it's surprising that there are no other issues.

I wouldn't normally run such a radioactive distro, but this laptop is so new that there is no LTS distro that works on it. (AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics)


Just compile with -std=c89


Good. It's almost as bad as not returning a value from a function being a warning and not an error.


The issue is calling xmas() in main before it's defined. Compiling with GCC on macOS gives the error:

  xmas.c:16:5: error: call to undeclared function 'xmas'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    xmas(2, 2, "");
Moving main() to the bottom compiles and executes with the proper output.


Real GCC or Apple’s clang symlinked to gcc?


Surprisingly few warnings, and all on the same line!


This makes me think about kolmogorov complexity. The program here looks like gibberish but produces the desired output, would there be even shorter programs that don't look like they make sense but produce the same output? How would you search for these programs?


The current world record for the shortest C program that prints 12 Days of Christmas lyrics is 431 bytes: https://code.golf/12-days-of-christmas#c


Well that post just cost me a couple of hours.

And still 136 bytes off 431:

  #define p printf 
  int main(){const char *g[]={"A Partridge in a Pear 
  Tree.\n","Two Turtle Doves, and","Three French Hens,","Four 
  Calling Birds,","Five Gold Rings,"," Geese-a-Lay"," Swans- 
  a-Swimm","t Maids-a-Milk","e Ladies Danc"," Lords-a-Leap"," 
  Pipers Pip","Twelve Drummers Drumm"};
  const char *d[{"First","Second","Third","Four","Fif","Six","Seven","Eigh","Nin","Ten","Eleven","Twelf"};for(int i=0,j;i<12;i++){p("On 
  the %s%s day of Christmas\nMy true love sent to 
  me\n",d[i],i>2?("th"):"");
  for(j=i;j>=0;j--){p("%s%s%s\n",j>4&&j<11? 
  d[j]:"",g[j],j>4?"ing,":"");}}}


You can shave 25 more by using printf as is, removing "const", removing spaces between char and *, and removing parenthesis around "th".


Are those consts really necessary? There are also a few character missing after char *d.


> Are those consts really necessary?

Technically yes, initializing a plain char * from a const char * (from array decay) is a constraint violation (an error the standard requires the compiler to detect and complain about) even in C89. Many compilers will let you off with a warning though.


which characters are missing?


"]=" after "*d["


It's not necessary in C.


   const char *d[{"First","Second",...,"Twelf"};
is not a valid C unconditionally.


You should tell my compiler.


To be honest, I understand code.golf's premise, but still, code being public (maybe optionally?) would be nice.


I agree. Stack Exchange's Code Golf has public source, but the best there is 644 bytes: https://codegolf.stackexchange.com/a/4198


Is it broken for anyone else the code prints hello world and then some numbers


The leaderboard is public but the actual code is private, that hello world is just an example c program.


> would there be even shorter programs that don't look like they make sense but produce the same output

Yes, mostly likely

> How would you search for these programs?

Brute force search is very inefficient so the real answer generally is "be clever" in the mathematical sense.

In general, KC is not computable so there is no program that can take a string and return the shortest program that computes that string.

However it's always in principle possible for someone to prove that the KC of some particular string is X.


> However it's always in principle possible for someone to prove that the KC of some particular string is X.

Only for a bounded number of strings. I.e. there is a finite set of string X such that for strings outside of X, you can not prove their KC, even in principle.

This result by Chaitin [1] can be paraphrased as: you cannot prove a 2 kilo theorem with a 1 kilo theory.

[1] https://www.jucs.org/jucs_2_5/the_limits_of_mathematics/Chai...


I think it's honestly quite hard to know, as it's really (generally speaking, AFAIPK) impossible to directly compute the KC in most cases, only really from the feasibility standpoint we can check that it's slower than some other version/value.

Which is quite exciting, it sets us up nicely for long-running competitions and the like due to the logarithmic-like growth curve (with sometimes some very fun discoveries on the ultra-tail-end of things! <3 :D)

Am currently running a mini-competition with a current prize bounty of $100 (distributed proportionally by % contribution in logspace) for an LLM that can memorize the most digits of Pi by March of next year. Pi is nice as it actually is quite compressible in theory and seeing if a model is able to learn a sufficiently-close-to-the-MDL set of weights that recovers a highly-compressed algorithm from the data would be extremely nice, indeedy!

However, whether this is feasible or not with off-the-shelf models and such is not entirely easy to know, so for now, it's just a digits-memorizing competition, and we shall see where it goes from there!!! <3 :'))))


Great explanation! And looks like the IOCCC is still going strong in 2023: https://www.ioccc.org/years.html


That page shows the last IOCCC was 2020. But the home page has an update from May 2023: "We do plan to hold a 28th IOCCC." Much like Nethack releases, some things are worth waiting for.


Hopefully it is more like Nethack and less like Kingkiller Chronicles.


But thanks to the long wait for Nethack we got great Slashem releases...


Fun thing I learned recently about the 12 days of Christmas: every gift is a type of bird. Leaping ladies, lords, all of them.


I'm not so sure about this.

Wikipedia says "The earliest known publications of the words to The Twelve Days of Christmas were an illustrated children's book, Mirth Without Mischief, published in London in 1780" https://en.wikipedia.org/wiki/The_Twelve_Days_of_Christmas_(...

This site https://www.birdspot.co.uk/culture/the-birds-of-the-twelve-d... tries hard to link them all to birds but it's a far stretch, considering it falls apart at "Five Gold Rings". "However, Mirth and Mischief includes an illustration that clearly depicts the rings as jewelry." Archive.org even has a scan of the book: https://archive.org/details/mirth_without_mischief/page/n7/m...


I learned that from The Office :D


My own investigation, from 20+ years ago:

http://michaeldnahas.com/xmassong/index.html


Still works on trunk (if you turn off the warning :))

https://compiler-explorer.com/z/hGvs1e9jo


This resurfaced a good memory of my last two semesters of university (2022), professor showed us this code snippet right at the start of one lecture.


"I remember my last two semesters of university. All the way back to last year!" It's kind of fuzzy being such a long time ago (or maybe all of the beer and other consumables), but if I recall correctly (from last year),...."

I can't tell if you were being serious or just a damn good understated bit of comedy


Our professor in college got us this in the printed study materials about the C language, so I remember that I typed it in by hand once.


There is a similar task in Rosetta Code: a program to produce the "Old Lady Swallowed a Fly" iteratively growing song.

https://rosettacode.org/wiki/Old_lady_swallowed_a_fly


I like how Tcl is just the lyrics, compressed:

> puts [zlib inflate [binary decode base64 "7VRNa8MwDL3nV2(...)"]]

https://rosettacode.org/wiki/Old_lady_swallowed_a_fly#Tcl

Python, Nim, Julia and likely others have similar versions too.


IanP was a colleague of mine a lifetime ago. Thanks for reminding me.


Wow! A University of Delaware link on Hacker News? Go Hens!


My brain hurts.


One of my favorite things about the IOCCC is that Larry Wall won it twice and then went on to design Perl, which explains a lot.



I'd love to see what obfuscated Perl looks like!


I'm still hoping to see unobfuscated Perl.


- Any module from CPAN

- Pixpang (or Pangzero, can't remember which one used Perl and SDL)

- Frozen Bubble?


I think you missed the joke. Or maybe I did.


(Insert “The Office” meme with Pam saying “they are the same picture”.)


That time we managed to sneak some (semi-)obfuscated Perl into the very serious Red Hat Enterprise Linux 6 customer documentation:

https://access.redhat.com/documentation/en-us/red_hat_enterp...


"this simple Perl script"



That’s amazing!


That’s a very old version of C. The function signature of main uses the old K&R style for function parameters.

I doubt it would compile currently!


> I doubt it would compile currently!

It compiles (and runs!) fine with just "gcc a.c"; nothing more needed. clang does throw some fatal errors, but there's probably some combination of flags to make it work; I couldn't be bothered to look further.

I've compiled plenty of code from the 80s; I can't recall a single example where it didn't work that wasn't due to OS-specific stuff. Tons of warnings and maybe some special-flags? Sure. But it works.

There's even some modern (well, "modern") code around today that uses K&R style function parameters.


TIL that on macOS, both `gcc` and `clang` are present, but `/usr/bin/gcc` is in fact just a hard link to `/usr/bin/clang`. So I couldn't compile it with gcc (which is clang) on the Mac, but it compiles just fine with gcc on Linux.

It's really astonishing that we can compile 35 years old code with our current tooling. I somehow doubt that we'll be able to compile 35 years old Java/Go/Rust/... code without changes in the futue.


Java is in that age ballpark and code with a similar level of dependencies as this of that vintage works just fine. Java has a solid backward compatibility story. The Java standard runtime also comes with much more than the C library so I’d say it’s better.


That's surprising! But I kind of get why they do that, as so many tools hard-code "gcc" rather than using "cc" or "$CC".

I'm not all that familiar with Java or Rust, but Go is very compatible; it's hard to predict the future, but it's already a third on its way to "35 years of compatibility".

JavaScript is pretty compatible as well; crummy JS from the 90s typically still works today (arguably it's compatible to a fault, with things like arr[-1] not being fixed).


C was designed to be portable.


I'm sure a few compiler maintainers take pride in making sure certain 'famous' code snippets still compile as time goes on.


IOCCC is "test cases for free", so if C compilers haven't added these code to their testing suite they really should.


I tried -std=c89 with clang; no luck. If anyone gets further do let us know.


This is K&R C, which predates the 89 standard.


> It compiles

That's really unfortunate! The category this entry competed in is "Least likely to compile successfully"


What modern code are you referencing?


Most if not all of bash.

zlib changed it just a few months ago (1.3 release from August): https://github.com/madler/zlib/issues/633

I've also seen it in some FreeBSD code, but that was quite a few years ago and I don't know to what degree that's been cleaned up – not so easy to grep for, but a quick check in git log shows e.g. https://github.com/freebsd/freebsd-src/commit/e5d0d1c5fbbc and some other things, so it seems there's still some K&R around.

Probably others as well, this is what I happen to know from the top of my head.


It literally has 1988 in the title.


One self evident statement does not invalidate the other :-)


It does compile, even with `-std=c17` under GCC. With a few warnings about implicit `int`s, of course. Even `-Wall` only adds a single warning (about !0<t not meaning what you could think it means.)


That's very presumptuous of GCC to assume that it thinks it knows that what I think !0<t means is wrong.

I wonder if NaN is larger then !0


I wrote "could think", calm down.


There’s a lot of code in even the modern forks of BSD (Free, Open, etc) in that style. Still compiles fine!


>The program is smaller than even the 'compressed' form of its output,

>and thus represents a new departure in text compression standards.

7z and gzip disagree with this statement.


It's only "compressed" if it was made by compress(1) in France. Otherwise it's just sparkling entropy coding.

For reference:

    1490 xmas-with-leading-comment.c
     913 xmas-without-leading-comment.c
    2357 xmas.out
    1297 xmas.out.9.Z
    1038 xmas.out.10.Z # actually better than with more bits!
    1048 xmas.out.11.Z # compression with 11..16 bits have the same size
     319 xmas.out.1.gz # compression levels 1..2 has same size
     317 xmas.out.3.gz
     307 xmas.out.4.gz # compression levels 4..9 have same size
Note that despite looking hard I haven't found a version of `compress` that supports `-H`, which is referenced and decompressable by gzip. I'm not sure how common it was in the wild.


  % gcc xmas.c
  xmas.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
      2 | main(t,_,a)
        | ^~~~
  xmas.c: In function 'main':
  xmas.c:2:1: warning: type of 't' defaults to 'int' [-Wimplicit-int]
  xmas.c:2:1: warning: type of '_' defaults to 'int' [-Wimplicit-int]


  % wc -c <xmas.c
  913
  % ./a.out | wc -c
  2359
  % ./a.out | compress | wc -c
  1048


zstd -19 compresses the text of the song to 309 bytes.

To be a fair comparison, though, you'd have to write zstd -d in 604 bytes. I suppose to be REALLY fair, though, you have to count the bytes of code in the compiler itself. A convenient enough implementation of compression could index into the C compiler binary to find the bytes it needs. (For example, my GCC contains "first", "second", and "third" in the binary, which a sufficiently clever implementation could make use of. "Exit on the first error occurred.", "Append a second underscore if the name already contains an underscore.", "Warn about suspicious calls to memset where the third argument is constant literal zero and the second is not.", etc. I didn't check but I doubt turtle doves or maids-a-milking come up that often in the description of warning flags.)


zstd didn't exist in 1988.


This article was posted to HN today, in 2023!


And that comment was clearly written in 1988. I fail to see what's so hard to understand about that and why people feel the need to "prove" it's "wrong".


I don't think people are trying to prove it's wrong. Most of the comments are saying that general purpose technology in 2023 doesn't reduce the file size by as much as the manual job in 1988 did.


This particular entry to the IOCCC was for 1988, those two algorithms didn’t come about for a few more years after that, gzip in the early nineties and 7z later that decade. The note is probably correct when comparing against the state of the art at the time.


You are right. Here is the source of "compress" that existed at that time. It compresses the produced song to 1048 bytes, not less.

https://www.nic.funet.fi/index/minix/compress.c

The program that produces the song, without the introductory comment, is 913 bytes, as presented in the article. Removing whitespaces it uses just 800 bytes and produces the song which is 2359 chars here. The whole C is:

    main(t,_,a)char*a;{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
    main(-86,0,a+1)+a)):1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==
    2?_<13?main(2,_+1,"%s%d%d\n"):9:16:t<0?t<-72?main(_,t,
    "@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+,/+#n+,/#;\
    #q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l q#'+d'K#!/+k#;\
    q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# ){nl]!/n{n#'; \
    r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#\
    \
    n'wk nw' iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c ;;\
    {nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;\
    #'rdq#w! nr'/ ') }+}{rl#'{n' ')# }'+}##(!!/")
    :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1):
    0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
    "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}
It compiles and links even without #include.


gzip is based on the DEFLATE algorithm, which is a combination of LZ77(1977) and Huffman coding(1952).( https://en.wikipedia.org/wiki/Gzip )


DEFLATE was created / implemented / speced / whatever word you want to apply by Phil Katz no earlier than 1989


"Based on" is not the same as "identical".


Since the word 'compressed' is in quotes they are probably suggesting that they mean when processed by the 'compress' command as available on UNIX at the time, as opposed to some other compression available at the time.


I just checked, gzip is from 1992, 7z from 1999.


The program was written in 1988. I ran the text through LZSS which was published in 1982, so was available before 1988. I used a 1989 public domain version by Haruhiko Okumura, which is after 1988, but I don't believe it is optimized to improve upon the compression level of the 1982 algorithm.

It took it from 2357 bytes to 534 bytes, which is smaller than the Xmas.c program which I counted as 917 bytes, but another poster counted 913 bytes.


"New departure" simply doesn't mean "record-breaking compression ratio".


Here's what Chat GPT came up with:

#include <stdio.h> #define o putchar #define p main #define q(a) return a; #define r ) { #define s { #define t for #define u if #define v else #define w while

char x="a partridge in a pear tree.\ntwo turtle doves\nand three french hens, four calling birds, five gold rings;\nsix geese a-laying, seven swans a-swimming,\neight maids a-milking, nine ladies dancing, ten lords a-leaping,\neleven pipers piping, twelve drummers drumming, "; char y[]={"first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eigth", "ninth", "tenth", "eleventh", "twelfth"};

int p() s int i=0, j, k, l; t(;i<12;i++) s printf("On the %s day of Christmas my true love gave to me\n", y[i]); j=0, k=0, l=0; t(;x[j];j++) s u(x[j]==' ' && (l==i || (l<i && x[j+1]=='\n'))) s u(!k)r k=1; o('and '); } v u(x[j]!='\n')r o(x[j]); } v r k=0; l++; } } o('\n'); } q(0) }




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: