93% of Paint Splatters Are Valid Perl Programs (colinm.org)
689 points by thaumaturgy on Apr 4, 2019 | 132 comments

> Also, ImageNet’s website was down on the day that we decided to perform this research. We therefore paid an unemployed person to download 100 examples of paint-splatter artwork by searching Pinterest using the query “paint splatter wallpaper”.

Such brutal honesty would be welcome in other papers.

The author recently quit Google[0], so I believe he's unemployed. I think that's the joke (also, he really likes his memes).

[0] https://mobile.twitter.com/mcmillen/status/10863837264504053...

There's a footnote on "unemployed person" with the text "the first author."

> We therefore paid an unemployed person

> Such brutal honesty would be welcome in other papers.

To say nothing of the commitment to employment fairness!

I love this footnote so much:

> This feature does enable a neat quine: the Perl program “Illegal division by zero at /tmp/quine.pl line 1.”, when saved in the appropriate location, outputs “Illegal division by zero at /tmp/quine.pl line 1.” The reason for this behavior is left as an exercise for the reader.

The part that left my office in stitches:

> (To be fair to Perl, when perl is run with the -w flag to enable warnings, it does helpfully inform the user that at some point in the future, the Perl developers will most likely pick gggijgziifiiffif as a new reserved word:

>> Unquoted string "gggijgziifiiffif" may clash with future reserved word at - line 1.)

Knowing Larry, I believe he is currently working on a patch.

Larry left perl5 about 15 years ago, and nobody will ever reserve random keywords like this. Keywords are very problematic in perl5 and are not needed at all. You can add random new keywords at runtime, and you can add methods doing almost everything, like accepting blocks or functions, implementing most control structures.

    $ perl -MO=Deparse -e "Illegal division by zero at /tmp/quine.pl line 1."
    'division'->Illegal('zero'->by('at' / 'tmp' / 'quine' . 'line'->pl(1)));
    -e syntax OK

It's trying to divide "Illegal division by zero at /" by "tmp/quine.pl line 1." and "tmp/quine.pl line 1." evaluates to 0?

It is slightly more complicated than that :-) https://fanf.dreamwidth.org/131318.html

the actual parse is

      ((“at” / “tmp”) / “quine”)
       . line->pl(1.0)

I think it's equivalent to something like

  Illegal::division(by::zero(at/tmp/quine . pl::line(1.)))

Or "quine.pl line 1." Evaluates to 0?

Or should that be 0.0???

Yeah, I think it's both divisions. Both 'perl -e "in /tmp"' and 'perl -e "tmp/quine"' result in the same division-by-zero error.

> This feature does enable a neat quine: the Perl program “Illegal division by zero at /tmp/quine.pl line 1.”, when saved in the appropriate location, outputs “Illegal division by zero at /tmp/quine.pl line 1.” The reason for this behavior is left as an exercise for the reader.

For quines, see the chapter "Air on G's String", a dialogue between Achilles and the Tortoise in Douglas Hofstadter's book Gödel, Escher, Bach: An Eternal Golden Braid.

Because of the frequency I see undecipherable strings like this in actual Perl code, I don't like Perl the programming language. However, I love Perl the project. Take a look at the config file for Perl and how many systems and architectures it accommodates: https://github.com/Perl/perl5/blob/blead/Configure

It's clearly such an incredible labor of love by Larry Wall et al. that I'm sad for them it isn't more popular. Just a mammoth amount of work by nice, passionate people.

In my world (bioinformatics), it was initially the go-to language. Even today I very occasionally run across something using BioPerl, requiring me to stumble through CPAN again. I get pretty nostalgic thinking about it.

While some people tend to say Perl was chosen by the young bioinformatics community because it excels at text processing, I think Perl was just the go-to scripting language in the late 90s. In my field (numerical relativity), Perl was also the go-to language in the early days. People then moved on to Python because of the numpy/scipy ecosystem, which allowed combining Perl-like scripts with Matlab/Fortran-like post-processing. Today, everything is Python. Python2 :-P

Perl was being used in bioinformatics in the early 1990s - definitely before the "late 90s". Perl was becoming the go-to language in Unix fields in early 1990s, and bioinformatics comes out of that Unix tradition. (See also my comment at https://news.ycombinator.com/item?id=11381917 .)

There's no reason to guess which factors contributed to Perl's uptake in the young bioinformatics community. Text processing is only one of six reasons Lincoln Stein listed in 1996 in "How Perl Saved the Human Genome Project". See https://web.stanford.edu/class/gene211/handouts/How_Perl_HGP...

I switched to Python in 2011 (basically because Django was a lot easier to get into as a web framework, had better docs, and because too many ways to do things made Catalyst confusing for a beginner).

Perl was starting to lose ground by then, but was still pretty popular. My guess is around a third of bioinformaticians were using it as the go-to language at that point.

It's a pity. I miss Perl and still pull it out for quick scripts that involve manipulating the filesystem, as it's got so much better syntactic sugar for that sort of thing. And using regexes in Python is such a chore in comparison. I always need to look up the docs, whereas I remember how to use them in Perl despite hardly using the language in 8 years.

You might enjoy, in a "never try this in practice" sort of way, some code I did to make a working Python variant, which compiled to Python byte code, and supported Perl-like pattern match syntax, like:

    # get_function_names.py
    for line in open("python_yacc.py"):
        if line =~ m/def (?P<name>\w+) *(?P<args>\(.*\)) *:/:
            print repr($1), repr($args)

See http://dalkescientific.com/Python/python4ply-tutorial.html .

It got appreciative hisses when I did a lightning talk about it at a PyCon.

FWIW, I co-founded Biopython back in 1999.
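
For comparison, here is a minimal sketch of the same pattern match in standard Python, using only the `re` module. The function name `find_function_defs` and the sample lines are made up for illustration; the regular expression is the one from the snippet above.

```python
import re

# Same pattern as in the python4ply snippet: a function name and its argument list.
DEF_PATTERN = re.compile(r"def (?P<name>\w+) *(?P<args>\(.*\)) *:")


def find_function_defs(lines):
    """Yield (name, args) for every line that contains a `def` statement."""
    for line in lines:
        # search() scans the whole line, like Perl's =~ m/.../
        match = DEF_PATTERN.search(line)
        if match:
            yield match.group("name"), match.group("args")


if __name__ == "__main__":
    sample = ["def parse(tokens, start=0):", "x = 1", "    def helper() :"]
    for name, args in find_function_defs(sample):
        print(repr(name), repr(args))
```

The extra compile/search/group steps are roughly what the Perl-style `=~` syntax hides.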

Perl gives you as much rope as you want to hang yourself. If you can develop a style, and try to avoid being too clever, it can be just as readable as any other language.

And then someone else tries to read your code. One of the things I love about ruby dev is it's not about providing 10000 ways to do the same thing but about making sure that given a simple problem, every developer will solve it in pretty much the same way.

What? With ruby, there are at least 4 ways to find out how big something is (.size, .length, et al). You want to loop over something? You can use python style for in loops, .each, and several other ways. Your block could be a block, or it could be a symbol with an & in front. Want a bit of code without a name? You have blocks, procs, and lambda. Want a string? You have string and symbol, and ne'er the two shall cross paths.

Ruby is specifically designed as a replacement for perl, and keeps a lot of the same warts ($igils, and globals like $?, $1, etc). While I agree it's better than perl for readability, it's not because there's only one way to do any given thing.

I have had to maintain absolute crap in Python and beautifully written Perl. 90% of it is down to the developer.

Are you crazy? I explicitly dislike ruby because I feel it encourages everyone to make their own DSL-like bullshit wrappers everywhere, leading to a million and a half ways to do anything and everything.

While I don't use it often, I feel like GoLang at least vaguely tried to adopt a "single solution" approach to language design. They keep the standard library as compact as possible, and formatting is non-optional with gofmt.

The best languages sacrifice readability the least for a given amount of cleverness.

Or the best give you all the flexibility you could ever need and allow you the freedom to write readable and maintainable code or use all the features to write an insanely terse oneliner that can do the job quickly and be thrown out.

Except, of course, they are rarely thrown out :)

Exactly. Like BASIC.

I'd argue that use of clever or advanced patterns in BASIC actually renders the code rather opaque. I.e. compare coroutine usage between BASIC and ruby (if you're even able to get it working in BASIC).

You just described Tcl. :-)

> Take a look at the config file for Perl and how many systems and architectures it accommodates: https://github.com/Perl/perl5/blob/blead/Configure

    25249 lines (23725 sloc)

Good grief. This is incredible. Thanks for sharing that.

Lots of the comments are really funny.

> It's clearly such an incredible labor of love by Larry Wall et al. that I'm sad for them it isn't more popular. Just a mammoth amount of work by nice, passionate people.

That would even more fittingly describe Perl 6.

For some reason, there seems to be a complete lack of toxic behavior in the Perl community (both 5 and 6).

Larry Wall is a wonderful human.

I recently wrote a little program that will create a Markov chain from our Perl codebase, generate a few sentences and check them against `perl -c`. It takes fewer than 10 attempts to generate a bunch of lines of [very funny] valid Perl code.
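
A rough sketch of what such a generator might look like, written in Python rather than Perl. This is not the commenter's program: the whitespace tokenisation, the function names, and the `order` parameter are all assumptions made for illustration. The syntax check shells out to `perl -c` and returns `None` when no `perl` binary is on the PATH.

```python
import random
import shutil
import subprocess
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive tokens to the tokens seen after it."""
    tokens = text.split()
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        chain[tuple(tokens[i:i + order])].append(tokens[i + order])
    return chain


def generate(chain, length=20, rng=None):
    """Random-walk the chain, stopping early at a state with no successors."""
    rng = rng or random.Random()
    state = rng.choice(sorted(chain))  # raises IndexError if the chain is empty
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)


def perl_syntax_ok(source):
    """Return True/False from `perl -c`, or None if perl is not installed."""
    perl = shutil.which("perl")
    if perl is None:
        return None
    result = subprocess.run([perl, "-c"], input=source,
                            capture_output=True, text=True)
    return result.returncode == 0
```

Note that `perl -c` still executes `BEGIN` blocks and `use` statements while compiling, so checking generated code this way is not entirely side-effect free.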

Hey! I did that too! Except my version ran the Markov chain against its own source code, which was padded with a bunch of meaningless subs that were never called but were valid code. It also ran in an infinite loop, which worked well. I had to kill it after a few minutes because I was afraid it would start running shell commands.

And the terrible truth is this is how we got skynet.

I was too scared to run randomly generated perl-code. But I did have some fun fuzzing awk/gawk a while back.


Perl can return the favor. https://metacpan.org/pod/Acme::EyeDrops

93% of random OCR outputs are valid Perl programs. The paint splatters are a distraction.

Yes, but it is a fun distraction and kept me from my work for a few minutes.

It's SIGBOVIK. The entire conference is a distraction. That's the whole point.

There’s at least an infinite number of infinitely long inputs to Perl. Which proportion of these will terminate without an error?

Trick question.

An infinitely long input will never finish being read, so we'll never find out :)

> We manually filtered out all images with any form of overlaid or watermarked synthetic text, because the Perl program (?) “iStock by Getty Images” is not particularly interesting.

It's also not a valid Perl program:

    bash-4.3$ perl -e "iStock by Getty Images"
    Can't locate object method "Getty" via package "Images" (perhaps you forgot to load
    "Images"?) at -e line 1.

I miss Perl. Why did PHP win again, in the late 90s? (though I'm mostly Python these days and shan't complain, there was a good 10 year stretch of PHP there I'm not proud of).

PHP won because it was easy to install and restrict (by memory usage) on shared servers, as well as the fact that writing a PHP script was as simple as renaming an .html file to .php so web designers who wanted some dynamic functionality could use it really easily.

As bad as some aspects of PHP were, making it so trivial to install and make available on a server was a brilliant move that lots of languages could still learn from today.

If you have a project you want to become popular putting real work in to the initial onboarding is vital.

The parent nailed it. It's easy to complain about PHP from various standpoints both pragmatic and PL-centric, but it has an amazing superpower which might be summed as "serverless for the web, v0".

No deployment setup will ever match the pure joy of dragging a .php file into WinFTP, watching the little blue progress bar, and refreshing your browser.

I practically have that level of “luxury” in my Awful[0] (CHICKEN Scheme) web app that I’m working on right now. With a couple lines of code[1] added to my project (that I have only running in development mode), I can simply save my source code file in Emacs, have it automatically saved on the server through Emacs’s TRAMP feature, reload the page in Firefox, and instantly see my changes. If I like it, I make a commit through magit and I’m off to the next task.

Of course, I don’t run it like this in production (to refresh the production app I have to go to "/reload" from a certain IP address and be logged into the admin account).

Don’t worry: the “old ways” are still here, even if no longer mainstream.

[0]: http://wiki.call-cc.org/eggref/5/awful#a-hello-world-example

[1]: http://wiki.call-cc.org/eggref/5/awful#reload-applications-c...

Yes! But not exactly relevant: this deployment technology is supported by Perl/CGI just as well.

Definitely. I wasn't trying to be relevant so much as I got nostalgic at the mention :)

Interesting. Similar to how typescript can be added with a rename from foo.js to foo.ts.

even on something with dependencies? i.e., with a few requires or ES2016 imports?

Yes, if your tsconfig is sufficiently permissive it will simply give any imports for which it can't find a type definition the 'any' type and assume that you can do anything you like with them. Obviously this doesn't give you any help from the type checker but it will work fine and you can go back and add typings later when you decide you need them.

PHP won because you just wrote the code and it worked. With Perl you had to deal with the CGI interface, which was a pain. mod_perl came later. PHP was just extremely easy to get started with: you didn't need to really understand Perl modules, the CGI spec, or anything else.

I gave up on PHP around 1.8b6 or something like that. However, before mod_perl arrived, I have used Perl to generate PHP code: a minutely cronjob would read the database and generate the PHP code with the right pulldown values from the database. It was the best of 2 worlds: a sane programming language and fast (non-CGI) execution.

Yeah... Ruby's very nice and feels Perl-ish sometimes.

That's intentional.


"Ruby is a language of careful balance. Its creator, Yukihiro “Matz” Matsumoto, blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) ..."

The name "Ruby" is also a play on "Perl" ("pearl").

Yep, if you like Perl you've got to spend some time with Ruby. It shares a heritage. Many of the people I used to Perl with moved over and were (and are) very happy. That said many of the "perl-isms" in Ruby are looked down upon by the community, but you can still use them ;-)

I programmed Perl for many years and Ruby as well; I see the similarities but I very much prefer Perl and find idiomatic Ruby not very pleasant to read or write. Matter of taste, of course.

mod_perl couldn't live safely on shared servers. You had to rent the whole box if you wanted to use mod_perl.

Not even mentioning the memory leakiness of it, it even had a module for tracking leaks:



I have a lot of legacy code built on mod_perl that survived over five hardware generations without missing a beat. But mod_perl has become system-cripplingly leaky over the last couple years. I nearly entered a debug rabbit hole before considering a simple workaround. After a quick apache conf edit, mod_perl was swapped out for plain old CGI. Heresy perhaps, but everything is running smoothly again.

Because for anything you wanted to do there were 6 different ways to do it in Perl, making it rather hard to learn Perl.

Perl also looks like line noise, which makes it hard to read.

PHP also won because you can take a .html file add <?=$x+1?> and it's a valid PHP program. Perl was harder.

It only looks like line noise if you make it look like line noise.

And people did. And other people celebrated. Context is a lot.

mod_php was much simpler to use than mod_perl; and much more amenable to shared hosting

Do any of the splatters generate a "use strict;" at the beginning?

It would be interesting to reverse-engineer this to figure out which paint splatters reliably produced "use strict;".

Presumably one could have AFL fuzz determine the input needed to generate any desired outcome if splatter generation is automated.

How would you prevent it from finding "paint splatters" which are just renderings of the actual text in question?

Quite against my better judgment, I've made the transition from Perl-hater to Perl- er... -tolerator.

I found myself arguing yesterday with another person in the team about the proper use of Perl references, which I cursed and spat on for 2 days when I first had to use them.

I once read that my current position mirrors that of most Perl users.

(my boss has a self-confessed "irrational" hatred of Python because of semantic whitespace. Which is ironic, because they're the one who pushed for using YAML, which also relies on semantic whitespace (at least in the way we use it))

I kinda agree with your boss. I don't hate python, but I find semantic whitespace both tyrannical and error prone. Interestingly, Makefiles also have semantic whitespace, which seems to be much more commonly despised than python's.

What do you find error prone in it? As I just posted above, it solves the problem of losing braces when moving code blocks around.

The whitespace thing annoyed me when I first started using Python. Then after using Python for a few months I went back and modified some Perl. Moved some blocks of code around and lost a brace somewhere. Spent 20 minutes trying to work out where. Then I realized that it's a problem that never occurs in Python.

So who's going to write the app we can carry on our phones into MOMA and evaluate the Jackson Pollocks?

> source code not available yet because i am bad at GitHub

I had to chuckle at that. The whole thing is so ridiculous and it's wonderful :D

The paper is part of the SIGBOVIK "conference" that takes intentionally funny computer science papers (of varying degrees of rigor) every year. This paper is in good company alongside "Aumann agreement by combat", "Survival in chessland" (an investigation of what chess piece you want to be to minimize your chance of being captured in a game), "NaN gates and flip FLOPS" (re-representing binary operations as valid floating-point logic on NaN and inf, then building a hardware implementation of a machine that computes on that logic using standards-compatible floating-point representations of those values), and "Need more RAM? Just invent time travel!"

Full proceedings document for this year: http://sigbovik.org/2019/proceedings.pdf

100% of TECO https://en.wikipedia.org/wiki/TECO_(text_editor) inputs are valid programs, or so the story goes.

"a common game for TECO fans was to enter their name as a command sequence, and then try to work out what would happen"

I wonder how much of Perl's decline can be attributed to the Duke-Nukem-Forever-tier delay in releasing perl6.

It seemed like so much of the web ran on it in the late 90s and early 2000s that it would have at least as much traction as PHP does today.

Perl6 was a reaction to Perl's loss of mindshare, not its cause.

One can speculate what might have happened if focus had been put on evolving Perl5, or getting Rakudo production-ready as-is instead of going through yet another round of 'tinkering' (eg making the compiler backend-independent, the New Object Model refactor, creation of MoarVM, the Great List Refactor, ...) - however my crystal ball seems to be broken and just continues to display 42 no matter the question...

> The design process for Perl 6 began in 2000

I don’t remember Perl losing mindshare around that time; quite the opposite.

Perl6 wasn't _released_ until 2016, though.

You mean, 1.0 of the spec was released then. Don't pretend it wasn't perfectly possible to write and run useful Perl6 programs for a good ten years prior.



For some definition of 'perl6'. I have a perl6 book from o'reilly from 2006, and I flipped through it recently. It bears almost no resemblance to modern perl6. There were a couple of things that were still the same (and not just vestiges from perl5 like sigils and function calls looking like function calls), but by and large it was something completely different.

You mean 2015.

Specifically December 25th 2015.

> Perl6 was a reaction to Perl's loss of mindshare, not its cause.

True, but when your reaction takes 10 years, you lost a lot of ground. When perl6 was started perl was declining but still in serious use.

And today Perl is declining but still in serious use.

Its percentage has dropped by a significant amount, but I would bet the amount of new code has not dropped nearly as much.

Also I'm fairly certain that Go is close to a 10 year reaction to Python. (If not its implementation.) The only difference is that it wasn't done in public. It's so much easier to ignore the years of development when you can hide it.

In my opinion a lot. I got my first job in 2003 programming Perl 5 and the way it was talked about, it sounded like Perl 6 was just a year or two away.

Still my favorite language.

This has, in fact, reinforced my sincere belief that Perl is God's chosen language.

There it is.

I strongly recommend that you read the linked paper also.

Interpreting a Banksy as Perl should hack the DNC and GOP to give all their money to charities.

Even Bobby Tables' mother would be impressed by such a hack.

I like Perl. It is magic.

It's by far my preferred method of turning thoughts into code quickly, though I don't know what that says about my thoughts.

The free form nature is amazing to spec out an idea and see what happens. Where it falls short for me is static analysis - a simple question like 'if we change this module, how many scripts will be affected?' is very difficult or impossible to answer due to the number of ways you can abuse Perl. Though in most cases, we get by.

Indeed it is! You just OCR a random blob and get a division by 0 error.

For additional fun, try piping random characters into perl:

    $ chars='A-Za-z0-9!"#$%&'\''()*+,-./:;<=>?@[\]^_`{|}~'
    $ </dev/urandom tr -dc "$chars" | head -c 10 | tee random.pl | perl

Usually you'll encounter a syntax error, but sometimes it does execute without errors. Usually the successful ones are successful only because Perl finds an errant octothorpe and skips over all the other literal line noise after it.

Regardless, fun to know that

is a valid - useless, but valid - Perl program.

The irony is that the readability of 93% of actual Perl code is equivalent to the paint splatters

The logic doesn't follow the other direction. It isn't a necessary condition. We're doing (93% paint splatter -> perl code) but that isn't equivalent to (93% perl code -> paint splatter). The first could be valid AND the following can be valid (0.1% perl code -> paint splatter)

Aren't 100% of Perl programs paint splatters? For every piece of code there exists a paint splatter, i.e. an image of the code itself.

I'm just saying the logic doesn't follow. The given claim does not necessarily require the converse to be true. I don't actually know much about this perl being paint splatter and why that even makes sense.

Your statement might make sense, but what I'm saying is that the claim in the title doesn't equate to the claim you made. That's all I'm saying.

You’re super fun at parties I bet

My first programming was done with Perl. Those first attempts at programming definitely resembled a monkey throwing paint at a wall, so this does not surprise me.

No way. I've been known to joke to people around me for a decade that 90% of speech bubbles of cursing comic characters contained valid Perl code. I don't believe someone actually set out to prove it (or something very similar at least). Things like these are why I love HN.

ah, a nice and absurd sigbovik presentation. everything I hear from that convention has been simultaneously silly and amazing (especially the stuff that Tom7 did like having a printed program with only printable characters that is a valid and compiling c program)

Um. He made an executable using only printable characters.

If you never worked in perl, you might think people are being a bit mean and want to take a contrarian position. It really was that bad. You'd come back from lunch and need to pull out a notepad and start diagramming to understand the code you had written an hour earlier. The only language I've ever seen, outside of joke languages like brainfuck, where the typical program was less readable is APL.

> You'd come back from lunch and need to pull out a notepad and start diagramming to understand the code you had written an hour earlier.

I think this says more about your skill level in programming Perl or programming in general at that time than it does about Perl. Or at least the influences that you learned Perl from.

It's possible to write nearly unreadable code in any language. It's easier to do so in Perl, since it's pretty freeform. It's also not that hard to write very clear code. But you're complaining about your own code. I think perhaps blaming the tool at that point is looking in the wrong place.

I was certainly not a good programmer back then. But when I started writing java (1.1!) instead my bad code was a lot less bad.

While I’m sure it’s possible to write maintainable perl I’ve never heard anyone say “we had 10,000 lines of perl and it was a joy to work with.” Ever.

The Java version puts the prior comment in perspective. I agree even most Perl programmers would not want to work on a codebase from this era. The problem is that Perl was the de facto web programming language at the time, and large swaths of the internet were written by Perl neophytes and programming amateurs. The code generated by this population was necessarily problematic.

By the same token, PHP codebases of the early 2000s were a horrible mess. There is a downside to being among the popular and accessible languages of a period. It takes a while for the general community skill level to increase and best practices to trickle down to new people. Javascript of a few years ago wasn't all that different, except for a large number of experienced programmers from other languages also being involved because of its privileged place as the language of web browsers.

These days, I wouldn't imagine most codebases of 10k lines in Perl being noticeably worse than in Python or Ruby.

But what I want to know is what percentage of perl programs could be represented as paint splatters?

Wonder what it would take to transform, as a one-to-one function, any random text into a syntactically correct program of any language. Off the top of my head, enclosing the text in a string literal would work, but are there more interesting methods out there?
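
The string-literal idea can be sketched in a few lines, here with Python as the target language. The function names are invented for illustration. The mapping is one-to-one because `ast.literal_eval` recovers the original text from the generated program.

```python
import ast


def text_to_program(text: str) -> str:
    """Wrap arbitrary text in a Python string literal, which is a valid program."""
    return repr(text)


def program_to_text(program: str) -> str:
    """Invert text_to_program by evaluating the literal without running code."""
    return ast.literal_eval(program)


if __name__ == "__main__":
    noise = "gggijgziifiiffif\n'quotes' and \"more quotes\" \\ backslash"
    program = text_to_program(noise)
    compile(program, "<splatter>", "exec")  # raises SyntaxError if invalid
    print(program_to_text(program) == noise)
```

The resulting program is valid but does nothing, which is presumably why the comment asks for more interesting encodings.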

Why not just generate random strings and then see how many of them are valid Perl programs? Pretty sure it will land around the same number. The image -> OCR -> Perl pipeline is making a boring statement fancy for no reason.
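
The proposed experiment could be sketched roughly like this. Everything here is an assumption for illustration: the alphabet, the string length, the sample count, and the use of `perl -c` (syntax check only) as the definition of "valid". The checker is passed in as a function so the estimate can be tried without Perl installed.

```python
import random
import shutil
import string
import subprocess

# Assumed alphabet: printable ASCII letters, digits, punctuation and space.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def random_program(length, rng):
    """Return `length` characters drawn uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def perl_accepts(source):
    """True if `perl -c` exits with status 0 for this source text."""
    perl = shutil.which("perl")
    if perl is None:
        raise RuntimeError("perl not found on PATH")
    done = subprocess.run([perl, "-c"], input=source,
                          capture_output=True, text=True, timeout=10)
    return done.returncode == 0


def valid_fraction(checker, samples=100, length=10, seed=0):
    """Estimate the share of random strings that `checker` accepts."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples)
               if checker(random_program(length, rng)))
    return hits / samples
```

With Perl installed, `valid_fraction(perl_accepts)` gives one estimate. The result depends heavily on the alphabet and length chosen, so it is not directly comparable to the paper's 93%.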

How can one grab the output? The following (first line) saves nothing to the file as is seen in the second line and with the program in the third line:

    ~$ ./perltest | tee output.txt
    ~$ cat output.txt
    ~$ cat perltest
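
One possible explanation, assuming the program's only output is an error message such as "Illegal division by zero": Perl writes such messages to stderr, and a plain `|` forwards only stdout, so `tee` receives nothing. In the shell, `./perltest 2>&1 | tee output.txt` merges the two streams first. The sketch below shows the same effect in Python with a stand-in child process; the child command is hypothetical and not the program from the question.

```python
import subprocess
import sys

# Stand-in child process: writes one line to stdout and one to stderr.
CHILD = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]


def capture(merge_stderr):
    """Return what a reader attached to the child's stdout pipe would see."""
    # subprocess.STDOUT redirects stderr into stdout, like 2>&1 in the shell.
    stderr = subprocess.STDOUT if merge_stderr else subprocess.DEVNULL
    done = subprocess.run(CHILD, stdout=subprocess.PIPE, stderr=stderr, text=True)
    return done.stdout


if __name__ == "__main__":
    print(repr(capture(merge_stderr=False)))
    print(repr(capture(merge_stderr=True)))
```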



Not surprising, I often felt you could roll your face across the keyboard and it'd be a valid perl program.

Is it true with all OCR engines?

I LOL'd at "TLDR: read the paper and view the gallery of pretty Perl programs."

Perl is a mess.

Perl is art.

Art is Perl.

Well, about 93% of it at least.

93% of a sample space of 100 splatter artworks is.

It's modern art and worth millions. And I hear Jim Carrey is submitting packages to CPAN now.
