Hacker News
Code that is valid in both PHP and Java, and produces the same output in both (gist.github.com)
405 points by adamnemecek on Sept 1, 2016 | 88 comments

Reading this, I remember that while I was studying computer science back in the early 90s, such programs were known as "polyglot" programs. One example that circulated on email and Usenet groups back then covered at least 6 programming languages. After a bit of searching I actually found that program again:


It is valid C, Pascal, COBOL, FORTRAN, PostScript, Bash, ksh and apparently also 8086 machine code (.com).

Pretty disappointing that it relies so much on comments.

You can write an arbitrary 'polyglot' program in any N languages that have incompatible comment delimiters: just write the same program in each language and wrap it in the comment delimiters of the other N-1 programs. That's not interesting, though.

That's not enough. If you start the file with language A's escape sequence, say /*, it must not break languages B, C, D, E or F. Next, B's escape sequence must not cause problems with C, D, E or F, and so on. So the sequences must be different but still compatible.

Yep, I was expecting something different.

Hmm. Could use like 50 Lisp dialects to create the largest polyglot!



But... Why???

Why not?

Sometimes pushing against the boundaries of what can/can't or should/shouldn't be done is entertaining, educational, or both.

I mean, why would you abuse the spindles and armature in a hard drive to walk it across the floor like a poorly balanced washing machine, make a Unix clone for your 386, turn the Green Building into a Tetris game, write a Brainfuck interpreter in TECO, or run Flappy Bird on a Super Nintendo by exploiting bugs in Super Mario World? Seriously, what kind of right-thinking person would engage in such perverse, time-wasting silliness? ;)

See above.

Phava is great!

But, quine-relay still wins: https://github.com/mame/quine-relay

That was pretty much THE thing that won my respect for the Ruby community.


That and the radiation-hardened quine:


Prior to seeing those, I was kind of dismissive of Ruby, but seeing those totally changed my mindset about code, and my prevailing opinion of execution environments and interpreted languages and scripting in general.

Great share, but I'd like to kindly suggest that you think again about your conclusions there. You were dismissive of an entire programming language and its community (we've all been there), and then 1 member of that community did something impressive, and you changed your entire opinion of said language and community (I guess most of us have been there too).

Wouldn't it be a much more interesting conclusion that there's fantastic people and not-so-fantastic people everywhere and it mostly just depends on where you look?

I mean, HN likes to be very dismissive of PHP, but at the same time Composer (the de facto PHP package manager) avoids dependency conflicts entirely and provably because it contains a home-cooked SAT solver[1]. In PHP. Tackling an NP-complete problem without breaking a sweat. To me that's both super-awesome and sensible at the same time.

Maybe stereotypes about programming communities have more to do with marketing and accidental who-happens-to-be-most-prolific and less with the language or community just being good or bad.

[1] http://www.naderman.de/slippy/src/?file=2012-06-07-Composers...
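
For anyone wondering what "dependency resolution as SAT" even looks like, here is a toy Python sketch. It is not Composer's actual solver, which is far more sophisticated; the package names and the brute-force `solve` helper are made up for illustration. Each package version becomes a boolean ("installed or not"), and each requirement or conflict becomes a clause.

```python
from itertools import product


def solve(variables, clauses):
    """Brute-force SAT: return the first satisfying assignment, or None.

    A clause is a list of (variable, wanted_value) pairs and is satisfied
    when at least one pair matches the assignment.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(
            any(assignment[name] == wanted for name, wanted in clause)
            for clause in clauses
        ):
            return assignment
    return None


# Hypothetical packages, purely for illustration.
variables = ["app", "lib-1.0", "lib-2.0"]
clauses = [
    [("app", True)],                                         # we want app installed
    [("app", False), ("lib-1.0", True), ("lib-2.0", True)],  # app needs lib 1.0 or 2.0
    [("lib-1.0", False), ("lib-2.0", False)],                # the two versions conflict
]

solution = solve(variables, clauses)
print(solution)

# Forbidding both lib versions makes the problem unsatisfiable.
impossible = solve(variables, clauses + [[("lib-1.0", False)], [("lib-2.0", False)]])
print(impossible)
```

Brute force is exponential in the number of package versions, which is why real resolvers lean on smarter SAT techniques instead of trying every combination.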

> I mean, HN likes to be very dismissive of PHP, but at the same time Composer (the de facto PHP package manager) avoids dependency conflicts entirely and provably because it contains a home-cooked SAT solver[1]. In PHP. Tackling an NP-complete problem without breaking a sweat.

Any idea where I can find more details on that? The slideshow you linked does not provide enough information.

I wish I did! I can't remember how I learned that it's SAT-based, and Google isn't very forthcoming :-(

Probably because most Google searches for "SAT" involve prep courses that help you achieve a score of 1600... [._.]

On the other hand, searching Wikipedia for the jargon helps narrow the results:


I thought the SAT solver was a PHP port of libsolv.

Well, I'd like to kindly suggest that you think again about your assumptions, regarding why I was dismissive of Ruby. Ruby was still kinda/sorta new-school in 2013. Yet another technology stack as far as I could tell. Ruby was not quite a thing yet, back in 2008 or so. Probably because Rails was still pretty much somewhat "new."

Back in 2008, I still had yet to learn that Ruby was an interpreted language, and that Rails was a web application framework built on it. In corporate enterprise 401k cubicle farms, there were only hushed mentions of Ruby among hobbyists as early as autumn 2009 (the first time I remember hearing tell of such a thing). And mostly among those who developed on Macs, which was actually a little bit rare back then. Ubuntu was similarly revolutionary at that time, even if LAMP stacks were well established.

Rails hype was kicking into full gear around 2006/2007.

Well that's lovely.

"Wouldn't it be a much more interesting conclusion that there's fantastic people and not-so-fantastic people everywhere and it mostly just depends on where you look?"

Yes...except this wasn't their conclusion...so...

This seems like a strange comment. Your parent suggests that your grandparent should have made a specific different interpretation, and you respond that the suggested interpretation is different?

That's not because of the Ruby community, but because of Endoh, a world champion of IOCCC.

Probably also true.

Wow, that is super impressive! Surprisingly short program too.

A bit back I wrote a Forth interpreter that's a single file containing shell (a tiny bit), awk, C, and (a limited dialect of) Forth.


The awk byte-compiles the initial Forth dictionary into the C file; if you edit the C file, you then run it (as a shell script) and it updates the byte-compiled word definitions. Then you can compile it as a standalone C program.

Amazingly, this was actually the simplest way to solve the problem...

Java is fun. The following is valid Java code:

[Hint: Hello world]

But that's not really surprising: it's each character written as a Unicode escape (see section 3.2 of the Java spec). It's possible in any language that allows escaped code points.

Most languages only tend to allow them in string literals, or identifiers. Java is fairly unique in that it has a pre-processor that replaces those throughout the source code before actually compiling. Which means that a lot of benign-looking source code can have fairly surprising behaviour, such as exhibited in the polyglot where

actually cancels the line comment.

Not surprising, yes.

But what is silly is that the Unicode sequences are also interpreted in comments.

  /* Comment region begins. \u002a\u002f
  String s = "Comment was closed in the previous statement";
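
To see why, here is a rough Python model of that translation step, run on the snippet above. It is a simplification of what the Java spec describes, and `translate_unicode_escapes` is a hypothetical helper name, not part of any real tool.

```python
import re

# A run of backslashes, then a backslash, one or more 'u', and four hex digits.
_ESCAPE = re.compile(r"(\\*)\\u+([0-9a-fA-F]{4})")


def translate_unicode_escapes(source):
    """Simplified model of Java's Unicode-escape translation.

    A backslash only starts an escape when it is preceded by an even
    number of backslashes; otherwise the text is left untouched.
    """
    def replace(match):
        preceding, hex_digits = match.group(1), match.group(2)
        if len(preceding) % 2 == 1:
            return match.group(0)  # the backslash is itself escaped
        return preceding + chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, source)


source = (
    "/* Comment region begins. \\u002a\\u002f\n"
    'String s = "Comment was closed in the previous statement";'
)
translated = translate_unicode_escapes(source)
print(translated)
```

After translation the first line ends with a real `*/`, so the `String s` line is live code as far as the compiler is concerned.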

That's a recipe for pure evil.

Oh man, if I had known that at my last job I could have had great fun with some of my coworkers via my commits.

Here is the JavaScript version of that (run it, it's valid code, see http://www.jsfuck.com):


See also http://blog.portswigger.net/2016/07/executing-non-alphanumer... for executable JavaScript without parentheses.

Those interested in this topic may enjoy Ange Albertini's presentation at the 31st Chaos Communication Congress, on "Funky File Formats", https://events.ccc.de/congress/2014/Fahrplan/events/5930.htm... .

> Binary tricks to evade identification, detection, to exploit encryption and hash collisions. * artistic binaries - why they are possible, how they work. - quines - polyglots & chimeras - schizophrenic - AngeCryption - hash collisions

The presentation video and slides are available. It has examples of mixing multiple binary formats together, including valid files for one format which, when decrypted, produce another valid format.

What's the purpose of the:


bytes? This sequence appears multiple times, but is commented out in both languages' interpretations.

\u000A\u002F\u002A is the Unicode escape sequence for a line feed followed by the beginning of a multi-line comment (\u000A\u002A\u002F being a line feed followed by the end of one). The Java compiler translates the escapes and sees a comment delimiter, while the PHP interpreter leaves them untranslated, so they stay inside the // line comment (see https://www.reddit.com/r/ProgrammerHumor/comments/50guhc/thi... for more info).
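
A small Python sketch of the two views, assuming the sequence sits right after a `//` as in the gist. `java_view` is a made-up helper that only approximates the compiler's escape translation (it skips the escaped-backslash corner case).

```python
import re


def java_view(source):
    """Approximate what the Java compiler sees after Unicode-escape translation."""
    return re.sub(
        r"\\u+([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        source,
    )


line = r"//\u000A\u002F\u002A"

# PHP does no such translation: the whole thing is a single line comment.
php_lines = line.splitlines()

# Java translates first: the line feed ends the // comment, then /* opens a block.
java_lines = java_view(line).splitlines()

print(php_lines)
print(java_lines)
```

So one physical line is a harmless comment for PHP, but for Java it is a finished line comment followed by the start of a block comment that hides the PHP code after it.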

Wow, that seems really dangerous. Apparently, this also applies inside string literals: http://javajee.com/unicode-escapes-in-java

So if you want to sneak in a back door, all you need is the right excuse to put some unicode escapes in a block comment, and you can hide your code in plain sight.

Something like:


I tried this in Eclipse, and the one saving grace is that while the basic syntax highlighting doesn't pick up on most of the de-commented part, the code intel seems to do an actual parse, and does a very subtle highlighting of "someService". (Edit: After reopening the file, that highlight is gone and it looks like a normal comment).

But if I saw that, I think I'd just assume the syntax coloring was messing up. And of course, if I'm looking at the code in GitHub, it will look just like a normal comment.
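
A crude way to catch this in review, sketched in Python: flag any Unicode escape that decodes to a comment delimiter, quote, backslash or line terminator. `find_suspicious_escapes`, `disableAuth` and the sample source are made up for illustration, and the regex ignores the escaped-backslash corner case, so treat this as a starting point rather than a real linter.

```python
import re

ESCAPE = re.compile(r"\\u+([0-9a-fA-F]{4})")

# Characters that change how the surrounding Java source is tokenized.
SUSPICIOUS = set("\n\r/*\"'\\")


def find_suspicious_escapes(source):
    """Return (line_number, escape_text, decoded_char) for risky escapes."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in ESCAPE.finditer(line):
            decoded = chr(int(match.group(1), 16))
            if decoded in SUSPICIOUS:
                findings.append((line_number, match.group(0), decoded))
    return findings


# Hypothetical source: the first line closes its comment early via escapes.
sample = (
    "/* see ticket \\u002a\\u002f someService.disableAuth(); /* */\n"
    'String name = "caf\\u00e9";'
)

for finding in find_suspicious_escapes(sample):
    print(finding)
```

An ordinary escape such as `\u00e9` in a string literal is left alone; only the ones that can open or close comments, strings or lines are reported.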

I just tried this in Android Studio, Eclipse, Emacs, Gedit, Nano, Vim and all of them get this wrong.

I feel the sudden urge to test every Java syntax highlighter in use and file lots of issues.

GitHub definitely needs to know about this due to their popularity and subsequent lowest-common-denominator status (which isn't a bad thing, just the truth); this sort of attack only requires a PhD in how to use the clipboard, and not any other particular knowledge or skillset.

GitHub indirectly uses the Java bundle for TextMate, where I filed this issue: https://github.com/textmate/java.tmbundle/issues/45

Since the escape sequences have to be handled everywhere, it seems unlikely that this will ever be fixed completely, but I hope that something will be done about it.

This issue has also been raised on the Eclipse bug tracker: https://bugs.eclipse.org/bugs/show_bug.cgi?id=3533

In 2001.

That is insane.

I added a pingback to this thread to the bug, along with some general encouragement that fixing this is a wise idea.

oh please do.

Interesting, thank you for explaining!

Given that they're prefixed with //, why does the sequence get interpreted at all? IIRC "///[star]" does not open a block comment.

EDIT: oh! That's what the 0x0a is for -- it's a line feed, which puts the /[star] onto the next line. Sneaky.

EDIT2: hmmmm HN formatting... had to replace my asterisks with [star] because it was just italicizing everything in between.

Looks like the color coding on github doesn't understand it and doesn't gray it out like it should.

  In [2]: u"\u000A\u002F\u002A"
  Out[2]: u'\n/*'

It appears java will interpret the inline escaped unicode, which is hiding a block comment.


Well, that sequence is a newline followed by "/*". My guess is that one of the languages parses "//\u000A\u002F\u002A" as:


Yes, the Java compiler translates Unicode escapes before compilation (and before many other things).

See this classic example: http://stackoverflow.com/questions/20383687/how-it-is-possib...

It's opening and closing comment blocks.

The Java code does not run between these instances.

"But now we'll never know if Schrodinger's computer is running php or java..."

"I hope you die in a fire of a 1000 java compilers."

The Jurassic Park quote in the comments is perfect.

"You were so preoccupied with whether or not you could, you didn't stop to think if you should"

Here's a polyglot that's valid in a large number of languages: https://github.com/mauke/poly.poly

At least the following:

    * C (89)
    * C (99)
    * C++
    * Haskell (multiple extensions, not sure how that works)
    * Bash
    * ZSH
    * Posix SH
    * Make
    * Perl 5
    * Perl 6
    * Ruby
    * Python
    * Brainfuck
    * HTML

edit: formatting

This is often called a "polyglot" program, if you're interested in finding more examples.


Also interesting are document formats wherein one would embed other document formats. Examples would include embedding malicious files within a PNG; it'd typically start with distributing a PDF file:



I recall the Quercus project from Caucho. It translated PHP into JVM bytecode. One way to truly accelerate PHP many years back.

Basic, Python 2, Perl

    print "Hello World"

Now how about a quine that's valid in PHP and Java?


This 100-language quine relay passes through both on its way around the clock. :)

Just posted this link to HN because that is insane. Like clinically, and much more insane than this one.

I actually find that one much more boring. It's basically a nested string.

From the gist comments:

> Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should. - Ian Malcolm

https://blog.goeswhere.com/2010/04/java-cpp-polyglot/ here's an example of the same sort of thing but with C++/Java

Yup, you can do that with just about any set of languages, right?

The trick here relies mostly on comments that are valid for one language, but not the other (abusing Java's Unicode preprocessing). There are language pairs that offer no provision at all for embedding parts of code that another language won't see or misinterpret. E.g., I haven't been able to write a batch file that also works as a PowerShell script yet, although you can write a batch file that doubles as a VB script through clever use of a conditional jump.

Oooh, right. So, I guess there's some pairs that are incompatible. Hmm :(

Maybe it also works for computationally equivalent (simulation/emulation) math, where you're decompiling the machine language into a stream of logical NORs or such. That might connect more languages.

No, but another one you can do it in is PHP and JavaScript

    function isEven($number) {
        return $number % 2 == 0;
    }

That's actually valid PHP and JavaScript code. But I think you'll need to use eval on that in the PHP VM, because it won't have the `<?php` opening tag.

You can use the same trick used here of hiding the PHP tags behind the comments /*<?php

Actually that trick creates some residual garbage (the slash and the asterisk would be printed in PHP)

A way to avoid this is to use the HTML comment notation, JavaScript just ignores it (until end of line), so just:

    <!-- <?php echo "-->";

This reminds me that I once ran into source code that was both valid Java and valid C++... Can't remember what it was, though.

Amusing and fairly cool. Calling a non-static function in a static context in the PHP file bothers the heck out of me for some reason, though (maybe because my coworkers always mess it up).

Security guys would love this. Multilanguage payloads.

Polyglot programming. That's wild.

Wasabi 2.0

Is this just a novelty or does it mean I can hack Enterprise Java apps with ratchet PHP code?

You could always use the 100% Java implementation of PHP in your app server:


hey cool, a Perthian :-)

I have the same response every time someone from Perth pops up on HN :)

Quite a recent picture as well... As I recall, the Rio Tinto logo on the Central Park building was only up a couple years ago.


From the sublime to the ridiculous. When this community is looking for something ridiculous, we find it on Github. Even at the height of hilarity, there is a turn for the nerdy.

But we are all in agreement that the author of this must be locked away from society for the greater good of mankind, right?

Yes, before it raises the ire of Cthulhu.

We have enough problems in creating programs. This is the equivalent of an instagram selfie or math geek with a favorite way to generate a pattern of numbers, where someone is trying to be overly clever just to say "look it's clever"...it's just noise.

If by code you mean comments...

Comments are code
