>It serves no purpose, except proving that files format not starting at offset 0 are a bad idea
What exactly does it mean to start at offset 0 and why don't these file formats do that? Is there an advantage in not starting at offset 0 or is it simply oversight/indifference? Any kind of background on the problem would be appreciated, I'm really quite intrigued.
Most file types have a magic signature as the initial few bytes of the file. For example, a Windows executable always begins with the ASCII characters "MZ".
The point is that with non-overlapping magic signatures, a single file can be simultaneously identified as more than one type.
"MZ" are the initials of Mark Zbikowski, one of the developers of MS-DOS. :)
A PNG file starts with an 8-byte signature.
The hexadecimal byte values are 89 50 4E 47 0D 0A 1A 0A;
the decimal values are 137 80 78 71 13 10 26 10.
GIF starts with another marker at zero offset, so no valid GIF is a valid PNG, and vice versa.
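To make the offset-0 point concrete, here is a minimal Python sketch. The helper name `identify_at_offset_zero` is made up for illustration, and it only looks at the leading bytes; a real decoder validates far more than the signature. Because both checks read the same leading bytes, at most one of them can match.

```python
# PNG signature, as the hex bytes quoted above.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
# GIF files begin with one of these two ASCII markers.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def identify_at_offset_zero(data: bytes) -> str:
    """Hypothetical helper: classify data by its leading bytes only."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(GIF_SIGNATURES):
        return "gif"
    return "unknown"


# The same signature as decimal values.
print(list(PNG_SIGNATURE))  # [137, 80, 78, 71, 13, 10, 26, 10]
```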
Some formats are mutually exclusive because they “fight” for the contents of the first several bytes.
Other formats are more relaxed about where their signature may appear, which opens the door to carefully engineered ambiguity, and that is what gets exploited here.
edit: removed a section that was utterly wrong
For instance, a common obfuscation method is simply removing the magic number from the file; in this case, the program may simply try to use the file as the given format and return an error (or crash; we are talking largely about proprietary software in these cases after all) if the file can't be read.
Other than that, I can't provide any information on file formats allowed to start at offsets other than 0, or why this may or may not be a good idea (I suppose maybe it would allow an enterprising programmer to hide a malicious file by embedding it in an otherwise-innocuous format?), though I am certainly curious as well.
It seems to me that if all file format identifiers started at the zero offset, it would be impossible for a single file to identify as more than one format. However, when different formats use different offsets to identify themselves, it is possible to construct the file in such a way that it validly identifies as more than one format.
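A rough Python sketch of that idea follows. It is a toy detector, not how any particular tool works: `identify` and `toy` are hypothetical names, and the three checks are simplified assumptions (an "MZ" marker at offset 0, a "%PDF-" header anywhere in the first 1024 bytes as some PDF readers tolerate, and a ZIP end-of-central-directory signature searched for near the end of the file). Since each check looks at a different region, nothing stops one blob from passing all of them.

```python
from typing import List

# Minimum size of the ZIP end-of-central-directory record, plus the
# largest comment that may follow it.
ZIP_EOCD_SEARCH_WINDOW = 22 + 65535


def identify(data: bytes) -> List[str]:
    """Hypothetical detector: each format is probed at its own offset."""
    kinds = []
    if data[:2] == b"MZ":                     # must be at offset 0
        kinds.append("exe")
    if b"%PDF-" in data[:1024]:               # assumed: anywhere near the start
        kinds.append("pdf")
    if b"PK\x05\x06" in data[-ZIP_EOCD_SEARCH_WINDOW:]:  # near the end
        kinds.append("zip")
    return kinds


# A toy blob (NOT a valid file of any kind) that trips all three checks.
toy = (
    b"MZ" + b"\x00" * 62
    + b"%PDF-1.4\n" + b"\x00" * 100
    + b"PK\x05\x06" + b"\x00" * 18
)
print(identify(toy))  # ['exe', 'pdf', 'zip']
```

A format that insisted on its signature at offset 0 could not take part in this: only one signature can occupy the first bytes.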
Edit: someone posted results for .exe file inside the .zip, which are a bit different (it seems like some antiviruses don't try to unpack it?), but then deleted the comment. Here's the link for .exe: https://www.virustotal.com/file/2a9c7a16cdb3c3f2285afaf61072...
Still impressive stuff and also given the use of undocumented opcodes and x86 foo it does raise a new question:
Given that some VMs will fail on some of these instructions where bare metal would run them, is it possible to have a virus that triggers only on bare metal, or only inside a VM, through the use of undocumented opcodes and the like?
Nonetheless, it's a wonderful definition of hacking in its truest sense, and educational about undocumented opcodes and about how, for some things, you can't beat pure assembly for fun and jollies.
An error occurred while performing an ICAP operation: File decompression/decode error; File: CorkaMIX.zip; Sub File: No file name available; Vendor: Kaspersky Labs; Engine error code: 0x00050000; Engine version: 220.127.116.11; Pattern version: 120801.124000.8311194; Pattern date: 2012.08.01 12:40:00
In fact, they all return the same result!
== the program ==
 visually. If you ignore the newline.
What other formats don't need to start at offset 0?