
How We Decoded Some Nasty Multi-Level Encoded Malware  - cubictwo
http://blog.sucuri.net/2013/12/how-we-decoded-some-nasty-multi-level-encoded-malware.html
======
GhotiFish
I must admit, I think the last step was a little... uhm... well, I just think it's
funny how you used Hacker News to decrypt a script then posted about that on
Hacker News.

~~~
JonnieCache
hopefully noodly will turn up and tell us how he/she did it.

EDIT: obviously bruteforce, but still.

~~~
bediger4000
It's an XOR encryption, with a largish key - 12 bytes if I read the old Hacker
News comment correctly. If you do the standard known-text XOR key finding
algorithm, you need a 24-byte known text.

That's some brute forcing, if you ask me.

~~~
alexkus
But you're just looking for the text being in a certain range of characters.
The first stage is determining the key length, you can do this by looking at
the number of different character values in positions 1,2,3... then 1,3,5,...
and 2,4,6,... then 1,4,7,... etc.

    
    
        keylength=1 127
        keylength=2 114 116
        keylength=3 107 102 106
        keylength=4 78 94 98 103
        keylength=5 111 118 115 114 119
        keylength=6 81 71 81 77 79 77
        keylength=7 104 112 110 104 110 110 109
        keylength=8 67 85 86 94 68 91 83 86
        keylength=9 87 86 86 90 92 85 83 85 82
        keylength=10 91 97 86 89 90 88 90 90 92 97
        keylength=11 90 101 102 98 99 93 100 100 100 95 93
        keylength=12 45 45 46 53 47 46 41 47 47 51 50 47
        keylength=13 89 85 100 90 89 91 94 95 82 90 86 88 92
        keylength=14 83 84 85 88 82 81 77 72 82 85 73 81 82 86
        keylength=15 73 81 72 76 76 70 75 73 74 72 77 76 73 71 73
        keylength=16 56 68 71 80 60 73 64 73 58 71 70 76 57 76 66 68
        keylength=17 81 86 82 74 77 82 85 87 81 87 78 84 77 77 79 80 82
        keylength=18 59 55 59 55 61 56 59 55 57 59 55 62 62 57 60 58 56 59
        keylength=19 77 80 80 77 71 89 78 76 73 74 68 80 80 80 74 76 78 79 72
    

The standout line from that is keylength=12, as it gives a clear minimum for
the number of different characters in every position.
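
A rough Python sketch of that counting step, for anyone who wants to try it. This is not the original script (which isn't shown here); the function names and the "lowest average wins" rule in `guess_keylen` are assumptions made for illustration, and the sample data is a toy, not the malware.

```python
def xor_repeat(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (encrypts and decrypts alike)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def distinct_counts(ciphertext: bytes, keylen: int) -> list:
    """For a candidate key length, count the distinct byte values found
    in each column (positions pos, pos+keylen, pos+2*keylen, ...)."""
    return [len(set(ciphertext[pos::keylen])) for pos in range(keylen)]


def guess_keylen(ciphertext: bytes, max_len: int = 19) -> int:
    """Assumed heuristic: pick the length with the lowest average distinct
    count per column. Multiples of the true length also score low, so
    keep max_len below twice the expected key length."""
    def average(n):
        return sum(distinct_counts(ciphertext, n)) / n
    return min(range(1, max_len + 1), key=average)


if __name__ == "__main__":
    # Toy data: repetitive PHP-like text under a made-up 12-byte key.
    text = b"define('has_system',@function_exists('system'));\n" * 40
    sample = xor_repeat(text, b"ABCDEFGHIJKL")
    for n in range(1, 20):
        print("keylength=%d" % n, *distinct_counts(sample, n))
```

With the right key length every column is XORed with a single key byte, so it can never hold more distinct values than the plaintext alphabet has. Wrong lengths mix several key bytes into each column and the counts go up.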

Having a good idea of the key length, you can look at which of the possible 256
values for each position of the key produce the most printable
characters. A quick Perl script to do this gives:-

    
    
        POSS:0: n=211 chars=FHLRSUVZ]
        POSS:1: n=209 chars=cdjkouwx~
        POSS:2: n=208 chars=DIJKLORUW^_
        POSS:3: n=209 chars=BIKSVW
        POSS:4: n=206 chars=bejkntvy~
        POSS:5: n=210 chars=@CDEFKQXZ
        POSS:6: n=208 chars="#)+.023678>?
        POSS:7: n=206 chars=fgmorswz|
        POSS:8: n=208 chars=behjknt~
        POSS:9: n=205 chars=GMORSTW\
        POSS:10: n=207 chars=FLWXY[\
        POSS:11: n=208 chars=cdjkoux~
    

You'll note that the actual key (SjJVkE6rkRYj) is present in this output. I'll
carry on with this if I get a chance tomorrow.
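
The same idea as a short Python sketch (again, not the Perl script itself). It assumes "printable" means the range C's isprint() accepts in the default locale, and that only key bytes tying for the top score are listed; neither detail is spelled out above, and the function names are hypothetical.

```python
def is_print(byte: int) -> bool:
    """Same range as C's isprint() in the default locale: 0x20..0x7e."""
    return 0x20 <= byte <= 0x7E


def key_candidates(ciphertext: bytes, keylen: int) -> list:
    """For each key position return (best_count, candidate_bytes):
    the highest number of printable characters any key byte produces
    in that column, and every key byte that reaches that count."""
    result = []
    for pos in range(keylen):
        column = ciphertext[pos::keylen]
        # Try all 256 key bytes against this column and score each one.
        scores = [sum(is_print(b ^ k) for b in column) for k in range(256)]
        best = max(scores)
        ties = bytes(k for k in range(256) if scores[k] == best)
        result.append((best, ties))
    return result


def print_candidates(ciphertext: bytes, keylen: int) -> None:
    """Print one line per key position, similar in shape to the listing above."""
    for pos, (n, chars) in enumerate(key_candidates(ciphertext, keylen)):
        shown = "".join(chr(c) for c in chars if is_print(c))
        print("POSS:%d: n=%d chars=%s" % (pos, n, shown))
```

This only narrows each position down to a handful of characters, because many wrong key bytes also map lowercase letters onto other printable characters.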

~~~
alexkus
OK, next pick a word you think will be in the output (it's PHP code so I'll
guess the word "function" might be common). You then XOR each 8-character
substring with "function", taking note of the position of the substring within
the 12-character groupings, and count the resulting characters:-

    
    
        $ ./4.pl 12 "function"
        POS: 0: S->78 T->71 U->69
        POS: 1: j->76 l->64 m->59
        POS: 2: L->73 J->67 W->64
        POS: 3: V->65 Q->55 P->49
        POS: 4: m->68 v->50 `->49
        POS: 5: E->54 C->52 O->48
        POS: 6: 6->57 0->56 1->56
        POS: 7: u->56 t->54 x->50
        POS: 8: k->76 l->70 m->66
        POS: 9: R->69 U->57 H->51
        POS: 10: _->69 Y->62 ^->54
        POS: 11: j->70 m->59 l->55
    

Note that the correct key in each position is usually either the first or
second possibility, but there are some times when it isn't there at all.

(It's better if you give it a longer string: if I give it "function " with the
trailing space then it gets 9/12 as the most popular, 2/12 as second most
popular and the remaining one as the third most popular. There are even better
choices, see below, but I chose to go with this as it helps show how to
continue with non-perfect information.)
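
For reference, the vote counting described above can be sketched in Python like this. It is an illustration under assumptions, not the 4.pl script: the names `crib_votes` and `top_guesses` are invented, and the original may tally things slightly differently.

```python
from collections import Counter


def crib_votes(ciphertext: bytes, keylen: int, crib: bytes) -> list:
    """Slide the guessed word over every offset of the ciphertext.
    At each offset, XOR the ciphertext bytes with the word to get the
    key bytes that would have produced it there, and record one vote
    per implied key byte under its position within the key."""
    votes = [Counter() for _ in range(keylen)]
    for start in range(len(ciphertext) - len(crib) + 1):
        for j, plain_byte in enumerate(crib):
            pos = (start + j) % keylen
            votes[pos][ciphertext[start + j] ^ plain_byte] += 1
    return votes


def top_guesses(ciphertext: bytes, keylen: int, crib: bytes, n: int = 3) -> list:
    """The n most-voted key characters for each key position."""
    return [[(chr(k), count) for k, count in v.most_common(n)]
            for v in crib_votes(ciphertext, keylen, crib)]
```

A key byte collects a vote every time it maps a ciphertext byte onto any letter of the guessed word, so in effect this rewards key bytes that turn the column into the word's letters. Wrong bytes collect votes too, which is why the right answer is not always ranked first.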

Next I knocked up a quick program to let me try different keys to decrypt
it. It prints out the code in blocks of the appropriate keylength so it's easy
to see if a specific character position is correct or not. The program takes
various commands from stdin (k = print key, S = set key, s = set individual
key character, p = print in keylength sized blocks, P = print entire decrypted
block):-

    
    
        $ ./5.pl 12
        k
        key is aaaaaaaaaaaa
        S SjLVmE6ukR_j
        key is now SjLVmE6ukR_j
        p
        /)abjutt uy
        stcm&vauiadl
        es\12i`(!Gisue
        t("_UERQER/)
        {$YCIOKNE= $
        HTRPYCOHKIC_
        VATS=$_WOSR=
        &$NTRP_WOSR_
        VATS=$_@ET;&
        $HRTV_GBT_PA
        RS=}\12//cie&w
        itn crrhr
        `u
        ncriin \127_doe
        ($k)}@hbadcr
        ('NTRP/6.1&5
        ...
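
The core of a tool like this is small. A hedged Python sketch of the three pieces it needs (repeating-key XOR, the block view, and changing one key character) follows; the function names are made up and the interactive command loop is left out.

```python
def xor_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Repeating-key XOR. The same call encrypts and decrypts."""
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(ciphertext))


def in_blocks(text: bytes, keylen: int) -> list:
    """Split the text into keylen-sized rows so that each column lines
    up with exactly one key character (the 'p' view)."""
    return [text[i:i + keylen] for i in range(0, len(text), keylen)]


def set_key_byte(key: bytes, pos: int, char: str) -> bytes:
    """Replace a single key character (what the 's' command does)."""
    return key[:pos] + char.encode("latin-1") + key[pos + 1:]


def show(ciphertext: bytes, key: bytes) -> None:
    """Print the current decryption attempt one key-length row at a time."""
    for row in in_blocks(xor_decrypt(ciphertext, key), len(key)):
        print(row.decode("latin-1"))
```

Because every row is exactly one key length wide, a wrong key character shows up as a whole column of wrong letters, which is what makes the mistakes easy to spot by eye.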
    

Getting there, but still a way off. There are a few strings like COIKOE[#n]=
which could be COOKIE, so we fiddle with the 3rd column and see some
interesting things in the output:-

    
    
        s 2 J
        p
        ...
        elseof(oasYp
        ...
        COOKOE[#n]=
        ...
        systcm(#c)=}
        ...
    

Neither of the other two choices for column 4 produce the right string, but
it's simple enough to find the ciphertext and find what is needed to make
elseof -> elseif, COOKOE -> COOKIE and systcm -> system. It's k:-

    
    
        s 4 k
        k
        key is SjJVkE6ukR_j
        p
        ...
        elseif(oasYp
        ...
        COOKIE[#n]=
        ...
        system(#c)=}
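
You don't need to go back to the ciphertext by hand for this. Since plaintext = ciphertext XOR key, the corrected key character is the current one XORed with the character you see and the character you expected. A minimal sketch (the function name is hypothetical):

```python
def corrected_key_char(current: str, seen: str, expected: str) -> str:
    """Return the key character that turns `seen` into `expected`.

    ciphertext = seen XOR current, and we want
    ciphertext XOR new == expected, so
    new = current XOR seen XOR expected.
    """
    return chr(ord(current) ^ ord(seen) ^ ord(expected))


# Position 4 held 'm' and showed "elseof" where "elseif" was expected.
print(corrected_key_char("m", "o", "i"))  # k
# Position 2 held 'L' and showed "COIKOE" where "COOKOE" was expected.
print(corrected_key_char("L", "I", "O"))  # J
```

All three clues (elseof, COOKOE, systcm) differ from the intended text by the same XOR value of 0x06, so they all point at the same fix.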
    

We carry on doing this based on more clues like this (it gets easier and
easier!) and we end up with:-

    
    
        k
        key is SjJVkE6rkRYj
        p
        //adjust sy
        stem variabl
        es
        if(!@isse
        t($_SERVER))
        {$_COOKIE=&$
        HTTP_COOKIE_
        VARS;$_POST=
        ...
    

And printing out the entire decrypted buffer without extra linebreaks every 12
chars gives:-

    
    
        P
        
        //adjust system variables
        if(!@isset($_SERVER)){$_COOKIE=&$HTTP_COOKIE_VARS;$_POST=&$HTTP_POST_VARS;$_GET=&$HTTP_GET_VARS;}
        //die with error
        function x_die($m){@header('HTTP/1.1 500 '.$m);@die();}
        //check if we can exec
        define('has_passthru',@function_exists('passthru'));
        define('has_system',@function_exists('system'));
        define('has_shell_exec',@function_exists('shell_exec'));
        define('has_popen',@function_exists('popen'));
        ...
    

Interestingly, trying known plaintext of "define" gives us the entire key as
the most probable choice first time:-

    
    
        $ ./4.pl 12 "define"
        POS: 0: S->75 X->56 _->54
        POS: 1: j->77 a->51 f->47
        POS: 2: J->71 A->55 F->49
        POS: 3: V->61 ]->44 Z->43
        POS: 4: k->59 h->43 *->40
        POS: 5: E->55 8->48 4->44
        POS: 6: 6->52 w->47 <->43
        POS: 7: r->55 d->43 s->40
        POS: 8: k->75 `->51 g->51
        POS: 9: R->59 Y->55 X->49
        POS: 10: Y->69 R->49 U->47
        POS: 11: j->61 `->43 m->42

~~~
JonnieCache
This is roughly what I was imagining doing in my head when I brashly wrote
"obviously bruteforce." Hooray! Apparently I've still got it, at least to some
small extent.

Thanks for your writeup!

Why does define work so well? Chance?

~~~
alexkus
Not sure, "define" only appears 10 times but "function" appears 19 times. It's
probably because 'e' is more popular in the code and "define" has two of them.
Looking at the decrypted code there are much better strings to try;
"function_exists" gives the right key first time too.

It could have been made a lot harder by using comments at the end of each line
to pad the code with extra characters that even out the frequency distribution
and defeat this kind of frequency analysis.

~~~
JonnieCache
Could I make it even worse using ruby's unicode support?

eg this kind of nonsense:

    
    
        π = Math::PI
        alias λ proc
    
        ● = λ {|r| π*r**2}
        puts ●.call 5
    
        # 78.53981633974483

~~~
alexkus
Oh yes, although too many "alias" commands will mean that it's possible to
search for the known plaintext of 'alias'. But the more non-ASCII bytes there
are, the more it will put off my analysis, which depends on isprint().

------
VMG
And re-encoded it into low-res PNGs?

------
aroch
Huh, how strange... I just cleaned the Perl version of this out of a
compromised user account on a server I manage.

------
bbsec
Great!

