
New Statistical De-minifier and De-obfuscator for JavaScript - mvechev
http://www.jsnice.org
======
pajtai
I took the code presented and put it into packer -
[http://dean.edwards.name/packer/](http://dean.edwards.name/packer/) , and the
"nice" output was not very helpful in that it still looked obfuscated. Maybe
I'm misunderstanding something.

So the input to JSNice was the packed generateSeries function:

    eval(function(p,a,c,k,e,r){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};
    if(!''.replace(/^/,String)){while(c--)r[e(c)]=k[c]||e(c);k=[function(e){return r[e]}];e=function(){return'\\w+'};c=1};
    while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p}(
    '4 3(e){2 t=[];2 n=f+e;2 r=6+e;k(i=1;i<=6;i++){2 s=5.y(5.a()*(r- n+1)+n);t.b([i,s]);n++;r++}c t}$(d).B(4(){2 e=3(0);2 t=3(g);$.h($("#j"),[{7:"l",8:e},{7:"o",8:t}],{p:{q:{u:["#v","#w"]}},x:{9:z},A:{9:m}})})',
    38,38,
    '||var|generateSeries|function|Math|200|label|data|ticks|random|push|return|document||100|300|plot||flotcontainer|for|data1|10||data2|grid|backgroundColor||||colors|D1D1D1|7A7A7A|xaxis|floor|20|yaxis|ready'.split('|'),0,{}))

~~~
FiloSottile
The various jsBeautifiers special-case the eval() obfuscations, and in
particular packer.

I think the scope of this tool is to annotate and de-uglify JS code without
changing the logic, so what you get is a more readable version of the routine
that generates the eval()'d code. That sounds right to me.
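
For illustration, here is a rough sketch of what special-casing packer could look like: decode the arguments of the packed call by plain string substitution instead of running eval(). The `unpack` helper and its parameter names are made up for this example and are not taken from any beautifier or from JSNice.

```javascript
// Hypothetical helper: decodes the arguments of a packer call without eval().
// payload = packed source, radix = encoding base, count = number of words,
// words = dictionary of original identifiers (empty entry = keep the token).
function unpack(payload, radix, count, words) {
  // Encode an index the way packer does: digits 0-9, then a-z, then A-Z.
  const encode = (n) =>
    (n < radix ? '' : encode(Math.floor(n / radix))) +
    ((n = n % radix) > 35 ? String.fromCharCode(n + 29) : n.toString(36));

  // Map every encoded token to its dictionary word.
  const table = {};
  for (let i = 0; i < count; i++) {
    table[encode(i)] = words[i] || encode(i);
  }

  // Replace each word-like token in a single pass.
  return payload.replace(/\b\w+\b/g, (token) =>
    Object.prototype.hasOwnProperty.call(table, token) ? table[token] : token);
}

// Dictionary from the packed snippet above, with the wrapped lines joined.
const words = ('||var|generateSeries|function|Math|200|label|data|ticks|random|' +
  'push|return|document||100|300|plot||flotcontainer|for|data1|10||data2|grid|' +
  'backgroundColor||||colors|D1D1D1|7A7A7A|xaxis|floor|20|yaxis|ready').split('|');

// A shortened piece of the payload, just to show the substitution.
const decoded = unpack('4 3(e){2 t=[];c t}', 38, 38, words);
console.log(decoded); // function generateSeries(e){var t=[];return t}
```

The decoded text could then be fed to a de-minifier as ordinary source.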

EDIT: btw, can you please put the code in a code box? It messes with the
layout. Mods: this is happening frequently; is it a bug?

~~~
dang
Yes, it is a bug and on our list to fix. In the meantime, we fix it manually
when we see it. It would be helpful to fire a note to hn@ycombinator.com when
you notice it.

~~~
FiloSottile
Thanks! Will do.

------
joliss
About:
[http://www.srl.inf.ethz.ch/jsnice.php](http://www.srl.inf.ethz.ch/jsnice.php)

Apparently this is a machine learning research project.

It seems that this would be quite useful for looking at the code of closed-
source webapps. Would love to see it open-sourced or available as a service
(via an API perhaps). Think auto-deminifying browser extension.

~~~
geetee
_On average, more than 60% of the identifiers are recovered to the same name
as before the minification process._

I wonder how much of that 60% comes from common libraries like jQuery or
Underscore. Still, a neat project.

~~~
veselin
We tried to evaluate on data that is as independent of the training data as
possible, so we evaluate on projects outside of GitHub.

------
enixn
So, I saw this and wanted to see if it could decode output of something I saw
earlier:
[http://patriciopalladino.com/files/hieroglyphy/](http://patriciopalladino.com/files/hieroglyphy/)

I took the source of a sample pasted here and ran it through the hieroglyphy
generator:

var expect = function(val) { return "string" == typeof val; };

which output something along the lines of (truncated):

[][(![]+[])[!+[]+!![]+!![]]+([]+{})[+!![]]+(!![]+[])[+!![]]+(!![]+[])[+[]]][([]+{})[!+[]+!![]+!![]+!![]+!![]]+([]+{})...

But JSNice was unable to deobfuscate this code. Any ideas why?

~~~
NaNaN
This involves program optimizations. JSNice doesn't optimize `var x = 1+1` to
`var x = 2`
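
To make that concrete, here is a small sketch that evaluates the first four sub-expressions of the truncated hieroglyphy sample above. Each one is a constant that only becomes readable once it is computed, which is the kind of folding a renaming tool does not attempt. The variable names are just for illustration.

```javascript
// Each expression relies on JavaScript type coercion and always yields
// the same one-character string.
const pieces = [
  (![] + [])[!+[] + !![] + !![]], // "false"[3]
  ([] + {})[+!![]],               // "[object Object]"[1]
  (!![] + [])[+!![]],             // "true"[1]
  (!![] + [])[+[]],               // "true"[0]
];

// Joined together they spell a property name, so the sample starts
// with a lookup of that property on an empty array.
const propertyName = pieces.join('');
console.log(propertyName); // sort
console.log([][propertyName] === Array.prototype.sort); // true
```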

------
mallamanis
Naturalize (
[http://groups.inf.ed.ac.uk/naturalize/](http://groups.inf.ed.ac.uk/naturalize/)
) is a related machine-learning-based tool that suggests appropriate
identifier names for Java. Details on the implementation and an evaluation can
be found here: [http://arxiv.org/abs/1402.4182](http://arxiv.org/abs/1402.4182)

------
infogulch
Whoa, it even infers local variable names automatically.

------
Groxx
Some rather spectacular failures, though of course these things happen with
statistical methods:

    
    
      /**
       * @param {string} val
       * @return {?}
       */
      var expect = function(val) {
        return "string" == typeof val;
      };
      /**
       * @param {boolean} deepDataAndEvents
       * @return {?}
       */
      var clone = function(deepDataAndEvents) {
        return "boolean" == typeof deepDataAndEvents;
      };
      /**
       * @param {(boolean|number|string)} obj
       * @return {?}
       */
      var isString = function(obj) {
        return "number" == typeof obj;
      };
    

Still, neat idea. Seems like there's a lot of room to train it, probably a lot
of fun to try to improve things :)

------
tantalor
Why was it unable to infer the type of the return value for generateSeries?
The function only has one return statement, and it already knows the type of
the return variable is {Array}.

~~~
veselin
It never shows the type of a return value unless it is undefined. The reason
is that the method may be overridden by a method with a different return type.
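
A minimal sketch of the overriding concern, using made-up constructors (`Series`, `NamedSeries`) that do not come from JSNice or the sample code: the same call site can return an array or a string depending on which prototype supplies the method.

```javascript
// Hypothetical base type whose method returns an array.
function Series() {}
Series.prototype.generate = function () {
  return [[1, 42]];
};

// Hypothetical subtype that overrides the method with another return type.
function NamedSeries() {}
NamedSeries.prototype = Object.create(Series.prototype);
NamedSeries.prototype.generate = function () {
  return 'series-1';
};

// The call site looks identical, but the result type differs at runtime.
function describeResult(series) {
  const result = series.generate();
  return Array.isArray(result) ? 'Array' : typeof result;
}

console.log(describeResult(new Series()));      // Array
console.log(describeResult(new NamedSeries())); // string
```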

------
FeatureRush
Can this approach be extended to, for example, generating "matching" tests for
the code? Like "I see this function processes dates, here are some popular
test cases learned from 1000s of other projects"?

Could someone point me to good resources about mining code? Most data mining
and machine learning articles deal with points in multidimensional space and
not objects with complex internal structure like programs...

~~~
infogulch
The complex internal structure of programs could be reduced to graphs, and
machine learning has been working with graphs for decades.
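
As a rough sketch of that reduction, the snippet below flattens a tiny hand-written syntax tree into a node list and an edge list. The tree shape, the node labels, and `toGraph` are all assumptions made for this example; a real tool would start from a parser's output and would likely add more edge kinds than parent-child.

```javascript
// Hand-written, hypothetical syntax tree for: total = price * count
const ast = {
  type: 'Assign',
  children: [
    { type: 'Identifier:total', children: [] },
    {
      type: 'Multiply',
      children: [
        { type: 'Identifier:price', children: [] },
        { type: 'Identifier:count', children: [] },
      ],
    },
  ],
};

// Depth-first walk that numbers nodes in visit order and records
// one [parent, child] edge per tree link.
function toGraph(root) {
  const nodes = [];
  const edges = [];
  (function visit(node, parentId) {
    const id = nodes.length;
    nodes.push(node.type);
    if (parentId !== null) edges.push([parentId, id]);
    node.children.forEach((child) => visit(child, id));
  })(root, null);
  return { nodes, edges };
}

const graph = toGraph(ast);
console.log(graph.nodes.length); // 5
console.log(JSON.stringify(graph.edges)); // [[0,1],[0,2],[2,3],[2,4]]
```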

------
yconst
Very interesting project. Would be great to get some more info on the actual
algorithms being used. The about page offers relatively limited info.

------
dblotsky
I wonder how many people, like me, right off the bat went and pasted minified
jQuery into this thing. Looks like an invaluable tool for reverse-engineering
when unminified code is unavailable.

------
leeoniya
didn't do too well on
[http://js1k.com/2014-dragons/details/1903](http://js1k.com/2014-dragons/details/1903)
:)

but processed jQuery relatively fast.

------
joshribakoff
I put in some real world JS found on Hulu, and got a slew of errors like this
one:

Line 1: Parse error. missing ; before statement

~~~
mvechev
Josh,

Thanks for trying it out. I tried a few large samples from Hulu and they
seemed to work fine, e.g.:

[http://static.huluim.com/huluguru/i18n/en-
us/translations-9d...](http://static.huluim.com/huluguru/i18n/en-
us/translations-9d4d6e9074fa024132b56b9ce3aeee71.js)

But indeed, sometimes there could be issues if the code does not compile with
the compiler of choice.

------
mvechev
Thank you all for the comments...keep them coming.

We will definitely provide more details soon on how the overall system
works...

------
SimeVidas
> var width = $container.height();

Hehe

------
veselin
Yes, it is statistical. A browser extension would be cool!

------
pbielik
Handy tool to have. Works surprisingly well.

------
nolanpro
What is this sorcery?!?!?

Infinitely helpful in reverse engineering Google stuff like
www.googletagservices.com/tag/js/gpt.js

Thanks!!!

------
enscr
Is it doing something beyond pretty print in Chrome?

~~~
placeybordeaux
Yes.

------
insyncim64
Nice tool!

