Improving our homemade JavaScript obfuscator (antoinevastel.com)
58 points by avastel on Sept 9, 2019 | 29 comments



Ugh. This whole blog post is giving me the heebie-jeebies. Replacing all static accesses with dynamic ones, and with endless function calls everywhere has to destroy performance. You're intentionally making it hard for the JIT compiler to do its work.

And for what? Obfuscating your code so that people don't steal it or whatever? This kind of obfuscation can be easily and programmatically reverse engineered if someone really wants to, so... why do it? Just to screw with people trying to look at the source code of a web page?

People complain that JavaScript minifiers and WebAssembly are making the web less open and hackable, but at least those things have a point to them. There's a performance upside! This is just "naah, let's make the web slower, more closed and less hackable, for... you know... reasons."
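To make the static-versus-dynamic point concrete, here is a minimal sketch of the shape of that transformation. The names `_strings`, `_lookup`, `areaStatic` and `areaDynamic` are hypothetical and are not taken from the blog post; the snippet only shows the two forms side by side and does not measure anything.

```javascript
// Original form: property names are visible to the reader and to the engine.
function areaStatic(rect) {
  return rect.width * rect.height;
}

// Hypothetical obfuscated form: every property name is moved into a string
// table and fetched through a function call, so each access becomes a
// computed (dynamic) lookup instead of a static one.
const _strings = ["width", "height"];

function _lookup(index) {
  return _strings[index];
}

function areaDynamic(rect) {
  return rect[_lookup(0)] * rect[_lookup(1)];
}

const rect = { width: 3, height: 4 };
console.log(areaStatic(rect), areaDynamic(rect));
```

Both functions compute the same value; the second simply hides which properties are read behind an extra call and a computed key.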


It strikes me as bizarre dissonance to write a blog post sharing the work you have done to avoid sharing your work.


The title of his blog implies he is "working on browser fingerprinting" and is proud of his occupation. So obviously the work he does is for ad agencies that frankly never gave two fricks about performance.

Unless you are making a browser game and want to make life sad for cheaters, I don't really see a need or reason for obfuscating, and even then it's not ideal and is a byproduct of a lazy solution (i.e., not running the simulation on a server).


I would guess he was or is looking to defeat obfuscation of some bots (see his previous post https://antoinevastel.com/javascript/2019/08/31/sneakers-sup...) and then got distracted/fascinated by the techniques the obfuscators used. I'd be generous and read the post in the satirical style of "10 ways you can screw up the web".

BTW I totally agree with the sentiment that obfuscation is bad - and I would include shipping bundles without sourcemaps. The web got popular because of transparency, and people learning from each other. (If you really have IP-expressed-in-code you want to protect, don't ship it to clients!)


He's a researcher, he might be working on browser fingerprinting so that browsers can avoid it. That's how I interpreted it, at least; the fact that he interned at Brave seems to confirm it.


Seems like most of his work on fingerprinting is centered around bot and headless browser detection.


> So obviously the work he does is for ad agencies that frankly never gave two fricks about performance.

Yikes, super hostile about an assumption you pulled out of your bum. Let's not dismiss other people's work so flippantly.


You are right. In hindsight it wasn't a reasonable assumption.


If I were to be making a JavaScript obfuscator, I would simply start by rewriting the AST so that Exceptions would become the driver behind the code. That way, it would make it really hard to reverse the code without executing it.
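As a rough illustration of what exception-driven code could look like, here is a hand-written sketch, not the output of any real tool. The `Signal` class and `addViaExceptions` function are hypothetical names; a real rewrite would be generated from the AST rather than typed by hand.

```javascript
// The plain version would simply be: function add(a, b) { return a + b; }

// Hypothetical carrier for values that travel through throw/catch.
class Signal extends Error {
  constructor(value) {
    super("signal");
    this.value = value;
  }
}

// Same computation, but every intermediate value is passed along by
// throwing and catching, so the data flow is not visible as ordinary
// expressions or return statements.
function addViaExceptions(a, b) {
  try {
    try {
      throw new Signal(a);
    } catch (first) {
      throw new Signal(first.value + b);
    }
  } catch (second) {
    return second.value;
  }
}

console.log(addViaExceptions(2, 3));
```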

Also sprinkle some parts of the code that check how much time it takes to execute it and then takes a different code path if it was interrupted.
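A minimal sketch of such a timing check, assuming a hypothetical helper `makeTimingGuard` with an injectable clock so the behaviour can be exercised deterministically. The 50 ms threshold is an arbitrary assumption, and browsers deliberately coarsen timer resolution, so any real threshold would have to be generous.

```javascript
// Returns a runner that executes realWork, measures how long it took, and
// switches to a decoy path when the elapsed time exceeds the threshold
// (for example because someone paused in a debugger).
function makeTimingGuard(now, thresholdMs) {
  return function run(realWork, decoy) {
    const start = now();
    const result = realWork();
    const elapsed = now() - start;
    return elapsed > thresholdMs ? decoy() : result;
  };
}

// Usage with the wall clock; the threshold value is an assumption.
const guard = makeTimingGuard(() => Date.now(), 50);
console.log(guard(() => "real path", () => "decoy path"));
```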

What is done here is child's play; the author is clearly not familiar with old-school assembly obfuscation. This code is one script away from being de-obfuscated.


Ouch, that does not sound right.

May I assume that quite a few bluescreen errors etc. were the result of madness like this?

"sprinkle some parts of the code that check how much time it takes to execute it and then takes a different code path if it was interrupted"

I mean, I respect it technically, if someone can do this and not disrupt behaviour or performance, but I doubt it is a smart thing to do if stability and performance are the goal. And I believe that should be the goal of any software ...


Do you have any references for the old-school assembly obfuscation?


I had once worked on an executable packed by Themida. It used:

  - 8 layers of decrypting the initial executable
  - one of them decrypted the import table from the executable
  - each of those layers employed several methods of detecting that you're running under debugger
  - each of those layers employed methods of causing exceptions in popular debugging software
  - every single memory page was also encrypted while running, and a breakpoint was set up whenever a jump was made to it. The protection mechanism would first decrypt the next page and then encrypt the previous page.

Even back then (15 years ago), the more complex option of Themida would generate a unique virtual machine with a unique bytecode for itself for a given executable, which would then execute itself.

[1] https://reverseengineering.stackexchange.com/questions/16966...


So does Themida represent the state of the art on obfuscation, or is that found in state sponsored malware, do you have any idea?


The current version of Themida is good, and widely used, but not much of an obstacle to experienced people (although it depends on what you're trying to do).

Denuvo is widely used on AAA titles and seems like a pain in the ass to deal with (i.e., games seem to take a while to pirate when protected with it, and it adds a stupid CPU burden at times), but it doesn't have the edge it once had in the battle.

https://en.wikipedia.org/wiki/Denuvo


Timers in .js are fuzzed. Good luck getting accurate timings.


In my (extremely limited) experience with reversing JS, I'm pretty sure I've already seen these obfuscation techniques before, and common deobfuscators of the time had no problem reversing the transformation. It doesn't stop anyone except the most easily discouraged.

(The JS that's used to detect adblockers and/or coerce you into viewing ads is often obfuscated. Those of you who have played around with this stuff may recognise this keyword: DtsBlkVFQx.)


The proposed scheme trashes performance while providing only primitive protection that is statically observable (i.e., distinguishable) and thus easily reversible.

Looks insane, in a bad way.


I would like to see the performance difference between the original and the obfuscated code. Most of the compiler optimizations are made impossible by removing static access. Plus, writing a deobfuscator is trivial for all those static-to-dynamic and base64-encoding transformations.
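As a sketch of how little it takes to undo part of this, the two regex passes below decode `\xNN` escapes and turn bracket accesses with a literal key back into dot accesses. `decodeHexEscapes` and `bracketToDot` are hypothetical helper names, and a serious deobfuscator would work on the AST rather than on raw text; this is only meant to show the idea.

```javascript
// Decode \xNN escapes, but only when the decoded character is harmless
// inside a string literal (letters, digits, _, $, space, dot). Quotes and
// backslashes stay escaped so the literal is not broken.
function decodeHexEscapes(source) {
  return source.replace(/\\x([0-9a-fA-F]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    return /[A-Za-z0-9_$ .]/.test(ch) ? ch : match;
  });
}

// Rewrite obj['name'] as obj.name when the key is a valid identifier.
// The lookbehind requires something that can be indexed just before the
// bracket, so array literals such as ['log'] are left alone.
function bracketToDot(source) {
  return source.replace(
    /(?<=[\w$)\]])\[(['"])([A-Za-z_$][A-Za-z0-9_$]*)\1\]/g,
    ".$2"
  );
}

const obfuscated = "console['\\x6c\\x6f\\x67'](value);";
console.log(bracketToDot(decodeHexEscapes(obfuscated)));
```

Text-based passes like these will misfire on strings that merely look like code, which is one reason AST-based tools are the usual choice.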


The last paragraph discusses that this will be the topic of the next blog post.


Code obfuscation is idiotic and pointless no matter what form it takes, but this example is particularly egregious. This accomplishes little more than degrading performance across the board for the very real end user (bye bye JIT optimizations) while requiring some purely hypothetical reverse engineer to write one additional script before (gasp) reading the code.

I'm genuinely curious how much time and money was wasted on this imbecilic venture.


Obfuscation may seem a weird topic nowadays but only because it is lagging behind conventional data encryption. Once it achieves more fundamental results the perception will likely change.


Unlike encryption, obfuscation provides no concrete guarantees of preventing reverse engineering.


Aren't JS obfuscators futile? The code will still end up in the JS VM, and you can't really obfuscate the actual AST.


The next step of this cat and mouse game would be for the JavaScript interpreter to detect intentionally obfuscated code and then give the user an option to stop executing the scripts on the page, just as browsers do when they detect that a script is taking too long to execute, with the default option being to stop it. This could be done heuristically from the JIT based on the CFG and the entropy of the symbol table.
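A very rough sketch of the entropy half of that heuristic, working on source text rather than inside a JIT. `shannonEntropy` and `identifierEntropy` are hypothetical names, the identifier regex also picks up keywords, and no threshold is proposed here because that would need calibration against real code.

```javascript
// Shannon entropy in bits per character of a string.
function shannonEntropy(text) {
  if (text.length === 0) return 0;
  const counts = new Map();
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let bits = 0;
  for (const count of counts.values()) {
    const p = count / text.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Crude stand-in for a symbol table: every identifier-like token in the
// source (keywords included), concatenated and scored as one string.
function identifierEntropy(source) {
  const identifiers = source.match(/[A-Za-z_$][A-Za-z0-9_$]*/g) || [];
  return shannonEntropy(identifiers.join(""));
}

console.log(identifierEntropy("function test(x, y) { return x / y; }"));
console.log(identifierEntropy("var _0x11f8=_0x417d16[_0x4a5f0f];"));
```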


While you might have access to the AST, obfuscated JS is still harder for humans to parse, and it is harder to reverse engineer what the original function is doing.

For example:

    var _0x11f8=['\x6c\x6f\x67'];(function(_0x417d16,_0x4a5f0f){var _0x2d056f=function(_0x25076e){while(--_0x25076e){_0x417d16['push'](_0x417d16['shift']());}};var _0x26edf0=function(){var _0x522f05={'data':{'key':'cookie','value':'timeout'},'setCookie':function(_0x3d387a,_0x44e067,_0xfea505,_0x41ff72){_0x41ff72=_0x41ff72||{};var _0x47c6cd=_0x44e067+'='+_0xfea505;var _0x39897e=0x0;for(var _0x39897e=0x0,_0x329917=_0x3d387a['length'];_0x39897e<_0x329917;_0x39897e++){var _0x1c67fb=_0x3d387a[_0x39897e];_0x47c6cd+=';\x20'+_0x1c67fb;var _0x4c67f9=_0x3d387a[_0x1c67fb];_0x3d387a['push'](_0x4c67f9);_0x329917=_0x3d387a['length'];if(_0x4c67f9!==!![]){_0x47c6cd+='='+_0x4c67f9;}}_0x41ff72['cookie']=_0x47c6cd;},'removeCookie':function(){return'dev';},'getCookie':function(_0xc34cfd,_0x2e2a50){_0xc34cfd=_0xc34cfd||function(_0x3fc227){return _0x3fc227;};var _0x316ad0=_0xc34cfd(new RegExp('(?:^|;\x20)'+_0x2e2a50['replace'](/([.$?*|{}()[]\/+^])/g,'$1')+'=([^;]*)'));var _0x67e8e8=function(_0x595d8b,_0x29390a){_0x595d8b(++_0x29390a);};_0x67e8e8(_0x2d056f,_0x4a5f0f);return _0x316ad0?decodeURIComponent(_0x316ad0[0x1]):undefined;}};var _0x4341d3=function(){var _0x10361d=new RegExp('\x5cw+\x20*\x5c(\x5c)\x20*{\x5cw+\x20*[\x27|\x22].+[\x27|\x22];?\x20*}');return _0x10361d['test'](_0x522f05['removeCookie']['toString']());};_0x522f05['updateCookie']=_0x4341d3;var _0x44df89='';var _0x4a0249=_0x522f05['updateCookie']();if(!_0x4a0249){_0x522f05['setCookie'](['*'],'counter',0x1);}else if(_0x4a0249){_0x44df89=_0x522f05['getCookie'](null,'counter');}else{_0x522f05['removeCookie']();}};_0x26edf0();}(_0x11f8,0x126));var _0x2135=function(_0x32c6fa,_0x552733){_0x32c6fa=_0x32c6fa-0x0;var _0x18e137=_0x11f8[_0x32c6fa];return _0x18e137;};function _0x33f554(_0x101bb5,_0x53fcac,..._0x27ef18){var _0x36160c=function(){var _0xddce83=!![];return function(_0x3f37b8,_0x278fb0){var _0x3f096b=_0xddce83?function(){if(_0x278fb0){var _0x701868=_0x278fb0['apply'](_0x3f37b8,arguments);_0x278fb0=null;return 
_0x701868;}}:function(){};_0xddce83=![];return _0x3f096b;};}();var _0x4810be=_0x36160c(this,function(){var _0x31f51a=function(){return'\x64\x65\x76';},_0x28b288=function(){return'\x77\x69\x6e\x64\x6f\x77';};var _0x5848fd=function(){var _0x59a244=new RegExp('\x5c\x77\x2b\x20\x2a\x5c\x28\x5c\x29\x20\x2a\x7b\x5c\x77\x2b\x20\x2a\x5b\x27\x7c\x22\x5d\x2e\x2b\x5b\x27\x7c\x22\x5d\x3b\x3f\x20\x2a\x7d');return!_0x59a244['\x74\x65\x73\x74'](_0x31f51a['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};var _0x46df82=function(){var _0x5b3a8d=new RegExp('\x28\x5c\x5c\x5b\x78\x7c\x75\x5d\x28\x5c\x77\x29\x7b\x32\x2c\x34\x7d\x29\x2b');return _0x5b3a8d['\x74\x65\x73\x74'](_0x28b288['\x74\x6f\x53\x74\x72\x69\x6e\x67']());};var _0x4b23f3=function(_0x5ac3b0){var _0x4153a0=~-0x1>>0x1+0xff%0x0;if(_0x5ac3b0['\x69\x6e\x64\x65\x78\x4f\x66']('\x69'===_0x4153a0)){_0x3d9fe5(_0x5ac3b0);}};var _0x3d9fe5=function(_0x467f75){var _0x4c33bb=~-0x4>>0x1+0xff%0x0;if(_0x467f75['\x69\x6e\x64\x65\x78\x4f\x66']((!![]+'')[0x3])!==_0x4c33bb){_0x4b23f3(_0x467f75);}};if(!_0x5848fd()){if(!_0x46df82()){_0x4b23f3('\x69\x6e\x64\u0435\x78\x4f\x66');}else{_0x4b23f3('\x69\x6e\x64\x65\x78\x4f\x66');}}else{_0x4b23f3('\x69\x6e\x64\u0435\x78\x4f\x66');}});_0x4810be();let _0xdbe94a=_0x101bb5/_0x53fcac;let _0x26ea56=_0x27ef18['\x72\x65\x64\x75\x63\x65']((_0xee94a6,_0x15b1b8)=>_0xee94a6*_0x15b1b8,_0xdbe94a);console[_0x2135('0x0')](_0x26ea56);}_0x33f554(0x1,0x2,0x3,0x4);
and

    function test(x, y, ...args) {
      let z = x / y;
      let ret = args.reduce((a, v) => (a * v), z);
      console.log(ret);
    }
    test(1, 2, 3, 4);
are functionally identical. (I used an off-the-shelf obfuscator.)

No amount of obfuscation will stop a determined reverse engineer from pulling apart your code, but it can increase the cognitive load to the point where most people won't bother.

Also, just in case, nobody should ever run untrusted code, particularly not obfuscated untrusted code. Including the examples I posted above.


Absolutely correct. Yet the only point of obfuscating JavaScript I can see is to execute malign code, which implies that it's targeted at people who "won't bother" deobfuscating or even looking at it anyway, which makes obfuscation redundant. And obfuscation won't stop a determined engineer anyway. So, uh, ¯\_(ツ)_/¯.

Thus, it only looks vaguely interesting from an academic point of view. But even then, it's just a matter of flattening and spamming the AST, and you can only get so far with JS.


I think you're quite correct. If you're investing in obfuscation, you're probably making the wrong investment. Malicious actors have to obfuscate to avoid detection for as long as possible, and benign actors would probably be better served focusing on their core tech. That said, obfuscation is very good at increasing the difficulty of interpreting and understanding the purpose of code.


> obfuscated JS is still harder for humans to parse

No one serious will analyse obfuscated JS without passing it through a deobfuscator first, so the point is moot. But if you really want to make your code "look confusing", there are more amusing obfuscators for that (also easily deobfuscated):

http://utf-8.jp/public/aaencode.html

http://utf-8.jp/public/jjencode.html


When I reviewed different obfuscator products, I worked from the idea that obfuscation is only a "speedbump" for our threat actors: it imposes a cost, so that reversing the code takes someone with both the motive and the means.

If you would prefer that a summer student at an enterprise customer doesn't replace your product with an in-house work-alike, it's useful. The same goes if it makes buying your product cheaper than spending several hours reversing it.

If your business model relies on the integrity of a secret (key, derivation component, method, etc), it probably has a single, catastrophic failure mode and obfuscation isn't your solution.



