
Beyond Source Maps - mnemonik
http://fitzgeraldnick.com/weblog/55/
======
sillysaurus3
In case anyone else is wondering what a source map is, it's apparently
Javascript's version of C's debug symbols. That is, given a chunk of optimized
runtime code, it will show you which part of your original source code
generated it.

~~~
kevingadd
Source maps are barely like debug symbols. All they do is map lines to lines.
No information on where variables went, no info on name mangling...

In practice they are only useful for less-aggressive minifiers and 'basically
sugar' compilers like CoffeeScript. Trying to use them for something like
emscripten or JSIL output is a mess.

~~~
cromwellian
That is not true, the fifth field of the mapping is the original name of the
variable.

~~~
kevingadd
I just reread the spec and no mention of variables exists in it. That was
definitely not added as a feature when I read the spec initially. Can you cite
the relevant part of the spec and demonstrate how it is used?

P.S. Variables live in scopes, so attaching variable information to line
mappings is pretty gross... I suppose if it works it works, so better than
nothing.

~~~
cromwellian
[https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiO...](https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit)

The 'names' field is a list of all original symbols in the program. Each
mapping is 1, 4 or 5 fields long. If 5 fields long, the 5th field is an index
into this names field, e.g.

Or see a more readable description on html5rocks
[http://www.html5rocks.com/en/tutorials/developertools/source...](http://www.html5rocks.com/en/tutorials/developertools/sourcemaps/)
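
A rough sketch of how that fifth field can be read, written in Python for brevity. It decodes one base64-VLQ segment into integers and labels them. The function names and field labels are made up for illustration and are not from the spec. Note that the spec stores these fields as deltas, so the decoded name index is relative to the previous name index rather than an absolute position in 'names'.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Illustrative labels only; the spec describes the fields by position.
FIELDS = ("generated_column", "source_index", "original_line",
          "original_column", "name_index")


def decode_vlq_segment(segment):
    """Decode one base64-VLQ mapping segment into a list of signed integers."""
    values, shift, acc = [], 0, 0
    for ch in segment:
        digit = B64.index(ch)
        acc |= (digit & 31) << shift       # low 5 bits carry data
        if digit & 32:                     # continuation bit: more digits follow
            shift += 5
        else:
            sign = -1 if acc & 1 else 1    # lowest bit of the value is the sign
            values.append(sign * (acc >> 1))
            shift, acc = 0, 0
    return values


def describe(segment):
    """Pair the decoded values with labels; a 5-field segment has a name index."""
    return dict(zip(FIELDS, decode_vlq_segment(segment)))
```

A segment that decodes to four values has no name attached; one that decodes to five carries the (relative) index into 'names' as its last value.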

~~~
kevingadd
Thank you. Even now, rereading the spec and looking at the part you
identified, I can barely tell it does what it does. That html5rocks overview
is much better.

It'd be great if they revised the spec to not use such opaque terminology, and
directly describe what the 'names' list is used for. The only reference I can
find that explains this is 'If present, the zero-based index into the “names”
list associated with this segment', which I only understand now that you said
what it means.

Since one of your other comments made it unclear: Does this apply to _all_
variables, or only global variables?

~~~
skybrian
I haven't seen the names field actually used by a debugger, so it's somewhat
theoretical.

~~~
cromwellian
I don't think you've seen them in Chrome Dev Tools with GWT because I don't
believe we are generating the source maps properly. IIRC, I asked about this
before and was told Dev Tools does support deobfuscating globals as far as the
stack frame is concerned. The fact that we are still seeing obfuscated
function names on the call stack I think is our bug.

Edit: I take that back. I just looked at the Blink source code, and as far as
I can tell, it throws away the name index. Therefore, Chrome Dev Tools won't
deobfuscate symbols, but Google's server-side JS deobfuscation framework will,
which is a pity. I guess it is time for a Chrome Dev Tools patch. :)

------
kevingadd
Glad to see people thinking about making source maps useful for more real
world scenarios. As mentioned in my other comment
([https://news.ycombinator.com/item?id=7388775](https://news.ycombinator.com/item?id=7388775))
they are not useful for most 'js as compilation target' scenarios.

~~~
azakai
I think that's a bit negative. Yes, they don't provide important things like
identifying variables as you mention, but mapping back to the original source
is still extremely useful by itself.

~~~
kevingadd
A large subset of scenarios do not have a 1-1 mapping of output source to
input source. Source maps have been represented as a solution to various
debugging woes when they strictly seem to handle 1-1 mapping and nothing else.
The linked blog post addresses this to a degree, and that's why I think it's
great to see it cover this topic.

The misleading presentation of source map features is problematic because I
think it leads end-users of transpilers to believe that it will solve their
problems and ask for it as a feature addition, when in reality it won't do
them much good.

I do think the people who specced source maps were correct to start with the
narrowest possible feature set, though, even if the resulting spec seems
strangely micro-optimized in ways that make it much harder to generate than it
would be otherwise.

~~~
cromwellian
I think the motivation of the original design was probably efficiency. Google
ships many very large JS bundles: Gmail, Calendar, G+, etc. I think Gmail may
be on the order of 4 MB obfuscated. Loading up large maps would hog tons of
memory and be slow.

SourceMaps originally came out of a need for better stack trace deobfuscation,
I think; use inside of debuggers came way later.

I am totally onboard with fixing the deficiencies of the current format to
make debugging better. Now that neither Firefox nor Chrome support the JSVM
hooks we needed to make GWT DevMode work, we are stuck with sourcemaps for
debugging, and Java programmers are not pleased with having to look at mangled
symbol names in their scopes and watches.

I don't really care what the format of the file is, as long as it preserves a
number of features:

1) can still ship and debug 'stripped' binaries, doing deobfuscation offline
on the server

2) memory and network efficient. Don't make my JS debugger freeze loading
these things in, or eat up hundreds of megabytes

3) supports cascading and merging. We have hybrid apps at Google, that is,
apps that are compiled with multiple transpilers and linked together. We
'merge' sourcemaps from these together in a unified sourcemap. So whatever the
future format is, it should support efficient composition.
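
To make the 'cascading' requirement concrete, here is a minimal Python sketch of composing two already-decoded maps. It is only an illustration under simplifying assumptions: each map is a sorted list of ((line, column), target) pairs, and `lookup` and `compose` are hypothetical names, not part of any real source map library.

```python
from bisect import bisect_right


def lookup(mappings, line, col):
    """Return the target of the mapping at or just before (line, col) on that line."""
    keys = [pos for pos, _ in mappings]    # mappings are sorted by (line, col)
    i = bisect_right(keys, (line, col)) - 1
    if i < 0 or keys[i][0] != line:
        return None
    return mappings[i][1]


def compose(outer, inner):
    """Chain two maps.

    outer: final output position -> intermediate position
    inner: intermediate position -> (source, line, col) in the original
    """
    result = []
    for gen_pos, (mid_line, mid_col) in outer:
        original = lookup(inner, mid_line, mid_col)
        if original is not None:           # drop positions with no known origin
            result.append((gen_pos, original))
    return result


# Hypothetical data: a transpiler's map followed by a minifier's map.
inner = [((1, 0), ("Foo.java", 10, 4)), ((1, 8), ("Foo.java", 11, 2))]
outer = [((1, 0), (1, 0)), ((1, 3), (1, 9))]
unified = compose(outer, inner)
```

The result maps positions in the final output straight to the original source, which is the property a unified sourcemap needs.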

~~~
kevingadd
Yeah, I think they did a decent job of optimizing for constraints, other than
the use of JSON and VLQ. (Honestly, protobufs would have been a universally
better encoding for this, since they provide fixed-size records you can seek
through and already offer a superior variable-length encoding.)

I think extending what we already have instead of reinventing the format is
smart. All of your concerns listed at the end are things I care about too.

------
bluerobotcat
I'm not completely sure I'm reading this proposal correctly, but at first
blush it seems like it would only support lexically scoped variables. If so
then that would be an unfortunate limitation.

ClojureScript, for instance, supports dynamic bindings.

~~~
mnemonik
Perfect example of why having an extensible format is so important! It should
be easy to just add a new debugging annotation for dynamic bindings without
changing the format at all!

------
cromwellian
We've had some of these problems with GWT, but I think some of them can be
solved with a much simpler mechanism than serializing an AST.

For example, for each symbol we record, we can simply introduce something
called a 'scope identifier' and then define a language specific mechanism by
which debuggers can compute scope identifiers.

Consider the following problem for GWT:

class Foo { String hello; }

class Bar { String world; }

Both of these can be compiled and obfuscated as an object with field 'a'. So
when encountering a heap reference to an object with field 'a', does it
deobfuscate to 'hello' or deobfuscate to 'world'?

We have many kinds of scopes: global scope, object scope, function scope, etc.
In this case, it's an object scope. If we uniquely identify each scope, we can
perform the mapping as long as there's a way to compute runtime type, e.g.

Store 'a' as '2:a' (scope identifier '2'). Put a mapping in the sourcemap that
'Foo' maps to '2', and now we can introspect the heap reference, determine it
is 'Foo', map it to '2', and properly display it as 'hello'.

Different languages may have different ways of setting up classes and defining
runtime types. GWT for example, does not rely on JS constructors, but instead
hangs a classId off of the prototype. I think Dart does something different.
In any case, it is a simple extension to the existing format and backwards
compatible. The only runtime API needed is a function
getScopeIdForReference(obj) which takes a compiled object and gives its scope.
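
The scope identifier idea can be sketched in a few lines of Python. Everything here is hypothetical: the tables, the `class_id` key used to stand in for a runtime type, and the scope id '3' for Bar are assumptions for illustration, and `get_scope_id_for_reference` is only a stand-in for the getScopeIdForReference(obj) function described above.

```python
# Hypothetical tables a compiler might emit alongside the sourcemap.
CLASS_TO_SCOPE = {"Foo": 2, "Bar": 3}
SCOPED_NAMES = {"2:a": "hello", "3:a": "world"}


def get_scope_id_for_reference(obj):
    """Stand-in for getScopeIdForReference(obj).

    Assumes the runtime type can be read from a 'class_id' entry; a real
    compiler would use whatever mechanism its runtime provides.
    """
    return CLASS_TO_SCOPE[obj["class_id"]]


def deobfuscate_field(obj, field):
    """Map an obfuscated field name back to its original, per object scope."""
    scope = get_scope_id_for_reference(obj)
    # Fall back to the obfuscated name when no mapping is recorded.
    return SCOPED_NAMES.get(f"{scope}:{field}", field)


foo = {"class_id": "Foo", "a": "some value"}
bar = {"class_id": "Bar", "a": "another value"}
```

With these tables, the same obfuscated field 'a' resolves to 'hello' for the Foo object and 'world' for the Bar object.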

The AST approach I think has several problems:

1) if the annotations are present in the shipped obfuscated code, it leaks
information that people don't want. Many people do not want to leak the source
filenames or directory structure of production apps.

2) it increases the download size for consumers who are not developers and
don't need the maps

3) if written out as a separate file that co-exists with a stripped binary, it
increases memory requirements of the debugger.

The way Google deploys Gmail, for example, we use sourcemaps in production as
well as development. When exceptions are thrown on the client, the stack
traces are reported back to servers, deobfuscated via sourcemaps, and put into
a triage system. A person looking at a bug report sees a deobfuscated trace,
but the consumers aren't burdened with having to download the sourcemaps in
annotated js.
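
As a very rough illustration of that server-side step (not Google's actual pipeline), the Python sketch below rewrites one reported stack frame using a decoded mapping table. The table contents, the frame format, and the function name are all assumptions made up for the example.

```python
import re
from bisect import bisect_right

# Hypothetical decoded mappings for one minified line:
# (generated_column, original_name, source_file, original_line)
MAPPINGS = [
    (0, "main", "App.java", 12),
    (40, "sendMail", "Mail.java", 88),
    (95, "render", "View.java", 7),
]

# Assumed frame shape: "at <name> (<file>:<line>:<column>)"
FRAME = re.compile(r"at (\w+) \((\S+?):(\d+):(\d+)\)")


def deobfuscate_frame(frame):
    """Rewrite one obfuscated frame; return it unchanged if it cannot be mapped."""
    match = FRAME.search(frame)
    if not match:
        return frame
    col = int(match.group(4))
    columns = [c for c, *_ in MAPPINGS]
    i = bisect_right(columns, col) - 1     # nearest mapping at or before col
    if i < 0:
        return frame
    _, name, source, line = MAPPINGS[i]
    return f"at {name} ({source}:{line})"
```

The client only ever reports the obfuscated frame; the mapping table stays on the server.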

I think the idea of defining an actual sourcemap API for the debugger
(getDisplayValues, getLocals, eval, etc) has a lot of value, because these
functions are source-language dependent, so having
GWT/Dart/Emscripten/CoffeeScript/etc generate them is useful.

But I don't think annotating the actual JS, or switching the sourcemap format
to an AST is a big win. Yes, it is a simplification, but it also trades off
other useful properties of sourcemaps.

~~~
mnemonik
_> 1) if the annotations are present in the shipped obfuscated code, it leaks
information that people don't want. Many people do not want to leak the source
filenames or directory structure of production apps._

You would be able to continue using the approach you describe for GMail. There
is no reason to publicly serve the debugging information unless you want to.

 _> 2) it increases the download size for consumers who are not developers and
don't need the maps_

No, the debugging information would still be an auxiliary file like it is now.

 _> 3) if written out as a separate file that co-exists with a stripped binary,
it increases memory requirements of the debugger._

Debuggers need to keep the source map around now, anyways; this is no
different.

Regarding the scope identifiers: it would solve scoping for the most part, but
it fails to handle the last three requirements I defined:

\- _It should provide a way for the JavaScript debugger to display values in a
meaningful way._

\- _It should optionally provide an eval capability, for use from a REPL,
watch expression, or conditional breakpoint._

\- _The format should be future-extensible. That is, when SourceMap.next v2
rolls out, any SourceMap.next v1 consumer should still be able to parse and
use instances of SourceMap.next v2 (although without the new features, of
course)._

Inspecting values would still be a pain, and you wouldn't have watch
expressions or conditional breakpoints, etc.

Also, much of the value in what I described in the blog post is the future-
extensible format. It allows us to fix our mistakes post facto.
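
One way to picture that extensibility, as a loose Python sketch rather than the format from the blog post: a consumer handles the annotation types it knows and skips the rest instead of failing. The record shape and the type names 'lexical_scope' and 'dynamic_binding' are invented for illustration.

```python
# Handlers for the annotation types this (older) consumer understands.
HANDLERS = {
    "lexical_scope": lambda a: f"scope with bindings {sorted(a['bindings'])}",
}


def consume(annotations):
    """Process known annotation types and skip unknown ones instead of failing."""
    understood, skipped = [], []
    for annotation in annotations:
        kind = annotation.get("type")
        handler = HANDLERS.get(kind)
        if handler is None:
            skipped.append(kind)           # newer feature: ignore, keep going
        else:
            understood.append(handler(annotation))
    return understood, skipped


# Hypothetical input mixing an old annotation type with a newer one.
understood, skipped = consume([
    {"type": "lexical_scope", "bindings": {"x": "a"}},
    {"type": "dynamic_binding", "name": "*out*"},
])
```

An older consumer still gets the scope information and simply loses the newer dynamic binding annotation.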

------
elwell
Or just sit back and hope for a super cool Google dev tools update.

~~~
21echoes
Did you read who the blog post was written by? Nick co-authored the paper
that defined source maps, and implemented the Firefox dev tools part of it. In
other words, if there's going to be a "super cool [Google/Firefox] dev tools
update", there's a pretty good chance he's gonna (help to) write it.

