
Show HN: Hiding code from Git diff - dimastopel
https://www.twistlock.com/2017/12/13/hiding-content-git-escape-sequence-twistlock-labs-experiment/
======
pkamb
The way I've noticed people (accidentally) "hiding code" is by making changes
during the conflict resolution stage of a git merge.

This often happens _after_ code review, and there's no oversight of the
conflict resolutions.

If the merge commit does get looked at, there will be potentially hundreds of
lines and files of automated diffs within which the unrelated new code is
hidden.

~~~
pkamb
What I personally do is commit the merge as is, _with_ all the <<< === >>>
conflict markers.

Then, in the next couple commits, fix each conflict via a normal commit that
mixes and matches "mine" and "theirs" as needed + deletes the markers.

Each resolution commit is independent of the giant merge commit, and can be
separately diff'ed and code reviewed via the github PR or any other tool.

Also easier IMO to make the complicated resolution you want, or reset back to
a previous resolution attempt without messing up the entire merge.

The drawback is that some commits won't compile, but compilable mid-branch
commits are not something that I view as a goal.

~~~
jsteemann
Non-compiling commits make bisecting harder, when the goal is to find the
commit that broke the build or a specific test. IMHO having to skip over non-
compiling merge commits only makes bisecting take more time, and it may also
"pollute" the test pipelines if all commits are going to be built.
Additionally, merging and fixing the merge in separate commits makes it much
hard to revert a merge later, in case it turned out that the merge introduced
something unwanted).

So I will always recommend doing the merge and fixing the merge in the same
commit. Ideally this produces a squash commit that can easily be reverted
later when needed. This makes the merge process more time-consuming, but it
can help to keep the builds more stable, and to more easily track which commit
(merge) introduced a particular problem.

~~~
pkamb
> makes it much hard to revert a merge later

I don't think this would apply, as I'd be merging from `dev` > `feature`,
resolving conflicts, and then doing a single clean no-conflict merge back from
`feature` > `dev`. That's the only commit you'd need to revert to undo the
merge.

(or on a new, third, `merge` branch, if you don't want to pollute the
`feature` branch with commits from `dev`)

I admit the non-compiling commits might be a problem in some workflows, but
`git bisect skip` is the solution to the only problem I've ever had with it.

~~~
jsteemann
It's kind of a problem if the compile/build step takes rather long. Then you
want to avoid having any additional non-compiling revision to check when
bisecting.

But I agree in general, if the compile/build turn-around times allow for it,
then a single no-conflict squash merge back into devel is definitely a very
good approach to have clean and compiling commits, and for identifying the
commits/merges that break the builds.

------
kbutler
The issue with with terminal handling of escape sequences, rather than with
git.

If you are concerned, you can use a gui diff tool, or you can pipe it through
a filter that removes escape codes:

    
    
      seal:~/work/gitPocDiff$ cat > uncolor
      #!/usr/bin/env perl
      ## uncolor — remove terminal escape sequences such as color changes
      while (<>) {
          s/ \e[ #%()*+\-.\/]. |
             (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
             (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
             (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
             \e.|[\x80-\x9f] //xg;
          print;
      }
      seal:~/work/gitPocDiff$ chmod a+x uncolor
      seal:~/work/gitPocDiff$ git diff eeeb ebc3 | ./uncolor
      diff --git a/main.c b/main.c
      index 6daa9b2..0321ff2 100644
      --- a/main.c
      +++ b/main.c
      @@ -3,4 +3,9 @@
       int main()
       {
           printf("I'm just a stub!\n");
      +    /*
      +     * Must always return a value Hidden > */printf("Insert bad backdoor here...!\n");/*
      +     * TODO: Return result
      +     */
      +    return 0;
       }
    

Credit to: gilles
[https://unix.stackexchange.com/a/14707](https://unix.stackexchange.com/a/14707)

~~~
realusername
You can also just pipe it to cat -e, it works the same. git diff | cat -e make
all escape sequences visible.

~~~
kbutler
There's always a utility... Thanks!

------
mikedilger
If the output of a git command was raw text sent to stdout, like cat, then I
would argue that escape sequences should be ignored and passed through
unmolested.

But the output of git tools is colored (if enabled) using terminal escape
sequences. Given that, the output should fully comply with the appropriate
terminal escape sequences and should filter the ones the author has
discovered.

------
deathanatos
Unix pipes lack any ability to say "this is the _kind_ of data that I'm
pushing you." The push raw, unidentified bytes. If the downstream program had
some idea what format the data was in, it would know how to output it.

Imagine an alternate universe where a pipe was a stream of bytes and a MIME
type. If a terminal gets data written to it from a program in
application/terminal.vt100, then it knows it should process those escape
sequences. If it get text/plain; charset=utf-8, it knows it should not, and
can take a different action.

I think such a universe would make for more pleasant piping, too. Imagine that
every interactive command on a terminal ended with an imaginary |show command;
it's job is to read the mimetype and display, on the terminal, that data. A
command that emitted CSV data could then automatically render in the terminal
as an actual table. A program that emitted a PNG could render an ASCII art
version of that image (or, since we're in an imaginary universe, imagine our
terminal has escape sequences for images – which actually exists in some
terminals today – it could emit the actual image!). Essentially, the program
emits the data in whatever form it natively speaks, and that implicit |show
parses that into something appropriate for the terminal. (That way, the
program can still be used in pipelines, such as make_a_png | resize_it.) If
you want your raw bytes, then you can imagine

    
    
      make_a_png | force_type 'application/octet-stream'
    

and then the final, implicit |show would perhaps output a nice hexdump for
you. (Since emitting raw bytes to a terminal never makes sense.)

Now, I'm a bit fuzzy on exactly how the pipeline being executed, the shell,
and the terminal all interact exactly, and I'm nearly certain such a change
would require low-level, POSIX-breaking changes. But it was a dream I had, and
I think it might be a better model than what we work w/ presently.

(In the article's case, though, git log/diff both emit terminal sequences, so
even in my imaginary world, they'd need to know that they should escape them.
It mostly works for simpler types, but git rm could conceptually emit just
text, since I don't think it ever colors.)

~~~
ken
This is one of the problems that I've been attempting to solve with Strukt[1].
Its operations work by processing a stream of objects (which have named keys,
and values with real types), rather than bytes.

[1]: [https://freerobotcollective.com](https://freerobotcollective.com)

You're right that shoving raw bytes to a terminal doesn't make sense. I
display binary data as a picture of the bits themselves. After a little while,
it becomes surprisingly informative. You can spot patterns like "ASCII(ish)
text" or "all 0's".

You're right, too, that once you start tugging on one corner, the whole thing
starts to come apart. That's why I didn't even bother trying to make it a
"Unix shell". There's just too many issues with trying to remain backwards
compatible that far back. When the operations aren't programs, for example, I
can optimize _between_ them.

Unfortunately, perhaps, "people who spend money on software" and "people who
are looking for a Unix shell" are pretty much disjoint sets, so I'm initially
going after markets like EDA and ETL.

~~~
deathanatos
Your website, BTW, serves itself as ISO-8859-1¹, but the actual data is UTF-8.
The result is that any non-ASCII is mojibake. (Such as the
degree/minute/second symbols in the location example, or the second text
example).

¹you serve a Content-Type of text/html, with no charset, and you have no
<meta> charset in the HTML itself, so this is the default.

> _I display binary data as a picture of the bits themselves._

And after looking at that on the website, that is a very interesting approach.

~~~
ken
> Your website, BTW, serves itself as ISO-8859-1

Thanks! I didn't realize that, since it worked fine in all of my browsers, and
nobody has said anything about it.

It's a simple HTML page (from some Jekyll templates), served up through an AWS
S3 bucket, and I just learned that while it auto-detects mime-types, it
doesn't do this for encodings. Fortunately, there is a way to specify it by
hand [1].

[1]: [https://github.com/aws/aws-
cli/issues/1346#issuecomment-3332...](https://github.com/aws/aws-
cli/issues/1346#issuecomment-333268233)

Should be fixed now.

> And after looking at that on the website, that is a very interesting
> approach.

I've been surprised by the response to this. It's the 5th or 6th design I
tried, and the first one I didn't totally hate, but I've had a user tell me
it's super cool and I should feature it more prominently.

One of my philosophies is "When in doubt, show the user their data".

------
ComputerGuru
I actually reported this same issue but under the context of a bug rather than
as a security issue, exactly 2 months ago: [https://public-
inbox.org/git/CACcTrKctqAWeWWrc9Q+Y7ewXGc_o+u...](https://public-
inbox.org/git/CACcTrKctqAWeWWrc9Q+Y7ewXGc_o+uJoeHS83LDw5O_s1-3Nug@mail.gmail.com/)

Same response, “it’s the pager’s job,” in that case.

~~~
jwilk
I mentioned the problem on oss-security in 2016:

[http://openwall.com/lists/oss-
security/2016/04/21/9](http://openwall.com/lists/oss-security/2016/04/21/9)

(I think my claim that git escapes some control characters is incorrect; it's
less that does it.)

> Same response, “it’s the pager’s job,” in that case.

It can't be the pagers job, because pager has no way to distingiush from
escapes generated by git (to make things colorful) from malicious escapes.

------
CGamesPlay
This is tenuous at best. iTerm2, at least, doesn't support the "conceal" color
(^[[8m) and any other color would rely on the "victim's" color scheme to work
properly. Git is safely removing sequences that could actually cause problems
like OSC. This technique also wouldn't stand up to review on Github or any
other online code review platform.

------
carapace
(Tiny grey sans-serif body text means you hate your readers' eyes.)

------
jsteemann
One mitigation for this is to use `col -bp` as a git pager, but it seems
clumsy. However, for completeness I post it here with an example for how this
makes the backdoor visible.

Reproduce instruction:

# clone repo and cd into it

git clone
[https://github.com/twistlock/gitPocDiff](https://github.com/twistlock/gitPocDiff)

cd gitPocDiff

# show diff (this includes the backdoor but it won't be visible)

git show ebc3f506a7ec8278e1a3ad4108612b66d10b41ca

# expose the backdoor, using `col -bp` as pager

git show ebc3f506a7ec8278e1a3ad4108612b66d10b41ca | col -bp

~~~
jwilk
No, "col -bp" is not bulletproof. The attacker could still use backspace
characters to hide malicious content.

------
tomcam
Unfortunately had to stop reading due to their text-hiding scheme--miniscule
light-gray text on white background

~~~
mrob
Firefox's Reader View reduces the problem, and with some custom CSS, solves it
completely (see
[https://news.ycombinator.com/item?id=15080363](https://news.ycombinator.com/item?id=15080363)
). I imagine other browsers include a similar solution.

------
amigoingtodie
What about code being 'deleted' that git diff does not catch?

Is there a recommended external diff tool that ameliorates these issues?

~~~
hawski
Maybe cat -v?

    
    
        $ git diff | cat -v
    

However it's considered harmful:
[http://harmful.cat-v.org/cat-v/](http://harmful.cat-v.org/cat-v/) ;)

~~~
UncleEntity
or

git diff > tmp.diff

and open with a text editor?

------
ChuckMcM
Interesting to see how the standardization of escape sequences by ANSI leads
to a vulnerability in code obfuscation.

~~~
shabble
we need a 'secure escape' sequence that disables all subsequent escapes until
rescinded!

And it can change the background colour of the affected lines to a shade
unrepresentable by unprivileged sequences.

Mostly tongue-in-cheek, but... one of these years I'm going to track down what
the heck the 'PRIVACY MESSAGE' sequence is actually for. ECMA48 is
infuriatingly vague, and nothing seems to actually use or support it.

------
unhammer
magit ftw: [https://i.imgur.com/a8Mvbkf.png](https://i.imgur.com/a8Mvbkf.png)

Though I'm sure Emacs has loads of other vulnerabilities :)

------
always_good
Some pitiful Vimming right there. From holding down `h` for 30 seconds to go
to the beginning of line to using :wq instead of :x.

