Git Log in HTML: A Harder Problem and a Safe Solution (oilshell.org)
25 points by chubot 11 months ago | 13 comments



The author goes to lengths to explain how bash sucks at embedded null bytes in variable values (strings). But Bash is fine with nulls in pipes:

    $ printf 'foo\x00bar\x00baz' | xargs -0 -n1 | wc -l
    3
The real bash solution is to write a command that escapes HTML entities and set up the pipeline so that bash never has to assign any values that might contain nulls. If I wanted a shell that was better at programming and manipulating text data, I'd use perl.
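A minimal sketch of that pipeline-only approach, assuming GNU `sed` (for `-z`) and GNU `xargs` (for `-r`); the function name `log_rows` and the table markup are hypothetical. It expects NUL-separated hash/subject pairs on stdin, for example from `git log -z --pretty=format:'%h%x00%s'`, and no bash variable ever holds the data:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read NUL-separated (hash, subject) pairs on stdin and
# emit one HTML table row per pair. Bash only wires up the pipeline; sed,
# xargs and printf handle the bytes, so no shell variable holds the data.
# Assumes GNU sed (-z) and GNU xargs (-r).
log_rows() {
  # -z: records are NUL-terminated, so each field is escaped on its own.
  # & is replaced first so the entities added later are not double-escaped.
  sed -z \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' \
    -e "s/'/\&#39;/g" |
  # Take the fields two at a time; the escaped text is passed to printf as
  # arguments, never as its format string. -r: do nothing on empty input.
  xargs -0 -r -n2 printf '<tr><td>%s</td><td>%s</td></tr>\n'
}

# Usage inside a git repository:
#   git log -z --pretty=format:'%h%x00%s' | log_rows
```

The escaping happens per field, before any markup is added, so the `<tr>`/`<td>` tags themselves are never escaped.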


(author here) Yes, bash can write NUL to stdout. But the point is that it can't store NUL in a string. You can't just replace \x01 from the last solution with \x00, e.g.

Try:

   git log --pretty=format:$'%h\x00%s\x00'
The string is truncated at the first NUL. Compare with:

   git log --pretty=format:'%h%x00%s%x00'
This matters because not all tools support something like %x00 -- it's specific to git.

printf works because it isn't storing NUL in a string but rather an escaped version, similar to echo -e '\x00' | od -c.
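A small bash demonstration of the distinction (the variable name `s` is only illustrative): the ANSI-C quoted string is cut off at the NUL, while the byte stream written by `printf` keeps it.

```shell
#!/usr/bin/env bash
# Storing NUL in a string: bash truncates $'...' at the first \x00,
# so s holds only "foo".
s=$'foo\x00bar'
echo "length of s: ${#s}"

# Writing NUL to a stream: printf emits all seven bytes, including the NUL,
# which od -c shows as \0.
printf 'foo\x00bar' | od -c
```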

Also, I don't see the difference between the "real bash solution" and what I described. That's exactly what I do, but you can't escape the entire string because that would escape your markup too.

I'd like to see a solution that is shorter but also correct against adversarial input like mine is.


EDIT: I should say that the first command wouldn't work anyway because argv is NUL-terminated -- I mentioned this in "Note 2" with respect to grep.

But still I'm failing to see what your point is about the "real bash solution". That's what this is, at least if you want to be 100% correct about escaping.

I criticized the "normal bash solution" in the section on "The Worst Parts of Shell".


> But still I'm failing to see what your point is about the "real bash solution"

It's accepting that bash sucks at strings, and trying to keep it as far away as possible from the data in the pipeline. You keep pointing to the nulls and saying bash is bad at this and oil is good; but it's a strawman, since bash doesn't even have to see the nulls for a bash script to be 100% correct at escaping the git log messages.


As noted in the article, I'm suggesting that the shell itself should be able to handle searching a stream for NULs, without this roundabout way of doing things:

    count-nul() {
      # -o puts every match on its own line.  (grep -o -c doesn't work.)
      # -v stops od from collapsing repeated lines into '*', which would hide NULs.
      od -A n -v -t x1 | grep -o '00' | wc -l
    }
You can do this in Python or Perl, but it's not a stretch for the shell to do it directly.
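For comparison, a shorter pipeline with the same result, assuming a `tr` that accepts the octal escape `'\000'` for NUL (as POSIX and GNU `tr` do); the function name `count_nul_tr` is hypothetical. It deletes every byte except NUL and counts what is left, so no hex dump is needed:

```shell
#!/usr/bin/env bash
# Hypothetical alternative to count-nul: count the NUL bytes on stdin.
count_nul_tr() {
  # -c complements the set and -d deletes it, so only NUL bytes survive;
  # wc -c counts them, and tr -d ' ' strips the padding some wc builds add.
  tr -cd '\000' | wc -c | tr -d ' '
}

printf 'foo\x00bar\x00baz' | count_nul_tr   # prints 2
```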


And I’m saying, this isn’t a feature I feel the lack of in bash. If I’m working with non-text data, I’ll move to a language with better handling of complex data structures like perl & explicitly handle concurrency myself.

To be clear: oil has to be a compelling alternative to perl, not bash.


In the original post he says switching to a "real" programming language is probably "right" but onerous. I won't nitpick on whether or not it's onerous, but maybe it's not as onerous as the author thinks.

A few years ago I blogged about generating a Markdown changelog based on PRs http://mattdeboard.net/2014/01/14/automatic-changelog-genera...

Though the output is different (just concerned with merge commits), I reckon you could probably do something similar for HTML with a `git log` command modified from what's in my post. No git API library needed.

That said, the author wanted to explore doing this just with shell and deal with the problems and stuff that come up there, so, cool! I'm not passing judgment on that or comparing what I wrote to what he did, just throwing it out there.


I agree—I love pushing shell to its limits as much as anyone, but I had so many shell scripts that grew and grew till the point I got annoyed and rewrote them in perl/ruby and every time I thought it came out better (shorter, more comprehensible). Finding that point where it's better to be a "real" language is a bit of an art form though. For me it's heuristics like needing arrays or hashes or parsing command line arguments.


Yup, that is exactly the point of Oil. In fact I promised I would write an "elevator pitch" but I haven't done that yet [1].

In short: It's the "upgrade path" from bash. That is, the ONLY language you can automatically convert shell to.

Rewriting even a 30-line bash script in Perl/Ruby/Python isn't trivial, and of course there are many 1000+ line bash scripts lying around. People even write new ones today (e.g. for setting up clusters and Linux distros).

I'm currently looking at running the 2500-line 'abuild' shell script, which builds and deploys Alpine Linux packages [2]. Although many people like to avoid shell, if you use Unix, it's at the foundation of everything you do.

The bash clone OSH is actually close to done (although it's slow). But I still have to work on the new Oil language [3] which you can convert shell to. Hopefully that won't take as long as OSH :)

[1] http://www.oilshell.org/blog/2017/07/31.html

[2] https://git.alpinelinux.org/cgit/abuild/tree/abuild.in

[3] http://www.oilshell.org/blog/2017/02/05.html


(author here) That's a very similar problem, but it looks like it's sloppy about escaping. Correct me if I'm wrong, but you could have a change description that looks like:

    Fix escaping of \*.
And it wouldn't be printed correctly because it conflicts with Markdown's own escaping.

    $ echo 'Fix escaping of \*' | markdown
    <p>Fix escaping of *</p>
Or:

    Fix parsing of a[i]()

    $ echo 'Fix parsing of a[i]()' | markdown
    <p>Fix parsing of a<a href="">i</a></p>
In other words, you have a "markdown injection attack" (which is probably not very consequential, but it leads to bad output). The whole point of this exercise is that the escaping is 100% correct.


Sure, understood.



If you are not in an adversarial model, you could achieve that using my tool fugitive [1] with the right templates :).

[1] https://clandest.in/fugitive/about/



