
Git Log in HTML: A Harder Problem and a Safe Solution - chubot
http://www.oilshell.org/blog/2017/09/29.html
======
falsedan
The author goes to great lengths to explain how bash sucks at handling null
bytes embedded in variable values (strings). But bash is fine with nulls in pipes:

    
    
        $ printf 'foo\x00bar\x00baz' | xargs -0 -n1 | wc -l
        3
    

The real bash solution is to write a command that escapes HTML entities, and to
set up the pipeline so that bash never has to assign any values that might
contain nulls. If I wanted a shell that was better at programming & manipulating
text data, I'd use perl.
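A minimal sketch of such an escaping command, assuming a POSIX `sed`; `html_escape` is a hypothetical name, not something from the article. It is a plain stdin-to-stdout filter, so the data flows through the pipe and never lands in a bash variable:

```shell
# Hypothetical helper: escape the five HTML-special characters read on stdin.
# '&' is replaced first so the entities added by later rules are not re-escaped.
# In a sed replacement a bare '&' means "the matched text", hence '\&'.
html_escape() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\&#39;/g"
}
```

Usage might look like `git log --pretty=format:'%s' | html_escape | sed 's|.*|<li>&</li>|'`: the subject is escaped before the markup is wrapped around it, so the markup itself is never escaped.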

~~~
chubot
(author here) Yes, bash can write NUL to stdout. But the point is that it
can't store NUL in a string. For example, you can't just replace \x01 from the
last solution with \x00.

Try:

    
    
       git log --pretty=format:$'%h\x00%s\x00'
    

The string is truncated at the first NUL. Compare with:

    
    
       git log --pretty=format:'%h%x00%s%x00'
    

This matters because not all tools support something like %x00 -- it's
specific to git.

printf works because it isn't storing NUL in a string, but rather an escaped
version of it, similar to echo -e '\x00' | od -c.
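A small bash sketch (not from the article) that makes the distinction concrete: a NUL inside a bash string or an argv entry ends it, while a format string containing the escape `\x00` lets printf emit the real byte into a pipe. The `wc -c` padding varies by platform:

```shell
# A NUL in an ANSI-C quoted string ends the string: bash keeps only "foo".
s=$'foo\x00bar'
echo "${#s}"                      # 3

# Arguments are C strings too, so the same truncation reaches printf's argv.
printf '[%s]\n' $'foo\x00bar'     # [foo]

# The format string holds the 4-character escape \x00, not a NUL;
# printf writes the actual byte, which survives the pipe: 3 + 1 + 3 bytes.
printf 'foo\x00bar' | wc -c       # 7
```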

Also, I don't see the difference between the "real bash solution" and what I
described. That's exactly what I do, but you can't escape the entire string,
because that would also escape your markup.

I'd like to see a solution that is shorter but also correct against
adversarial input like mine is.

~~~
chubot
EDIT: I should say that the first command wouldn't work anyway, because each
argv entry is a NUL-terminated string -- I mentioned this in "Note 2" with
respect to grep.

But still I'm failing to see what your point is about the "real bash
solution". That's what this is, at least if you want to be 100% correct about
escaping.

I criticized the "normal bash solution" in the section on "The Worst Parts of
Shell".

~~~
falsedan
> _But still I'm failing to see what your point is about the "real bash
> solution"_

It's accepting that bash sucks at strings, and trying to keep it as far away
as possible from the data in the pipeline. You keep pointing to the nulls and
saying bash is bad at this and oil is good, but it's a strawman, since bash
doesn't even have to see the nulls for a bash script to be 100% correct at
escaping the git log messages.
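One way bash can consume NUL-delimited output without ever storing a NUL is `read -d ''`, which treats NUL as the record delimiter. A minimal sketch; `print_pairs` is a hypothetical name, and the usage line assumes git's `format:` puts a newline between entries, as its separator semantics do:

```shell
# Hypothetical helper: read NUL-terminated (hash, subject) pairs from stdin.
# read -d '' stops at each NUL and consumes it, so no variable ever holds one.
print_pairs() {
  local hash subject
  while IFS= read -r -d '' hash && IFS= read -r -d '' subject; do
    # format: separates entries with a newline; strip it from the next hash.
    hash=${hash#$'\n'}
    printf '%s\t%s\n' "$hash" "$subject"
  done
}

# Usage sketch (inside a git checkout):
#   git log --pretty=format:'%h%x00%s%x00' | print_pairs
```

Each field arrives as an ordinary string, so it can be handed to any escaping filter before markup is added.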

~~~
chubot
As noted in the article, I'm suggesting that the shell itself should be able to
handle searching a stream for NULs, without this roundabout way of doing
things:

    
    
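        count-nul() {
          # -v stops od from collapsing repeated lines into '*', which would
          # otherwise undercount long runs of NULs.
          # -o puts every match on its own line.  (grep -o -c doesn't work.)
          od -A n -v -t x1 | grep -o '00' | wc -l
        }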
    

You can do this in Python or Perl, but it's not a stretch for the shell to do
it directly.
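For comparison, a shorter pipeline that avoids parsing od's output layout; `count_nul_tr` is a hypothetical name. It still leans on external tools (`tr` and `wc`), which is exactly the indirection being criticized, so treat it as a sketch, not the built-in the comment asks for:

```shell
# tr -c complements the set {NUL} and -d deletes it, so every byte that is
# NOT a NUL is dropped; wc -c then counts the NUL bytes that remain.
count_nul_tr() {
  tr -cd '\000' | wc -c
}
```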

~~~
falsedan
And I’m saying this isn’t a feature I feel the lack of in bash. If I’m
working with non-text data, I’ll move to a language like perl with better
handling of complex data structures & explicitly handle concurrency myself.

To be clear: oil has to be a compelling alternative to perl, not bash.

------
mattdeboard
In the original post, he says switching to a "real" PL is probably "right" but
onerous. I won't nitpick on whether or not it's onerous, but maybe it's not as
onerous as the author thinks.

A few years ago I blogged about generating a Markdown changelog based on PRs:
[http://mattdeboard.net/2014/01/14/automatic-changelog-generation-with-git/](http://mattdeboard.net/2014/01/14/automatic-changelog-generation-with-git/)

Though the output is different (just concerned with merge commits), I reckon
you could probably do something similar for HTML with a `git log` command
modified from what's in my post. No git API library needed.

That said, the author wanted to explore doing this just with shell and deal
with the problems and stuff that come up there, so, cool! I'm not passing
judgment on that or comparing what I wrote to what he did, just throwing it
out there.

~~~
__david__
I agree. I love pushing shell to its limits as much as anyone, but I had so
many shell scripts that grew and grew until I got annoyed and rewrote them in
perl/ruby, and every time I thought the result was better (shorter, more
comprehensible). Finding the point where it's better to switch to a "real"
language is a bit of an art form, though. For me it's heuristics like needing
arrays or hashes or parsing command line arguments.

~~~
chubot
Yup, that is exactly the point of Oil. In fact I promised I would write an
"elevator pitch" but I haven't done that yet [1].

In short: It's the "upgrade path" from bash. That is, the ONLY language you
can automatically convert shell to.

Rewriting even a 30-line bash script in Perl/Ruby/Python isn't trivial, and of
course there are many 1000+ line bash scripts lying around. People even
write new ones today (e.g. for setting up clusters and Linux distros).

I'm currently looking at running the 2500-line 'abuild' shell script, which
builds and deploys Alpine Linux packages [2]. Although many people like to
avoid shell, if you use Unix, it's at the foundation of everything you do.

The bash clone OSH is actually close to done (although it's slow). But I still
have to work on the new Oil language [3] which you can convert shell to.
Hopefully that won't take as long as OSH :)

[1]
[http://www.oilshell.org/blog/2017/07/31.html](http://www.oilshell.org/blog/2017/07/31.html)

[2]
[https://git.alpinelinux.org/cgit/abuild/tree/abuild.in](https://git.alpinelinux.org/cgit/abuild/tree/abuild.in)

[3]
[http://www.oilshell.org/blog/2017/02/05.html](http://www.oilshell.org/blog/2017/02/05.html)

------
falsedan
Previous article's thread
[https://news.ycombinator.com/item?id=15291809](https://news.ycombinator.com/item?id=15291809)

------
p4bl0
If you are not in an adversarial model, you could achieve that using my tool
fugitive [1] with the right templates :).

[1] [https://clandest.in/fugitive/about/](https://clandest.in/fugitive/about/)

