How to Quickly and Correctly Generate a Git Log in HTML (oilshell.org)
94 points by foob 11 months ago | 67 comments



Reinventing escaping over and over again (which bash scripts in particular seem to encourage) is a sucker's game. It's difficult to get right, and if you're constantly redoing it you're eventually going to make a mistake. I've worked in web security and it's sad to see how likely it is for people with good intentions to mess this up. I'm glad the author basically came to this conclusion.

The winning strategy is to use a library/framework/whatever for embedding user-provided content into HTML. Sane HTML template libraries will do this. That library has had more time to get it right. Furthermore, a well-designed API will clearly indicate what is trusted vs. untrusted data, and all untrusted data is properly encoded before being embedded. See the "Security Model" section of Go's HTML templates below.
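
To make that concrete, here's roughly what the approach looks like in Python with Jinja2 (a third-party template library with auto-escaping; Go's html/template, linked below, works the same way). A sketch, not production code:

    # Sketch: let the template library do the escaping (Jinja2, autoescape on).
    from jinja2 import Environment

    env = Environment(autoescape=True)  # untrusted values are HTML-escaped on render
    row = env.from_string("<tr><td>{{ sha }}</td><td>{{ subject }}</td></tr>")

    # A hostile commit subject comes out inert:
    print(row.render(sha="c0c3150", subject='<script>alert("x")</script>'))
    # <tr><td>c0c3150</td><td>&lt;script&gt;alert(&#34;x&#34;)&lt;/script&gt;</td></tr>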

An alternative to using the git tools which is appropriate for serious work (shell pipelines are great for prototyping) is libgit2. It has bindings for many languages. It's very easy to use (sometimes, though not always, easier than the CLI) and often performs much better than big shell pipelines (operating on text gets slow pretty fast, and often you end up using xargs...)

An example set of tools: https://golang.org/pkg/html/template/ + https://github.com/libgit2/git2go .

It's not as succinct as a bash script but it's easier to build something that's correct. Use the shell to prototype, build it right in a saner environment.
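
The same pairing exists for Python, for what it's worth: pygit2 is the libgit2 binding there, and the stdlib can do the escaping. An untested sketch (attribute names from memory, check the pygit2 docs):

    # Untested sketch: pygit2 (libgit2 for Python) + html.escape from the stdlib.
    import html
    import pygit2

    repo = pygit2.Repository(".")
    for commit in repo.walk(repo.head.target, pygit2.GIT_SORT_TIME):
        subject = commit.message.splitlines()[0] if commit.message else ""
        # Every untrusted field goes through html.escape() -- no quoting games.
        print("<tr><td>%s</td><td>%s</td></tr>" % (commit.id, html.escape(subject)))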


> An alternative to using the git tools which is appropriate for serious work

This belittles their and anyone else's bash scripts for git as 'toys'. That's unfair, and completely unnecessary for your point.

> big shell pipelines (operating on text gets slow pretty fast, and often you end up using xargs...)

Big shell pipelines are actually blazingly fast, since each command in the pipeline runs in parallel.

> It's not as succinct as a bash script but it's easier to build something that's correct. Use the shell to prototype, build it right in a saner environment.

I would say the ease depends on the relative difficulty and functionality of the languages. Casting bash as the insane bad language doesn't win you any points: we all know bash is a hot mess of a garbage fire, now let's use it to do useful good work.

I would struggle to justify the work I'd need to do to write the equivalent in C (or python, where it would be slower) for one command that's only run in the release process.


> This belittles their and anyone else's bash scripts for git as 'toys'.

They often make great toys in the sense that toys are fun but it's harder to build robust things in bash scripts.

Robustness may not matter (e.g. I have tonnes of scripts I use for doing work) but if you're generating HTML for a website and are worried about escaping and security then I think there are more suitable tools.

> Big shell pipelines are actually blazingly fast, since each command in the pipeline runs in parallel.

It's also possible to do that in programming languages (maybe not as succinctly.)

But where bash scripts fall down is the constant re-parsing of text, the data that has to be passed through the kernel, and the extra processes (when you use xargs, which tends to happen for complicated tasks.) This adds up super quick.

I've replaced reasonable (i.e. weren't badly mishandling the data) shell scripts that processed and formatted the output of git utilities (on large repos), and in one case shrunk a background job from 5 minutes to a few seconds. This is common.

> Casting bash as the insane bad language

I never said that. I said it was great for prototyping. It is bad for the task FTA, though.

> I would struggle to justify the work I'd need to do to write the equivalent in C

I didn't recommend C (I did link to some Go stuff, though.) Unless your company is all bash scripts (heaven forbid) you're probably already using some language. C# at my work. PHP, Ruby, JavaScript, whatever.


> They often make great toys in the sense that toys are fun but it's harder to build robust things in bash scripts.

Programming is hard, regardless. You're being very dismissive towards shell scripters, and it comes off as elitist.

> But where bash scripts fall down is the constant re-parsing of text, the data that has to be passed through the kernel, and the extra processes (when you use xargs, which tends to happen for complicated tasks.) This adds up super quick.

What do you mean by parsing? Copying, reading? The buffers sit in kernel memory, sure… but that's pretty fast if every line in your program is concurrent.

> xargs

Did you have a bad experience with xargs?

> extra processes

Did you have a bad experience with… running out of… PIDs?

> > Casting bash as the insane bad language

> I never said that

You implied it by calling a non-bash language 'sane'; thus, bash must be insane.

> I didn't recommend C

Your blanket recommendation of 'anything but bash' did. I feel that sometimes you need more context about a situation before dictating a course of action that must be followed.


> Programming is hard, regardless. You're being very dismissive towards shell scripters, and it comes off as elitist.

I think you're taking this too personally. I write _a lot_ of bash scripts and like bash (as I've already indicated.) It's not always the right tool for the job (no tool is...)

> What do you mean by parsing? Copying, reading? The buffers sit in kernel memory, sure… but that's pretty fast if every line in your program is concurrent.

The buffers don't just sit in the kernel; they are copied between processes (via write/read.) This involves many, many context switches. By contrast (but this is just an example), libgit2 will hopefully mmap your packfiles, and once the data is mapped in you don't have to leave your process.

Parsing is sometimes obvious, for example multiple passes over the data with grep and sed to shape it the way you want. It's true that components of the pipeline can run in parallel, but the problem is they do a lot of unnecessary work - parallelizing unnecessary work doesn't remove it, and the benefit stops once you hit the limit of # of cores (which is very relevant when generating HTML - something often done by webservers with many concurrent requests, where you don't have cores to spare on parallelizing inefficient algorithms.)

Parsing is sometimes not as obvious, for example steps in the pipeline doing redundant Unicode validation. This is an artifact of squeezing things through read/write with reusable components. It's convenient (which makes it great for prototyping, or for things that "don't matter") but this adds up to poor performance.

> Did you have a bad experience with xargs?

> Did you have a bad experience with… running out of… PIDs?

I think you're being very uncivil. xargs is useful, but it (with the common options like -n and -P) can spawn many processes, which has a tremendous overhead vs an alternative like a function call. It's purely a performance thing.

(I probably should be using parallel but I'm used to xargs :( )


> I think you're being very uncivil.

I'm sorry. I see how my questions come off as questioning your experience, and I didn't mean to do that.

> I think you're taking this too personally.

I think your reluctance to admit that you needlessly insulted some class of devs is exclusionary. Casual readers of these comments may think it's bad to write bash, or that it's ok to bash people who do.

> The buffers don't just sit in the kernel, they are copied between processes (via write/read.)

Yes, that's right. I know how pipes work, thanks.

What you call 'parsing' I would call processing. Parsing has a well-defined meaning in the context of computer science: turning tokens into a data structure (like a syntax tree).

> parallelizing unnecessary work doesn't remove it

Is unnecessary work a problem? If you're being charged by the second, it may be cost-effective to eliminate it… but most of the time, developer productivity is a bigger cost.

I see that you are mindful of the performance implication of your code. I usually have to think of developer throughput, so I'm very tolerant of inefficient use of hardware if it means someone gets their job done faster.


> class of devs

wait... what? is this marxist class struggle framing?

there is no 'class of devs' in this sense. there is no 'social justice' here.

bash is not a person, bash is not an identity, none of this applies.


Class in the OO/taxonomy sense. I feel like you are projecting some of your existing issues onto this discussion.


it's more related to the nature of your wording and framing. no group of humans is being abused or discriminated against.


Then I should have said, type of developer. I see how 'class' has weighty connotations.

> *no group of humans is being abused or discriminated against*

I disagree, you make it abundantly clear that 'real' tools cannot be written in bash, and that people who write in bash are not producing as high-quality work as those who use other languages (like python).


Parent's original point is just that libraries/frameworks are a much safer way of writing programs that handle untrusted input in security-critical settings.

Nothing you've said contradicts this original point (and I think it's a correct assessment). A compelling counter-argument, IMO, would be a demonstration of how to incorporate tried-and-trusted security checks into a bash script.

>> I would struggle to justify the work I'd need to do to write the equivalent in C (or python, where it would be slower) for one command that's only run in the release process.

The original article lists the CVEs that justify the extra work. You may say "that doesn't matter because any input to the release process must come from a trusted source anyways", but:

1) missing input validation can cause accidental catastrophic bugs just as easily as it can cause intentional catastrophic vulnerabilities, and

2) the whole point of defense in depth is that you shouldn't go around punching preventable holes in your defenses. IMO it shouldn't be possible to escalate from "request a build of the branch named 'X'" to "owning the release server".


(author here) Yes, but you have to take into account the context. Every minute you spend "securing" your RELEASE NOTES is better spent securing the APPLICATION.

A lot of people seem to be missing the point -- this is a quick solution that is also correct (*). It's a middle ground.

Yes it has drawbacks, and I analyzed those extensively in the article. Engineering is about tradeoffs. Given infinite resources, yes you want to do the "right thing". But you don't have infinite resources.


> Every minute you spend "securing" your RELEASE NOTES is better spent securing the APPLICATION.

I've never worked in a place where developer time is really that zero-sum.

If your release notes are executable code running on your infrastructure, then their security does matter.

I'm sure there's a team of developers at Avast who wish they had spent more time securing their "release notes".


why can't I upvote this infinite times


> But where bash scripts fall down is the constant re-parsing of text, the data that has to be passed through the kernel, and the extra processes (when you use xargs, which tends to happen for complicated tasks.) This adds up super quick.

You don't always have to use xargs. There are many ways to pipeline shell scripts. For one, you can process the input stream and add markers to make it easier to identify items later in the stream, or convert it to a well-defined format for a similar outcome.

Additionally, any extra resource and time usage from xargs parsing arguments (which I think is likely very small) can probably be easily recouped by the fact that you can use a single xargs flag to run commands in parallel. I've effectively used xargs to automate connecting to hundreds of servers through SSH and running commands on them when automating processes.

Finally, you're setting up a false dichotomy between shell pipelines and programming. I work in Perl for my day job, and write complex, well structured and designed Perl in that job. I also write shell scripts and Perl one liners. More importantly, I often use the Perl I've written in my one liners by loading the module and calling the functions I've defined there. In this way I get the best of both worlds, where I get to use pipelines to quickly munge text, and then pass it to more complex well written routines to process. This can often be parallelized as well.


> This belittles their and anyone else's bash scripts for git as 'toys'. That's unfair, and completely unnecessary for your point.

I don't think so. It is a very common mistake to miss the point where you should switch from a shell script to Python. (... or Ruby, Haskell, OCaml, or any other language with good libraries, sane error handling and sane data structures. And yes, I'm saying this being aware of, and always using, "set -eu -o pipefail".)

In other words: A complex collection of shell code is a very good warning indicator that somebody missed that point.


Using a thin layer of bash to feed the output of a command into a short Python script is an excellent use of shell in my opinion.
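
E.g. something like this (a sketch; "make_rows.py" is a hypothetical name): run `git log --pretty='format:%H%x00%s' | python3 make_rows.py`, where the Python side just does the escaping:

    # make_rows.py (hypothetical) -- reads NUL-separated fields from stdin.
    # %s never contains newlines, so line-based reading is safe here.
    import html
    import sys

    for line in sys.stdin:
        sha, _, subject = line.rstrip("\n").partition("\0")
        print("<tr><td>%s</td><td>%s</td></tr>" % (sha, html.escape(subject)))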


But why would you, when you can just call subprocess.Popen on the command and then slice and dice the result in a language that can do regexes, splitting, format strings, and so forth? I mean, if you're going to be using Python anyways, you might as well just use the standard library and skip the extra bash step entirely. The code might be a bit longer, but infinitely safer and more maintainable.
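
Something like this (a sketch; note that passing an argv list means no shell and no quoting issues at all):

    # Sketch: no bash step -- run git directly and slice the result in Python.
    import html
    import subprocess

    out = subprocess.check_output(
        ["git", "log", "--pretty=format:%H%x00%s"],  # list of args: no shell involved
        universal_newlines=True,
    )
    for line in out.splitlines():
        sha, _, subject = line.partition("\0")
        print("<tr><td>%s</td><td>%s</td></tr>" % (sha, html.escape(subject)))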


If all you're doing is rewriting the pipelines you'd have written in Bash using subprocess in Python, you're not making your code safer or more maintainable. Unless I'm missing something, subprocess.Popen requires you to explicitly wire up the inputs and outputs, whereas Bash has clear and readable syntax for the most common idioms.

For a program that mostly just calls other programs with the right parameters, Python only obfuscates the important bits with details that a shell abstracts away. For a program that has to do some non-trivial processing, pure Bash+coreutils definitely loses to a general-purpose programming language with good standard library. But that doesn't mean that you have to completely switch over.

You can leave the parts in Bash that are just pipelines of other programs (again, Python can't improve anything there) and do your further processing in Python. Whether that's implemented as a Bash script calling Python in a pipeline or Python delegating part of its processing to a Bash script just depends on the most natural entry point.

It might feel "impure" to mix two different languages in a program, but I think using each language in the domain where its strengths shine is a much better idea than attempting to emulate the core features of one language in the other.
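
To make the "explicit wiring" point concrete, here's a sketch of what even a two-stage pipeline like `git log --oneline | grep fix` costs you in subprocess terms:

    # Sketch: manually wiring up what bash writes as `git log --oneline | grep fix`.
    import subprocess

    log = subprocess.Popen(["git", "log", "--oneline"], stdout=subprocess.PIPE)
    grep = subprocess.Popen(["grep", "fix"], stdin=log.stdout, stdout=subprocess.PIPE)
    log.stdout.close()  # close our copy so grep sees EOF when git log exits
    out, _ = grep.communicate()
    print(out.decode(), end="")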


Often it's a really nice and interesting exercise to approach a problem by thinking along the lines of:

"This shell script is a bit complex. Instead of rewriting the shell script in another language, is there some simple and standalone way I could massage the data, in order to make the shell script simple again?"

It's quite a lot like thinking like this:

"This Python script is getting a bit slow. Instead of rewriting it in a faster language, is there some little part of it that I could isolate and only write that in a faster language?"


> I don't think so.

You think it's necessary to let sysadmins know their place as "not real developers"? That's elitist.


> That's elitist

Wait, what?!

I do think that every "sysadmin" who is able to write super-complex shell scripts with large sed/awk scripts is also able to learn Python or Ruby. And I'm convinced that this knowledge will enable them to produce rock-solid tools (which they need!) more easily than with Shell scripts.

Splitting the world into admins and programmers - that is elitist.

You won't find this distinction in small companies, and you find that in large companies only because that's how they happen to be organized - and are heavily criticized for that, mostly by Agile and DevOps movements.

Moreover, removing that distinction is one of the core values of the Free Software movement. The early essays of RMS show this in a very clear way.

Many people are doing "real programming" without noticing, and this is why they are using improper tools for that, and should start to learn at least one general-purpose language to make things easier for themselves. It is all about using the right tool for the job, and for complex jobs, a generic programming language with proper libraries is the right tool - far better suited than a shell script gluing together highly specialized tools, which run in separate processes and communicate via ad-hoc text formats rather than proper data structures.

For every complex system (OS, games, 3D modelling tools), the emerging power users are also beginning programmers, and the more complex things you are able to build within one DSL or (embedded) scripting language, the easier it is to jump to a general-purpose programming language. Some systems are so nice that they use a general-purpose programming language for scripting in the first place - such as Lua for many game engines, Python for Blender and Gimp, Scheme (Guile) for Guix package building, and so on.


> Splitting the world into admins and programmers - that is elitist.

That's what you do when you recommend that people who program in a particular language learn a new one 'to produce rock-solid tools (which they need!)'. You're calling their bash scripts shoddy, and saying they're incompetently wasting their (and their company's) time.

> It is all about using the right tool for the job, and for complex jobs, a generic programming language with proper libraries is the right tool - far better suited than a shell script gluing together highly specialized tools, which run in separate processes and communicate via ad-hoc text formats rather than proper data structures.

I think your viewpoint is unnecessarily limiting. If that shell script works, takes 15 minutes to write, fails loudly and obviously, can be fixed & deployed in 5 minutes, and generates significant revenue, then it is better than a unit-tested, versioned, documented package.

I have seen too many tools that follow the approach you are recommending that overengineer a solution, take ages to release, and fail in obscure ways because they are trying to replicate a bash script's ability to say, "start this, then run that, then these ones in parallel, then wait for the first thing to finish".


> That's what you do when you recommend that people who program in a particular language learn a new one 'to produce rock-solid tools (which they need!)'.

Please refrain from strawman arguments.

You (deliberately?) left out the second part of my sentence that changes the meaning of your quotation by almost 180 degrees: "... more easily than with Shell scripts".

I do assume that those shell scripts are rock-solid, because as a sysadmin you need them to be! But achieving that in shell language is so much harder than it needs to be.


If you say these things but mean something else, that's on you to adapt your communication, not on me to intuit your message.

> But achieving that in shell language is so much harder than it needs to be.

So, as an exercise, how about replicating this pipeline in python:

    seq 0 9 | xargs printf "%02i\n" | xargs -I% -P0 ssh -A dev%-useast1 "docker ps -a | grep -vE '(second|minute)s ago' | tail -n +2 | awk '{ print \$1 }' | tee >( xargs --no-run-if-empty docker rm -f ) >( xargs -I{} echo -e 'USER docker-reaper guest mode :Docker Reaper\nNICK docker-reaper\nJOIN #channel\nPRIVMSG #channel :removing {} on \$(hostname)\nQUIT\n' | nc irc.example.net 6667 )" > removed_containers


> If you say these things but mean something else, that's on you to adapt your communication, not on me to intuit your message.

It is socially unacceptable to misquote. For that, it doesn't matter if you "just" left out important parts, or actively changed words.

That kind of behaviour removes not just the basis of an insightful discussion, but of any other form of communication as well.


> It is socially unacceptable to misquote.

No it's not. If your defense against writing a terrible thing is, "you have to understand it in context", it's still a terrible thing you wrote.

Plus: it's ableist! You're putting the burden on the audience to hear/read your words without error, and to deduce your meaning correctly.

I eagerly await your translation of that pipeline.


My sysadmin at the university has explicit orders "not to program". Ironically, he is allowed to write configuration, e.g. Ansible playbooks. So I guess the trick is to use a Turing-complete configuration language which is "totally not programming".


> Big shell pipelines are actually blazingly fast, since each command in the pipeline runs in parallel.

Yeah, there was an article a few years back about that, Command-line tools can be faster than your Hadoop cluster [1], where the punchline is that because of the inherent stream processing you get from pipelining, the shell pipeline was 235 times faster than Hadoop.

1: https://news.ycombinator.com/item?id=8908462


(author here) Thanks for the comment, yes a lot of people seem to be missing the point of the article, and taking extreme positions on one side or the other.

Bash is indeed a hot mess, but it gets work done! This is a trivial part of the release process and shell lets me cut right through it. It's also pretty much correct and I analyze the limitations.

And of course I think we can do better than bash while still retaining the shell paradigm -- hence the Oil shell.


Very recently I used libgit2 in an application. It was quite nice. Then I noticed my build was failing because I make static binaries and my setup couldn't handle libgit2 via Haskell for some obscure reason. So I rewrote my Git layer to use the Git porcelain shell tools (git mktree, git hash-object, git commit-tree, etc) as a quick and easy way to get it working again. I find it quite typical that using native libraries via FFI instead of shelling out to programs leads to tedious problems. And that's part of why I cherish the Unix tradition of combining different programs; shell scripting is one manifestation of that.


(author here) Yup, that is exactly my experience. You're "solving" one problem but just creating another one: versions and dependencies. Shell is more reliable in that sense.

Less software is better. Git already exists, let's use it.

This link is in the article, I think a lot of people don't get it:

http://www.catb.org/esr/writings/unix-koans/ten-thousand.htm...


> The winning strategy is to use a library/framework/whatever for embedding user-provided content into HTML.

I think I agree. Shell is very convenient, but it's just too easy to make mistakes.

I think something like eshell[0] or a shell based on sweet expressions[1] could go a long way towards providing a powerful shell in a real programming language — with safety against foot-guns like one finds in bash & zsh (the latter is still my default shell, despite its problems).

[0] https://www.gnu.org/software/emacs/manual/html_mono/eshell.h... [1] http://readable.sourceforge.net/


Yes, I addressed this point in the article -- you are advocating "the pedantic solution". The point of the article is that we can "have our cake and eat it too".

(FWIW back in 2009, I designed the original template language that Go html/template was based on, JSON Template, and later Mike Samuel added auto-escaping.)


    git log --pretty=format:"%H%x00%s" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g; s@\(.*\)\x0\(.*\)@<tr><th>\1</th><td>\2</td></tr>@'
You could do the dumb html entifying in a real language. The article's solution is a straw man, since it's promoting their personal language.

Why did they see \x01 & \x02 as possible sentinels but not nulls? Python is fine with nulls…


My solution is easier with respect to multiple fields that contain spaces. The real example has both the description and committer name. And you can add and remove fields without changing the second part of the pipeline -- you just have to add 0x00 and 0x01 in the format string.

Similar question here:

https://www.reddit.com/r/commandline/comments/719pm2/how_to_...

Also, I've used that kind of sed, and it looks horrible. It makes shell look bad.


> You could do the dumb html entifying in a real language

and xargs -0 -n2 printf '<tr><th>%s</th><td>%s</td></tr>'


People always underestimate the power of utilities specified in POSIX. When you want to do text processing without dependencies (which seem to be the goal here), just use awk.


cgi.escape is just this:

    def escape(s, quote=None):
        '''Replace special characters "&", "<" and ">" to HTML-safe sequences.
        If the optional flag quote is true, the quotation mark character (")
        is also translated.'''
        s = s.replace("&", "&amp;") # Must be done first!
        s = s.replace("<", "&lt;")
        s = s.replace(">", "&gt;")
        if quote:
            s = s.replace('"', "&quot;")
        return s
And the author would simplify the job as you point out by escaping the git log output first, then wrapping it in HTML. You don't even need the null byte. %H is a hex sequence, so even a space separator would be fine. This would do:

    git log --pretty='%H %s' |
    sed -e '
      s/&/\&amp;/g;  # must be done first
      s/</\&lt;/g;
      s/>/\&gt;/g;
      s/^\([0-9a-f][0-9a-f]*\) \(.*\)/<tr><td>\1<\/td><td>\2<\/td><\/tr>/'
That said: it's best not to be cavalier about dealing with user input. This is sufficient only if the output is going into the body of the HTML document. You have to be careful about where the output is being placed:

https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%...


I addressed this here -- that solution doesn't really work with multiple fields that contain spaces:

https://www.reddit.com/r/commandline/comments/719pm2/how_to_...


I admit at that point you might want to break out perl:

    git log --pretty='format:%H%x00%an%x00%aE%x00%s' |
    perl -F'\0' -ane '
    chomp;
    print "<tr>\n";
    foreach(@F) {
      s|&|&amp;|g;
      s|<|&lt;|g;
      s|>|&gt;|g;
      print "<td>$_</td>\n";
    }
    print "</tr>\n"'
:)


The underlying problem with the first, simple, approach is that the template it is using to get things from git,

  "<tr> <td>%H</td> <td>%s</td> <tr>"
interpolates values that need to be escaped, but includes literal text that must not be escaped. (My guess is that the author meant "</tr>" for the last element, but the article says "<tr>" so I'm going with that).

The author's approach to deal with that is to mark the places in the template where escaping will be needed, and then make and use an escaping tool that recognizes those marks and just escapes the marked segments.

A simpler approach is to eliminate the underlying problem. For getting the data out of git use a template where the literal text is safe to escape, such as this:

  "%H,%s"
The escaping can then be done by a tool that escapes its entire input. That will leave the comma from the template alone, and will not introduce any new commas. The interpolation of %s might have introduced commas, but they will all be after the literal comma from the template. The interpolation of %H will not introduce commas.

The output from the escaper can then be transformed into the final output by replacing the first "," with "</td> <td>", prepending "<tr> <td>", and appending "</td> <tr>". All of these are simple in a shell pipeline using sed.
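
For comparison, the same escape-everything-then-split idea is also only a few lines of Python (a sketch, fed by git log --pretty='%H,%s'):

    # Sketch: escape the whole line, then split on the first (template) comma.
    import html
    import sys

    for line in sys.stdin:
        escaped = html.escape(line.rstrip("\n"))  # escaping introduces no commas
        sha, _, subject = escaped.partition(",")  # first comma is the template's
        print("<tr> <td>%s</td> <td>%s</td> <tr>" % (sha, subject))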


Several people brought up alternative solutions like this, and I addressed them: https://news.ycombinator.com/item?id=15295556

Summary: I may have oversimplified the problem in my example. I think my solution is nicer for the real problem. It has fewer assumptions and will "scale up" to more fields with arbitrary text. I want to write the escaping ONCE, not modify it every time I change the format of the output table.


You can skip having to escape any characters or worry about whether the content is correct if you put an unformatted git log into a script tag, and then split the lines and set the content of each element via a JS call.

I just tried it, and it works beautifully, no problems with illegal characters.

What's wrong with this? It'd be super easy to extend if you want columns or colors or links...

    <script id='gitlog' type='text'>
      c0c3150f5 09 - 15 dahart Color widget!, #1 improving < hsv > && things [Finishes #8736345] \m/ '",.;:%$#@*
    </script>
    <div id='lines'></div>
    
    $('#gitlog').html().split('\n').forEach(line => {
      $('#lines').append($('<div class="line"/>').text(line))
    })


Also consider https://www.pixelbeat.org/scripts/ansi2html.sh for the general case of (colored) output to html conversion


`gitweb` is a server that comes with your git install.

The `gitweb` web interface includes both a log and shortlog view for repositories. You can probably use those to some benefit.

This seems to be the source of the shortlog command:

https://github.com/git/git/blob/master/gitweb/gitweb.perl#L5...


why do people insist upon reinventing the wheel badly:

    git log --color=always <whatever funky coloring, options, etc you want> | aha > git_log.html

side note: aha is not installed by default on macOS but homebrew will fix that for you. Also, it has many color and styling options.


It doesn't look like it has hyperlinks:

https://github.com/theZiz/aha

Compare:

http://www.oilshell.org/release/0.1.0/changelog.html

Hyperlinks are the whole point of HTML :)


How is it that git still doesn't have machine readable output built in?


In what respect is --pretty=format not producing machine-readable output? Is putting null bytes between the fields not machine-readable enough for you? You want to use 0x1d instead? You could do that.


In the way that you have to futz around like this with null byte separators to try and avoid escaping issues.

SVN had a flag to output responses in XML specifically for machine consumption.

No ambiguity about what it would provide, or if it was escaped or how to handle things like new lines etc.


Because if you want machine readable output you shouldn't be parsing the output of other commands - you should link with libgit2. It's actually very easy.


I'd quibble with your premise that inserting unambiguous delimiters doesn't make a machine-readable text format, but I'm so happy to have learned about the existence of libgit2 that I won't press the issue. Machine-readable text is great and all, but libgit2.so is even better.


It is great indeed, but operationally, dynamic libraries have their own problems and horrors.


Isn't that what the --porcelain option[0] is for? And for commands that aren't porcelain, what's wrong with --oneline or --format?

[0]: https://stackoverflow.com/questions/6976473/what-does-the-te...


It… does? In the specific case of git-log, is not --format=raw machine-parseable? If you only need pieces or parts of the commit, specific format strings can be used, but you can still get those to be machine parseable pretty trivially.


I don't see what was wrong with the first solution. Keep it simple!


If you use < or & in your text, it is not HTML5. Neither XHTML. However, it is probably HTML4, isn't it?


Probably. In any case, cross that bridge when you get to a release that has one of those symbols in the changelog.


Did you overlook the part where <& appears in commit messages?


No?


That's what was wrong with the first solution. The bridge was crossed just when it should have been.


Ah, I misread the article.


>Some programmers might stop here and say, Let's switch to a real programming language. Do it the right way.

Isn't using Python switching to a real programming language?


It's a little misleading how it's written. He said "use a real language", but the real distinction was using a library for the Git API. He's only using the Python stdlib for text manipulation as part of a pipeline.


depends on how elitist you are


I don't want to use an API and do it the right way. That's too complicated, poindexter! (50 lines of garbage script follow)



