A Git query language (github.com)
352 points by bryanrasmussen on Dec 14, 2016 | 67 comments

This might benefit from SQLite's Virtual tables: https://sqlite.org/vtab.html

With Virtual Tables you can expose any data source as a SQLite table -- then you can use every SQL feature that sqlite offers. You can just tell sqlite how to iterate through your data with a few functions, with an option to push down filtering information for efficiency.

You can also create your own aggregates, functions etc.

Here's an article where the author exposes redis as a table within sqlite: http://charlesleifer.com/blog/extending-sqlite-with-python/

My thoughts went straight to PostgreSQL Foreign Data Wrappers. Something like that would be really helpful!

Wouldn't that require a running Postgres server, though? Seems a bit heavy for ad hoc queries.

For something like a GitLab server, though, that would be amazing.

According to the Ubuntu dependencies [0] it already uses Postgres, and it also needs Redis, so with GitLab we are a bit past the "heavy" point.

[0]: http://packages.ubuntu.com/xenial/gitlab

SQLite virtual tables for git would be phenomenal - you could join your git history against data from other sources! And you wouldn't need to run a huge MySQL/Postgres server process to do it.

A very cursory search suggests no one has built this yet.
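
For anyone who wants to try the "join git history against other data" idea without writing a virtual table module, here is a minimal sketch. It is not a virtual table: Python's standard sqlite3 module does not expose that API, so this simply copies `git log` output into an ordinary in-memory table. The table name `commits`, its columns, and the helper names are my own assumptions.

```python
import sqlite3
import subprocess

# Separators chosen on the assumption that they never appear in commit text.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
LOG_FORMAT = FIELD_SEP.join(["%H", "%an", "%ad", "%s"]) + RECORD_SEP


def read_git_log(repo_path="."):
    """Return raw `git log` output for repo_path, one record per commit."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--date=iso-strict",
         f"--pretty=format:{LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    ).stdout


def parse_log(raw):
    """Split raw log output into (hash, author, date, message) tuples."""
    rows = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if record:
            rows.append(tuple(record.split(FIELD_SEP)))
    return rows


def load_commits(conn, rows):
    """Copy commit tuples into a plain (non-virtual) `commits` table."""
    conn.execute(
        "CREATE TABLE commits "
        "(hash TEXT PRIMARY KEY, author TEXT, date TEXT, message TEXT)"
    )
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)", rows)


def commits_per_author(conn):
    """Example query: number of commits per author, busiest first."""
    return conn.execute(
        "SELECT author, COUNT(*) AS n FROM commits "
        "GROUP BY author ORDER BY n DESC, author"
    ).fetchall()
```

Usage would be something like `conn = sqlite3.connect(":memory:")`, then `load_commits(conn, parse_log(read_git_log(".")))`, after which any other table in the same connection can be joined against `commits`. A real virtual table would avoid the copy and could push filters down, which this sketch does not attempt.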

Mercurial has a somewhat similar concept predating this (added circa 2010): revision sets (https://www.selenic.com/mercurial/hg.1.html#revsets) for selection, plus templates for output formatting (though git has the latter built in, kind of, via log --format).

Mercurial's are completely general, though. Any Mercurial command that can accept a revision as an argument can also accept a revset expression. And templating isn't just for log, but for many other commands, such as grep or annotate (blame), and it's the same templating language for all of them. I also find hg templates a bit easier to read, because they're Djangoish/Jinjaish instead of being printf-ish like git's. Plus, you can save and compose Mercurial templates and revsets.

I was actually hoping that gitql had finally gotten inspiration from Mercurial and git would grow a general purpose query language, but it's read-only. :-(

Revsets are a wonderful feature, and it's something I wish git had. Just being able to say I want to see what has changed between this branch head and its latest common ancestor with trunk is an incredibly simple and useful thing to be able to do.

Complete guess based on [1], but wouldn't this work?

  git diff HEAD $(git merge-base HEAD master)

[1]: https://stackoverflow.com/questions/1549146/find-common-ance...

In this case git can do the same thing, but notice you can only do it because git provides a special command for getting that revision. Revsets are really general (a greatest common ancestor function is provided, but can be easily synthesised from more primitive building blocks) and can be used everywhere, so you can bisect over changes you made that touched files matching a pattern, or whatever. They aren't something I use every day, but they are really useful on occasion and allow for some pretty robust tooling to be written.

This requires using bash. I find it kind of cheating that git ships bash on Windows so that Windows users can rely on bash for composing git commands. I'm not sure if Windows users are generally that happy about typing bash commands, but I guess nobody really cares what you have to type in as long as it's high in the Google hits for whatever operation you want to perform.

Mercurial's API (i.e. the CLI) makes a point of being usable with powershell and cmd.exe, which I think some Windows users appreciate.

Another way to look at this is that hg needed to bake this into their core, whereas git didn't need to. There is a non-zero cost to all additional code, so leaning on the shell to do work is generally a smart move.

Git ships with Git Bash on Windows.

Beauty of revsets is their ubiquity.

Diff would be `hg diff -r 'ancestor(default, experiment)' -r experiment`

Want to list commits in experiment? Just change diff to log. Generate patch file with all commits? Change it to export.

You can do that! "master..experiment" means "all commits reachable by experiment that aren’t reachable by master" https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

I believe the goal is to get a diff, not a list of commits, in which case you need to figure out an expression for getting the last commit common to master and experiment so you can diff it with experiment.

`git diff master..experiment`

I just tried that, and it seems to be the same as `git diff master experiment`. We don't want to diff the two heads. The command in Mercurial is `hg diff -r 'ancestor(master, experiment)' -r experiment`. Comrade trolor seems to have found the correct git expression.


    git diff master...experiment
(note there are three periods)

That works. Can you explain why? Does diff see this as a single commit or as set of commits? If it sees it as a set, how does it decide what two commits to diff from that set? If it's just one, what does it decide to diff?

I'm kind of confused because gitrevisions(7) says the triple dot is symmetric difference, but exchanging master and experiment does not produce the same output from diff.

Unhelpfully, it has a different meaning in 'git diff' than in 'git log'. In 'git diff', A...B means the changes on the second branch since its most recent common ancestor (the merge base) with the first, which is why swapping the two names changes the output.
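
To make the difference concrete, here is a toy model (not git's implementation) of a five-commit history. The commit names and the `PARENTS` graph are made up for illustration; the functions only mirror the documented meanings of `..` and `...` in 'git log' and of `...` in 'git diff'.

```python
# Toy commit graph: each commit maps to its list of parents.
#
#   a --- b --- c        (master is at c)
#          \
#           d --- e      (experiment is at e)
PARENTS = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"]}


def ancestors(commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def log_two_dot(a, b):
    """Like `git log a..b`: reachable from b but not from a."""
    return ancestors(b) - ancestors(a)


def log_three_dot(a, b):
    """Like `git log a...b`: reachable from exactly one of a and b."""
    return ancestors(a) ^ ancestors(b)


def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return sorted(
        c for c in common
        if not any(c in ancestors(other) - {other} for other in common)
    )


def diff_three_dot(a, b):
    """Like `git diff a...b`: the pair (merge base of a and b, b).

    Simplification: if there are several merge bases, take the first.
    """
    return (merge_bases(a, b)[0], b)
```

Here `log_three_dot("c", "e")` and `log_three_dot("e", "c")` are the same set, but `diff_three_dot("c", "e")` compares b with e while `diff_three_dot("e", "c")` compares b with c, so the diff is not symmetric even though the revision set is.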

True all the way, but that may be a bit too much to chew on for people whose mind is already blown by gitql.

This is pretty cool. Looks like it's local to the current repo, which makes sense for most usage. Having something like this across a swathe of repos would be useful in different ways (ex: "What has Bob committed over all the repos for our projects that involves the string 'billing'?").

Minor off topic rant about the animated example: Who doesn't put a space at the end of their prompt after the $?! Ugh!

git map would work for many cases:

  git map ql ...

Nice. I can see a need for this as a lot of my projects are structured like that (multiple sibling repos). Running 'git map log --grep ...' seems particularly useful.

From a quick eyeballing of the source, it does not handle whitespace in directory names properly. The for loop would split such a name and treat the pieces as separate, invalid entries.

Of course I'd also fire someone on the spot that commits a project directory with whitespace in it...

Good point, switched it to a while loop.
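
For comparison, a rough sketch of the same "run a git command in every sibling repo" idea outside the shell, where word splitting never happens. This is not git-map's code; `find_repos` and `git_in_each` are hypothetical names, and it only looks one directory level deep.

```python
import subprocess
from pathlib import Path


def find_repos(root):
    """Return immediate subdirectories of `root` that contain a `.git` entry."""
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and (p / ".git").exists()
    )


def git_in_each(root, *git_args):
    """Run `git <git_args>` in every repo under `root`; return {path: stdout}."""
    results = {}
    for repo in find_repos(root):
        # An argument list (no shell involved) keeps a path such as "my repo"
        # as one argument, so it is never split on whitespace.
        completed = subprocess.run(
            ["git", "-C", str(repo), *git_args],
            capture_output=True, text=True,
        )
        results[repo] = completed.stdout
    return results
```

Something like `git_in_each("projects", "log", "--oneline", "--grep", "billing")` would then collect matching commits per repository, spaces in directory names included.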

Thank you for suggesting git-map. I tried it and intend to include it in my workflow. Combining it with gitql did not work for me, though, due to: "dyld: Library not loaded: libgit2.21.dylib". I assume this is something about my setup (mac, zsh, other stuff), but if you got this working I'd like to know so I can keep trying with my setup. To clarify: both git-map and gitql work for me on their own, I just can't seem to combine them.

git map is just an 8-line bash script, so if you use zsh most of the time, perhaps check whether your paths are set up correctly for bash to find your git binaries.

I agree. Since we work with microservices, we have 10+ repos per project. It would be nice to be able to scan through all of them.

Plug: if you ever find yourself wanting to merge everything painlessly: https://github.com/unravelin/tomono :)

accompanying blog post: https://syslog.ravelin.com/multi-to-mono-repository-c81d004d...

You could submodule them all in an otherwise empty parent repo.

Bit of a hack, but could come in handy for other things (off the top of my head: "welcome to the team! Clone this one thing, it contains everything you need.").

Seconded on a multirepo tool for that.

I have a big "projects" folder with lots of different repos. I'd like to know all commits I've made in the past month, across all projects.

Currently I have a post-commit hook which sends the one line shortlog to a common file, but would be nicer to have a tool for ad-hoc queries.

> Who doesn't put a space at the end of their prompt after the $?! Ugh!

I've always had mixed feelings about that. Resolved it now, by using '>', no space, which doesn't look cluttered since there's only ~1px right next to the first char entered.

Thank you for sharing this. I was interested to know if this was a fork of gitql by cloudson. It is not. The following issue clarifies the relationship:


"The relation is its purpose: SQL + Git, written in Go. There is no relation other than that."

I thought this might help anyone else who was similarly interested.

This one actually seems very promising.

    A Git query language (github.com)
    10 points by bryanrasmussen 1 hour ago
This ought to have (2014) in the title: Latest commit 49c1c17 on 22 Jun 2014.

That makes it even more interesting, I mean the fact that this or something similar didn't get traction. I often have an idea of what I would like to know about my repo, but don't want to start hacking the answer together.

https://github.com/gitql/gitql (last commit 12 days ago)

Not the same project (look at the number of commits, and the top issue).

I was responding to:

> interesting, I mean the fact that this _or something similar_ didn't get traction

I think you'll agree this is something very similar and that has traction.

Sorry, didn't note the date. I just found it because I needed something like it and was ready to start making it, but figured I'd better look whether someone else had done the work for me first.

Very cool. I always wanted to play around with a git provider for powershell. Powershell's syntax is great for queries and you could use everything that works on the normal file system with anything that has the abstractions implemented.

The syntax seems close enough that this could just replace it, though:

    ls commits | where date -lt (get-date).AddDays(-4) | where message -like *foo* | select author, message, date -First 3 | ft

I'm no powershell guru, but I'm using posh-git [1]. Is it possible to chain commands using that tool?


Oh I love that the example gif includes:

  select author, message 
  from commits 
  where 'Fuck' in message
I'm pretty sure that query's results would fill my screen buffer.

Just for comparison, in Mercurial you would do

   hg log --template "{author}, {desc}\n" --rev "desc('fuck')"

"Fuck" is right up there with "ch-ch-changes" in my git word cloud.


I don't see the point in wrapping the data in all that ASCII art noise; it will make the output harder to script with.

But why does it have to look like SQL (and not like xpath or jquery)?

Not many people enjoy writing SQL statements on the command line. It's verbose, the order of things is arbitrary...

I would assume that the whole point of the project is to be able to do SQLish queries on a git repo, and that it was written by and for people who are familiar with SQL and have a preference for it over other query languages. And they probably do enjoy writing SQL statements on the command line, however uncommon that may or may not be.

Seems to be a very narrow group. How is this relevant to HN? Flagged the story.

You flagged the story because you don't like SQL on the CLI? Come on. Whether or not you enjoy writing SQL on the CLI, SQL is a fine language for querying data, and is probably more common than xpath. JQuery seems like a strange choice too.

I just fail to praise the attitude. "Let's faithfully reproduce the looks of 30 y.o. technology, pseudographic tables included, with make-believe over git".

Well, if you feel so strongly about it, why not build your own tool to support the "preferred" query language?

Imagine if we could just have this automatically for every program that generated text output. It doesn't seem beyond the realms of possibility that every tool could either a) structure its text output in a way that can guarantee simple command-piping to a general purpose query-language processing tool or b) in the presence of a "--output-json" flag, produce json which can then easily be queried.
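
A minimal sketch of option (b), assuming a tool that prints one JSON object per line. No such `--output-json` flag is claimed to exist for git; the record fields and the `select` helper are invented for illustration.

```python
import json


def read_json_lines(lines):
    """Parse an iterable of text lines, one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def select(records, fields, where=lambda record: True, limit=None):
    """Tiny SELECT fields ... WHERE ... LIMIT over a list of dicts."""
    matched = [
        {field: record.get(field) for field in fields}
        for record in records
        if where(record)
    ]
    return matched if limit is None else matched[:limit]
```

With records read from standard input, `select(records, ["author", "message"], where=lambda r: "billing" in r["message"], limit=3)` would play the role of the gitql query, and the same few lines would work for any tool that emitted this kind of output.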

Sounds like you'd like the object-based Powershell.

Or you could have a single address space (https://en.wikipedia.org/wiki/Single_address_space_operating...), and share objects directly.

Sounds like a security nightmare.

But is it really? Couldn't there be a way to isolate things at the system level?

It's the UNIX principle/tradition that line oriented text is the universal format. It's quite flexible. But I share your feeling, and I keep hearing nice things about powershell.

Also, if you work with CSV files, look at textql.

Wow. It's awesome. I have never seen a project like this before. It seems very useful. Anyway, I think it would be better if they demonstrated the usage example with asciinema [1].

[1]: https://asciinema.org

This can easily be accomplished using `git log`, `head`, and `grep`:

  git log --pretty="format:%an, %s, %ad" --after="2014-04-10" | grep "Fuck" | head -3

If you actually want to query git data in production, it's really a better idea to copy all the data into a real SQL data warehouse. If you're using github, my company (Fivetran.com) has a connector that pulls from their API.

It would be nice to see a plugin for presto

Support for "SELECT DISTINCT" would be great!

That's a great idea!
