I've also written about some less security-critical things, like shell history (http://corte.si/posts/hacks/github-shhistory), custom aspell dictionaries (http://corte.si/posts/hacks/github-spellingdicts), and whether one could come up with ideas for command-line tools by looking at common pipe chains in shell histories (http://corte.si/posts/hacks/github-pipechains).
I've held back on some of the more damaging leaks that are easy to exploit en masse with a tool like this (some are discussed in the linked post, but there are many more), because there's no way to counteract them effectively without co-operation from Github. I've reported this to Github with concrete suggestions for improvement, but have never received a response.
This works pretty well too, isn't affected by Github blocking your script, and is probably even easier.
Github could show a warning on a repo saying that it may contain data you don't want out there.
You can access all of this functionality with ghrabber.
One of my suggestions to Github is that they disable indexing of dotfiles of all persuasions (including contents of dot-directories), unless the repo owner explicitly opts in. That would make it much harder to find a very large fraction of the more obvious leaks.
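To make the proposed rule concrete, here is a minimal sketch of an indexing filter under that policy. The function name `is_indexable` and the `owner_opted_in` flag are hypothetical; this illustrates the suggestion, not anything Github actually implements. A path is skipped if any of its components starts with a dot, which covers both dotfiles and everything inside dot-directories.

```python
from pathlib import PurePosixPath


def is_indexable(path: str, owner_opted_in: bool = False) -> bool:
    """Hypothetical policy: exclude dotfiles and dot-directory contents
    from the search index unless the repo owner has explicitly opted in."""
    if owner_opted_in:
        return True
    # Any component such as ".bash_history" or ".ssh" makes the path private.
    return not any(part.startswith(".") for part in PurePosixPath(path).parts)


for p in ["README.md", ".bash_history", ".ssh/id_rsa", "src/.env"]:
    print(p, is_indexable(p))
```

Checking every path component, rather than only the file name, is what catches files like `.ssh/id_rsa` whose own names don't start with a dot.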
If true, doesn't this make crippling the usefulness of GH's search really superfluous?
Full disclosure: I'm never a fan of crippling search to cover the ass of someone who has pushed sensitive information to a publicly accessible location. I'm still sore about Google's decision to do things like preventing searches for credit card numbers. :(
Just look at some of these chains:
ps | grep
cat | grep
find | grep
find | xargs
grep | wc
ls | grep
echo | grep
grep | grep
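A rough idea of how chains like these can be tallied from shell histories is sketched below. It assumes plain one-command-per-line history (no zsh timestamps) and splits naively on a lone `|`, so quoted pipes and subshells are not handled. This is an illustrative approach, not necessarily the method used in the linked post.

```python
import re
from collections import Counter


def command_pairs(line: str):
    """Yield (left, right) command-name pairs from one history line.

    Naive: splits on a single '|' (not '||'), ignores quoting and subshells.
    """
    segments = re.split(r"(?<!\|)\|(?!\|)", line)
    names = []
    for seg in segments:
        words = seg.split()
        if not words:
            return  # malformed pipeline; skip this line
        names.append(words[0])  # command name is the first word of a segment
    yield from zip(names, names[1:])


def count_pairs(history_lines):
    """Count adjacent command pairs across all history lines."""
    counts = Counter()
    for line in history_lines:
        counts.update(command_pairs(line))
    return counts


history = [
    "ps aux | grep python",
    "cat log.txt | grep ERROR | wc -l",
    "ls -la",
    "make || echo failed",
    "ps -ef | grep ssh",
]
print(count_pairs(history).most_common(3))
```

Counting adjacent pairs rather than whole chains keeps the statistics dense: `cat log.txt | grep ERROR | wc -l` contributes both `cat | grep` and `grep | wc`.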
A particularly odd one on the list was `type | head`. Does anyone know the purpose of this?
Remember, we are always intermediates at most things.
Oops? Ironically (assuming two distinct values of PATTERN), I think you just answered your own question. (They are different: the first is a disjunction of patterns, the second a conjunction.)
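The difference can be sketched in Python, assuming the two forms being compared are a single `grep -e P1 -e P2` and the chain `grep P1 | grep P2`. Python's `re` is used as a stand-in for grep's regex syntax, which behaves the same for plain literal patterns like these.

```python
import re


def grep_any(lines, patterns):
    """Like `grep -e P1 -e P2`: keep lines matching at least one pattern (OR)."""
    return [line for line in lines if any(re.search(p, line) for p in patterns)]


def grep_chain(lines, patterns):
    """Like `grep P1 | grep P2`: each stage filters the previous stage's
    output, so a line survives only if it matches every pattern (AND)."""
    for p in patterns:
        lines = [line for line in lines if re.search(p, line)]
    return lines


logs = ["error: disk full", "warning: disk slow", "error: net down"]
print(grep_any(logs, ["error", "disk"]))    # all three lines
print(grep_chain(logs, ["error", "disk"]))  # ['error: disk full']
```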
Your point has merit for scripts (performance), but for data exploration at the prompt it's almost always irrelevant: the simplicity of pipe composition outweighs everything else.