
Show HN: Sniffgit – A Python lib to find sensitive files and information in a repo - LHardi
https://github.com/Liandy213/sniffgit
======
chatmasta
Nice start. I notice this only scans the HEAD of the repository. Have you
considered implementing functionality to go back through previous commits and
check for secrets in files there? After all, once something is committed to
git, even if you change the file, the old version is still there (by design,
obviously).
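
As a rough sketch of what that could look like (not sniffgit's actual code):
shell out to `git log -p --all` and check only the added lines of each patch.
The keyword list and the function names here are made up for illustration.

```python
import re
import subprocess

# Hypothetical keyword list, for illustration only.
KEYWORD_RE = re.compile(r"password|api_key|secret", re.IGNORECASE)


def scan_patch_text(patch_text):
    """Return (commit, line) pairs for added lines that match a keyword."""
    hits = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            # Header line of a log entry: "commit <sha>"
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            # An added line in a diff hunk ("+++" is the file header).
            added = line[1:]
            if KEYWORD_RE.search(added):
                hits.append((commit, added.strip()))
    return hits


def scan_history(repo_path="."):
    """Scan the patches of every commit reachable from any ref."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return scan_patch_text(out)
```

Since a secret that was later deleted still shows up as an added line in the
commit that introduced it, looking at added lines only should be enough.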

For a more complex implementation of a solution to this problem, check out
truffleHog [0], which "searches through git repositories for high entropy
strings and secrets, digging deep into commit history."

[0] https://github.com/dxa4481/truffleHog
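
To give an idea of what "high entropy strings" means, here is a minimal sketch
of a Shannon entropy check. This is not truffleHog's code; the minimum length
and threshold are arbitrary values picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Bits of entropy per character of s."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_tokens(line, min_len=20, threshold=4.0):
    """Whitespace-separated tokens that are long and look random.

    min_len and threshold are assumptions chosen for illustration.
    """
    return [
        token for token in line.split()
        if len(token) >= min_len and shannon_entropy(token) > threshold
    ]
```

Random-looking keys tend to use many different characters, so they score
higher than ordinary words or repeated characters of the same length.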

~~~
LHardi
Hi there, a feature to scan previous commits sounds awesome and I'll start
working on it soon!

truffleHog also takes a sophisticated approach to detecting potential secret
strings.

Thank you for the feedback! :)

------
LHardi
Hi there, I built this library after reading some InfoSec SE posts about
which sensitive files (and information) should be gitignored or not included
in a git repo at all.

The following article was also a motivation for me to start the project, “Dev
put AWS keys on Github. Then BAD THINGS happened”:
https://www.theregister.co.uk/2015/01/06/dev_blunder_shows_github_crawling_with_keyslurping_bots/

How this library works: sniffgit starts from the root of your git working
directory and checks whether any sensitive files (id_rsa, *.cert, etc.) are
exposed, i.e. files that haven't been gitignored or files that shouldn't be
in a repo at all.
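
A simplified sketch of the file name check, not the exact code in the repo.
The pattern list is shortened (`*.pem` is just an extra example), the function
names are illustrative, and the gitignore check is left out; `git check-ignore`
could be used for that part.

```python
import fnmatch
import os

# Illustrative pattern list; the real list may differ.
SENSITIVE_PATTERNS = ["id_rsa", "*.cert", "*.pem"]
# Directories that are never walked.
SKIP_DIRS = {".git"}


def is_sensitive(filename):
    """True if the bare file name matches any sensitive pattern."""
    return any(fnmatch.fnmatch(filename, p) for p in SENSITIVE_PATTERNS)


def find_sensitive_files(root="."):
    """Walk root and return sorted relative paths of sensitive files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if is_sensitive(name):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```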

This library also checks text files for sensitive information, such as
AWS_SECRET_ACCESS_KEY, email, password, etc. Some files and directories are
not read at all, though (e.g. binary files, .git, yarn.lock).

Currently, the “sensitive info / line analysis” produces a lot of false
positives for larger projects. The reason is that it only checks each line of
a text file for keywords such as “password, API_KEY, email, etc”.
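
To show why this is noisy, here is a stripped-down sketch of a per-line
keyword check (an illustration of the approach, not the library's exact code;
the keyword list and function name are simplified).

```python
import re

# Keywords taken from the description above; the real list is longer.
KEYWORDS = ["password", "api_key", "email"]
KEYWORD_RE = re.compile("|".join(map(re.escape, KEYWORDS)), re.IGNORECASE)


def flag_lines(text):
    """Return (line_number, line) for every line containing a keyword."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if KEYWORD_RE.search(line)
    ]


sample = """\
API_KEY = "abc123"
<input type="password" name="pw">
send_email(user)
x = 1
"""

# Lines 1, 2 and 3 are all flagged, although only line 1 holds a secret.
flagged = flag_lines(sample)
```

A form field and a function name trip the check just like a real key does,
which is where most of the false positives come from.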

This is my first ever open-source project. Feedback is truly appreciated,
particularly about OSS best practices :).

~~~
ameesdotme
Interesting project! Perhaps you could return a non-zero exit code when
results are found (using sys.exit or something like that) so it can be
integrated into CI pipelines.
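
Something along these lines, as a sketch only; `findings` stands in for
whatever list of results the scanner produces, and the function names are made
up.

```python
import sys


def exit_code(findings):
    """0 when nothing was found, 1 otherwise, so a CI step fails on findings."""
    return 1 if findings else 0


def report_and_exit(findings):
    """Print each finding to stderr, then exit with the matching status."""
    for finding in findings:
        print(finding, file=sys.stderr)
    sys.exit(exit_code(findings))
```

Most CI systems treat any non-zero exit status as a failed step, so no extra
integration work should be needed beyond running the command.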

~~~
LHardi
Thank you for the suggestion! I will add that feature today. I believe that
the project will be more useful if it can be easily integrated into CI
pipelines!

