
Finding secrets by decompiling Python bytecode in public repositories - gilad
https://blog.jse.li/posts/pyc/
======
scarface74
Public Service Announcement.

While I’ve got nothing for the other kinds of secrets, there is never a reason
to have AWS secret keys in your code or in application-specific configuration
files.

Every AWS SDK will automatically read your keys from your config file in your
home directory locally. Just run

    aws configure

When you run your code on EC2, Lambda or ECS, the same SDKs will
automatically get the keys associated with the attached role.
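
A minimal sketch of that behaviour with boto3, the Python SDK (the function and the bucket listing are just illustrative):

```python
def list_buckets():
    """Create an S3 client with no hardcoded keys.

    boto3 resolves credentials automatically: environment variables,
    then ~/.aws/credentials (written by `aws configure`), then the IAM
    role attached to the EC2/ECS/Lambda runtime.
    """
    import boto3  # imported here so the sketch loads without the SDK installed

    s3 = boto3.client("s3")  # note: no aws_access_key_id / aws_secret_access_key
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]
```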

~~~
robbyt
For other, non-AWS credentials, just use environment variables.
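
A sketch of the pattern (the variable name `DB_PASSWORD` is illustrative):

```python
import os


def require_env(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is unset."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; refusing to start")
    return value


# Normally the value is exported outside the program (shell, CI, direnv);
# it is set inline here only so the sketch runs standalone.
os.environ["DB_PASSWORD"] = "example-only"
assert require_env("DB_PASSWORD") == "example-only"
```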

~~~
lysium
They have to be set up somewhere, though.

~~~
thanksforfish
It should be done out of band.

Repo should have a `README` that says the secrets are in the team 1password
account, talk to <team member> to get a 1password account, and get added to
the group that has access to the vault with the creds.

The repo should have a source script that will pull the credentials[1] and
`export` them to your ENV. `direnv` can make that happen automatically[2], or
you can run that script from your `.bashrc` or similar.

You can do something similar with your favorite secrets manager. I've used a
similar approach before, with good results.

[1] [https://support.1password.com/command-line-getting-started/](https://support.1password.com/command-line-getting-started/)

[2] [https://direnv.net/](https://direnv.net/)

~~~
Someone
If you use AWS, _“your favorite secrets manager”_ should be AWS Secrets
Manager, and you should use an AWS Role to limit access to the secrets.

Have your code fetch any secrets it needs from AWS Secrets Manager, using the
name of the secret (which does not have to be kept secret, so it can be in
your source repo).

That way, you don’t have to put secrets in your environment, with the risk of
leaking them.
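
A sketch of that pattern with boto3 (the function name is mine; `get_secret_value` is the Secrets Manager read call):

```python
def get_secret(name: str) -> str:
    """Fetch a secret from AWS Secrets Manager by name.

    Only the name lives in source control; the value is fetched at
    runtime, and IAM (via the attached role) controls who may read it.
    """
    import boto3  # imported here so the sketch loads without the SDK installed

    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=name)["SecretString"]
```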

~~~
mdaniel
For comparison, open source password managers are zero cost, 1Password is
fixed cost (even more fixed if one buys the software, instead of the
subscription), and in contrast
[https://aws.amazon.com/secrets-manager/pricing/](https://aws.amazon.com/secrets-manager/pricing/)
is $0.40 per secret per month, plus a tiny but not zero cost per API access.

I'm just pointing out that AWS Secrets Manager is not an automatic, no-brainer
win.

~~~
scarface74
SSM Parameter Store (not Secrets Manager) is also “zero cost” and you can have
a parameter of type “secret string”.

The other solutions don’t integrate with AWS IAM. _Something_ has to grant
access to the password vault. In the case of Secrets Manager/Parameter Store
you just grant access to the role attached to your EC2 instance/ECS
cluster/Lambda.
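
The Parameter Store equivalent, sketched with boto3 (function name illustrative; a SecureString parameter is decrypted on read):

```python
def get_parameter(name: str) -> str:
    """Fetch a SecureString parameter from SSM Parameter Store.

    Standard parameters carry no per-secret monthly fee, unlike
    Secrets Manager.
    """
    import boto3  # imported here so the sketch loads without the SDK installed

    ssm = boto3.client("ssm")
    resp = ssm.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]
```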

------
maxeonyx
This would be solved if python used an (OS-specific) cache directory for its
.pyc files. I have always disliked .pyc files... here's a concrete reason!

Question: what does python do if it doesn't have write permission in the
current working directory? Not write the cache?

~~~
eesmith
It already (since 3.2, as the article points out) uses a cache directory. Why
would an OS-specific one help?

You can also set PYTHONPYCACHEPREFIX if you want to use 'a mirror directory
tree at this path, instead of in __pycache__ directories within the source
tree' -
[https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPYCACHEPREFIX](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPYCACHEPREFIX)
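
A runnable illustration of the redirect, via `sys.pycache_prefix` (the 3.8+ runtime mirror of that environment variable; paths are made up):

```python
import importlib.util
import sys

# With a prefix set, the bytecode cache mirrors the source tree under
# that path instead of living in __pycache__ next to the sources.
sys.pycache_prefix = "/tmp/pycache"  # illustrative path

cached = importlib.util.cache_from_source("/project/app/secrets.py")
assert cached.startswith("/tmp/pycache")
assert cached.endswith(".pyc")
```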

The PEP mentions your case at
[https://www.python.org/dev/peps/pep-3147/#case-5-read-only-file-systems](https://www.python.org/dev/peps/pep-3147/#case-5-read-only-file-systems).
But honestly, I don't really follow the answer. I think it means
"ignore creation and write failures."

~~~
deathanatos
> _Why would an OS-specific one help?_

E.g., if you want to back up your home dir, and omit caches, since they can be
regenerated. It's a lot easier if programs write their cache data to ~/.cache
/ $XDG_CACHE_HOME than if they intermix it / scatter it about.

~~~
eesmith
But how does it help to have one directory for all Linux-based OSes (or one
directory for RHEL, one for Ubuntu, one for Debian, etc.), and one for FreeBSD
and one for OpenIndiana?

At least, that's what I interpret "OS-specific" to mean.

~~~
deathanatos
More along the line of the original comment you're replying to, if the cache
was in, say, ~/.cache, then it won't get swept up in the repository's commits,
since the cache data is no longer inside the repository's working directory.
Then, it never gets uploaded to GitHub, and this security issue never happens.

I have seen a surprising number of people — some who are engineers by
profession too, and ought to know better — just git add everything, and then
commit it all without looking. One _should_ review the diff one has staged to
see if it is correct, but alas…

~~~
eesmith
That's possible with 3.8's PYTHONPYCACHEPREFIX, yes?

Perhaps it's worthwhile for someone to blog about this more/promote this as a
best practice? Though what's missing is the hook to connect it up as
appropriate for the given platform.

I see now that "OS-specific" was meant to be interpreted as "the OS-defined
mechanism to find a cache directory", not "a cache directory which differs for
each operating system".

I would not have been confused by the term "platform dependent", which is what
Python's tmpdir documentation uses, as in: "The default directory is chosen
from a platform-dependent list" at
[https://docs.python.org/3/library/tempfile.html?highlight=tmpdir#tempfile.mkstemp](https://docs.python.org/3/library/tempfile.html?highlight=tmpdir#tempfile.mkstemp).

------
shockinglytrue
Highly recommend "export PYTHONDONTWRITEBYTECODE=1" in your bashrc and just
forget about it. Pyc files are still an important optimization on modern
machines in some circumstances (especially with huge oft-restarted apps), but
the autogeneration behaviour has always been a pain in the ass.

The bulk of your pycs are generated during package install. What tends to
remain in the usual case is a handful of files representing app code or
similar.
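
For reference, the environment variable has a runtime mirror in `sys` (a sketch; the flag only suppresses cache writes for modules compiled after it is set):

```python
import sys

# PYTHONDONTWRITEBYTECODE=1 in the environment sets this flag at startup;
# it can also be flipped per-process, before any application imports run.
sys.dont_write_bytecode = True

# Modules compiled from source after this point get no __pycache__/*.pyc.
assert sys.dont_write_bytecode is True
```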

~~~
the_jeremy
Why do you highly recommend this? You can set git (or other VCS) settings to
avoid committing .pyc files. They're not large files; I don't see the
downside.

~~~
shockinglytrue
Difficulty with git basics is the least of your worries. For example, pycs are
fundamentally racy: it is quite possible to have a .py newer than a .pyc,
depending on how unlucky you were with an in-progress deployment, or some tool
that never updated the .py timestamp. Python continues to execute the pyc even
though the code changed, since the minimal benefit of the pyc would be
rendered largely moot if Python used any kind of strong check (instead of
just comparing second-granularity stat() output) to ensure the cached
bytecode matches the source. In this way, without your permission, Python
silently plays code execution roulette with your computer every time it
starts, for as long as you have the feature enabled.

I have lost count of the number of times I've seen someone lose an hour due to
it. I can also count many instances of QA environments becoming inexplicably
bricked by it. The correct fix for this requires opening the .py and hashing
its content, at least doubling the amount of IO required to start a program.
They were a great feature when parsing small files was noticeably slow, but
this hasn't been true for almost 20 years.

It's therefore worth turning the question around: why do you think pyc files
are useful?
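
The second-granularity check being described is visible in the .pyc header itself; a small demonstration (assuming the PEP 552, Python 3.7+ layout):

```python
import os
import py_compile
import struct
import tempfile

# Write a trivial module and compile it to a .pyc.
src = os.path.join(tempfile.mkdtemp(), "mod.py")
with open(src, "w") as f:
    f.write("x = 1\n")
pyc = src + "c"
py_compile.compile(src, cfile=pyc)

# A timestamp-based .pyc header (PEP 552, Python 3.7+) is four 32-bit
# words: magic, flags, source mtime, source size.
with open(pyc, "rb") as f:
    magic, flags, mtime, size = struct.unpack("<IIII", f.read(16))

# The staleness check is just this integer-seconds mtime comparison.
assert mtime == int(os.stat(src).st_mtime) & 0xFFFFFFFF
assert size == os.path.getsize(src)
```

For what it's worth, PEP 552 also added optional hash-based pycs (`py_compile.PycInvalidationMode.CHECKED_HASH`), which do roughly the strong check described above, at the cost of re-reading and hashing the source.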

~~~
stestagg
In the last 12 years of writing Python, I have only had issues with .pyc
files a handful of times, and always with Python < 2.7. Anecdotally, this
experience is shared with everyone I have worked with.

If you’re seeing this regularly, it suggests there may be something unique or
uncommon in your set-up. You may wish to isolate and change whatever that is.

~~~
neurostimulant
Now that you mention it, I just realized I never have problems related to
.pyc files anymore, ever since I switched to Python 3 a few years ago. I
remember I used to have problems when deleting database migration files,
because Python would load the .pyc files of deleted migration scripts unless
I also deleted the .pyc files (which I often forgot).

------
oefrha
Just reading the script from TFA: it attempts to find secrets.pyc and
decompile it, but doesn't even check whether secrets.py is also in the repo. A
glance at search results (I just used GitHub's web interface, didn't bother to
run the code) tells me that when secrets.pyc is committed, secrets.py comes
with it the vast majority of the time.

I guess the author did find cases where secrets.pyc is committed but
secrets.py is not? It's hard to fathom how that could have happened
(especially inside "organization" settings). Sounds like the result of
absolute rookies in both Python and git following a tutorial with a step "add
secrets.py to .gitignore" that unfortunately takes ignoring __pycache__ and
*.pyc for granted, which is too much to ask of some people.

> it is very easy for an experienced programmer to accidentally commit their
> secrets

No, it doesn't take an experienced programmer to put __pycache__ and *.pyc in
a global ignore, or use a gitignore boilerplate at project creation, or notice
random unwanted files during code review.

~~~
TomMarius
I think I am an experienced developer (not Python) and this would never cross
my mind.

~~~
thaumasiotes
It would never cross your mind not to commit .pyc files to source control?
They're not even source. Committing .pyc files is to Python what committing .o
files is to C.

~~~
oefrha
> They're not even source.

To be clear, they’re not even text. You don’t need to know Python at all to
realize something’s not right when you’re committing unknown binaries to
source control.
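
To make that concrete: a secret in a committed .pyc is recoverable without even running a decompiler, since string literals sit in the code object's constants (the key below is obviously made up):

```python
import marshal

# Compile a module that embeds a credential, the way a secrets.py would.
code = compile('AWS_SECRET_KEY = "hypothetical-example-key"', "secrets.py", "exec")

# A .pyc is just a 16-byte header followed by this marshalled code object;
# round-tripping through marshal mimics reading one from disk.
blob = marshal.dumps(code)
recovered = marshal.loads(blob)

# Every literal in the file sits in plain sight in co_consts --
# no decompiler needed to pull the string back out.
assert "hypothetical-example-key" in recovered.co_consts
```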

------
xvilka
Radare2 also supports Python bytecode of different versions [1].

[1]
[https://github.com/radareorg/radare2/tree/master/libr/asm/arch/pyc](https://github.com/radareorg/radare2/tree/master/libr/asm/arch/pyc)

------
neurostimulant
Another possible place to look for secrets is in public Docker images. Bots
are scanning GitHub repos for secrets all the time, but what about Docker Hub
(and other Docker image registries)? I accidentally left a secret in my
public Docker image once, and that's made me quite paranoid about it now.

~~~
syntheticcorp
I scanned everything pushed to Docker Hub for a few weeks but didn't find too
much interesting stuff showing up.

------
tucnak
I read "Finding secrets by decompiling Python bytecode in public restrooms" by
accident. It never occurred to me that anyone would do THAT in there.

~~~
eat_veggies
there's bytecode sharpied on the walls

