Python malware starting to employ anti-debug techniques (jfrog.com)
143 points by lukastyrychtr on Dec 23, 2022 | 104 comments



Simple but probably wrong solution: why not ban obfuscation libraries, compressed code, and self-loading code within the PyPI ecosystem? Any package that even refers to illegible non-source techniques gets flagged and blocked. It seems the whole PyPI ecosystem is undisciplined and could be tightened up. Why can't we make progress here?


You can pip install complex standalone executables, such as nodejs, and this is used throughout the ecosystem.

In fact, most packages are now wheels, which are not source: they are compressed, and may contain binaries for compiled extensions, something extremely popular (the scientific and AI stacks exist only because of this).

Some packages need to be compiled after the fact, something that setup.py will trigger, and some even embed a fallback compiler, like some Cython-based packages.

Also, remember there are very few people working on PyPI and there is no moderation; anybody can publish anything, so you would need a bulletproof automated heuristic. That's either impractical or too expensive.

If you want a secure package distribution platform, there are commercial ones, such as anaconda. You get what you pay for.


I guess we're finding out the other side of that blade, huh?


Self-loading code is a huge part of the value-add of python libraries. Many of the popular libraries (e.g. Numpy and friends) trigger a bewildering chain of events to compile from source if not installing from pre-built wheels. And if you do have wheels, you have opaque binary blobs. So pick your poison: compile-on-install with possible backdoor or prebuilt .so/.dylib/.pyc with possible backdoor.

The most obvious (but not necessarily easiest) approach is to phase out setup.py and move everything to the declarative pyproject.toml approach. This is not just better for metadata (setup scripts make it really hard to statically infer what deps a lib has), it also allows for better control over what installers/toolchains run on install.
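
For illustration, a minimal declarative pyproject.toml of the kind meant here might look like the following (the package name and dependency are made up); there is no setup script for the installer to execute:

    [build-system]
    requires = ["setuptools>=61"]
    build-backend = "setuptools.build_meta"

    [project]
    name = "example-lib"                 # hypothetical package
    version = "0.1.0"
    dependencies = ["requests>=2.28"]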

Attackers still have quite a lot of latitude during the build phase, but at least libraries have the option to specify declaratively what permissions they need (and presumably the user has the option to forbid them).

Also eval/exec are terrible and I wish there were a mode to disable their usage, but I don't know if the python runtime has some deep dependency on it. Maybe there's a way to restrict it so that only low level frames can call the eval opcode.


Would it be possible that the wheels could be built in a more-trusted / hardened environment? Having a binary blob isn't as serious when it comes from a trusted source. Almost all Debian/etc linux distributions have this feature (binary-downloading package manager).

The hardening could mitigate on-compilation hacking.

Obviously, this leaves "compile in the backdoor and wait for the user to fall into it" but at least this isn't an issue of compiling on the user's computer and it isn't an issue of binary blobs. And possibly there's a greater chance of detection if actual source code has to be available to compile.


>Also eval/exec are terrible and I wish there were a mode to disable their usage,

You can use audit hooks in the sys module (as long as you load it first) to disable eval/exec/process spawning or even arbitrary imports or network requests.
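
A minimal sketch of that idea (the event names are the ones CPython documents for PEP 578 audit hooks; this particular blocklist is only an example, and denying "compile"/"exec" also breaks legitimate users such as dataclasses):

    import sys

    # Example set of documented audit events to deny; tune to taste.
    BLOCKED_EVENTS = {"exec", "compile", "os.system",
                      "subprocess.Popen", "socket.connect"}

    def deny(event, args):
        if event in BLOCKED_EVENTS:
            raise RuntimeError("audit hook blocked event: " + event)

    # Hooks cannot be removed once added, so install this before importing
    # any untrusted code.
    sys.addaudithook(deny)

    eval("1 + 1")  # now raises RuntimeError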


I’ve been building Packj [1] to flag PyPI/NPM/Ruby packages that contain suspicious decode+exec and other “risky” APIs using static analysis. It also uses strace-based dynamic analysis to monitor install-time filesystem/network activities. We have detected a bunch of malware with the tool.

1. https://github.com/ossillate-inc/packj flags malicious/risky packages.


I don’t think this would work well for shipping packages that use proprietary libraries. But at the very least they could be flagged, yes.


The short answer is that this can’t be easily mitigated at the package index level, at least not without massive breaking changes to the Python packaging ecosystem: PyPI would have to ban all setup.py based source distributions.

Even then, that only pushes the problem down a layer: you’re still fundamentally installing third party code, which can do whatever it pleases. The problem then becomes one of static analysis, for which precision is the major limitation (in effect, just continuing the cat-and-mouse game.)


Why would you think that would change a thing? Also, obfuscation has legitimate uses by people making stuff they don't want easily reversed. This isn't a python specific problem.


Yeah, just get rid of anything that has a binary blob. Cool. And then when PyPI gets swapped out for whatever immediately replaces it because PyPI is useless, then at least PyPI will be secure.


For instance, F-Droid only permits software that can be verifiably compiled by them.

It essentially bans binary blobs yet it is very useful.


F-droid also has very different goals and lives in a much smaller and in some ways much saner ecosystem.


Yes, but PyPI has 4 million releases to check, and the scientific and machine learning wheels are very hard to compile (scipy contains C, Fortran and assembly code, and must be compiled for macOS, Linux and Windows).

Providing a build env for that would make it prohibitively complicated and expensive, and would basically mirror GitHub CI.

That's the reason Continuum is making money: they sell a Python package distribution channel that is checked and locked.


F-Droid is infamously known for taking weeks to build a new version of any app.


I'm given to understand that's more about having an offline signing process than the actual builds.


> PyPI is useless

Why would it be "useless"? Explain your reasoning, please.


Most of the libraries I use include compiled C/C++/Fortran/Rust code. Pandas, scipy, scikit-learn, … if I were limited to pure-python libraries, I would probably rather swap languages, or at least package manager, at great inconvenience.

That being said, I don’t think PyPI would be «useless» - this was the state a few years ago, and we had to compile all the libraries ourselves. I don’t want to go back.


None of those packages are downloading and running CRAP.EXE within the setup.py process; that's not how native extensions work. It should be possible to flag packages that are downloading things when setup.py runs, much less running exec within setup.py. A Python package that really needs you to run a Windows installer for its dependencies should have you do that separately.


Yes, but the problem here is the obfuscation of the malware's code loading. There is no need to trigger it in the setup.py process: as long as you have it in the lib, you can always put a call in a .pth somewhere and run your malware as soon as any Python is executed.
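
For illustration (file name and payload are hypothetical), the trick relies on site.py executing any line of a .pth file that begins with "import"; a file dropped into site-packages like the one below runs at every interpreter start, whether or not the package is ever imported:

    # contents of a hypothetical site-packages/innocuous-looking.pth
    import os; exec("print('runs at every Python startup')")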


it should be possible to test packages for that also. if you are testing setup.py to see that no network access or exec occurs, you could similarly run the python interpreter after install and ensure no network / exec() happens at that point either, assuming one has not imported the package. or just disallow unfamiliar .pth files from being installed altogether (outside of those generated by setuptools / etc. for normal execution).


Given that every attempt to sandbox Python has failed, and that every system exposed to the public has been pwned, I assume this is a cat-and-mouse game we can't win.

At best I suppose we could put in place checks to get the low-hanging fruit. But we are, after all, allowing a Turing-complete and highly dynamic language to execute.

> or just disallow unfamiliar .pth files from being installed altogether

That would kill the entire plugin ecosystem.

Now, the next thing could be to have a permission system, requesting access to the network, fs, .pth, etc. It would not be a bad idea, given that we are, after all, installing things that are as powerful as apps.

But it would be a gigantic effort, and users still would just accept without reading, like they do with apps.


Sure, I didn’t intend to claim that. It’s just a hassle for me to compile my own C code, which I’d have to do if binaries weren’t bundled. That’s why Anaconda Python took off on Windows - it’s hard work to compile scipy on Windows!


pypi delivers wheel files for pre-built binaries, and that's the only way one is supposed to distribute pre-built binary executables or shared libraries. the issue of "runs malicious code in setup.py" does not apply in that case because setup.py isn't invoked.


> great inconvenience

It's a convenience/security trade-off, I see.

The only solution I've ever seen to that requires investing trust in an "authority" which then becomes corrupt and censorial. One simply expands the dilemma to a triad: security/freedom/convenience.

If I am not mistaken, the PyPI "Cheese Shop" is owned by the Python Software Foundation, a 501(c)(3) nonprofit organisation which constitutionally values Software Freedom highly. It seems natural that convenience would be sacrificed if security is of concern.


Such an authority in the Linux world used to be a distribution. Installing a binary blob provided by Debian build servers is based on decades of trust.

But there is a tradeoff between having things thoroughly vetted and tested, and moving fast.


Interesting point. So as dimensions we now have

  - security
  - freedom
  - convenience
  - speed/newness

Who can build me a UI with four sliders that selects the packages I can install? Bonus: when I move a slider it highlights all the potential packages that changed status with reasons why they are now included/excluded.


You're an HN reader, so you should be able to knock this out over a weekend /s


You're right, the prototype GUI is a weekend of work. But you also know that's not where the work is :) Now that some more intelligent comments are coming in, we can talk about the analysis and tagging of thousands of packages, dealing with backward compatibility, and what happens when naughty malware just hops to another level of trust.

But none of that is a call to give up. We just need to think seriously about the problem we face.


Windows S Mode restricts PyPI to pure Python due to Device Guard. I'm happy to leave it on ($250 laptop). Indeed, Numpy has been a recurring blocker, maybe 3 times now. But general peace of mind is the only way I've known Python/PyPI, so I'm pretty happy with it. I have a few RasPis that I can use as auxiliary devices as well, which I think is a pretty cool tradeoff, a hardware sandbox--not gone there yet, beyond just configuring SSH/xRDP so I'm ready if the day comes.

But I've made a ton of web apps and tools anyway, including a little process launcher that plays the role of poor man's Docker.

It'd be nice if those popular systems had a pure Python capability anyway, a similar analogy being software-rendered 3D back in the day.


A simple warning when a library adds a binary blob should be enough. You don't need to ban them entirely.


They can just load the real payload as a second stage.


Are they to test this condition for each input to each program?


Even simpler solution: Require cryptographic signatures of the developers of projects along with hash-verified downloads via pip.

The problem is a failure to understand security.
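
For what it's worth, pip already has the hash-verification half of this: with a requirements file that pins digests (the hash below is a placeholder, shown only for the syntax), `pip install --require-hashes -r requirements.txt` refuses anything that doesn't match. As the reply below points out, though, that does nothing about a malicious author publishing the hashed artifact in the first place.

    # requirements.txt -- placeholder digest, not a real one
    flask==2.2.2 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000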


A malicious author could embed malicious code in the package and still get the package signed. Hashing won't prevent this sort of thing on PyPI; it just addresses in-transit and alternate-supplier attacks.


Requiring anything from open source authors is a losing proposition. Items of interest just won't end up on PyPI. IIRC this chain of events already happened on another distribution platform.


One of the underappreciated benefits of Richard Stallman getting what he wants would be that antivirus programs could then be updated to flag on all obfuscated code or anti-debugging actions.


To what are you referring with "Stallman getting what he wants"? Code being open source?


If all software were free as in speech.


It was in fact not anti-Python-debugging, just run-of-the-mill checks for IDA Pro and the like.


Those things you named are just one of the checks it made. The Python part of it was also an encoded bzip file, which offers a bit of a debugging headache; then it downloads a pyc file which was run through an obfuscator, which is more of a Python headache. Your "in fact" is not a fact.


The methods this malware uses for anti-debugging wouldn't cause a headache for anyone who isn't completely new to the subject. Download 10 random Python malware samples and you'll notice that probably at least 8 of them follow this exact same packing and execution pattern. The Discord hook and laughable end payload are a good indication that whoever wrote this is probably some high school kid.

The only surprising thing about this article is the claim that this type of malware hasn't been spotted on PyPI before. That would suggest that there aren't many credible actors trying to spread through PyPI at all.


Huh. It never ceases to amaze me when another demonstration is presented to me that “plus ça change, plus c'est la même chose” in this industry. I suppose it is only to be expected that some of the old anti-piracy techniques found in 8-bit floppy- and cassette-distributed software might eventually find new, philosophically similar implementations in malware.

Some of that self-modifying and anti-defeat code back then was truly a work of art, squeezed into mind-bogglingly small memory and CPU footprints. The malware authors will have a field day re-implementing its future cousins in spirit, and some of the greybeards amongst the white hats will get to relive their 8-bit glory days hunting and defeating them.

The article gave a description of a really super primitive technique compared to the last generation of those anti-piracy techniques, but I still see a family resemblance.


The more I hear this stuff the more I write things in Go with no external dependencies pulled in. I can do 95% of what I need to do without involving a supply chain or downloading anything random off the internet other than the go distribution itself.


Anti-dependency mafia, rise up. I feel the same way. I code almost everything from scratch too.


I like the sentiment and I'm usually first in line to ridicule the 'npm install left-pad' crowd, but this doesn't always fly. Python is a great glue language to mash high performance C/fortran components together. One does not simply write sklearn or pytorch from scratch.


"Python is a great glue language to mash high performance C"

This is exactly what I'm starting to work through. After 6 years of Python, I've finally hit the limit of what I can do with it. Now I'm working to rebuild an algorithm in C to reconnect to the Python application.

"One does not simply write sklearn or pytorch from scratch."

I also agree with this. Would either be in a product though? Personally, if it's not a product, I wouldn't mind dependencies.


Yes, they are in at least one product I can think of, and likely more. That product deploys its own conda environment and includes a huge amount of spatial analytical tools. Governments and large private enterprise the world over use ArcGIS Pro, as do many NGOs and education institutions, which is a massive leap forward for both desktop and highly integrated Web GIS work.

I'd be prepared to bet a bit of blind money that other industry tools use a similar setup, where the Python libraries permit an exceptional cadence of development and help place those vendors' products at the pointy end of the market.

How they manage dependency security isn't super clear. They're always a couple of versions behind, so perhaps it's a CI/CD QA/QC thing which also includes security.


I get the general idea, but at the same time, I don't have the time to write my own libraries from scratch - all modern web standards are complex, and most libraries are filled with years' to decades' worth of experience with all the edge cases that crop up, particularly as most standards don't come with a "compliance test suite".

It's one thing if I were paid by my employer to re-invent the wheel, but for personal projects... I don't have that much free time for them in the first place any more; I want to get shit done and not shave yaks all day. When I want a good grind, I'll break out Factorio or one of the LEGO Switch games...


There's a difference in values between those who reinvent the wheel and those who leverage opensource. It sounds like you value time-to-product whereas I value ownership of said product.

There are always risks associated with building on other people's land, platforms, and codebases. However, there are also risks when reinventing the wheel. Both perspectives have advantages, disadvantages, and use cases.


A compromise is to audit and then pin exact versions, or even copy and paste the code into your project. Yes, this is a clear tradeoff in that you'll lose access to newer updates, but it's certainly worth thinking about. I do it with relatively trivial libraries for things that I know the package has solved various edge cases, is small in scope, and probably won't be updated again, for example.


I agree with you, but I'd prefer to reinvent the wheel rather than audit an existing code base.


It's reassuring isn't it? Every time something breaks you have easy access to the mfer who wrote it.


Exactly!

We're also talking about layers of dependencies. It's a ridiculous approach.


I always build my whole computer from scratch, from NAND gates all the way up to the full OS, build my own switches, cut the network cables myself, dependencies be damned. /s


For Python at least, most of the dependencies are very justifiable. The Python stdlib is huge and satisfies most regular programs, such as glue code. But for web and ML it is not possible to include those libraries in the stdlib, nor is it feasible to write them from scratch.


It's not difficult to write most of it from scratch. It just takes some time and attention.


It's not possible. Even very basic numpy would be too slow to use, if you end up writing pure python equivalents.

If you import numpy, you might as well import the entire scipy ecosystem.


It's absolutely possible. My only dependency is Flask and I'll be eliminating that in time too.

Why do you need numpy for web?

Edit: I will concede that there is no point in retooling ML. Web is an entirely different circumstance though.


Let's say you are writing an API that works with some particular scientific file types on the back end, and you want to load that data into memory for fast querying and returns. Now, that data is a multidimensional time series for each file. You could spend the next months writing libraries and bashing your head against the wall, or you could leverage the 30+ years of development in that stack that enables you to read these.

Xarray to read, numba for calcs in xarray, pandas to leave it sitting in a dataframe, numpy as pandas' preferred math provider. You could write the API componentry from there, sure. Or you could use a library that has had the pants tested off it and covered most of the bugs you are likely to accidentally create along the way.

There's no compelling reason to write everything from scratch. If everyone was taking that approach then there would be no reason to have an ecosystem of libraries, and development would grind to a halt because we, as a collective of people programming, are not being efficient.


I see no compelling reason to implement a multidimensional time series for multiple files as a component of any backend API that consumes user (defined) data.

In what circumstance could that be profitable? Even if you batched data, any number of concurrent users would gobble resources at an incredible rate.


Who said anything about profit? Not everything that exists to be solved, and for which there is a demand, is driven by profit. Think: regulation, environmental, NGO, citizen science, academia, government agency, public service. All places where systems can exist that are not for profit, but do grant significant capabilities to their user base.

Also, it's a particularly arrogant point of view to assume that because you cannot see a reason for something to exist that its development is invalid both now and into the future. You've also assumed the data is user defined.

I can also guarantee you that user concurrency is not an issue after some recent load testing, with load capabilities surpassing expected user requests by several orders of magnitude whilst on minimum hardware.


I probably should have said economically viable. Handling and manipulating data like that is intensive and thus expensive. If it's not user provided data, why manipulate data with that approach?

Maybe it is arrogant. That entirely depends on whether or not a product or service uses this specific approach -- successfully. Do you have an example?

Edit: I also want to clarify that my comment doesn't suggest that the underlying technology is bad or without use cases; only that it isn't suited for remote (online) processing. It would be way cheaper to manipulate data like that locally.


That's the point. It's not user data, and the data cannot be manipulated on the user side without excessive hardware, software, and troubleshooting skills.

Taking that scientific data and making it available in report format for those who need it that way, when the underlying data changes at minimum once per day, is the more important aspect.

The API is currently returning queries in about 0.1 to 0.2s. They are handled async right the way through. It's fast, efficient, and the end result whilst very early in the piece is looking nice. Early user engagement has been overwhelmingly positive.


Ok, great. What's the name of this example web application?


It's not a public endpoint, and the api is still under dev with interface largely yet to start. So, can't share / won't share. Sorry.

Where it will be shared is among those with an interest in the specific space. That includes government agencies, land managers, consultancies, etc. At no cost to them, because what the outputs can help offset in terms of environmental cost dwarfs dev cost.


This doesn't represent proof of anything.

With that being said, good luck. I hope you succeed. I'd be very interested to read about it. Do share when it's public.


Ceres Imaging (Aerial and satellite imagery analytics for farming), Convoy (Trucking load auctions), etc. There are plenty of companies doing very real work that need this kind of heavy numeric lifting.


Very cool examples. Thank you for sharing. I'm going to read into them. I'm not familiar with any web companies using this technology so it'll be interesting to dig in.


Flask seems to be a very stable and feature-complete framework (I see about 3 commits per year for the last few years).

At this point isn't it easier and just as safe to manually review the code, pin the hash in a lockfile, and manually review the rare changes than it is to rewrite everything?


Definitely. There's nothing wrong with using Flask. It's actually quite pragmatic.

In my case, replacing Flask is purely preference.


Can someone explain why this comment is getting downvoted? I believe the statement is accurate. I'm not looking to justify or debate my position, but a clear answer might help me better approach this topic in the future.


Your viewpoint, to be frank, is extremely naive and plain wrong.

Link me your reimplementations of tensorflow, numpy, and django (with similar features and same or better performance) and we can talk.


1 comment beneath.

"I will concede that there is no point in retooling ML. Web is an entirely different circumstance though."

Edit: I just realized the way I use votes isn't necessarily the same and no one is wrong in their understanding.

Your reply connected the dots. Thank you.


Standards and requirements will change, bits will rot, and I'm not expecting any ecosystem to keep up with coming and going demands.

A better solution imho would be project level capabilities, so you can pull in a dependency but restrict its lib/syscall access, so it would not compile when it turns malicious.

Maybe it will solve at least something, maybe some day.


Agree. I'd like to see an OpenBSD pledge(2) type system for libraries. So you can mask individual library capabilities rather than just programs. I don't want a web server that can write to the file system and I don't want a CSV reader that can talk to the network.
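
A rough, purely illustrative sketch of what that could look like on today's CPython (the package name is hypothetical, and in-process checks like this are easy for determined code to bypass, as the reply below notes): an audit hook that walks the call stack and denies certain events whenever a denylisted library is on it.

    import sys

    # Hypothetical package we want to keep off the network and filesystem.
    RESTRICTED = {"my_csv_reader"}
    DENIED_EVENTS = {"open", "socket.connect"}

    def capability_hook(event, args):
        if event not in DENIED_EVENTS:
            return
        frame = sys._getframe()
        while frame is not None:
            top_level = frame.f_globals.get("__name__", "").split(".")[0]
            if top_level in RESTRICTED:
                raise PermissionError(top_level + " may not perform " + event)
            frame = frame.f_back

    sys.addaudithook(capability_hook)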


Doing this kind of thing at the library level is generally not very useful, because security protections between things running in the same process are hard to make very strong.


This is a limitation of the particular language/ecosystem though; it's feasible in a new language that has this security baked into the language primitives.


I don't think the Go stdlib is significantly better than Python's batteries. For normal stuff, you can build without dependencies in Python too. The problem starts when you use more complex stuff, or want to save time by using a lib delivering certain benefits. After all, you can't build and maintain everything by yourself.


Shipping Go is a hell of a lot easier.


I wonder if there is room for a security model based around “escrow builds”.

Imagine if PyPI could take pure source code, and run a standardized wheel build for you. That pipeline would include running security linters on the source. Then you can install the escrow version of the artifact instead of the one produced by the project maintainers.

You can even have a capability model - most installers should not need to run onbuild/oninstall hooks. So by default don’t grant that.

This sidesteps a bunch of supply-chain attacks. The cost is that there is some labor required to maintain these escrow pipelines.

With modern build tools I think this might not be unworkable, particularly given that small libraries would be incentivized to adopt standardized structures if it means they get the “green padlock” equivalent.

Libraries that genuinely have special needs like numpy could always go outside this system, and have a “be careful where you install this package from” warning. But most libraries simply have no need for the machinery being exploited here.


Signed, Reproducible builds from source off a trusted build farm are possible with conda-forge, emscripten-forge, Fedora COPR, and OpenSUSE OBS Open Build System https://github.com/pyodide/pyodide/issues/795#issuecomment-1...

What does it mean for a package to have been signed with the key granted to the CI build server?

Does a Release Manager (or primary maintainer) again sign what the build farm produced once? What sort of consensus on PR approval and build output justifies use of the build artifact signing key granted to a CI build server?

How open are the build farm and signed package repo and pubkey server configurations? https://github.com/dev-sec https://pulpproject.org/content-plugins/


The Reproducible Builds project aims to make it possible to not need to trust your build machines, perhaps PyPI could use that approach.

https://reproducible-builds.org/


"Did the tests pass" for that signed Reproducible build?

Conda > Adding packages > Running unit tests: https://conda-forge.org/docs/maintainer/adding_pkgs.html#run...

From https://github.com/thonny/thonny/issues/2181 :

> * https://conda-forge.org/docs/maintainer/updating_pkgs.html

> Pushing to regro-cf-autotick-bot branch¶ When a new version of a package is released on PyPI/CRAN/.., we have a bot that automatically creates version updates for the feedstock. In most cases you can simply merge this PR and it should include all changes. When certain things have changed upstream, e.g. the dependencies, you will still have to do changes to the created PR. As feedstock maintainer, you don’t have to create a new PR for that but can simply push to the branch the bot created. There are two alternatives […]

nektos/act is one way to run a github-actions.yml build definition locally; without CI (e.g. GitLab Runner, which requires ~--privileged access to the docker/Podman socket) to check whether you get the exact same build artifacts as the CI build farm https://github.com/nektos/act

A Multi-stage Dockerfile has multiple FROM instructions: you can build 1) a container for running the build which has build essentials like a compiler (GCC, LLVM) and packaging tools and keys; and 2) COPY the build artifact (probably one or more signed software packages) --from the build stage container to a container which appropriately lacks a compiler for production. https://www.google.com/search?q=multi+stage+Dockerfile

Are there guidelines for excluding entropy like the commit hash and build time so that the artifact hashes are exactly the same; are reproducible on my machine, too?


>Libraries that genuinely have special needs like numpy could always go outside this system, and have a “be careful where you install this package from” warning. But most libraries simply have no need for the machinery being exploited here.

My personal experience with any situation where I need to get some crusty random Python library to run has always involved a lot of "-y"ing, swearing, and sketchy conda repositories. Usually it's code that was written years ago and does some very particular algorithm that's essential, so any warnings in the pipeline basically get ignored given the sheer difficulty of the task.


Apologies for the naive or off-topic question. I'm still a relatively new hobby Pythoner, and no formal training in CS.

I clearly get the security risks associated with random libs available for Python. Is this also the case for other languages like Java? Are the dependencies available to them also a relative free-for-all, or are bugs mostly accidental?

Thanks!


I think there is always a danger, for every language, when you install a 3rd party dependency from a package repository. But usually this is restricted to the runtime of the application that uses the 3rd party library (and maybe, depending on the language, the code-paths that are executed).

That's a difficult enough problem to deal with already, but with Python, it's possible to execute code at install time of such a 3rd party library (basically, when you do a 'pip install stuff'). So, you might never have run the application you installed, but you'd still have executed whatever malware was hiding. This is not the case for a lot of other languages. Also, Python allows the execution of code when you have an `import stuff` statement, which is often not the case in other languages. But this is not directly related to this, just another 'Python-specific' attack vector.
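
To make the install-time part concrete, here is a minimal, deliberately harmless and entirely hypothetical setup.py; everything at module level runs the moment someone runs `pip install` on the source distribution, long before the application itself is ever used:

    from setuptools import setup

    # Arbitrary code here executes at install time -- this is the kind of
    # hook install-time malware abuses to stage its downloader/decoder.
    print("running during 'pip install', not at application runtime")

    setup(name="example-pkg", version="0.1")  # hypothetical package name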


python eval() is the root of all evil.

Basically if the library uses eval() it's probably a good idea to avoid it if possible.


that doesn't make much sense and there are necessary uses for eval() /exec(), mostly for dynamic creation of code:

For example here's Python dataclasses in the standard library using exec() to create the `__init__` and other methods that go on your dataclass:

https://github.com/python/cpython/blob/main/Lib/dataclasses....

Here's Pydantic using it for a jupyter notebook check:

https://github.com/pydantic/pydantic/blob/594effa279668bd955...

here's Pytest using it to rewrite modules so that functions like assert etc. are instrumented by pytest:

https://github.com/pytest-dev/pytest/blob/eca93db05b6c5ec101...

Here's the decorator module using it (as is the only way to do this in Python) to create a signature matching decorator for an arbitrary function:

https://github.com/micheles/decorator/blob/ad013a2c1ad796996...

All of these libraries are completely secure as eval/exec are used with code fragments that are generated by the libraries, not based on untrusted input.

eval()/exec() are not running executable files, just Python code, the same way the rest of the package already is.
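
A toy sketch of that code-generation pattern (loosely modeled on what dataclasses does, not the stdlib's actual code): the source passed to exec() is assembled entirely by the library from field names it controls, never from untrusted input.

    # Build an __init__ from a list of field names known to the library.
    fields = ["x", "y"]
    src = "def __init__(self, {args}):\n".format(args=", ".join(fields))
    for name in fields:
        src += "    self.{0} = {0}\n".format(name)

    namespace = {}
    exec(src, {}, namespace)          # executes library-generated code only

    Point = type("Point", (), {"__init__": namespace["__init__"]})
    p = Point(1, 2)
    print(p.x, p.y)                   # -> 1 2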


Right, and each one of those is a potential exploit waiting to happen.


please support your assertion. I would also recommend opening CVEs detailing your discovered attack vectors, especially that of Python dataclasses in the standard library, which are in very widespread use. If you do in fact have some insight on how Python dataclasses are an "exploit waiting to happen", I think it's irresponsible to just sit on that information.


but python is not the only language to have an eval(), so why is python's in particular the root of all evil?


It's not literal.


That makes me think - would it be possible to have a runtime or build-time option for CPython that removes eval()?


If you run a security linter like ‘bandit’ you’ll get warnings for eval and other security holes.

It seems you can’t run bandit on deps, but perhaps if you fork them and build yourself?

If you are security conscious, having a rule that you can only install from a local pypi with packages you have forked would be a more defensible perimeter. But, a maintenance pain for sure.


Probably. But Python missed its chance to nix it with the 2->3 transition.

My favorite case was when a newbie coder used eval() to evaluate something that looked JSON-ish, which came from an API request.


Simple and powerful. Gotta love it (until your customer doesn't, big-time).


> Malware that is more stealth-conscious would just stop running without any indication, instead of interacting with external processes.

I always wondered if we could just use this against the malware. E.g. just run a useless process which is named/looks like a debugger and the malware stops itself. Of course that's nothing to be relied on on its own but maybe as an additional layer of defense?
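
A throwaway sketch of that decoy idea (file name and idle payload chosen arbitrarily): copy the Python interpreter to a debugger-sounding name and leave it idling, so a name-based check like the one in the article sees a "Wireshark.exe" in the process list. Whether real malware is fooled depends entirely on how lazy its check is.

    import shutil, subprocess, sys

    decoy = "./Wireshark.exe"                  # a name such malware looks for
    shutil.copy(sys.executable, decoy)         # any harmless executable works
    # Keep it alive under the decoy name; -c just sleeps forever.
    subprocess.Popen([decoy, "-c", "import time; time.sleep(10**9)"])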


Makes me think of that "weird domain name"-based ransomware mitigation.

https://www.theverge.com/2017/5/13/15635050/wannacry-ransomw...


Or adding a Russian language pack to your system. Some of these are so silly sounding that they are almost unbelievable on first hearing of them.


Some EDRs do stuff like adding a Russian keyboard layout as an alternative, which stops a fair share of 'malware as a service' type stealers.


Is this really worth an article? The anti-debugging protection just stops executables using a list of hardcoded filenames (eg. Wireshark.exe).
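
For readers who haven't seen this before, the whole "technique" amounts to something like the following rough reconstruction (not the sample's actual code; psutil and the extra process names are my own choices):

    import psutil

    ANALYSIS_TOOLS = {"wireshark.exe", "x64dbg.exe", "processhacker.exe"}

    def analysis_tool_running():
        for proc in psutil.process_iter(["name"]):
            if (proc.info["name"] or "").lower() in ANALYSIS_TOOLS:
                return True
        return False

    if analysis_tool_running():
        raise SystemExit  # bail out quietly instead of running the payload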

This is neither interesting nor new and was already implemented 20 years ago by script kiddies.

What's next? A VBS Trojan that takes commands via Matrix instead of IRC?


I thought it was interesting. I've never heard of some of these techniques.


Yes. Some of us depend on the smart people finding this stuff, before this stuff finds us.



