Hacker News
Python malware on the rise (cyborgsecurity.com)
282 points by vesche on July 13, 2020 | 62 comments

I think a lot of people will read 'python malware' and assume packages; that's not what this is about.

A lot of exploits are two-stage. Stage one is usually the vulnerability, typically written in C given the low-level and tightly controlled instructions required. The exploit breaks security to run an executable or otherwise gain control. Stage two is usually downloading a Python executable to grab the goods.

There's nothing especially sinister about the selection of Python for this case over other interpreted languages. Malware authors are just regular developers - they don't want to spend hours trying to hack together a C binary to dump a database when six lines of Python will do it. Python just runs on a lot of platforms, has a lot of mature drop-in libraries, and decent documentation. They use it for the same reason we use it.
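Here is the "six lines of Python" point made concrete. This is a minimal sketch using only the standard library's `sqlite3` module, where `Connection.iterdump()` yields a whole database as SQL statements. The in-memory table and row are made up for illustration.

```python
import sqlite3


def dump_database(conn: sqlite3.Connection) -> str:
    """Return the whole database as a SQL script via the stdlib's iterdump()."""
    return "\n".join(conn.iterdump())


# Hypothetical throwaway database, just to have something to dump.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.commit()
print(dump_database(conn))
```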

The article just makes it sound like malware developers are using modern packaging tools to turn that two-stage exploit into a single-stage one. That doesn't strike me as particularly surprising. Teams tend to gravitate towards specializing in one tool when they can. I'd obviously prefer to write a bunch of Python than do the same in C when performance isn't a huge concern (it's the other guy's CPU, after all).

Just seems like a minor observation, rather than some doom trend.

Author here. Thanks for reading & the feedback. I'll try to unpack some of this.

> A lot of exploits are two-stage. Stage one is usually the vulnerability, typically written in C given the low-level and tightly controlled instructions required. The exploit breaks security to run an executable or otherwise gain control. Stage two is usually downloading a Python executable to grab the goods.

This seems like a gross oversimplification & commonly incorrect. Oftentimes a "stage one" vulnerability to gain initial access would be network code written in a high-level language such as Python or Ruby (see Metasploit), and an executable payload to interact with the system would generally be written in a compiled language like C or C++. My article details the uncommon rise, over the past ~5 years, of interpreted languages (especially Python) being used as malware dropped on an endpoint in an attack.

> Just seems like a minor observation, rather than some doom trend.

I wouldn't say this is a minor observation or a "doom trend." I'd say it's a very interesting and insightful observation that is worth keeping an eye on. Malicious actors are no longer operating in a world of slow endpoints and scarce resources. They are instead operating in a world of high-speed internet and very fast endpoints, and they have a rich ecosystem of open-source tools at their disposal.

I find it highly interesting that malicious code written in interpreted languages, bundled with its interpreter into an executable, has been finding its way into the arsenal of high-tier malicious threat actors over the past few years. Just as the web browser is slowly eating away at the operating system, interpreted languages are slowly eating away at compiled languages in a variety of domains, including malware.

> Malware authors are just regular developers - they don't want to spend hours trying to hack together a C binary to dump a database when six lines of Python will do it.

It used to be that malware authors (virus writers in particular) were characteristically more "hardcore" than the average developer, as in preferring native code (even handwritten Asm) and clever optimisations to make their software smaller and more "tricky", for lack of a better term. But that was when the scene as a whole was less commercialised, so it's not so surprising to see that aesthetic disappear with increasing commercialisation.

> It's the other guy's CPU, after all

...and that might be why malware was initially more optimised than average; it spreads more easily when it's tiny and fast, doing its thing without being noticed, than if it causes a noticeable increase in system load that will prompt further investigation and lead to its discovery.

I wonder when we'll see Electron being used for malware...

> It used to be that malware authors (virus writers in particular) were characteristically more "hardcore" than the average developer, [...]

A subgroup of them still operates like that, but I feel like "it used to be" might be a bit outdated. It doesn't seem new for malware authors to go for low-hanging fruit, from languages to infrastructure. We've had VBA macros that are or spread malware for decades now, and in the early 2000s it was a pretty regular sight to see low-effort payloads written in some high-level language, using some random IRC server for C&C, for example. Not everything out there is some state-actor-level APT nightmare. With more developers in every part of the market, and even more users who simply don't care enough, it seems like a normal development to see stuff like this more often.

Like all software, you choose the language / platform depending on your business goals. Want it to be hidden for ages and slowly leak data? Write in optimised C to minimise footprint. Quick break in and grab everything? Pick your favourite and fastest language to code in.

Interviewed someone last night for my podcast who has written some Python malware tools, and there's enough stuff out there that you can't exactly call it a new concept.

Link? :-)

From the user's submissions, I would assume https://www.symbolcrash.com/podcast/ (but I don't know if the episode is released yet)

Yep, planning on editing it tomorrow

It's with Josh Pitts, author of this tool [1] and of another payload that caused lots of Go projects to be eaten by Kaspersky [2].

[1] https://github.com/secretsquirrel/the-backdoor-factory

[2] https://github.com/golang/go/issues/16292

Thanks for this, looking forward to it!

> Python just runs on a lot of platforms, has a lot of mature drop-in libraries

More than that: easy interop with DLLs/shared libraries via ctypes.
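Here is a benign sketch of that interop, calling the C library's `strlen` directly. It assumes a POSIX system (Linux or macOS) where `ctypes.util.find_library("c")` can locate libc. On Windows, the same pattern applies to a named DLL loaded through `ctypes.CDLL` or `ctypes.WinDLL`, which is the usual route to WinAPI functions.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library() can't locate libc, CDLL(None) falls back
# to the symbols already loaded into the current process (which include libc).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen() on a NUL-free byte string."""
    return libc.strlen(data)


print(c_strlen(b"hello"))  # 5
```

Declaring `argtypes`/`restype` is optional but avoids ctypes silently assuming `int` for the return value.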

Previously it was Delphi, immense bloatware. With Python, bloat is taken to the next level.

The article mentions that Python malware comes with difficulties, but in my experience it has the advantage of being an easy and simple way to write reliable malware. Packaging with PyInstaller to create a single (but large) executable is easy and helps avoid detection, as the interpreter is embedded in the PE (I never went in depth on this topic, but it would be interesting to check out).

Shameless plug: I wrote a few popular articles on 0x00sec about Python malware on Windows, just to show how simple it is to build, either using ctypes to call WinAPI functions or using the pywin32 wrapper, which makes the whole thing a lot faster.

See part 1 here https://0x00sec.org/t/malware-writing-python-malware-part-1/...

Definitely not the way to go if you have limited memory and need to write tiny shell code but it’s good enough for a stage 2 payload.

Author here: I've seen your guides before, they're really great! I'd say my article looks at the difficulties, but also the great benefits malware authors have by writing in Python.

> Packaging with PyInstaller to create a single (but large) executable is easy and helps avoid detection as the interpreter is embedded in the PE

If you look down further in the article, it explores detecting PyInstaller-generated executables using simple YARA rules, so I'd disagree a bit there. I personally think that Nuitka (discussed in the article) in conjunction with a packer would be the best compilation method to use in order to evade detection. It's actually quite surprising to me that so few malware samples using Nuitka have been seen in the wild, but as the title of the article states, it's on the rise.

You’re totally right about YARA. Unfortunately, I skimmed through the article pretty fast before commenting, as I was in a rush when I read it, and missed this part. Just finished a second read: great article and well detailed. But my point about detection is more about the good old VirusTotal submission.

As for Nuitka, I was not able to make it work, but I will try again. The alternative I also tried in the past was using Cython to generate C code and then compile it, but because that requires packaging the Python standard library DLLs, it was too much trouble, and I ran into crashes when running.

I also had bad experiences with packers because they have a tendency to trigger AV detection just for being packers, like ASProtect. Python malware is definitely a topic that deserves a more in-depth dive.

Good work!

> If you look down further in the article it explores detecting PyInstaller-generated executables using simple YARA rules.

Which can be easily patched out with a simple sed rule as it just uses a text search of the binary.
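A byte-search heuristic of that kind fits in a few lines of Python. The marker is an assumption taken from PyInstaller's source, where a one-file archive carries a "cookie" beginning with the magic bytes `MEI\014\013\012\013\016`. This sketch simply searches for them and is not the article's YARA rule. As noted above, a plain byte match is trivial to patch out, so treat a hit as a triage signal only.

```python
from pathlib import Path

# Assumption: PyInstaller's archive cookie starts with this 8-byte magic
# (written as b"MEI\014\013\012\013\016" in octal in PyInstaller's source).
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def looks_like_pyinstaller(path: str) -> bool:
    """Naive text search of the whole file, similar in spirit to a simple YARA string rule."""
    data = Path(path).read_bytes()  # reads the whole file; fine for a sketch
    return PYINSTALLER_MAGIC in data
```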

This is a very interesting article, and it is somewhat surprising to see Python entering the malware space more in recent years. Compiled languages with sophisticated runtimes and cross-platform support fit in quite well (Go especially), but seeing Python is quite interesting. I guess these actors are at a point where they can churn it out relatively quickly and are not too worried about the code being reversed rather trivially.

What I'd want to learn more about is whether these Python samples tend to be very large (in terms of actual code, not just language internals/PyInstaller/boilerplate). I expected the real-life samples to be smaller than some of the larger botnets and the like written in compiled languages, but some of the ones you go in depth on are somewhat surprising.

It's not malware, of course, but the way Dropbox was able to obscure its use of Python was interesting and pretty similar to me. https://www.anvilventures.com/blog/looking-inside-the-box.ht...

In short, they shipped a Python interpreter that understood RC4-encrypted pyc/opcode files.

For what it's worth Go is also gaining traction, from what I can tell. Malware authors do not appear to be particularly afraid of trying new languages.

I was going to mention Python 3.8's audit hooks[1] as a possible way to catch some of these issues (web requests, for example), but when I went to Google to find the link, it also came up with an article explaining how to bypass the audit hooks[2]...

[1] https://docs.python.org/3/library/audit_events.html

[2] https://daddycocoaman.dev/posts/bypassing-python38-audit-hoo...
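Here is a minimal sketch of using audit hooks defensively, assuming Python 3.8+. The event names (`compile`, `exec`, `socket.connect`, `urllib.Request`, `ctypes.dlopen`, `os.system`) come from CPython's audit events table. Which events to record and which to block is a hypothetical policy chosen for illustration. A hook added this way cannot be removed, and, as the linked post shows, code already running in the process may still find a way around it.

```python
import os
import sys

# Hypothetical policy: record some events, refuse others outright.
RECORDED = {"compile", "exec", "socket.connect", "urllib.Request", "ctypes.dlopen"}
BLOCKED = {"os.system"}

seen = []  # names of recorded events, in the order they fired


def audit_hook(event, args):
    if event in BLOCKED:
        # Raising from a hook aborts the operation that raised the event.
        raise RuntimeError(f"blocked by audit policy: {event}")
    if event in RECORDED:
        seen.append(event)


sys.addaudithook(audit_hook)  # Python 3.8+; hooks cannot be removed afterwards

# Usage: compile() raises a "compile" event; os.system() is refused.
compile("1 + 1", "<demo>", "eval")
print("compile" in seen)  # True
try:
    os.system("echo this should not run")
except RuntimeError as exc:
    print(exc)  # blocked by audit policy: os.system
```

In production, a hook like this would more likely forward events to logging or monitoring than keep them in a list.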

Audit hooks have a different use case. Here, in this article, we see described a way to package up the interpreter as part of the payload. This is very traditional malware, they 'bring their own binary' so to speak.

A much more modern, and statistically far less common, approach (say, the top 15% of malware) brings less to the system. Instead, it leverages existing mechanisms for execution; some of this is covered in LOLBAS (Living Off The Land Binaries and Scripts).

Interpreters such as PowerShell, bash, Python, Ruby, even Perl, are used by attackers to run their payloads.

There are a number of advantages. For one thing, if you're monitoring the system for new binary executions, it'll appear as just a Python interpreter - often quite normal on a number of systems. You also don't need to set any sort of execution rights, or drop executable files - just a regular, plain old python file.

But the downside is that now you're using the system's interpreter, and you have to follow the interpreter's rules. PowerShell really kicked this approach off since it was a favorite of malware authors, and Python followed suit. As a much newer implementation, with 3.8 being an extremely recent release, it's not so surprising that there are bypasses. Still, you'd be surprised how few attackers will take the time to do so (and how few orgs will monitor their Python interpreters anyway).

I think audit hooks were a great idea and are part of a defence-in-depth strategy.

In fact, I even implemented the variant of audit hooks that informed PEP-551 and deployed those to production for a major platform.

I think that if you were to use audit hooks, you might want to combine them with a code signing mechanism. I worked on two of these for different platforms, and they add another strong layer of defence.

In light of that, the bypass in the article seems somewhat contrived, especially considering that there are at least three alternate techniques you could employ to accomplish the same goal that would be meaningfully more innocuous than using `ctypes.windll`. If you're going to use `ctypes` or `pywin32` or the like, then you might as well write a C-extension module to patch out the audit hooks directly (as Batuhan Taşkaya shows using a simple trampoline toy library I wrote, `libhook`: https://speakerdeck.com/isidentical/hack-the-cpython?slide=3...).

Better techniques:

1. Joe Jevnik's and my `tuple_setitem`, which uses poisoned bytecode and the lack of bounds checks on `LOAD_FAST`/`GETLOCAL`. (Pure Python.)

2. My `tuple_setitem` using `numpy` raw memory access via `numpy.lib.stride_tricks.as_strided`. (Requires only `numpy`.)

3. Using the `/proc` filesystem on Linux, which gives arbitrary intraprocess read/write access (independent of page permissions). (Requires only `open(..., 'w')`.)

https://twitter.com/dontusethiscode/status/12818444226285854... https://gist.github.com/dutc/2cc5de0d2f8877b8f463b86e8bd5231...

There are also a couple of techniques you could employ to carry one of these payloads past code signing, some of which are very well known, like the insecurity of `pickle` deserialisation, and some of which are… less well known.
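On the `pickle` point: unpickling can invoke arbitrary callables. The standard mitigation, adapted here from the "Restricting Globals" section of the `pickle` module documentation, is to override `Unpickler.find_class` with an allowlist. This is a sketch only. The allowlist is an example choice, and it narrows the risk rather than removing it.

```python
import builtins
import io
import pickle

# Example allowlist: the only globals a pickle may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global referenced by the pickle stream is resolved here.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses any global outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```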

(I have also prototyped using the above exploits to "lift" C code into a Python interpreter, in case there are OS-level defences around `dlopen`.)

However, even taking these into account, I'm a big believer in the value of Python 3.8 audit hooks (PEP 578, with PEP 551 describing how to use them for security), but they are a technique that requires quite a bit of extra work to employ effectively.

If you're interested in trying to implement audit hooks and these other mechanisms for locking down your execution environment (e.g., you want to mitigate exploits in Python systems, which may run as PID 1 in containers, where these exploits may try to bring in malware that could exfiltrate data…), please feel free to reach out to me by e-mail or Twitter.

I would be happy to share more with any organisations that are large enough to consider locking things down at this level.
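The idea of pairing audit hooks with code signing can be illustrated with a deliberately simplified stand-in that refuses to execute any script whose HMAC-SHA256 tag doesn't match. This is an HMAC allowlist, not real code signing, which would use asymmetric signatures with keys kept outside the process. The key and the `run_if_signed` helper are hypothetical. The commenter's actual mechanisms aren't described in the thread.

```python
import hashlib
import hmac

# Hypothetical placeholder key; never embed a real key in the code it protects.
SIGNING_KEY = b"example-key-do-not-use"


def sign(source: bytes) -> str:
    """Produce an HMAC-SHA256 tag for a script's exact bytes."""
    return hmac.new(SIGNING_KEY, source, hashlib.sha256).hexdigest()


def run_if_signed(source: bytes, tag: str, namespace: dict) -> None:
    """Execute source only if its tag verifies; otherwise refuse to run it."""
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(sign(source), tag):
        raise PermissionError("script signature does not match")
    exec(compile(source, "<signed>", "exec"), namespace)
```

Usage: tag a script once with `sign(script)`, then run it with `run_if_signed(script, tag, {})`. Any edit to the bytes makes verification fail.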

OT: Those graphs* are interesting in that the spikes for Java/C++ seem to align with fall/spring school semesters, and Javascript tends to invert that pattern and have a spike during the summer (internships? personal summer projects?).

* https://www.cyborgsecurity.com/wp-content/uploads/2020/07/py...

I noticed the same thing and wrote essentially the same comment before I noticed yours. There's also a bit of a peak in Javascript during winter break, which is when some schools do "externships."

I do not see PowerShell in there either, and we are aware of the exponential growth of the language for writing malware.

This seemed to be more about how to compile/decompile and obfuscate Python code than anything about malware. The two examples, taking screenshots and making web requests, were the only actual malware-related topics, but even those are fairly basic concepts that have a huge range of applications outside malware. These things are also fairly trivial in most even slightly mature languages.

The section on eval was a little more interesting but still nothing special.

Personally, and this is just my probably uninformed opinion, the biggest thing about Python that makes it useful for malware is its huge, mostly uncurated repository of libraries and add-ons that are easy to install and use without ever looking at them. This aspect of Python seems likely the most appealing for would-be malware writers: the ease of making malicious code widely available without a lot of scrutiny.

I think you're talking about something very different, which is attacking Python's supply chain.

This is talking about malware payloads themselves. I don't agree that those capabilities (taking screenshots, especially eval) are trivial in other languages. Eval in particular makes things trivial since you can basically do:

`eval(get_payload())`, which is awesome from a staging perspective: the trend in malware is to modularize more and more for a number of reasons (less code to scan for signatures, new monetization strategies, easier updates, etc.).

So having the ability to do runtime, reflective module loading to trivially get a capability like screenshotting is pretty huge.
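From the defender's side, a naive static check for this kind of staging is to flag direct calls to dynamic-execution builtins. This sketch uses the standard `ast` module. The set of names is an assumption. Aliasing (`e = eval`), `getattr(builtins, ...)`, or obfuscated source will all slip past it, so treat it as triage rather than detection.

```python
import ast

# Assumed list of builtins that execute or import code chosen at run time.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}


def find_dynamic_code(source: str) -> list:
    """Return sorted (line, name) pairs for direct calls to dynamic-execution builtins."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only plain-name calls like eval(...); attribute calls are ignored.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                hits.append((node.lineno, node.func.id))
    return sorted(hits)


print(find_dynamic_code("x = 1\neval(get_payload())\n"))  # [(2, 'eval')]
```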

I think your last point and the article's points reflect two different contexts though. One is attacking python software with malicious libraries, while the article's context is python being used to attack any system, even systems that don't have an interpreter installed.

That being said, the security of PyPI and Python packaging in general is certainly another interesting topic. I like to think that so far it hasn't been as bad as npm, but there have been backdoored packages put out onto the internet. It's bound to happen with any public software repo, and with any project that trusts outside contributors without perfect review.

PyPI's architecture isn't meaningfully different than npm's. Npm has seen more high profile incidents because:

1. Packages tend to be smaller, and the transitive dependency trees of projects correspondingly larger. This means there are more single points of failure.

2. More people are using it.

Python, and for that matter most language package ecosystems, have the same problems as js, but many of them have gotten away with it for a bit longer due to (lack of) scale.
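Point 1 can be made concrete: every package reachable from a project is another party the project trusts. Here is a minimal breadth-first sketch over a hypothetical dependency graph with invented package names. It shows how two direct dependencies can fan out into six trusted packages.

```python
from collections import deque


def transitive_deps(graph: dict, root: str) -> set:
    """Breadth-first walk; returns every package reachable from root."""
    seen = set()
    queue = deque(graph.get(root, ()))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # shared dependencies are only counted once
        seen.add(pkg)
        queue.extend(graph.get(pkg, ()))
    return seen


# Hypothetical dependency graph, for illustration only.
GRAPH = {
    "app": ["web", "cli"],
    "web": ["http", "json-utils"],
    "cli": ["colors", "json-utils"],
    "http": ["url-parse"],
}
print(sorted(transitive_deps(GRAPH, "app")))
# ['cli', 'colors', 'http', 'json-utils', 'url-parse', 'web']
```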

This article lost me. What really confuses me is the example of exceptions in malware with obfuscated code (I would not call this obfuscated code) and the "import cv2" example: does the malware install the OpenCV library on Windows in the background? (This simply does not compute.)

Author here. The obfuscated code example was to show what a malware author might do to make malware analysis of a Python "compiled" binary more difficult. The code might be obfuscated & turned into an executable before being deployed.

As far as the opencv library goes, used by PoetRAT, you can choose to bundle third party packages inside your executable with all the executable generators I mentioned at the beginning of the article like PyInstaller or Nuitka.
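At run time, bundled code can tell whether it is running from such an executable. PyInstaller documents that its bootloader sets `sys.frozen` and `sys._MEIPASS`, the directory the bundle's modules and data are unpacked into. Here is a sketch. The returned labels are arbitrary, and other tools such as Nuitka may expose this differently.

```python
import sys


def runtime_kind() -> str:
    """Report whether this code is running from a PyInstaller bundle."""
    # PyInstaller's bootloader sets both attributes; a normal interpreter has neither.
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # sys._MEIPASS is where bundled modules and data files were unpacked.
        return "pyinstaller-bundle"
    return "regular-interpreter"


print(runtime_kind(), sys.executable)
```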

Thanks for the article and explanation; I am less confused now.

I wonder how large the malware payload will be when packaged with OpenCV :)

They can get really large, especially if it's a dependency that has many dependencies itself! Glad you enjoyed it, thanks so much for reading.

Quality work! Sidebar, it's crazy to see you on the front page of HN; hope you're doing well man! --t--

When you run PyInstaller, it bundles an interpreter with all the necessary third-party modules, so there's no need to install anything. What that cv2 example shows is how, with very few lines of code, an attacker can open the webcam and obtain 10 images (roughly 1/3 of a second) of camera footage.

However, on all notebooks that have one, that code will turn on the little LED indicator right by the webcam, so it's not particularly quiet about it.

Is it just me, or does this read like a friendly howto for a would-be python malware author?

“What I cannot create, I do not understand." -Richard Feynman

I do hope that those reading this will use this knowledge to do good.

Bad actors, especially competent ones, already have access to this knowledge by way of their own networks, communities, and independent research. The only thing hiding it does is prevent the good guys from finding ways to defend against it.

It's already quite easy. I made a python reverse shell in 15 minutes with basically no knowledge, and was able to obfuscate it well enough to fool every single antivirus.

Sure, but that's the standard for just about anything infosec related on the internet. Blog posts tend to reveal what techniques are in use as it's usually in the best interest of defenders.

Furthermore, red teams being kept up to date is also useful, which might be more surprising to those surprised by this post. Knowing obfuscation techniques in particular is plenty useful to developers and cybercriminals alike.

Not much of a how-to. It's just talking about Python packaging, which you can use for many purposes.

This has been the case for some time. PowerShell is an even more popular malware interpreter because it is already on the target and its obfuscation options are numerous.

Python is very versatile from a malware perspective, especially on anything *nix, since Python is always pre-installed. There are tons of pip packages the bad guys can use to their advantage without having to rewrite them (e.g. keyloggers, video-camera-related packages). All they need is a way to deploy it and receive the data; then it's just a matter of hiding the communication. Building loaders and reverse shells, and bypassing AV, is a hell of a lot easier in Python as well.

See PupyRAT, a full-on multi-OS admin tool mainly written in Python (2, unfortunately; it's also buggy and outdated). It's a great example. They use a C wrapper around their remote admin tool, which is written in Python. Their (C) loader downloads the provided Python payload from an HTTP link and stores it at a specific memory address that gets executed right after. Because it's in memory, it doesn't touch the disk, unless you are using the Windows payload (which provides multiple options to hide the program using a set of Windows exploits).

If it weren't for your examples, I wouldn't have believed blackhats would waste time using Python. It seems like it would be easier to defend against Python malware if it isn't precompiled, and if it is precompiled, we're just back to hand-analyzing trojans at the assembly level. When you said that SeaDuke was cross-platform, isn't that a huge problem that could easily be detected, since the attacker requires target-side interpretation? E.g., more security built into Python around how it is invoked, requiring user interaction to determine whether the user is running something.

The article is totally mistitled. It refers to Python being used to create Windows malware using compilers, instead of using C or something else.

Packages are not targeted for now.

Author here. It wasn't my intention to mislead, but I also don't think the article is mistitled. What would have been a better name? This article is about actual Python malware that would affect an endpoint, like a remote access trojan (RAT). If the article had been about malware within the Python package index, I would have named it "Malicious Python packages within PyPI on the rise!" It could also be confusion from overlapping professional domains, as I typically work within the security world.

Fascinating read. I almost passed up on this article.

Python’s eval() function reminds me almost of Lisp’s eval/apply feature, which is supposedly at the heart of what makes Lisp so special.

I imagined building a program, that I could teach, to eventually write its own programs. But, I figured I would output it to a separate file, and run that file instead.

Most of the exploit scripts are written in Python: easy string manipulation and third-party packages (angr, ctftools, etc.).

One issue is Anki's shared add-ons - https://ankiweb.net/shared/addons/2.1. They're written in Python, and can potentially be malware. Is auditing them the only way?

https://youtu.be/56ciki25j2I: a presentation at the local hacker con last year on how simple it is to write some shitty but functional malware in Python.

geared towards someone who has never coded before

SCYTHE's in-memory client loads an in-memory CPython interpreter/runtime, even, so you do not even have to compile .py to an exe to run on Windows, for example.

Can you tell me more? A google search isn't showing anything relevant

Yeah... I was being hesitant about what to say since I work for the company, to avoid an impression that I'm promoting the technology.

Basically, here's SCYTHE's client architecture: https://www.scythe.io/library/under-the-hood-scythe-architec...

And here's how you would load your python to run on the client: https://www.scythe.io/library/software-development-kit

Python is the new VB. Popular and easy to learn.

Yes, VB was popular back then and is very easy to learn. I wonder why VB.NET is not as popular as C# nowadays, given that they have the same access to the .NET Framework.

Most of the VB.Net adopters came from VB6, only to find out that VB.Net had very little in common with VB other than syntax. If you had to learn a whole new framework and ecosystem anyway, there were more popular options available for VB refugees out there like C# or Python.

It's a well known problem that many languages (Python, Ruby, Node) have notoriously insecure trust chains in their dependency management frameworks. If more malware is hitting the Python ecosystem, I think it's just a matter of time until someone manages to publish a tainted version of `requests` or some similarly popular Python lib.

I know for a fact a lot of cybersecurity automation mind share is in Python. Curious to see if this new wave of Python malware will make it into any big cybersecurity vendors. I've performed due diligence on a number of cybersecurity vendors that I wouldn't qualify as having good security posture for stuff like this.

A few years ago someone published a trojan version of `colorama` with the British spelling `colourama` that was found to be mining bitcoins on victims' machines!
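A cheap defensive check against this kind of typosquatting is to compare a package name against a list of popular names using `difflib.get_close_matches` from the standard library. The list and the 0.85 cutoff below are assumptions for illustration. A real check would need a much larger list of popular packages and would still produce false positives.

```python
import difflib

# Hypothetical short list; a real check would use many more popular package names.
POPULAR = ["colorama", "requests", "urllib3", "numpy", "setuptools"]


def possible_typosquat(name: str, cutoff: float = 0.85):
    """Return the popular package that name resembles, or None."""
    if name in POPULAR:
        return None  # exact match: it *is* the popular package
    matches = difflib.get_close_matches(name, POPULAR, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(possible_typosquat("colourama"))  # colorama
```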

The article has nothing to do with package management though.

Yes, but malware isn’t limited to Windows executables. Note that the article doesn’t call out methods of ingress; these tend to vary a lot depending on who or what an attacker is targeting.
