> The moment you run LLM generated code, any hallucinated methods will be instantly obvious: you’ll get an error. You can fix that yourself or you can feed the error back into the LLM and watch it correct itself.
But that's for methods. For libraries, the scenario is different, and possibly a lot more dangerous. For example, the LLM generates code that imports a library that does not exist. An attacker notices this too while running tests against the LLM, decides to create these libraries on the public package registry, and injects malware. A developer may think, "oh, this newly generated code relies on an external library, I'll just install it," and get owned, possibly without even knowing it for a long time (as is the case with many supply chain attacks).
And no, I'm not looking for a way to dismiss the technology, I use LLMs all the time myself. But what I do think is that we might need something like a layer in between the code generation and the user that will catch things like this (or something like Copilot might integrate safety measures against this sort of thing).
Prompt injection means that unless people using LLMs to generate code are willing to hunt down and inspect all dependencies, it will become extremely easy to spread malware.
The problem there, though, is that with PoCs like this, as an attacker you want a ping back to your system so that you know the attack has been successful (in this case they probably expected/hoped someone at Cursor would install the package; that's the usual objective in a dependency confusion attack). But what they could have done is send something less sensitive, like just the current working directory or the current effective user, instead of the whole environment.
What actually changes in your scenario, though? A potential bad actor gets RCE on your dev machines; it doesn't really matter what they sent home, you're rotating keys and doing your due diligence either way.
At first I was excited to see that a new tool would solve the Python "packaging" problem. But upon further reading, I realized that this was about _package management_, not so much about packaging a Python application that I've built.
Personally I haven't had many problems with package management in Python. While the ecosystem has some shortcomings (no namespaces!), pip generally works just fine for me.
What really annoys me about Python is the fact that I cannot easily wrap my application in an executable and ship it somewhere. More often than not, I see git clones and virtualenv creation being done in production, often requiring more connectivity than needed on the target server, and dev dependencies being present on the OS. All in all, that's a horrible idea from a security viewpoint. Until that problem is fixed, I'll prefer different languages for anything that requires some sort of end user/production deployment.
> What really annoys me about Python is the fact that I cannot easily wrap my application in an executable and ship it somewhere.
You are not wrong, but let's unpack this. What you're saying is that there is a need to make it easy for another person to run your application. What is needed for that? The application has to make its way to the user, some Python has to be found there, and that whole process has to be transparent to the user.
That's one of the reasons why I wanted Rye (and uv does the same) to be able to install Python and not in a way where it fucks up your system in the process.
The evolved version of this is to make the whole thing, including uv, something you can do automatically. Even today you can already (if you want to go nuts) have a full curl-to-bash installer that installs uv/rye and your app into a temporary location just for your app and never breaks your user's system.
It would be nice to eventually make that process entirely transparent and not require network access, to come with an .msi for Windows, etc. However, the prerequisite for this is that a tool like uv can arbitrarily place a pre-compiled Python and all the dependencies you need at the right location for your user's platform.
The cherry on top that uv could deliver at some point is that fully packaged thing, and it will be very nice. But even prior to that, building a command line tool with Python today is no longer an awful experience for your users, which I think is a good first step. Either via uvx, or if you want, you can hide uv away entirely.
To put a finer point on this idea: even if one were against this whole idea of an “installation flow”, work done to improve uv or rye would likely flow into alternative strategies as well.
The more we invest into all of this and get it right in at least one way, the easier it will be for alternatives (like pyoxidizer!) to also “just work”. At least that’s my belief.
There are tactical reasons to focus on one strategy, and even if it’s not your favorite… everything being compatible with _something_ is good!
Exactly. And in this particular case, pyoxidizer and the work that originally went into it in many ways gave birth to parts of this. The standalone Python builds were what made me publish Rye, because there was finally something out there that had traction and a dedicated person making it work.
Prior to that, I had my own version of Rye just for myself, but I only had Python builds that I made for myself and the computers I had. Critical mass is important, and all these projects can feed into each other.
Before Rye, uv, and pyoxidizer there was little community demand for the standalone Python builds or the concept behind them. That has dramatically changed in the last 24 months, and I have seen even commercial companies switch over to these builds. This is important progress, and it _will_ greatly help make self-contained and redistributable Python programs a reality.
It depends where and what you are trying to install, but generally, for every sensible deployment target there's a tool (sometimes more than one) which handles building an installable binary for that target (e.g. an OS-specific installer). There's now even tooling to deploy to unusual places such as Android, iOS, or the browser. Naturally, in some cases specific packages won't work on specific targets, but because there are standard interfaces, if the code is capable of being run somewhere, the tooling for that target should be able to give you something that works.
Some of the cameras near me are clearly pointed at scenery. I won't presume anything about why they are pointed at scenery and insecure; but I also won't presume that they are intended to remain private.
Still, I'm surprised they pretty much all have such miserable quality.
I used to have one overlooking the river in my mother's house. I spent a lot of time tuning it so it looked really great, both day and night (at night using really long exposure times). I used high quality webcams at first and then the 5MP raspberry cam (later modded with a better lens).
But even the "1080p" security cams that cost $200 or more have horrible quality compared to those. They're good for security purposes yes, but I wouldn't use them to advertise the views from a hotel.
I really wish there were some really high-quality webcams around the world, think 4K or even 8K (so you could zoom in even on a 4K screen), with good night performance. It would be so great to really get a feel for a city. These cams are just so poor.
1. Like you already hinted at, it is really difficult to get right, and I hardly see any larger website (with multiple teams working on it) that implements it effectively. So while it's great in theory, I'm not sure if it's accessible enough (and therefore effective enough) for most of the world.
2. I'm assuming you're talking about Chrome's SameSite value; it's worth noting that this was rolled back a short while ago because of compatibility issues, with larger government organizations having to remain accessible, especially now with COVID-19. More info here: https://9to5google.com/2020/04/03/chrome-rolls-back-cookie/
1. It's not really hard to get right, it just takes a lot of trial and error. I.e. you essentially start with default-src 'self', and then create exceptions for other resources as you need them. You use the report-uri/report-to endpoints to get reports if either (a) you've neglected to open up a resource you need, or (b) you DO have a vulnerability that someone is trying to take advantage of. While this may sound like a bit of a pain, e.g. if you have multiple teams working on a website that all need to access their own set of 3rd-party endpoints, this pain is required for good security: it forces you to be explicit about the 3rd-party endpoints you allow, instead of the browser just allowing any endpoint for things like script tags, imgs, etc., which is the default now.
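For illustration, here's what that incremental policy might look like expressed in code; the CDN/image hosts are made-up placeholders for whatever 3rd-party endpoints your teams actually report needing:

```python
# Start from default-src 'self' and add explicit exceptions as reports come in.
# Hostnames below are placeholders, not real recommendations.
csp = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://cdn.example.com"],  # exception added after a report
    "img-src": ["'self'", "https://images.example.com"],
    "report-uri": ["/csp-reports"],                       # where violation reports land
}

header = "; ".join(f"{directive} {' '.join(values)}" for directive, values in csp.items())
print("Content-Security-Policy:", header)
```

Every host that isn't listed is blocked by the browser, which is exactly the explicitness being argued for above.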
2. Note that what Chrome is rolling back is the SameSite default change. SameSite has existed for quite some time now, in all browsers; it's just that the default is currently 'None' in Chrome but is changing to 'Lax'. So you can still take advantage of this now; Chrome is merely delaying the default change so that it doesn't break sites that aren't prepared for it.
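Opting in explicitly is straightforward; a small sketch using only the Python stdlib (web frameworks expose the same attribute on their cookie APIs):

```python
from http.cookies import SimpleCookie

# Set SameSite yourself instead of waiting for Chrome's default to flip to Lax.
cookie = SimpleCookie()
cookie["session"] = "abc123"
cookie["session"]["samesite"] = "Lax"   # or "Strict" for tighter CSRF protection
cookie["session"]["secure"] = True
cookie["session"]["httponly"] = True
print(cookie["session"].OutputString())
```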
So my point is the tools currently available really tighten up the sandbox guarantees of the browser, and make it no more difficult than necessary to build a secure site.
This will keep happening, and not only will SSH and GPG keys be the target; any interesting data will be stolen.
And the problem is much larger than these typosquatting attacks. Abandoned GitHub projects taken over by malicious users, rogue Maven/npm/PyPI/what-have-you repositories, hacked accounts on any website that is used for distributing programs, feature branches in open source projects that are automatically built on CI servers inside corporate networks: the possibilities to grab data and send it somewhere on the internet are endless.
One security measure that has somehow fallen out of fashion over the last years is, at least on application servers, to disallow any outgoing network traffic, especially to the internet (whereas every cloud environment I see nowadays allows it by default). This would largely prevent these sorts of attacks from actually sending anything out, but it would also prevent XXE attacks from happening, prevent reverse connections to an attacker host from being set up, make SSRF attacks harder to verify, and so on.
I strongly recommend whitelisting only the network traffic that your application actually needs.
objectified has it already, but to reiterate: you can block outbound traffic initiated on a host without blocking outbound traffic that is a response to externally initiated traffic. This is, for example, what haproxy, iptables, and AWS security group outbound rules do.
I'm deliberately avoiding the term "connection" above because new UDP-first protocols require slightly different handling to determine who initiated what, but most routing/firewall software can deny-initiated-outbound for those protocols as well.
I'm not sure I understand your question correctly, but I'm talking specifically about outbound network traffic. Your API's application servers (where such evil libraries could be deployed) should not have any network connectivity towards the internet. So on that server, you should not be able to do even a `curl www.google.com`, for example.
GP was asking how you would allow APIs to respond to requests if you are blocking outbound traffic.
I’m assuming if you open a connection for a sync request you’d be fine. What about an async request? I’d imagine a scenario where your API needs to do some processing first, connect to another internal system, and then respond async to the outside system.
Lots of problems are merely I/O related, which can be solved just fine with Python threads. As for number crunching (let's assume that means CPU intensive tasks), you can always still resort to multiprocessing (which can also be combined with multithreading, I fail to see why such paradigms cannot be combined).
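A quick sketch of combining the two with `concurrent.futures` (the URLs and workloads are stand-ins): threads for the I/O-bound part, processes for the CPU-bound part.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def crunch(n: int) -> int:
    # CPU-bound work: benefits from processes, since each has its own GIL.
    return sum(i * i for i in range(n))

def fetch(url: str) -> str:
    # I/O-bound placeholder: threads work fine here because the GIL is
    # released while waiting on I/O. Real code would make a network call.
    return f"response from {url}"

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as tp:
        pages = list(tp.map(fetch, ["https://example.com/a", "https://example.com/b"]))
    with ProcessPoolExecutor(max_workers=2) as pp:
        totals = list(pp.map(crunch, [10_000, 20_000]))
    print(pages, totals)
```

Nothing stops each worker process from spawning its own thread pool internally, which is exactly the combination described above.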
CVSS is a common and open approach. CVSS scores that are particularly high are often directly exploitable. If they aren't, then their calculation was probably done wrong. You can try it for yourself here: https://www.first.org/cvss/calculator/3.0
Having said that, what can lead to a serious attack on your organization is often subject to a combination of factors. Say there is an SSRF vulnerability in your own application, because the HTTP library you use doesn't parse URLs correctly, so now an attacker can make your application perform arbitrary HTTP requests. But fortunately, the connectivity of your application server is quite limited: an attacker can't reach internal systems or the internet, and the application uses strong authentication for the web services it does use, all the good stuff. So now the chance of a successful, serious attack is largely diminished.
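As a concrete (hypothetical) illustration of the parsing part: a naive prefix check can disagree with where the HTTP client actually connects, because everything before `@` in a URL is userinfo, not the host.

```python
from urllib.parse import urlsplit

url = "https://trusted.example@attacker.example/steal"  # attacker-supplied

naive_ok = url.startswith("https://trusted.example")  # naive validator: looks fine
real_host = urlsplit(url).hostname                    # where a request would really go

print(naive_ok, real_host)  # True attacker.example
```

Two components parsing the same URL differently is exactly the kind of library-level mismatch that turns into an SSRF.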
Also, it can be quite complicated to know what the exact, real dependencies of your application are. What is the transitive/recursive list of dependencies your application uses? Which of your application's dependencies actually use libraries on your system? And what are _their_ dependencies? I think that, cost-wise, it is cheaper to make sure your application dependencies, containers, host system libraries, container orchestration tools, etc. are always up to date.
And yeah, I agree that the post doesn't do a good job at all of providing a sane rationale for _why_ you should update. Anyone who has ever administered an operating system knows that security vulnerabilities are found in them every day. But the awareness that a Docker container moves at the same pace is definitely not present everywhere, and it probably should be.
I'm not sure if that was the idea, but nothing you said refutes what I said. If there is a potential SSRF due to one of those vulnerabilities, show that; if there is a potential but unlikely RCE, show that.
Just saying that the default node image has 580 vulnerabilities helps no one actually trying to fix these vulnerabilities or assess how to prevent this in the future.