
“Our basic philosophy when it comes to security is that we can trust our developers and that we can trust the private network within the cluster.”

This is not my area of expertise. Does it add a significant amount of complexity to configure this kind of system in a way that doesn’t require trusting the network? Where are the pain points?

> Our basic philosophy when it comes to security is that we can trust our developers and that we can trust the private network within the cluster.

As an infosec guy, I hate to say it but this is IMO very misguided. Insider attacks and external attacks are often indistinguishable because attackers are happy to steal developer credentials or infect their laptops with malware.

Same with trusting the private network. That’s fine and dandy until attackers are in your network, and now they have free rein because you assumed you could keep the bad people outside the walls protecting your soft, squishy insides.


One of the best things you can do is restrict your VPCs from accessing the internet willy-nilly outbound. When an attacker breaches you, this can keep them from downloading payloads and exfiltrating data.
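
For concreteness, here's a rough sketch of that in AWS using boto3 (the security group ID and the allowlisted endpoint are made up): drop the default allow-all egress rule, then allow only what the workload actually needs.

    import boto3

    ec2 = boto3.client("ec2")
    SG_ID = "sg-0123456789abcdef0"  # hypothetical security group

    # Remove the default allow-all outbound rule.
    ec2.revoke_security_group_egress(
        GroupId=SG_ID,
        IpPermissions=[{"IpProtocol": "-1",
                        "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    )

    # Allow only the specific endpoints the workload needs.
    ec2.authorize_security_group_egress(
        GroupId=SG_ID,
        IpPermissions=[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                        "IpRanges": [{"CidrIp": "203.0.113.10/32",
                                      "Description": "approved external API"}]}],
    )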

You’ve just broken a hundred things that developers and ops staff need daily to block a theoretical vulnerability that is irrelevant unless you’re already severely breached.

This kind of thinking is why secops often develops an adversarial relationship with other teams — the teams actually making money.

I’ve seen this dynamic play out dozens of times and I’ve never seen it block an attack. I have, however, seen it tank productivity and break production systems many times.

PS: The biggest impact of denying outbound traffic is that it blocks Windows Update, or the equivalent for other operating systems and applications. I’m working with a team right now that has to smuggle NPM modules in from their home PCs because they can’t run “npm audit fix” successfully on their isolated cloud PCs. Yes, in the name of security they’re prevented from updating vulnerable packages unless they bend over backwards.


> You’ve just broken a hundred things that developers and ops staff need daily to block a theoretical vulnerability that is irrelevant unless you’re already severely breached.

I’m both a developer and a DFIR expert, and I practice what I preach. The apps I ship have a small allowlist for necessary external endpoints and everything else is denied.

Trust me, your vulnerabilities aren’t theoretical, especially if you’re using Windows systems for internet-facing prod.


This should still be fresh in the mind of anyone who was using log4j in 2021.

> I’ve seen this dynamic play out dozens of times and I’ve never seen it block an attack.

I am a DFIR consultant, and I've been involved in 20 or 30 engagements over the last 15 years where proper egress controls would've stopped the adversary in their tracks.


Any statement like that qualified with “proper” is a no true Scotsman fallacy.

What do you consider proper egress blocking? No DNS? No ICMP? No access to any web proxy? No CDP or OCSP access? Strict domain-based filtering of all outbound traffic? What about cloud management endpoints?

This can get to the point that it becomes nigh impossible to troubleshoot anything. Not even “ping” works!

And troubleshoot you will, trust me. You’ll discover that root cert updates are delivered out-of-band rather than with other security patches. You’ll discover that the 60-second delay that’s impossible to pin down is a CRL validation timeout. You’ll discover that ICMP isn’t as optional as you thought.

I’ve been that engineer, I’ve done this work, and I consider it a waste of time unless it is protecting at least a billion dollars worth of secrets.

PS: practically 100% of exfiltrated data goes via established and approved channels such as OneDrive. I just had a customer send a cloud VM disk backup via SharePoint to a third party operating in another country. Oh, not to mention the telco that has outsourced core IT functions to both Chinese and Russian companies. No worries though! They’ve blocked me from using ping to fix their broken network.


there's no need for this to be an either/or decision.

private artifact repos with the ability to act as a caching proxy are easy to set up. afaik all the major cloud providers offer basic ones with the ability to use block or allow lists.

going up a level in terms of capabilities, JFrog is miserable to deal with as a vendor but Artifactory is hard to beat when it comes to artifact management.


Sure… for like one IDE or one language. Now try that for half a dozen languages, tools, environments, and repos. Make sure it all works for build pipelines too, and not just the default ones! You need a bunch of on-prem agents to work around the firewall constraints.

This alone can keep multiple FTEs busy permanently.

“Easy” is relative.

Maybe if you work in a place with a thousand devs, infinite VC money, and a trillion dollars of intellectual property to protect, then sure, it’s easy.

If you work in a normal enterprise it’s not easy at all.


Their caching proxy sucks though. We had to turn it off because it persistently caused build issues due to its unreliability.

I can't be certain, but I think the GP means production VMs, not people's workstations. Or maybe I fail to understand the complexities you have seen, but I'm basing that mainly on the "download from home" thing, which only seems necessary if full Internet access is blocked on your workstation too.

The entire network has a default deny rule outbound. Web traffic needs to go via authenticating proxies.

Most Linux-pedigree tools don’t support authenticating proxies at all, or do so very poorly. For example, most have just a single proxy setting that’s either “on” or “off”. Compare that to the PAC files typically used in corporate environments, which implement a fine-grained policy selecting different proxies based on location or destination.

It’s very easy to get into a scenario where one tool requires a proxy env var that breaks another tool.
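
To make that concrete, here's roughly the difference in Python (proxy hostnames are invented): the env var approach most CLI tools understand is one global setting, while a PAC file picks a proxy per destination.

    from urllib.parse import urlparse
    from urllib.request import getproxies

    # What most CLI tools understand: one global proxy read from env vars.
    print(getproxies())   # e.g. {'https': 'http://auth-proxy.corp:8080'}

    # Roughly what a corporate PAC file does: a per-destination policy.
    def find_proxy(url):
        host = urlparse(url).hostname or ""
        if host.endswith(".internal.corp"):
            return None                           # go direct inside the network
        if host.endswith(".windowsupdate.com"):
            return "http://patch-proxy.corp:8080"
        return "http://auth-proxy.corp:8080"      # default authenticating proxy

    print(find_proxy("https://registry.npmjs.org/left-pad"))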

“Stop complaining about the hoops! Just jump through them already! We need you to do that forever and ever because we might get attacked one day by an attacker that’ll work around the outbound block in about five minutes!”


In the scenario presented, can't they just exfiltrate using the developer credentials / machine?

Let’s say there’s a log4j-type vuln and your app is affected, so an attacker can trigger an RCE in your app, which is running in, say, an EC2 instance in a VPC. A well-configured app server instance will have only the necessary packages on it, and hopefully not many dev tools. The instance will also get its privileges through an IAM role, so there won’t be stored creds on the instance for the attacker to steal.

Typically an RCE like this runs a small script that downloads and runs a more useful piece of malware, like a webshell. If the webshell doesn’t download, the attacker probably moves on to the next victim.
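
A trivial sketch of what that failure looks like from inside the compromised instance (the hostname is a placeholder): in an egress-denied subnet the stager's download attempt simply never connects.

    import socket

    # In an egress-denied subnet the second-stage download never connects.
    try:
        socket.create_connection(("attacker-staging.example.net", 443), timeout=5)
        print("egress open: second stage would have been fetched")
    except OSError as exc:
        print(f"egress blocked or unreachable: {exc}")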


But the original comment wasn't about this attack vector...

> attackers are happy to steal developer credentials or infect their laptops with malware

I don't think any of what you said applies when an attacker has control of a developer machine that is allowed inside the network.


I was responding more to "Same with trusting the private network. That’s fine and dandy until attackers are in your network, and now they have free rein because you assumed you could keep the bad people outside the walls protecting your soft, squishy insides."

Obviously this can apply to insiders in a typical corporate network, but it also applies to trust in a prod VPC environment.


That is also a risk. Random developer machines being able to just connect to whatever they like inside prod is another poor architectural choice.

What's your opinion on EDR in general? I find it very distasteful from a privacy perspective, but obviously it could be beneficial at scale. I just wish there was a better middle ground.

Not the OP but I was on that side -

They do work. My best analogy is it's like working at TSA except there are three terrorist attacks per week.

As far as privacy goes, by the same analogy, I can guarantee the operators don't care what porn you watch. Doing the job is more important. But still, treat your work machine as a work machine. It's not yours, it's a tool your company lent to you to work with.

That said, on HN your workers are likely to be developers - that does take some more skill, and I'd advise asking a potential provider frank questions about their experience with the sector, as well as your risk tolerance. Devs do dodgy stuff all the time, and they usually know what they're doing, but when they don't you're going to have real fun proving you've remediated.


EDR is not related to the topic but now I'm curious as well. Any good EDR for ubuntu server?

It's a mindset that keeps people like you and me employed in well-paying jobs.

The top pain point is that it requires setting up SSL certificate infrastructure and then storing and distributing those certs in a secure way.

The secondary effects depend entirely on how your microservices talk to their dependencies. Are they already talking to some local proxy that handles load balancing and service discovery? If so, you can bolt SSL termination onto that layer. If not, and your microservices are making HTTP requests directly to other services via DNS, it’s a game of whack-a-mole: either you modify all of your software to talk to a local “sidecar”, or you configure every service to do the SSL validation itself, which can explode in complexity when you end up dealing with a bunch of different languages and libraries.
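
To make the "every service validates for itself" option concrete, here's roughly what one outbound call ends up looking like in Python with requests (the URL and cert paths are invented). Now repeat the equivalent for every language and HTTP library in your stack.

    import requests

    # Pin the internal CA and present this service's client cert (mTLS).
    resp = requests.get(
        "https://payments.internal.corp:8443/v1/charge",
        verify="/etc/pki/internal-ca.pem",   # trust only the internal CA
        cert=("/etc/pki/svc-orders.crt", "/etc/pki/svc-orders.key"),
        timeout=3,
    )
    resp.raise_for_status()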

None of it is impossible by any means, and many companies/stacks do all of this successfully, but it’s all work that doesn’t add features, can lead to performance degradation, and is a hard sell to get funding/time for because your boss’s boss almost certainly trusts the cloud provider to handle such things at their network layer unless they have very specific security requirements and knowledge.


Yes, it adds an additional level of complexity to do role-based access control within k8s.

In my experience, that access control becomes necessary for several reasons (mistakes due to inexperience, cowboys, compliance requirements, client security questions, etc.) at around 50-100 developers.

This isn't just "not zero trust", it's access to everything inside the cluster (and maybe the cluster components themselves) or access to nothing -- there is no way to grant partial access to what's running in the cluster.
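
For anyone wondering what that added complexity buys, here's a minimal sketch using the Kubernetes Python client (namespace and names are hypothetical; in practice this is usually plain YAML): a namespaced Role granting read-only access to pods, instead of all-or-nothing.

    from kubernetes import client, config

    config.load_kube_config()
    rbac = client.RbacAuthorizationV1Api()

    # Read-only access to pods and their logs in one namespace only.
    role = client.V1Role(
        metadata=client.V1ObjectMeta(name="team-a-readonly", namespace="team-a"),
        rules=[client.V1PolicyRule(
            api_groups=[""],
            resources=["pods", "pods/log"],
            verbs=["get", "list", "watch"],
        )],
    )
    rbac.create_namespaced_role(namespace="team-a", body=role)
    # A RoleBinding would then attach this Role to the team's group or
    # service accounts, scoping their access to this namespace.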


This is just bad security practice. You cannot trust the internal network; so many companies have been breached by following this principle. You have to allow for the possibility that your neighbors are hostile.

Implementing "Zero Trust" architectures are definitely more onerous to deal with for everyone involved (both devs and customers, if on prem). Just Google "zero trust architecture" to find examples. A lot more work (and therefore $) to setup and maintain, but also better security since now breaching network perimeter is no longer enough to pwn everything inside said network.

It requires encrypting all network traffic, either with something like TLS or an IPsec VPN.

"SSL added and removed here :^)"



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: