Software Security Is a Programming Languages Issue (2018) (pl-enthusiast.net)
116 points by azhenley 85 days ago | 118 comments

Here's the counterargument:

Regardless of language, all software has bugs; that's such a banal argument it barely even needs to be made. Offensive software security is the practice of refining and chaining bugs to accomplish attacker goals. You can make it easier or harder to do that, but in the long run, more bug classes are going to be primitives for attackers than not.

Rust is a fine language to teach people. But what Rust accomplishes for security is table stakes. People building software in Java in 2002 had largely the same promises from their language; Rust just makes those promises viable in a larger class of systems applications. But Java programs are riddled with security vulnerabilities, too! So are Golang programs, and, when Rust gets more mainstream, so too will Rust.

Aside: I'm really frustrated by the advice to validate all input. That's not useful advice and people should stop teaching it. It begs the question: validate for what? If you know every malicious interaction that can occur, you know where all the bugs are. And you don't know where all the bugs are.

Yes, all software has bugs. But how we make things safe and reliable is not by eliminating the bugs, but by changing the process to make those bugs impossible. We cannot eliminate all bugs, but if for each one we eliminate the flaw in the process that enabled that bug, we get fewer and fewer bugs.

This works in the real world. For example, see airliners.

A large part of my job as a language designer is to look at the causes of certain kinds of bugs, and find ways to design them out of the language. There's no reason today that array overflows should still be happening.

I'm confident I didn't word my original comment well, because people are obviously reading it to say "memory safety doesn't matter". It does. It's an absolute requirement for secure software development done at any reasonable scale. But it does not produce "secure software".

It seems like a counterargument to your counterargument could be to treat memory safety as an example of a thing that languages can help with to eliminate classes of bugs. There are other such things too. We could regard memory safety as a great success in this area that programmers can easily appreciate.

Then a follow-up question would be what indications we have that other classes of bugs can or can't also be eliminated with help from languages. (Of course there's also potentially a lot of space between "it's inherently impossible to write programs that are incorrect or unsafe in a specific way", which is something that things like memory safety and static typing can help with, and "it's possible for a programmer to do extra work to prove that a specific program is correct or safe in a specific way", which has been a common workflow in formal methods.)

I would be interested in your view of this; after memory safety and perhaps type safety, what kinds of bugs do you think could be eliminated by future language improvements, and what kinds of bugs do you think can't be eliminated that way?

I don't have a clear take on this so much as a "food for thought" take. I think it's obviously good to kill bug classes. But I also think programs built in memory-safe languages are so insecure that nobody should feel comfortable running them; omnia programmae purgamenta sunt.

I remember arguing on Reddit about the value of serious type safety languages, and the notion that they could eradicate things like SQLI bugs (ironically: the place where we still find SQLI bugs? Modern type safe languages, where people are writing raw SQL queries because the libraries haven't caught up yet). I'm both not sold by this and also stuck on the fact that there's already a mostly-decisive fix for the problem that doesn't involve the language but rather just library design.

Setting aside browser-style clientside vulnerabilities, which really are largely a consequence of memory-unsafe implementation languages, what are the other major bug classes you'd really want to eliminate? Scope down to web application vulnerabilities and it's SSRF, deserialization, and filesystem writes. With the possible exception of deserialization, which is an own-goal if there ever was one, these vulnerabilities are a consequence of library and systems design, not of languages. You fix SSRF by running queries through a proxy, for instance.

Thanks! I'll take that as food for thought.

Also, the Latin plural of "programma" should probably be "programmata" (which also works for agreeing with the neuter plurals "omnia" and "purgamenta", whereas "programmae" would be a feminine plural which wouldn't agree with the neuter adjectives). Most often Latin uses the original Greek plurals for Greek loanwords.


I posted the sticker I made with this on it and Ben Nagy took me to task for my 10th grade Latin skill, and I felt like I held up okay! I remember there being a reason programmae worked, but I forget what it was; if I know me, I'll spend 20 minutes later today scouring Twitter for the conversation and then posting a 9-paragraph comment about it here. :)

Even if programmae is plausible, it would have to be "omnes" rather than "omnia" to agree with the feminine plural. (In retrospect I agree that it's going to be "purgamenta" either way because that's a noun, not an adjective.)

If that's the biggest screw-up I made, I'm impressed with how well my Jesuit High School Latin has held up! I got mostly D's in it!

I found your Twitter thread, and I don't think Ben's last comment was accurate!

"programmers go home"? I'm guessing.

> where people are writing raw SQL queries because the libraries haven't caught up yet

SQL is a pretty good way of representing queries, I don't see how a library could be constructed that could do better. But then as an SQL guy I have a hammer so Omens Clavusum Est.

Edit: plural, so clavusuma? clavusa?

All software is trash.

> We could regard memory safety as a great success in this area that programmers can easily appreciate.

No one wants the overhead on current hardware but C can be made memory safe with bounds checked pointers (although not particularly type safe without introducing a lot of code incompatibilities).

It's important to distinguish actual language advantages from indulging the common fetish of wouldn't-it-all-be-better-if-we-rewrote-it.

There has been fairly little rigorous study of programming factors leading to more reliable software (esp in the kinds of environment most software we use is created and used in, as opposed to say aerospace).

Wouldn't it be sad if enormous amounts of effort were spent rewriting everything in a new language, only to discover later that other properties of that language, such as higher levels of abstraction or 'package ecosystems' that pull in millions of lines of impossible-to-audit code to get trivial features, lead to lower reliability than something silly like using C on hardware with fast bounds/type checked pointers?

> No one wants the overhead on current hardware

Well, it depends. That overhead may be quite acceptable in many places. Certainly not 'no one'. However, you've not quantified what that might be. IIRC some guy turned on int overflow/underflow checking and it only cost 3% extra time. Guess that's down to speculation & branch prediction.

I don't like any overhead in time or hardware either, but measure first.

> I'm confident I didn't word my original comment well, because people are obviously reading it to say "memory safety doesn't matter". It does. It's an absolute requirement for secure software development done at any reasonable scale. But it does not produce "secure software".

"Secure software" as an absolute, objective construct doesn't exist, so that's not really an interesting argument.

What's interesting is if it produces software which is more secure, and I think it's a reasonable argument that eliminating memory safety issues does just that.

Yes, I'm puzzled by your posts in this thread. It looks like you are alternating between pushing a Nirvana-fallacy strawman (where "secure" is a binary variable and everything not perfectly secure is trash) and treating "secure" as a spectrum...

One can't defend against unknown unknowns, but mitigating known issues (input validation) and delegating to tools that know better than you (rustc) are both valid strategies to produce software that is more secure upfront (and I know you know it, eh)...

We already changed the "process" in order to eliminate bugs two decades ago...

What security issues have been recently solved by programming languages?

Things like unsafe memory usage are an anomaly. Most other security issues can't even be represented at the PL level.

Every security issue can be represented at the language level. It's just a matter of building a more constrained, domain specific language, that encodes the requisite security properties.

And then it's a matter of begging others to use the said constrained language, because it's a PITA to use :)

Well, yes. What people forget to mention is that security always has a price. That's why less safe languages are more popular.

Less safe languages are not more popular at all. Memory safety was a huge selling point for languages like Java, Python, etc.

The cost in those cases is performance. The cost is offset massively by productivity.

It's less about direct costs and more about constraints and priorities.

LINQ in C# reduces how much SQL people write in C#, which shrinks the attack surface for SQL injection.

It’s incredibly simple to parameterize plain ADO.NET SQL statements as well; LINQ and EF are not required to prevent SQL injection when working in C#/.Net.

I guess I wasn't clear. Most advances in security don't make it much easier to write secure code. They make it hard to write insecure code. And I'm not arguing LINQ makes it much easier to write secure code (that was already pretty easy), but it makes it much harder to write insecure code.

Imagine you tasked two rooms of 100 jr devs to query a database, one with EF, one with ADO.NET. Do you honestly think the same percentage of jr devs would have written code open to SQL injection?

  string queryString =
      "SELECT ProductID, UnitPrice, ProductName from dbo.products "
      + $"WHERE ProductName = {userSuppliedName} "
      + "ORDER BY UnitPrice DESC;";
  SqlCommand command = new SqlCommand(queryString, connection);
  SqlDataReader reader = command.ExecuteReader();

I've met a ton of devs who would write this code without thinking (some even with quite a few years of experience).
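For contrast, here is a minimal sketch of the parameterized form of that query. It uses Python's built-in sqlite3 module rather than ADO.NET, and the in-memory table and the find_products and make_demo_db helpers are made up for illustration; only the placeholder-binding idea carries over.

```python
import sqlite3


def find_products(conn, user_supplied_name):
    # The "?" placeholder makes the driver bind the value as data,
    # so it is never parsed as part of the SQL text.
    query = (
        "SELECT ProductID, UnitPrice, ProductName FROM products "
        "WHERE ProductName = ? "
        "ORDER BY UnitPrice DESC"
    )
    return conn.execute(query, (user_supplied_name,)).fetchall()


def make_demo_db():
    # Hypothetical in-memory table standing in for dbo.products.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products "
        "(ProductID INTEGER, UnitPrice REAL, ProductName TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(1, 9.5, "widget"), (2, 20.0, "gadget")],
    )
    return conn
```

With this shape, a classic injection string such as x' OR '1'='1 is just an unusual product name that matches no rows.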

I agree with your last point. A lot of programming advice boils down to "do the obvious thing to avoid the obvious issue" and sidesteps the point that it's the "unknown unknowns" that really ruin our day.

'Taint mode' à la Perl[1] turns non-validated input into runtime errors, rather than security bugs. Crashing is nicer than being pwned.

"You may not use data derived from outside your program to affect something else outside your program--at least, not by accident."

Bugs still exist (as well as bugs in languages themselves), but languages can help mitigate the surface area.

[1] - https://perldoc.perl.org/perlsec.html
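To make the idea concrete, here is a rough sketch of taint tracking done by hand in Python. This is not how Perl implements it; the Tainted wrapper, TaintError, and the run_report sink are all hypothetical names, and the point is only that unvalidated data fails loudly at the sink instead of flowing through.

```python
class TaintError(Exception):
    """Raised when unvalidated data reaches a sensitive operation."""


class Tainted:
    """Wraps a value that came from outside the program."""

    def __init__(self, value):
        self._value = value

    def untaint(self, validator):
        # The only way to get the raw value out is through a validator,
        # loosely mirroring how Perl untaints via a regex capture.
        if not validator(self._value):
            raise TaintError("validation failed")
        return self._value


def run_report(name):
    # Hypothetical sink: refuses anything still wrapped as Tainted.
    if isinstance(name, Tainted):
        raise TaintError("tainted data reached a sink")
    return f"report-{name}.txt"
```

As the reply below points out, the validator itself can still be wrong; the wrapper only guarantees that some validation step ran.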

The validation's still done by the programmer. If they fail to validate correctly or the validation requirements had a bug, taint mode does nothing to help.

True, they can do an incorrect validation, but at least they can't forget to validate entirely. That's a positive step.

> Aside: I'm really frustrated by the advice to validate all input. That's not useful advice and people should stop teaching it.

Do you realise this reads as "don't bother validating your inputs"? Surely that's not what you meant?

> It begs the question: validate for what?

All inputs necessarily follow some protocol. Command line options and configuration files have syntax. Network packets have sizes & fields. Binary files have some format. So you just make sure your input correctly follows the protocol, and reject anything that doesn't.

Now the protocol itself should also be designed such that implementations are easily correct and secure. In practice, this generally means your protocol must be as simple as possible.

And it's probably a good idea to make sure your parser is cleanly separated from the rest of your program, either by the OS (qmail splits itself in several processes), or by the programming language itself (with type/memory safety).
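A small sketch of what "follow the protocol, reject everything else" can look like, assuming a made-up wire format (1-byte type, 2-byte big-endian length, then exactly that many payload bytes). The Message type, the KNOWN_TYPES set, and the MAX_PAYLOAD limit are illustrative assumptions, not part of any real protocol.

```python
import struct
from dataclasses import dataclass

HEADER = struct.Struct("!BH")  # hypothetical: 1-byte type, 2-byte length
KNOWN_TYPES = {1, 2}           # hypothetical message types
MAX_PAYLOAD = 1024             # hypothetical size limit


@dataclass(frozen=True)
class Message:
    msg_type: int
    payload: bytes


def parse_message(data: bytes) -> Message:
    """Return a Message, or raise ValueError if data breaks the format."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    msg_type, length = HEADER.unpack_from(data)
    if msg_type not in KNOWN_TYPES:
        raise ValueError("unknown message type")
    if length > MAX_PAYLOAD:
        raise ValueError("payload too large")
    # Require an exact match: no missing bytes and no trailing bytes.
    if len(data) != HEADER.size + length:
        raise ValueError("length field does not match payload")
    return Message(msg_type, data[HEADER.size:])
```

The rest of the program only ever sees Message values, which keeps the parser separate from the code that acts on the input.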

That's a silly way to read what I said.

Of course it is. But no matter how I read that sentence, I cannot parse something someone of your calibre could possibly have meant.

Unless I missed something, you couldn't have meant that input validation is useless. You most probably didn't mean that input validation is superseded by some other technique, and therefore best ignored. You couldn't possibly think that everyone already does input validation, and therefore don't need the advice.

But then I wonder. Why is saying "validate all your inputs" not useful? What would you say instead?

Why would I engage in an argument premised on how unreasonable the argument is? You said: "you couldn't have meant input validation is useless". You're right.

Loup is right. No need to be super defensive.... Your initial post reads very wrong. I also was very surprised. It would help if you clarified

I'm not feeling defensive so much as aware that I'm talking to someone whose goal isn't to understand what I was trying to say, and I'm not especially interested in trying to clarify to them.

Look, I know you and I have some history. But I assure you, I'm genuinely trying to understand.

Besides, it's not just about us: other people, (including @carty76ers apparently) would like to know what you would advise instead of "please validate all inputs".

This is a weird and kind of creepy message. I'm reacting to the comments you wrote on this thread, not some personal history you think we have. You keep writing things like "surely this is not what you mean" (and, of course, it isn't) and then continue to argue against the argument you imagine I must not? or must be? making. This doesn't seem productive and I'm not interested in continuing. Sorry.

Sorry, that was uncalled for. I guess that for you, it was Tuesday.

The reason I kept guessing, was because you kept not telling. Until then: https://news.ycombinator.com/item?id=21719065

Finally something I can argue with.

> Aside: I'm really frustrated by the advice to validate all input. That's not useful advice and people should stop teaching it.

This suggests teaching what validation means, not abandoning the idea of validating. In other words, be especially aware that you're crossing a trust boundary when accepting input, and use proper validation algorithms, like proper parsing instead of ad-hoc pseudo-parsing using regexes.

I'm hoping that parser generators get more usable. I completely concur that using parser-generators to create safe parsers that will accept all and only valid input is a way to eliminate a large class of bugs.

But getting developers to use parser generators has been a struggle because I'm finding it hard to give dev team tools they can use and integrate into their build systems. It also doesn't help that most developers have forgotten anything they learned about formal language theory, so it's hard to communicate the value of using parser generators.

But we are getting there. Slowly, but there is progress.

The counter-counterargument is that while you can write bugs in any language, there are a huge number of bugs that simply wouldn't have existed in a safer language. I write Ruby code at work, and every time I find a bug I think, "Would this bug have been possible in Rust?" Most of the time the answer is no, because the bug comes from a typo or from a nil value that shouldn't be there or should have been handled.

Your typos and nil values are only relevant if they somehow lead to security issues. Not impossible... but not the biggest concern either.

There was a bug in a php app where someone had typed `flase` which evaluated to `"flase"` which is truthy so the authentication check passed.
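A rough illustration of that failure mode, written in Python rather than PHP. In Python an unquoted flase would raise a NameError instead of turning into a string, so this sketch passes the string in explicitly; both function names are made up.

```python
def is_authorized_loose(flag):
    # Truthiness check: any non-empty string passes, including a typo.
    return bool(flag)


def is_authorized_strict(flag):
    # Identity check against the boolean True rejects everything else.
    return flag is True
```

The loose check accepts "flase" because a non-empty string is truthy; the strict check only accepts the actual boolean.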

That last point is something I feel really needs to be driven home.

At my company we often do bug bounty and hacking events where we have people attempt to break into our systems. Some of our engineering team members always make a big deal about making sure only certain things are in scope and that we only allow hackers to break things that won't cause major issues.

I always tell them that this is impossible. Sure, we can tell the hackers to only target certain end points, but we have no way of knowing what downstream things will be affected, and if we can't be sure, there is no way the hackers could know.

If we knew for certain what exactly could be broken by the hackers, we would know exactly where our vulnerabilities were. If we know that, why are we paying hackers in the first place?

Sure enough, at one of our events a system waaay out of scope ended up breaking when a hacker triggered a bug on an in scope system. A bunch of our engineers were upset, but the hacker had no way of knowing that his api request would trigger a series of further actions in systems leading eventually to the breakage. Any system connected to an in scope system is automatically going to be in scope.

Are you saying that memory safety is valuable or not? If you do think memory safety is valuable, then I don't see how you're disagreeing with the author. Nobody is claiming that memory safety eliminates all bugs.

Memory safety is valuable! I wouldn't want to implement anything in a memory-unsafe language if I could avoid it, and, thanks to Rust, I can virtually always avoid it now. There's ways to make bugs easier to chain and exploit, and memory unsafety is a big one.

But, while I know nothing is a panacea, memory safety is especially not a panacea. We have over a decade worth of experience with software built principally in memory-safe languages; it's hard to find a modern web application --- increasingly, it's even hard to find a modern mobile application --- built in a language as bad as C. And, as you know, all these memory-safe applications are riddled with vulnerabilities. They don't have memory corruption vulnerabilities, but I'm still game-over'ing app pentests, and very unhappy with myself when I can't.

I'm not sure I get what you're saying here. What do you mean you're "game-over'ing app pentests"? You organize penetration tests and are usually able to expose vulnerabilities when you do this?

Yes, I've been doing software security full-time since 2005, and was a vulnerability researcher prior to that. I speak for basically every software security person when I say that memory safe languages are a positive development, but applications remain susceptible to vulnerabilities that result in total compromise.

Makes sense, thanks. I'm not sure how anyone could suggest that vulnerabilities don't exist if you use a memory safe language. That's just demonstrably/factually not true...

I'm reacting to the actual post, not the comment thread. The post sets out to teach software developers about "software security", and its answer to that problem is "use Rust and validate user input". I mean, using Rust will at least actually help. But it's nowhere close to a synopsis of the whole problem.

It's a single course. I think no one, not even the author, expects a single course to cover enough ground to ensure that students can write secure applications moving forward. For instance, the author has a "formal verification" category on their blog: http://www.pl-enthusiast.net/category/formal-verification/

But they do think (like Daniel Bernstein and Edsger Dijkstra), that the usual "pentest & patch" does not work. http://www.pl-enthusiast.net/2015/09/30/penetrate-and-patch-...

And I'm critiquing the premises of the single course. What's your point?

I've talked to more than one university professor in the last 2 years about what a first and second course in software security should look like, and if there's a theme to my feedback, it's "not like this; this is what people thought it might look like in 1998".

You appear to critique the premise that most security comes from using the right programming language and observing a few patterns, like validating all inputs.

I was saying this is most probably not the premise of the course. That just like you, they don't believe they're teaching a panacea, just something that will help.

That assessment could be wrong, though. Unlike you, I never discussed security with university professors, and I know nothing of their misconceptions. Could you offer a gist of what they teach, and most importantly what they should be teaching instead?

And indeed, most security doesn't come from using the "right" language, nor is "validate all inputs" especially helpful advice (in fact, it's distinctively unhelpful, and is at times actively misleading). You can certainly use the wrong language, and that will harm your security, but for the most part almost nobody writes C anymore.

It's all about bug classes. We can use safety guarantees to eliminate certain bug classes at the language level, but some concepts are so high-level that they are independent of language, and their corresponding high-level bug classes are not preventable with a language-based approach to security.

For example, plan interference always arises in high-level planning, so no language is going to eliminate that bug class without somehow completely reinventing the philosophy of planning and logistics.

We can certainly ask that languages do better around features like FFI, string-building, and error-handling. We keep making the same mistakes. Memory safety is probably the biggest conceptual leap forward in our lifetimes and we are still not yet in agreement as a community that it is clearly good.

"Memory safety is probably the biggest conceptual leap forward in our lifetimes and we are still not yet in agreement as a community that it is clearly good."

It's not that we're not in agreement. It's just become standard for most developers so they don't care about it.

Memory safety is only still relevant in the context where one is using C and C++ and maybe Obj-C. And in that context many do not agree that Rust is the solution for them, even if they appreciate memory safety otherwise.

I don't think memory safety is actually a controversy anymore.

A secure language is any strongly typed language that caters well to static analysis. Enterprise (and up to some point open-source) tools exist that are able to capture a vast range of security issues in code with the help of in-depth static analysis. I'd rather consider Java, .NET, C, or to a limited degree C++ as secure because of this.

Our compilers should long have included static analysis on par with optimization efforts, because - as you say - people tend to produce errors in code. Strong type systems and memory safety are a very nice first step though.

I am certain this will change over the years, but then the trend (that I am also prone to) is to go dynamic for the sake of productivity anyways.

>any strongly typed language...I'd rather consider Java, .NET, C, or to a limited degree C++ as secure because of this.

Java, C# (not all .NET languages) are both quite strongly typed (it would be nice if there was no implicit toString conversion in C#, I think Java has this, too).

F# is a bit better than C# in terms of type system (especially exhaustive pattern matching on discriminated unions leads to a lot more potential type safety).

C or C++? They are happy to implicitly convert a lot of things, they're statically typed, but very weakly so. And in terms of memory safety C strings and arrays are the reason why static analysis is absolutely necessary here.

>Our compilers should long have included static analysis

The Rust compiler is doing a lot of what would be considered static analysis. And if you're dealing with normal strings and arrays you're not prone to C-like buffer overflow behavior. If you run in debug mode then you even get assertions on integer wrapping. This IMHO is also the biggest issue with C/C++. The fact that the compiler still allows so much to go through and only the static analyzer gives you a warning. Big companies rarely have code bases where most of the static analysis warnings are even solved. And it's hard to hire only "disciplined" programmers.

"A secure language is any strongly typed language that caters well to static analysis".

Where is this definition coming from? Seems arbitrary.

Static tools capture the same old range of security issues which is only mostly relevant for C and to some extent to C++ and also Objective-C and unsafe Rust or unsafe Swift.

The API matters enormously. Java is a strongly typed language that is reasonably amenable to static analysis.

Yet the javax crypto libraries are routinely misused. ECB defaults. People fucking up IVs. The whole shebang.

Counter counter point: SQL injections and buffer overflows now require nearly intentional effort to introduce into a codebase, whereas they used to be the default.

The question is: when we find a common class of vulnerability, what's the best way to deal with it?

We find SQLI regularly. We don't find buffer overflows very often! That bug class has been mitigated. It's better to eliminate bug classes, and memory safety is a good idea. But it's a good idea we've had in mainstream software engineering for coming up on 2 decades, and your default assumption about any piece of software should still be that it harbors grave vulnerabilities.

Yes. Buffer overflows were mitigated at the language level. We use languages instead of 'just bytes'.

Buffer overflows are best mitigated at the language level. Other vulnerabilities aren't.

> It begs the question: validate for what?

Functions must only accept input which they can properly act on.

A function serving web pages needs a different notion of String than a database function. There are Strings which are safe for a SQL use but unsafe to serve on a web page, and vice-versa.

Validation is restricting your input to only those things your function can act on.

I'm not saying SQLString is a bad idea. It's a good idea. I'm saying that the strategic advice to "validate input" didn't help you here, because to get to SQLString, you had to know specifically that there was a string-based SQL attack to mitigate. "Validate input" is like saying "know all possible bugs". Obviously, you should mitigate the bugs you know about! You just shouldn't pretend that telling someone to "validate input" is going to accomplish that. It's like telling people "don't have bugs".

What’s your alternative high-level advice then?

For generalist developers who have to get things done: learn at least a little about the major classes of security vulnerabilities that impact your development environment, and select libraries designed to mitigate them.

For students: learn a lot about the most important classes of security vulnerabilities, of which memory corruption is one important example but just one, and then take the time to learn how to exploit at least simple variants of all of them in a realistic setting, to cultivate the mindset needed to think critically about software security.

Don't write anything in C. Sure. But really almost nobody does that anymore anyways.

But it's the typechecker's job to make sure you don't give an SqlCodeString to a function which takes a HtmlCodeString. Surely then a person writing such software isn't validating anything. So what should be validated?

The transition from String to SqlCodeString is the validation.
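Here is a sketch of that idea in Python, where constructing the wrapper type is the validation step. The SqlString and HtmlString classes and the two functions are hypothetical, and the quote-doubling rule is only a stand-in; real code would lean on parameterized queries and a templating library. Python has no compile-time check, so a runtime isinstance test plays the role a static typechecker would play elsewhere.

```python
import html


class SqlString:
    """Text wrapped as a SQL string literal."""

    def __init__(self, raw: str):
        # Stand-in rule: double single quotes, as in standard SQL literals.
        self.text = "'" + raw.replace("'", "''") + "'"


class HtmlString:
    """Text escaped for inclusion in an HTML page."""

    def __init__(self, raw: str):
        self.text = html.escape(raw)


def build_query(name: SqlString) -> str:
    # Only accepts values that already went through SqlString.
    if not isinstance(name, SqlString):
        raise TypeError("build_query needs a SqlString")
    return f"SELECT * FROM users WHERE name = {name.text}"


def render_greeting(name: HtmlString) -> str:
    # Only accepts values that already went through HtmlString.
    if not isinstance(name, HtmlString):
        raise TypeError("render_greeting needs an HtmlString")
    return f"<p>Hello, {name.text}</p>"
```

A plain str, or an HtmlString handed to build_query, is rejected before it can reach the query text.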

The banality of saying "all software has bugs" to begin an argument can only serve to distract. It's software whataboutism.

Yet some languages are extremely helpful in this regard: see for instance F* and the HACL* library.

> all software has bugs

Nope. See (for example) CompCert, seL4, ...

> Aside: I'm really frustrated by the advice to validate all input... It begs the question: validate for what?

If your software asks users to input their age, to validate that, you would discard any input not in the set 0123456789, and any valid input (a number) that is less than 1 or greater than 130.

It's not so much about input validation as it is about sanity checking the input. Is it sane? If so, then you should accept it. It may still be incorrect (user input error.. entered 24 rather than 25) but it should be safe to treat this input as an unsigned 8 bit number and manipulate it as such.

The difficulty is that some inputs have a large, varied set, but even those can be bounded (and are) in the real-world. So if someone enters a first name that is 500 characters long, that should fail sanity checks.
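The age check described above could be sketched like this in Python. The 1 to 130 range and the digit set come from the comment; the function names and the MAX_NAME_LENGTH value are assumptions, since the comment only says that 500 characters is too long.

```python
MAX_NAME_LENGTH = 100  # hypothetical bound; the comment only rules out 500


def parse_age(text: str) -> int:
    # Accept only the ASCII digits 0-9; str.isdigit() alone would also
    # accept other Unicode digit characters.
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError("age must contain only digits 0-9")
    if len(text) > 3:
        raise ValueError("age has too many digits")
    age = int(text)
    if age < 1 or age > 130:
        raise ValueError("age out of range")
    return age


def check_first_name(name: str) -> str:
    # Sanity check on length only; it says nothing about correctness.
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValueError("first name missing or too long")
    return name
```

Anything that survives parse_age fits in an unsigned 8-bit number, which is the property the comment cares about.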

The problem with CS people is that they obsess over edge cases (100% correct and verifiable solutions) sometimes when they should not. I don't blame them for this, as that's a large part of what they were taught to focus on in school.

User input is not an algorithms problem that needs a 100% correct and verifiable solution, it's real-world, can be reasonably bounded and good enough solutions are sufficient. Edge cases can be handled manually and added to the existing solution, too, if they are more common than what they initially seemed.

Two malicious Python libraries caught stealing SSH and GPG keys: https://news.ycombinator.com/item?id=21701488

Raise your hand if you are executing code in your project that was written OUTSIDE your organization, and has never been reviewed by people within your organization.

This is the ham sandwich problem.

If a stranger walked up to you on the street and handed you a ham sandwich would you eat it? I would venture to guess that most of you would NOT. However many of us are all too happy to grab some random chunk of code off the internet and shove it into production without a second thought.

Personal and social information of 1.2B people discovered in data leak https://news.ycombinator.com/item?id=21606415

Is this another ham sandwich from a stranger that has been eaten? How often is the "bug" implicit trust and poor design or default behavior?

What about Spectre? A hardware-level exploit was bound to come up again; we have had hardware bugs before (the 1994 Pentium math bug), but exploitable ones are "fairly rare".


The old mantra "security through obscurity" is true, but it has some serious validity issues when there aren't enough eyeballs on the software we are already running, and we're generating more at a rate where people can NOT keep up! (This doesn't address the fact that we are now building opaque boxes with ML that NO ONE understands accurately).

My favourite comparison for all the 3rd party library use is, if you walked in to the CEO of your company and said

"we want to hire some people with no background checks, no interviews and no idea where they live to write code for our critical line of business application. We're then going to put that code into production without reviewing it, and regularly update it without reviewing the updates. Oh also we won't have any form of enforcable contract as come-back if something goes wrong."

You'd, at best, be laughed out of the room.

Yet that's exactly what pretty much every company does with 3rd party libs.

On the contrary, you'd be asked what the cost benefit analysis was vs doing it in house, then told to continue.

CEOs outsource critical things without meaningful oversight all the time.

The point I was flagging up was the dissonance of corporate hiring policies relative to their use of third party code.

For hiring, no chance you'd get that policy past corporate HR in any large company. Try hiring a developer sight unseen to work remotely with no contract in any large organization and see how well that goes.

Yet companies effectively do just that with 3rd party library use. The reason the CEO doesn't do anything isn't likely to be because they've made an informed risk decision on the topic, it's because no-one is telling them the risks :)

I work for a 10,000+ person public company that just outsourced critical business functions via a 5 year contract that doesn't have any meaningfully enforceable description of the work to be done.

If you're trying to talk about hiring for full time employment as opposed to contract work... what happens there is as long as people are able to get through whatever idiosyncratic hazing process was involved in hiring, they're going to be at the company for at least a year. It's perceived as hard / risky to fire people, even if they can't program their way out of a wet paper bag.

This stuff happens all the time, it's not that different from evaluation and use of third party code. "This project has 500 stars on github, it must be good." "This guy used to work for Google, he must be good." Now you're stuck.

In the first case you still have a contract, and therefore contract law in your country applies. Blatant breach like "they wrote code that stole all our SSH private keys and then they deployed cryptocoin mining software to our systems" would be covered, regardless of how bad the contract is.

Same with hiring, the person may or may not be able to code, but active malice would likely result in firing, and the code they write should be subject to review before being put into production.

Of course you can argue "hey where I work hiring is trash, we write bad contracts and have no internal standards, so this 3rd party stuff isn't much worse" but I'd suggest that's not an argument most companies would make publicly about their processes.

Whether they'd make the argument publicly or not, it's still true that it is the reality, and the company I currently work for is better than a lot I've seen.

Point taken about active malice... but I've also seen companies with mostly in-house code cover up instances of rootkits on production servers and malfeasance related to credit cards.

I'd rather companies use third party open source crypto, for instance, even if it sometimes gets compromised, because it's a lot more likely to come to light.

It comes down to incentives. I maintain that the problems for a system occur in proportion to how far the user is from the customer.

In the sandwich case, the user is the customer. You wouldn't eat the sandwich, because it might make you sick.

In the software library case, the developer sees the benefits of using a library, but rarely sees any penalty if it goes bad. They'll earn the same paycheck, and probably even work the same hours. They might not even be the same team that has to deal with security issues in production. They get the benefit of saving time, and accept none of the responsibility.

This may be a field where future machine learning algorithms might shine: Code vetting. It would be hard but much more interesting than classifying cat videos. The amount of code involved in software projects will not decrease and if we don't vet it ourselves, we need something that does it for us. Even if we wanted to vet code ourselves, we might stumble over complexity. Will my software fail in interesting ways if I use library module X this way?

Security issues, like other emergent system properties, can arise at any layer of the stack.

While code level issues should absolutely be a focus in the SDLC, it's common to find security issues crop up from:

* Hardware, kernel, OS, package, and library vulnerabilities

* Component integration / API contract misunderstandings

* Transitive trust between services and third parties

* Accumulation of access over time

* Demos, hotfixes, and workarounds that are somehow now mission critical

Even poorly designed business rules create a huge number of security issues. The whole stack could be perfectly bug-free and you’d still get those.

It can get worse. How about deliberately designed features that are security bugs? I'm looking at Microsoft's "sure, we'll execute any email attachment that the user clicks on, because that's more convenient!". Implementation language wasn't going to save you there...

Credit card numbers as customer identifiers on printed and emailed documents? Seen it.

That can be summarized as: unexpected interaction between independently secure parts.

Just because you've written secure components in a safe language doesn't mean you don't have security issues when you run them together.

I loved the perl taint mode. In this mode, perl understands what data originates outside the program and puts restrictions on how you use it.
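For anyone who hasn't used taint mode, the idea can be roughly approximated in other languages with a wrapper type. This is a minimal Python sketch, not how Perl implements it: `Tainted`, `untaint`, and `lookup_user` are hypothetical names made up for illustration. The only way to get a plain string out is to match the whole value against a pattern, which loosely mirrors Perl untainting data through regex captures.

```python
import re


class Tainted:
    """Wraps a value that originated outside the program."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        # Refuse implicit conversion so tainted data cannot slip into a sink.
        raise TypeError("tainted value used without validation")


def untaint(value: Tainted, pattern: str) -> str:
    """Return the plain string only if the whole value matches the pattern."""
    match = re.fullmatch(pattern, value._raw)
    if match is None:
        raise ValueError("input rejected by pattern")
    return match.group(0)


def lookup_user(username: str) -> str:
    """Hypothetical sensitive sink: accepts plain strings only."""
    if not isinstance(username, str):
        raise TypeError("sink requires an untainted str")
    return f"looking up {username}"


user_input = Tainted("alice42")
print(lookup_user(untaint(user_input, r"[a-z0-9]{1,16}")))
```

Unlike Perl, nothing here is enforced by the interpreter; a determined programmer can still reach into `_raw`. The point is only that the unvalidated path becomes the awkward one.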

Perl also has strict mode, to tighten up your programming. Without it you don't have to declare variables; with it on, variables must be declared before use.

More languages should help programmers like this - sort of a ladder to go beyond just low hanging fruit.

Python now has MyPy, a static type-checker which can be tuned to be more or less strict. Pretty cool to see what can be bolted on to a dynamic language.
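A small illustration of what the checker buys you (the function names are made up for the example). As written, this file should pass mypy; delete the `None` check and mypy rejects the string concatenation, because `find_user` may return `None`. Flags such as `--strict` or `--disallow-untyped-defs` tighten what it accepts.

```python
from typing import Optional


def find_user(user_id: int) -> Optional[str]:
    """Return the user name, or None if the id is unknown."""
    users = {1: "alice"}
    return users.get(user_id)


def greeting(user_id: int) -> str:
    name = find_user(user_id)
    if name is None:
        # Without this branch the type checker flags the line below,
        # since `name` could be None at that point.
        return "hello, stranger"
    return "hello, " + name


print(greeting(1))
print(greeting(2))
```

The annotations are ignored at runtime, so the same file runs with or without the checker, which is what makes the gradual tightening possible.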

Perl has various levels of bolted-on type-strictness too AFAIK, I don't have much experience with it due to being stuck on an older version professionally.

I personally think it's a great idea. Loose types can get a POC or even early production models up and running quickly while you're changing your opinions on the data every hour, then once it grows to a size that's hard to reason about and static analysis can really pay off, start tightening it up.

We need to kill the cliche that language/framework doesn't matter, but we also need to understand that it still won't solve all problems.

I wrote about a similar topic from a web application security point of view: Why Framework Choice Matters in Web Application Security* ( https://www.netsparker.com/blog/web-security/why-framework-c... )

Also today there is enough data in the industry to prove this argument beyond any doubt for web applications.

* The original article was written about 11 years ago or so; this is a republished version.

I disagree with the premise that Software Security is a language issue.

There are a lot of basics in this post.

However, this is really only a small fraction of the issue in application security.

Just thinking back to some major breaches recently in the headlines, we have

  * failure to update (Equifax)
  * unencrypted backup files (Adobe breach of long ago)
  * A long-ago root-level compromise of 90 servers of a giant bank.
  * The Target breach: vendor access to network
  * Numerous breaches related to improper setup of AWS

Also, two interesting SSL/TLS vulnerabilities, GOTOFAIL and Heartbleed, had nothing to do with anything language design can address. In fact, someone illustrated how to make the same error in Rust (and promptly got downvoted).

A good view of front-line security problems is addressed in https://www.youtube.com/watch?v=_4vSurKPl6I (Attack Oriented Defense.)

I have audited applications in many languages, including C, Clojure, .NET, Java, Perl, and Ruby. Vulnerabilities found did not relate to langsec at all.

To me, the headline is misleading. Security is part of the daily programming job, not just the job of security specialists. But as a programming language feature, I’m not so sure. Certainly some languages present a larger attack surface than others (through generic pointers or pointer arithmetic, for example), but in general, security is a product of process, not a feature per se. It’s naive to believe that when I program in a certain language, my program will automatically be secure (or at least more secure than programs in other languages).

The programming language is part of your process. To put a finer point on it, your type checker is collaborating with you on a process that results in better security than you would have otherwise.

The article starts out talking about a programming languages course, then tries to justify including security therein. I think they're really just hammering home the PL-specific case of your point that "security is part of the daily programming job".

The defender's choices are all like this: you can fix some aspects with each, none are sufficient alone. By choosing a non-footgun you eliminate some classes of problems. It's still important to do it.

Agreed that post-hoc security cannot work very well. Hope it becomes more and more obvious industry-wide.

One thing I do to practice "continuous security", is to accompany each code review with an additional code review, solely focused on security. Else it's too many balls to juggle when doing a general-purpose code review.

Do you know of similar simple yet effective techniques?

> Why teach security in a programming languages course? Doesn’t it belong in, well, a security course?

Because most software developers have no idea what security is. By most I mean almost all. This point is easily proven. Ask any software developer what security is and compare their answer against the standard answer. All security courses and certifications I have seen define security in exactly the same way.

If people wanted to take security seriously in software they would train their developers on security or require that they be security certified.

> I believe that if we are to solve our security problems, then we must build software with security in mind right from the start.

Yes, but clearly most organizations don't take security seriously. Instead they bolt it on at the end just enough to appease the corporate attorneys the same way they do for accessibility or any other necessary requirement whose absence results in class action lawsuits.

> It turns out the defense against many of these vulnerabilities is the same, at a high level: validate any untrusted input before using it

It should be noted that type safety itself does not solve this problem. For external input, you need explicit schema validation or the type needs to be enforced at the protocol level using something like Protocol Buffers with implicit schema validation.
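To make that concrete, here is a minimal sketch in Python of explicit schema validation at the boundary. The `Transfer` type and its fields are hypothetical, chosen only for the example. The annotations on the dataclass are not enforced at runtime, so the check has to be written out before the typed value is built.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    account: str
    amount_cents: int


def parse_transfer(raw: bytes) -> Transfer:
    """Validate untrusted JSON against an explicit schema, then build the type."""
    try:
        doc = json.loads(raw)
    except ValueError as exc:
        raise ValueError("not valid JSON") from exc
    if not isinstance(doc, dict) or set(doc) != {"account", "amount_cents"}:
        raise ValueError("unexpected fields")
    account, amount = doc["account"], doc["amount_cents"]
    if not isinstance(account, str) or not account.isalnum():
        raise ValueError("bad account")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("bad amount")
    return Transfer(account, amount)


print(parse_transfer(b'{"account": "acct1", "amount_cents": 250}'))
```

Note that `Transfer("acct1", "250")` constructs without complaint, which is the point being made: the declared type says nothing about what actually arrived over the wire.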

I think that security has nothing to do with the programming language and everything to do with the developers who are writing the code.

This excellent presentation was referenced:

Secure Design: A Better Bug Repellent Christoph Kern, IEEE SecDev '17


It's ironic that the post would say security is a PL issue, but not address capability-safe PLs like E, Pony or Caja…

While programming languages could definitely be more secure, programmers need to put security first. Currently it's like giving a five year old a gun and saying go play.

> Software Security Is a Programming Languages Issue

Then, how could a programming language help me prevent high-level security bugs like Shellshock? Is it even possible/practical?

Shellshock was a bug in the language implementation so this is an interesting question. It'd need some background on how the bug came to be.

Security is the study of control. LangSec and PL have their role to play, but, as others have noted, cannot be expected to guarantee security.

This post isn't really about "LangSec", and I'd ask, in the years and years we've had "LangSec", what have its major contributions been? I find it hard to pin down, and not especially instructive, but I'm prepared to be schooled.

Mainly its contribution has been the linguistic study of bugs, and the recognition of the importance of well defined recognizers as opposed to shotgun parsers. That is the contribution to theory. In practice, people still write shotgun parsers, because the incentives to avoid them are not there. The market does not reward security. Also, if a programmer has an incentive to alloc() for their program to function but no incentive to check that all control paths free only once, you are left with memory bugs. I think that PL can help with that, but LangSec cannot.

I don't follow. Parsers for what? I'm never clear what they're talking about. Parsers for programming languages? You don't think modern languages are competently parsed? Parsers for file formats?

> Parsers for file formats?

Yes. As a trivial example, consider a UDP packet format { u16 type,size; u8 data[size]; } fed the 4-byte packet {ECHO,0xFFFF}, which is incompetently parsed as {ECHO,"<65535 bytes of stack memory>"} because the 'shotgun parser' assumes its input is well-formed. Whereas a 'recognizer' (i.e. a non-buggy parser) would reject that packet as 65535 bytes too short.

Good programming language design can make it harder to write the buggy parser and easier to write the 'recognizer', especially if the language/standard library provides built-in parsing tools.
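A rough Python sketch of the two approaches for the packet format described above. Network byte order and the numeric value of `ECHO` are assumptions for the example; the function names are made up.

```python
import struct

HEADER = struct.Struct("!HH")  # u16 type, u16 size; network byte order is assumed
ECHO = 1  # hypothetical type code


def shotgun_parse(packet: bytes):
    """Trusts the size field and never compares it with what actually arrived."""
    msg_type, size = HEADER.unpack_from(packet)
    data = packet[HEADER.size:HEADER.size + size]
    return msg_type, size, data  # callers will assume len(data) == size


def recognize(packet: bytes):
    """Accepts only packets whose total length matches the header exactly."""
    if len(packet) < HEADER.size:
        raise ValueError("truncated header")
    msg_type, size = HEADER.unpack_from(packet)
    actual = len(packet) - HEADER.size
    if actual != size:
        raise ValueError(f"size field says {size}, but {actual} bytes arrived")
    return msg_type, packet[HEADER.size:]


evil = HEADER.pack(ECHO, 0xFFFF)  # the 4-byte packet {ECHO, 0xFFFF}
print(shotgun_parse(evil))        # size and data disagree
try:
    recognize(evil)
except ValueError as err:
    print("rejected:", err)
```

Python slices clamp instead of reading out of bounds, so the shotgun version here only returns a size that disagrees with the data. The same logic in C would copy whatever sits next to the buffer.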

See, I get the idea, but to me it's just like saying "decent code is better than shitty code". Well, I mean, no shit.

I'm not being snarky. Compare "LangSec" to memory safety, which also kills this class of bug dead. Which approach is more powerful and forecloses on more bug classes? Which approach requires more developer effort? Introduces more jargon?

I know multiple very smart, capable people who work under the rubric of "LangSec". But I just don't get it. Is it a real thing?

Actually I was just answering the "parsers for what" question.

FWIW, I think LangSec is saying "Code that doesn't have remote code execution vulnerabilities[0], or limits them to a weak computational model[1], is better than code with RCE vulnerabilities." - which is also "Well, no shit." - and "Parsing a nontrivial data format is the same thing as executing a (not-necessarily-)very constrained programming language."[2] - which seems obvious to me, but could plausibly be a "superpositions don't collapse"-level epiphany for someone who doesn't think about parsing the right way.

0: such as javascript or stack execution

1: like FSMs or pushdown automata

2: with the implication that you had better make sure it actually is very constrained

It's a real thing: http://langsec.org/papers/langsec-cwes-secdev2016.pdf

There are whole classes of errors related to programs that parse then validate input when it's already too late. And often the validation happens in the source code in a cloud of checks that happen at run time. It's rather difficult to verify these programs.

It is much easier to verify a parser that only produces valid values at the edge of a program, isolated from the main program.

Yes, parsers for incoming packets and file formats, for example. If they are poorly written, you end up with Heartbleed, for example. I should maybe add that the LangSec folks try to provide formal footing for what an adversary can achieve with a given set of primitives. They refer to it as programming a weird machine. The reason that is important is that their view is that access to computational capability amounts to privilege. If you have a packet coming in from the untrusted outside, you should not allow it to be parsed by a parser with unbounded computational complexity, rather you should opt for a computationally limited parser.

Could you be more specific about how Heartbleed was a parsing problem?

I'm familiar with the LangSec lingo and the concept of a "weird machine", much as I hate the term itself, has value. But it's not a product of LangSec so much as a name for a concept we've had for decades.


In Heartbleed the parser parsed an incoming field specifying length, but failed to correlate it with the rest of the request. Had the parser been written to a stricter specification, that would not have happened. Typically that is what is meant by bounding computational complexity. Or at least that is my understanding of it.
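A simplified simulation of that failure in Python. This is not the OpenSSL code: the message layout is reduced to a 1-byte type and a 2-byte claimed length (padding is omitted), and process memory is modelled as one byte string with a made-up secret sitting next to the request.

```python
import struct

HDR = struct.Struct("!BH")  # 1-byte type, 2-byte claimed payload length

# The request claims a 64-byte payload but carries only 2 bytes.
request = HDR.pack(1, 64) + b"hi"
# Simulated memory: unrelated data happens to follow the request buffer.
memory = request + b"SECRET-PRIVATE-KEY-MATERIAL" + bytes(64)


def heartbeat_buggy(mem: bytes, record_len: int) -> bytes:
    """Echoes `claimed` bytes; record_len is ignored, which is the bug."""
    _type, claimed = HDR.unpack_from(mem)
    return mem[HDR.size:HDR.size + claimed]


def heartbeat_checked(mem: bytes, record_len: int) -> bytes:
    """Discards the message unless the claimed length fits inside the record."""
    if record_len < HDR.size:
        return b""
    _type, claimed = HDR.unpack_from(mem)
    if HDR.size + claimed > record_len:
        return b""  # claimed length contradicts the record: drop it
    return mem[HDR.size:HDR.size + claimed]


print(heartbeat_buggy(memory, len(request)))    # leaks the neighbouring bytes
print(heartbeat_checked(memory, len(request)))  # empty: message discarded
```

The check is a single comparison between two lengths the receiver already has, so whether it counts as a parsing fix or as ordinary input validation is largely a matter of framing.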

I'm familiar with the bug, but not with the parser-theoretic response to the bug. Would LangSec somehow do away with the on-the-wire length encoding? Or would it simply say "your parser should check the length of incoming data"? Isn't that about as useful an insight as "validate user input"?

You raise a good point. The thing to realize is that the incoming packet data, like all data, is code. "Code written for which machine?" you might ask. Well, think of that data as code that is executed by the parser. If you think of the parser as the machine (yes, it is weird), then it becomes easier to see how the adversary tries to program it using data as code. Just like we advise people never to eval() untrusted data, the LangSec response is not just to validate user input, but to constrain the programmability of the machine. By lowering its ability to compute, you are effectively lowering the privilege given to the adversary. So, the LangSec answer would be to design the protocol so as to not require a parser that can become so easily confused by conflicting information. Hope that helps. FWIW, I think you raise a great point about the ease of use and practical utility of LangSec, and agree that its potential is thus capped.

As another example, you should never unpickle untrusted data. The machine for parsing pickles is too powerful.
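A harmless demonstration of why, as a Python sketch. `Payload` stands in for an attacker-built object, and `eval("6 * 7")` stands in for arbitrary code. The restricted unpickler follows the `find_class` override pattern from the standard library documentation; the class and function names are made up here.

```python
import io
import json
import pickle


class Payload:
    """Attacker-controlled object: tells pickle to call eval() on load."""

    def __reduce__(self):
        return (eval, ("6 * 7",))  # harmless stand-in for arbitrary code


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval runs during loading; result == 42


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callables can be reached."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Load a pickle that may contain only plain containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, "a", {"k": 2}])))  # plain data still loads
try:
    restricted_loads(blob)
except pickle.UnpicklingError as err:
    print("rejected:", err)

# A weaker format sidesteps the issue: JSON can only describe data.
print(json.loads('{"k": [1, 2]}'))
```

Removing global lookups is one way of lowering the power of the machine the input gets to program; switching to a format like JSON lowers it further.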


I don't understand how that's a problem with a parser. A parser is a tool for turning a flat buffer into structured data. The heartbleed problem described there is that a person wrote code that read outside of the buffer it was meant to read. Why aren't they independent problems?

Also, how can you design an efficient data format that accepts data of a length determined by the user (hopefully cooperatively with the server) and that is immune to buffer overflow reads?

Once you say "I want between four and one thousand bytes", haven't you just stuffed yourself?
