Writing C++ code that works reliably in a benign setting is not a big deal. With the right libraries, it is almost as easy and perhaps only a little slower than writing in a high-level language; you can't, for instance, really believe that everyone who writes a popular iOS application is a solid bare-metal C programmer.

But writing reliable C++ code that works under adversarial conditions is very hard; hard enough that the best C programmers in the world have track records like "only one theoretically-exploitable memory corruption vulnerability in the release history", and quite a few people who set out with "no vulnerabilities" in their project charter end up with way worse records.

I've found over the last couple of years that one way to fit C/C++ code into a web application is via "NoSQL" databases, particularly Redis; let something like Ruby/Rack or Python/WSGI or Java Servlets soak up the hostile HTTP traffic, and use it to drive an async API over Redis.

The less C/C++/ObjC code you have talking to an adversary, the better off you are.

I'm a C/C++ programmer; C is my first language, from when I was a teenager. I respect C programmers. But if I was interviewing a developer and they told me their gut was to write a C program to do something a Rails program would normally do, and it wasn't an exceptionally specific circumstance, I would think less of them.




In this worldview, wouldn't the C programmers writing your language runtime have the same poor track record when it comes to security? And wouldn't the runtime itself be a substantially higher-value target for attackers?

I, too, would look at someone strangely if they told me they were going to write a C application where I'd use a Rails one, but security certainly wouldn't be the first reason on my mind.

As a postscript, I really like the idea of putting C/C++ apps behind a message bus, as decoupled from the web end as possible. I've had great luck using C++ for performance-critical services behind a Rails frontend talking to Redis (I've also used AMQP via RabbitMQ, but I found that to have a high enterprise brokered pain to value ratio).


They do have a poor track record with the language runtime.

You should be concerned about the quality of your language runtime.

MRI, for instance, has had many memory corruption flaws that were plausibly exposed to hostile input. When security is a priority, I advise using JRuby (it helps that JRuby is better than MRI anyways).

But either way: language runtimes for mainstream languages are high-priority targets. Your C code is not. You will not learn the worst problems in your C code until long after it's been deployed.


Linus' Law. The language runtime is shared between thousands or millions of users and has many more contributors than your single project, hence any big security bugs it might have had are probably fixed by now, or at least will be fixed far faster than you could fix yours.

And wouldn't the runtime itself be a substantially higher-value target for attackers?

That depends, but relying on security through obscurity isn't usually a very good choice.


Parrot alert!

relying on security through obscurity isn't usually a very good choice.

Real-world alert!

Camouflage paint works for tanks.


In the "real-world", camouflage paint isn't used instead of heavy armor, which is what is being proposed (using a much less tested piece of code instead of a well known runtime).

Sure, if you can afford to throw the same number of man-years (of both developers and white-hat hackers) at your proprietary codebase as are thrown at the runtime of a popular language, then great, you can have your cake and eat it too, just like the tank builders.

Since most people can't afford that, they have to choose between camouflage paint and armor. I don't know about you, but I'd rather be in the bulletproof tank than in the one built of balsa wood, regardless of its paint.


Relying.

By definition, if we're talking about a tank, that's merely one layer of many. Obscurity can be fine as one layer of many. It had better not be the layer you're relying on, though.


So do very thick firewalls.


In this worldview, wouldn't the C programmers writing your language runtime have the same poor track record when it comes to security?

This is true. But I think it is reasonable to expect a good C/C++ programmer who already understands web security to have the mental model to write secure code in (say) Ruby.

And wouldn't the runtime itself be a substantially higher-value target for attackers?

Yes - popular runtimes are some of the most heavily attacked pieces of code around. This has benefits as well as costs...


> "only one theoretically-exploitable memory corruption vulnerability in the release history",

I presume you mean qmail [1]

[1] http://cr.yp.to/qmail/guarantee.html


Yes; qmail had a (disputed) LP64 integer overflow.


> I've found over the last couple of years that one way to fit C/C++ code into a web application is via "NoSQL" databases, particularly Redis; let something like Ruby/Rack or Python/WSGI or Java Servlets soak up the hostile HTTP traffic, and use it to drive an async API over Redis.

I'm sorry, forgive my ignorance, but can you explain a bit more what you mean by an "async API over Redis"? I'm always genuinely interested in understanding good patterns, especially given your experience in security. Thanks!


Not the GP, but the general principle is to use Redis as a task-queueing system. The front-end puts a task into a Redis queue. One or more C++ programs are waiting on the queue(s) and execute the given task (like a large DB insert). If results are needed, they can be communicated back to the front-end. The front-end can poll for the result or use pub/sub messaging.

This gets you a number of benefits: separation of the front-end logic and the back-end logic; better scalability, since there may be a bunch of workers distributed among different machines; and security, since the C++ programs aren't as worried about unvalidated input, because their input comes from the front-end.
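A minimal sketch of what such a worker can look like in C++ using the hiredis client (this is illustrative, not anyone's production code; the "tasks" queue and "result:example" key are made up):

    // worker.cpp -- build with: g++ worker.cpp -lhiredis
    #include <cstdio>
    #include <string>
    #include <hiredis/hiredis.h>

    int main() {
        redisContext* c = redisConnect("127.0.0.1", 6379);
        if (c == nullptr || c->err) {
            std::fprintf(stderr, "could not connect to redis\n");
            return 1;
        }
        for (;;) {
            // Block until the front-end RPUSHes a task onto the "tasks" list.
            // The reply is a two-element array: the key name and the value.
            redisReply* task = (redisReply*)redisCommand(c, "BLPOP tasks 0");
            if (task == nullptr) break;  // connection dropped
            if (task->type == REDIS_REPLY_ARRAY && task->elements == 2) {
                std::string payload(task->element[1]->str, task->element[1]->len);
                std::printf("got task: %s\n", payload.c_str());
                // ... do the expensive work here (e.g. the large DB insert) ...
                // Hand a result back for the front-end to poll or subscribe to;
                // a real task would carry its own reply key in the payload.
                redisReply* ack = (redisReply*)redisCommand(
                    c, "RPUSH result:example %s", "done");
                if (ack) freeReplyObject(ack);
            }
            freeReplyObject(task);
        }
        redisFree(c);
        return 0;
    }

The worker never sees an HTTP request; it only ever sees whatever the front-end chose to put on the queue, which is the security property being described.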


The Redis interface is also so simple that it's very easy to hook up C code to it, and Redis is somewhat "typed", which reduces the amount of parsing you have to do.


Can you recommend any particularly good resources/tutorials or further information on this? Thanks!


The web app I have been working on has the same architecture.

1. A request comes in
2. Request handler parses the request
3. Handler determines which queue the request should go into based off the URL
4. Request handler queues the request as well as how whoever handles it can get back to them
5. Waits for response

There are then multiple workers living possibly on other machines listening on the queue. They handle the request and return a response to the original request handler and pull the next bit of work off the queue.

I like this because I feel like it is rather robust. I use a STOMP message queue, which is very trivial to hook other languages up to, and it is fast enough for my needs. It lets me do simple things like specify how many queued items a handler can handle concurrently. My web app is then broken into components that each run independently; they can run in the same process, be split into separate processes, or even run across computers. My web app is not particularly high-demand, but we run it on fairly light resources, so the queuing also keeps our app from becoming overwhelmed if a lot of requests happen at once: they just get queued, and a handler will get to them when it can.
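To tie it back to the Redis discussion above, the request-handler side of this pattern looks roughly like the following. This is a made-up sketch using hiredis and an invented reply key; my actual setup uses STOMP, but the shape is the same: enqueue the task along with where to answer, then block on the reply key.

    #include <cstdio>
    #include <string>
    #include <hiredis/hiredis.h>

    int main() {
        redisContext* c = redisConnect("127.0.0.1", 6379);
        if (c == nullptr || c->err) return 1;

        std::string reply_key = "reply:42";   // made-up request id
        // The payload carries both the work to do and where to send the answer.
        redisReply* r = (redisReply*)redisCommand(
            c, "RPUSH tasks {\"work\":\"resize-image\",\"reply_to\":\"%s\"}",
            reply_key.c_str());
        if (r) freeReplyObject(r);

        // Wait (with a 30-second timeout) for whichever worker took the task.
        redisReply* resp = (redisReply*)redisCommand(
            c, "BLPOP %s 30", reply_key.c_str());
        if (resp && resp->type == REDIS_REPLY_ARRAY && resp->elements == 2)
            std::printf("worker said: %s\n", resp->element[1]->str);
        if (resp) freeReplyObject(resp);

        redisFree(c);
        return 0;
    }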


I'm not sure I understand why one particular language would lend itself to more vulnerability than another. The less skilled someone is at a particular language, the more bugs/vulnerabilities he is likely to produce. It is a function of technical skill rather than a quality of the language.

For example, the Ruby interpreter or the Java runtime that you trust to handle all your HTTP requests is predominantly written in C/C++.

I think what makes popular packages like Ruby/Java/Rails (etc.) more secure is the sheer number of users they have. Those technologies have been hammered out over several projects and by a plethora of users and developers. Writing a component that rivals that number of interactions is tough, but certainly doable.


There's one (mainstream) C Ruby that needs to be audited. But every C CGI program needs to be audited.

C programs are susceptible to memory corruption. Programs written in practically every mainstream high-level language are not susceptible to those problems (until they start using third-party C extensions). That's the security win of not using C code.


"I'm not sure I understand why one particular language would lend itself to more vulnerability than another."

http://en.wikipedia.org/wiki/Buffer_overflow

From that page: "Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array."
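As a deliberately contrived illustration of that: nothing in the language checks that the input fits in the 16-byte buffer below, so a longer argument silently overwrites whatever sits next to it on the stack (saved registers, the return address), which is exactly what exploit payloads target.

    #include <cstring>

    int main(int argc, char** argv) {
        char buf[16];
        if (argc > 1) {
            std::strcpy(buf, argv[1]);  // no bounds check: a 40-byte argument
        }                               // happily overwrites whatever follows buf
        return 0;
    }

The equivalent copy in Java or Ruby would raise an exception or simply grow the string; it cannot turn attacker input into an overwritten return address.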


I would call that a programmer error. The language certainly does make it harder to write "safe" code, but it is certainly doable.

I guess my point is that the tools/libraries/frameworks on top of the language are what make it useful or not useful, independent of the language itself. For example, writing a web app in Ruby may not help you against SQL injection (http://en.wikipedia.org/wiki/SQL_injection) unless you have a well designed query language on top of that.


Everyone calls it programmer error. But when you make the same error of copying arbitrary-sized inputs from attackers into a Java program, you do not enable that attacker to upload their own code into the JVM process and run it.


But doesn't the use of Java's JNI invalidate any security the JVM offers? As far as I know, any protections the JVM puts up are invalidated once you inject native code, which could enable an attacker to inject malicious code that hijacks the JVM. Then again, one could argue that a program using JNI is no longer a "Java" program.


Yes, when you write C code and attach it to JVM processes, that puts the JVM process at risk. More C code, more problems.


"""I would call that a programmer error. The language certainly does make it harder to write "safe" code, but it is certainly doable."""

Everything is "certainly doable" in a Turing-complete way, but that fact has not mattered at all in the evolution of programming languages.

It doesn't matter if it's "certainly doable", what matters is how easy it is.


Things like memory safety by default do make some languages safer than others.


Yes; however, there are libraries that can mitigate some of these issues: e.g., boost::shared_ptr in C++.


No, Boost shared_ptr exacerbates the issue by creating a second regime of reference counting that, if contravened anywhere in the program (for instance, in any third-party piece of library code, which every C++ program of any real size is replete with), creates use-after-free conditions.
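A contrived but minimal example of that failure mode (every name here is made up, but the shape is common): a library that only sees a raw pointer adopts it into its own shared_ptr, and the object dies under the first owner's feet.

    #include <memory>

    struct Widget { int x = 0; };

    // Stand-in for third-party code that only sees a raw pointer and
    // (incorrectly, but commonly) adopts it into its own shared_ptr.
    void library_call(Widget* w) {
        std::shared_ptr<Widget> second_owner(w);  // second, independent refcount
    }                                             // count hits zero: w deleted here

    int main() {
        auto first_owner = std::make_shared<Widget>();
        library_call(first_owner.get());
        first_owner->x = 42;   // use-after-free: the Widget is already gone
    }                          // first_owner deletes it again: double free

No libc string function, no malloc(), and the program still has exactly the class of bug that exploits are built on.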

I invite you to continue coming up with examples of ways to reliably mitigate memory corruption flaws in C/C++ programs, because I enjoy this topic very much, but as your attorney in this matter I have to advise you that you're going to lose the argument. :)


After doing some research on the topic, I will have to say that I concede my point. ;-)


I agree with you on the gist of your argument (I think), but there are 'ways to reliably mitigate memory corruption flaws in C/C++ programs'. For example, using std::string rather than malloc()'ing a char* every time you do something that works with strings is certainly a way to reliably mitigate memory corruption flaws in C++.


True as far as it goes; std::string is safer than libc strings. If all your program does is manipulate strings, and not marshaled binary data structures or protocols, and your data structures are simple and you're very careful with your iterators (which themselves often decompose to pointers) and your object lifecycles are simple enough that you can reliably free things and know you're not going to accidentally touch the memory later, and-and-and, you can write a safe C++ program.
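To make the iterator caveat concrete, here's a tiny contrived example with no raw pointers, no malloc(), and no libc string functions anywhere, that still ends up reading through a dangling pointer:

    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> names{"alice", "bob"};
        auto it = names.begin();    // effectively a pointer into the vector's buffer
        names.push_back("carol");   // may reallocate; 'it' now dangles
        std::string first = *it;    // read through freed memory: undefined behavior
        return first.empty() ? 0 : 1;
    }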


Every boost::shared_ptr I see is a cringe-inducing experience. It's not just the atomic memory operation that happens whenever you copy it; it's the programmer who thought that he could just put things in a boost::shared_ptr and it would solve his problems. Now the code is less readable, because you don't know what the lifetimes of your resources are! The worst thing is when they get shared across threads, and suddenly you don't know what thread your object's going to be destructed on.

One better alternative to a shared_ptr is a noncopyable shared pointer type. You have to explicitly copy it with a call like

    x.copy_from(y);
That this makes the use of reference counted objects more verbose and uncomfortable is not a downside.

Really this should be a noncopyable version of intrusive_ptr, not shared_ptr. Either the object is owned by references on multiple threads, in which case you'll want to be careful about what thread you destroy it from (perhaps you'd want to send a release message over some message queue system), or it's a single-threaded object and you don't need the overhead of atomic memory operations.
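For what it's worth, here is a rough sketch of the idea. The type and method names are invented (this isn't Boost or any real library), and it uses an external atomic count for brevity where the intrusive, non-atomic variant would have the same shape:

    #include <atomic>
    #include <utility>

    template <typename T>
    class explicit_shared_ptr {               // hypothetical name
    public:
        explicit_shared_ptr() = default;
        explicit explicit_shared_ptr(T* p)
            : ptr_(p), count_(p ? new std::atomic<long>(1) : nullptr) {}

        // No implicit copies: passing one around by value is a compile error.
        explicit_shared_ptr(const explicit_shared_ptr&) = delete;
        explicit_shared_ptr& operator=(const explicit_shared_ptr&) = delete;

        // Moves transfer ownership without touching the count.
        explicit_shared_ptr(explicit_shared_ptr&& o) noexcept
            : ptr_(std::exchange(o.ptr_, nullptr)),
              count_(std::exchange(o.count_, nullptr)) {}

        // The only way to create an additional owner: explicit and greppable.
        void copy_from(const explicit_shared_ptr& other) {
            if (this == &other) return;
            release();
            ptr_ = other.ptr_;
            count_ = other.count_;
            if (count_) count_->fetch_add(1, std::memory_order_relaxed);
        }

        T* operator->() const { return ptr_; }
        ~explicit_shared_ptr() { release(); }

    private:
        void release() {
            if (count_ && count_->fetch_sub(1, std::memory_order_acq_rel) == 1) {
                delete ptr_;
                delete count_;
            }
            ptr_ = nullptr;
            count_ = nullptr;
        }

        T* ptr_ = nullptr;
        std::atomic<long>* count_ = nullptr;
    };

    struct Session { int id = 0; };

    int main() {
        explicit_shared_ptr<Session> a(new Session{});
        explicit_shared_ptr<Session> b;
        b.copy_from(a);                        // visible creation of a second owner
        // explicit_shared_ptr<Session> c = a; // would not compile
        return b->id;
    }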


> Now the code is less readable, because you don't know what the lifetimes of your resources are!

You're certainly making a valid point; however, as far as how important this is, the experience of a lot of people out there points in the other direction.

Consider: the lifetime of a Python object is essentially the same as that of a C++ dynamic object owned by a shared_ptr. But you don't see Python programmers complaining that they can't figure out when their objects are going away. In Java it's even worse; an object's lifetime is pretty much whatever the JVM thinks it ought to be. I have seen complaints about this, but not many, and the JVM's reputation as a platform for serious languages remains pretty strong.

On the other hand, memory leaks in C (and C++) programs have been a major thorn in all our sides for decades.

So, yes, when you get assured destruction by using an object whose lifetime is managed for you, you do lose something. But the experience of programmers all over strongly suggests to me that, for most applications, what you get is much more valuable than what you lose.


At the risk of sounding like flamebait, it's because Python and Java developers don't know what they are missing without deterministic destruction. Of course there are ways to 'code around' it, but knowing the exact lifetime of objects is very often very useful, and often makes for much easier-to-understand code and easy ways to avoid resource leaks.
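As a small, made-up example of what deterministic destruction buys (the file path is hypothetical): the handle is released at the closing brace on every path out of the scope, including early returns, with no cleanup code at the call sites.

    #include <cstdio>

    // Minimal RAII wrapper: the destructor runs at a known point (end of
    // scope), so the handle cannot leak and cannot outlive its user.
    class File {
    public:
        explicit File(const char* path) : f_(std::fopen(path, "r")) {}
        ~File() { if (f_) std::fclose(f_); }
        File(const File&) = delete;
        File& operator=(const File&) = delete;
        std::FILE* get() const { return f_; }
    private:
        std::FILE* f_;
    };

    void read_config() {
        File cfg("/etc/example.conf");  // hypothetical path
        if (!cfg.get()) return;         // early return: destructor still runs
        // ... read and parse ...
    }                                   // cfg closed here, deterministically

    int main() { read_config(); }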


try..finally blocks have 80% (or more?) of the advantages of deterministic destruction without the costs, especially for avoiding resource leaks.


What about objects that are referenced by more than one object and which are linked to resources? try...finally blocks are just a different way of freeing your resources at the end of a block, and they won't help with objects that outlive the block they are guarding.

Actually, here is your choice: either you'll have to manage every kind of resource except memory (garbage collected languages), or you'll have to manage only memory (C++).


Yes, that's true, but at the cost of syntactic noise. It's a preference, and one gets used to it, I guess, but to me all the try blocks are harder to read than code where variables have a fixed lifetime and where, in many cases, you can avoid the extra indent level.


> You're certainly making a valid point; however, as far as how important this is, the experience of a lot of people out there points in the other direction.

That's because I'm talking about C++ and you've somehow decided to talk about something unrelated. Python and Java programmers still care about all resources that aren't object lifetimes.


After doing some thorough research on the topic, I would like to say that I concede my point about the vulnerabilities of C/C++. Thanks tptacek.


Happy to help! :)


While I mostly agree with your primary point --- that C and C++ are extremely prone to memory corruption vulnerabilities --- I think there's an important distinction you're glossing over here, between C and C++/ObjC.

Both C++ and ObjC have a string class and containers in the standard library, and support for some automatic memory management in the language. This turns out to make a big difference in practice in reducing those vulnerabilities. There are people in this thread claiming that they do as good a job in reducing those vulnerabilities as using Java or Ruby or Python. I can't really evaluate that claim, but it seems plausible to me. Barely.


Some things to remember:

* std::string (or NSMutableString) eliminates the stdlibc strxxx() vulnerabilities --- iff you use them exclusively. But lots of C++ code (and, especially, ObjC code) drops to char-star strings routinely.

* Most C++ code still uses u_char-star for binary blobs. ObjC has (to its credit) NSMutableData, but there's still u_char-star handling code there too (I also feel like --- but can't back up with evidence --- ObjC code is more likely to call out to C libraries like zlib).

* Both C++ and ObjC have error-prone "automatic" memory management: shared_ptr and retain/release, respectively. shared_ptr is risky because every place it comes into contact with uncounted pointers has to be accounted for; retain/release because it's "manu-matic" and easy to make mistakes. In both cases, you can end up in situations where memory is released and pointers held to it, which is a situation morally equivalent to heap overflows.

No, I don't think C++ and ObjC do an equivalent job in reducing memory corruption flaws. The MRI Ruby interpreter has had memory corruption issues (it being a big C program itself), but Ruby programs never have memory corruption issues (except in the native C code they call into). C++ and ObjC programs routinely do.


"Why do C++ folks make things so complicated?": http://www.johndcook.com/blog/2011/06/14/why-do-c-folks-make...


Kind of orthogonal. You'd face the same risks writing in C as in C++.


Well, since I started using memory-management classes exclusively, and never raw pointers again, I have coded lots of C++ and never experienced any memory issues (leaks or access violations). I think C++ has issues much, much worse than memory management.


Wow is it ever not my experience that C++ programs that avoid raw pointers don't have memory corruption issues. Note that my job involves looking at other people's C++ code, not just having opinions about my own, so I'm a bit more suspicious than most people.


Therefore your experience is that compilers and/or mainstream libraries are irredeemably broken, right? Because I can't think of any reason why your code should wreak havoc as long as you are following your language's and your libraries' guidelines and safe-code practices. And yes, I agree that's a PITA: I've ditched C++ for that reason. However, to me, as long as my profiler didn't show any memory leaks and there were no crashes, I assumed everything was fine. Maybe I was blessed in having discovered and mastered Design by Contract. And AFAIK, the Python and Ruby interpreters are written in C... what makes them safer than average applications, then?


"Best practices" is a no-true-Scotsman argument. Any example I come up with of something that blows up a C++ program is going to contravene some best practice somewhere. A long time ago someone said buffer overflows were easy to avoid, and so I did a little math, counting up the revenues of companies that sold products that mostly mitigated the impact of buffer overflows. Billions and billions of dollars, lost to "inability to count numbers of bytes".

In any case, my pointier response to you is, "allocation lifecycle and use-after-free", which doesn't care how many layers of abstraction you wrap your pointers in.

"Irredeemably"? No, just very, very expensively. I suppose I should thank them.



