But writing reliable C++ code that works under adversarial conditions is very hard; hard enough that the best C programmers in the world have track records like "only one theoretically-exploitable memory corruption vulnerability in the release history", and quite a few people who set out with "no vulnerabilities" in their project charter end up with way worse records.
I've found over the last couple years that one way to get C/C++ code to fit into a web application is via "nosql" databases, particularly Redis; let something like Ruby/Rack or Python/WSGI or Java Servlets soak up the hostile HTTP traffic, and use it to drive an async API over Redis.
The less C/C++/ObjC code you have talking to an adversary, the better off you are.
I'm a C/C++ programmer; C is my first language, from when I was a teenager. I respect C programmers. But if I was interviewing a developer and they told me their gut was to write a C program to do something a Rails program would normally do, and it wasn't an exceptionally specific circumstance, I would think less of them.
I, too, would look at someone strangely if they told me they were going to write a C application where I'd use a Rails one, but security certainly wouldn't be the first reason on my mind.
As a postscript, I really like the idea of putting C/C++ apps behind a message bus, as decoupled from the web end as possible. I've had great luck using C++ for performance-critical services behind a Rails frontend talking to Redis (I've also used AMQP via RabbitMQ, but I found that to have a high enterprise brokered pain to value ratio).
You should be concerned about the quality of your language runtime.
MRI, for instance, has had many memory corruption flaws that were plausibly exposed to hostile input. When security is a priority, I advise using JRuby (it helps that JRuby is better than MRI anyways).
But either way: language runtimes for mainstream languages are high-priority targets. Your C code is not. You will not learn the worst problems in your C code until long after it's been deployed.
And wouldn't the runtime itself be a substantially higher-value target for attackers?
That depends, but relying on security through obscurity isn't usually a very good choice.
relying on security through obscurity isn't usually a very good choice.
Camouflage paint works for tanks.
Sure, if you can afford to throw the same number of man-years (of both developers and white-hat hackers) at your proprietary codebase as are thrown at the runtime of a popular language, then great, you can have your cake and eat it too, just like the tank builders.
Since most people can't afford that, they have to choose between camouflage paint and armor. I don't know about you, but I'd rather be in the bulletproof tank than in the one built with balsa wood, regardless of its paint.
By definition, if we're talking about a tank, that's merely one layer of many. Obscurity can be a fine one layer of many. It had better not be the layer you are relying on, though.
This is true. But I think it is reasonable to expect a good C/C++ programmer who already understands web security to have the mental model to write secure code in (say) Ruby.
Yes - popular runtimes are some of the most heavily attacked pieces of code around. This has benefits as well as costs...
I presume you mean qmail 
I'm sorry forgive my ignorance can you explain me a bit more what you mean by "async api over Redis"? I'm always genuinely interested in understanding good patterns especially given your experience in security. Thanks!
This gets you a number of benefits: separation of the front-end logic and the back-end logic; better scalability, since there may be a bunch of workers distributed among different machines; and security, since the C++ programs aren't as worried about unvalidated input because their input comes from the front-end.
1. A request comes in
2. Request handler parses the request
3. Handler determines which Queue the request should go into based off the URL
4. Request handler queues the request along with a reply-to address so that whoever handles it can get back to it
5. Request handler waits for the response
There are then multiple workers living possibly on other machines listening on the queue. They handle the request and return a response to the original request handler and pull the next bit of work off the queue.
I like this because I feel like it is rather robust. I use a STOMP message queue which is very trivial to hook up other languages to. It is fast enough for my needs. It lets me do simple things like specify how many queued items a handler can handle concurrently. My web app is then broken into components that each run independently. They can run in the same process or be split into separate processes or even across computers. My web app is not particularly high demand but we run it on fairly light resources so the queuing also keeps our app from becoming overwhelmed if a lot of requests happen at once. They just get queued and a handler will get to it when it can.
For example, a Ruby interpreter or a Java runtime that you trust to handle all your HTTP requests is predominantly written in C/C++.
I think what makes popular packages like Ruby/Java/Rails (etc.) more secure is the sheer number of users they have. Those technologies have been hammered out over several projects and by a plethora of users and developers. Writing a component that rivals that number of interactions is tough, but certainly doable.
C programs are susceptible to memory corruption. Programs written in practically every mainstream high level language are not susceptible to those problems (until they start using third-party C extensions). That's the security win of not using C code.
From that page: "Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array."
I guess my point is that the tools/libraries/frameworks on top of the language are what make it useful or not useful, independent of the language itself. For example, writing a web app in Ruby may not help you against SQL injection (http://en.wikipedia.org/wiki/SQL_injection) unless you have a well designed query language on top of that.
Everything is "certainly doable" in a turing-complete way, but that fact has not mattered at all in the evolution of programming languages.
It doesn't matter if it's "certainly doable", what matters is how easy it is.
I invite you to continue coming up with examples of ways to reliably mitigate memory corruption flaws in C/C++ programs, because I enjoy this topic very much, but as your attorney in this matter I have to advise you that you're going to lose the argument. :)
One better alternative to a shared_ptr is a noncopyable shared pointer type. You have to explicitly copy it with a call like
Really this should be a noncopyable version of intrusive_ptr, not shared_ptr. Either the object is owned by references on multiple threads and you'll want to be careful about what thread you destroy it from, and perhaps then you'd want to send a release message over some message queue system, or it's a single threaded object and you don't need the overhead of atomic memory operations.
You're certainly making a valid point; however, as far as how important this is, the experience of a lot of people out there points in the other direction.
Consider: the lifetime of a Python object is essentially the same as that of a C++ dynamic object owned by a shared_ptr. But you don't see Python programmers complaining that they can't figure out when their objects are going away. In Java it's even worse; an object's lifetime is pretty much whatever the JVM thinks it ought to be. I have seen complaints about this, but not many, and the JVM's reputation as a platform for serious languages remains pretty strong.
On the other hand, memory leaks in C (and C++) programs have been a major thorn in all our sides for decades.
So, yes, when you get assured destruction by using an object whose lifetime is managed for you, you do lose something. But the experience of programmers all over strongly suggests to me that, for most applications, what you get, is much more valuable than what you lose.
Actually, here is your choice: either you'll have to manage every kind of resource except memory (garbage collected languages), or you'll have to manage only memory (C++).
That's because I'm talking about C++ and you've somehow decided to talk about something unrelated. Python and Java programmers still care about all resources that aren't object lifetimes.
Both C++ and ObjC have a string class and containers in the standard library, and support for some automatic memory management in the language. This turns out to make a big difference in practice in reducing those vulnerabilities. There are people in this thread claiming that they do as good a job in reducing those vulnerabilities as using Java or Ruby or Python. I can't really evaluate that claim, but it seems plausible to me. Barely.
* std::string (or NSMutableString) eliminates the stdlibc strxxx() vulnerabilities --- iff you use them exclusively. But lots of C++ code (and, especially, ObjC code) drops to char-star strings routinely.
* Most C++ code still uses u_char-star for binary blobs. ObjC has (to its credit) NSMutableData, but there's still u_char-star handling code there too (I also feel like --- but can't back up with evidence --- ObjC code is more likely to call out to C libraries like zlib).
* Both C++ and ObjC have error-prone "automatic" memory management: shared_ptr and retain/release, respectively. shared_ptr is risky because every place it comes into contact with uncounted pointers has to be accounted for; retain/release because it's "manu-matic" and easy to make mistakes. In both cases, you can end up in situations where memory is released and pointers held to it, which is a situation morally equivalent to heap overflows.
No, I don't think C++ and ObjC do an equivalent job in reducing memory corruption flaws. The MRI Ruby interpreter has had memory corruption issues (it being a big C program itself), but Ruby programs never have memory corruption issues (except in the native C code they call into). C++ and ObjC programs routinely do.
In any case, my pointier response to you is, "allocation lifecycle and use-after-free", which doesn't care how many layers of abstraction you wrap your pointers in.
"Irredeemably"? No, just very, very expensively. I suppose I should thank them.
* Language hipsterism
* Being disturbed by modular code
* Dismissing high-level code that might have leaky abstractions
* Plain CGI
* Being turned off by callbacks and reinventing the wheel instead
* The usual silliness about all tools being capable and therefore equal (they're all Turing complete, yes, but we still want to know which ones are more productive for some use case.)
This is grumpy posturing and C++ is now blub. Am I being trolled?
Disclaimer: I worked with the OP and she is a ferociously productive engineer.
Dismissing newer technologies as "shiny" instead of evaluating their merits*
Or, evaluating technologies based on how useful they are rather than the amount of hype they are generating
Huh? I don't even know what this means.
Being disturbed by modular code*
The OP was disturbed at the thought of building a program by gluing together "modules". In this sense she is using "module" to mean (essentially) black-boxes of unknown origin and quality. If you had ever seen her code you would have a hard time convincing anyone that it wasn't modular.
The OP says she hates the design of cgic, not that she's against callbacks. She made a practical decision that cgic didn't do enough for her to warrant the pain of using it, and discovered that creating something that she actually liked using wasn't that much work.
To put it more generally, she is saying that - given the choice between using some existing code that isn't quite right (or outright sucks for some reason or another) and taking some time to roll her own, she has found that it is often worthwhile to spend a little time to create something that she knows works and that she enjoys using.
The OP said no such thing. She said that she has found a language she is highly productive in (C++) and she hasn't yet seen a newer "shinier" language that would make enough of a difference to warrant switching to it.
Right, but then she goes on to mention libxml2, which for someone unfamiliar, could be a "black box of unknown origin and quality". In this case, of course, I wouldn't dare criticize libxml2: it's an impressive piece of software with a great track record and excellent test coverage. But I just happen to know that. To someone who doesn't, they'd have to learn. Just as, for someone who doesn't know anything about some Ruby XML library, they'd have to learn.
So basically her argument boils down to "I'm fine using modules written in my language of choice that I already know about, but I don't feel like learning about new modules written for a different framework".
Regardless, C/C++ is just not a safe language when writing code that has to survive adversarial conditions. Maybe the OP always writes perfect, vulnerability-free C code (though I doubt it), but most people do not. Advocating a language that doesn't have some kind of memory protection for web apps just strikes me as irresponsible.
A buffer overflow or use-after-free() in C can let an attacker run arbitrary code on your server. A buffer overflow in Java throws an exception and terminates the program (or is handled gracefully). Not saying that there aren't other classes of vulnerabilities, but using a language/runtime such as Java or Ruby eliminates some of the trickiest sources of security bugs.
It does sound like she's ferociously productive. It's also possible that she could be ten times as productive and that C++ is holding her back. The programmer credo is to work hard and work smart.
> I have no practical experience with them
Casual dismissal without evaluation of merits.
> it's all been linked to frothy people clamoring for the latest shiny thing. I've seen that pattern plenty of times before, and it's always annoying.
This is what I mean by hipsterism: Dismissal based on an excited fanbase.
> that somehow you would need to abdicate almost all of your code to modules where you just hope they work.
I had a problem with this sweeping generalization. If cgic isn't great, that's fine. Certainly you still have to evaluate your dependencies' merits.
> that creates an awful design where you give up main() to that program's code and then just wake up when it decides to call back to you
This did read like an indictment of callbacks as a whole, but I see now that she's not happy about libraries that hijack main() and have some secondary entry point. I concede that.
> It might not be wonderful, but you could build something relatively quickly, at least in theory.
High-level languages have proven productivity gains and known pitfalls. Here, she is hand-waving them away without giving them a chance.
> Better still, since I'm not chasing the shininess in that realm, I can spend those cycles on other things.
> she hasn't yet seen a newer "shinier" language that would make enough of a difference to warrant switching to it.
As far as I can tell she didn't even give them a fair chance. She dismissed them based on supposed hype and blubbiness.
I would think it's more about trying not to dismiss older technologies just because of the merits of something newer.
Being disturbed by modular code
The C++ code described in the article sounded pretty modular to me...
As I see it, it's the old low-level vs high-level discussion all over again. I'm sure you have seen it before.
Low-level programmers argue that they have control and know how everything is working. High-level programmers argue security, problem isolation, modularity (i.e. not only using your own code!), readability, maintainability.
I used to work for a company that used C (not C++) for its large web site. They ran the entire site from a monolithic C file. To say again, the code for the entire web site (including product listings, shopping cart, coupons, etc.) was in one .c file. To make things more "interesting," it was, at the time, controlled by a single developer who didn't work on site and wouldn't let anyone else touch his code. This barrier was backed up, as I understood, by both tradition and management.
The initial decision for C was, I think, the correct one. The site had been around from the early days of the web, and speed was important. However, the architecting and personnel decisions didn't keep up with the company's growth. Another consequence was that switching from, for example, spacer gifs to CSS (which came along later) took a great deal of developer time and testing, as did adding new features. It's the trade-off I think we all understand well today: fast code or fast developers.
It was interesting to see the changes while I was there—the site began (slowly) to get recoded in Java, and broken into more manageable chunks at the same time. So far as I know, the whole thing is now in Java.
Also, if I had to guess, I suspect another factor in the transition was that C programmers were getting more expensive and harder to find.
It's just bloody inefficient to get things done in! Coding in it requires ginormous amounts of text shovelling, typing attempts to be strict, template meta-programming requires hand-coding type inference schemes, and on top of that, the library/package management is a nightmare.
Well, I didn't think these harsh things when I 'left' C++, but as I worked with more high-level languages, I realized that I was doing more, with less code and dynamic duck-y typing, and with nice libraries only a `cpan` away.
So when I hacked together a C++ program a year or so ago, I got punched in the face by all these issues. It was a pain. So I said, okay, this is stupid. I need to use C for low-level work like drivers, and use Common Lisp for other things. Like what everyone else does.
Fundamentally, C++ has a number of flaws, of course - that's typical for a pragmatic language - but the key flaw in my opinion is that it's a pain to get higher-level stuff done in until you build the libraries that other language constructs/libraries give you out of the box.
 For a very small subset of everyone else.
In a single request, the RoR part 'does' about two dozen 'things', which takes it around 2 seconds to complete.
The C++ part 'does' around 1.2 million 'things', which takes it between 0.1 and 0.2 seconds to complete. Building the component that does 1.2 million 'things' is simply impractical in Ruby.
Admittedly, very few web applications have this type of requirement, but perhaps in the future many more will. The two dozen or so 'things' the RoR does would take a LOT of code in C++. Well, compared to RoR it would be a lot of code. And the 1.2 million 'things' that the C++ application does would be absurdly difficult to program in Ruby, and as mentioned, would take an absurdly long time to execute.
So at the risk of sounding cliché, perhaps it is less a matter of Ruby vs C++, and more one of using the right tool for the job. Once that decision has been made, programmer productivity has more to do with the programmer than the language. I am extremely productive in C++, even though I have to write a lot of code to be productive. If I was a LISP expert, I would be ridiculously productive, but only after 10 years of learning and gaining experience in LISP. Most projects don't have 10 years for you to become productive in the language of choice.
 Yes, I am exaggerating. It would only take 9 years to become highly productive in LISP.
That being said, I never felt particularly inclined to write the whole things in C++.
Agreed, however one of the problems we are encountering is passing messages efficiently between RoR and the C++ components. We're currently using XML until a more efficient, but no less flexible, method presents itself.
Personally I think the debate should be less focused on which language to write your web application in, and more on improving the glue between different languages so in developing web applications we can have the best of all possible worlds.
And this distrust of others' libraries is odd. You're already running on an OS that's providing millions of lines of code to you in the form of APIs and services, and using a compiler that's going to do all sorts of modifications to what you've created at the machine code level. You're already well into trusting a lot of code that's not yours. But when it comes to something a million times simpler (parsing HTTP headers), now they're worried about other people's code? Seriously?
The one thing is, if someone does write C web apps, having your own set of libraries can be useful exactly then. The web has all these crazy encodings and protocols, and knowing exactly which way your library does them is really useful when you use a language with no safety net or safety switch or anything.
Spent the afternoon figuring out, sort of, the URL encoding of reddit's "api" and how it interacts with Qt's QUrl class.
You do know that the compiler isn't allowed to change the behaviour of your code, right?
(If it did, it would be a bug in the compiler, and C compilers are pretty damn mature.)
The medium selects those who are both loudest and have an adequate factual basis. Peppering statements with "In my limited experience," makes you seem meek, unfortunately. Blame the culture of needing traffic and thriving on controversy. It feels like it is a race to the bottom. :(
But students tend to interpret that as weak and muddled, and prefer profs who "tell it like it is", even if that means in a fairly biased and opinionated way. It seems they particularly like it when you make black-and-white statements that would be controversial, roughly equivalent to "there's a debate on this but you don't need to know about the other viewpoint because it's wrong".
Having said all that, it does seem hard to find intellectual honesty on the internet, but I had more trouble finding a decent newspaper in London so it's not an isolated problem.
It is the same reason why there was conflict in the Robber's Cave experiment between the two groups. As a species we group together with people we mentally associate with and have a natural prejudice against any external force.
I've just started looking into writing Nginx modules, which are generally written in C. Now that I've started to understand it, I can write secure content handlers in a reasonable time with reasonable effort, while being able to use the exhaustive libraries on offer, as well as my experience with the language. Couple this with great performance, and I can't justify writing apps in PHP anymore. We're moving more and more processing to the client, as we have realised that networks are pretty slow. Well then, how do you justify writing slow code behind the slow network?
I highly encourage the author to look at writing Nginx modules, possibly not on a hackathon, since you get security and fantastic performance for free!
Then there's the library/tools situation. I did all the work for my honours dissertation in Ruby in part because the Ruby ecosystem is vast and vibrant (sometimes inconveniently so).
I imagine that the C++-for-the-web ecosystem is going to be a bit more spartan.
In C++ you can use STL classes for buffers that are automatically resized as appropriate, streams for IO, etc. So you'd effectively have the same resizing and safe objects to work with as in a higher-level language.
As far as I see it both C++ and Ruby have a similar amount of libraries that can be used, they both have a major web framework, Rails for Ruby and OKWS for C++, and there's probably a plethora of smaller frameworks and helpers available for each language.
Rails may seem like it would be winning (versus C++) on the ecosystem, but that's because there's so many hip gems for Rails out there, but if you take a look at all libraries it probably won't be so.
Proper use of carefully-written classes will reduce the risk of a buffer overflow -- but you will need to exercise additional vigilance ("Make sure you use STL! Why isn't this code using STL?" etc.), over and above addressing the problem, to ensure string safety.
That's a dead loss that programmers in memory-safe languages simply don't have to pay for the same level of confidence.
Essentially a string is std::string, a resizable buffer is std::vector<unsigned char>, and for a string or a resizable (or unknown-size) buffer to be anything else you need to provide a very good reason.
I would also be careful with Ruby since a lot of (performance specific) gems and plugins for Rails are written in C, and hence potentially suffer the same problems that one is trying to avoid by using Ruby. Or at least you end up in the same boat if you were using C or C++ to begin with.
For extremely expensive values of "free."
Claiming that some languages solve these problems for you, for free, is disingenuous.
I pointed out that languages with safe string handling and memory management have a higher security baseline. As in, what you start with, for "free".
Achieving comparable string and memory security in C/C++ is an expense. You must take active steps to prevent those exploits; and even then human error means that you will have a higher risk than the "free baseline" languages.
Avoiding a technology by applying a frivolous standard like this is a good way to miss out on useful learning experiences.
"Doing" is great, but stretching your boundaries and trying new things outside of your comfort zone is an essential part of being a well rounded programmer.
I've used plenty of things used by "foamers" out of necessity. I don't enjoy it, but it does get the job done.
An unpleasant subset of a technology's userbase has no relationship to the usefulness or learning experiences relating to the technology and is of no relevance to the well-rounded, life-long learner.
Being distracted by the who in technical pursuits is a burdensome impediment against learning the valuable whats and whys.
Does that help?
This should have been the beginning, middle, and end.
I agree that the reliance of Rails apps on gems is unfortunate. They're rediscovering DLL hell for themselves. But the original DLL hell was all C++ apps. You can shoot yourself in the foot with over-reliance on libraries in any language.
Firstly, std::unordered_map is officially part of C++ now, and it's been available in all major compilers for a long time. Secondly, there's no substantial evidence provided to support the claim that the module system in a dynamic language is "more flexible" - anytime you want to include someone else's code in your source tree in any language you can just drop the source in as though it was your own. Whether there's a "package manager" to hide the references for you is beside the point; it's not as if the concept of shared libraries is lost upon C and C++.
And "ahem" does not establish a valid argument as to why dynamic type checking is "easier" (whatever that even means). In fact, I believe a strong argument could be made that dynamic type checking is the worst aspect of dynamic languages, and the interest that Haskell has been brewing up lately would tend to support this. Static type checking brings bugs to the forefront when the developer is in the room instead of the user. One could argue that runtime reflection provides more flexibility and thus makes dynamic languages "easier", but it's a stretch to extend that to dynamic typing.
I'm unsure of how DLL Hell would even apply to a SaaS application, unless we're referring to completely different things; you have complete control of your application's environment in a web app. If there's a package already installed that you don't want or something is missing, you remove it or install it. With the prevalence of virtualization and virtualization-as-a-service, there's no reason to be trying to run two different applications that require different dependencies on the same virtual box.
I've worked in C++ for ten years, but I think scripts in perl or python are more convenient for many tasks. I haven't met anybody who disagreed about something so basic, so I'm not sure how to respond. I use unordered_map all the time, but I wouldn't claim it's the same as support for literal hash-tables in the language. Implicit typing is useful if I am happy to allow my tasks to die at runtime because I screwed up. After all, C++ apps often segfault when I screw up.
In particular, it seems worthwhile, if for example one is well versed in say Ruby, to learn C to get at the root of what is occurring "under the hood". I really enjoy Ruby, but I'm learning C to fully immerse myself in what is occurring with the bits and bytes (or at least the bytes).
> "C programs are susceptible to memory corruption." (tptacek)
Yes, they are, because in C you can do memory corruption, in many other languages you can't (even if you'd want to).
But where do these corruptions most likely occur, when speaking in the context of web applications? Yes, in I/O and string operations. And all of these can be mitigated with reasonably "safe" classes -- by which I mean not a home-brew string class, but something like the STL (which has proven stability).
However, is memory corruption the only security risk? In my opinion, an average C/C++ programmer creates more secure code than an average PHP programmer -- just because a C programmer is used to the intrinsic security issues, while the PHP coder won't produce a buffer overflow by not validating input, but will leave e.g. XSS or SQL injection holes.
Writing a web application in C without preparing for safe I/O & string operations is as bad as writing dirty script code in PHP/Perl/Ruby/...
At my company we've written a really big web application (a hosting control panel) completely in C/C++, but for other reasons than execution speed: the runtime dependencies of a sellable web application are pure horror. Never-ending CPAN dependencies in Perl, incompatible function changes in PHP, and so on. With a monolithic app (web server & application logic all-in-one) you just need a libc - that's all. Easy to roll out, and thus easy to sell. :)
Having written a C(++) web-app (using CGI), I'm ok with writing in C++ (it's not that bad), but the advantages of using a language more suited for web development are not negligible. The author would probably find this out the hard way (if/when (s)he gets to making stuff).
I don't say that because I think it makes me special -- it's what we ALL used to do because that's the only thing we had.
After you've done that for a few years, you become VERY efficient at banging out code.
A lot of us did what this article mentioned - but probably in the 1990s when desktop programmers were moving to the web. Most probably settled on Perl after trying C and finding development was much faster with Perl.
> that somehow you would need to abdicate almost all of your code to modules where you just hope they work
Well, you are using jQuery, jQuery UI, jQuery Player and jQuery Scroll-into-view in your example project, that is actually a big pile of modules if you ask me.
www.google.com remains C++ to the best of my knowledge, see http://web.archive.org/web/20110708015633/http://panela.blog...