Now, I know there are claims that C# and Java are performance-competitive with C++. You see some of that in the benchmark game at http://shootout.alioth.debian.org, which shows a wide variety of results for simple problems. Take one interesting one, the n-body problem, which involves a lot of hard-core computation: is the winner C++, Java, or .NET? Well, http://shootout.alioth.debian.org/u32/performance.php?test=n... says that the winner is . . . Fortran!! (Not my favorite language either.) g++ comes out at a factor of 1.3, the best Java at 1.5, and the best C# at 2.1. That is, the best C# program takes 2.1 times as long to compute this problem as the fastest program.
The stackexchange post said, "But in this era, the performance of a program written in a language based on frameworks such as C# and Java can be pretty close to that of C++." A factor of 1.6 (the best C# at 2.1 against g++ at 1.3) is not pretty close, in my book. If you are in high-speed markets and, all other things being equal, you place your order at 1.6 ms when the other guy places his at 1.0 ms, what does that mean? It means you are further down in the order book. You lose.
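To make the order-book point concrete, here is a minimal sketch, assuming a simplified price-time priority queue where orders at the same price are filled strictly in arrival order. The function name and the rival latencies are made up for illustration; only the 2.1 and 1.3 factors come from the numbers quoted above.

```python
def queue_position(my_latency_ms, rival_latencies_ms):
    """Return the 1-based position in a FIFO queue at one price level.

    Assumes strict time priority: every rival that arrives no later
    than we do is ahead of us (ties go to the rival).
    """
    ahead = sum(1 for t in rival_latencies_ms if t <= my_latency_ms)
    return ahead + 1


# Slowdown of the best C# entry relative to the best g++ entry.
slowdown = 2.1 / 1.3

rivals = [1.0, 1.2, 1.5]  # hypothetical competitor latencies, in ms

print(round(slowdown, 2))           # 1.62
print(queue_position(1.6, rivals))  # 4: everyone else got there first
print(queue_position(0.9, rivals))  # 1: front of the queue
```

Real matching engines are more involved than this, but the shape of the argument is the same: a constant-factor slowdown turns directly into a worse place in line.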
So we all know that these toy benchmarks don't really represent what happens in a large, useful program. It would be interesting to build a more extensive benchmark set, don't you think?
Having worked on a very high-performance stock options feed, I can share some of my experience. The development goes something like this. You start out in C++, because you get objects and other useful stuff. You see that during peaks of the day, say around 0900 CST, you are beginning to fall behind. So you begin to tune.
See, lots of people decry the C++/C combination, saying that they are different languages. Well, sort of. If you are of a theoretical or dogmatic bent, sure. But if you have a C++ program that is taking too long, you can relatively easily, bit by bit, turn it into a C program. So I am fine with the C/C++ designation in practice.
You tune this thing, making sure that you allocate your objects at, say, 0829 and tend to leave them there until 1500. Then you up the -O level, hoping you are not pulling a Heisenberg. If you still need a little headroom, you learn that you can turn off exceptions in g++ (-fno-exceptions). Yes, even though you never throw an exception, leaving them enabled costs you CPU. (What were they thinking?) So while the compile flags and the source file extension say C++, what you are executing more closely resembles C.
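The "allocate at 0829, leave it there until 1500" idea is basically an object pool. Here is a rough sketch of the pattern, assuming a fixed-size pool of order records; the class and field names are hypothetical. Python is used only to show the shape of it. In the feed described above this would be preallocated storage in C or C++.

```python
class OrderPool:
    """Fixed-size pool: allocate everything up front, reuse it all day."""

    def __init__(self, size):
        # All allocation happens here, before the market opens.
        self._free = [{"symbol": None, "price": 0.0, "qty": 0} for _ in range(size)]
        self.capacity = size

    def acquire(self):
        # No new records are created on the hot path; fail loudly if we run dry.
        if not self._free:
            raise RuntimeError("pool exhausted: size it for peak load")
        return self._free.pop()

    def release(self, order):
        # Reset the record and hand the same object back for reuse.
        order["symbol"], order["price"], order["qty"] = None, 0.0, 0
        self._free.append(order)


pool = OrderPool(2)
first = pool.acquire()
pool.release(first)
print(pool.acquire() is first)  # True: the same record is reused
```

The design choice is to trade memory (sized for the worst peak) for predictable latency, since nothing is created or freed while the market is open.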
But there are those who fervently say that in such an environment Java is competitive with C/C++. If you look at the cost to build the program, I am likely to agree that that part of the effort is faster with Java. But can you tune Java as much as C++? Or .NET? I suspect not.
I think we should have a more broad-based real-life example. I am thinking that a simulated financial exchange repeatedly implemented in competing languages might be a more interesting example. In fact, I think I will go off and give that a try. Say maybe Lisp (note that it whips Java server in some examples, take that!), Java, C++ and certainly not Fortran (due to personal prejudices).
I'll let you know how that turns out.
But is execution speed the be-all and end-all? Do flexibility and robust recovery not matter?
Everything I'm hearing is that it's better to gut the car down to the frame with no safety measures than to have even a little bit of coverage. If that program crashes or cross-wires some data, how much damage can it do to your financial position?
From what I understand, billions of dollars are on the line every day, but everyone is racing towards the bottom of that nanosecond mark, generally at the risk of instability and the like.
Things that would never fly in other high-risk environments seem to fly in the stock system. I'm going out on a limb, but it seems that's because there is an adult watching the kids play in the stock market sandbox, and a fuse will blow if an HFT feedback loop triggers some real idiocy. (http://www.zerohedge.com/article/hft-fat-digital-finger-brea... )
Now, there is still a market for advanced quantitative modeling in which reliability is important, and the ease of maintenance, troubleshooting and extensibility make up for microseconds or nanoseconds. A lot of statistical arbitrage can be categorized in this way, although some of the ideas in stat arb are making their way to HFT.
Basically, different applications have different performance requirements. And even though performance is now measured in low microseconds, the battle for nanosecond performance is underway, and extremely lucrative.
Now as to the argument of whether any of this actually provides any value to society, I think the answer is a definitive "no" (wrt HFT). But that's just my opinion - I used to be in the industry but left because of that. The majority of arguments used to justify HFT are really just self-serving rationalizations by people who are only interested in getting rich. It's another example of a screwed up incentive system that is increasing risk in the market rather than decreasing it.
Your analogy to a car is exactly the one that I've used for years: a race car doesn't generally have air bags or electronic stability control; it's stripped down to 4 wheels, an engine, and a steering wheel. I don't really trust "robust recovery" as an idea in this sort of environment anyway. Most error conditions are extremely rare and usually involve errors in external systems such as the exchange, so the recovery code is impossible to test in a normal sense and is thus very likely to be incorrect. My philosophy (developed after a fairly long career in this field) is: "when in doubt, print a descriptive error message, call abort(), and let the support team sort it out." I am extremely risk averse when it comes to the systems I design, so it's not a case of lack of adult supervision but more an understanding of the limits of what you can do in an extremely complicated environment, and that writing lots of error-handling code gives only the illusion of safety.
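For what it's worth, the "print a descriptive error message, call abort()" philosophy can be sketched in a few lines. This is an illustration only, not anyone's production code: `die` and `apply_fill` are hypothetical names, the zero-quantity invariant is invented, and the injectable `abort` parameter exists purely so the behaviour can be exercised without killing the process.

```python
import os
import sys


def die(message, abort=os.abort):
    """Print a descriptive error and abort; no recovery is attempted."""
    print(f"FATAL: {message}", file=sys.stderr, flush=True)
    abort()


def apply_fill(position, fill_qty, abort=os.abort):
    # Hypothetical invariant: a zero-quantity fill should never arrive.
    # Rather than guess at a recovery path that can't be tested, stop here.
    if fill_qty == 0:
        die(f"zero-quantity fill received, position={position}", abort)
    return position + fill_qty


print(apply_fill(100, 25))  # 125
```

With the default `os.abort`, the process dies on the spot (typically leaving a core dump) and the message on stderr is what the support team gets to work with.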
This question was pretty contentious when it was asked. Worse still, the current accepted answer came from someone who doesn't even work in quantitative finance.
Any argument about performance is totally incorrect. There are indeed a few areas of quant finance that require performance, but those are in the minority. Really, C++ is the top language because of culture, i.e., it's what everyone knows.
Just about every major programming language is used in finance, and each firm has its own preferences. But almost all of them will still interview in C++ because it's so widespread.
Some of my co-workers use Java because their models aren't as sensitive to latency as mine. My best friend does options pricing in VB/Excel. And I know tons of competitors who use R, MATLAB, OCaml, and Haskell.
There are tons of languages used in finance.
IMHO, there are many resources for learning q on code.kx.com. I also went on a training course arranged by First Derivatives (they are the only vendor that offers formal training in q). I would say the best way of learning it is to practice! Use code.kx.com as a reference and download an evaluation copy of the runtime. You should be able to find open source / free editors for q (QInsightPad; there is also an Eclipse plugin). Set up your environment and try to tackle the Project Euler question set in q.
Alternatively, get a linear algebra / machine learning text book and attempt to solve the exercises in q.
q may seem a little terse, but it is extremely expressive and once you get the hang of the syntax and error handling, it is a joy to use.
Arthur Whitney didn't do APL or J; those were from Kenneth Iverson, with Roger Hui helping out on the latter. A+ was Arthur's implementation of APL, from what I understand. K is entirely ASCII (none of the special APL characters), and q added reserved words plus the integrated kdb+ database.
> How did you learn Q?
I learned q as a quant for a trading desk that used it for most tasks. I've been using it ever since because it's very expressive and has great performance.
I am familiar with the history. I meant "journey" as in "progression through APL and APL-like languages". Iverson showed APL to Whitney when he was only 11 years old. Whitney created the first version of J, but then moved on, leaving it to Hui.
"Work began in the summer of 1989 when I [Ken Iverson] first discussed my desires with Arthur Whitney. He proposed the use of C for implementation, and produced (on one page and in one afternoon) a working fragment that provided only one function (+), one operator (/), one-letter names, and arrays limited to ranks 0 and 1, but did provide for boxed arrays and for the use of the copula for assigning names to any entity. I showed this fragment to others in the hope of interesting someone competent in both C and APL to take up the work, and soon recruited Roger Hui, who was attracted in part by the unusual style of C programming used by Arthur, a style that made heavy use of preprocessing facilities to permit writing further C in a distinctly APL style. Roger and I then began collaboration on the design and implementation of a dialect of APL (later named J by Roger) ..." - from Hui's "Remembering Ken Iverson", referencing Iverson's "A Personal View of APL" (http://keiapl.org/rhui)
In Appendix A, on that same page, you can find Whitney's code. It shows how differently he thinks about coding, and is likely a good example of Q's roots.
* quantlib: aside from (proprietary, very expensive, and damn slow) Matlab, no other language has a library of quant-related functionality that is so vast
* A lot of third-party libraries and APIs do not have .NET, Python, Ruby, R (you name it) wrappers, and you do not have the time, expertise, or resources to write them
* Like wglb mentioned, quants are obsessed with performance, even aside from obvious things like high-frequency trading where you try to squeeze out every millisecond. Let's say in middle office you run a risk measurement calculation daily and it finishes in 9 hours for your portfolio. Well, if you happen to triple your portfolio (not unheard of in boom times), you cannot run it daily anymore, so a factor of 1.6 gets in your way here too.
* .NET programmers (on average, of course) tend to have less experience in dealing with algorithms and data structures
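The middle-office bullet above is easy to check with back-of-the-envelope arithmetic. A small sketch, assuming (and this is only an assumption) that runtime grows linearly with portfolio size; the function name is made up:

```python
def run_hours(base_hours, portfolio_multiple, speedup=1.0):
    """Estimated runtime, assuming cost grows linearly with portfolio size."""
    return base_hours * portfolio_multiple / speedup


print(run_hours(9, 3))                 # 27.0 hours: no longer fits in a day
print(round(run_hours(9, 3, 1.6), 1))  # 16.9 hours with a 1.6x faster program
```

Under that assumption, a 1.6x difference is exactly what separates a job that still runs daily from one that doesn't.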
Personally, I find this idea very compelling. The quantitative finance profession isn't exactly open and taking a constant influx of new ideas from the rest of the industry. They're a much more secretive bunch. Is it any surprise, then, that their tools stagnate somewhat because of their reluctance to engage openly with the rest of the industry?
I know that some people are engaged in high-speed trading applications where they require a language with a close analogue to machine instructions for the purposes of performance. I haven't seen any evidence that this is all quants everywhere, or that these suites of quantlibs actually provide all their functionality to that segment of the community.
Meanwhile, the CLR and JVM actually generate remarkably fast code from remarkably high-level specifications, and LLVM is a real thing. Haskell and OCaml take functional definitions and often generate better code than longer C++ definitions. I suspect that there is an under-served market here that is reluctant to adopt new tools for social reasons rather than for technical reasons.
Just to answer your point about CLR/JVM code performance: one early comment in the original article that stood out for me was "Pure computational performance (ignoring memory allocation/deallocation) under .NET runtime (ignoring vectorization) is pretty close to the performance of raw C++", which is all well and good except for the fact that you are going to end up with worse code than someone who doesn't ignore memory allocation and vectorization.
That's fine. We disagree on what's the right direction for software engineering. I welcome debate on the subject. I don't consider migrating to customized hardware a cutting edge technique.
Every second justification I hear for the toolchains I see seems to revolve around performance complaints that are only reasonable in hard-realtime situations or in 2001.
> which is all well and good except for the fact that you are going to end up with worse code than someone who doesn't ignore memory allocation and vectorization.
Actually, that's exactly not what happens. Modern GC is good, man, really good. The vectorization scene is even better for the FP world.
As an example, for big scientific computations, Fortran seems to be about 30% faster than C, its next fastest competitor (at least this is what some physicists who did huge jobs to process imagery data to look for planets told me once). If you are running a job that takes 15 minutes in C, this doesn't matter, but if you are running a job that takes 10 days it matters immensely, especially when you consider that you still have to debug. 30% is three more days you have to wait for output before tweaking your stuff and trying to beat the other guys to publication.
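Spelling the arithmetic out, assuming the 30% means the slower version takes 30% longer for the same job (the function name and the rerun count are hypothetical):

```python
def extra_wait_days(job_days, slowdown_fraction, reruns=1):
    """Extra wall-clock days spent waiting on the slower implementation."""
    return job_days * slowdown_fraction * reruns


ten_day_job = extra_wait_days(10, 0.30)               # about 3 days per run
with_debugging = extra_wait_days(10, 0.30, reruns=3)  # about 9 days over three runs
short_job = extra_wait_days(15 / (60 * 24), 0.30)     # a 15-minute job, in days

print(round(ten_day_job, 2), round(with_debugging, 2))
print(round(short_job * 24 * 60, 1), "minutes")
```

The debugging reruns are what make it hurt: the penalty is paid on every iteration, not once.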
With interactive programming, performance means you either get under the perceived instantaneous threshold or not, which is often the difference between "this app is cool" and "this app sucks". If you can get a 2-fold gain using C versus Java or C#, you can do a LOT of stuff "instantaneously" that would otherwise make the user tap their fingers impatiently. Not to mention the high-speed work of quants.
In the work I do -- processing moderately large datasets into summaries on 12 hour deadlines -- 20% here or there doesn't really matter. I think this is kind of interesting.
I used various techniques to get around these such as forcing GCs to happen during quiet periods and making kernel modifications to default socket parameters, but it's definitely non-trivial.
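The "force GCs during quiet periods" trick looks roughly like the sketch below. The comment above doesn't name its runtime, so this uses Python's `gc` module purely as a stand-in; note that `gc.disable()` only switches off Python's automatic cyclic collector, with reference counting carrying on as usual. The loop and the `is_quiet_period` callback are hypothetical.

```python
import gc


def run_trading_loop(messages, is_quiet_period, handle):
    """Process messages with automatic GC off, collecting only when quiet."""
    gc.disable()              # no automatic cyclic collections on the hot path
    try:
        for msg in messages:
            handle(msg)
            if is_quiet_period(msg):
                gc.collect()  # pay for the pause when nobody is waiting
    finally:
        gc.enable()           # restore the default behaviour on the way out


seen = []
run_trading_loop(["open", "lull", "close"], lambda m: m == "lull", seen.append)
print(seen)  # ['open', 'lull', 'close']
```

The catch, as noted above, is that knowing when a period really is quiet is the non-trivial part.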
With that said, the sort of C++ failure modes that you're talking about tend to occur mostly when working on optimizations, and if you program with a bunch of STL containers and shared pointers (i.e. program C++ like it was Java) then you don't tend to see these problems very much.
And, just so we're clear, that's multiplicative, not additive. So you're using 2^7 times as much memory as you were using before.
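The gap between the two readings is easy to see numerically. A tiny sketch, assuming seven independent 2x overheads (the only thing taken from the comment is the 2^7):

```python
factors = [2] * 7  # seven independent 2x overheads (assumed for illustration)

multiplicative = 1
for f in factors:
    multiplicative *= f  # overheads compound

additive = sum(factors)  # what you'd get if they merely added up

print(multiplicative)  # 128
print(additive)        # 14
```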
That being said, development performance (speed, iterative ability) becomes more important, and most quants are fluent in C++ or Java.
Has this person worked with modern JVM runtimes? JIT compilers? Does he know that they don't actually need to test the access each time but are able to move the test out of the loops? Or use the Unix signaling system to avoid having to create an exception, except when something goes wrong?
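Moving the test out of the loop looks roughly like this when done by hand. It is only a source-level sketch of the idea, written in Python for readability; a real JIT does the equivalent on its own intermediate code and has to prove that the two forms behave the same. Both function names are made up.

```python
def sum_checked(data, n):
    """Naive form: test the index on every iteration."""
    total = 0
    for i in range(n):
        if i >= len(data):
            raise IndexError(i)
        total += data[i]
    return total


def sum_hoisted(data, n):
    """Hoisted form: one test before the loop covers every index in range(n)."""
    if n > len(data):
        raise IndexError(len(data))
    total = 0
    for i in range(n):
        total += data[i]  # known to be in range, no per-iteration test
    return total


print(sum_checked([1, 2, 3], 3), sum_hoisted([1, 2, 3], 3))  # 6 6
```

Since the loop has no visible side effects before the failing index, raising up front is indistinguishable from raising partway through, which is the kind of condition that makes the hoist legal.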
Why do people use Bloomberg rather than email?
Why is Sybase the standard?
Why is so much done in Excel?
There are a half dozen reasons for each of these (and a dozen more that they shouldn't be) but the common link is, "It's the market standard, and it's hard to change the standard."
(fyi, I mostly program in C and C++ these days, but used a lot of Java when I worked in telecommunications)
Edit: I wanted to add some evidence, but the job looks to have been filled. The current opening lists C++ but doesn't mention Boost. Last week though...
(these guys are across the street from my old office - good people, from what I hear)