New tech lead comes in, swaggers about, declares the whole street now uses Java for their trading code and we should too. I got the desk heads to listen and made them wise to the impending issues, and they told him fine - but if the new Java-based code had a direct impact on PnL, it would come out of his budget; ultimately, he would pay for it. Cocksure, he agreed.
Despite throwing bucketloads of Java devs at it and spending fortunes on "tuning" consultants, performance suffered, GC pauses hit under critical trading conditions, and eventually he was exposed and kicked out.
The C++ code came back out of retirement, was updated for C++11/14, and still serves them well to this day.
The trick is to segment your critical path code (whatever is executing the actual trades and is highly susceptible to delays) from any business logic. Then you can focus on making critical path components fast - disable GC completely, keep object allocations way down, audit every line of code, etc. With such a setup you can even do better than typical C++ because you avoid virtually all object allocation/deallocation costs that C++ has. Java object allocation is dirt cheap, it just hurts when you GC, which you can basically avoid or schedule for out of hours. If you want better I would not bother with C++ at all personally, but use C / assembly or even look at FPGA type setups.
Yes, you can emulate a young generation in C++, or even multiple young gens (i.e. arenas), obviously, as you can do anything in C++ that you can do in Java and vice versa (Java has the Unsafe class, which allows low-level programming with pointer arithmetic, etc.). The question is not whether it can be done, but rather what's easier and lower cost.
With the Java approach, the only developer overhead is making sure you don't do too much allocation on the hot path, then sizing your young gen to avoid collections (one command line flag), and doing a GC at night. Not nothing but not too hard either; after all, you can allocate if you really need to, and some allocations will be optimised out anyway.
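The hot-path part of this can be sketched in Java (class names and sizes here are illustrative, not from any real trading system): preallocate the objects you need up front and recycle them, so the young gen fills slowly and the sized-up collection almost never triggers during trading hours.

```java
import java.util.ArrayDeque;

// Hypothetical order event, reused across the hot path instead of
// being allocated per message.
final class OrderEvent {
    long price;
    long quantity;

    void reset() { price = 0; quantity = 0; }
}

// Simple free-list pool: acquire/release instead of new/GC.
final class EventPool {
    private final ArrayDeque<OrderEvent> free = new ArrayDeque<>();

    EventPool(int size) {
        for (int i = 0; i < size; i++) free.push(new OrderEvent());
    }

    OrderEvent acquire() {
        OrderEvent e = free.poll();
        return e != null ? e : new OrderEvent(); // fallback allocation
    }

    void release(OrderEvent e) {
        e.reset();
        free.push(e); // back onto the free list, invisible to the GC
    }
}
```

Combined with a generous young gen (e.g. `-Xmn`) and a deliberate `System.gc()` out of hours, the pool keeps per-message allocation at zero in the steady state.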
The C++ equivalent would be to allocate as much as possible on the stack, and then make sure to do a restart every night to eliminate any heap fragmentation that is left. Not that different, until you screw up and try to delete something on the stack or forget to delete something that isn't. That's when the robustness the GC is giving you starts to pay off.
I thought it was about speed.
> The C++ equivalent would be to allocate as much as possible on the stack, and then make sure to do a restart every night to eliminate any heap fragmentation that is left.
The C++ equivalent would be to do heap allocation and never free, then shut the process down which will return its memory to the OS. Heap fragmentation wouldn't matter here since you aren't freeing any memory.
Fundamentally, if you can avoid freeing any memory all day, then you should just allocate it up front and be done with it. Anything else is ignoring the more fundamental point: you are using a limited amount of memory while pretending that you aren't.
Yes, you could also pre-allocate, but that's often more awkward than just allocating at the time of need, and besides, Java can turn heap allocations not only into stack allocations but can fully scalarise them (i.e. break an allocation down into a set of local variables, then delete any that are unused, move them around, etc.).
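As a toy illustration of scalarisation (not a benchmark, and whether the JIT actually applies it depends on inlining and the JVM version): an object that never escapes its method is a candidate for scalar replacement.

```java
// A small immutable pair used only inside distanceSquared().
final class Vec2 {
    final double x, y;
    Vec2(double x, double y) { this.x = x; this.y = y; }
}

final class Geometry {
    // The Vec2 below never escapes this method, so HotSpot's escape
    // analysis may scalar-replace it: no heap allocation, just two
    // doubles in registers.
    static double distanceSquared(double ax, double ay, double bx, double by) {
        Vec2 d = new Vec2(bx - ax, by - ay);
        return d.x * d.x + d.y * d.y;
    }
}
```

The semantics are identical either way; the optimisation only removes the allocation, which is why you can write natural-looking code and still sometimes pay nothing for the temporary.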
This could not possibly be further from the truth. Getting rid of heap allocations can speed something up by ~7x. Cache locality can speed something up by 50x. Compiler optimizations do very little compared to accessing memory in a predictable way.
I think that's a very brute-force approach. You could instead use an actual allocator which behaves the way you want.
For example, this is how an arena can work without any restart/stack magic: https://github.com/cleeus/obstack/blob/master/arena_test.cpp...
Same happens in google's protobufs https://developers.google.com/protocol-buffers/docs/referenc...
There's (almost) no need to manually wipe heaps at the system level if your language allows you to change the allocator at any point in the app.
jmap -histo:live <PID>
So to have Java perform better than C++ you just have to avoid allocation in Java and leave it in in C++?
> If you want better I would not bother with C++ at all personally, but use C / assembly or even look at FPGA type setups.
There is nothing in C that gives it a speed advantage over C++. Basically you've discovered step 1 of 3 when optimizing software (step 0 is to profile) -
1. Minimize heap allocations
2. Rework memory access for cache locality
3. Use SIMD
1 and 2 can be done with C++. 3 really needs to be done with something like ISPC so that SIMD use isn't fragile and compiler dependent.
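To illustrate step 2 in Java (a hedged sketch; the same idea applies directly in C++): a structure-of-arrays layout keeps a scan sequential in memory instead of chasing one pointer per element.

```java
// Array-of-objects layout: each Particle is a separate heap object,
// so a scan over x values chases one pointer per element.
final class Particle {
    double x, y;
    Particle(double x, double y) { this.x = x; this.y = y; }
}

// Structure-of-arrays layout: the same data as two contiguous
// primitive arrays; summing x values is a linear walk through memory.
final class Particles {
    final double[] xs;
    final double[] ys;

    Particles(int n) { xs = new double[n]; ys = new double[n]; }

    double sumX() {
        double s = 0;
        for (double x : xs) s += x; // sequential, cache-friendly access
        return s;
    }
}
```

The `Particle` class is the layout to avoid on a hot scan; `Particles` is the step-2 rework, at the cost of less convenient object-at-a-time code.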
Where compilers shine is allowing broad process-wide optimization.
such as whom?
That's the point of using Java (or any interpreted language), isn't it? Avoiding manual memory management?
If you're going to go through all that horrid FactoryFactory Factory = new FactoryFactory(); bullshit, might as well get the benefits of the GC.
And if you don't want the GC, why not write your code in something else?
I get the impression that your only exposure to java is through jokes and blogs from dynamic typing evangelists.
Garbage collection is just one of the things that make languages like Java comfortable to work in, but certainly not the only thing. Disabling automatic garbage collection in Java or C# I feel is a very effective way of gaining most of their advantages while limiting the effects of the runtime on the variability of your execution time.
Java is not an interpreted language by the way. FactoryFactories are a choice, not necessarily forced by the language.
Also, Java is interpreted at the bytecode level by default. The HotSpot JIT works by detecting hot areas of code and then compiling them to native code.
The only popular languages without a garbage collector are either from the 70s (C) or based on a language from the 70s (C++xx). Both have extreme discomforts when compared to modern languages (Ruby/Haskell/C#/Rust).
Also, technically, that would make Java bytecode an interpreted language, not Java itself ;)
edit: Ok, apparently according to wikipedia I am wrong and Java generally is considered to be an interpreted language, I had a slightly different definition of interpreted in mind.
Why were we programming in (little more than) assembler as our main language???
When I came to learn C, I was already quite an expert in Turbo Pascal, version 6.0 by then.
Comparing C to Turbo Pascal 6.0 in terms of features and safety just felt like "meh!".
Thankfully, around the same time someone gave me a copy of Turbo C++ 1.0, and I learned how to get some of the Turbo Pascal features back while keeping C's portability, and joined the C++ ranks for a few years.
THEN, I got a copy of TC++ a few months or a year later. WTF?!?
The light came on when I read Scott Meyers' "Effective C++" book in '96 or so. "Effective" meant "not stepping on one of the 50 or so common land mines". I realized what a bad joke C++ was, and decided never to go back to it.
C is useful as a substitute for assembler, but not most applications. Too bad we didn't get to see more of Smalltalk in school in the 80s. And maybe Scheme instead of just a little bit of Lisp.
* Objective C
* Objective C++
Making a new Turing-complete language is ultimately just an exercise in doing the same things differently rather than better, unless you find a way to construct a language that can use hardware more effectively than existing languages can. Languages that lack garbage collection provide no incentive to wipe the slate clean for easier garbage collection.
There is an enormous difference between expert level use of C and languages like it and beginner use. There are static analysis and dynamic analysis tools that have been built to assist with catching bugs such as misuse of memory or undefined behavior. C and several others that I listed above are flexible enough that you can implement design patterns that you would expect to see in more "advanced" languages and with structured programming, you can use them fairly easily once they are written. Functional programming is doable:
Object oriented programming is also doable:
Generic programming can also be done using a mix of macros and void pointers. A rather powerful design pattern that I have seen in C is a function that encapsulates iteration over some data and invokes a callback, passing it an accumulator that was handed in alongside the callback. It is great for iterating over things that are not expected to always fit in system memory. The non-accumulator version of that pattern is used in the POSIX ftw() C function. The accumulator version feels much like generic programming, as the type of the accumulator is known only to the caller and the callback; the iteration function has no clue about the accumulator's type. The same goes for plenty of in-memory data structures implemented with void pointers, like lists and trees, where the memory describing each node is encapsulated inside the object.
There is definitely a greater learning curve to C and languages like it, but once you are familiar with the right patterns/abstractions, such languages are a joy to use, and the advantages of more "advanced" languages look more like trade-offs than killer features.
Also, people who are familiar with such languages tend to program differently. At work, I have been asked to write some userspace code in Go. I wanted to make some directory traversal code use as few CPU resources as possible (which is a design goal), so I asked for tips on how to do system calls from Go and horrified at least one Go programmer in #go-nuts on freenode in the process. Using the syscalls directly enabled me to take advantage of SYS_getdents64's d_type to avoid doing a stat to determine whether an entry is a directory, to increase the buffer size to read more directory entries per syscall (fewer calls per directory), and to detect the end of a directory by checking when the buffer has 65535 bytes of free space remaining, which reduces the getdents invocations per directory from 2 to 1. A programmer who does everything the way garbage-collected language authors recommend would likely have had a far less CPU-efficient traversal. The superfluous stat syscalls alone would have increased the number of syscalls by at least an order of magnitude.
I wrote a patch to glibc last night that enables readdir() to skip the second getdents64 call on small directories, and I plan to submit it after I have what I consider to be the final version. That ought to accelerate GNU find. I might give the Go OS package similar treatment, although doing fast directory tree traversal the way I am doing it (which is similar to what GNU find does) requires that the Go OS package provide type information from getdents. That is a non-portable BSD extension that Linux adopted, and consequently is something I would not expect Go to provide.
Java compiles to native machine code?
Apparently many seem to think the OpenJDK is the only option available.
Wikipedia says it best: "[The] Factory pattern deals with the instantiation of object without exposing the instantiation logic."
- Can help get rid of excessive switch and if/elseif statements and split up your logic
- Eliminates the new keyword, improving testability
Java's prominence in the corporate world means Sturgeon's law is more apparent.
* What a great feature function pointers / procedural types / other call-by-name/reference mechanisms were. (J8 lambdas are used like function references, but are kind of heavy)
* What a source of mystery "annotations" are.
Java started out like an interpreted version of Delphi or UCSD Pascal disguised as a cleaned up C++, but since 1.1 it evolved slowly and/or in the wrong ways.
Actually, as somebody who has been programming since 1983, in quite a few languages, Java brought little to nothing new to the table. Nothing personal against you, I'm just sick of the Java language/syntax and how it failed to live up to the hype.
* Best IDEs around. (Yes better than .net ecosystem)
* JVM monitoring (Flight Control, VisualVM)
* Performance Sampling/Profiling <1% perf cost
* actually being cross platform (yes, mono, bla)
People forget that Java has the best ecosystem of libraries bar none especially when you get into the more enterprise areas e.g. Big Data.
There is also a huge amount of features that are supported by the JVM that Java still doesn't make use of.
There is also a great JIT and garbage collector.
You CAN write GC code that does very little collection, but you end up being allocation-sensitive anyway, so you may as well write C++ code.
For instance I wrote some .NET where instead of logging with strings, I'd use StringBuilder and try to get some reuse. Problem is SB wasn't written for that, and will allocate again if your new string gets too big. In the end it ended up being a pool of stringbuilders of various lengths. C++ would not have had this issue.
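The Java equivalent of that reuse trick might look like this (a sketch; the capacity number is made up, and as the parent notes, the API wasn't really designed for this):

```java
final class LogBuffer {
    // One reusable builder, pre-sized so typical messages never force
    // the backing array to grow (and thus reallocate).
    private final StringBuilder sb = new StringBuilder(256);

    String format(String level, String msg) {
        sb.setLength(0);          // reuse the existing backing array
        sb.append('[').append(level).append("] ").append(msg);
        return sb.toString();     // the returned String still allocates
    }
}
```

The final toString() still allocates; avoiding that too means writing characters straight to the sink, which is exactly the kind of reuse the stock classes weren't built for.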
The problem with GC tends to be something alluded to in the article. When trading is brisk, a lot of data is coming in, so memory pressure is higher, leading to more GC. That's exactly when you don't want it. And I always find debugging managed code more complicated than debugging C++: you need a whole suite of memory debuggers, all with their own idioms, and then they only really give you a hint. With C++, you can use valgrind et al., and you can override new() and put in some accounting code. And it's normally quite clear where you allocated.
You can also just tell the OS you'll deal with memory yourself. Just get a big piece, and keep track of what's what using various arenas and such constructs. That way you're not even exposed to the variance in the OS allocation time (time to find space for a new object).
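A bump-pointer arena over one big preallocated block might look like this in Java (illustrative only; a real version would handle alignment, overflow, and thread safety):

```java
import java.nio.ByteBuffer;

// Grab one large block up front and hand out slices by bumping an
// offset; reset() reclaims everything at once, with no per-object free.
final class Arena {
    private final ByteBuffer block;

    Arena(int capacity) { block = ByteBuffer.allocateDirect(capacity); }

    ByteBuffer alloc(int size) {
        int start = block.position();
        block.position(start + size);   // bump the offset
        ByteBuffer dup = block.duplicate();
        dup.position(start);
        dup.limit(start + size);
        return dup.slice();             // a view of just this allocation
    }

    void reset() { block.clear(); }     // "frees" everything at once
}
```

Everything handed out by alloc() dies together at reset(), which is what makes the pattern cheap: there is no per-object free, and no fragmentation to clean up afterwards.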
It seems like high performance systems should just be coded like microcontrollers, no OS or minimal embedded real time OS and you manage all your memory yourself.
Of course, just using C++ is not a solution. There are a bunch of guidelines you need to stick to, especially with strings.
For example, any scenario where Rust doesn't even need a lifetime annotation is a scenario where a garbage-collecting language's compiler could still manage collection at almost zero cost. It could, very simply I might add, infer the most efficient way to collect block-scoped objects... the vast majority of which end up getting collected in the short-lifetime pool. Long-lifetime objects might be more efficiently reference counted, so why can't the compiler decide that singletons should be reference counted, or even statically allocated, instead of pinged during every large collection? And why can't we have an actual disposable interface that actually performs a manual collection?
For a language that is as memory-heavy as Java is, with as much boxing as it does, and the high performance demands of its user base, it blows my mind that their best idea for improvement in garbage collection in the last decade was to naively go after more concurrency. I realize that is what everybody thought the future was for everything, but something something Amdahl's law. There are so many better opportunities that involve allocating less, destroying without a mark stage, etc.
Modern collectors like G1 don't do full mark/sweeps of the entire heap in normal operation - only in rare cases like if you request a memory profile, or if the entire process runs out of RAM and needs to do a hard stop. And reference counting isn't something you can just automatically do due to the presence of cycles.
Notwithstanding these things, you may be interested in Graal, and specifically this talk on automatic region allocation:
Not free by any means in $$$ or performance, but they should have avoided any pauses due to GC.
Why not do as they actually did: use C++ ? Sounds simple enough... and much cheaper.
The problem with Java is that it's not as deterministic in terms of performance as C++, so in areas where that is a must, it has to be taken into account. Even after saying that, the GC can, and indeed should, be tweaked and optimized. The JVM gives you more tweaking options and tools to optimize the garbage collection than any other platform I've worked with, but most developers don't know about them or don't care.
In cases like the original comment mentioned, it may have been better to continue using the C++ implementation, or it might have been useful to spend a bit less in consultants and developers and a bit on Azul's Zing (and maybe a few people that understood the platform). It might have saved a lot in the long term and indeed there are plenty of algorithmic trading solutions written in Java.
* it tends to be harder to write good C++ code than Java code. For example, we think that an average person can write better Java code than C++ code because Java is simpler to get going with. We also think that the average developer makes fewer mistakes in Java than C++ (because C++ allows you to do 'crazy things!')
* that means that you need (on average) a better developer for C++, and that tends to cost more
* Java tooling is exceptional (probably the best there is?)
* making small changes and releasing is much quicker in Java; builds tend to be shorter in duration for example
* once you have the framework for your system, the differentiator is then the business benefits delivered. If you can push small changes out very quickly you're getting rewards quicker and you can experiment more
* once you get down to certain performance points, writing Java code becomes similar to writing C++ code, but the JVM is also doing various work that you can't control so much. That can then inhibit you.
* Azul isn't just h/w development it's also now in kernel plugin form iirc
* Have a look at Aeron (https://github.com/real-logic/Aeron); I think the C++ and Java versions are fairly similar in performance, and they're designed and written by decent (industry-renowned) developers
* A lot of people write high performance code making extensive use of templates. That's a totally different way of programming than usual C++
C++ costs more, but that's often a non issue for a trading floor. Security is much simpler in the Java world, but also possible in boring C++ code.
Making small changes is really more a question of your code than anything else IMO. There is a lot of crazy C++ and Java code out there.
Now for basic CRUD apps Java is a clear win. For high performance trading that's an open issue. But, IMO, there are much better languages to use.
Having bounds checks (with line number error reporting, rather than secondary, tertiary... damage followed by a core dump), and the use of non-null references (as well as pointers for the initial allocation from the heap) goes a long way towards eliminating much of the time wasting bullshit that C/++ brings into your life.
C is good for portable assembler. Otherwise, I want something less wacky than C++ to be "effective" with at a medium low level, and something higher level than Java for most other apps.
And tools to support them?
Ignoring the fantasy that I would get to use something higher level (than Java) for most work...
Not everybody panics if they have to write code outside of Eclipse or Visual Studio.
Once upon a time, there was an expectation that programmers knew, or could learn, more than one language.
Sorry about the rude response, but I'm really sick of the dumbing down of everything for (counterproductive) business or "risk management" purposes. To paraphrase, "average performance" in this industry is pretty poor.
Zing is priced on a subscription basis per server.... The annualized subscription price for Zing per physical server ranges from $3500 (for several hundred servers) to $8000. Higher volumes and longer subscription terms will reduce the per-server price for Zing. Pricing for virtual servers is also available upon request.
Since you're likely to be running it on a server with a hardware cost approximately the same or larger, next time I'll only use two dollar signs ($$) ^_^.
And I've indeed heard it's popular with financial trading firms for the obvious reasons. Up to 1 TiB heap, no problem, and no 1,024-second pause of the sort other collectors would require for a full GC of a heap that size.
I remember when all C++ compilers worth using were commercial.
Finance is software and has been for a while. Competent players in this field wouldn't base an entire implementation strategy on one expert's view. More likely, they would have more substantial trade-off discussions.
I would guess this shop had much bigger problems than its choice of stack.
- avoid allocating objects on the heap which you do not have to allocate. The fewer fresh allocations you have, the less the GC has to do. That does not mean you should write ugly and complex code, but if the tool described in the article were, for example, grep-like, then one should not have to allocate each line read separately on the heap just to discard it. If possible, read into a reusable buffer, if the I/O libraries allow it.
- generational GCs try to work around this a bit: the youngest generation is collected very quickly, assuming the majority of the objects are already "dead" when it happens; only the "survivors" are copied to older generations. Make sure that the youngest generation is large enough that this assumption holds, and that only objects which indeed have a longer lifetime are promoted to older generations.
- language/library design makes a huge difference how much pressure there is on the GC system. Less heap allocations help, also languages, which try not to create too complex heap layouts. In Java, an array of objects means an array of pointers to objects which could be scattered around the heap, while in Go you can have an array of structs which is one contiguous block of memory which drastically reduces heap complexity (but of course, is more effort to reallocate for growing).
- good library design can bring a lot of efficiency. At some point in time, just opening a file in Java would create several separate objects which referred to each other (a buffered reader which points to the file object...). My impression is, "modern" Java libraries too often create even larger object chains for a single task. This can add to the GC pressure.
Of course, all these practices can be used equally "well" to bring a program with manual allocation to a crawl. So in summary, I am a strong proponent of GC, but one needs to be aware of at least the performance trade-offs different factorings of a program can bring. Modern GCs are incredibly fast, but that is not a magic property.
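The buffer-reuse point for the grep-like case can be sketched in Java (a hypothetical helper; real code would worry about charsets and partial reads): scan through one reusable buffer instead of allocating a String per line.

```java
import java.io.IOException;
import java.io.Reader;

final class LineCounter {
    // One reusable char[] for the whole stream: no per-line String
    // objects, so the young generation stays quiet.
    static long countLines(Reader in) {
        char[] buf = new char[8192];
        long lines = 0;
        int n;
        try {
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') lines++;
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return lines;
    }
}
```

Compare this with BufferedReader.readLine(), which hands the GC one fresh String per line whether you keep it or not.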
Had a language like Eiffel, the Oberon dialects, or Modula-3 taken its role in the industry, I bet we wouldn't be having these constant discussions of GC vs manual in terms of performance.
Another issue is that many developers, even in languages with GC and value types, tend to be "new() happy", sometimes leading to designs that are very hard to refactor when the need arises, given the differences in the type system between reference and value types.
Eiffel is probably the only one I can remember, where the difference is a simple attribute.
Haskell doesn't quite have a distinction between reference and value types, but there are boxed and unboxed values, which I think fits the bill for the purpose of this discussion.
Perhaps difference between normal values, and values wrapped in eg IORef counts?
However, these types (excepting unboxed types) can also be considered reference types since values are technically passed by reference. Boxed values are stored in thunks, and only accessed through these (at least conceptually). Thunks in turn have GC overhead.
Boxed values have even associated internal state, but there is not much control over it: the state of their evaluation.
So another viewpoint is that boxed Haskell types are just like Java reference types, but from the outside they seem immutable.
This is an implementation detail, and abstractions shall never be conflated with their possible implementations. What matters to the user of the abstraction is that you're passing around (a computation that evaluates to) the value itself, not some mutable entity whose current state is the value.
> So another viewpoint is that boxed Haskell types are just like Java reference types, but they are immutable.
This is wrong. In Java, all objects have a distinct identity, so, for instance, `new Complex(2,3) == new Complex(2,3)` evaluates to `false`, even if both objects represent the complex number `2+3i`.
 Since Haskell is non-strict.
We're talking about performance here, right? So unfortunately this statement is totally untrue in that context.
Java world wants value types largely to avoid the pointer chasing that hurts CPU cache effectiveness. Haskell suffers the same issue, then. You can make what are effectively value types in Java as long as you don't use == to compare but rather .equals, make all the fields final, override toString/equals/hashCode, etc., and more sensible JVM-targeting languages of course convert == into calls to .equals by default.
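That recipe, sketched as a hand-rolled value-like type in Java (the JVM still treats it as an ordinary reference class):

```java
import java.util.Objects;

// An immutable "value-like" complex number: final fields, content-based
// equals/hashCode/toString, and never compared with ==.
final class Complex {
    final double re, im;

    Complex(double re, double im) { this.re = re; this.im = im; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Complex)) return false;
        Complex c = (Complex) o;
        return re == c.re && im == c.im;
    }

    @Override public int hashCode() { return Objects.hash(re, im); }

    @Override public String toString() { return re + "+" + im + "i"; }
}
```

Two such objects with the same content are .equals() to each other but still distinct by ==, which is why the discipline of never using == is part of the recipe.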
You don't need implementation details to discuss performance. Abstractions can come with a cost model. (C++'s standard library concepts are a great example of this.) And Haskell's cost model says that values can be passed around in O(1) time.
> Java world wants value types largely to avoid pointer chasing that hurts cpu cache effectiveness. Haskell suffers the same issue, then.
That's not the whole story. When the physical identity of an object no longer matters (note that immutability alone isn't enough), the language implementation can, without programmer intervention:
(0) Merge several small objects into a single large object. (Laziness kind of gets in the way, though. ML fares better in this regard.)
(1) Use hash consing or the flyweight pattern more aggressively.
> You can make what are effectively value types in Java as long as you don't use == to compare but rather .equals, make all the fields final, override toString/equals/hashcode etc, and more sensible JVM targeting languages of course convert == into calls to .equals by default.
But the JVM itself is completely unaware that you intend to use a class as a value type, and thus can't automatically apply optimizations that are no-brainers in runtime systems for languages where all types are value types.
Also, `.equals()`'s type is broken: it allows you to compare any pair of `Object`s, but it should only allow you to compare objects of the same class.
As you point out, Haskell could in theory merge things together and lay out memory more effectively, but in theory so could Java with e.g. interprocedural escape analysis. But in practice there's a limit to what automatic compiler optimisations can do - learning this the hard way is a big part of the history of functional languages (like, how many functional language runtimes automatically parallelise apps with real speedups?). Java sometimes does convert object allocations into value types via the scalarisation optimisations, but not always, hence the desire to add it to the language.
I am not sure of that. The compiler is free to inline expressions, or otherwise evaluate them multiple times.
And you switch sides just to disagree with your last quote.
My objection was also a technical one: Haskell doesn't give you access to the physical identity of any runtime object that isn't a reference cell (`IORef`, `STRef`, `MVar`, `TVar`, etc.).
> (and performance-centric) one
The ability to pass around arbitrarily complicated values in O(1) time is an intrinsic part of Haskell's cost model. This is the most natural thing in the world, unless you've lived all your life subordinating values to object identities, in which case, yes, non-destructively transferring a value from one object identity to another (aka “deep cloning”) might be an arbitrarily expensive operation.
I am more of a Swift/Rust guy than Go, in terms of features.
Even Oberon eventually evolved into Active Oberon and Component Pascal variants, both more feature rich than Go.
To be honest, Niklaus Wirth's latest design, Oberon-07 is even more minimalist than Oberon itself.
Can you elaborate? Thanks.
Hence I rather see the appeal of Go as a way to attract developers that would otherwise use C to a safer programming language.
As many turn to C just because they don't know other AOT-compiled languages well, not because they really need any special C feature.
Regardless of the discussion about whether it is a systems programming language or not, I think it can be, given its lineage. It only needs someone to take the bootstrapped version (1.6), write a bare-metal runtime, and then it would be proven. Maybe a nice idea for someone looking for a PhD thesis in the OS area.
Me, I would rather make use of a .NET, JVM or ML influenced language as those have type systems more of my liking.
- Enumerated types
- Enumerations as array indexes
- Classic OO with inheritance
- Untraced references in unsafe packages
- Unsafe packages for low level systems programming
- Reference parameters
- Bit packing
- Since Modula-3 was a system programming language for SPIN OS, the runtime library was richer, including GUI components
There are a few other features.
All the languages that I referenced have value types in the same lineage as Algol derived languages.
This means that you can make use of the stack, global statics, structs of arrays, arrays of structs and so on.
The GC only comes into play when you make use of the heap, of course. Also, the GC has a richer API, given that, besides Eiffel, the other languages are systems programming languages. So you can let the GC know that certain areas aren't to be monitored, or should be released right away (manual style).
So given that you have all the memory allocation techniques at your disposal the stress on the GC isn't as big as in Java's case.
But sadly none of those earned the hearts of the industry.
The closest you have to them are D (which deserves a better GC implementation) and the improvements coming to .NET Native via the Midori project.
You will need to decompose the classes across multiple arrays, thus leading to very hard to maintain code.
Uhm, aren't those "primitives"? Or did you mean user-definable value-types?
As powerful as the JVM is, it can't magically fix your broken programs. Unfortunately many memory pressure problems remain hidden until you encounter production workloads. What I'd like to see most is practical advice on how to avoid these problems in the first place, how to debug them if they occur, and how to effectively test your code for leaks/memory bugs.
If the program is written in C++ instead, what'd happen is it'd keep allocating memory beyond the 4 gig limit she imposed on the JVM, until it hit swap and the entire machine bogged down and became slow, or until the kernel OOM killer randomly killed some other (possibly important) program on her desktop to try and make space for it.
If she tried to fix that with ulimit, then she'd get different behaviour - the program would die quickly without slowdown, but before actually using 4 gigabytes of heap, due to fragmentation.
In the latest Java release there's a flag that makes the JVM exit as soon as there's an OutOfMemoryError (or do a heap dump for diagnostics), and there's also -XX:+UseGCOverheadLimit, which makes the JVM give up sooner if it's spending more than 98% of its time garbage collecting (i.e. it has effectively reached the limit of its heap without quite running out).
I think it's worth pointing out that, if my experience with C++ vs C# translates at all to C++ vs Java, if the program were written in C++ instead, its memory footprint would have been somewhere between 1/10 and 1/4 of the Java version.
I disagree about manual memory management, though: you don't have to traverse the heap unless you use a semi-automatic scheme like reference counting.
Complex heap structures often aren't a problem for manual memory management.
The scenarios that kill GC aren't ones you typically worry about in manually managed code.
GC frees you from some concerns and gives you much safety but makes performance harder and less predictable. For many programs this is a good trade off.
Sometimes I see this as a side effect of all of the available libraries for Java. Abstraction gets to the point where it's not easy to predict the behavior of something you're using.
Like, you're using Tomcat, with CXF client libs, which...under the covers, uses HttpURLConnection. Your app works great with http, but you need to switch to https. Unknown to you, when you switch to https...your object count doubles. Because the design decision all the way down at the bottom was to spin up a new object (per connection) to handle an SSL handshake.
My cynical self views garbage collection as technical debt for memory management. Sure, it's unfair because modern GCs will be way better at managing memory for medium complexity projects than any home-grown solution. But when the project gets complex—as many mature ones do—memory management which was so blissfully delegated to the GC becomes a sore issue. But by that time, it's in the context of a lot of complexity going on so it is not only harder to troubleshoot but harder to remediate.
Often, bad allocation practices will survive until the system is pushed hard, and by that time it gets harder to change those things. It's much easier if you just have a sense for bad allocation at the start of the design. It gets especially bad with libraries doing heavy allocation, and with the level of abstractions & dependencies that we usually have today you can get very bad allocation even if your own code is neat.
I'm eventually gonna have to learn me some Erlang, I fear.
Which is relevant if you're trying to turn a 100ms pause into a 15ms pause, or get rid of the 2sec pause you get once every few hours.
Or running out of memory: the application keeps pointers to objects that will be used later, so the GC can't reclaim them.
Both problems are solvable: you remove the pointers or change the algorithm, respectively (if you simply can't add more memory).
The really hard problem is that the JVM takes a long time to report an OOM error. But it's not unique to Java; who hasn't seen servers become unresponsive in a low-memory situation?
I used to work on a network management software that used ICMP polling to detect if network devices were down. We had a SEDA architecture, requests were put on a queue, timers were set and if the device did not respond within a timeout, we would mark the device as down.
Problem was, it so happened that in a high load system after we sent out the request, the garbage collector would kick in and take eons to return the system to running state. When the system returns, the timer events would fire and the handlers would note that the timeout has expired and mark the devices as down. The device could have responded in time to the requests but the system would not have detected it.
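A minimal illustration of that race (my own sketch, not our actual SEDA code, with a sleep standing in for the stop-the-world pause): the timeout check cannot distinguish a slow device from a GC pause on our own side.

```java
import java.util.concurrent.TimeUnit;

class TimeoutRace {
    // What the timer handler checks when it finally gets to run:
    // has the timeout "expired" since the request was sent?
    static boolean timedOut(long sentAtNanos, long timeoutNanos) {
        return System.nanoTime() - sentAtNanos > timeoutNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long timeout = TimeUnit.MILLISECONDS.toNanos(100);
        long sentAt = System.nanoTime();
        // Suppose the device replies well within 100ms, but before the
        // reply handler runs, a stop-the-world GC pause (simulated here
        // with a sleep) freezes every thread, including the timer's:
        Thread.sleep(500);
        if (timedOut(sentAt, timeout)) {
            // The timer now sees an "expired" timeout and marks the
            // device down, even though it answered in time.
            System.out.println("device marked down (false positive)");
        }
    }
}
```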
This is why I am wary of languages with mandatory garbage collection. I feel it should be a library in any serious systems language.
It has threads which concurrently collect as other threads mutate, and uses clever VM tricks such as bulk operations with only one TLB invalidation (or at least they did that with an earlier version of the current collector; they couldn't get it into the mainline Linux kernel and now use a DLKM). It's the only non-toy, currently maintained pauseless/concurrent GC that I know of.
That said, there are other ways to avoid writing "free" than using a garbage collector. Regions (https://en.wikipedia.org/wiki/Region-based_memory_management) are faster at allocation than malloc (you just increment a pointer to allocate) and faster at freeing than a GC (you throw away the entire region when you're done with it). It seems tricky to base a general-purpose programming language around them, though.
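A minimal sketch of the bump-pointer idea (illustrative names of my own, here backed by a Java ByteBuffer rather than raw memory): allocation is one addition, and "freeing" the whole region is a pointer reset.

```java
import java.nio.ByteBuffer;

class Region {
    private final ByteBuffer buf;

    Region(int capacity) { buf = ByteBuffer.allocateDirect(capacity); }

    // Bump the pointer and return the offset of a fresh block of `size` bytes.
    int alloc(int size) {
        int offset = buf.position();
        buf.position(offset + size); // throws if the region is exhausted
        return offset;
    }

    // Discard everything allocated so far in O(1).
    void reset() { buf.clear(); }
}
```

A typical use is one region per request or per frame: allocate freely while processing, then reset() at the end instead of freeing anything individually.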
That is precisely why I mentioned alloca(3C): it automatically frees the memory for you, if you do not want to do it yourself. From the Solaris / illumos alloca(3C) manual page:
void *alloca(size_t size);
The alloca() function allocates size bytes of space in the
stack frame of the caller, and returns a pointer to the
allocated block. This temporary space is automatically freed
when the caller returns. If the allocated block is beyond
the current stack limit, the resulting behavior is undefined.
I got my start on MOS 6502 / MOS 6510 / MC68000 assembler, so for me making malloc(3C) and free(3C) calls when programming in a functional style is completely normal. I have no problem with that whatsoever.
Did you write your 6502 code with closures, higher-order functions, and so on? My point is that can be hard to figure out when to free an object in this kind of environment, where a value can be captured by multiple closures and may not have a clear owner.
If the allocated block is beyond
the current stack limit, the resulting behavior is undefined.
#include <stdio.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
        struct rlimit limit;
        getrlimit(RLIMIT_STACK, &limit);
        printf("Stack start at %p, end at %p.\n",
            (void *)&argc, (void *)((char *)&argc - limit.rlim_cur));
        return 0;
}
% cc limit.c -o limit && ./limit
Stack start at 0x7fff51094ba8, end at 0x7fff4f094ba8.
GC languages require a certain amount of extra memory beyond the live memory as an overhead, but beyond that they should not have more impact on the memory footprint. In some use cases, this extra memory is not acceptable, but for reasonably sized applications, this is acceptable (not talking about Java here...) and the benefit is the correctness and often cleaner code (no complex protocols for object lifetime).
If your problem domain is fine with that, that's great. But I will never use Java for something latency sensitive again.
After I left that job I worked on the other side of RTB, on the exchanges themselves. They were both written in C++, and performance was reliable and awesome.
I would only use something like C++ or Rust for this purpose.
It fixes a lot of problems that older collectors introduced, and as long as you let it run wild with memory, and you are doing some parallelized work like everything real-world, you should not see this problem.
There should be either no stop-the-world pauses, or fewer of them.
There are a few things you can do to avoid this problem altogether:
The biggest improvement in speed vs memory will come from not passing primitives where boxed types are expected as function parameters. Primitives are passed by value and allocate nothing; a boxed parameter forces each value into a new object. If you keep a bunch of ints primitive in the functions you are using, you can save a lot of allocation cycles.
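For illustration (my own example, assuming the point above is about autoboxing): a boxed parameter type forces an allocation per call outside the small Integer cache, while a primitive one allocates nothing.

```java
class Boxing {
    // Callers must box each int into an Integer (a heap allocation
    // outside the -128..127 cache).
    static long sumBoxed(Integer a, Integer b) { return a + b; }

    // Primitives are passed by value: zero allocation.
    static long sumPrimitive(int a, int b) { return a + b; }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += sumBoxed(i, i + 1);     // ~2M short-lived Integers of garbage
            total += sumPrimitive(i, i + 1); // no garbage
        }
        System.out.println(total);
    }
}
```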
Another good change you can make would be an object pool. There is a really good and fast implementation in JMonkeyEngine/LWJGL. They have a low-level, thread-friendly object pool.
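A generic sketch of the idea (my own code, not the JMonkeyEngine/LWJGL implementation, and not thread-safe): recycle instances instead of churning the young gen.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

class Pool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    Pool(Supplier<T> factory) { this.factory = factory; }

    // Hand out a recycled instance if one is available, else make a new one.
    T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    // Caller must reset the object's state before/after release.
    void release(T obj) { free.push(obj); }
}
```

A thread-safe variant would swap the ArrayDeque for a concurrent deque.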
That's a pretty scary bug. Who knows how stuff like that will trash your data if you aren't properly checksumming everything.
CMS and other GCs have the advantage of years of bug-squashing and tuning. G1 is exciting, but I wouldn't personally use it on anything important for quite some time.
Of course, that's a totally unfair comparison: CMS has had a decade of bug squashing... I'm sure it had equally scary bugs when it was new. But that's the point. Don't use new, shiny GCs while they are still squashing bugs :)
(Sorry, preaching to the choir, I just get frustrated by everyone claiming G1 will solve all their problems without investigating potential downsides)
G1 addresses certain problem areas of CMS and replaces them with others. Honestly, I hope that ten years from now we have better choices in HotSpot than CMS or G1, but right now it doesn't look like it (if you don't count Shenandoah, which has other issues).
Having that said I have recently seen G1 performing exceptionally well in production: 120ms GC pauses with a 120 MB/s sustained allocation rate with basically default settings (apart from GC logging).
[ParOldGen: 232384K->232244K(485888K)] 243136K->241951K(628736K)
-> parallel old
A full GC frees 140K of old and 1045K of young memory. He's almost running out of memory with Parallel Old. He needs to run a way larger heap. G1 isn't going to be any better with a live set the size he has in combination with his max heap.
All of this is explained in Java Performance.
My indexers are designed to automatically shut down if they have spent more than 30% of their time doing garbage collection in the last 10 minutes. If they shut down on purpose, my background Perl script will restart them.
However, if they shut down X number of times in a row, my Perl script won't restart them. Multiple consecutive shutdowns usually mean I'm pushing the system too hard and I'll need to tweak my indexers' thread settings.
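Something like this can be built on the standard management beans; a sketch of the watchdog idea (my code, not the poster's, with thresholds mirroring the 30%/10-minutes above):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcWatchdog {
    // Cumulative milliseconds spent in GC across all collectors.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if unsupported
            if (t > 0) sum += t;
        }
        return sum;
    }

    // Percentage of the window spent garbage collecting.
    static long gcShare(long gcBefore, long gcAfter, long windowMillis) {
        return (gcAfter - gcBefore) * 100 / windowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long window = 10 * 60 * 1000L; // 10 minutes
        long before = totalGcMillis();
        Thread.sleep(window);
        if (gcShare(before, totalGcMillis(), window) > 30) {
            System.exit(1); // let the supervising script decide on a restart
        }
    }
}
```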
It touches on the basic aspects of garbage collection and dives into the different kinds of GC available for Java at this time.
In C++ you get automatically managed memory on the stack.
You get RAII and smart pointers to help with heap allocations/deallocations.
Most importantly, you get to _use_ the system's malloc implementation whilst you have to _implement_ your own malloc with the Java off-heap solution you suggest.
Java, on the other hand, has an asynchronous GC -- it uses different generations, and doesn't release objects right away just because they go out of scope.
Anyway, when I said 'like C++', I meant the way C++ can manually manage memory directly; from what you mentioned, I should have said 'like C'. Thanks.
In my view, sun.misc.Unsafe is extremely unproductive to work with. But I will say that it may be the right solution for adding a small, very specialized feature to a larger Java application.
For those who are looking to read the arcane output of a Java GC log, you can grab a 7-day free trial of Censum (https://www.jclarity.com/product/censum-free-trial/) - it parses GC logs (Java 6-9 all collectors) and gives you a host of analytics and graphs to help you figure out what's going on. We've also got blog posts on GC (https://www.jclarity.com/blog) and our slideshare http://www.slideshare.net/jclarity
This is a super valuable tool, which I recommend people take a look at should they have the misfortune to need to read a Java GC log.
Great work, jClarity!
Incidentally, this is great: https://www.mozilla.org/en-US/firefox/46.0beta/releasenotes/
> Allocation and garbage collection pause profiling in the performance panel
He wrote 0.09 seconds. In any case, if you're generating megabytes to gigabytes of garbage in a single frame you probably deserve a GC pause.
Sadly the whole set of DEC, Compaq, HP acquisition processes killed it.
Though megabytes of garbage are easy to "achieve" if you don't pool objects, call functions with temporary objects as parameters or return values, etc. It really just depends on how much you do that way, which can easily be a way that works fine during execution, or even seems really elegant, but bites you in the ass at GC time. Especially since 60 frames a second doesn't mean you're not running some other loop for the logic more often than that, and it's not like you ever have the full 16ms for rendering and logic either.
Say you have a game that might spawn and kill hundreds or thousands of enemies and/or projectiles per frame, and those enemies are more than just a pair of coordinates. You simply can't use something like new/delete for that, period. If you haven't even tried, be it in the context of game or something comparable, and dismiss what I say based on some notion of what I "deserve" or what one should criticize about a language (when I was simply talking about what you need to do today, to get a specific thing done in a specific environment), you're really missing out.
And since as of yet there is no way to control when GC occurs, and no way to enforce it not taking longer than X ms, you have to know how to avoid it. It's not that you always have to avoid it, but you do need to know how to, at least if you write a smooth game, a physics library etc., anything that runs for a while in a quick loop and does more than a handful things which should not drop frames when avoidable.
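What that looks like in practice, sketched with illustrative names (my code, not from any particular engine): preallocate everything once and mutate in place, so the per-frame loop allocates nothing for the GC to collect.

```java
class Projectile {
    float x, y, vx, vy;
    boolean alive;

    // "Spawning" just reinitializes an existing instance.
    void spawn(float x, float y, float vx, float vy) {
        this.x = x; this.y = y; this.vx = vx; this.vy = vy; alive = true;
    }

    void step(float dt) { x += vx * dt; y += vy * dt; }
}

class World {
    // Fixed-size array filled once at startup.
    final Projectile[] projectiles = new Projectile[4096];

    World() {
        for (int i = 0; i < projectiles.length; i++) projectiles[i] = new Projectile();
    }

    // Zero allocations per frame: dead slots are simply skipped and reused.
    void update(float dt) {
        for (Projectile p : projectiles) {
            if (p.alive) p.step(dt);
        }
    }
}
```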
Working in Java for 10 years made me realize that many of the solutions the Java/JVM ecosystem provides are to problems that the Java ecosystem created in the first place.
There are lots of companies making a living selling tools to track down and fix memory corruption issues in C and C++.
Java is not alone.
This allows it to shuffle things around more effectively while it is cleaning up, doing things like copying into a compacted area.
The author mentioned reading in files with x "number of lines". If they are then parsing the lines into some structured format, there are likely many opportunities to look for low cardinality aspects and to reduce object tenuring by pooling strings using either String.intern or a hashset.
They should also consider increasing the eden size.
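A sketch of the string-pooling idea using a local map rather than String.intern (my code; intern() works too, but it goes into the JVM-wide table, whereas a local map is easy to size and discard):

```java
import java.util.HashMap;
import java.util.Map;

// For low-cardinality fields (log levels, hostnames, status codes),
// keep one canonical String per distinct value so millions of parsed
// lines share a handful of instances instead of tenuring duplicates.
class StringDedup {
    private final Map<String, String> canonical = new HashMap<>();

    String dedup(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```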
thanks for making me feel young again.
If you have a substantive criticism to make, please make it neutrally, so those of us who don't know what you know can learn something. Putting others' work down distracts attention from the subject matter, makes you sound like a jerk, and makes HN a bad place to be.