
If you have a constrained situation in which the iteration is known to run over a range of fixed-type consecutive data, any language should be able to generate the exact same optimized assembly as a manual C++ vector iteration.
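A minimal sketch of the kind of loop being discussed, assuming contiguous storage of a fixed element type and no opaque calls in the body; mainstream C++ compilers typically compile this down to a tight, often vectorized loop:

    #include <cstdint>
    #include <vector>

    // Sum a contiguous buffer of fixed-width integers.
    std::int64_t sum(const std::vector<std::int32_t>& v) {
        std::int64_t total = 0;
        for (std::int32_t x : v)   // contiguous, known element type, no escaping calls
            total += x;
        return total;
    }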

Also, if the data set is large and you're bound by compute time rather than memory bandwidth, iterating a single vector will be slower than splitting the work between threads.
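As a sketch of that threaded alternative, assuming a C++17 standard library with execution-policy support (the function name is just illustrative):

    #include <cstdint>
    #include <execution>
    #include <numeric>
    #include <vector>

    // Let the standard library partition the range across threads/SIMD lanes.
    std::int64_t parallel_sum(const std::vector<std::int32_t>& v) {
        return std::reduce(std::execution::par_unseq,
                           v.begin(), v.end(), std::int64_t{0});
    }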

Also, if your data is being streamed in, manually managing the allocation and reuse of std::vector buffers is much less useful than letting a more managed environment balance things heuristically.

So, no, C++ std::vector isn't a silver bullet except in microbenchmarks, which other languages can (or should be able to) match. And vector iteration has nothing to do with GC.




"any language should be able to generate the exact same optimized assembly as a manual C++ vector iteration"

This is absolutely, massively untrue. If you try writing compilers, you will see how hard it is for a compiler to be sure about anything.

For example: Are you calling a function anywhere inside that iteration? Is this a copying collector? Could anything that function does (or anyone it calls) possibly cause a GC, or cause us to be confused enough that we can't tell whether a GC might happen or not? Then you need read barriers on all your operations in this function, i.e. your iteration is going to be slow.
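As a rough illustration of that cost, here is a schematic read barrier; is_forwarded and forwarding_address are hypothetical stand-ins for whatever mechanism a concurrent-compacting collector actually uses, not any real collector's API:

    // Hypothetical helpers standing in for the collector's forwarding mechanism.
    bool is_forwarded(const void* p);
    void* forwarding_address(const void* p);

    // Every pointer load from the heap goes through this instead of a plain load.
    template <typename T>
    T* load_with_read_barrier(T** slot) {
        T* p = *slot;
        if (p != nullptr && is_forwarded(p)) {      // extra branch on every load
            p = static_cast<T*>(forwarding_address(p));
            *slot = p;                              // "heal" the slot for later loads
        }
        return p;
    }

Even when the branch is almost never taken, the per-load check keeps the iteration from collapsing into a plain tight loop.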

"Also, if your data to work on is being streamed in, having to make the choices in managing the allocations & uses of std::vector buffers is much less useful than having the system heuristically balance in a more managed environment."

Also absolutely, massively untrue. Your application knows more about its use cases than the generic systems it is built on (which must handle many, many different kinds of programs). Because your application knows what is meant to happen, it can make much better performance decisions.


The claim that GC as such requires read barriers in the mutator is nonsense.

You may need read barriers if you do concurrent compaction of the heap, as a lot of the "real-time" collectors do.

But state-of-the-art collectors like G1 can do even concurrent marking (with stop-the-world compaction) without mutator read barriers.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63....


Regarding iteration:

First, if something within an iteration calls out and performs memory allocation, then micro-optimizations are likely to be dwarfed by that anyway. :) But most of the GCs I've worked with do not use read barriers. If a GC occurs which moves the iterated structure, all pointers to that area are updated and the iteration continues unaware that the movement has happened.
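To make that concrete, here is a sketch of such a stop-the-world fix-up pass; Slot and the forwarding table are illustrative names, not any particular GC's implementation. The point is that the pointer rewriting happens while the mutator is paused, so the iteration code itself carries no barrier:

    #include <unordered_map>
    #include <vector>

    using Slot = void**;                                       // a location holding a heap pointer
    using ForwardingTable = std::unordered_map<void*, void*>;  // old address -> new address

    // Run inside the pause, after live objects have been moved.
    void fix_up_slots(const std::vector<Slot>& slots, const ForwardingTable& forwarding) {
        for (Slot s : slots) {
            auto it = forwarding.find(*s);
            if (it != forwarding.end())
                *s = it->second;    // point the slot at the relocated object
        }
    }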

There certainly are benefits to read-barrier systems, for instance in idealized fully concurrent non-pausing GC, but they're certainly not universal. In particular, they're the vast minority in terms of Lisp systems. (Of course, they can be zero-overhead in a LispM.)

Regarding heuristics:

Yes, I should have conveyed a bit more context. In Lisp development, it's hard to draw a line where the environment stops and user code starts. Plus, my thinking comes from heavy server deployments, not tightly fixed-path systems like games & supercomputing, where the universe of possibilities through a code path tends to be smaller and manually manageable.

If scalability, workloads, and execution environments vary wildly, chasing the best performance by hand-tweaking and enumerating particular situational decisions quickly hits diminishing returns and can even regress performance. Unfortunately, this reflects a lot of C-family programming styles. I've moved far too many systems (even in C++) away from such designs toward a "just code your application, let the system worry about how to optimize it" approach, and have seen performance improvements, as well as collapsing code size and far better status awareness.


Look, what you are saying just doesn't work. What happens when the pointers are in registers? What happens when the loop is occurring in a thread running on another core?

Yes, you can make GC work in these situations, but you are going to pay for it. In perf.

I have to say frankly I do not believe any of the words in your last paragraph at all.


It is conceptually very simple: the GC stops the mutator threads while compaction is in progress and objects are moved around; throughput-focused GCs generally do that, better ones do it in parallel with multiple GC threads.

This has an obvious impact on performance that needs to be compared with the mutator calling malloc/free in a non-GC system. The function isn't called "free" because it runs in zero time, you know.


> What happens when the pointers are in registers? What happens when the loop is occurring in a thread running on another core?

These are long-solved problems. When any interruption occurs, CPU state is spilled onto the stack. Depending on the architecture, the current register usage is either part of the language ABI or determined from the code location indicated by the PC.
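A schematic sketch of one common arrangement; gc_pending and enter_safepoint are hypothetical names, and the real mechanism (stack maps, poll pages, signals) varies by runtime. The idea is that register-held pointers only need to be describable at known poll points:

    #include <cstddef>

    extern volatile bool gc_pending;   // hypothetical flag the runtime sets when it wants a pause
    void enter_safepoint();            // hypothetical: spill registers, publish their locations, wait

    inline void safepoint_poll() {
        if (gc_pending)                // cheap check the compiler inserts in loops
            enter_safepoint();
    }

    long sum_all(const long* data, std::size_t n) {
        long total = 0;
        for (std::size_t i = 0; i != n; ++i) {
            total += data[i];
            safepoint_poll();          // at this PC, metadata describes which registers hold references
        }
        return total;
    }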

> Yes, you can make GC work in these situations, but you are going to pay for it. In perf.

I'm not a big fan of Java the language, but the JVM gets incredible performance doing stuff exactly like this. That, and optimization features like inlining virtual function calls (something it does distinctly faster than C++), should be a huge inspiration and impetus toward higher-level machine optimization, rather than the human hand-holding the machine through its execution details.
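For concreteness, the kind of call in question, sketched in C++ with illustrative class names; an ahead-of-time C++ compiler generally leaves this as an indirect call unless it can prove the concrete type, whereas a profiling JIT can speculatively inline the hot target behind a cheap type check:

    #include <memory>
    #include <vector>

    struct Shape {
        virtual ~Shape() = default;
        virtual double area() const = 0;
    };

    double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
        double total = 0.0;
        for (const auto& s : shapes)
            total += s->area();        // virtual dispatch in the hot loop
        return total;
    }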



