I had restrained myself from responding with a 'this' to @userbinator's comment. What you say is true, but it is not trivial. This is not to say that you implied it to be trivial.
The key is that, with some thought and a change in approach, one can sometimes transform between the two regimes:
"do little" with "much data" --> "do much" with "little data" __at_a_time__
Often it is not at all obvious how to do this. It is fairly common in discussions on parallelism for naysayers to appear and claim that most algorithms aren't parallelizable, and that's the end of it.
Creativity lies in rethinking algorithms so that the transformation above applies. Sometimes this is reasonably easy, matrix multiplication being the standard example. Writing it out as a naive loop over large rows times large columns is quite detrimental; the speedup will be abominable. The fix is also fairly obvious here, which is why it is everyone's favorite example; in general, however, solutions are rarely that obvious. Note that here we only reordered the data access so that it touches the same data often; there was no fundamental change in the algorithm. These are the easy cases.
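To make the reordering concrete, here is a minimal C sketch (sizes and function names are illustrative, not from any particular codebase). Both functions compute the same product; the (i,k,j) ordering walks rows of b and c sequentially instead of striding down a column of b on every inner-loop iteration.

```c
#include <stddef.h>

enum { N = 4 };   /* tiny size just for illustration */

/* Cache-hostile order: the inner loop strides down a column of b,
   touching a new cache line on every iteration once N is large. */
void matmul_ijk(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Same arithmetic, reordered (i,k,j): the inner loop now walks rows of
   b and c sequentially, so each fetched cache line is fully reused. */
void matmul_ikj(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            c[i][j] = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t k = 0; k < N; k++) {
            double aik = a[i][k];
            for (size_t j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
}
```

Only the traversal order changed; the set of multiply-adds is identical.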
Once I had a fight with a co-worker about how to structure some code. I was arguing against building the operation as a huge matrix-times-vector and offloading it to an architecture-specific BLAS. I lost that argument; it was not clear at the time which approach would be better. When the implementation was done, the speedup was missing in action. Large mat-vec products are just bad primitives to use on modern CPUs (too little computation per datum touched): one lesson learned. After all, BLAS cannot do magic; it's still going to run on the same CPU (unless your BLAS offloads to some other sub-architecture like a GPU, and even then bandwidth is a major concern).
Profiling showed that it was choking on memory bandwidth. The bus just cannot keep up with one core hammering it with read requests, let alone four. At this point one could have washed one's hands of it and said that the algorithm is memory-bandwidth limited and that's the end of the story. But those limits hold only if you have to stick with that particular algorithm, or with that specific order of data access. Usually those restrictions are artificial.
I had far better luck by breaking it up into pieces so that the next time around most of the data was pulled from cache.
This required changing the algorithm a little bit, and also proving that the new algorithm converges to the same thing.
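The actual algorithm isn't shown here, so purely as an illustration of the shape of this transformation, here is a sketch with a stand-in elementwise pass (names and the toy workload are mine, not from the original code). Instead of streaming the whole array through memory once per pass, all passes run over one cache-sized block before moving on; a real multi-pass algorithm may need the kind of convergence proof mentioned above before this reordering is legal.

```c
#include <stddef.h>

enum { BLOCK = 1024 };   /* tune to fit L1/L2; illustrative value */

/* Stand-in for one pass of the real algorithm: scale every element. */
static void pass_scale(double *x, size_t n, double s) {
    for (size_t i = 0; i < n; i++)
        x[i] *= s;
}

/* Bandwidth-hungry version: each pass streams the entire array,
   so by the time a pass revisits x[0] it has long been evicted. */
void passes_streaming(double *x, size_t n, int npasses, double s) {
    for (int p = 0; p < npasses; p++)
        pass_scale(x, n, s);
}

/* Cache-friendly version: run all passes over one block while it is
   still hot, then move to the next block.  Trivially legal here
   because the passes are elementwise; real algorithms may not be. */
void passes_blocked(double *x, size_t n, int npasses, double s) {
    for (size_t b = 0; b < n; b += BLOCK) {
        size_t len = (n - b < BLOCK) ? n - b : BLOCK;
        for (int p = 0; p < npasses; p++)
            pass_scale(x + b, len, s);
    }
}
```

Both versions produce the same result; only the order in which memory is touched differs.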
It still pays a significant cost for synchronization on deallocation. My next target is to switch to stack-based allocation/deallocation, because the allocation pattern for the most part follows a stack discipline in this application. For multi-threaded allocation that is the best pattern you can have. I wish alloca were included in POSIX and that it returned usable error values; otherwise it's pretty much a case of call and pray.
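A stack-discipline allocator can be sketched as a per-thread bump arena with mark/release; unlike alloca, exhaustion is reportable. All names here are illustrative, not from the actual codebase.

```c
#include <stddef.h>

/* Minimal bump allocator with stack discipline.  Allocation is a
   pointer bump; "freeing" is resetting the cursor to a saved mark,
   so there are no free lists and, with one arena per thread, no
   synchronization at all. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t top;   /* bump cursor */
} arena;

void arena_init(arena *a, void *buf, size_t cap) {
    a->base = buf;
    a->cap  = cap;
    a->top  = 0;
}

void *arena_alloc(arena *a, size_t n) {
    n = (n + 15u) & ~(size_t)15u;        /* keep 16-byte alignment */
    if (a->cap - a->top < n)
        return NULL;                      /* unlike alloca: reportable */
    void *p = a->base + a->top;
    a->top += n;
    return p;
}

size_t arena_mark(const arena *a) { return a->top; }

/* Frees everything allocated after the mark, in one shot. */
void arena_release(arena *a, size_t m) { a->top = m; }
```

This only works when lifetimes genuinely nest; an allocation that must outlive its enclosing mark breaks the discipline.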
For compute-heavy stuff, hyperthreads are rarely useful. The bottleneck is typically floating-point operations, and the FPU becomes the scarce resource. Hyperthreading actually ends up hurting, because all the threads want to do the same thing: some floating-point operation (ideal for a GPU, not for hyperthreading). Sometimes it is possible to make one hyperthread take care of streaming in the data and uncompressing it in memory while the other takes care of the floating-point work. I have not tried this out myself, but it ought to work (in theory :)
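A rough sketch of that pairing, with a toy workload and illustrative names (a real version would pin both threads to the two hyperthreads of one physical core, and "decompress" would be actual I/O plus decompression): one pthread stages chunks into a double buffer while the main thread does the floating-point work on the previously staged chunk.

```c
#include <pthread.h>
#include <string.h>

enum { CHUNK = 256, NCHUNKS = 8 };

typedef struct {
    double buf[2][CHUNK];       /* double buffer */
    int ready[2];               /* slot staged and not yet consumed */
    pthread_mutex_t mu;
    pthread_cond_t cv;
    double result;
} pipe_state;

/* Stand-in for "stream + decompress": fill chunk c with the value c+1. */
static void *stager(void *arg) {
    pipe_state *st = arg;
    for (int c = 0; c < NCHUNKS; c++) {
        int slot = c & 1;
        pthread_mutex_lock(&st->mu);
        while (st->ready[slot])             /* wait until slot is free */
            pthread_cond_wait(&st->cv, &st->mu);
        pthread_mutex_unlock(&st->mu);
        for (int i = 0; i < CHUNK; i++)
            st->buf[slot][i] = (double)(c + 1);
        pthread_mutex_lock(&st->mu);
        st->ready[slot] = 1;                /* publish the chunk */
        pthread_cond_broadcast(&st->cv);
        pthread_mutex_unlock(&st->mu);
    }
    return NULL;
}

/* The FP side: consume each staged chunk as it becomes available. */
double run_pipeline(void) {
    pipe_state st;
    memset(&st, 0, sizeof st);
    pthread_mutex_init(&st.mu, NULL);
    pthread_cond_init(&st.cv, NULL);
    pthread_t t;
    pthread_create(&t, NULL, stager, &st);
    for (int c = 0; c < NCHUNKS; c++) {
        int slot = c & 1;
        pthread_mutex_lock(&st.mu);
        while (!st.ready[slot])             /* wait for the chunk */
            pthread_cond_wait(&st.cv, &st.mu);
        pthread_mutex_unlock(&st.mu);
        for (int i = 0; i < CHUNK; i++)     /* the "FPU" work */
            st.result += st.buf[slot][i];
        pthread_mutex_lock(&st.mu);
        st.ready[slot] = 0;                 /* hand the slot back */
        pthread_cond_broadcast(&st.cv);
        pthread_mutex_unlock(&st.mu);
    }
    pthread_join(t, NULL);
    return st.result;
}
```

The double buffer is what lets the staging of chunk c+1 overlap the FP work on chunk c, which is the whole point of pairing the two hyperthreads this way.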
Thanks for this comment. That's really interesting. It seems like caching is becoming more and more important at every level of the system. Even GPUs are getting DRAM caches now. So "do much with little data at a time" is the way to go.
P.S. I work on Hadoop. "Do little with much data" is a pretty good high level description of what we do. Our single-node efficiency is not what I'd like it to be, but it's improving with next generation SQL and other execution engines.
> Secondly, who cares about "wasted" resources of a whole underutilised server when reliable dedicated servers are so cheap, and in such plentiful supply?
The people bankrolling Google / Facebook / Twitter's electricity bills seem to care quite a bit.
There is another trope that gets repeated often (and this is not even remotely directed at you, just a hopefully on-topic digression): "performant language runtimes are an anachronism; a bog-slow language in which a programmer can code fast is way more useful than any of that performance bull crap." Typically the person repeating it is a webdev. In these large-scale scenarios, however, core infrastructural code can save orders of magnitude more money in running costs than it costs in extra days of development. So yes, at the interesting places, algorithms and efficiency continue to matter. One reason Google always managed to stay ahead is how successful it was at minimizing running costs.
Totally agree on that front. Efficient software = fewer servers. You can't know when to invest in new capacity if you can't tell the difference between hitting the limits of the hardware and a fixable performance problem. Seemed like Twitter wasn't sure what was going on there for a few years :)
We do a lot of finding ENORMOUS performance problems with customer servers: simple stuff like a thundering herd, a vital missing index, or a filesystem that's being overtaxed. That's the kind of scale we work at. But those sorts of insights can make the difference between "help, we might need a new server" and "oh thank god, it's all working again".
At Google scale it matters, sure. But most companies aren't Google scale. Writing your single-company 100-user CRUD webapp in Java or C++ "for performance" is the ultimate false economy.
I used to work for a company that had a big Java app. We laughed at a client who needed 60 Rails servers to deliver worse performance than our single-instance app. But they probably saved more on dev costs than they spent on servers.
Upvoted. In fact, while reading the article I was thinking to myself that the HN comments would be all about Plan 9. Kind of surprised it hasn't been mentioned more.
What I am really keen to find out in the coming years is what MirageOS makes of this. If you are not familiar with it, this article http://queue.acm.org/detail.cfm?id=2566628 explains it far better than I could. I wouldn't claim it is there yet, but it seems to be sitting in an enviable position, full of realizable potential. It's written in OCaml, to boot.
Yes, and C99 now supports variable-length arrays anyway. The storage gets allocated (deallocated) as control enters (exits) the scope. Jumping into the scope of a VLA is a constraint violation, so it is rejected at compile time rather than at run time. alloca is officially not portable, but it's present on the platforms I care about. It is living dangerously, sure, but quite effective nonetheless.
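A small example of the scoped lifetime (function name and workload are illustrative):

```c
#include <stddef.h>

/* C99 VLA: storage appears when control enters the scope and vanishes
   when it leaves -- no free() and no heap traffic.  Like alloca, there
   is no way to detect that the stack ran out, so keep n modest. */
double vla_dot(size_t n, const double *a, const double *b) {
    double tmp[n];                  /* VLA: sized at run time */
    for (size_t i = 0; i < n; i++)
        tmp[i] = a[i] * b[i];
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += tmp[i];
    return sum;
}                                   /* tmp's storage is released here */
```

(Note that C11 later made VLA support optional, via __STDC_NO_VLA__.)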
I would have loved a portable way to query how much stack space the process has left. The problem is that some architectures need not even have a stack.
> then the problem is with the program's structure
...or with the choice of programming language. Some styles go better with some languages than others. CPython is possibly the worst offender in terms of function-call overhead, amplified further by its lack of inlining and macro capabilities.
PyPy makes it better, but they were fighting an uphill battle!
Python, in retrospect, seems to have been designed almost as a snarky challenge thrown at someone: "You think you are so smart, do you? Let me see how you make this go fast. No, I won't elide tail calls."
Spot on, and this is precisely what I find fascinating about it.
Quoting from an old comment of mine:
To me the striking things about its body of thought are its philosophical roots, the fact that they have been thinking deeply about such questions since antiquity... and that not believing in any form of god is perfectly acceptable.
... for people coming from an Abrahamic religion, it's a difficult thing to grasp. Hinduism is not 'a religion' if one goes by the notion of religion in the Abrahamic faiths. It is worse than trying to map git commands to subversion. It's a very different beast: a meta-religion (or, more accurately, a diverse collection of a very large body of thought and introspection, originating in a geographical region and built over time, that visitors clubbed into a single pool because they weren't sure what to make of it). It is more like a religion factory pattern for building your own religion, one that includes the questions you should keep revisiting in that process, and a more fundamental one: why (and when) should one even consider building one at all. It lays down thought processes, questions one should consider and critique when forming one's own parameterized religion. People get confused about whether they are talking about the polymorphic class or the object instance.
> a lot of advaita scholars maintain that the original Gita was much smaller
Not only that: you may completely ignore everything in the Gita and still be a Hindu. As long as your belief system is not in conflict with Vedic philosophy, you have a perfectly legitimate claim to be one. And this gives you an enormous amount of personal space to form your own set of rules and live by them. I am sure you would know this, but for the benefit of those unfamiliar with the details: the 'Bhakti' the article talks about is very much a post-Gita thing; it had little to no role in Hinduism before it. This "one book" obsession is very much an Abrahamic response, an attempt to map something unfamiliar onto something familiar. I find it quite amusing.
tn13 is correct, actually. I think he is saying that ISIS aren't violent because of the Koran but because of their political aspirations (at least that is how I parsed it; only tn13 can clarify that part).
As for the rest, the source of the confusion is the difference between Hindutva, a political narrative championing Hindu supremacy started formally by Savarkar (who snitched like a canary to the British, excuse my mangled metaphor), and Hinduism the religion. The Godses were motivated by the former. [I am going to burn some serious karma here; you see, Savarkar is a golden hero to the RSS.]
Again, I agree with you wholeheartedly on every bit except the last part. I am just uncomfortable about endorsing or invoking Tilak. The man was a bit cuckoo when it came to Hindu supremacy. In his eagerness to demonstrate racial purity, he frigging claimed that Hindus were Aryans originating from the North frigging Pole, and backed it up with his 'scientific' reasoning. Honest, I kid you not; I wish I had something I could cite, but I am pretty sure it won't be too hard to Google the right source.