
1024cores - kung-fu-master
http://www.1024cores.net/
======
chrisaycock
The "many core" problem is this decade's C10K. I look forward to more expert
discussion on scaling across massively multi-core architectures.

~~~
zerothehero
Is it really? Scaling today means running on more than one machine (Google,
Facebook, Twitter, etc.).

That means no shared memory. He helpfully makes this distinction on his front
page ("I'm mostly interested in shared-memory system, so if you are looking
for information about clusters, web-farms, distributed databases and the like,
it's the wrong place")

According to Google's Jeff Dean, "to Google, multi-core computers look like
separate servers with really fast interconnections" (i.e. memory).

So if you are running your applications on many machines anyway, you might as
well _drastically_ simplify your code by writing it "single-threaded" and
running #cores copies on each machine.
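A minimal sketch of the "one single-threaded copy per core" idea, assuming a POSIX system. The helper names `run_copies` and `default_copies` are hypothetical and only illustrate the shape of the approach; real deployments usually leave this to a process supervisor.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper: start `copies` single-threaded worker processes and
// return how many of them exited with status 0.
int run_copies(unsigned copies, int (*worker)(unsigned index)) {
    unsigned started = 0;
    for (unsigned i = 0; i < copies; ++i) {
        pid_t pid = fork();
        if (pid == 0) _exit(worker(i));  // child: runs alone, shares no memory
        if (pid > 0) ++started;          // parent: count copies that started
    }
    int succeeded = 0;
    for (unsigned i = 0; i < started; ++i) {
        int status = 0;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++succeeded;
    }
    return succeeded;
}

// One copy per hardware thread; hardware_concurrency() may return 0 when the
// count is unknown, so fall back to a single copy.
unsigned default_copies() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}
```

Each copy has its own address space, so no locks are needed inside a worker; the trade-off is that any data the copies share has to travel through the OS (pipes, sockets, files) instead of memory.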

~~~
dvyukov
> Scaling today means running on more than one machine

Of course it doesn't. Hundreds of millions of people use just a single computer
for a lot of tasks.

> you might as well drastically simplify your code by writing it
> "single-threaded" and running #cores copies on each machine.

It's not the worst approach. However, there may be significant penalties in
terms of performance and latency in some contexts. You definitely don't
want to use your approach for games and browsers. As for server software, it
depends on the performance/latency requirements. For example, if you use it
in the high-frequency trading market, count on losing all your money.

------
sgt
Interesting, but seriously, take those ads off. Are you expecting to make
money off of this? Even if you did, it would be pocket change at most.

~~~
hmottestad
I have ads on my own site, never made much off of them, not even enough to
partially cover my hosting costs.

However I leave them there just as a reminder that someone is footing the
bill.

If someone doesn't like ads, then they can use an ad blocker :)

~~~
dvyukov
I am very close to covering the first year of hosting :) As for consulting and,
perhaps, paid libraries, well, first I need a good site, right?

------
codex
So far, nobody has mentioned the author's C++ based race detection tool.
Conceptually it is similar to Corensic's commercial product, Jinx
(<http://www.corensic.com>) but practically Jinx supports more languages,
doesn't require recompilation, and is most likely much faster.

~~~
dvyukov
It's "similar" to a lot of software out there: Intel Thread Checker, Chord,
Zing, Spin, RacerX, CheckFence, Sober, Coverity Thread Analyzer, CHESS, KISS,
PreFast, Prefix, FxCop. However, what you are missing is that most of these
tools (and Jinx, as far as I can see; I can't find a clear description on the
site, mostly vague marketing stuff) are of help to you if you are an
application developer who writes in terms of mutexes, while RRD is of help to
you if you are implementing the mutexes themselves. Can you verify an involved
mutex algorithm down to possible memory access reorderings? I doubt it.

> and is most likely much faster.

Or an order of magnitude slower.

~~~
Kaya
Jinx can help verify mutex implementations themselves, although the example
code that ships with the product is a little more advanced (lock-free stack).
Some of the underlying technology is described here:
<http://s3.amazonaws.com/corensic/whitepapers/DeterministicSharedMemoryMultiProcessing.pdf>
and here:
<http://www.corensic.com/WhyYouNeedJinx/CorensicHasaUniqueTechnologyforFindingBugs.aspx>.
Because it's a hypervisor, it can aid in verifying synchronization primitives
that are a mix of userspace and kernel code.

~~~
dvyukov
I do not see anything about memory fences. If Jinx does not support them, then
it's pretty much useless for verification of synchronization algorithms. I've
implemented dozens of advanced synchronization algorithms, and I can say that
this is crucial. Also, if it works at the binary level (does not require
recompilation), that also renders it useless, because at that level you lose
information about the order of memory accesses, memory fences, and atomicity.
For example, if you see a plain x86 MOV instruction, what is it? A non-atomic
store? An atomic relaxed store? An atomic release store?
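To make the MOV ambiguity concrete, here is a small C++11 sketch with hypothetical variable and function names. On x86-64, mainstream compilers typically lower the first three stores to the same plain MOV, so the source-level distinction is not visible in the binary; the sequentially consistent store usually becomes an XCHG (or a MOV followed by MFENCE). This is worth confirming against your own compiler's output rather than taking as given.

```cpp
#include <atomic>
#include <cassert>

int plain_value = 0;               // ordinary variable, no atomicity promised
std::atomic<int> atomic_value{0};  // atomic variable

// Non-atomic store: a data race if another thread touches it concurrently.
void store_plain(int v) { plain_value = v; }

// Atomic store with no ordering constraints on surrounding accesses.
void store_relaxed(int v) {
    atomic_value.store(v, std::memory_order_relaxed);
}

// Atomic store that also publishes all earlier writes of this thread.
void store_release(int v) {
    atomic_value.store(v, std::memory_order_release);
}

// Sequentially consistent store: the one that usually needs more than a MOV.
void store_seq_cst(int v) {
    atomic_value.store(v, std::memory_order_seq_cst);
}
```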

------
viraptor
> ..., atomic-free synchronization algorithms

Actually, I'm not sure if it was supposed to be funny or serious. I see the
funny "everything-free" list, but I can also imagine that there is some action
you can perform non-atomically (relative to other actions) that still gives
you synchronisation.

Anyone?

~~~
dkersten
I'm not quite sure what you mean, but synchronization without atomic
operations is possible.

An example of mutual exclusion, without any atomic operations, taken from the
book _The Art of Multiprocessor Programming_ [1] is (paraphrased) as
follows:

Two threads, A and B, want to access some memory. Each thread has a flag.

When thread A wants to access the shared memory:

    
    
        Set flag A
        Wait for flag B to become unset
        Access memory
        Unset flag A
    

When thread B wants to access the shared memory:

    
    
        Set flag B
        While flag A is set {
            Unset flag B
            Wait for flag A to become unset
            Set flag B
        }
        Access memory
        Unset flag B
    

Obviously this isn't a general purpose solution, but rather an easy to
understand example demonstrating that atomic operations are not required.
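The two procedures above can be sketched in C++ as follows. This is an illustration under stated assumptions, not the book's code: all names are hypothetical, and the flags are declared `std::atomic<bool>` only because C++ treats concurrent access to plain variables as a data race. Only simple loads and stores with the default sequentially consistent ordering are used; there is no compare-and-swap, exchange, or other read-modify-write.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> flag_a{false};
std::atomic<bool> flag_b{false};
long shared_counter = 0;  // ordinary data protected by the two flags

void enter_a() {
    flag_a.store(true);                                // set flag A
    while (flag_b.load()) std::this_thread::yield();   // wait for B to unset
}
void leave_a() { flag_a.store(false); }                // unset flag A

void enter_b() {
    flag_b.store(true);                                // set flag B
    while (flag_a.load()) {
        flag_b.store(false);                           // back off in favour of A
        while (flag_a.load()) std::this_thread::yield();
        flag_b.store(true);                            // try again
    }
}
void leave_b() { flag_b.store(false); }                // unset flag B

// Runs both threads and returns the final counter value.
long run_demo(int iterations) {
    shared_counter = 0;
    std::thread a([=] {
        for (int i = 0; i < iterations; ++i) {
            enter_a(); ++shared_counter; leave_a();
        }
    });
    std::thread b([=] {
        for (int i = 0; i < iterations; ++i) {
            enter_b(); ++shared_counter; leave_b();
        }
    });
    a.join();
    b.join();
    return shared_counter;
}
```

If mutual exclusion holds, `run_demo(n)` returns `2 * n`, because no increment of the plain counter is lost. Thread B always yields to thread A, so B can be delayed for a long time while A keeps re-entering.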

[1] <http://www.amazon.com/Art-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916>

~~~
tedunangst
That only works with coherent, in-order memory operations. Once you add the
appropriate memory barriers, it looks a lot more "atomic".
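The ordering problem can be illustrated with the classic "store buffering" litmus test, sketched here in C++ with hypothetical names. Each thread writes its own variable and then reads the other's, which is the same store-then-load shape as "set my flag, check the other flag". Without a barrier, both threads may read 0, which would let both into the critical section. With a sequentially consistent fence between the store and the load, the C++ memory model forbids that outcome.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one trial and reports whether both threads read 0.
bool both_read_zero(bool use_fence) {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Counts how many trials ended with both threads reading 0.
int count_both_zero(bool use_fence, int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i)
        if (both_read_zero(use_fence)) ++hits;
    return hits;
}
```

With `use_fence` set, the count is always 0. Without it, a nonzero count is allowed but not guaranteed to show up, since starting a thread per trial leaves only a narrow window for the reordering to be observed.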

~~~
dkersten
I chose that example because it's easy to understand. Obviously, on modern
processors with out-of-order execution and whatnot, you would need something a
lot more elaborate.

 _Once you add the appropriate memory barriers, it looks a lot more "atomic"_

Well, they force in-order memory access. That doesn't look terribly "atomic"
to me, but I understand your point.

