

Data Processing: PLINQ, Parallelism and Performance - stsmytherie
http://msdn.microsoft.com/magazine/gg535673.aspx

======
nathanwdavis
Interesting that on a six-core machine there is very little gain after a
Degree of Parallelism of 3. The problem does seem highly data parallelizable
(is that a word?). So why isn't it able to better utilize the 6 cores?

~~~
Aaronontheweb
Law of diminishing returns - the overhead associated with additional
parallelization starts to eat into the benefits of said parallelization. The
problem is parallelizable, but it might not be a big enough problem to need
access to every core in order to achieve maximum performance.
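
To make the diminishing-returns argument concrete, here is a toy cost model in Python. It is only a sketch: the function name, the 20% serial fraction and the 2% per-worker coordination cost are made-up assumptions for illustration, not measurements from the article.

```python
def modeled_speedup(workers, serial_fraction, overhead_per_worker):
    """Speedup over a single worker under a toy cost model.

    Run time is normalized so the single-worker run costs 1.0.
    All parameters are hypothetical, not taken from the article.
    """
    parallel_fraction = 1.0 - serial_fraction
    run_time = (
        serial_fraction                         # part that never parallelizes
        + parallel_fraction / workers           # part that splits across workers
        + overhead_per_worker * (workers - 1)   # coordination cost per extra worker
    )
    return 1.0 / run_time


if __name__ == "__main__":
    # Assumed values: 20% serial work, 2% overhead per additional worker.
    for workers in range(1, 7):
        print(workers, round(modeled_speedup(workers, 0.2, 0.02), 2))
```

With these assumed numbers, the step from 1 to 3 workers gains far more than the step from 3 to 6, which is the shape of curve being described. Real results depend on the workload, memory bandwidth and the scheduler.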

------
ot
From the article:

> Each place name is represented by a UTF-8 (en.wikipedia.org/wiki/UTF-8) text
> line record (variable length) with more than 15 tab-separated data columns.
> _Note: The UTF-8 encoding assures that a tab (0x9) or line feed (0xA) value
> won’t occur as part of a multi-byte sequence; this is essential for several
> implementations._

What? I guess that they use a longer (2-byte?) encoding for those codepoints,
but from the very same wikipedia page that they link:

> a sequence that decodes to a value that should use a shorter sequence (an
> "overlong form") [is invalid]

...

> Implementations of the decoding algorithm MUST protect against decoding
> invalid sequences

Are they advising to use an invalid and potentially broken UTF8 encoding?

~~~
ori_b
I think you misread - "assures", not "assumes". UTF8 guarantees that an ASCII
byte is never part of a UTF8-encoded multibyte char.

~~~
ot
I think you are right.

I thought the author meant that it is possible to use \n and \t in values
because UTF8 would encode them as multibyte sequences (like Modified UTF8
encodes \0 as 0xC0,0x80).

What he actually meant is that if a codepoint is > 127, its multibyte
encoding won't contain any \n or \t.

Sorry for the confusion.
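
For anyone who wants to check this, here is a small Python sketch. The helper names and the sample record are mine, not from the article. It verifies that every byte of a multibyte UTF-8 sequence is >= 0x80, shows why that makes splitting raw bytes on tab safe, and confirms that Python's strict decoder rejects an overlong encoding of tab.

```python
def multibyte_sequences_avoid_ascii(code_points):
    """Return True if no multibyte UTF-8 encoding contains a byte below 0x80."""
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        encoded = chr(cp).encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


def split_record(raw_line: bytes):
    """Split a raw UTF-8 record on tab bytes first, then decode each field."""
    return [field.decode("utf-8") for field in raw_line.split(b"\t")]


def overlong_tab_is_rejected():
    """0xC0 0x89 would be an overlong two-byte form of tab (0x09)."""
    try:
        bytes([0xC0, 0x89]).decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    print(multibyte_sequences_avoid_ascii(range(0x80, 0x110000)))
    # Hypothetical record with a non-ASCII place name.
    print(split_record("Zürich\tCH\t47.37".encode("utf-8")))
    print(overlong_tab_is_rejected())
```

Because a 0x09 or 0x0A byte can only ever be a real tab or line feed, a reader can scan the raw bytes for delimiters without decoding first, which is presumably what the article means by "essential for several implementations".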

