Hacker News | askee's comments

It's also about performance. The snippet assumes the byte order of the buffer or byte stream is little endian, so it only covers two of the four cases. If you want to read a little-endian byte stream on a big-endian machine, of course you need swapping. But, considering portability, if that byte stream had been written by a big-endian machine before (or came directly from the network), it would be in big-endian ordering and swapping would be wrong.

Assuming LE ordering on the writer side, you would need to swap again on a BE machine. Usually, however, you want to store in your target byte order. One could argue this is irrelevant as long as the memory doesn't leave your application, of course. Yet, by explicitly having macros _to_cpu, _to_be, _to_le and so on, you make all of this explicit.


The snippet assumes the data being read is little endian, yes. You can make a different snippet if you want to read big endian. You use an equivalent snippet for the writing side, and boom. You never need any #ifdefs in those functions. Is it actually better than using functions like you describe? Maybe. A part of me does love the elegance of using the exact same code on any platform regardless of the native byte order. This construct was suggested/encouraged by Rob Pike: https://commandcenter.blogspot.com/2012/04/byte-order-fallac...


Wasn't the mostly unanimous agreement in this thread that this paper was of low quality and its conclusions are questionable at best?


Also, it's "deus absconditus" not "abscondidus" in 1.


The k'th largest element in an array can also be found by the nth_element() algorithm already included in the STL. It has linear complexity as opposed to the O(n log k) and O(k log n) variants proposed here.


It is only linear in std::distance(first, last), so it very well might be (and likely is) O(n log k).

Edit: see comment below for why it actually is O(n) and not O(n log k) as I had thought.


But std::distance(first, last) is how n is defined.

std::nth_element is typically implemented using a fancy version of quickselect called introselect. You can imagine quickselect as a version of quicksort where the recursive call is only executed on the side which contains the element. Introselect adds some fanciness to ensure worst-case linear running time if quickselect takes too long (bad pivot selection).


Very cool! I stand corrected; I didn't think of this when I considered it. Sorry for the mix-up.


It's O(n log k) then? It maintains a heap, I'm guessing?


No, as per my previous comment, it does not use a heap. It's like quicksort, but only doing one of the recursive calls:

- choose a pivot p randomly from the elements

- partition into elements <= p and > p. This takes time O(n).

- if the number of elements <= p is at least k (the rank of the element we want), do a recursive call on these elements. Otherwise recurse on the other elements (those > p).

Because a random pivot splits the elements well enough most of the time, this has an expected running time of O(n): 1 + ½ + ¼ + ⅛ + … < 2 in the best case, and similar constants if the partitioning is a little worse. But in the worst case, when the pivot is the smallest/largest element in each iteration, it's O(n²). That's where the fanciness of introselect comes in: if quickselect doesn't converge, it switches to an algorithm with worst-case O(n) time, which is slower in most cases but saves the day when quickselect has trouble.


See "median of medians algorithm": https://en.wikipedia.org/wiki/Median_of_medians

It's hairier than a partial quicksort, but (if you erase the proof and weigh the chalk dust) guaranteed linear time.


and median of medians can be used to implement quicksort with O(N log N) worst case. It is usually not worth it, as other hybrid quicksorts like introsort are faster in practice.

On an unrelated note, I was once asked to prove the upper bound for median of medians in an interview...


Thanks, right, the idea is to only recurse down one side on each pass, right? This is the (sub)family of "selection" algorithms?

This is interesting I had not heard of introselect before, I will have to read up on this.


Yup, exactly. It's a fun area that gets even more interesting when you look at distributed systems, where every node only stores a small portion of the overall data and you want to keep communication volume low.


>"It's a fun area that gets even more interesting when you look at distributed systems"

Can you elaborate a little bit, specifically how selection algorithms help reduce "chatter" in distributed systems? I am not familiar with this and this sounds like an interesting context.

I have heard of Merkle Trees for similar but that's obviously hashing and not selection algorithms, or is there some connection I am not making?


Well if you'd like to select the k-th biggest element from a distributed array it's impractical - if not impossible - to send everything to one machine so another approach is needed. A fairly simple algorithm would select a pivot randomly on a random node and broadcast it to all other nodes. Each node then does the partitioning locally and counts partition sizes. Now you do a sum all-reduction on the sizes to decide which side you need to recurse on. This takes logarithmically many rounds in expectation just like the sequential algorithm, but it could result in very unbalanced work if the data isn't distributed randomly. It also has even more trouble if the pivot selection is bad because of the communication overhead. There are various techniques to overcome these challenges but it's a bit much to go into here.


Is the nth_element() function an implementation of "quick select" then?


How well does it work with non-natural images where rows/columns are not similar to each other? White noise, for example, should prove rather difficult.


Self-hosted owncloud works for me except for the gruesome woes every single time I try to update it.


Does OwnCloud support email too? I wasn't aware of this.


No it doesn't, unfortunately. I misunderstood OP's questions at first. You can connect owncloud to a mail server (a la Roundcube) but nothing more.


Maybe one day they'll make a plugin that does what Roundcube and other PHP email clients are capable of; that would at least make it a little smoother to have as an all-in-one email client.


You would solve e.g.

min_x ||A x - b||^2 + \lambda * regularization(x)


In general there are no shortcuts, and when the matrix A is not small (e.g. m x n with m,n >> 100), you don't want to invert A, especially if you are only interested in x (from A x = b). Instead, use numerical minimization schemes like conjugate gradients and variants thereof, or Quasi-Newton methods (BFGS). Combined with preconditioning as well as regularization or denoising this usually yields good results. Compressive sampling methods (total variation regularization) have received lots of attention in the last ten years.


This comment seems to be a diversion from the question asked. The main point of OP is that solving A x = b is commonly done with factorizations of A, not by explicit computation of inv(A).

Resorting to a numerical minimization (like MINRES) would be unusual unless the dimension is much bigger than hundreds. Certainly it's not relevant in a graphics computation with dimension of 3, 4, or 6. My laptop inverts a 4Kx4K general matrix in 1.5s. It solves a 4K linear system in 0.5s.


I find the wording somewhat peculiar: 'the resulting address will change dramatically, by (4 - 2^(32+2)). In 32-bit mode (all addresses modulo 2^32), this is just an increase by 4 as usual; the "wraparound" part is a multiple of 2^32 and hence invisible'

Why would it matter that the address will change "dramatically"? Even if it didn't (and only change by say 16 bytes or whatever) you'd have to deal with it. The problem is that int overflows at 2^31 while long/a larger type doesn't. In order to cope with the overflow you'll have to "simulate" an overflow in the larger type as well (or at least do what the smaller type does) which is achieved via a sign-extend.


Rasterization also is of a highly parallelizable nature, and this parallelism has been exploited to great effect in the last decade(s). Given that Ray Tracing's only advantage over Rasterization is specular reflection, it seems unlikely it is really superior in terms of efficiency. Especially since specular reflections and many other effects can be achieved via cheating in Rasterization.

For real GI, Path Tracing, Photon Mapping and so on are the methods of choice and not Ray Tracing.


Global Illumination, Path Tracing, Photon Mapping and Ray Tracing all belong to a category of algorithms that are better suited for multi-core general processors -- not the SIMD hardware of today's GPUs.


GPUs haven't stuck to SIMD parallelism for quite a few years now. NVIDIA is not there yet; SMT/DPT isn't true parallelism, but it's good enough even for path tracing. AMD GPUs are pretty much truly parallel, with every execution unit capable of executing any instruction asynchronously.


Ray tracing has other advantages besides shiny reflections: it can handle refraction, shadows, non-triangle-based geometric primitives (including CSG), and has less sensitivity to the total amount of geometry in a scene.

Global illumination is where we want to go to surpass the realism of current games, but path tracing and photon mapping are both fundamentally based on ray-tracing. Any hardware that makes ray-tracing go faster (especially for incoherent rays) should also help to speed up global illumination.


Their hardware actually combines hardware rasterization and raytracing.

Depending on the implementation, raytracing can have many benefits over rasterization, such as:

- ability to produce pixel-perfect shadows

- sub-linear complexity in the number of primitives (vs. linear for rasterization)

- rays need not be coherent, i.e. one can render non-linear projections or lots of small views

Path Tracing also is just another form of Raytracing. They demonstrated that their hardware can be used for it (just read the link).

