It looks like this was run at global scope. It's important to wrap the code in a function for performance: AFAIK, Julia compiles and specializes code per function based on the argument types, while non-constant globals can change type at any time, so code inside a function is much, much faster than the same code at top level.
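To make that concrete, here is a minimal sketch of moving a loop into a function. The name `sum_of_squares` and the test data are made up for illustration; they are not from the code being discussed.

```julia
# Hypothetical example: the hot loop lives inside a function, so Julia can
# compile a version specialized for the concrete type of `xs`.
function sum_of_squares(xs)
    total = zero(eltype(xs))   # accumulator with the same element type as xs
    for x in xs
        total += x * x
    end
    return total
end

# Only the setup and the call stay at top level.
data = collect(1.0:1000.0)
result = sum_of_squares(data)
println(result)
```

Timing the call with `@time sum_of_squares(data)` a second time (after the first call has compiled it) should give a fairer picture than timing a loop written directly at top level.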
Julia also stores its arrays in column-major order, so put the outer loop over columns and the inner loop over rows for what can be another order-of-magnitude speedup. (I'm not sure I love this feature, but presumably there's a reason... maybe for better performance on matrix operations?) My reply to your grandparent comment has a simple implementation that runs in 0.4 seconds on an aging machine.
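The loop order I mean looks roughly like this. It's only a sketch: `sum_column_major` is a made-up name and a plain sum stands in for whatever the real loop body does.

```julia
# Hypothetical helper: walks a matrix with the column index in the outer loop.
# For a plain Array, elements of one column sit next to each other in memory,
# so the inner loop over rows touches consecutive addresses.
function sum_column_major(A::AbstractMatrix)
    total = zero(eltype(A))
    for j in axes(A, 2)        # outer loop: columns
        for i in axes(A, 1)    # inner loop: rows within one column
            total += A[i, j]
        end
    end
    return total
end

A = reshape(collect(1.0:6.0), 2, 3)   # 2x3 matrix filled column by column
println(sum_column_major(A))
```

Swapping the two `for` lines gives the same answer but strides across memory on every step, which is where the slowdown on large matrices comes from.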