Perhaps it's obvious to a Fortran programmer that it will be O(1) because Fortran has slicing built in, I dunno, but still, documenting the general space and time complexity would be worth doing. (Plus even if Fortran does slices natively, it's still nice in a library like this to guarantee to your users that you're using it correctly and it won't copy n-1 elements on every call, and to clearly say whether they can safely modify the results, since I assume Fortran is not big on immutable arrays.)
tail = x(2:)
I wrote a test program to use foldl to calculate the sum of an array, and it seems to have n^2 time complexity. Also a test program summing an array of length 10000 of 64bit integers uses 395816 kbytes of memory, which is close to 8 * 10000^2 / 2 / 1024 = 390625.
While the array slicing in Fortran is O(1), the implementation of tail does make a copy. Though a good compiler should be able to replace tail(x) with x(2:), the Fortran standard does not guarantee that it will. The first step forward that I see would be to replace tail(x) with x(2:) in the internal implementations of fold[lr].
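For readers following along, here is a minimal sketch of what the fix looks like; the function name, the int64 specialization, and the interface are simplified stand-ins for the library's actual generic code:

```fortran
! Simplified sketch of a recursive foldl over int64 (illustrative,
! not the library's real signature).
recursive pure function foldl(f, start, x) result(res)
  use iso_fortran_env, only: int64
  interface
    pure function f(acc, e) result(r)
      use iso_fortran_env, only: int64
      integer(int64), intent(in) :: acc, e
      integer(int64) :: r
    end function
  end interface
  integer(int64), intent(in) :: start
  integer(int64), intent(in) :: x(:)
  integer(int64) :: res
  if (size(x) < 1) then
    res = start
  else
    ! Before the fix: foldl(f, f(start, x(1)), tail(x)).
    ! tail(x) returns a fresh copy of n-1 elements, so the whole
    ! fold performs roughly n**2/2 element copies.
    ! After the fix: pass the slice directly. With an assumed-shape
    ! dummy, x(2:) can be passed as a view without copying (though
    ! the standard still permits the compiler to make a temporary).
    res = foldl(f, f(start, x(1)), x(2:))
  end if
end function foldl
```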
In any case, the time complexity for these functions is definitely something I plan to document soon.
Tried this for foldl with gfortran. Totally helps. Summing 10 000 int64's, memory consumption drops from 400M to 4M. Also time complexity becomes linear.
There is, or was, a definite problem with recursion in gfortran (when I looked some time ago). You can't (or couldn't, then) get tail call elimination in functions. I'd have hoped GCC would have been able to see through the structure of the assignment/return.
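To make concrete what tail call elimination would buy here: in an accumulator-style recursion like the toy example below (not the library's code), the result is assigned straight from the recursive call, so a compiler that performed the transformation could turn it into a loop with constant stack usage. Without it, each element costs a stack frame.

```fortran
! Toy accumulator-style recursive sum. The recursive call is in tail
! position, so it is in principle convertible to a loop; whether
! gfortran actually does that for Fortran functions is the question
! raised above.
recursive pure function sum_acc(x, acc) result(res)
  use iso_fortran_env, only: int64
  integer(int64), intent(in) :: x(:)
  integer(int64), intent(in) :: acc
  integer(int64) :: res
  if (size(x) == 0) then
    res = acc
  else
    res = sum_acc(x(2:), acc + x(1))   ! tail call
  end if
end function sum_acc
```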
I realize that Fortran compilers have focused on optimizing traditional explicit loops and other imperative programming constructs. Nonetheless, I'd want to see benchmarks before concluding this is a lost cause.
And of course for a lot of code it is better to write correct code quickly than it is to spend a week cutting the running time of a function called twice from 1.2 seconds to 0.9 seconds.
Fortran since f90 has had syntax for and encouraged array level operations rather than looping over arrays. Surely there has been work on improving the performance of vectorised operations?
Interestingly, Fred Chow and the SGI compiler/Open64/Path64 etc. optimizes implicit loops by expanding them into explicit ones, and then making sure that the regular optimizer does a great job on that.
So does GFortran, FWIW. That being said, there is a lot of work that is done before this lowering pass (as I'm sure you know, but for other readers who might not be as familiar with Fortran and compiler internals). In particular, the Fortran array-expression semantics are that the RHS is evaluated before assigning to the LHS, so a simplistic approach would allocate a temporary array for every assignment. Quite a lot of logic is therefore needed to handle the various special cases where the temporary can be avoided.
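A classic example of why the RHS-before-LHS semantics matter is an assignment where the two sides overlap: a naive element-by-element loop would read values it has already overwritten, so the compiler must either prove there is no overlap or introduce a temporary. A small self-contained sketch:

```fortran
program rhs_first
  implicit none
  integer :: a(5) = [1, 2, 3, 4, 5]
  ! Array assignment semantics: the whole RHS a(1:4) is (conceptually)
  ! evaluated before any element of the LHS is stored, so this shifts
  ! the array right, giving a = [1, 1, 2, 3, 4].
  ! A naive left-to-right loop,
  !   do i = 2, 5; a(i) = a(i-1); end do
  ! would instead smear a(1) everywhere: [1, 1, 1, 1, 1].
  ! Hence the compiler needs either overlap analysis or a temporary.
  a(2:5) = a(1:4)
  print *, a
end program rhs_first
```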
It's interesting that I keep hearing functional techniques referred to as modern despite the fact that many of them date back to the 50s in LISP, in some cases (first class functions) back to the 30s in the Lambda Calculus. Sometimes it really does take a long time for good ideas to spread.
The first approach would be significantly faster to compute, but would also require several times more lines of code than the second approach.
It's really unfortunate that modern Fortran has no facilities for generic programming. That's why I'm personally moving more and more to Julia...