
LFortran: Modern interactive LLVM-based Fortran compiler - kergonath
https://lfortran.org/
======
waynecochran
I use C++ Eigen extensively. I wonder if it is feasible to write a C++
program that hands off blocks of linear algebra code to Fortran subroutines.
Eigen does a wonderful job of late binding and lazy evaluation, so that
something like the following is fairly efficient:

    
    
         (L.transpose() * P).diagonal().array().square().mean();
    

which computes the mean of the squared column-by-column dot products between
matrices L and P. I assume Fortran, since matrices are native types, does this
kind of thing even better.
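
If my Fortran is right, the whole-array spelling of that reduction would be
something like the following sketch, though how well the compiler fuses it is
another question:

    
    
      program colsq
        implicit none
        real :: L(100, 50), P(100, 50), msq
        call random_number(L); call random_number(P)
        ! mean of the squared column-by-column dot products of L and P
        msq = sum(sum(L * P, dim=1)**2) / size(L, 2)
        print *, msq
      end program colsq
    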

Aside: Fortran 77 was the first programming language I learned, in 1985, and I
hated programming for a while after that. We called it the "F" language. Now
that I am a bit older and do a lot of linear algebra coding... I wonder if I
would like it now.

~~~
celrod
I think Eigen's template strategy is probably hard to beat from the
perspective of combining operations. How well Fortran does is probably mostly
up to the compiler implementation. In some of my benchmarks, gfortran
sometimes seems to fuse the operations and end up much slower than if it had
performed them separately in succession, but I wouldn't be surprised if
compiling with `-fexternal-blas` (and linking MKL) would have solved that.

If you want to try external BLAS/LAPACK with Eigen, I'd look at:
[https://eigen.tuxfamily.org/dox/TopicUsingBlasLapack.html](https://eigen.tuxfamily.org/dox/TopicUsingBlasLapack.html)

I have `A * B`, `A * B'`, `A' * B` and `A' * B'` small-single-threaded-matmul
benchmarks here:
[https://chriselrod.github.io/LoopVectorization.jl/latest/exa...](https://chriselrod.github.io/LoopVectorization.jl/latest/examples/matrix_multiplication/)
I compared triple nested loops with Clang, icc, ifort, gfortran, Julia, and
LoopVectorization.jl with matmul routines from ifort, gfortran, OpenBLAS, MKL,
and Eigen.
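
The triple-loop kernel was the obvious textbook version; in Fortran, roughly
this sketch (jki loop order, so the innermost loop is unit stride for
column-major storage):

    
    
      subroutine gemm_loops(A, B, C)
        real(8), intent(in)  :: A(:,:), B(:,:)
        real(8), intent(out) :: C(:,:)
        integer :: i, j, k
        C = 0.0d0
        do j = 1, size(B, 2)
          do k = 1, size(A, 2)
            do i = 1, size(A, 1)
              C(i, j) = C(i, j) + A(i, k) * B(k, j)
            end do
          end do
        end do
      end subroutine gemm_loops
    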

While gfortran's builtin hit over 40 GFLOPS with `A * B` and `A' * B'`, it
failed to get half that if only one argument was transposed. I suspected
awkward fusing at the start of this post because, if it had performed the
operations one after the other, it should still have hit >40 GFLOPS when only
one matrix was transposed.

~~~
kergonath
These benchmarks are very interesting! From my experience, gfortran’s MATMUL
is very good for small matrices, but OpenBLAS gets better from ~10x10 elements
onwards. Not quite sure that’s the same as your benchmark; its colour coding
is a bit confusing. Would you mind making the data available?

Certainly, gfortran’s default implementation works well as a quick and easy
solution for small vectors and matrices outside of hot paths. Also, I remember
discussions about improving matmul(transpose(A), B) somewhat recently. I don’t
remember which version it was for, but it’s the sort of thing that gets
improved regularly.

I would love an option to align arrays in gfortran. It’s very important to
take advantage of automatic vectorisation but I haven’t found a reliable way
to do it.

~~~
celrod
My color coding is bad (and there's an open issue about it), but I haven't
come up with a great solution. One idea I should try is making everything
relative to LoopVectorization, but that wouldn't help for those interested in
other comparisons, like Eigen vs gfortran.

I'm on gfortran 10.1.1, so I'd only be missing out on very recent improvements
(i.e., to trunk).

Note that these benchmarks were dynamically sized. If you write a Fortran
program like

    
    
      program fixed_size
        real(kind=8), dimension(10,10) :: A, B, C
        call random_number(A); call random_number(B)
        C = matmul(A, B)
        print *, C(1,1)   ! keep C live so the matmul is not elided
      end program fixed_size
    

the compiler will take advantage of the known dimensions and inline the call,
making it much faster.

> I would love an option to align arrays in gfortran. It’s very important to
> take advantage of automatic vectorisation but I haven’t found a reliable way
> to do it.

I wish more compilers could also use masks to vectorize without padding. When
multiplying 7x7 matrices with AVX512, the obvious solution is to mask the
loads/stores of each column rather than pad. Of course, padding would also
ensure all your loads/stores are aligned, which can be nice.

~~~
kergonath
I think what the plots need is an option to show or hide each curve. It’s a
rather large increase in complexity compared to just generating images,
though.

------
yodelshady
Modern Fortran is surprisingly pleasant to write.

The last time optimisation came up, I tried the author's problem - multiplying
two 4096x4096 matrices - in numpy, FORTRAN and Rust, on a standard laptop.
Supposedly it took 9 hours in naive Python (I didn't try). Results were:

    
    
        Python3 (numpy 1.18.4):    1.3s
        FORTRAN (gfortran 9.3.0):  6.0s
        Rust    (1.43.1, debug):   >60s
        Rust    (1.43.1, release): 4.0s
    
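
For reference, a minimal Fortran version of this is little more than the
following (a sketch, not the exact benchmark code):

    
    
      program bench
        implicit none
        real(8), allocatable :: a(:,:), b(:,:), c(:,:)
        allocate(a(4096,4096), b(4096,4096), c(4096,4096))
        call random_number(a); call random_number(b)
        c = matmul(a, b)
        print *, c(1,1)   ! touch the result so the work is not optimized away
      end program bench
    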

What surprised me is that the Python and Fortran solutions were written in
minutes. The Rust solution took _hours_ and multiple forum posts for help.
There's no obvious way to do it, and _every single way_ I tried had "gotchas"
and utterly astonishing behaviour in it, particularly with simple, statically-
allocated arrays.

Fortran could do with proper namespaces, and with properly dropping some of
its older conventions, though. Reading it feels like archaeology.

~~~
MaxBarraclough
So to be clear, numpy gives you a highly optimised matrix multiplication
algorithm, and your FORTRAN solution was just something you threw together
yourself, right? Little surprise that numpy roundly outperforms it, especially
if it's using a different algorithm, such as Strassen rather than naive matrix
multiplication. [0]

> every single way I tried had "gotchas" and utterly astonishing behaviour in
> it

As someone who doesn't know much about Rust (or FORTRAN for that matter), what
kinds of gotchas? I'd expect numeric code like this to be quite
straightforward, as it presumably doesn't lean heavily on memory-management
cleverness in the language, or anything like that.

[0]
[https://en.wikipedia.org/wiki/Strassen_algorithm](https://en.wikipedia.org/wiki/Strassen_algorithm)

 _edit_ For clarity, I made a table of your data. Seems surprising that Rust
outperformed FORTRAN.

    
    
        ┌──────────┬─────────────────┬──────┐
        │ Language │     Detail      │ Time │
        ├──────────┼─────────────────┼──────┤
        │ Python3  │ numpy 1.18.4    │ 1.3s │
        │ FORTRAN  │ gfortran 9.3.0  │ 6.0s │
        │ Rust     │ 1.43.1, debug   │ >60s │
        │ Rust     │ 1.43.1, release │ 4.0s │
        └──────────┴─────────────────┴──────┘

~~~
Someone
The main reason for numpy’s speed is that it uses highly optimized code
_written in another language_ (for matrix multiplications, quite possibly
FORTRAN).

~~~
MaxBarraclough
Yes, of course numpy's heavy-lifting code isn't written in Python.

My point was that in comparing numpy to yodelshady's FORTRAN code, we're
probably just comparing two different FORTRAN implementations, one much better
than the other (perhaps using a different algorithm, perhaps just better
optimised).

------
pantalaimon
Is this related to flang?

[https://github.com/llvm/llvm-project/tree/master/flang/](https://github.com/llvm/llvm-project/tree/master/flang/)

~~~
certik
LFortran author here. As kergonath said below, both the "new" Flang that you
linked above and LFortran started at about the same time. I know the Flang
developers well, and I am hoping we'll be able to collaborate more in the
future. It looks like MLIR would be one option: they are going to use it for
Flang, and LFortran could have a backend that also uses it.

LFortran's main advantage is that it is interactive, so the parser and
semantics have to be a little different. We also designed it so that it would
be quite approachable for people writing tools on top of it.

~~~
C1sc0cat
Interesting to see an interactive Fortran, as I started with Fortran in my
first job: on my first day I was told, "there is a book in the company
library; go and learn Fortran".

For the really big Fortran systems I worked on in the 1980s (Map Reduce), the
edit/compile/link process could take several minutes for a single module. We
did have a build system written in JCL to help automate this as well.

------
mootzville
Nice, I want to play with this, but I see on the Development page that there
is limited or no support for arrays?

~~~
certik
LFortran author here. There is a Python prototype version that you can see in
the notebook, and it has very limited support for arrays. I stopped spending
time on it once I got far enough to validate that the idea works, and I am
now spending all my time on the production LFortran implementation in C++. I
am very close to making it usable for some very simple things, and then we'll
go from there.

------
dang
If curious see also

2019
[https://news.ycombinator.com/item?id=19795262](https://news.ycombinator.com/item?id=19795262)

[https://news.ycombinator.com/item?id=19788526](https://news.ycombinator.com/item?id=19788526)

~~~
kergonath
These are very interesting discussions. I don’t know how I managed to miss
them when I submitted...

~~~
dang
The cutoff point for reposts is about a year, so it's good that you didn't see
them and then decide not to post.

[https://news.ycombinator.com/newsfaq.html](https://news.ycombinator.com/newsfaq.html)

------
baggy_trough
Not much of a Fortran yet without arrays, complex numbers, or strings.

~~~
dm319
Fortran: one of only a handful of languages that natively supports
multidimensional arrays, and maybe the only low-level one?
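
For example, declarations, whole-array operations, and slicing are all part
of the language itself; a small sketch:

    
    
      program nd_demo
        implicit none
        real :: T(4, 8, 3)            ! a rank-3 array is a native type
        T = 0.0                       ! whole-array assignment
        T(:, 2:5, 1) = 1.0            ! slicing needs no library
        print *, shape(T), sum(T)     ! prints 4 8 3 and 16.0
      end program nd_demo
    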

~~~
CyberDildonics
That's ridiculous; you can make a C++ class and overload operator(), like
Eigen does, to easily make a multi-dimensional array.

~~~
baggy_trough
"natively"

~~~
CyberDildonics
What is different about Fortran's multi-dimensional arrays? The interface can
be almost identical, and the memory layout in C++ can be whatever you want it
to be.

------
sangfroid_bio
Looking forward to differentiable programming in Fortran.

~~~
dnautics
I believe that's called "Julia".

~~~
ChrisRackauckas
There's a library for Julia coming out soon that can differentiate Fortran
code. So, while tongue in cheek, it's not that far off!

------
rarepostinlurkr
Support for ARM?

------
dng88
I wish Lisp had things like this Jupyter integration.

And my first computing lecture, many decades ago, was really about how the
lecturer hated Fortran and why FORTRAN was still not dead (a mainly negative
view, due to GOTO at the time). And it is still not dead.

~~~
pjmlp
Lisps were at the genesis of the original concept (it was called the Lisp
Machine), and the experience can still be replicated using commercial Common
Lisps like Allegro.

Fortran has been continuously updated, supports OOP, modules, and even
generics.
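
A small sketch of what the modern bits look like (a module with a derived
type and a type-bound procedure, i.e. Fortran 2003 OOP):

    
    
      module geometry
        implicit none
        type :: point                 ! derived type
          real :: x, y
        contains
          procedure :: norm           ! type-bound procedure (OOP)
        end type point
      contains
        real function norm(self)
          class(point), intent(in) :: self
          norm = sqrt(self%x**2 + self%y**2)
        end function norm
      end module geometry
    

At the call site that is just `type(point) :: p = point(3.0, 4.0)` followed
by `p%norm()`.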

~~~
waltpad
Sadly, AFAIK, BLAS hasn't been updated to use generics. It would be so nice
to have support for integer vector/matrix operations in there. Not that it
would require generics to have that, but it could be easier to implement
integer support with them, though I suspect that for efficiency reasons they
might not be used in the end.
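
To illustrate, a Fortran generic interface can at least present one name over
several typed implementations; a hypothetical `gemm` wrapper (not anything
BLAS actually ships):

    
    
      module gemm_generic
        implicit none
        interface gemm                   ! one generic name...
          module procedure gemm_d, gemm_i
        end interface gemm
      contains
        subroutine gemm_d(a, b, c)       ! ...for the real version
          real(8), intent(in)  :: a(:,:), b(:,:)
          real(8), intent(out) :: c(:,:)
          c = matmul(a, b)               ! stand-in for a call to dgemm
        end subroutine gemm_d
        subroutine gemm_i(a, b, c)       ! ...and the integer version BLAS lacks
          integer, intent(in)  :: a(:,:), b(:,:)
          integer, intent(out) :: c(:,:)
          c = matmul(a, b)
        end subroutine gemm_i
      end module gemm_generic
    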

