
Fast high-level programming languages - stanislavb
https://lh3.github.io/2020/05/17/fast-high-level-programming-languages
======
chadash
> "I don’t see Julia a good replacement of Python. Julia has a long startup
> time. When you use a large package like Bio.jl, Julia may take 30 seconds to
> compile the code, longer than the actual running time of your scripts."

I think this is the crux of it. Python may not be the fastest language out
there, but it's definitely one of the fastest languages to develop with, which
is why it's such a hit with the academic research crowd.

Static typing and compilation are great features for many use cases, but if
you are just prototyping something (and most academics are _never_ going to
write "production" code), it's nice to be able to try something, test it and
then iterate and try again. When you have to compile, that iteration loop
takes longer.

A few people here mentioned Go as a good alternative. Personally, I love
the language, but it's really not very good for quick prototyping. As an
example, the code won't compile with unused variables. This is great for
production code but very impractical when you are testing some new algorithm.

~~~
idoubtit
"Prototype" software is sometimes a nice name for junk code. When I read that
most academics are never going to write production code, that they're just
prototyping, that they need to develop fast... it sounds like most academics
value fast writing more than correctness, which is not my experience.

As an anecdote, a friend of mine is an applied mathematician specializing in
PDE and fluid dynamics. He used to code with C++, because that was what was
used in the labs where he worked. Then he discovered Python and he thought it
was so much easier, and still quite fast thanks to optimized libraries
(written in C++ with a Python interface). But after a few months his
enthusiasm had disappeared because of the runtime errors of his Python code.
He didn't want to go back to full C++, though, and he still codes mostly with
Python.

~~~
chadash
> "He didn't want to go back to full C++, though, and he still codes mostly
> with Python."

Exactly. It seems like he understands the tradeoffs and prefers quick
iteration over "correctness".

In my experience, for computational "academic" code, the hard part is often
coming up with the correct algorithm, not implementing that algorithm
correctly. In software engineering, it's often the opposite.

------
kthielen
Shameless plug: I made hobbes and used it in high-volume trading systems at
Morgan Stanley:

[https://github.com/Morgan-Stanley/hobbes](https://github.com/Morgan-Stanley/hobbes)

It’s kind of a structurally-typed variant of Haskell, integrates closely with
C++, and produces very fast code.

~~~
oddthink
That looks super fun! Are they still using this? I was at MS a few years
before that, still dealing with a few legacy things in A+. There were hopes
that we'd switch to something like F#, but we ended up just re-doing most of
the mortgage analytics in kdb/q, which was fun in its own way.

~~~
kthielen
Yep, still used (pre- and post-trade, processing millions of orders and
billions of market-data events per day).

I'd like to grow it into a successor to kdb, though kdb is very entrenched in
finance and the company is extremely litigious (they threatened to sue me).

F# is great too, and all of the other ML variants. We'll get there eventually,
it's inevitable, but there's a lot of institutional inertia.

------
ibiza
Note: Heng Li[0] is a significant figure in bioinformatics software. Most
notably, he is the author of BWA[1] (Burrows-Wheeler Aligner), which performs
a large percentage of all sequence alignments worldwide.

[0] [http://www.liheng.org](http://www.liheng.org)

[1] [https://github.com/lh3/bwa](https://github.com/lh3/bwa)

------
wh-uws
We've been using Crystal on an open source project I work on.

[https://github.com/cncf/cnf-conformance/](https://github.com/cncf/cnf-conformance/)

It's been a dream.

You can pretty much write the Ruby you're used to, but it performs like a
compiled language and you can build binaries.

I was SUPER skeptical at first when the team decided on it, but it's been a
pleasant surprise.

------
mindB
The Computer Language Benchmarks Game[1][2][3] may be of interest here. It
benchmarks C, Python, JavaScript, and Julia on several tasks involving FASTA
input (regex-redux, k-nucleotide, and reverse-complement), but the
implementations are bespoke rather than relying on libraries. Relative timing
is much more favorable for Julia in those benchmarks; Python looks worse
outside regex-redux.

[1] [https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/knucleotide.html)

[2] [https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/revcomp.html)

[3] [https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/regexredux.html)

~~~
anonymoushn
I recently spent some time in the Lua section of the benchmarks game. It is a
sad place for a few reasons:

- Lua programs cannot use shared-memory concurrency or subprocesses with
2-way communication with the master process.

- Lua programs run on a very slow runtime compared to the fastest Lua
runtime.

My impression after this is that for languages that aren't super fast and
don't include all the primitives one could want, benchmarks like reverse-
complement are mainly measuring whether the language's standard library
includes some C function that does the bulk of the work.

~~~
igouy
> … shared memory concurrency or subprocesses with 2-way communication with
> the master proc

Isn't that the same situation as Python, Perl, PHP, Ruby… except that for
those languages, programmers _have_ converted the programs to use multicore?

~~~
anonymoushn
No. For example, Python's subprocess module lets you talk to a subprocess on
both stdin and stdout; Lua's popen only gives you one of the two.

This means that the proposed work sharing model at [https://benchmarksgame-team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-team.pages.debian.net/benchmarksgame/description/fannkuchredux.html) cannot be
used, because it requires workers to submit results to be aggregated and
accept new chunks of work.

~~~
igouy
Does it mean that none of the other Lua programs could be written to use
multicore? spectral-norm?

~~~
anonymoushn
No, it does not mean that. I'll submit my multicore fannkuch-redux and
reverse-complement when I get around to it, and look at other problems after
that.

The fannkuch-redux is just a bit hampered by uneven work sharing.

For reverse-complement, it's a bit more trouble to work around the lack of
2-way communication. My implementation writes the entire input to stdout, then
workers use fseek on stdout, which only works if you are piping the output of
the command to a file. That is, it generates correct output if you run "lua
blah.lua > out" but not if you run "lua blah.lua | cat > out". Additionally,
since there's no pwrite and no way to get a new open file description for
stdout, I had to cobble together a mutual-exclusion mechanism to prevent
workers from seeking while another worker tries to write.

~~~
igouy
Just curious, hasn't multicore been enough of an opportunity for this to be
addressed by the Lua community?

~~~
anonymoushn
A design objective of PUC-Rio Lua is to be pure ANSI C. I'm not certain, but
my impression is that this imposes some unreasonable restrictions on the
implementation. An additional design objective is to be small.

I think people don't usually write Lua programs intending to run them inside
the bare binary you get when you build PUC-Rio Lua without any additional C
libraries. Libraries like LPeg and lua-gumbo are Lua wrappers around C code.
For C libraries that do not have Lua wrappers, people can more or less paste
the preprocessed C header file into their Lua source file and use LuaJIT's FFI
to call the library. This last approach is similar to how the Python regex
program mentioned elsewhere in these comments works :). It's also common to
use frameworks like OpenResty or Love2d that provide the innards of some
complex threaded program to user Lua code.

Outside of benchmark games and work, I'm working on some code that uses
threads and channels, but the threads and channels are provided by liblove.

So I guess I can say, it has been addressed, but it won't be addressed in the
standard library.

------
hpcjoe
Looking through the Julia code, I saw quite a bit of memory management, which
surprised me. Generally with Julia, you want to allocate memory once, to
avoid the penalties of reallocation and GC.

I pulled the code down and placed the data on a ramdisk, so that disk I/O
would not affect the benchmark measurements.
I built the C code and ran the two Julia programs. My timings looked like
this:

    version  t(raw)   t(gz)
    c1       1.47s    8.31s
    jl1      3.80s   15.82s
    jl2x     5.85s   17.86s
    py1      fails
    py2      6.92s   29.62s

I don't have lua, nim, or crystal on my machine. This is Julia 1.4.1 BTW.
Running Linux Mint 19.3 on my laptop, 5.3.0-51-generic kernel.

Beyond putting the data on the ramdisk and compressing it with pigz for the
compressed version, no optimizations were done. Putting the data on the
ramdisk speeds up _all_ of the implementations.

My thoughts: the author noted that Julia has long startup/run times. This is
true for the first compilation of the modules you use. As a reflex these days,
I (and, I'm guessing, most Julia users) do a "using $MODULE" right after
adding a module, which makes subsequent startup times less painful for most
modules. Plotting with the Plots module is still a problem, though it has
gotten dramatically better over time.

Basically, if you run your code more than once, with the modules compiled into
your cache, startup time is significantly better, and Python reverts to being
slower than Julia. If the startup time on first run is important (think of it
like a PyPy compilation step along with a run of the code), and you'll only
ever run the code once, for the less than half a minute this example takes,
use whatever you are comfortable with.

FWIW, the author noted, with implied disdain, that Julia users are telling
them that they are "holding the phone wrong." Looking over the code,
specifically all the memory allocation bits, I could see that. Basically, I'm
not sure how much, if any of that, is actually needed.

That said, this is only a very limited critique of the tests. I like to see
"real world" examples of use. Kudos to the author for sharing!

[edited to "fix" table ... not sure how to do real tables here]

------
charlesdaniels
I would be curious to see what the performance is like in Go. I've been
experimenting with it a bit lately, and coming from a mostly C and Python
background, I have thus far found it easy to pick up. I haven't tested the
performance for my own use cases very much yet, but I am told it compares very
well.

I think the real challenge for scientific computing (I'm a graduate student,
so this is most of the programming I do) is that there is already a huge
network effect around NumPy + SciPy + matplotlib and friends. Go just doesn't
quite have the library ecosystem yet, although gonum[0] shows some potential.

In my limited experience so far, I think Go is good in an environment where
most people have experience with C and/or Python. It also makes it much harder
to write truly crappy code, and it's much easier to get a Go codebase to build
on n different people's workstations than a C one.

Having written a _lot_ of Python, and relatively little Go, I think I would
prefer to write scientific code in Go if the libraries are available for
whatever I'm trying to do.

It's also much easier to integrate Go and C code, compared to integrating C
and Python.

0 - [https://www.gonum.org/](https://www.gonum.org/)

~~~
orbifold
Personally, I think numerical code should still mostly be written in C++;
right now it has by far the widest choice of options for doing so. It is also
relatively easy to interface with Python: for example, xtensor,
libtorch/ATen, and ArrayFire all have straightforward Python interoperability
via pybind11.

Finally, no other language except perhaps Fortran has such seamless
parallelisation support and first-class low-level numerical primitives
developed by vendors. Sometimes you get a massive performance increase just by
adding #pragma omp parallel for.

Even for visualization, some Python libraries (Altair, for example) will
suddenly fall off a cliff once you reach a moderately large number of data
points.

~~~
charlesdaniels
I would definitely agree that it depends on what kind of scientific computing
you are doing.

For big numerical stuff and things that need to run on supercomputers,
C/C++/FORTRAN are definitely very relevant and I don't see that changing.
Likewise for edge stuff that has to run on bare metal or embedded, I think
we're still going to be using C/C++ for a long time to come.

"Scientific computing" is a huge range of different use cases with very
different levels of numerical intensity and amounts of data. I doubt very much
that there would ever be a one-size-fits-all approach.

However in the context of the OP, I'm arguing that Go would be preferable to
Python for the purpose of writing bioinformatics models, and certainly more
suitable than Lua or JavaScript.

Of course Python can sometimes be very performant if you leverage NumPy/SciPy,
since those are ultimately bindings into the FORTRAN numeric computing
libraries of yore. But if we're talking about writing the inner loop, and the
choices are Go, Python, Lua, and JavaScript, I think Go is going to win that
on the performance and interoperability fronts handily (I omit Crystal, as I
am not familiar with it).

~~~
orbifold
Even in the context of bioinformatics my comment applies. With modern C++
libraries you can replicate the NumPy user experience almost line by line. A
baseline FASTQ parser in modern C++ would look nothing like the fastq.h C
parser the author presented. Naive versions of sequence-alignment algorithms
like Needleman-Wunsch are easily implemented in C++ as well, and you can even
do most of your development in a Jupyter notebook with cling/xeus.

~~~
charlesdaniels
I'll take your word for it; I haven't worked in that field.

I do still think it would be interesting to see a comparative benchmark,
though. I know the Go compiler tries to use AVX and friends where available. I
doubt it will ever beat a competent programmer using OpenMP to vectorize,
though goroutines might be competitive for plain multithreading.

A relevant consideration too: OpenMP seems to be moving toward supporting
various kinds of accelerators in addition to CPUs, so your C++ code has a
better chance of being performance-portable to an accelerator if you need it.

~~~
friday99
Note that the Go compiler is actually fairly immature compared to C++
compilers. It does not do any AVX autovectorization; any AVX optimization is
manual assembly by a library author.

------
zwaps
This seems to be a very specific benchmark. The second Julia benchmark is
based on a truly obscure library that has not been updated in two years?

~~~
jryb
Should libraries like these need constant updates though? What change are they
adapting to?

~~~
zwaps
This does not seem to be a "real" library. It has a GitHub repo but no
README. It seems to be a Julia implementation of some other C library,
probably experimental.

------
lmilcin
I know many people don't consider Common Lisp a particularly fast language,
but in my experience you can make it quite fast, depending on what you need.

When using SBCL, for example, your application is compiled to native code.
Moreover, you get control over the optimization level for each piece of the
code separately. You even get complete control over the resulting native
assembly, something you don't get with other high-level languages.

One of the production applications I wrote in Common Lisp parsed a stream of
XDP messages
([https://www.nyse.com/publicdocs/nyse/data/XDP_Common_Client_...](https://www.nyse.com/publicdocs/nyse/data/XDP_Common_Client_Specification_v2.0.pdf))
with a requirement for very low latency. Using a bunch of macros, I made the
parser generate optimal binary code from the XML specification of message
types, fields, field types, etc.

The goal of the application was to proxy messages to the actual client of the
stream. The proxy made it possible to introduce changes to the stream in real
time without having to restart any components. Using the REPL, I was able to
"deploy" any arbitrary transformation on the messages. The actual consumer of
the messages was a black-box application that we had no control over, and we
sometimes had problems when it received something it did not like.

I liked Common Lisp in particular because it does not force you to make your
performance decisions up front. You can develop your application using very
high-level constructs and then focus on the parts that are critical for
performance. Macros let you present a DSL to your application while retaining
full control over the code that actually runs beneath the DSL.

If everything fails, calling C code is a breeze in Common Lisp compared to
other languages.

~~~
logicchains
I explored Common Lisp for high-performance code once (I work in HFT); the
biggest issue was that it didn't have native support for "arrays of structs"
(where all the structs are stored unboxed, adjacent in memory). I know it
would probably be possible to write a library for that, but that would be a
huge amount of work compared to just using a language with existing support
for unboxed arrays.

~~~
lmilcin
Well, I did some algorithmic trading (not really HFT, but still monitoring the
market and responding within 5 microseconds of incoming messages).

I would not use Common Lisp on the critical path, because I would end up
basically rewriting everything to ensure I have control over what is
happening, so that some kind of lazy logic does not suddenly interrupt the
flow.

A large part of the application was about controlling memory layout, cache
usage, messaging between cores, keeping the branch predictor happy, etc.,
which would be really awkward in Common Lisp (technically possible, but
practically you would have to redo almost everything). We also experimented
with Java, with the end result that the code looked like C but was much more
awkward.

I have, however, successfully used Common Lisp to build the higher layer of
the application that was orchestrating a bunch of compiled C code and also did
things like optimizing and compiling decision trees to machine code or giving
us REPL to interact with the application during trading session.

~~~
logicchains
>giving us REPL to interact with the application during trading session

This to me is what the big appeal of Common Lisp for a trading system could
be, particularly if it allowed live recovery from errors (dropping somebody
into the debugger, rather than just core-dumping), which could save a lot of
money by reducing downtime. But as you say it would require redoing everything
to make the code fit latency constraints and be cache friendly, which would be
a lot of work.

------
enricozb
Would love to see an FP language on there, preferably OCaml

~~~
dunefox
F# might be interesting as well.

------
ryan-allen
Crystal is really exciting, it somehow 'fell under the radar' but when I heard
'fast, and ruby-like with types' I became very excited.

There's even a Rails-like [0] framework being developed!

Last weekend I spent a couple of hours getting Lucky up and running (it took
some doing; I had to borrow a lot from people's Docker images to get it booted
and working).

It's a big 'watch this space' situation. The macro system for metaprogramming
is very easy to understand in Crystal [1] as well.

Can't wait for 1.0!

[0] [https://luckyframework.org/](https://luckyframework.org/)

[1] [https://crystal-lang.org/](https://crystal-lang.org/)

~~~
etherio
I'd recommend you check out
[https://amberframework.org](https://amberframework.org) too. I also like
Crystal, but Lucky kind of turned me off given its steep learning curve coming
from Rails, whereas Amber is much more similar.

------
outlace
I think it would be much better if a group of experts in each language were
asked to come up with their best implementations of the tasks. If I spent just
a few hours implementing a benchmark in a handful of languages I wasn't
proficient with, I think the rankings would be nearly random, or merely
dependent on my priors.

------
georgeg
Would be nice to see your take on D-lang
[https://dlang.org/](https://dlang.org/)

------
carapace
I wonder how Common Lisp compiled to native code would stack up?

~~~
dunefox
Probably really well.

------
vmchale
> A good high-level high-performance programming language would be a blessing
> to the field of bioinformatics. It could extend the reach of biologists,
> shorten the development time for experienced programmers and save the
> running time of numerous python scripts by many folds.

The author might like to look at J for specific calculations (and Futhark for
similar tasks).

I use Haskell; it's not quite on par with C, though. Other MLs are wonderful
too.

------
RcouF1uZ4gsC
One option may be C++17. With the right libraries, C++ code can be very high-
level.

Perhaps a well-written bioinformatics library would be a nice solution.

~~~
MiroF
I think C++ in its modern form is quite a good language that gets an
undeservedly bad rap.

~~~
nerdponx
Where does one go to learn modern C++? Is there a "JavaScript: The Good Parts"
for C++?

~~~
steerablesafe
Stroustrup's _A Tour of C++_ (2nd ed.) is a good start:

[http://www.stroustrup.com/tour2.html](http://www.stroustrup.com/tour2.html)

------
laserson
I always thought Go might be a good candidate for some of these tasks.

~~~
sk0g
There's a Go file or two in the repository, but no mention of it in the
article.

I would make a pull request, but their setup uses 1TB of RAM...

------
maxk42
Interesting, and roughly in line with what I've read elsewhere. One must note,
however, that performance on a couple of benchmarks does not generalize to all
problems. Still, perfect is the enemy of good, and it may be that when looking
for a high-level language one has to settle for "good enough."

Anyway - good job!

------
exabyte
Forgive my lack of specificity, as I am quite new to Julia; however, I believe
that after the initial compile takes place you do see significantly faster
runtimes, so you can "forgive" the one-time delay.

I think you can also have VS Code load the packages you expect to use on a
regular basis at startup.

~~~
zwaps
He did take out the start-up time.

But I agree. If you complain that compilation takes 11 seconds when the
program runs for 30, then I wonder whether that's really a use case where you
need every last bit of performance.

Now, for a program that runs for two days straight in Python or Matlab, if
Julia reduces that time by half, I can deal with a bit of compile time.

------
leephillips
The author got good results from the language that he says he’s proficient in,
and bad results from those he says he’s a beginner with. (Read some comments
here below explaining his Julia mistakes.) He ran each benchmark one time
only—not a very good methodology.

~~~
hawski
Yeah, because his benchmark is also a benchmark of his ability to quickly
write efficient code.

> I am equally new to Julia, Nim and Crystal.

He might be more familiar with a certain style or paradigm, but he wants to
switch from Python to something significantly faster while jumping through as
few hoops as possible.

------
gintery
This is sort of interesting, but it isn't informative without any idea of why
some languages are slower. What is the difference between the LLVM IR/assembly
produced by C and that produced by Crystal/Nim/Julia?

~~~
filleduchaos
Compilers aren't different from pretty much every other kind of software, for
which it's generally painfully obvious that two programs targeting the same
data _format_ will not necessarily output the exact same _data_, especially
when they have wildly differing end goals.

I don't think every article discussing benchmarks has to restate that the
differences between programming languages are not just syntactic in order to
be informative.

~~~
gintery
I'm not sure I follow. What I meant is that a benchmark on a specific task is
not really informative without an analysis of why it is slower in some
languages compared to others.

------
SeeTheTruth
I wonder what the author would think of F Sharp.

~~~
blast
Common Lisp is another obvious candidate. SBCL, for example, is known for
performance.

~~~
Rochus
I would rather vote for Clasp, which has many applications in chemistry and
molecular biology; see
[https://github.com/clasp-developers/clasp](https://github.com/clasp-developers/clasp)

------
aduitsis
Very interesting. If I understand correctly, this article is mainly focused on
bioinformatics and performance of high level languages in that area.

I wonder why BioPerl ([https://bioperl.org/](https://bioperl.org/),
[https://en.wikipedia.org/wiki/BioPerl](https://en.wikipedia.org/wiki/BioPerl))
has not been included; it would have been interesting. Perl is considered
surprisingly fast for certain classes of tasks (for an interpreted language,
of course; there's no point comparing it to C, for example).

My info could be outdated, of course.

~~~
DougWebb
Perl isn't really an interpreted language. It's compiled to a sort of bytecode
when you run the program and executed on a sort of virtual machine, similar to
Java or C#, but much faster to compile; so fast that most people don't even
know it's happening.

As for performance, it really helps to write idiomatic Perl code. There may be
more than one way to do it, but some ways are better. For example, an explicit
for loop over a list compiles to bytecode that steps through the looping, but
if you use map or grep, the bytecode calls a pre-written optimized function
that loops as fast as code written in C. The more idiomatic your code is, the
more optimized calls like that you get, and the faster your code runs.

~~~
nerdponx
_Perl isn't really an interpreted language. It's compiled to a sort of
bytecode when you run the program, and executed on a sort of virtual machine._

Same is true for both CPython and certain sections of R code.

------
TeMPOraL
> _Nim supporters advised me to run a profiler. I am not sure biologists would
> enjoy that._

We need better profilers! Ones that don't require anything more complicated
than passing a --profile-me flag to the interpreter/compiler, whose output can
just be dragged into a pretty, user-friendly, fast application (included with
the language runtime), and whose reported results are both trustworthy and
correspond to actual locations in the source code.
------
anonymoushn
The LuaJIT implementations should perhaps use FFI C structs, especially if the
main issue for one of them is the lack of arrays of structs (other than FFI C
structs).

------
DonaldPShimoda
It's perhaps worth pointing out that programming _languages_ are not
themselves fast or slow; it is the programming language _implementations_ that
are fast or slow. (This is a particularly sore point in the programming
languages research community.)

~~~
TylerE
There are certainly languages that it is essentially impossible to implement
to run fast.

Like, one can certainly write slow C... but writing a fast Python (real
Python, I mean, with full reflection, runtime introspection, etc.) would be
near impossible.

Yes, I know there are projects that make large subsets of Python run fast, but
it's that last 5% that kills you.

~~~
nerdponx
FWIW, PyPy is significantly faster than CPython at a lot of tasks.

~~~
marcosdumay
PyPy doesn't support a lot of CPython's reflection and introspection.

To see the problem with Python, take this loop as an example:

    
    
        x = 0
        for i in range(0,1000):
            x = x + i
    

There is no way, looking only at this part of the code, to know whether the
`+` operator does the same operation every time it is executed.

~~~
nerdponx
Wouldn't that require some code to redefine `int.__add__` or `int.__radd__`
between iterations of the loop? Which I would file under "bizarre shit that
shouldn't normally happen." Before the loop starts, you'd have to override
`int.__add__` to modify itself every time it's called, or something crazy.

If we're talking about custom classes and not ints, maybe it's a bigger
problem. But if PyPy doesn't allow the required introspection to make this
work, how does it run anything at all?
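For what it's worth, a quick CPython check of both halves of this: dunders on built-in types can't be rebound at all, while a user-defined class really can swap `__add__` mid-loop, which is exactly why the interpreter can't assume `+` stays constant:

```python
# Built-in types are immutable: rebinding int.__add__ raises TypeError.
try:
    int.__add__ = lambda a, b: 42
    patched = True
except TypeError:
    patched = False
print(patched)  # → False

# A user-defined class, however, can change behaviour between iterations.
class Num:
    def __init__(self, v):
        self.v = v
    def __add__(self, other):
        return Num(self.v + other)

x = Num(0)
for i in range(3):
    if i == 2:
        # Swap the operator mid-loop: later "additions" subtract instead.
        Num.__add__ = lambda self, other: Num(self.v - other)
    x = x + i

print(x.v)  # 0 + 0 + 1 - 2 → -1
```

So the "bizarre" case is impossible for plain ints, but a JIT still has to guard against it for arbitrary classes.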

------
pansa2
If the goal here is maximum performance, that's going to require an AOT-
compiled language, which implies static typing. Is that acceptable to
scientists used to writing Python and R scripts?

~~~
jakobnissen
Bioinformatician here. It's certainly a drawback. There is no question that
Python et al. are more expressive and quicker to write than C. Some
bioinformaticians swear by static languages for reasons other than speed
(e.g. static type checking).

There is _always_ a need for scripting languages, not just for speed of
development but for interactive data manipulation and visualization. Static
languages are a no-go in that regard.

My bet is on Julia. Although slower than C by itself, I think that in
practice, Julia code written by bioinformaticians will be faster than C code
wrapped in Python, or C libraries used in inefficient workflows from the
shell. That's certainly been my experience so far.

~~~
thu2111
Why not take a look at Kotlin? High level, statically typed, fast iteration,
good runtime performance, great support for parallelism (via Java's APIs for
it).

~~~
jakobnissen
I very well might! As I said, I really do need an interactive language for
some (most?) tasks, but it would be nice to complement it with a static
language. I've been drawn to Rust since it seems to be enjoyable and less full
of arcane obscurities than C/C++, but I'll take a look at Kotlin.

~~~
thu2111
There's a Kotlin REPL and IntelliJ has this feature called 'workspaces' that's
meant for interactive use. But it's not the primary focus of the team, for
sure.

------
hopia
What an odd bunch of languages to benchmark.

Why weren't Go, Haskell, Java, or OCaml even considered here? They seem much
more appropriate than the likes of JavaScript for scientific computation.

~~~
dunefox
The inclusion of JavaScript makes me want to dismiss the whole article. If
that's a serious candidate, then I know he didn't look very thoroughly. But
no, Go would be just as horrible.

------
madhadron
I kind of wonder if modern FORTRAN would be the right option... Or pushing
towards something like Oberon or a Pascal dialect.

~~~
neutronicus
Biologists are text-mungers extraordinaire, so I don't think Fortran will ever
be right for them.

~~~
madhadron
Treating sequences as strings of characters is a peculiarity, but they really
aren't much like text. The operations you would expect to perform on text
(uppercase, lowercase, tokenize at word boundaries, etc.) are irrelevant. The
operations you perform on sequences (reverse, reverse complement, align, edit
distance) are a separate set. There's no reason FORTRAN couldn't do great at
that.

~~~
coldcode
My first ever professional program was a source-code formatter (for the JOVIAL
language) written in Fortran (77, I think). Writing a parser in Fortran (early
'80s) is an exercise I am glad I never had to repeat.
------
bjonnh
I wonder how the jvm (be it with kotlin, java,...) would behave on these
algorithms.

------
TheRealKing
There has been one around for the past 7 decades and still evolving: Fortran

------
einpoklum
A post comparing programming languages which makes a reference to "C/C++" is
already problematic IMHO.

------
sho
Hard to install crystal? It's

    brew install crystal

~~~
filleduchaos
This might come as something of a shock, but not everybody uses or wants to
use Homebrew.

~~~
sho
It doesn't come as a shock but more of a sigh - there's always one. 90% of
devs I know use macs. Of them, basically 100% use homebrew. So for 90% of
developers, installing crystal is as easy as I said. So the criticism that
it's hard to install is, for the most part, invalid.

Of course the build-everything-from-scratch gentoo linux crowd is going to
have a harder time but isn't that part of the masochistic appeal?

~~~
filleduchaos
I mean, all well and good if the only package manager you've ever heard of on
macOS is Homebrew, but there are others _and_ Homebrew is of questionable
enough quality/has made enough questionable decisions (especially with the
last major upgrade) that many people are justified in abandoning or not using
it in the first place.

And that's beside the fact that 1) outside of your bubble, more devs use
Windows than any other OS, 2) the person who wrote this article isn't even a
software engineer, and 3) the tests weren't even run on macOS.

------
ausbah
has anyone used Kotlin for similar purposes?

~~~
jghn
The author has a lot of familiarity with JVM based tooling for the operations
he describes in the blog. I'm not aware of a Kotlin implementation but have
seen him commenting on both Java and Scala implementations over the years. My
assumption is the performance would be similar to those.

------
zozbot234
It's so sad that OP bothered to benchmark Nim and Crystal, but no Rust.

~~~
stolen_biscuit
Is Rust really high level?

~~~
eximius
I would qualify a language as high level if you're not expected to deal with
pointers in 'normal' use. Yes, Rust _has_ pointers, but using them is NOT
normal.

C/C++/C#/D, etc. are not high level by this criterion.

There are also of course degrees to all this. Rust is lower level than a lot
of languages by virtue of having native pointer uses, even if it's frowned on.

There's also something to be said for language features, but I can't quite put
my finger on it.

~~~
catblast
The difference between pointers and references can easily become muddled. If
you have a language with pointers that are non-nullable, type- and memory-
safe, is that not high-level?

I think a good real-world example that exposes the problem in your definition
is Go. Go has pointers, and using them _is_ normal. However, Go does not allow
pointer arithmetic, and outside of unsafe (like Rust) its pointers are memory
safe. I consider golang to be higher level than C/C++ for this reason, and
many others (GC, channels, defer, etc.) -- I'd also consider it lower-level
because of its non-answer/cop-out to error handling.

But what is special about pointers compared to references? If you have a
language with pointers that are type-safe and memory-safe, how is this
distinctive?

~~~
eximius
I mean, even Python has references to variables. You can't escape references
(as opposed to values).

The difference is one of model. Pointers are an exposure of the underlying
computer architecture. Whereas references are more of a property of common
language design. In theory, you could not have pointers but still have
references.
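For what it's worth, the reference semantics described above are easy to demonstrate in Python; this small sketch shows two names bound to the same underlying object, with no architecture-level pointer or address arithmetic ever exposed:

```python
a = [1, 2, 3]
b = a            # b is another reference to the very same list object

b.append(4)
print(a)         # [1, 2, 3, 4] -- the mutation is visible through both names
print(a is b)    # True -- same object identity, yet no address is ever visible

c = a[:]         # a shallow copy creates a distinct object
c.append(5)
print(a)         # still [1, 2, 3, 4] -- c no longer aliases a
```

The language gives you aliasing (references) without ever modeling memory addresses, which is the distinction being drawn here.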

~~~
catblast
> Pointers are an exposure of the underlying computer architecture. Whereas
> references are more of a property of common language design.

Sorry, I just don't follow. How are the pointers in Go more exposing of the
underlying architecture than a reference? (I'm using Go as an example to make
it concrete, but any language with similar properties will do).

The syntax and some of the semantics of assignment and rebinding differ
between, say, Go pointers and Python references, but that's the point I'm
contesting: I don't see how one is necessarily higher level than the other. If
you add automatic memory management, null-pointer checks, and the removal of
any "undefined behavior", pointers aren't necessarily low-level. It wasn't the
pointer, it was the memory safety.

I personally think that once you tease it out, it becomes a semantics argument
that unfortunately doesn't shed much light on what is "high-level".

------
lostmsu
He complains about compilation time, yet doesn't try either C# or Java?

The choice of languages for this benchmark is laughable.

~~~
jmchuster
I guess such languages, which are defaults in the realm of web development,
don't even come up for consideration in the realm of biology?

~~~
grawprog
Trying to get biologists to care about computers is like telling someone to be
excited that they need to get all their wisdom teeth pulled and have a cavity
filled on the same day. Throughout all my schooling, and among the biologists
I met while working, I was pretty much the oddity for actually knowing much of
anything about computers and programming beyond what was needed to write
reports and enter data.

Most biologists I met were far more comfortable in the pissing rain, up the
mountain, in the middle of nowhere collecting animal shit than in front of a
keyboard.

If there were a Venn diagram of biologists and computer and technology
enthusiasts the overlap would need a micrometer to be read.

Disclaimer: Please take this extremely generalized statement, likely offensive
to one of the small few in that tiny overlap I mentioned, with a grain of
salt. Please don't take it too seriously; it's just from my own narrow
sampling of people I've interacted with, which may or may not be
representative of the overall population.

~~~
oefrha
You really need to learn more about bioinformaticians’ contributions to
computing before making an ignorant and knowingly offensive comment like this.

Disclaimer: my background has absolutely no overlap with
biology/bioinformatics.

~~~
grawprog
I know about bioinformatics and the contributions of biology to computer
science and vice versa. I've personally worked in both field sampling and data
analysis. My overly offensive, generalized statement was a light jab at field
biologists I've known, including my own friends I've debated on this subject,
who tend to dislike computers and will go so far as to avoid even Excel,
hand-writing their data and doing all the math on a calculator. It really
wasn't meant to be taken seriously, and I suppose it would go over the head of
anyone who hasn't spent a lot of time in the field with biologists.

~~~
oefrha
Okay, sorry about the misinterpretation, but it should have been clear to you
that ggp was referring to computational biologists or bioinformaticians,
making your observations rather unrelated.

