
How Lisp-family languages facilitate building bioinformatics applications - abrax3141
http://bib.oxfordjournals.org/content/early/2016/12/31/bib.bbw130.full
======
torrent-of-ions
When I started working in bioinformatics I was excited to use Lisp for my
work. I wrote a considerable amount of library code and programs at first. But
I always found resistance when it came to other people using my programs and
even more if the program was to be published and released for general use.

There is a large amount of really, really bad Perl code still in use in
biology. Trust me when I say to those outside of the field, I've seen things
you people wouldn't believe. I try to discourage Perl use because it makes it
so easy for biologists to write terrible code.

I've found a compromise in Python. People will happily use it because it's
trendy and ubiquitous. And I think it makes it easier to write OK code. But I
miss using Lisp. The style of programming when you have a REPL is perfect for
this kind of thing. The ability to quickly write prototypes which can later be
improved is an advantage which cannot be stressed enough. C is fantastic when
you have a good idea of the algorithms and data structures you want to
implement and are concerned with implementing them efficiently. But that's
rarely the case when writing bioinformatics code. I need to know if the damn
thing _works_ first before I spend time making it work fast. And why would you
not use a language where both things are equally easy?
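
A minimal sketch of that prototype-first workflow, in Python since that is the compromise language mentioned above. The function names are hypothetical and the task (reverse-complementing a DNA string) is only an illustrative stand-in: write the obvious version, confirm it works, then check a faster version against it.

```python
# Illustrative sketch only: names and task are assumptions, not from the paper.

def revcomp_prototype(seq: str) -> str:
    """First draft: explicit and easy to inspect at a REPL."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    out = []
    for base in reversed(seq.upper()):
        out.append(pairs[base])
    return "".join(out)


# Translation table built once; used by the refined version below.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp_fast(seq: str) -> str:
    """Later refinement: same result via str.translate and slicing."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def agree(seqs) -> bool:
    """Check that the refinement still matches the trusted prototype."""
    return all(revcomp_prototype(s) == revcomp_fast(s) for s in seqs)
```

The prototype stays around as the reference implementation, so the "does it work" question is answered before any time goes into the "is it fast" one.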

I'm happy to see this paper because it means that I have a way to justify my
usage of Lisp in the future. Big thanks to the authors.

~~~
hatmatrix
Building a community around Lisp is tricky. While there are many criticisms of
this article [1], some elements of the "Lisp Curse" are true:
each programmer tends to rewrite his/her own libraries and so a collaborative
framework seldom emerges even while many programmers may work in the same
domain. This can be a disadvantage in a domain like the natural sciences where
deep collaborations are necessary.

Furthermore, the disconnect with traditional mathematical syntax has also
turned off many users. There was a predecessor to R called Lisp-Stat [2] which
never achieved the acceptance that S/R did, and the Lisp-based syntax of
Lisp-Stat was cited as one of the reasons.

Following the example of a high-profile Lisp supporter like Peter Norvig who
accepted Python as a "Lispy" alternative early on, I think there are a lot of
us who have come to accept Python as a necessary compromise. While Python is
not homoiconic, functional programming is encouraged through its libraries
rather than its core language definition, and the power of macros is missing,
its inherent popularity and all the benefits that come with it are tough to
beat.
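
To make the "through its libraries" point concrete, here is a small sketch that counts k-mers across several sequences using only the standard library's `functools`, `operator` and `collections`. The function names and the k-mer task are illustrative assumptions, not something taken from the paper.

```python
from collections import Counter
from functools import reduce
from operator import add


def kmers(seq, k):
    """Lazily yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_counts(seqs, k):
    """Fold per-sequence k-mer counts into one Counter, with no mutable loop state."""
    per_sequence = (Counter(kmers(s, k)) for s in seqs)
    return reduce(add, per_sequence, Counter())
```

The functional pieces (`reduce`, generator expressions) come from the library and comprehension syntax; there is still no way to add new syntax the way a Lisp macro would.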

[1]
[http://winestockwebdesign.com/Essays/Lisp_Curse.html](http://winestockwebdesign.com/Essays/Lisp_Curse.html)

[2]
[http://homepage.divms.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.ht...](http://homepage.divms.uiowa.edu/~luke/xls/xlsinfo/xlsinfo.html)

~~~
throwanem
> each programmer tends to rewrite his/her own libraries and so a
> collaborative framework seldom emerges even while many programmers may work
> in the same domain

So, exactly like bioinformatics, then. Trivial tasks are supported by packages
of libraries like BioPerl, but all the original research is _sui generis_ ,
with the attitude in general being that each problem is unique and only
results get published, so who cares if the code is a write-only mess as long
as it does the one job after which no one will care about it anyway. In such
an environment, the "Lisp Curse" already exists anyway, so why not get the
benefit of a more powerful and expressive language when you're already paying
its cost?

(Granted, Lisp for bioinformatics is never going to become a generally
accepted thing. But if you're not releasing code anyway, no one cares what
language you write it in...)

~~~
hatmatrix
What about Bioconductor and Openbabel/Pybel?

~~~
throwanem
R and especially Python didn't see a lot of use where I was; the former was
regarded mainly as a chart generator, while the latter had one diehard fan (an
engineer like me, not an investigator) and no further uptake. It has been a
few years since I was there, though, and I haven't kept closely in touch, so
perhaps things have changed in the interim.

------
lispm
Ira Kalet (RIP, 1944-2015) wrote a book using Lisp for the examples:

Principles of Biomedical Informatics, Second Edition, 2013
[https://www.amazon.com/Principles-Biomedical-Informatics-Sec...](https://www.amazon.com/Principles-Biomedical-Informatics-Second-Kalet/dp/0124160190/)

------
yomly
Surprised that Racket didn't get a mention in the paper, especially given the
authors' interest in building DSLs

EDIT: (I consider Racket separate from Scheme)

------
arca_vorago
What an awesome paper! It was during my time as a sysadmin at a DNA lab that I
became really enamoured with Lisp (and Emacs), and functional, homoiconic
languages as a whole, for many of the reasons listed. Great to read them all
and more articulated so well.

BioLisp would be a great project.

------
agumonkey
I asked Bioinformatics MOOC teachers about non-Python/Java languages in the
field, naming Common Lisp or Haskell. I got a large NOPE from them, saying it
was mostly for performance reasons. I specifically asked about the paradigm
too, because the MOOC spent a large amount of time talking about control flags
and temporary variables that, to me, obscured the overall logic... but alas.
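
A rough illustration of the contrast, assuming a made-up task (finding the first in-frame stop codon) that is not taken from the MOOC itself. Both functions return the same answer; the first carries a control flag and temporaries, the second states the search directly.

```python
STOPS = {"TAA", "TAG", "TGA"}


def first_stop_imperative(seq):
    """Flag-and-temporary style: loop state is spread over three variables."""
    found = False
    pos = -1
    i = 0
    while i + 3 <= len(seq) and not found:
        codon = seq[i:i + 3]
        if codon in STOPS:
            found = True
            pos = i
        i += 3
    return pos


def first_stop_functional(seq):
    """Expression style: first codon start whose codon is a stop, else -1."""
    codon_starts = range(0, len(seq) - 2, 3)
    return next((i for i in codon_starts if seq[i:i + 3] in STOPS), -1)
```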

------
cixin
Using Lisp always seems attractive.

However, in Bioinformatics, the reality is that many users are still using
Perl. It would be great to have them migrate to modestly more readable
languages.

Outside of scripting, alignment and other performance sensitive code is
usually written in C (or C++). As with many academic projects, the code
quality unfortunately is quite low.

~~~
jghn
I've been in the field since 2001. I have seen exactly two Perl files in that
time.

~~~
jhbadger
It really depends on the subfield. Sequence-based genomics is _very_ highly
Perl-dependent even today. BioPerl is pretty much the standard library for
that even though BioPython, etc. are beginning to take over. Other subfields
such as differential sequence abundance which are more about the statistics
rather than the raw sequence tend to be R-centered thanks to Bioconductor.

~~~
jghn
I think by your definitions there are sub-subfields then ;) In my neck of the
woods sequence-based stuff has Java & C doing a lot of the heavy lifting.

------
mybrid
Usability matters.

Functional versus procedural programming is first and foremost a usability
problem for developers. The two are interchangeable as far as being Turing
complete.

I just spent a week at a Jakob Nielsen education course on usability. I
asked the executive vice-president at the company if they had branched into
software development itself; sadly, the answer is still no.

Functional programming for the majority of developers is not usable. The
article mentions they are targeting DSLs (domain-specific languages), which
is fine. For example, Haskell seems to have found a boutique community in
the async, MQ world. But, from a usability perspective, one is severely
limiting one's pool of developers by choosing a functional language.

It is fascinating from a human nature perspective that the brains of people
who think in functional programming seem to be wired differently from those of
procedural programmers. There is a very visceral reaction when either camp is
asked to program in the manner they find least usable.

Ultimately I think compilers will get to a development stage where, regardless
of whether a functional or a procedural approach is used, the same
optimized code gets generated under the hood.

Postgres was originally written in LISP. Ultimately Stonebraker had to make
the call that if Postgres was going to get adopted widely, LISP had to go and
it was rewritten in C/C++ before being open sourced. As a point of research
you all might want to look into the Postgres experience.

~~~
DonaldFisk
The original paper is here:
[http://db.cs.berkeley.edu/papers/ERL-M90-34.pdf](http://db.cs.berkeley.edu/papers/ERL-M90-34.pdf)

There are other cases where an original Lisp implementation has been
rewritten: ViaWeb and Reddit. I think the reason is that the teams which took
over the projects were unfamiliar with Lisp.

My experience is different, though I should mention that I'm a sole developer.
I'm roughly equally experienced in C and Lisp, but find I'm far more
productive in Lisp. My first language was FORTRAN. Unlike C, Lisp is memory
safe (you don't have to manage the memory yourself), interactive, and strongly
typed; you never need to write a parser; the built-in symbol and list types
are incredibly useful; and there are fewer "gotchas".

Lisp is multi-paradigm rather than purely functional, so like the Algol
derivatives it has loops and destructive assignment.

------
swuecho
Not trying to be negative.

But Perl is still used far more widely than Lisp in bioinformatics; Lisp is
negligible in bioinformatics compared to Perl.

Check the status of BioPython, BioPerl. BioLisp? It doesn't even exist.

~~~
DonaldFisk
It does exist. It's now called BioBIKE.

[https://en.wikipedia.org/wiki/BioBIKE](https://en.wikipedia.org/wiki/BioBIKE)

[http://biobike.csbc.vcu.edu](http://biobike.csbc.vcu.edu)

------
chmaynard
This is obviously an opinion piece disguised as a scholarly article. I'm not
sure what the hidden agenda is, and the HN comments don't clarify matters
much.

For example, this nonsense statement reveals the author's bias: "Clojure is a
rising star language in the modern software development community." Huh?

~~~
swuecho
In my opinion, it would have been better published as a blog post.

------
dangom
"Gat [3] compared the run times, development times and memory usage of 16
programs written by 14 programmers in Lisp, C/C++ and Java. Development times
for the Lisp programs ranged from 2 to 8.5 h, compared with 2 to 25 h for
C/C++ and 4 to 63 h for Java (programmer experience alone does not account for
the differences). The Lisp programs were also significantly shorter than the
other programs."

Could selection bias be skewing these results?

Proficient Lisp programmers can certainly create shorter and faster programs
with Lisp. Who would ever contest that? Average programmers, on the other
hand, can probably develop in a similar amount of time and write faster
programs with C++ (given the number of libraries and the information
available, in comparison to Lisp, and the ubiquity of tooling).

------
vt100
How about Hy? It's a Lisp that runs on the Python interpreter, and
interoperates with Python. It's not extremely mature yet, but it is really
cool!

~~~
goatlover
It's pretty wild using numpy or pandas in Lisp. Hy is definitely cool.

------
jxy

       on average, the Lisp programs ran significantly faster
       than the C/C++ programs and much faster than the Java
       programs (mean runtimes were 41 s for Lisp versus 165 s
       for C/C++).
    

The only thing I can say is that their C/C++ code has serious problems.

~~~
lucidguppy
Perhaps they didn't optimize.

------
jonathanyc
As a Lisp fan myself, I wish they would show an example of how WITH-GENES or
MAP-GENES encouraged encapsulation vs. objects or passing closures.

~~~
abrax3141
The section of the paper where this is mentioned offers a novel (at least to
me) argument in support of homoiconicity, to wit, that you build DSLs directly
into the base language, which (it is asserted) makes the DSL development
process more flexible (or something), and that this is important for 'living'
domains, where you're trying to work out the domain semantics, like biology.
So, you're not really so much building a specialized language for X, as slowly
turning (almost in the sense of wood turning, as on a lathe) Lisp into X-Lang.
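
For contrast, here is roughly what the non-macro version of that idea looks like in Python, which has no macros. This is not the paper's WITH-GENES or MAP-GENES; the names `with_genes` and `map_genes`, the `GENOME` dict, and the record shape are all hypothetical stand-ins, built from a context manager and a higher-order function.

```python
from contextlib import contextmanager

# Hypothetical in-memory "genome": gene name -> sequence.
GENOME = {"geneA": "ATGGCC", "geneB": "ATGTTTAAA"}


@contextmanager
def with_genes(genome, names):
    """Bind the requested (name, sequence) records for the duration of a block."""
    genes = [(name, genome[name]) for name in names]
    try:
        yield genes
    finally:
        # Stand-in for releasing whatever a real gene store would hold open.
        genes.clear()


def map_genes(fn, genome):
    """Apply fn to every gene sequence, hiding how the genome is stored."""
    return {name: fn(seq) for name, seq in genome.items()}
```

Usage would be `with with_genes(GENOME, ["geneA"]) as genes: ...` or `map_genes(len, GENOME)`. The caller never touches the storage layout, but the vocabulary is still ordinary Python function calls rather than new language forms, which is the difference the homoiconicity argument is pointing at.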

~~~
Volt
A concrete example.

