

Ask HN: Is it still worthy to learn Lisp nowadays for data mining tasks - xyjprc

Seeking advice on whether I should continue investing more time in Lisp.

My background: switched major to CS in graduate school, familiar with Python, and need to deal with JSON and csv-like data every day. Python has been handy for the job, but I just started learning Lisp for fun. Though the language looks powerful, for many simple tasks I have to start from scratch, while Python usually has convenient libraries and can do things in one line.

My friends say Lisp is too old and not suitable for data mining tasks, that no one is actually using it for work, and they all prefer Python. But I take their words with some doubt as they are not familiar with Lisp. (In our generation, people don't seem to learn functional programming any more?)

Is it still worth investing time in learning Lisp for data mining purposes? Any advice is welcome. Thanks :-)
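As a rough illustration of the "one line with a library" convenience described above, here is a minimal Python sketch using only the standard `csv` and `json` modules. The sample data and the field names (`user`, `clicks`, `country`) are invented for this example and are not from the original post.

```python
import csv
import io
import json

# Hypothetical sample data standing in for the "JSON and csv-like data"
# mentioned above; the field names are invented for this sketch.
CSV_TEXT = "user,clicks\nalice,3\nbob,5\nalice,4\n"
JSON_TEXT = '{"alice": {"country": "NZ"}, "bob": {"country": "US"}}'

# One line each to parse the two formats with the standard library.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
profiles = json.loads(JSON_TEXT)

# Aggregate clicks per user, then join against the JSON profiles.
clicks = {}
for row in rows:
    clicks[row["user"]] = clicks.get(row["user"], 0) + int(row["clicks"])

report = {user: (profiles[user]["country"], total) for user, total in clicks.items()}
print(report)  # {'alice': ('NZ', 7), 'bob': ('US', 5)}
```

In practice the text would come from files rather than inline strings; `io.StringIO` is used here only to keep the sketch self-contained.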
======
ACow_Adonis
I'm torn on how to respond. As usual, I think the answer is "it depends".

A few years ago, I got bored/frustrated with using SAS for really specific
custom-built work/research that also worked at large scales and went searching
for a replacement. For my uses, I wanted it to be:

Compiled, fast but also flexible, free, with functions, macros, and an object
system. (And I wanted to learn something from it even if I ended up not using
it.)

I settled on Common Lisp. The rest of the world seems to be Python/R these
days. I really do like Common Lisp, and Lisp is easily my favorite language so
far, but there are some realities which, two years on or so, I feel able
and qualified to share:

Some observations on Common Lisp:

Cons:

- It's a big language. I still feel like I don't get all of it, but that doesn't necessarily matter.

- It's not batteries included. That means you're basically going to be coding A LOT yourself. And that requires a lot of work/knowledge about what you're doing. And if you don't understand good data structures/compsci fundamentals, you aren't going to beat the implementations that already exist in other languages. I want to dismiss this because I'm generally trying to write new stuff that doesn't exist anyway and I want low/high level access simultaneously for performance reasons, but it seems there are always supporting libraries/infrastructure underlying some of your work that you didn't really think of and that don't sufficiently exist yet. THIS IS REALLY THE BIG NUMBER 1 STRIKE AGAINST THE LANGUAGE. Do not underestimate how much you'll be doing if you think you can just implement techniques that have already had X years of work put into implementing them in other languages.

- It is not dominant, or even widely known any more. Workmates/friends will ignore you for writing in it. Your work probably won't let you use/install it...

- You'll resent other languages if you successfully learn it.

Pros:

- Of course, I find current libraries in other languages don't do what I want, so I
often find I have to rewrite things anyway.

- It's great for solo, exploratory work or work that doesn't exist yet.

- It's fun/liberating to code in.

- It will beat the absolute pants off of Python/R performance-wise if you
ever get it up and running and if that's important to you. It is to me. But
you can only achieve said performance if you know what you're doing.

- SBCL kind of gives you the best of both the dynamic/static and
compiled/interpreted worlds.

- I found it really does open your eyes to a lot of compsci-theory aspects
other things gloss over. Of course, by glossing those aspects over, Python/R
can make your job a whole lot easier if they aren't important...

Python is actually pretty cool, but it's also pretty slow. I really do prefer
Lisp, but most of the world prefers Algol-esque syntax. I think Python is really up
and coming in the machine learning/stats world.

R: It's liberating coming from SAS. It has huge stats community backing and a huge
number of stats packages. Coming from Lisp, however, it's the horribly
disfigured plastic-surgery older-Hollywood nightmare of a beautiful starlet
you remember from your youth, who has now carved up her face to try to look
like the other young starlets :P (haha, only serious). Which is to say, it's
got enough Lisp influence behind it to seem familiar, but it's a horribly
designed/implemented language if you come from a programmer background...

More accurately though, if Python is slow, then R is SLOOOOOOOOOOOOOOOOOW.
Really, it's painful to type things at the R REPL after coming from Common
Lisp. It has a bit of a cult-like following amongst stats people though...

Hope I've offered at least a little bit of valued feedback... YMMV

~~~
xyjprc
Thanks for sharing your valuable experience!

Yes, I have the same feeling (for now): many times I have to implement
things from scratch in Common Lisp, and that pushes me to understand both
the essence of the algorithm and how computation works.

As I am still in the early stages of learning Lisp and just switched to CS, I
am learning many mind-blowing things at the same time (SBCL+Emacs+Slime). The
combination adds up to quite a steep learning curve, and yes, they are
time black holes. I have invested more time than expected in them every day,
and sometimes I have to use Python to get the job done first to make my
advisor happy, and then try to rewrite it in Lisp in my leisure time.

Maybe it would be good for me to learn a bit of Lisp every day, because I don't
expect to understand all the ideas in the SICP book and remember all the rules in
the Practical Common Lisp book in 21 days.

As I am doing some research, Python is indeed effective for exploring
data/testing preliminary models. But as I recently experienced, it
sometimes gets unbearably slow. (By unbearably I mean more than 15 minutes on a PC.)

Performance is also important for me, not only for the sake of "big data", but
also for self-fulfillment: it feels much better if the code gives a result in 10
seconds rather than several days. (By the way, does CL have mature support for
parallelization?)

My lab mates also use Mathematica a lot, and it is even more convenient for
testing models for data mining tasks, but I have been warned that some
packages are buggy at the moment, and most modules are just black boxes that
don't allow fine-grained tuning.

I also tried R, and most of the time I don't quite understand the logic of the
language. Maybe, as you pointed out, it is the genes of the language that
determine its shape.

Thanks again for your response, it is good to see someone doing real stuff
with Lisp for data mining tasks nowadays.

~~~
ACow_Adonis
If you're going to try to do practical things in Common Lisp, make sure you
get well acquainted with Quicklisp, for easy library installation, and
[http://www.cliki.net/](http://www.cliki.net/) for finding out about half of
those libraries. I said you will have to (or want to) do a lot yourself, but you don't
have to do it all.

Be aware that SICP is written for Scheme. They're both Lisp dialects, but they
are different. I've done bits of it here and there in Common Lisp, but the
separate namespaces for functions and variables, no first-class continuations,
different function names, less emphasis on recursion (due to specific
iteration forms/macros in Common Lisp) and the native efficient data structures
offered by Common Lisp might make some things non-applicable or difficult even
if you're already familiar with the language. Tail call optimization is, I
believe, not required by the Common Lisp standard, but I believe SBCL performs
it, at least above certain low optimization settings.
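To make the tail-call point concrete, here is a small sketch in Python rather than Lisp, chosen only because the original poster knows Python. CPython does not eliminate tail calls, so it shows what happens to SICP-style recursion when the optimization is absent, and why an explicit loop (the style Common Lisp's iteration forms encourage) sidesteps the issue. The function names are made up for this illustration.

```python
def sum_to_recursive(n, acc=0):
    """Tail-recursive style: the recursive call is the last thing done."""
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_iterative(n):
    """Same computation written as an explicit loop."""
    acc = 0
    while n > 0:
        acc += n
        n -= 1
    return acc


def try_recursive(n):
    """Return the recursive result, or None if the call stack overflows."""
    try:
        return sum_to_recursive(n)
    except RecursionError:
        return None


print(sum_to_recursive(100))      # 5050
print(sum_to_iterative(100))      # 5050
print(sum_to_iterative(100_000))  # 5000050000
# Without tail-call elimination, each recursive call adds a stack frame,
# so a deep call is expected to hit the interpreter's recursion limit.
print(try_recursive(100_000))
```

In an implementation that does eliminate tail calls, as Scheme requires, the recursive version would run in constant stack space just like the loop.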

I've only just started looking at multi-threading in Common Lisp myself. I
imagine the proper answer about "mature" parallelization is no, relative to
some other languages (I have no real basis for this, just side-knowledge that
other languages focus on such issues more. Hell, it's probably better than R.).
Multi-threading is not part of the language specification, but in
practice, implementations have their own version of it, including SBCL, which
is the implementation I'm sticking to for numerical work. If you're looking into it,
I believe you might also want to get acquainted with Bordeaux Threads. Once you've got
Quicklisp installed, you can install that easily from there :P
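For readers who have not used threads before, the basic pattern such libraries support (split the work, run the pieces in worker threads, combine the results) can be sketched in Python with the standard `concurrent.futures` module. This is only an illustration of the pattern, not Bordeaux Threads code; the function names are hypothetical, and in standard CPython builds threads will not actually speed up CPU-bound work like this because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(bounds):
    """Sum of squares over the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split [0, n) into chunks, sum each chunk in a thread, combine."""
    if n <= 0:
        return 0
    step = (n + workers - 1) // workers  # ceiling division
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order; the partial sums are then added up.
        return sum(pool.map(chunk_sum, chunks))


print(parallel_sum_of_squares(1000))  # 332833500
```

The split/compute/combine structure is the part that carries over to other languages; the speedup depends entirely on the runtime's threading model.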

That being said, when you're performing something 10-500 times faster than the
interpreted memory-hungry mess that is R for example, the complication of
multi-threading can fall down the priority list...

Totally independent of my decision to try the language, I later found this
paper, which suggests that one of the authors of R is well aware of and in
agreement with some of my own observations. There's also a benchmark in there
of Common Lisp, R, and Python, though take all such microbenchmarks with a
grain of salt, because you will be spending more programmer time in Common
Lisp:

[https://www.stat.auckland.ac.nz/~ihaka/downloads/Compstat-20...](https://www.stat.auckland.ac.nz/~ihaka/downloads/Compstat-2008.pdf)

Unlike some language zealots, I feel it's my responsibility to remind you this
is the road less traveled. I don't think it's the way industry/academia is
moving. I think there is, perhaps arrogantly, a selection bias in who becomes
a Common Lisp programmer, such that inquisitive/smart/alternatively-thinking
people tend to be attracted to the language, but if you think it will
magically make you 100 times better than you already are because some other
smart guy uses it, put that idea right out of your head.

But also honestly, I've found the experience extremely rewarding, and I code
my own projects in Common lisp now where I can.

I came in the top 20 of
[http://www.kaggle.com/c/facebook-recruiting-iii-keyword-extr...](http://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction/leaderboard/private)
by literally coding something up from scratch
in Common Lisp, and my model had easy room for improvement and was among the
fastest, so you can most certainly compete writing in Lisp.

Now, aside from my eternal side-project, I'm trying to put something together
in the field of data linking in Common Lisp. I think it's one of the only
options that will let me combine the flexibility of dynamic languages with the
performance of C... (I'm actually hoping to perform better than the US Census
Bureau C program)... I already know it blows the current Python/R options out
of the water, but time will tell whether I can match C
performance-wise...

------
rurban
Sure. LISP is always worth the time, even if you cannot use it at work.
Technically still far superior to everything else out there.

------
bugsenseusesit
BugSense (now part of Splunk) uses Lisp for quick analytics.

[http://highscalability.com/blog/2012/11/26/bigdata-using-erl...](http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the-tsunami-of-mobi.html)

------
Snail_Commando
Which dialect of Lisp are you studying? Common Lisp?

~~~
xyjprc
Yes, I am trying to learn to use SBCL+Emacs+Slime, using the book "Practical
Common Lisp" and SICP.

