
Rise of the Scientific Programmer
http://byterot.blogspot.com/2015/01/future-of-programming-rise-of-the-scientific-developer-bigdata-datascience-machine-learning-and-fall-of-the-craftsman.html
======
Fede_V
As other people have mentioned - the problem is that academia rewards
producing papers, not stable software libraries.

For example - scikit-learn is an amazing project, led mostly by a small group
at INRIA with Gael at the helm - and in terms of academic prestige, scikit-learn
is probably 'worth less' on your CV than a couple of Nature papers.

This is of course ridiculous - scikit-learn is used by a huge number of people
and takes an insane amount of work to run, yet the incentives are what they
are.

~~~
palvaro
> the problem is that academia rewards producing papers, not stable software
> libraries.

isn't this the good thing about academia? aren't you kids getting paid the big
bucks to write and maintain the software libraries, while we work on novel
problems for pennies?

~~~
Fede_V
I am an academic myself! Aside from that - it's actually a bad thing: poor
software quality is incredibly harmful when trying to create reproducible
research.

A few people are fighting this (Titus Brown, etc) but it's mostly swimming
against the tide of bad incentives.

~~~
blackkettle
i have started using Docker for this kind of stuff. you can build an isolated
environment for your software and experiments, where you can absolutely
guarantee that anyone who wants to can easily replicate your experiments,
since they don't need to create the environment themselves - just pull the
docker image for conference-paper# and run the scripts.

if the experimental data is proprietary, or you want to keep it separate, you
can set a mount point for it in the lxc.
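
for concreteness, the workflow is roughly the two commands below (image name,
data path, and script name are placeholders, not anything i've published):

    # fetch the image that accompanies the paper (hypothetical name)
    docker pull youruser/conference-paper-2015

    # run the experiment scripts, mounting the proprietary data read-only
    docker run --rm -v /data/corpus:/data:ro youruser/conference-paper-2015 ./run_experiments.sh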

~~~
mcguire
" _anyone who wants to can easily replicate your experiments_ "

Replicate the experiments, or just repeat the results?

~~~
blackkettle
the stuff i work on is in the area of machine learning, so most published work
involves one or more well-known data sets.

i would argue that the two are the same in this case.

the lxcs provide all the source code i write [plus of course the compiled
version], all third-party libraries, and all scripts used to run and evaluate
the experiments, and the data as well, where that is permitted.

it's still not perfect, but for my area, i honestly think it is the best, and
most accountable way to do things that i have seen.

~~~
mcguire
And hopefully one or more not-so-well-known, local data sets to check that the
results are actually as claimed?

~~~
blackkettle
well, the idea is that you should be able to run _any_ data set you have, and
get good results relative to other solutions. but that is an open question
with any research.

the point of the docker/lxc aspect is to provide a simple working environment
to facilitate replication and validation.

so in comparison to the status quo, which is basically 'write a paper, include
some high level equations, and results', i think this is a step in a better
direction.

------
danso
The other night I was searching for Python science books and stumbled across
this one titled "Python for Biologists: A complete programming course for
beginners"...the homepage for the book is here:
[http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/](http://pythonforbiologists.com/index.php/introduction-to-python-for-biologists/)

Admittedly, it only has two reviews on Amazon, but they're both five stars,
and they both seem to come from biologists who are apparently thrilled at
being able to leverage code for their work...the funny thing is, the book
itself is not "advanced" as far as what most professional programmers would
consider "advanced"...the Beginners' book ends with "Files, programs, and user
input" and the Advanced book ends with Comprehensions and Exceptions...

I think we as programmers vastly underestimate how useful even basic
programming would be to virtually anyone today. I work at Stanford and it
continually astounds me when I run into non-programmers who are otherwise
doing data-intensive research, who fail to see how their incredibly repetitive
task could be digitized and implemented as a for-loop. It's not that they are
_dumb_, it's that they've never been exposed to the opportunity. And
conversely, it's not because I'm smart, but because I literally can't remember
what it was like _not_ to break things down into computable patterns. And I've
been the better for it (especially because I'm generally able to recognize
when things _aren't_ easy patterns).
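
(To be concrete about the for-loop point: the kind of thing I mean is often
nothing more than the sketch below, where the file pattern and the per-file
step are made up for illustration.)

    import glob

    # run the same check over every data file instead of opening each one by hand
    for path in glob.glob("plate_*.csv"):
        with open(path) as f:
            rows = [line.strip().split(",") for line in f]
        print(path, len(rows))  # stand-in for whatever the real per-file analysis is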

Sometime ago, I believe it was Stephen Hawking who speculated that the realm
of human knowledge was becoming so vast that genetic engineering of
intelligence might be required to continue our progression...that may be so,
but I wonder if we could achieve the same growth in capacity of intellect by
teaching more computational thinking (and implementation), as we do with
general literacy and math. As Garry Kasparov said, "We might not be able to
change our hardware, but we can definitely upgrade our software."

[http://www.nybooks.com/articles/archives/2010/feb/11/the-chess-master-and-the-computer/](http://www.nybooks.com/articles/archives/2010/feb/11/the-chess-master-and-the-computer/)

~~~
noobermin
From my experience, I've come across many individuals in academia to whom I've
tried to suggest particular approaches to their problem of choice, only to be
rejected because they are too proud or too afraid to learn something new. I'm
not trying to be an ass when I do, because that would undermine my attempt to
share information, but something as simple as suggesting "for i in list:"
instead of "for i in range(len(list)):" is so offensive to them, since they
didn't learn it first when they learned how to program (and god forbid they
learn it from a second-year graduate student).
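
(For anyone reading along, the difference in question, with a throwaway list:)

    samples = ["a", "b", "c"]

    # index-based loop, the habit I was gently trying to discourage
    for i in range(len(samples)):
        print(samples[i])

    # iterating over the items directly, which is what I was suggesting
    for item in samples:
        print(item)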

Maybe biologists or people at Stanford (or biologists at Stanford) are less
proud, but my experience here has made me stop trying to relate basic
programming concepts to fellow academics.

~~~
jwmerrill
I had the same experience in grad school, and I can tell you that giving
people (good!) style advice when they don't ask for it is pretty pointless.
It's kind of like giving grammar advice that wasn't asked for.

People need to be in the right frame of mind to learn new things. Otherwise,
if they're getting their point across, just let them keep talking (or coding).

~~~
noobermin
You're probably right about it seeming rude; comparing it to grammar advice is
a good analogy - I didn't think about it like that.

~~~
nitrogen
Programmers are expected to give and receive style, performance, and idiomatic
advice in every code review. Is there some way this sort of peer code review
could be integrated into the academic process?

~~~
jwmerrill
Principal Investigators (i.e. professors that run labs) could establish a code
review process for code that their group produces, just like any other manager
can establish such a process.

Many PIs don't have the expertise to do that well, and many of them don't
especially value style, performance, or idiomatic code. You have to remember
that most academic code gets used by 1-5 people, and is run something like
1-50 times total. In those cases, it's actually kind of ok that that code
doesn't end up being maintainable.

The important risk, of course, is erroneous results. But good researchers
generally find many independent ways to check their results, so ideally, bugs
that affect results should get caught in that process.

I personally love writing high quality code, and I found the academic science
attitude about code to be frustrating. But there are structural reasons for
these attitudes, and it's wrong to imagine that you can change them by arguing
in terms of things that software engineers find valuable. You'd need to make
your case in terms of things that science PIs find valuable, and in the
process of trying to do that, you might actually discover that what you wanted
to argue for isn't so critical after all.

~~~
ajtulloch
> You'd need to make your case in terms of things that science PIs find
> valuable, and in the process of trying to do that, you might actually
> discover that what you wanted to argue for isn't so critical after all.

This is a great sentence - thanks for contributing.

------
elliotec
I have noticed a general upward trend in interest in scientific programming
for a few months now, and the community (most specifically Hacker News) has
driven my interest in that area as well. The idea of functional programming
and thinking in mathematically sound ways really appeals to me, but my lack of
a math and comp sci background is holding me back from going full-speed at
learning and getting better at it.

I feel many of us are so lost swimming in a sea of opinions, juggling
frameworks du jour, development methods, and business strategies, that it
keeps us from focusing on improving our skills in areas that matter. This
frustrates me and I've been looking for ways to get out of it. There is also
the fear of another bubble, mixed with trying to keep up with the trends and
hipness of the industry in order to remain gainfully employed.

I realize I am sort of just reiterating the author's point, so I guess what
I'm saying is I agree.

~~~
kirse
HN tends to give this perception of rapid change because you're always
hearing about hot new tech and how great it is from the various tech
evangelists. It's a side effect of their faith/vision in their own tech that
makes you feel like you're missing the train.

However, I've been following HN for 7.5+ years now, and change is hardly as
fast as you think. There really is nothing new under the sun, and you see the
same core principles (and human needs) being expressed as the underlying
technology changes. The core principles of functional programming have been
around since the 30s as lambda calculus, and since the 50s with their initial
expression as LISP. It's better to learn FP to add a new way of thinking about
problems to your toolbox, rather than treating it as a panacea of programming.

If you're feeling lost in the sea, define your lighthouse. HN will most
certainly have you paralyzed by the many waves of every new choice if you
haven't defined a clear vision for how you want to harness your finite energy.

------
patkai
Great perspective on the future; we need many more discussions like this. I
love the main message of the post, i.e. "we need to grow up". I have the
impression that while we are all optimistic that the future is owned by
software developers, we don't realise that it won't be owned by all of them.
There will certainly be more segmentation in our profession, and there will be
great demand for high-end developers. This requires a lot of learning, and I
personally feel it's a tough challenge.

The post also made me realise how much we still think in terms of disciplines.
E.g. we think a developer should learn more mathematics. If we were thinking
in terms of problem solving, or "modelling reality" (at least in part with
software), we couldn't separate these so easily. E.g. if you are writing
software for vehicle condition monitoring, you use a combination of
engineering, physics, mathematics, and computer science - and the less you try
to (or need to) separate them, the better you do.

I can't quite put it simply, but in my mind I can see the future "developer",
who got a BSc in Physics, went on to work as a software developer for a couple
of years, and then continued to learn maths, physics, and biochemistry every
day, working on various projects where she could use all of these. She is
neither a physicist, nor a software developer, nor a mathematician.

~~~
lightcatcher
> you use a combination of engineering, physics, mathematics, computer science
> - the less you try to - or need to - separate them the better you do.

I disagree; I think mentally separating math, physics, and computer science is
a good thing. A good scientific programmer should understand the science on
its own, and then figure out how to model/approximate the science to do
something useful on a computer. For instance, if you want to implement Reed-
Solomon codes efficiently, you'll realize that understanding the algebra
required to understand the code is more or less an orthogonal skill to
designing efficient encoders and decoders.

As a personal anecdote, I had much better luck learning about waves and
quantum mechanics after I knew about differential equations, orthogonal
decompositions, and a fair amount of linear algebra than I did before I knew
this math, even though the physics classes included all of the necessary math.
I attribute this better understanding to having a cleaner mental map, because
I knew which statements were true because of some sort of physical fact, and
which statements were just mathematical results.

In line with the generally smiled-upon principle of decomposing problems, I
think it's particularly critical that the scientific programmer can decompose
these interdisciplinary problems. A scientific programmer should understand
the science (at least mostly), understand the hardware, and figure out how to
efficiently and cleanly map the scientific problem to something that can be
solved by a program.

------
walshemj
I started on the science/engineering side at BHRA (on campus at CIT). The only
problem with technical/scientific vs commercial work is that the pay is so
poor.

~~~
toufka
And academically the prestige is poor. One is not granted 'research time' to
develop software, but to 'get things done' (see u/Danso's comment). As such
there's no one to take the first step in actually making software that would
benefit anyone. And in the generous circumstance that one can be allotted time
to write the software, the result is a pat on the head - 'good job' - for
reducing everyone's workflow from weeks to minutes. Sometimes you get an
acknowledgement. And no one will ever support/read your software when you
leave - it will be used ritualistically until the lab's last computer's OS no
longer supports it.

On the topic I see two other significant problems:

1) In basic research there is often a need for 'Every Option' style software -
you're doing something that's never been done before and you need to be able
to tweak it exactly how you need (but also be able to 'just hit run' for a
first pass when coming from your native field). And those types of software
are inherently a mess to design and build (e.g. Photoshop, CAD, 3D, programming
languages).

2) Some of this software can only be written by those who directly do the
research - or someone who collaborates with them very closely. Scientific
software contains _scientific_ assumptions that are very hard to evaluate if
you're not part of the field. Deciding to go right-way-round, or rounding up,
or leaving off the last element in an array, or any of those other programming
tricks can really mess up scientific work. Conversely, using the entire array,
using a non-weighted 'avg', or treating the red channel mathematically the
same as the blue channel is a very different way of designing software than in
other industries - and it is not common, and rarely given much thought.
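
To make the 'avg' point concrete, here's a toy sketch (entirely made-up
numbers) of how an innocuous-looking unweighted average can bake a scientific
assumption into the code:

    # three replicate measurements, each summarizing very different numbers
    # of underlying observations (invented values, for illustration only)
    values = [0.90, 0.50, 0.10]
    counts = [1000, 10, 5]

    unweighted = sum(values) / len(values)
    weighted = sum(v * n for v, n in zip(values, counts)) / sum(counts)

    print(unweighted)  # 0.5   - treats every replicate as equally informative
    print(weighted)    # ~0.89 - dominated by the well-sampled replicate

Which of the two counts as 'the average' is exactly the kind of decision a
developer outside the field won't even notice they're making.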

~~~
tjradcliffe
There's a modest living to be made at the interface between "scientific
programming" and "commercial implementation of programs scientists write".
This is an under-appreciated niche because it takes a lot of work to get into
it: you need to be a good developer and a good scientist with diverse
experience. My way in was through experimental and computational physics, but
there are certainly other avenues these days.

Modern statistics is the biggest piece of the picture that every interesting
area has in common. If you're interested in scientific programming you need to
understand Bayes as well as algorithms, etc. I have friends in psychology,
biology, and so on, and we can communicate surprisingly well because we all
speak the same statistical language.
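
(By 'understand Bayes' I mean at least being fluent in the mechanics of
updating; a toy diagnostic-test calculation, with invented rates, is the sort
of thing I have in mind:)

    # P(disease | positive test) via Bayes' rule, all numbers made up
    prior = 0.01           # P(disease)
    sensitivity = 0.95     # P(positive | disease)
    false_positive = 0.05  # P(positive | no disease)

    evidence = sensitivity * prior + false_positive * (1 - prior)
    posterior = sensitivity * prior / evidence
    print(posterior)       # ~0.16: a positive test is far from a sure thing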

But more importantly you need to understand how scientists think. They are
amazingly hard to pin down to the kind of specs developers need.

For example, a guy on my team once said after talking to one of the scientists
we were working with for a couple of days, "I now have a much better grasp of
the problem, but I still don't know what the default value of this parameter
should be." I spent ten minutes talking to the scientist and came back and
told the developer "5", because I could tell from the way the scientist was
talking that he had no clue if the number should be 3 or 10, but seemed to be
favouring the lower values. I didn't need to understand the problem domain in
detail to make that judgement, but to have a reasonable grasp of the
psychology of working scientists. So far as I know, there's no way to get that
without working as a scientist yourself.

------
juretriglav
Shameless plug: In January I'll be focusing on a project which deals
specifically with scientific software:
[http://sciencetoolbox.org/](http://sciencetoolbox.org/) The current version
is a product of a hackathon, but this month I will be improving it and adding
functionality which brings the scientific software developer and her efforts
into focus. Scientific software is gaining importance, but the recognition its
developers get is trailing behind - I want to raise the level of associated
recognition/prestige (among other related things). Some other projects rely on
data collected here, e.g. a recommendation engine for said software that
enhances GitHub:
[http://juretriglav.si/discovery-of-scientific-software/](http://juretriglav.si/discovery-of-scientific-software/)

Shameless plug continues: if you'd like to keep track of what I'm doing I
suggest you either follow the project on GitHub
([https://github.com/ScienceToolbox/sciencetoolbox](https://github.com/ScienceToolbox/sciencetoolbox))
or Twitter
([https://twitter.com/sciencetoolbox](https://twitter.com/sciencetoolbox)).

~~~
toufka
Units. I'd love so much for a standard fileformat/interpreter/concept that
contained SI units as a requirement for most datatypes. Much to be learned
from my TI-89.

~~~
dalke
I don't know what you mean by "as a requirement for most datatypes". I work
with molecules, where distances are typically measured in Angstroms and masses
in amu. I don't want to have factors of 1E−10 and 1.660538922E−27 hanging
around my code.

~~~
detaro
I assume that he means that the type system knows that these are in useful
units, and tracks them, so you can have it check that your calculation
actually results in a value of the unit you expected.

~~~
wtallis
Yeah. There's really no reason why our hot new scientific computing languages
and libraries should all be lacking capabilities that graphing calculators had
in the 1980s when they only had 2k of RAM. HP demonstrated that it doesn't
even require a CAS to be extremely useful.

~~~
dalke
I know what unit libraries are, and why they can be useful. There are several
units libraries for Python, the Boost C++ library includes one, etc.

I don't know why they should be a 'requirement for most datatypes'. Could
someone please explain the requirement part?

As a further clarification, why should people in scientific computing, in
fields which use non-SI units like eV, amu, Angstrom, barn, light year, and
megaparsec, use a programming language which requires SI units? Quasar 3C 273
is 749 megaparsecs from us, or 2.31E25 meters away. I don't see why SI should
be preferred.

~~~
wtallis
Any programming language that makes units a first class part of its type
system would allow for defining custom units just as easily as defining other
data types. Nobody's saying that SI has to be the only units expressible, just
that it has to be the foundation of the unit system. Likewise, unitless
numerical quantities will necessarily still be expressible, but using those
types for variables that represent a value with a unit should be considered
extremely poor practice, just like when communicating those values on paper.
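
A minimal sketch of the idea in Python (a hand-rolled toy, not any particular
library; real unit packages do far more than this):

    # toy unit-tracking value: SI base dimensions as the foundation,
    # with custom units (like the angstrom) defined as scale factors on top
    class Quantity:
        def __init__(self, value, dims):
            self.value = value  # magnitude expressed in SI base units
            self.dims = dims    # e.g. {"m": 1} for a length

        def __add__(self, other):
            if self.dims != other.dims:
                raise TypeError("incompatible units: %s vs %s" % (self.dims, other.dims))
            return Quantity(self.value + other.value, self.dims)

    def angstroms(x):
        # custom unit, still founded on the SI meter
        return Quantity(x * 1e-10, {"m": 1})

    length = angstroms(3.5)
    mass = Quantity(2.0e-26, {"kg": 1})

    print((length + angstroms(1.0)).value)  # fine: both operands are lengths
    # length + mass  # raises TypeError: incompatible units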

There's really not a good reason to argue against using only SI units for data
interchange formats. It's trivial to map to the preferred units on import or
display, if and only if you know what units the input data is in. I've dealt
with too many bugs where interacting programs have differing assumptions about
meters, centimeters, and millimeters to believe that the flexibility of
storing different units on disk is ever worth the trouble.

~~~
dalke
I use Python. If I use one of the third-party packages for units then I can do
what you say I can, but at extreme cost. Every single operation checks for
unit conversion, on the off-chance that the values aren't compatible. The
system, in trying to be nice to me, ends up making things invisibly slow.

(In practice, the performance code runs in C, so the Python/C boundary would
have to negotiate the array types for full unit safety.)

In my work the base unit of length is the angstrom. I've used nanometers a few
times, and never used any other length unit, though I know that GROMACS's xtc
format uses picometers. Saying something has a volume of 600 cubic angstroms
is much more useful than 6E-28 cubic meters. While I can appreciate that other
fields closer to human-scale use may like to standardize on SI, I don't want
your preferences enforced on my field. All I see is the chance to make things
worse, and slower, and I don't see any advantages.

One of my data formats has coordinates in angstroms, like "8.420 50.899
85.486". How would you suggest that I write that in an exchange format? As
"8.420E-10 50.899E-10 85.486E-10"? (Or the last two normalized to E-11.) At
the very least that's a lot of data for very little gain. It gets worse for
trajectories, which might save 1 million time steps x 10,000 atoms/time step x
3 coordinates/atom = 3 billion coordinates to an exchange file. I see no
advantage to doing that in SI units.

In practice those distance coordinates will likely be represented internally
in angstroms. Consider that the Lennard-Jones potential is sometimes written
as A/r^12 - B/r^6, with expected values of r around 1E-10 m. The first
denominator will go to around 1E-120 in intermediate form, which is not
representable as a 32-bit float. While not relevant for Python, which uses
64-bit floats, some molecular dynamics programs will use 32-bit floats (e.g.,
for older GPU machines, or to save space).
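
(A quick NumPy illustration of that underflow; the distance is arbitrary and
only the r**12 term is shown:)

    import numpy as np

    # the same distance expressed in meters vs angstroms, as 32-bit floats
    r_m = np.array([3.4e-10], dtype=np.float32)  # ~3.4 angstroms, in meters
    r_a = np.array([3.4], dtype=np.float32)      # in angstroms

    print(r_m ** 12)  # [0.]     -- underflows: ~2.4e-114 is below float32 range
    print(r_a ** 12)  # ~2.4e6   -- comfortably representable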

My other example was the atomic mass unit, another non-SI unit. I have only
used amu (for chemistry) or dalton (for biology) in my work, not kilograms. It
seems pointless to require that I store the mass of a carbon as
1.9926467051999998e-26 kg instead of 12 amu.

I therefore disagree, and believe there are good reasons to argue against SI
units for some data interchange formats. I agree that I want to store a single
unit for each quantity on disk, only those units are the non-SI angstrom and
amu, not the tremendously large meter and kilogram.

------
roflmyeggo
Computational and biological sciences will likely meet a financial equivalent
to commercial software applications at the intersection of epigenetics and
pharmaceuticals in the next few decades.

When scientists begin to discover feasible methods to cure or manage
previously incurable diseases (a more recent example of this has been attempts
to cure Cystic Fibrosis), or more specifically reversing some of the diseases
that our older baby boomer populations are suffering from via epigenetic
methods, you can bet your bottoms that there will be a huge influx of capital
in the sector and a subsequent increase in demand for computational
biologists.

Of course we could end up in a sort of quasi-understanding parallel to that of
quantum mechanics and land in an epigenetic limbo, but the general feeling is
one of high hopes.

------
crb002
Pair programming keeps it human, and transfers knowledge very well.

TDD is about reproducibility of results, which is very in line with the
scientific method. Benchmark tests will show you when your solutions are
getting out of hand on performance.
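
A sketch of what I mean, with a made-up analysis function: pin the result of a
seeded computation in a test, and any change that silently alters the numbers
shows up as a failure:

    import random
    import unittest

    def noisy_mean(n, seed):
        # stand-in for an analysis step; the fixed seed makes it reproducible
        rng = random.Random(seed)
        return sum(rng.random() for _ in range(n)) / n

    class TestReproducibility(unittest.TestCase):
        def test_same_seed_same_result(self):
            self.assertEqual(noisy_mean(1000, seed=42), noisy_mean(1000, seed=42))

    if __name__ == "__main__":
        unittest.main()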

The sunk cost fallacy is a big problem. Moving to a new platform like
HTML5/iOS/Android gives a short reprieve, but soon those proprietary code
bases will age.

The other big problem is that in flatter organizations a smaller portion of
jobs usually goes towards management. Managers want lots of layers for job
security.

Erik Meijer is right that small teams which are given narrow mission
objectives instead of detailed requirements, and which measure their problem
domain instead of guessing, will be effective.

I'm curious if a Fat-Tree model of management will take hold,
[http://en.wikipedia.org/wiki/Fat_tree](http://en.wikipedia.org/wiki/Fat_tree)
You get a flatness that improves communication latency, lots of bandwidth, and
managers are happy because there are a lot of jobs at the top.

------
tarikjn
I think many in the comments are misunderstanding what the author means to say
in his post. He is not talking about working in academia or making scientific
software. He is talking about improving one's skills in basic science and in
such fields of computer science as A.I., which have historically been
entrenched more in academia than in industry.

~~~
sqrt17
The author of the post got into programming/CS wanting to earn more/better
money, picking up an Access book and working it out from there. Now that he's
more established, he looks down at the young, hungry people who picked up an
Access book in the hope of more/better money, running through all the tropes
that people who got where they are through knowledge will use.

Yes, it's useful to understand things better, and to know math. And as always,
since the first CS degree started, CS people gripe that people should Know
More Math. Sure, it helps. Other, less prestigious things also help, but you
don't hear people griping about them. TDD allows the idiots in. Yes, that's
effectively why you want TDD: you want to get more mileage, solving more
complex or more bug-sensitive problems with the same people. Building software
is not about being smart (although that helps on occasion), it's about getting
stuff done.

Yes, machine learning and AI are the new kids on the block, and like Web
programming, they will see a bloom of increased customer demand, and like Web
programming, we'll get a progression from bespoke boutique software to
frameworks that make people's lives easier, to frameworks that allow any
person with the intelligence of a pet rock to do simple stuff productively.
Why is that? Because building frameworks is the only way that the smartest
people can earn money faster than by programming the (N+1)th variation on the
same theme everyone else follows -- frameworks are what make people more
productive, or allow you to use a workforce that's more accessible.

As a Wizard With a Pointy Head (aka academic), I'd say that the need for
Wizards With Pointy Heads in production work is often overestimated and/or
idealized. There is a large number of PhD graduates, and the market happily
gobbles them up (indeed, realizing that you can hire PhDs and have them do
productive work is one of the things that made Google successful as a company
back in the early 2000s).

------
placebo
Mostly agree with the article. Being myself fascinated with machine learning,
and in the process of refreshing mathematical knowledge I haven't used since
university (too many years ago) in order to dive deeper into it, I can
definitely relate.

However, I think the main point is not that software developers should all
hone their academic math skills (that would probably be pointless for many if
not most software developers), but rather that it would be best if software
developers strove to follow the scientific _mindset_ when developing software.
In my experience, Occam's razor is just as important in software development
(design, architecture, algorithms, testing, you name it) as it is in physics,
chemistry, or other sciences, and it is this aspect (which I feel is the most
basic and most important) that sometimes gets lost in the noise of software
development trends and fashions.

------
amelius
The problem with scientific software is that the market is so small.

It is far more profitable to just write mainstream software.

~~~
Retra
It's more profitable to do scientific research slowly and tediously than it is
to automate it?

~~~
analog31
Disclaimer: I'm in the scientific equipment and software business. There are
some issues to overcome. These may just be excuses from people who don't want
to change their game, but nonetheless:

1. Sometimes, wages and equipment/software come from different pots of money.
No matter how much sense it makes to replace labor with automation in the
grand scheme of things, if you can't move money from one pot to the other,
then you're stuck with the status quo.

2. I think that people sometimes underestimate the degree of customization
and effort required to automate a specific process. Or sometimes overestimate,
as in, "our process is so special that no commercial tools will work for it."

3. If anybody is going to work on automation, it will be the students
themselves, as it's a way to learn a valuable skill. They may be doing it with
minimal fanfare, and there is a strong movement towards open source tools.
Students realize that the generous budgets and site licenses will vanish when
they leave academia, and they are interested in preparing themselves for
freelance work, startups, etc. This may also favor general-purpose tools,
rather than those designed specifically for science.

My anecdote, from 25+ years ago: I taught myself electronics and programming
by automating my student projects, culminating in my thesis experiment.

~~~
Retra
I really think we need more required (good) computing courses in both math and
science curricula.

~~~
analog31
I was really lucky that I took to programming pretty easily while still in
high school (graduated '82), and likewise with math. As a result, I was able
to integrate computing into my work with minimal guidance.

But it's my view that the teaching of math and science should involve
computation, starting as early as possible. It still amazes me that a kid can
go through high school without learning about something that has had so much
impact on our society. In my utopian world, there would be at least one
question in each physics homework assignment with the instructions: Solve this
with computation. And it wouldn't be a big huge deal.

------
gpcz
I'm having trouble understanding the definitions of these roles. I see the
chart, but the terms are all vague to me. What does a data scientist do that a
mathematician or scientist doesn't do, and what does a scientific programmer
do that a data scientist doesn't do?

My impression was that "data scientist" was a colloquialism for "statistician
that knows how to program." Is a scientific programmer just a programmer that
knows some statistics? Why is the direction important? The author says he/she
feels that a programmer that knows statistics can make "more robust software"
than the other way around, but what exactly does that mean? Do they mean
"doesn't crash as much", or do they mean "gives the right answer more often?"

~~~
tarikjn
For one thing, statistics doesn't really encompass A.I., computer vision, etc.

~~~
Toenex
Sorry but this is wrong. Statistics and probability theory underpin most of AI
and machine vision.

~~~
tarikjn
One thing can underpin another without encompassing it.

Mathematics doesn't encompass physics and finance.

------
thyrsus
Here's my attempt at a TL;DR:

* The '90s "Access in 24 hours" programmer has been replaced by the latest anecdote-based technique/toolset preacher; e.g., TDD.

* Because deep learning is better than humans at finding useful patterns in data (whether concerning biochemistry or web site interaction), it is the best technique.

* Aesthetic (e.g., language) and social justice (e.g., feminism) issues distract from utilitarian effectiveness.

* Utility is only furthered by math and science (where for "science" read "patterns inferred from data"), and we should aspire to be "scientific programmers" who apply only math and science.

~~~
thyrsus
My biases may be guessed at in my summarization, but let me make them more
explicit.

I think it odd that some of the techniques he rails against are inspired by
mathematics: TDD tries to preserve invariants; RESTful design tries to impose
the invariance of idempotency. If the question is whether those techniques
make those who use computers more productive, then a scientific answer would
involve a stunningly expensive human subjects experiment involving large
numbers of people and complex problems. The likely result would be this:
[https://xkcd.com/1445/](https://xkcd.com/1445/)

I'm a sysadmin doing my best to automate (e.g., puppet); I rarely have the
luxury of collecting data sufficient to the immediate problem, so I rely on
math and (unreliable) heuristics. I write perl/shell/puppet/ruby/anything in
small fizz-buzz complexity chunks; an "artisan" if you will. I support CAE
environments with low-latency and poor parallelism opportunities, and until
that changes (e.g., becomes cloud compatible) I don't see my tactics changing
significantly.

------
sgt101
A post about data science and scientific programming featuring a set of graphs
with no y-axis scale and labels. At my gaff this kind of presentation of data
leads to "scrap the whole analysis and start again".

~~~
aliostad
dude, have you ever used Google Trends? It is a cut-and-paste from the source.
The y-axis is popularity, as the title says. It is so obvious that even Google
has omitted it.

~~~
bbcbasic
should have labelled it "Magic Google Unitless Dimension"

~~~
sgt101
You can put all of the terms on the same graph; that would allow people to
compare them on a like-for-like basis, as opposed to "look at the shapes, here
is an argument".

------
vonnik
Deeplearning4j and ND4J contributor here: we've created a distributed
framework for scientific computing on the JVM, ND4J, the linear algebra
library behind Deeplearning4j, which includes ConvNets and other deep-learning
algorithms. Contrary to the author, we believe Scala is the future of
scientific computing. While Python is a wonderful language, the optimization
and maintenance of scalable programs often happens in C++, which not a lot of
people enjoy.

~~~
aliostad
Thanks guys for your hard work.

------
samuell
A look at the trends for "R programming", compared to "Python programming", is
quite interesting too:
[http://www.google.com/trends/explore#q=R%20programming%2C%20...](http://www.google.com/trends/explore#q=R%20programming%2C%20python%20programming&cmpt=q)

(Their curves have been more or less parallel since 2011.)

------
mollerhoj
I believe the author is right - This is the reason why I'm spending my final
ECTS points on statistics and machine learning.

------
briantakita
What if we defined intelligence not from an anthropomorphic view, but from a
systemic view, such that all systems have intelligence?

What is "Artificial Intelligence"? The opposite of "Natural Intelligence"?

~~~
basaah
Saying that all systems have intelligence == anthropomorphizing all systems IF
AND ONLY IF you consider intelligence to be a uniquely human trait.

A more useful distinction for the realm of intelligence could be 'designed
intelligence vs. grown/evolved intelligence' instead of 'artificial
intelligence vs. natural intelligence'; however, something like reinforcement
learning is then neither, or perhaps a hybrid form of intelligence. In the
end, the pragmatic value of the concept of intelligence is low, both for
systems and for humans.

------
iamwil
What is deep learning currently applied to besides object recognition in
images?

~~~
tangentspace
[https://www.quantamagazine.org/20141218-machine-intelligence-cracks-genetic-controls/](https://www.quantamagazine.org/20141218-machine-intelligence-cracks-genetic-controls/)

------
z3phyr
So Machine Learning in general is an almost solved problem?

~~~
patkai
I think the author claims more like "it works", not that "it is solved".

------
stared
Plug: I made a list of software that is useful for scientists:
[https://gist.github.com/stared/9130888](https://gist.github.com/stared/9130888).

------
michaelochurch
First of all, I hate this "Agile" nonsense. I've seen it kill companies. It's
truly awful, because it gives legitimacy to the anti-intellectualism that has
infected this industry. It's that anti-intellectualism that, if you let it,
will cause a rot of your mathematical and technical skills. Before you know
it, you've spent five years reacting to Scrum tickets and haven't written any
serious code, and your math has gone to the birds as well. It's insidious and
dangerous, this culture of business-driven engineering mediocrity.

I hope that it'll be the fakes and the brogrammers who get flushed out in the
next crash. Who knows, though? Obviously I can't predict the future better
than anyone else.

To me, Python doesn't feel like a "scientific" language. Python's a great
exploratory tool, and it's got some great libraries for feeling out a concept
or exploring a type of model (e.g. off-the-shelf machine learning tools). That
said, science values reproducibility and precision, which brings us around to
functional programming and static typing... and suddenly we're at Haskell. (Of
course, for a wide variety of purposes, Python is just fine, and may be a
better choice because of its library ecosystem.) I do think that, as we use
more machine learning, we're going to have a high demand for people who can
apply rigor to the sorts of engineering that are currently done very quickly
(resulting in "magic" algorithms that seem to work but that no one
understands). I also agree that "deep learning" and machine learning in
general are carrying some substance, even if 90% of what is being called "data
science" is watered-down bullshit.

I still don't feel like I know what a "scientific programmer" is, or should
be. And I'd love to see the death of business-driven engineering and "Agile"
and all the mediocrity of user stories and backlog grooming meetings, but
nothing has convinced me that it's imminent just yet. Sadly, I think it may be
around for a while.

~~~
vskarine
Curious, what are the software development processes that you do not consider
nonsense?

~~~
michaelochurch
I'm negative on "Agile" because it's an attempt to patch closed-allocation,
business-driven development. If you have a high level of talent and you're
solving interesting problems, you can do open allocation. If those aren't the
case, and for some reason can't be, then you need to take different approaches
entirely (but you should seriously question whether what you're working on is
worth doing in the first place).

~~~
bkeroack
"Agile" in many shops means letting the passengers fly the plane (to borrow a
phrase from one of your blog posts).

