
Ask HN: Breaking into Scientific Programming? - sciprog
Hello HN,

I work as a software engineer in the business and web application world. I love programming and am quite successful at it, but I'd love to transition into scientific programming and data science. I have an undergraduate degree in the humanities, so my quantitative background is honestly a bit lacking. I've considered a second undergraduate degree in, say, math and physics or electrical engineering to fill the gaps. In an ideal world I would pursue my studies independently, but my day job doesn't leave me with much free time, and frankly I'm not sure how seriously I would be taken without a proper degree.

Advice from science and data hackers is much appreciated.
======
Q4273j3b
Basic prob & stats:

1\. _Stats: Data and Models_ by De Veaux, Velleman & Bock

2\. _Fifty Challenging Problems in Probability with Solutions_ by Mosteller

3\. [http://yudkowsky.net/rational/bayes](http://yudkowsky.net/rational/bayes)

Basic data analysis:

1\. _Python for Data Analysis_ by Wes McKinney

2\. [http://camdavidsonpilon.github.io/Probabilistic-Programming-...](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/)

3\. _Exploratory Data Analysis_ by Tukey

4\. _The Visual Display of Quantitative Information_ by Tufte

Tools:

\- R & ggplot2 & (Sweave | knitr)

\- Python & numpy & pandas

\- UNIX tools
([https://news.ycombinator.com/item?id=6046682](https://news.ycombinator.com/item?id=6046682),
[https://news.ycombinator.com/item?id=6412190](https://news.ycombinator.com/item?id=6412190))

\- basic SQL
([https://schemaverse.com/tutorial/tutorial.php](https://schemaverse.com/tutorial/tutorial.php))

\- data visualization: (R & ggplot2) | (Python & matplotlib) | d3.js

\- OPTIONAL: C/C++/Java for hardcore Bayesian stuff, Julia for being cool,
Fortran for specific academic domains

On getting people to take you seriously: If you knew the stuff up there, I
would take you _very_ seriously, even without the STEM degree. You can pick
this stuff up outside the classroom (in fact it might be hard to find uni
classes that cover this stuff). So if you did self-study, and blogged about it
or something, people would take you seriously (esp. if you got good at
something "hot" like d3.js or Bayesian methods). In fact, given your background in web
/ software / business, you could be considered even _more_ valuable (by web /
software / business people).

What are you interested in specifically? Where do you want to end up?

~~~
sciprog
Thanks for the pointers. My goal is more knowledge-directed than career-directed.
It's a bit vague, but when I read documentation for projects like GSL or
Octave, I get the same sense of awe that I had when I cracked my first book on
C. I've also always been fascinated by digital signal processing and other
such applications. Ultimately I'm driven to grok the domain. How I apply that
knowledge is secondary to me at this point. :)

~~~
Q4273j3b
Nice! Yeah, I think it's really important _not_ to be intimidated by stats &
data science. I mean, as ihnorton said, if you want a career in academia, you
need an academic degree. Period. If you want to enter a very sophisticated field,
like quantitative finance or Google search or whatever, that degree is really
going to help. But if you just want to surround yourself with good data, good
questions, data-loving friends, then I believe you can teach yourself.
Absolutely.

Why do I think this? Because (1) the day-to-day of "data science" is more of a
craft than a science. Before you run complicated analyses, you are just
cleaning the data, visualizing it, trying to see what questions it could
answer. It feels a lot like playing in the woods to me... you're just knocking
about, seeing what cool stuff is in there, building a fort out of
scatterplots. And a truly wise data analyst will know when to _not_ run tests
at all. Also (2) it would surprise you how many academics are themselves kinda
self-taught. They get into research because of their interest in a specific
topic, and only come around to learning stats later. Think about it! The MCAT
does _not_ test prob/stats, even though doctors end up reading _reams_ of
statistical studies for work. I'm not saying these doctors are ignorant, not
at all... just that after undergrad, they (and a bunch of PhD candidates) find
themselves in the exact same position you're in. They teach themselves, or
they squeeze into stat dept courses, and they all turn out OK.
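
To make the "clean it, then look at it" step concrete, here is a minimal sketch in plain Python (standard library only). The data, the column names (`name`, `response_ms`), and the helper functions `clean_rows` and `text_histogram` are all made up for illustration; real work would more likely use pandas and a plotting library, but the shape of the task is the same.

```python
import csv
import io
import statistics

# Hypothetical messy export: stray whitespace, a missing value, a non-numeric entry.
RAW = """name,response_ms
home, 120
search,340
checkout,
profile,n/a
search, 310
home,95
"""


def clean_rows(text):
    """Return (name, value) pairs, skipping rows whose value is not a number."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        value = (record["response_ms"] or "").strip()
        try:
            rows.append((record["name"].strip(), float(value)))
        except ValueError:
            continue  # missing or unparseable; a real analysis would log these
    return rows


def text_histogram(rows, width=40):
    """Draw one bar per row, scaled so the largest value fills `width` characters."""
    largest = max(value for _, value in rows)
    lines = []
    for name, value in rows:
        bar = "#" * max(1, round(width * value / largest))
        lines.append(f"{name:<10}{bar} {value:g}")
    return "\n".join(lines)


rows = clean_rows(RAW)
values = [value for _, value in rows]
print(f"kept {len(rows)} rows")
print("median:", statistics.median(values))
print(text_histogram(rows))
```

Nothing here is a statistical test. It only drops the rows that cannot be parsed and draws a crude picture of what is left, which is often enough to decide which questions the data can answer.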

I would just look for the data that's already around you. Cool data being
produced at work? Ask if you can make some charts in R/Python/Julia to show
your team. The internship suggestion was a great idea. Or you can take on some
cool longish-term project and blog your way through it online. You'll do the
next thing, then the next thing, and so long as you keep the awe, you'll end
up somewhere neat!

------
ihnorton
As Q4273j3b pointed out, improving quantitative skills is a must. The upcoming
Coursera Machine Learning course would probably be a good start (a lot of the
necessary math is introduced in the course).

Regarding the degree, credentials are important (and imperative if you want to
direct your own research), but one option is to start out by contributing to
an open-source project. If you have a specific area of scientific interest,
then be strategic and find a project in that area. To take biology as an
example, I would look at something like CellProfiler (they are on github!).
Also read the papers published by that lab to get a sense for how the software
is used. There are many other open-source scientific software projects, and
contributions to a project could give you a foot in the door to employment as
a developer in the field.

------
Choronzon
Start working on independent visualisation projects using d3.js. Data
visualisation is not data science, but 90% of people can't really tell the
difference, and if you can build up an impressive visual portfolio you will get
the work you want. Whether you can do the work or not, however, depends more on
how well you get through Q4273j3b's excellent reading list.

Another thing you can do is attack real-world problems; there is a shocking
amount of bad data science out there. See:
[http://www.slate.com/blogs/moneybox/2013/04/16/reinhart_rogo...](http://www.slate.com/blogs/moneybox/2013/04/16/reinhart_rogoff_coding_error_austerity_policies_founded_on_bad_coding.html)

------
agibsonccc
Look at this course on the 25th as well.
[https://www.coursera.org/course/scientificcomp](https://www.coursera.org/course/scientificcomp)

It's slightly advanced, but it could be a great source of material for
exploring the scope of scientific computing at large.

It goes into things like digital signal processing, some computer vision, and
other topics.

------
codeonfire
Do you want to do scientific programming or work in scientific programming? To
work in scientific programming you should probably find a PhD program. As a
grad student in the right program you'll probably spend most of your time
doing data science. Once you get the degree you can go back to where you work
now, but work on slightly different stuff.

------
kghose
You could offer to intern at a data science place and start on their user
interface/visualization end. Then, as you interact more with the people doing
statistical analyses, you could figure out if that's something you would
like and get pointers from them on what books to read or courses to take.

~~~
sciprog
That's a great suggestion, thanks. In your experience, have you run across any
self-taught analysts, or do they typically have a background in a quantitative
field?

~~~
kghose
I'm not in industry, but I do a lot of data analysis and enjoy it. I'm
effectively self-taught. I think beyond a certain level most people are
self-taught.

