Ask HN: Breaking into Scientific Programming?
13 points by sciprog on Sept 19, 2013 | 10 comments
Hello HN,

I work as a software engineer in the business and web application world. I love programming and am quite successful at it, but I'd love to transition into scientific programming and data science. I have an undergraduate degree in the humanities, so my quantitative background is honestly a bit lacking. I've considered a second undergraduate degree in, say, math and physics or electrical engineering to fill the gaps. In an ideal world I pursue my studies independently, but my day job doesn't leave me with much free time, and frankly I'm not sure how seriously I would be taken without a proper degree.

Advice from science and data hackers is much appreciated.




Basic prob & stats:

1. _Stats: Data and Models_ by De Veaux, Velleman & Bock

2. _Fifty Challenging Problems in Probability with Solutions_ by Mosteller

3. http://yudkowsky.net/rational/bayes

Basic data analysis:

1. _Python for Data Analysis_ by Wes McKinney

2. http://camdavidsonpilon.github.io/Probabilistic-Programming-...

3. _Exploratory Data Analysis_ by Tukey

4. _The Visual Display of Quantitative Information_ by Tufte

Tools:

- R & ggplot2 & (Sweave | knitr)

- Python & numpy & pandas

- UNIX tools (https://news.ycombinator.com/item?id=6046682, https://news.ycombinator.com/item?id=6412190)

- basic SQL (https://schemaverse.com/tutorial/tutorial.php)

- data visualization: (R & ggplot2) | (Python & matplotlib) | d3.js

- OPTIONAL: C/C++/Java for hardcore Bayesian stuff, Julia for being cool, Fortran for specific academic domains

On getting people to take you seriously: If you knew the stuff up there, I would take you very seriously, even without the STEM degree. You can pick this stuff up outside the classroom (in fact it might be hard to find uni classes that cover this stuff). So if you did self-study, and blogged about it or something, people would take you seriously (esp. if you got good at something "hot" like d3.js or Bayesian). In fact, given your background in web / software / business, you could be considered even more valuable (by web / software / business people).

What are you interested in specifically? Where do you want to end up?


Thanks for the pointers. My goal is more knowledge than career directed -- it's a bit vague, but when I read documentation for projects like GSL or Octave, I get the same sense of awe that I had when I cracked my first book on C. I've also always been fascinated by digital signal processing and other such applications. Ultimately I'm driven to grok the domain. How I apply that knowledge is secondary to me at this point. :)


Nice! Yeah, I think it's really important not to be intimidated by stats & data science. I mean, as ihnorton said, if you want a career in academia, you need an academic degree. Period. If you want to enter a very sophisticated field, like quantitative finance or Google search or whatever, that degree is really going to help. But if you just want to surround yourself with good data, good questions, and data-loving friends, then I believe you can teach yourself. Absolutely.

Why do I think this? Because (1) the day-to-day of "data science" is more of a craft than a science. Before you run complicated analyses, you are just cleaning the data, visualizing it, trying to see what questions it could answer. It feels a lot like playing in the woods to me... you're just knocking about, seeing what cool stuff is in there, building a fort out of scatterplots. And a truly wise data analyst will know when to not run tests at all. Also (2) it would surprise you how many academics are themselves kinda self-taught. They get into research because of their interest in a specific topic, and only come around to learning stats later. Think about it! The MCAT does not test prob/stats, even though doctors end up reading reams of statistical studies for work. I'm not saying these doctors are ignorant, not at all... just that after undergrad, they (and a bunch of PhD candidates) find themselves in the exact same position you're in. They teach themselves, or they squeeze into stat dept courses, and they all turn out OK.
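To make the "craft" part concrete, here is a minimal sketch of that first pass: load something messy, drop the rows that don't parse, look at a couple of summaries, and draw a crude chart. It uses only the Python standard library so it runs anywhere; in practice you would probably reach for pandas and matplotlib instead. The data, the column names, and the helper functions (`load_clean`, `summarize`, `text_bars`) are all made up for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented for this example (not from any real source).
RAW = """day,visits,signups
Mon,120,8
Tue,135,
Wed,n/a,11
Thu,150,12
Fri,142,9
"""


def load_clean(text):
    """Parse CSV text and drop rows whose numeric fields are missing or malformed."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        try:
            rows.append({
                "day": rec["day"],
                "visits": int(rec["visits"]),
                "signups": int(rec["signups"]),
            })
        except (TypeError, ValueError):
            # Skip anything that will not parse; a real analysis would log these.
            continue
    return rows


def summarize(rows):
    """Return a few quick summaries: row count, mean visits, median signup rate."""
    visits = [r["visits"] for r in rows]
    rates = [r["signups"] / r["visits"] for r in rows]
    return {
        "n": len(rows),
        "mean_visits": statistics.mean(visits),
        "median_rate": statistics.median(rates),
    }


def text_bars(rows, width=30):
    """Render visits as a crude text bar chart, scaled so the largest bar is `width` wide."""
    top = max(r["visits"] for r in rows)
    return [
        f'{r["day"]} {"#" * round(width * r["visits"] / top)} {r["visits"]}'
        for r in rows
    ]


if __name__ == "__main__":
    clean = load_clean(RAW)
    print(summarize(clean))
    for line in text_bars(clean):
        print(line)
```

Nothing here is statistics yet, and that is the point: most of the early work is deciding which rows to trust and what the data can plausibly tell you.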

I would just look for the data that's already around you. Cool data being produced at work? Ask if you can make some charts in R/Python/Julia to show your team. The internship suggestion was a great idea. Or you can take on some cool longish-term project and blog your way through it online. You'll do the next thing, then the next thing, and so long as you keep the awe, you'll end up somewhere neat!


As Q4273j3b pointed out, improving quantitative skills is a must. The upcoming Coursera Machine Learning course would probably be a good start (a lot of the necessary math is introduced in the course).

Regarding the degree, credentials are important (and imperative if you want to direct your own research), but one option is to start out by contributing to an open-source project. If you have a specific area of scientific interest, then be strategic and find a project in that area. To take biology as an example, I would look at something like CellProfiler (they are on GitHub!). Also read the papers published by that lab to get a sense of how the software is used. There are many other open-source scientific software projects, and contributions to a project could give you a foot in the door to employment as a developer in the field.


Start working on independent visualisation projects using d3.js. Data visualisation is not data science, but 90% of people can't really tell the difference, and if you can build up an impressive visual portfolio you will get the work you want. Whether you can do the work or not depends more on how well you get through Q4373j3b's excellent reading list, however.

Another thing you can do is attack real-world problems; there is a shocking amount of bad data science out there. See: http://www.slate.com/blogs/moneybox/2013/04/16/reinhart_rogo...


Look at this course, starting on the 25th, as well. https://www.coursera.org/course/scientificcomp

It's slightly advanced, but it could be great for you even if you just take the materials and use them to explore the scope of scientific computing at large.

It goes into things like digital signal processing, some computer vision, and other topics.


Do you want to do scientific programming or work in scientific programming? To work in scientific programming you should probably find a PhD program. As a grad student in the right program you'll probably spend most of your time doing data science. Once you get the degree you can go back to where you work now, except you'd work on slightly different stuff.


You could offer to intern at a data science place and start at their user interface/visualization end. Then, as you interact more with the people doing statistical analyses, you could figure out if that's something you would like and get pointers from them on what books to read and courses to take.


That's a great suggestion, thanks. In your experience, have you run across any self-taught analysts, or do they typically have a background in a quantitative field?


I'm not in industry, but I do a lot of data analysis and enjoy it. I'm effectively self-taught. I think beyond a certain level most people are self-taught.



