From PhD to Data Scientist: Tips for Making the Transition (insightdatascience.com)
101 points by jakek on July 31, 2013 | 87 comments

Sweet, according to his list I'm over-qualified. Interesting to think it would be so easy to make the transition to data science. Except I can't imagine wanting to work on less important problems than the ones I work on now.

Global food security vs. social network analytics. Yeah, fuck the money.

edit: calling all data scientists - why not consider becoming a computational biologist? We have hard problems, real outcomes that affect people's lives, and not much money.

I am a graduating PhD bioinformatician, most likely going to transition into industry. It's very easy to get caught up in the self-importance of academia because you are essentially in a bubble. It's great to be passionate about science, but I really dislike religifying academia. It's almost expected of aspiring academics to live like monks and just be okay with shitty pay and long hours. That's bullshit, and academics take it while constantly assuring themselves that "it's important and they love it". I am sorry that I am coming off as extremely cynical, but I really don't think propagating the idea that pursuing pure science is somehow more virtuous than other professions helps with the situation.

And in my opinion, as inexperienced as it might be compared to more established scientists, computational biologists are ready for biology, but biologists are not ready for computational biology.

There is definitely an academic bubble. The infamous ivory tower. But in biology specifically, many of the problems are objectively important (as judged by society). And some people really do love spending all their waking hours working on them, and don't give a crap about the money.

I wouldn't call it virtuous, but it is deeply intellectually satisfying.

Agree about biologists not being ready - computational biology needs more computer scientists, not more biologists.

One bubble which could use piercing is the hard science one (please, humor me).

Why do you think "improving the efficiency of photosynthesis" will have a greater impact on global food security than improving the efficiency of social and commercial networking? If I'm not mistaken, economists (eg Amartya Sen) agree that food insecurity is caused by dysfunction in the distribution mechanism, not by a lack of supply (so growing more food won't necessarily help).

IMHO, the important problems in the world are much more social and political, than technological. The work twitter does may easily have greater beneficial impact, direct or indirect (Arab spring and all that), on global food security than working for Monsanto on GE crops. I wouldn't be so self-righteous for choosing to work on "hard" science problems. Are you really doing it for the benefit of the world, or just for the deep satisfaction of your own curiosity?

That's an excellent point. Food insecurity is caused by a wealth of factors including social, logistic and agricultural. In addition to the problems you highlighted, we are currently approaching the maximum yield capacity for many crops, and are pushing the maximum land under cultivation for some.

There are huge problems to solve in all those areas. The biological problems are made more important by the lack of progress in solving the world equality problems. By 2050, when the world population is something in the region of 9-12 billion, either billions will be starving or we will have solved one or more of those problems. The science problems are tractable, while the others are ill defined and involve many factors we cannot control, so I think there's a stronger moral imperative to work on the science.

The other consideration is that working in a job that, by chance, invokes positive social results is not equivalent to working directly on trying to solve a problem. Progress in science suffers because there aren't enough good people working on these problems, because so many of them are seduced by industry.

I don't work for Monsanto; that's a straw man. We're talking about academic computational biology jobs.

The answer to your last question is: both. I couldn't do a job where I didn't satisfy my curiosity. But I know working in tech would do that just fine - there are hard problems in many fields. I chose science because I want to use whatever skills I have to try to solve the problems I see.

I completely agree. I work in plant sciences (as a bioinformatician) but my background is in econ, pol sci, and stats. Unless things have changed since I switched fields, most global food security problems are market related. There's more than enough food, and we're really good at getting it all around the globe quickly. Also, huge amounts of food are lost due to post-harvest losses. I'd bet a beer that in terms of absolute food weight to mouths, post-harvest research may have a higher benefit/cost ratio than photosynthesis research. Evolution has been pretty damn good at getting photosynthesis as efficient as possible (read R. Ford Denison's book Darwinian Agriculture for this point argued well).

Having experience in both fields, I don't work in plant genomics because I want to feed more people (I do, but if that's what solely motivated me I'd be working under economics still). I do it because genetics is awesome, and plants are great to study.

But, I'd argue that the hard sciences are always a good worth investing in. Being capable of trying to understand our world with the scientific method is something that is uniquely human. We should use this talent as much as possible. Drosophila (fruit fly) genetics is a great example. Decades ago, drosophila was chosen because it was cheap to grow in a lab and had a short generation time. Yet through drosophila we've learned so much about genetics, development, and evolution in ways that are just unparalleled. Yet Sarah Palin[1] and others attack it as a silly waste of money. If we had limited drosophila research decades ago because we didn't think an organism with a ~700 million year split with humans would be useful for us, where would we be? Much stupider, and much worse off. Basic science matters, big time.

[1]: http://www.youtube.com/watch?v=HCXqKEs68Xk

Evolution has been good at getting photosynthesis as efficient as possible in the most extreme cases. In rice and all other C3 plants, it could be at least 50% more efficient in the majority of field situations. That's what we work on. Projected yield improvements are on the order of 50%.

Absolutely agree about the importance of post-harvest problem-solving, but I disagree about the benefit/cost ratio. There are a few key things in photosynthesis which, if achieved (which won't cost that much), could have massive benefits. In post-harvest research there are many small, localised problems that change over time. It's a less tractable, but extremely important, set of problems.

It's too bad you got downvoted. Your first question was pretty good.

Glad to see someone's enjoying it. I think I came out of my program more jaded than the average student.

I agree that there are intellectually satisfying problems to solve. However, without getting into the tedious debate on the values of basic science vs translational science, how much of that intellectual satisfaction is mental masturbation?

Are these problems really that important? How many of the cool intellectual questions will directly give you a meaningful biological interpretation? Perhaps this is more of a comment on our field. I found a lot of the intellectually satisfying questions during my PhD to involve algorithms/data structures, which mostly are just the tools to get at the biological interpretation.

> shitty pay

I can see this complaint in humanities academia, but pay in the sciences past the PhD student level is pretty reasonable. You could probably make more elsewhere, but it's not like you're scraping by on ramen noodles as a bioinformatics professor or anything. Postdocs typically make $50-60k, and professors start at something like $90k at the minimum, easily up to $120k, $150k, or more after tenure, especially if you're in a hot area like bioinformatics, have made a name for yourself, and can get a position at a top-30ish place. Unlike in tech, those salaries often come in places with a lower cost of living than SF, too (at least if you want them to). Six figures goes pretty far in Atlanta, Austin, Urbana-Champaign, Ames, or Raleigh, for example.

You could beat that in industry, but either way you're making solidly in the top 10% of U.S. salaries. And if you really need more money, most universities will let you do 20% consulting time, or do a spinoff startup. There are admittedly other reasons not to go into science academia (the list is pretty long, actually), but fear that you'll have to take a vow of poverty doesn't seem like a strong one.

NIH trainee pay scale: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-12-03...

Depending on the institution, you may make slightly more than the minimum (starting at $39K), but postdocs in the sciences do not 'typically' make $50-$60K. As a monetary investment, academia is about as poor a bet as you can make: spend 5-6 years making ~$25K, then another 3-5 years below $50K. Then you might be able to start making professor money if you're in a hot field and willing to sell your soul to your work.

Interesting, that's lower than the people I know. They lean more towards the machine-learning side of bioinformatics (plus people in straight AI, not bio-related), and generally make around $50k, some $55k, at research institutions in the US.

Sounds like recent American graduates may want to start reading the job listings in Europe, though. A postdoc where I teach in Denmark has a minimum civil-service salary of ~$55k, and in Switzerland the going rate is well above that: a friend works in Lugano, fresh out of grad school, for somewhere in the neighborhood of $80k, although that's a bit above the norm. Postdoc candidates with strong tech skills have good demand at institutions with large EU projects, so they're not lottery-win positions either.

Postdocs in biological/physical sciences are easily 3-6 years for < $60k (generally closer to $40-50k in the past 5 years), and the chances of getting an academic position with meaningful salary are very low. Postdocs are plentiful, positions are scarce.

Meanwhile the postdoc could have been making $100-120k for a profession where the job market is almost the polar opposite, which makes a difference for job security/stress. The 2x-4x salary difference, especially during the late 20s and early 30s, is a pretty big deal, especially if you're able to save that extra 2x-4x.
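To put rough numbers on that gap, here is a minimal sketch using only the salary ranges quoted in this thread. The function name `cumulative_gap` is made up for illustration, and the calculation assumes flat pre-tax salaries with no raises, taxes, or investment returns, so treat it as a back-of-the-envelope estimate rather than a forecast.

```python
def cumulative_gap(postdoc_salary, industry_salary, years):
    """Total pre-tax earnings difference over a postdoc of the given length.

    Assumes both salaries stay flat for the whole period (a simplification).
    """
    return (industry_salary - postdoc_salary) * years


# Narrowest gap from the figures quoted above: $50k vs $100k for 3 years.
low = cumulative_gap(postdoc_salary=50_000, industry_salary=100_000, years=3)

# Widest gap from the figures quoted above: $40k vs $120k for 6 years.
high = cumulative_gap(postdoc_salary=40_000, industry_salary=120_000, years=6)

print(low, high)  # 150000 480000
```

Under those assumptions the forgone earnings land somewhere between roughly $150k and $480k before tax.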

The even crappier pay in humanities makes it basically undoable except for the upper class or outsizedly talented. The low pay in science makes it doable for normal people, but there is a strong financial incentive not to.

This response tells me you've never been in the science trenches. I've never heard of a postdoc making anywhere close to 60k, even at elite schools in high COL areas. The financial opportunity cost is staggering.

Depends on what you do...

Straight wetlab postdocs are usually around the NIH levels (~40K). For computational postdocs (especially if you have a good biology background), 50-60K isn't out of the norm.

Well, you'd be incorrect, since I'm currently a science academic. Have you checked what Stanford, or Georgia Tech, or UT-Austin pay postdocs with machine-learning or data-mining experience, in the past 5 years? There are definitely areas that pay less, but bioinformatics, if by that you mean people with serious computational skills, pays above the norm.

Getting a postdoc at Stanford has the same probability as playing in the NBA. Only you get paid $60K instead of $60M.

If you have strong machine-learning experience and a few good publications in the current market, getting a postdoc at a top institution is nowhere near NBA odds. I don't know what the odds are specifically for Stanford, but if you apply to whoever has openings among top schools, there are many each year. If you know something about biology and a lot about machine learning, labs might even recruit you rather than vice versa.

This has certainly been my experience as a recent PhD in computational biology. I pretty much have my pick of Post Doc positions - I was getting offers before I even graduated. I was also able to negotiate 50k without much trouble, but I'm definitely worried about the opportunity cost. Giving up 100+ k for more than a year or two seems like a poor decision.

Also a Post Doc from a top lab directly correlates with how much $$ you can make in industry.

Also, finishing a post-doc from a top lab does not mean you will get a professor job at a decent school.

Those salary expectations only hold in the very top tier of universities.

From [1]: Median starting salaries for assistant professors are more like $75k. Median full professors--who are nearly 50 years old--are earning $120k. (Admittedly this does not control for field.)

That's about what a green PhD gets offered at age 28 for a data science job in SF.

[1] https://chronicle.com/article/aaup-survey-data-2013/138309#i...

My biochem friend just accepted a post-doc at a respected lab for $39k.

For just wet-lab people, this is still a little low, but not by much.

I spent so long at uni they started paying me to stay - not quite got a PhD yet so I'm still kicking around the system, but unless you're very lucky uni is a pretty crappy place to work. A lot of what used to be good about working at uni has been stripped away by increasing class sizes, increasing bureaucracy and decrease in discretionary time.

A couple of years ago I got a long-term freelance gig. The other day I was describing it to a professor I talk to about once a year. He said:

"Interesting work, good money, and they leave you alone to get on with it. Sounds brilliant".

Although my current work has next to nothing to do with what I did at uni, it was a valuable experience. However I don't think most people are as lucky as me.

The majority of PhDs will not obtain permanent careers in science. In the UK, it's less than 4% (0.45% professors) [1].

Most of your peers--and perhaps even you--will find themselves searching for careers in a new field at some point. Let's not badmouth them for taking a good opportunity.

[1] Figure 1.6 of http://royalsociety.org/uploadedFiles/Royal_Society_Content/...

I don't think I badmouthed them, I just said I can't imagine wanting that. Being forced into it is another matter. I totally sympathise with anyone who is forced out of science due to lack of jobs.

I've got strong convictions too, and the big data fad (real or not, it's still a fad) seems specious and unfulfilling. But as a PhD candidate who has had his funding cut and grant proposals continuously denied since sequestration, money of any kind is starting to sound good right now.

true - I might be whistling a different tune if the money runs out

Not everyone has the luxury of working in the field they got a PhD in. Two of my friends were looking into switching to data science positions because they were having problems finding positions in their fields. One was on her second post-doc as an astrophysicist, and the other is a soon-to-graduate biochemist. Both did find positions in their fields, but not before much fear and existential angst.

I can certainly see how the harsh realities of the job market might force one's hand. It sounds like your friends' priorities were the same as mine though...

Actually, no. One of them actually preferred a data science position, and was scheduled to interview for this very program. But that person's spouse was not okay with moving to California, so that option was out, and a post-doc position came through.

> calling all data scientists - why not consider becoming a computational biologist?

The pay is shit, you're at the whim of the funding moods of the day, and contrary to your last statement most of the results don't really ever go on to affect anyone.

I agree about the pay. It really sucks.

But the other things depend what you work on and where. If it's cancer or food security, the funding is there and not going away. And you can choose how direct the outcomes are by choosing the position.

I'm not saying everyone should do it, but if you're good enough to breeze into highly paid positions at top tech companies, you're good enough to get a really interesting position in computational biology.

Yeah, it can be interesting but often times it's almost the same sort of thing as any lousy computing job on a day to day basis, it just pays worse, with weird academic attitudes and bureaucracy tacked on. Plus, when I did it you were stuck using some janky Perl scripts and whatever bogus Java package was promising to replace the perl scripts of the day.

I worked in a lab at HMS that sounded interesting on paper but wasn't all that interesting in practice. The researchers did the same stuff as you would at The Office: they checked ESPN.com, went to meetings, typed in some SQL and Perl code for a while, went on coffee break, complained about something, went to another meeting, farted around with the design of their conference poster, etc. ad nauseam. Then they all just went to work at some big corporation anyway. The upside was low expectations, so I was able to work almost full time as a contractor on something interesting at the same time.

That sounds tedious, but it's not my experience. People do leave to work at big corporations because the stress and relatively low pay of academia drives them to it. But the work here is pretty exciting.

You're everything that everyone hates in academia. Congratulations as I hear that self importance is one of the key ingredients to solving the biggest problems facing the world today.

Well, I would posit that the majority of data science positions are in advertising and marketing optimization. I would believe that 'global food security' may be more important than social media analysis, as the original poster speculated.

Not to belittle the meaningful point, but becoming a "data scientist" at a startup or large corporation that makes its money from advertising is analogous to becoming a 'quant' on Wall Street in the '80s. You have to be in it to make money, rather than caring about the types of data the developed algorithms are applied to.

I mean, developing a new algorithm for data analysis could be just as important to the OP as analyzing bio data to forecast some sort of disease is to someone else.

Actually, advertising and marketing optimization is a very critical problem that hasn't been completely solved yet, especially when privacy concerns are taken into consideration. It is critical to keeping the Web free and creating a more inter-connected economy. While this will obviously have the side effect of "frivolous" analytics, there is indeed a dearth of hybrid practitioner-researchers in data science today. It will probably saturate in 5-10 years, but I'm no analyst to predict that.

I doubt that. Firstly because most people don't hate anything in academia, and secondly because you know very little about me.

Perhaps my short comment sounded more arrogant than I am... I'm motivated by wanting to help people. If given the choice between trying to help alleviate starvation for little money and trying to optimise advertising on some website for a shitload of money, I'll take the former.

Interesting that you think moral judgements are an indicator of self importance.

But couldn't optimizing advertising on some website for a shitload of money lead one to develop a novel algorithm or modeling framework that had applicability to diverse fields including alleviating starvation?

Is culture important? Is The Big Lebowski frivolous? Is Old Navy Performance Fleece a waste of time? Should the cast of SNL all quit and start learning R? Do those folks not pay taxes and thus support most academic research?

I just have to challenge the assumption that it is obvious which things are important, moral, and noble and which things are frivolous. Perhaps in hindsight those things are clear. History will be the judge, as a wise man once said. Or maybe he wasn't wise. Or maybe he made some unwise decisions and learned from them. Or maybe it doesn't matter, and I'll give him the benefit of the doubt because the secret to happiness is thinking happy thoughts.

> Interesting that you think moral judgements are an indicator of self importance.

Definitely not. But the way one expresses them certainly is.

Calling all data scientists -- why not consider working on medical and/or biological data instead?

There is not necessarily less money in these fields, but a much much greater potential impact!

Maybe you could help me out here. I'm split on the whole data science in research vs business. I'm an undergrad Senior majoring in CS (minors in math and Computational Science). All the data science jobs I see for research firms require a PhD. I'd much rather work for a research firm than as an analyst for a business (I think). Any suggestions for someone looking to get some experience before pursuing more schooling? (not like I don't enjoy my classes but I'd rather not drop the dough after undergrad if I can get decent experience and a salary to help pay for a graduate program)

Depending on where you are in the world, you could consider applying for computational jobs at some of the biotech giants (or startups, depending on your cultural preference). There are plenty on the US west coast and around Boston, Seattle, etc.

An alternative is to get a programming job doing something relevant (e.g. something with applied machine learning) and use those skills to work on open-source bio projects in your spare time. You'd then have some money, relevant experience, and demonstrated interest which could be a good foundation for graduate work if you decided to go that route, or for a career in data science if you don't.

Cool. Thanks for the advice. My game plan was pretty much just that. Get a solid programming job doing (preferably) some stats work after graduation (maybe the company will pay for grad school?) and then move on from there. I just wanted to make sure there wasn't one weird trick to landing a research job in data science.

Most jobs in high tech that push on the frontiers of scientific knowledge and have a group leadership role need a PhD in charge of the team. Back when I worked in biotech, around 3/4 to 5/6ths of the team leaders had PhDs. I wouldn't expect this to have changed. For a commercially viable biotech company, salaries are pretty decent at that level.

That makes sense though. One would definitely want someone in a leadership position with a lot of experience and knowledge, especially at a biotech firm.

really messy data with political silos surrounding access to it and often a really shitty sample size:feature space size ratio.

Not to mention frustration surrounding funding for primary data generators and then all the other problems related to the extremely competitive world of academia.

What skills do we need to learn to get into such a position? That is apart from statistics and programming? How much effort will go into learning that stuff?

If you had most of these things I think you would have a good shot at a compbio position:

- statistics, probability, and especially probabilistic inference

- *nix/GNU tools

- multiple scripting languages (Ruby, Python, Perl, BASH)

- at least one data-oriented language (R, Octave)

- understanding of molecular biology (read Molecular Biology of the Cell)

- applying machine learning tools to new problems

- understanding the major high throughput biological technologies and the kinds of data they produce, along with the current tools used for processing the data

You could pick up all of that in a year of intense self-study, and less assuming you already have some of those skills.

This is something I would really love to do. Where do you work (in academia, I presume)?

I have the programming background, and a bit of the bio background... but I am weak on statistics. How much of statistics and probability theory would I need (beyond a basic 1st-year college level)?

It's hard to say exactly, but if you can work through all the problems in Barber's Bayesian Reasoning and Machine Learning, and some other standard 'frequentist' stats text, you'd be well placed to get started.

My profile says where I work.

A lot of statistics. It's core.

It's unfortunate that biological statisticians have hijacked the term 'computational biology'. There's still a lot of computer science to be done in the area, particularly in genome assembly what with new sequencing technologies appearing every few years.

Certainly new algorithms and data structures are going to be crucial (e.g. Bowtie), but statistical analysis/scoring, even in a heuristic way, is always going to be an essential component just from the nature of the work being mostly about evaluating hypotheses from evidence.

I should have added data structures and algorithms to my list. De-novo assembly, alignment, phylogeny, and pretty much all sequence work rely heavily on advances in maths and comp sci.

Technical skills aside, the best piece of advice in the article is "show them that you want it."

I've conducted countless interviews / hires where it basically went: candidates P & Q are the best on paper and in person, but candidate P said x, y, z or did a, b, c, and seems to really want this job and work in our company

x, y, z was sometimes as simple as enthusiasm, and other times was in describing what he/she did in their spare time. a, b, c was usually a project for work, school or fun that was highly relevant.

Intellectually, I think I know that "enthusiasm" is a poor / weak predictor of success. But, emotionally, it's a go-to tie-breaker.

Should I start putting every substantial R/Python script I write, even if they are based on some tutorials, on the Github/Personal-Website? Is that how I "show"? I missed the Github bus for all my previous projects.

What do you mean you "missed" the github bus? If you still have the code saved somewhere, you can just create a new repo and put it up there.

No need to point to projects that are based on tutorials. Lots of githubs are nothing more than that at this point.

If you're going to go the coding route, put up a working page, publish a blog entry about it, publish a working app, etc.

The key is to show effort (I spent time on this) and relevancy (I'm solving a problem that you might care about).

I'm currently finishing a PhD in economics and have spent a lot of time learning the exact technologies he suggests (Python, SQL, a bit of R). Working as a data scientist would be an awesome opportunity. But are most companies _really_ in need of so many data scientists, or is it just a trend?

I think it's a new name for an old thing, lots of jobs through the last century had things like "analyst" attached to them. Business has been about measuring things for a long time, look at Taylorism or Gosset at Guinness in 1899 for example.[1][2]

A few little 1% gains from some A/B tests, or looking at geographic breakdowns of customers from IPs or addresses add up.
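As a rough illustration of how small wins "add up", here is a minimal sketch of compounding. The helper name `compounded_lift` and the figure of 20 winning tests are made up for the example, and it assumes each gain is real, independent, and multiplies with the others, which actual A/B results often do not.

```python
def compounded_lift(gain_per_test, wins):
    """Overall relative lift after `wins` independent gains that multiply."""
    return (1 + gain_per_test) ** wins - 1


# Hypothetical: twenty separate tests that each deliver a 1% improvement.
lift = compounded_lift(0.01, 20)

print(f"{lift:.1%}")  # 22.0%
```

Twenty 1% wins compound to about 22%, slightly more than the 20% you would get by simply adding them.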



"Data scientist" is a terribly broad term that's begun to encompass a lot of jobs that used to be called "junior analyst" or somesuch.

There certainly is an overlap. But for me, the difference between a Data Scientist and an Analyst is that the Data Scientist uses the data to build stuff (eg a recommendation engine, or a forecasting system). They can also use more technical skills to aid analysis (nlp, scripting, etc).

It is a trend. The question is rather, is it a trend that is likely to persist? And that depends on whether you believe that organizations are likely to capture and store more data or less. If you believe the answer is 'more' then the problem becomes deriving insights from it. And that process - is data science.

Well, if you believe Insight's white paper (http://insightdatascience.com/Insight_White_Paper_2013.pdf), the answer is "Yes".

Dang, the graphics in this are terrible and unreadable. But it "looks" positive.

Thanks for the link. Do you know anyone who went through the program?

Actually, yes. I had dinner on Monday with the author of this article. Obviously Insight worked out very well for him.

This may be too forward, but could you put me in touch? Going the data science route is an option I've been seriously considering for awhile, and this might be the chance I've been looking for.

If you're not comfortable with putting me in touch, that's fine.

Thanks much!

Have you already started into your specialty? Maybe you should do econometrics.

I'm far into my fields. I do econometrics, computational economics, and industrial organization.

"Recursive programming"... as in, programming using recursion? Why would this be important to "data science"? Surely loops are just as effective.

Are you serious? How would you iterate over a set of rules and a big data volume, without using recursion?

A loop? Can you please explain?


I assume the "explain" refers to the most important thing you can learn about recursive programming: that it is often but not always the least efficient strategy when compared with looping.

So the claim that recursive programming is the only or primary method of iterating over large data sets requires some explanation...
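A minimal sketch of the point being argued, in Python (the function names are made up for illustration). Both versions compute the same sum, but in CPython the recursive one uses one stack frame per element and hits the interpreter's recursion limit on larger inputs, while the loop does not. Other languages, especially those with tail-call optimization, can behave differently.

```python
import sys


def total_recursive(values, i=0):
    """Sum values by recursing once per element (one stack frame each)."""
    if i == len(values):
        return 0
    return values[i] + total_recursive(values, i + 1)


def total_loop(values):
    """Sum values with a plain loop (constant stack depth)."""
    total = 0
    for v in values:
        total += v
    return total


# On small inputs the two approaches agree.
small = list(range(100))
assert total_recursive(small) == total_loop(small) == 4950

# On an input longer than the recursion limit, only the loop finishes.
big = list(range(sys.getrecursionlimit() * 2))
try:
    total_recursive(big)
    recursion_ok = True
except RecursionError:
    recursion_ok = False

loop_result = total_loop(big)
print(recursion_ok, loop_result == sum(big))  # False True
```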

I haven't read the article, but I'm guessing this is a reference to divide-and-conquer methods, such as MapReduce as used in Hadoop and such.
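For readers unfamiliar with the pattern, here is a toy, single-process sketch of the map-and-reduce idea using only the Python standard library. It is not Hadoop's API; `map_chunk`, `merge`, and the sample text are made up for illustration. Note that nothing in it requires explicit recursion: the data is split into chunks, each chunk is processed independently, and the partial results are merged.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count words within one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(a, b):
    """Reduce step: combine two partial word counts into one."""
    return a + b


# Made-up input, pre-split into two chunks as if stored on two machines.
chunks = [
    ["data science is a trend", "is it a fad"],
    ["science is hard", "data is messy"],
]

partials = [map_chunk(chunk) for chunk in chunks]  # could run in parallel
totals = reduce(merge, partials, Counter())

print(totals["is"], totals["data"], totals["science"])  # 4 2 2
```

Because `merge` is associative, the partial counts can be combined in any grouping, which is what lets a real framework spread the work across machines.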

Any possibility for a dev-minded MBA (finance) to make the data science transition? I was pretty good with R back in the day.

There is no magical set of qualifications to become a data scientist. Just learn enough linear algebra and probability. Show people you can code. Maybe set up some GitHub projects. It is not like people in tech are doing something magical with all these fancy data scientists. A little bit of math, a slap and dash of code.

Are data scientists who are in demand today dealing with neural networks and machine learning, or are a large majority still working with large sets of data and running correlation analyses/regressions? Your response above seems to indicate that it's not overly complicated.

It depends; I personally haven't used neural networks since graduating. Standard machine learning algorithms get used. I work with large data sets all the time. Sometimes correlations, regression things. The point is that 95% of the stuff that gets used on a day to day basis is not hard to learn. Especially if your background in linear algebra, probability and statistics is good. The five percent that delves into more complicated things can be figured out on the job.
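To give a sense of scale for the "correlations, regression things", here is a minimal sketch of a Pearson correlation and a one-variable least-squares fit in plain Python. The helper names and the data are made up for illustration; in practice you would more likely reach for R or a Python statistics library.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy, mean_x, mean_y) for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy, mx, my


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    sxy, sxx, _, mx, my = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

r = pearson_r(xs, ys)
slope, intercept = fit_line(xs, ys)
print(r, slope, intercept)  # 1.0 2.0 1.0
```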

Thank you for your candid and thoughtful answer :)

Should have studied harder in undergrad. All the current "Fellows" are from top tier schools :(

For those looking to make the transition to data science, another option is Zipfian Academy (http://www.zipfianacademy.com/). No PhD required.

No PhD required, but you're expected to pay 14k in tuition. In contrast, Insight pays you.

If you want to pay money for experience, why not get an actual degree from an accredited institution?

A more realistic alternative to Insight is to do a (paid) internship at a tech company. This is the path I took.

Often advanced degrees at universities are much more expensive (http://datascience.berkeley.edu/admissions/tuition-and-finan...) than an intensive program such as Zipfian and take much longer. I hope that private education institutions (such as GA, Hackbright, Dev Bootcamp, etc.) can coexist happily with traditional universities, as they each fill a different niche. Universities are in the business of training researchers and professors (and do a great job at that) while alternative educational companies aim to produce industry practitioners (similar to trade schools).

I highly recommend internships and they are wonderful if you can get one. Unfortunately not everyone can be so lucky, either due to lack of experience/technical abilities or an advanced degree (not everyone goes to college). I believe these alternative educational routes are democratizing such industries and many of them offer scholarships and tuition assistance programs.
