
Building a Data Science Team at a Startup - An Engineering Perspective - rnfein
http://kurt.karmalab.org/2012/01/30/building-a-data-team/
======
kreilly
This is pretty spot on with what I've seen up close. Specifically:

1) "Throw [y]our normal engineering practices out of the window." - We treat
Data Science very much like the "R" in R&D. We point them generally towards a
problem and give them time and latitude to solve it. Trying to fit that into
our normal scrum process is impossible.

2) "Data scientists are going to end up building things that need to be
translated into production code." - Our hand-off between Data Science and
Engineering can be pretty messy. Getting stuff into production efficiently is
an ongoing challenge.

3) "Trying to explain some of the hard math that's going on to the entire
company isn't a productive use of time." - This point is pretty
self-explanatory. I get in pretty deep on a regular basis and it gets over my head
quick. It can be very hard for the average account manager or marketing person
to keep up.

~~~
PaulHoule
yeah, my experience is that big data + agile planning = fail.

in the typical agile process, calendar time is ignored, and people pretend
that you can just manage punchclock time. the result of that is that you end
up with two days to go to the end of the sprint and a four day job that needs
to run.

~~~
Drbble
How is a two-day delay in delivery a "fail"? There is no incompatibility with
agile planning. You don’t go on vacation every time a job runs, you work on
another project.

------
etrain
I'll chime in here as a member of a 'Data Science' team at an early stage
company. A lot of this really resonated with me. In particular - I think
several of his points are consequences of the others:

1) We sit on an island, quasi-separated from engineering. We have our own
processes, codebase, and tools, that are mostly different from what
engineering has deployed in the wild.

2) This leads to 'throw normal engineering processes out the window,' which is
good and bad. We write plenty of code, but not much of it makes it to source
control. Why? So much of what you write is "throwaway code" - this, of course,
is no good, but it's very tough to tell a priori what is going to work and what
isn't.

3) This makes getting things deployed to production hard. Your code isn't
written in the Java that the rest of the backend is. There was never the
thought that there'd be a build process attached to it. Deployment? Nightmare.

As far as not scaring the rest of the company with all the math - totally
agree, but only to an extent. If your models are so sophisticated that they
can't be explained to an engineer, or your customers, in plain (insert locale-
specific spoken language), they might be too complicated.

In terms of having an impact immediately - I think this goes for all engineers
and data scientists. Someone should be able to put a fresh set of eyes on your
problems, and be able to solve a new one with enthusiasm.

------
dangoldin
Something more interesting to me is how do you convince a company to build out
a true data science team. Every company can use one but very few have one. Has
anyone had any experience convincing management of the value?

~~~
adamio
In my experience if management isn't asking the question, then it's a dead
issue. At a large company I worked at - data analysis was flat out rejected.
I've been told to stop doing it and return to standard procedure using
heuristics. Even something as simple as introducing a new data model - at a large
company I've been told that "my future replacement" wouldn't understand it.
Management looking for continuity will not accept new ways of doing things.

~~~
dangoldin
Yea - that's what I'm afraid of. I've been trying pretty hard to get people to
view data as an asset and it's very hit or miss. Some people embrace it, most
will agree with me but not do anything more than that, others don't care.

------
aria
A lot of this advice is also applicable for managing your team of magical
pixie fairies.

~~~
gaius
Hit the nail on the head. A "data scientist" is just a business analyst by
day, hipster by night.

------
chwolfe
Kurt mentions it early in the article, but I have to also recommend "Building
data science teams" by Patil:

http://radar.oreilly.com/2011/09/building-data-science-teams.html

~~~
kschrader
I'm hoping to write a couple of follow-up posts talking about some of the
trade-offs that you're going to have to make when you're a startup that
doesn't have the resources of a Facebook or a LinkedIn. It's difficult to do
everything in that (admittedly great) article if your resources are limited.

~~~
larrydag
I'm a data scientist. What sort of resources could I provide to the startup
communities? Is it just consulting needs or could I provide a "data science"
API?

------
john_horton
On the data science models => production code problem, could PMML
(http://www.dmg.org/pmml-v3-0.html) be (part) of the answer? Anyone have any
experience using it?

