

Show HN: A data science fellowship to solve the world’s toughest problems - pyduan
http://www.bayesimpact.org/fellowship

======
micro_cam
I appreciate what you guys are trying to do but I can't see many
mathematicians or statisticians applying for this unless you provide a little
more information about what these "hard" problems are.

Honestly it reads like you're offering basic training in a random selection
of tools and then hoping some non profits present a problem with nice clean
data that can be solved through application of a few methods from
scikit.learn.

If you want to attract math people my suggestion would be to identify a few
intriguing and hard problems ahead of time and take applications
specifically for them...you can always suggest a change if you think an
applicant would be better suited to a different one. Providing intriguing
problems that might match up with people's pre-existing research interests is
key...there is lots of room for cross pollination and growth but a Bayesian
statistician is going to be much more intrigued by something that might
benefit from a hierarchical model than something that needs ODEs or online
convex optimization.

Worse, 4-6 months might not even be enough time to formulate a problem that
needs a solution and get the required data in place. Non profits are generally
extremely overworked and take a long time to do things. They will not have
their data in anything resembling a database or standardized format...think
shorthand notes in Word files if you're lucky. Identifying people and data
you can work with on this end ahead of time is key.

For the record I work for a non profit analyzing complex diseases and my
background is in math. I've also sat on the board of and been involved in a
few other non profits.

~~~
pyduan
Paul from Bayes Impact here. I appreciate the sentiment, though with all due
respect it does seem like most of your concerns are addressed on the website,
either on the fellowship page or on the other pages.

> unless you provide a little more information about what these "hard"
> problems are

The second paragraph does go briefly over the problems we are currently
working on (granted, not in much detail for the sake of brevity, but enough to
give an idea of what type of challenges they are). There is a little bit more
information on the front page, but granted since we started Bayes Impact two
months ago we haven't been able to put as much work into the website content
as we'd like to.

> Honestly it reads like you're offering basic training in a random
> selection of tools

This is simply not the case -- while their level of experience varies, our
current fellows actually comprise some well-established data scientists in
their own right. It is precisely because the problems worth solving are
_tough_ to solve that we need to round up talented individuals who are able to
commit to working on social impact projects full-time and pair them up with
industry and domain experts who have the domain knowledge but may not have the
time.

They each bring their own set of skills -- for example, someone who built
Lyft's grid optimization system might be uniquely suited to help save lives by
improving ambulance and fire truck dispatch and reducing average emergency
response times.

> and then hoping some non profits present a problem with nice clean data that
> can be solved through application of a few methods from scikit.learn

This is precisely the point of Bayes Impact and why a longer engagement model
such as fellowships is needed in the space (most current data science for
social good organizations work on a volunteer basis model), so we have the
time to build these longer relationships with nonprofits to leverage data
science even in cases where data is messy or sensitive. We go a little bit
more in-depth about it in our article here:
[http://blog.bayesimpact.org/blog/the-bayes-impact-mission/](http://blog.bayesimpact.org/blog/the-bayes-impact-mission/)

> Worse, 4-6 months might not even be enough time to formulate a problem that
> needs a solution

This is why the fellowships are not 4-6 months, but typically 6-12. We do have
a pilot 3-month program in the summer for problems that are comparatively
easier to work on.

> and then hoping some non profits present a problem with nice clean data that
> can be solved through application of a few methods from scikit.learn

This is why we have a fellowship application page and not a project
application page -- we actually tend to identify and scope projects ourselves.

On that note though, I want to point out there is no need to be so overly
dismissive of the work nonprofit and civic organizations have been doing in
collecting and storing clean data. For example, most fire departments we
talked to had surprisingly good data, and some such as the Fire Department of
New York had even started initiatives of their own to use data science to
improve their processes. For example, by integrating building permit data with
their own systems, they've been able to direct inspectors to where fires were
predicted to be more likely to occur.

One direction we've been headed towards is seeking these data-educated
organizations to create pilot projects, then using the results of these as a
basis to export these solutions to similar institutions whose data practices
may not be as good. To that end, we are helped by some data engineers from
companies like Splunk or Cloudera, so we do believe in working with these
organizations in the long run to bring them up to speed. This is precisely the
problem we're trying to solve with our model!

> For the record I work for a non profit analyzing complex diseases

Then you might be interested in the project we are doing on Parkinson's with
the Michael J. Fox Foundation! Feel free to email me for more details.

~~~
micro_cam
I'm trying to offer constructive, if harsh, criticism based on my own
experience, which includes recruiting for similar positions and working with
large and small 501(c)(3)s.

I don't mean to come off as dismissive but to suggest that your write up is
vague to the point of being easily dismissed and provide feedback on how
someone from outside your local peer group might read this.

And there are organizations out there with great IT and clean data but I and
most people in this field have lost months writing hideous combinations of NLP
and regular expressions to pull data out of old medical records and the like,
then hand validating it or correcting for batch effects in supposedly clean
data.

I think that fleshing out the projects and areas of investigation you guys
already have lined up would go a long way towards addressing my concerns and
making the program more appealing to the typical analytical folks I've worked
with. I'd also suggest focusing the intensive course on analytical methods not
the tools; this is what will intrigue people with expertise. At the moment it
reads like it is aimed at people new to the field with no programming
experience.

What data sets/types are you using for the Parkinson's thing? My main focus is
on analysis methods that resist the noise, imbalance, heterogeneity and other
issues typical in extremely wide/multivariate genetic+clinical+proteomic
studies...a few sentences about the study in the write-up would have told me a
lot about whether my skills could be useful. (I'm not looking to relocate but I am
always open to collaborations and correspondence with people working on
similar things.)

~~~
pyduan
As I said earlier -- I definitely appreciate the sentiment, and constructive
criticism is always welcome when actually substantiated. I also took your post
as an opportunity to elaborate a bit more on our model so my post got longer
as a result.

> And there are organizations out there with great IT and clean data but (...)

This argument also works the other way round -- there are organizations out
there with terrible data (and this is especially common with medical data),
but there are also many high impact projects for which the data _does_ exist
in a workable form that are begging to be solved (and that we are actually
working on solving). We are focusing on these in the short term, while laying
the groundwork for the others in the medium-long term (both through the
research arm we are building, and our data engineers). There is no reason not
to get the low-hanging fruit first.

> I think that fleshing out the projects and areas of investigation you guys
> already have lined up (...)

Agreed. Since we created Bayes Impact two months ago our main focus has been
on building the program from scratch and working on the projects as well, so
the website has unfortunately taken a backseat. Another problem is that
government organizations are very sensitive about communication and we can
only communicate about our projects on their timeline. This results in us not
having a website as fleshed out as we'd like, but this is par for the course
for a new organization.

> I'd also suggest focusing the intensive course on analytical methods not the
> tools

Ah, I just saw the paragraph you're referring to. I get how the language may
be a bit confusing and will make the appropriate changes -- our goal is
actually to do the opposite: we bring on individuals who already have the
analytical methods but some may not have had exposure to best industry
practices. Because we focus on building _production_ systems and not just
writing case studies, it's important to bring them up to speed in that minor
respect. This is why we can spend only a week teaching tools -- teaching
analytical methods to people without the required background would likely take
much longer, which is not our target audience.

At a broad level we simply provide an avenue for data scientists to work on
social impact problems in collaboration with domain experts, with us taking
care of the overhead of scoping projects and doing the dirty work of acquiring
and preparing the data as well as defining the implementation strategy. We
also smooth out the edges in our Fellows' backgrounds if any but this is
really not the core of the program.

Fortunately the pool of applicants as well as our current fellows does not
seem to echo your fears but I'll review and see which changes to the
fellowship page could help remove ambiguities in the future.

Hope it helps clarify. Regarding the Parkinson's project, feel free to reach
out to me by email -- unfortunately we need to wait for the press release from
the MJFF and the other partner before I can actually communicate about the
details publicly.

~~~
danelectro
Seems like you've got big data problems to solve and data scientists up the
wazoo.

I would think the missing element would include avant problem-solvers, with or
without (advanced) degrees, who are as outstanding in that specialty as the
data scientists are in theirs.

------
murtza
To get more exposure, consider posting the fellowship to these subreddits:

[http://www.reddit.com/r/datascience](http://www.reddit.com/r/datascience)

[http://www.reddit.com/r/datasets/](http://www.reddit.com/r/datasets/)

[http://www.reddit.com/r/statistics](http://www.reddit.com/r/statistics)

[http://www.reddit.com/r/machinelearning/](http://www.reddit.com/r/machinelearning/)

If you have not already, I would recommend reaching out to these companies to
sponsor: Cloudera, Palantir, New Relic, Tableau, Domo.

~~~
ajiang
Awesome - thanks for the feedback. We're indeed going to post to those
subreddits and reach out to those companies to potentially sponsor us. If you
know a good contact, we'd love to be introduced!

~~~
denzil
You will also probably find people interested in this on:
[http://lesswrong.com/](http://lesswrong.com/)

~~~
ajiang
We just tried to, but couldn't b/c of the karma requirement :(

------
corydominguez
I love the last item in the FAQ,

> I am a frequentist. Can I still join Bayes Impact?

------
rlazer
This is an awesome initiative. It's good to see an organization using and
promoting data science for something other than "optimizing click ads." Kick
some ass guys!

------
kfor
Always glad to see these skills put to uses besides selling products and
eyeballs!

Here's another fellowship using data science towards non-commercial goals
(global health research):
[http://www.healthdata.org/get-involved/fellowships](http://www.healthdata.org/get-involved/fellowships)

Full disclosure: I participated in the fellowship in 2008.

~~~
ajiang
Hi kfor, the fellowship program sounds really interesting. Do you mind
chatting with our team and telling us about your experience?

------
ntoshev
I have a vehicle routing solution (minimal routes via multiple destinations,
with time windows, capacity constraints, weekly scheduling; it's a website
service on top of Google Maps) that I would be happy to provide for free to
social impact projects. Email is in my profile if you're interested.

------
gulbrandr
This site does not work properly on Firefox, because of cross-origin requests
of fonts.

    
    
      downloadable font: download failed (font-family: "sinkin_sans600_semibold" style:normal weight:normal stretch:normal src index:1): 
      bad URI or cross-site access not allowed
      source: http://d1arcc3qu8ndpn.cloudfront.net/fonts/SinkinSans-600SemiBold-webfont.woff
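
A minimal sketch of the check behind that error, assuming the cause is a
missing `Access-Control-Allow-Origin` header on the font response (the log
alone doesn't confirm this). The function name is hypothetical, and a real
browser's CORS check covers more cases than this models.

```python
def font_load_allowed(page_origin, font_origin, response_headers):
    """Simplified model of the cross-origin check a browser applies to web fonts.

    Hypothetical helper for illustration; it ignores credentials and
    other details of the full CORS algorithm.
    """
    # Same-origin font requests need no CORS header at all.
    if page_origin == font_origin:
        return True
    # Header names are case-insensitive, so normalize before the lookup.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    # The font loads only if the server allows any origin or this exact one.
    return allowed in ("*", page_origin)


page = "http://www.bayesimpact.org"
cdn = "http://d1arcc3qu8ndpn.cloudfront.net"
print(font_load_allowed(page, cdn, {}))  # no CORS header on the response
print(font_load_allowed(page, cdn, {"Access-Control-Allow-Origin": "*"}))
```

If that is the cause, having the CDN send that header for the font files
should be enough for Firefox to load them.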

~~~
ajiang
Thanks gulbrandr! We're fixing it right now.

------
lightcatcher
For those who think this is an awesome idea, but don't want to relocate
and/or work full-time, I recommend you check out the similarly minded
[http://www.datakind.org/](http://www.datakind.org/)

~~~
shankysingh
Thanks for the link man, this really looks wonderful.

------
shoyer
Can you elaborate on what a "Fully funded fellowship" means? I'm guessing it's
vague because you haven't figured out how much support you'll be able to
provide yet?

~~~
ajiang
Hi Shoyer, one of the founders here! For our fall fellowship, support will
likely be in the range of $4,000-6,000 per month based on experience. We also
provide a fellowship house in San Francisco for our fellows to live in.

~~~
hsshah
Hi ajiang, this is a great initiative. Glad to see data science knowledge put
to use for noble causes. I am a mentor in a data science/analytics program
based in the Bay Area where we help professionals looking for a career change to
data science. We are always hunting for interesting projects for them to work
on. Would love to have them work on real projects with noble goals. Love to
connect to discuss this possibility. If interested, please ping me. You can
find my email in my profile. Thanks.

~~~
ajiang
Hi hsshah, that sounds interesting. Shoot us a note at hello@bayesimpact.org -
we'd love to talk!

------
roscoebeezie
This sounds amazing.

