
I stood up a data science operation at my company over the last few years, and have noticed a key difference between the data-science projects that succeed and those that fail. It hits on a number of points brought up in the article, namely where data science "fits" in an organization delivering software and how the business realizes its value.

The worst cases I have seen are when executives take a problem and ask data scientists to "do some of that data science" on it, looking for trends, patterns, automated workflows, recommendations, etc. This is high-level, pie-in-the-sky stuff that works well in pitch meetings and client meetings, but when it comes down to brass tacks it leaves very little vision of what is actually being attempted, and even less of a viable execution path.

More successful deployments have had a few items in common:

1. A reasonably solid understanding of what the data could and couldn't do. What can we actually expect our data to achieve? What does it do well? What does it do poorly? Will we need to add other data sets? Propagate new data? How will we get or generate that data?

2. The business case or user problem was understood up front. In our most successful project, we saw that users continuously miscategorized items on input, so we built a model to recommend the correct categories. It greatly improved the quality of our ingested user data.

3. Break it into small chunks and wins. Promising a mega-model that will do all the things is never a good way to deliver on aspirational data goals. Small model wins were celebrated regularly, and we found homes and utility for those wins in our codebase along the way.

4. Make it accessible to other members of the company. We always ensure our models have an API that can be called by any other service in our ecosystem, so other feature teams can tap into data science work. There's a big difference between "I can run this model on my computer, let me output the results" and "this model can be called anywhere at any time."
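To make "callable anywhere" concrete, here is a minimal sketch of the pattern, not a description of our actual service; the framework (Flask), model artifact, route, and payload shape are all illustrative assumptions:

    # minimal_model_api.py -- illustrative sketch only; the model, route, and
    # payload shape are assumptions, not our production service.
    from flask import Flask, request, jsonify
    import joblib  # assumes the model was serialized with joblib (e.g. a sklearn text pipeline)

    app = Flask(__name__)
    model = joblib.load("category_model.joblib")  # hypothetical artifact name

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()                      # e.g. {"text": "item description"}
        prediction = model.predict([payload["text"]])[0]  # pipeline handles vectorization
        return jsonify({"category": str(prediction)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)

Once the model sits behind an HTTP endpoint like this, any feature team can hit it from their own service without knowing anything about the model internals, which is the whole point of item 4.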

While not exhaustive, I think a few solid fundamentals like the above align data science capabilities with business objectives and let the organization get "smarter" over time about what is and isn't possible.




As someone who has been doing data science / ML for the last 4 years, I mostly agree with your points, especially about the hype-driven demand for DS/ML. One thing that is often neglected, though, is the exploration part of it. There really is a lot of data out there that your company knows nothing about but could probably benefit from knowing. E.g. even a simple crawl of a popular jobs/ads/... site, done diligently for, say, 6 months, can reveal many interesting insights about market structure and trends. Google and its mission to organize all the world's data exist for a reason.

This, however, is in stark contrast with the approach most executives take. Instead of managing it as a well-thought-out, long-term strategic investment, they want to time-box it, get immediate value, and show off to senior management or customers. I've seen this tendency in both big corporations (mid-level management) and startups, which makes me think the confounding variable is the funding/incentive process. In both big corps and startups there is limited time and budget to show meaningful results, and people optimize for that, which often involves taking shortcuts, neglecting strategy, and outright lying.

In contrast, I've seen projects driven by wealthy individuals who aren't looking for immediate value but are scratching an itch (e.g. curiosity). These usually fare better than the former, as long as budgets don't get out of hand and exhaust the cash cow. I would argue these are the most successful because of the better alignment between motivation (the person paying the bill) and execution (the person driving the process).
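As a rough illustration of what "a simple crawl done diligently" can look like, here is a minimal sketch; the URL, CSS selectors, and fields are placeholders, not a real site or schema:

    # collect_postings.py -- illustrative sketch; the URL, selectors, and columns
    # are placeholders for whatever site and schema you actually care about.
    import csv
    import datetime

    import requests
    from bs4 import BeautifulSoup

    LISTINGS_URL = "https://jobs.example.com/listings"  # placeholder URL

    def scrape_once(path="postings.csv"):
        html = requests.get(LISTINGS_URL, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        today = datetime.date.today().isoformat()
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            for card in soup.select(".job-card"):  # placeholder selector
                title = card.select_one(".title")
                company = card.select_one(".company")
                if title and company:
                    writer.writerow([today, title.get_text(strip=True),
                                     company.get_text(strip=True)])

    if __name__ == "__main__":
        scrape_once()  # run daily via cron; six months of rows becomes a trend dataset

The value is not in the script but in running it consistently for months, so the accumulated rows let you see structure and trends that no single snapshot would show.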


A math friend of mine often consulted for scientists. His least favorite were the ones who asked him to "make some clusters" (think k-means). "What are you looking for? What is your hypothesis?" "Just make some clusters and we'll see."

Not utterly without merit, but fairly blind fishing nonetheless.


>The worst cases I have seen is when executives take a problem and ask data scientists to "do some of that data science" on the problem...high-level pie in the sky stuff that works well in pitch meetings and client meetings...

I've been in various external- and internal-facing Data Science roles for 8+ years, and this is spot on. IME it's the #1 reason Data Science projects "fail." If you can replace "do some of that data science" with "do some of that black magic," that probably means nobody actually checked whether the data and the problem made sense in the first place. But somebody somewhere already committed to it, so the Data Science team has to deliver.


> The worst cases I have seen is when executives take a problem and ask data scientists to "do some of that data science" on the problem, looking for trends, patterns, automating workflows, making recommendations, etc.

While I agree on the point, there's a case that's arguably worse: When those executives hire Data Scientists and then ask them: "So what can we do with Data Science?"



