Reading the article, I couldn't help but wonder about the cost of doing something like this for a typical small business, i.e., one where the work has to be done by a contractor. It looks like a big line item for a company whose in-house tech expert knows Excel and how to plug in an Ethernet cable.
It's not that I don't think it's useful, I just wonder about the ROI for cash constrained businesses.
I'm by no means an expert at machine learning, but given how well organized the scikit-learn libraries are, building a simple classifier like the one shown in the example would be a few days of work at most. In fact, a first version can be built within a day. After that, one has to tune the hyperparameters and spend time on feature selection to improve on the baseline accuracy.
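For a sense of what "a first version within a day" looks like, here is a minimal sketch, not the article's code. It assumes a synthetic dataset from `make_classification` as a stand-in for real labelled data, and logistic regression as the baseline model. It first scores a plain baseline, then one follow-up iteration with simple univariate feature selection:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real labelled dataset (hypothetical: 1,000 rows, 20 features).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# Day-one baseline: scale the features, then fit a logistic regression.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline_scores = cross_val_score(baseline, X, y, cv=5)

# Follow-up iteration: keep only the k features most associated with the label.
selected = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=5),
    LogisticRegression(max_iter=1000),
)
selected_scores = cross_val_score(selected, X, y, cv=5)

print(f"baseline accuracy: {baseline_scores.mean():.3f}")
print(f"with feature selection: {selected_scores.mean():.3f}")
```

Putting the selector inside the pipeline means it is refit on each cross-validation fold, so the feature selection doesn't leak information from the held-out data.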
The most important thing will be the training data. You need a good number of samples, and the data also needs to be reasonably "clean".
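What "reasonably clean" means varies by dataset, but a few basic checks are cheap to run before any training. A minimal sketch, assuming pandas and a tiny hypothetical leads table (the column names are illustrative only), that reports duplicate rows, missing values per column, and label balance:

```python
import pandas as pd

# Hypothetical leads table; column names and values are illustrative only.
df = pd.DataFrame({
    "company_size": [10, 250, None, 40, 40],
    "industry": ["retail", "saas", "saas", None, None],
    "converted": [0, 1, 0, 1, 1],
})


def data_quality_report(frame: pd.DataFrame, label: str) -> dict:
    """Summarise a few basic 'cleanliness' signals before any training."""
    return {
        "rows": len(frame),
        # duplicated() treats missing values in the same positions as equal.
        "duplicate_rows": int(frame.duplicated().sum()),
        "missing_per_column": frame.isna().sum().to_dict(),
        "label_balance": frame[label].value_counts(normalize=True).to_dict(),
    }


report = data_quality_report(df, label="converted")
print(report)
```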
At US rates, that smells like a few tens of thousands of dollars. At the core of my "concern" is that the magnitudes of turnover and margins, and the increase in sales the analysis would have to produce to pay for itself, make applying the idea uneconomical.
To put it another way, the business case feels weak for most [i.e., small] profit-seeking enterprises.
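The margin argument can be made concrete with a simple break-even calculation: the extra revenue needed is the project cost divided by the gross margin. A minimal sketch, where the cost, margin, and turnover figures are all hypothetical placeholders rather than numbers from the article:

```python
def breakeven_extra_revenue(project_cost: float, gross_margin: float) -> float:
    """Extra revenue needed so the added gross profit covers the project cost."""
    if not 0 < gross_margin <= 1:
        raise ValueError("gross_margin must be in (0, 1]")
    return project_cost / gross_margin


cost = 30_000         # hypothetical: "a few tens of thousands of dollars"
margin = 0.20         # hypothetical 20% gross margin
turnover = 1_000_000  # hypothetical annual turnover

extra = breakeven_extra_revenue(cost, margin)  # 150,000
required_uplift = extra / turnover             # 15% more sales just to break even

print(f"extra revenue needed: ${extra:,.0f} ({required_uplift:.0%} of turnover)")
```

Under those assumptions, the analysis would have to lift sales by about 15% before it paid for itself, which is the kind of hurdle the comment is worried about.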
"A few days of work" for "tens of thousands of dollars" seems a bit absurd. Are you assuming they are making $10K per day? That seems a bit high. I would assume $200-300/hour, tops.
The amount of time you can spend preparing the training data is unbounded. The number of times you can do the training with data that ends up not actually looking like what you see in the wild a week later is unbounded. When all is said and done, the yak-shaving alone will be tens of thousands.
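One cheap way to notice that live data no longer looks like the training data is to compare a feature's distribution in each with a two-sample Kolmogorov-Smirnov test. A minimal sketch, assuming NumPy and SciPy, with simulated data standing in for a real feature; the 0.01 threshold is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Simulated feature values: what the model was trained on vs. what arrived a week later.
training_feature = rng.normal(loc=0.0, scale=1.0, size=2000)
recent_feature = rng.normal(loc=0.5, scale=1.0, size=2000)  # deliberately shifted

# The KS test asks whether both samples plausibly come from the same distribution.
result = ks_2samp(training_feature, recent_feature)
drifted = result.pvalue < 0.01  # hypothetical alerting threshold

print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.2e}, drifted={drifted}")
```

A check like this only flags that something changed; deciding whether to relabel, retrain, or rethink the features is still the open-ended part the comment describes.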
At the couple of hundred bucks an hour I'd expect to pay a qualified consultant, 200 hours works out to about $40,000, toward the high end of the "few" in "a few tens of thousands".
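The two estimates in this exchange differ mainly in the hours assumed. A quick sketch of the arithmetic, where reading "a few days" as three eight-hour days and "tens of thousands" as $30,000 are assumptions:

```python
def consulting_cost(hours: float, hourly_rate: float) -> float:
    """Total fee for a given number of hours at a flat hourly rate."""
    return hours * hourly_rate


# "A few days" read as 3 eight-hour days (assumption) at $200-300/hour.
few_days_low = consulting_cost(3 * 8, 200)   # 4,800
few_days_high = consulting_cost(3 * 8, 300)  # 7,200

# 200 hours at "a couple of hundred bucks an hour".
two_hundred_hours = consulting_cost(200, 200)  # 40,000

# Day rate implied by $30,000 (assumed reading of "tens of thousands") over 3 days.
implied_day_rate = 30_000 / 3  # 10,000

print(few_days_low, few_days_high, two_hundred_hours, implied_day_rate)
```

At the quoted hourly rates, "a few days" stays under $10K; it takes something like the 200-hour figure to reach "a few tens of thousands".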
Just like with the vast majority of project forecasts, the "few days" is what you say to get the sale, internally or to outside clients. If you think it is that simple, well, I would like to sell you just a few days of machine learning expertise if you have a project... :) Even very simple tasks that you can let the intern do can - and often do - take days longer than projected.
There are also lots of tools that make the hyperparameter problem faster and cheaper to solve. That problem is often orthogonal to the domain expertise required to do good feature selection.
Scikit-learn implements things like grid search natively [0], and tools like SigOpt [1] (YC W15, disclosure: I'm a founder) do this automatically as well.
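As an illustration of the built-in option, here is a minimal sketch of scikit-learn's `GridSearchCV` tuning the regularization strength of a logistic regression. The synthetic data and the candidate `C` values are assumptions chosen for the example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Parameters are addressed as "<step>__<param>"; make_pipeline names each step
# after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Tries every candidate with 5-fold cross-validation, then refits the best one on all data.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Grid search is exhaustive, so its cost grows multiplicatively with each added parameter; that is the gap the more automated tuning services aim to close.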
Yup, hence the need to have it open source; otherwise that's just too much time spent for little return (lead datasets are too limited for significant uplift).