I've never heard of anyone hiring expensive Data Scientists, spinning up Spark/H2O clusters, building a data lake, doing a database offload to S3/HDFS all for a "select from orders table where basket size is the biggest" query.
AI/ML doesn't even work like this. It's simply not designed for giving 100% accurate answers to highly structured queries.
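For contrast, the "biggest basket" question really is a one-line structured query with an exact answer. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical `orders` table with `order_id` and `basket_size` columns; the schema and rows are illustrative only.

```python
import sqlite3

def biggest_basket(conn: sqlite3.Connection):
    """Return (order_id, basket_size) for the order with the largest basket."""
    # A plain ORDER BY ... LIMIT 1 answers this exactly; no model is involved.
    return conn.execute(
        "SELECT order_id, basket_size FROM orders "
        "ORDER BY basket_size DESC LIMIT 1"
    ).fetchone()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and sample rows, for illustration only.
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, basket_size INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 3), (2, 12), (3, 7)])
    print(biggest_basket(conn))  # (2, 12)
```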
Anecdotally[1] speaking, you're correct but you're missing the message. You're correct that no one should hire expensive data scientists for this. But what happens is that no one is out there marketing these sorts of pragmatic best practices, so they never come onto the radar of the business executives making decisions. Instead those executives are inundated with ML/AI/Data Science pitches and mentions everywhere. And so when they go to invest in improvements like this, they reach for the buzzwords they know instead of the solutions they're not aware of.
What ends up happening is effectively the Data Science engagement becomes 90% data cleaning, a handful of SQL statements that should have existed beforehand but never did because the data infrastructure wasn't there, and possibly a veneer of ML/AI just to say it was used. Clients come out happy (sometimes), despite overpaying for what was a much more basic engagement than they think it was, and they go on preaching to their business exec friends the virtue of ML/AI and the cycle continues.
[1] I built up a Business Intelligence/Analytics team at my last job, and currently work for a marketing agency managing digital analytics for Fortune 50 clients. Lots of exposure to analytics in lots of varying environments, and I've seen firsthand how ML engagements get pitched and results get presented.
I also own a consultancy that's the anti-version of this phenomenon, offering digital analytics management and support services. 50% of my work involves being a knowledgeable resource that marketing and business execs can lean on to cut through the bullshit, with most of the rest being basic Google Analytics/Google Tag Manager management, CrazyEgg, and drip marketing campaigns. All of which seems like AI-level magic to clients when done correctly.
Interesting comment, makes me wonder if people just generally underperform. You hire an "AI" team with multiple PhDs, you get work that a DBA could've done. You hire a DBA, you get work that a PHP intern could've done. You hire a PHP intern, you get work a kid could've done with Excel formulas :P
It's less about over/under-performance and more about incentive alignment, political perception, and organizational maturity.
It's really, really hard to get executive budgetary approval for a foundational data audit/cleaning project (comprehensive data cataloging, data cleaning, source auditing and validation, etc). Doing so implicitly admits that you weren't doing that before, and now you have to pay gobs of money to fix it. The larger the company, the more infeasible it is to push this through because of the breadth of technical/analytical debt that has accrued and the price tag associated with the project, combined with the perception of incompetency (i.e. it's an expensive project that's fixing a problem you as the executive shouldn't have let happen to begin with).
Whereas an ML/AI project/initiative/push is a net new capability that you're spearheading, and it's easier to get the political traction to spend money on net new things, especially buzzwordy net new things that the firm can use to be viewed as cutting edge. The fact that you're rolling up the cost of a complete data management audit to be able to even do the ML/AI project is a minor bullet-point that doesn't matter. Executive expectation is that anything ML/AI/new-age-techy is going to be astronomically expensive anyway, so it doesn't get noticed that they're paying a premium on labor to do it as a combined project rather than as two separate projects.
Effectively, the foundational work that's needed to support ML work is also the work that's needed to do basic SQL-based analytic work, but it's way easier to get that budgetary line approved in a flashy ML project, even if you're paying a premium by having the ML/AI firm do the foundational work instead of the specialist work.
Plus, spearheading ML/AI initiatives makes for better resume points than "data management initiatives" do. So there's little reason for anyone in this process to attempt to change anything, unless you happen to materially benefit from a firm's profit. For this reason, my main consulting clients are bootstrapped firms that actually care about being pragmatic over being trendsetters.
Note: This is a huge generalization, and doesn't apply universally. But it's far more common than you would expect, especially as you veer away from the type of companies that pop up on HN towards more traditional industries.
I was asked about my thoughts on AI/ML at work, and I said it didn't really apply to us. I was told "but with ML we can figure out when deliveries are happening and scale the machines before the deliveries happen based on the peak traffic times". I tried to explain that we could do all that from SQL and looking at our data. We have all the data; we just need to formulate it into something that predicts which times of day and days of week, for each region, have more traffic, then use that data to pre-scale. I was shot down with "you clearly do not understand ML and should go read up on it".
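The SQL-only approach described here might look like the following. This is a minimal sketch using Python's sqlite3. It assumes a hypothetical `traffic` table with `region`, `ts` (a 'YYYY-MM-DD HH:MM:SS' text timestamp) and `requests` columns. The table, the columns and the 1.5× threshold are all illustrative, not from the original.

```python
import sqlite3

# Average load per region, day of week and hour of day.
# SQLite's strftime('%w') numbers days 0-6 with Sunday = 0.
PROFILE_SQL = """
SELECT region,
       CAST(strftime('%w', ts) AS INTEGER) AS dow,
       CAST(strftime('%H', ts) AS INTEGER) AS hour,
       AVG(requests) AS avg_requests
FROM traffic
GROUP BY region, dow, hour
"""

def peak_slots(conn, threshold=1.5):
    """Return sorted (region, dow, hour) slots whose average load exceeds
    `threshold` times that region's overall average: pre-scaling candidates."""
    overall = dict(conn.execute(
        "SELECT region, AVG(requests) FROM traffic GROUP BY region"
    ).fetchall())
    return sorted(
        (region, dow, hour)
        for region, dow, hour, avg in conn.execute(PROFILE_SQL)
        if avg > threshold * overall[region]
    )
```

Each returned slot is a time to scale up shortly before. Whether 1.5× is the right cut-off is a business decision, not something the query decides.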
The catch is the “and looking at our data” part. ML is basically a collection of thorough ways to look at your data, understand the patterns and infer what that means for the future.
In your example, you should absolutely start by cleaning your data up, running some basic SQL aggregations and plotting volume over time. So you look at that and notice that (1) volume is increasing over time, (2) some holidays bump volume a few days ahead but drive very low volume on the day itself, (3) weekends are higher, but the effect isn’t pronounced the whole year, and (4) summer is better for you than winter except for the Christmas season. Now: it’s two days before Halloween, what’s our anticipated sales volume?
If you baked all those observations into an ARIMA model, it’s trivial to crank out a forecast with quantifiable accuracy. If you just have lines on a graph, it’s hard to pin down all the independent effects and recombine them for arbitrary scenarios.
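Here is a minimal sketch of "baking the observations in", with one swap stated up front. It is not ARIMA. It is ordinary least squares on a linear trend, day-of-week dummies and a holiday flag, written in plain Python so it runs without libraries. All names are hypothetical. Each effect gets its own coefficient, so an arbitrary scenario (a given weekday, a holiday, some day far ahead) is just a sum of coefficients, and the residual standard deviation gives a rough accuracy figure. A real forecast would also model autocorrelation, which is what ARIMA adds.

```python
import math

def features(t, dow, holiday):
    """Design row: intercept, linear trend in t, six day-of-week dummies
    (dow 0 is the baseline day) and a holiday indicator."""
    row = [1.0, float(t)]
    row += [1.0 if dow == d else 0.0 for d in range(1, 7)]
    row.append(1.0 if holiday else 0.0)
    return row

def solve(a, b):
    """Solve a x = b (square system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    Returns the coefficients and the residual standard deviation."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(y) - k))
    return beta, sigma

def forecast(beta, t, dow, holiday):
    """Point forecast for day t: the fitted effects simply add up."""
    return sum(b * x for b, x in zip(beta, features(t, dow, holiday)))

if __name__ == "__main__":
    # Synthetic, noise-free data: growth of 0.5/day, busier late-week days,
    # and a -40 dip on four made-up holidays.
    effects = [0, 5, 5, 5, 5, 20, 30]
    holidays = {10, 45, 80, 115}
    rows, y = [], []
    for t in range(140):
        dow, hol = t % 7, t in holidays
        rows.append(features(t, dow, hol))
        y.append(100 + 0.5 * t + effects[dow] + (-40 if hol else 0))
    beta, sigma = fit(rows, y)
    # Scenario: day 300, dow 2, and a holiday: 100 + 150 + 5 - 40 = 215.
    print(round(forecast(beta, 300, 2, True), 2))
```

With real, noisy data, forecast ± 2 × sigma is a crude interval. A proper forecasting library would also account for uncertainty in the coefficients.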
You're absolutely on the right track. There are well-established statistics tools that formalize your intuition and carry it forward to its logical conclusion.
> "by day of week"
This means you have a time series with daily resolution (one observation per day) and you expect Weekly Seasonality to matter. Model this as a 7-period lag in your daily series.
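One way to read "a 7-period lag" in code is through seasonal differencing (the seasonal counterpart of the I term) and the seasonal-naive baseline forecast. This is a minimal sketch in plain Python. The function names are hypothetical, and a real seasonal ARIMA would add autoregressive and moving-average terms on top.

```python
def seasonal_difference(y, lag=7):
    """Return y[t] - y[t - lag]. For daily data with lag=7, a stable weekly
    pattern cancels out, leaving the changes a model still has to explain."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

def seasonal_naive(y, horizon, lag=7):
    """Forecast each future day as the value observed `lag` days earlier
    (one week back for daily data); a common baseline to beat."""
    history = list(y)
    for _ in range(horizon):
        history.append(history[-lag])
    return history[len(y):]

if __name__ == "__main__":
    week = [10, 12, 11, 13, 20, 30, 25]  # made-up daily volumes
    y = week * 3                         # three identical weeks
    print(seasonal_difference(y)[:3])    # [0, 0, 0]
    print(seasonal_naive(y, 7))          # [10, 12, 11, 13, 20, 30, 25]
```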
> based on % of increase from other markets if no previous years
There are multiple markets, each with their own time series? Congratulations, you have Panel Data [0]. Do some regions have similar trends? You'll need to account for that correlation; maybe try a Mixed Model [1].
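The core idea a mixed model brings to this situation is partial pooling. A market with little or no history borrows strength from the others instead of being estimated on its own. This is a minimal sketch of the random-intercept version in plain Python. It assumes the two variance components are already known and uses the plain grand mean, both simplifications. A real mixed-model fit (for example statsmodels' `MixedLM` or R's lme4) estimates those quantities from the data. The function name is hypothetical.

```python
from statistics import mean

def shrunken_region_means(data, sigma2, tau2):
    """Partial pooling of per-region means, as in a random-intercept model.

    data: dict mapping region -> list of observations.
    sigma2: assumed within-region (noise) variance.
    tau2: assumed between-region variance of the true region means.
    Regions with few observations are pulled harder toward the overall mean.
    """
    # Simplification: plain average of all observations as the overall mean.
    grand = mean([x for obs in data.values() for x in obs])
    result = {}
    for region, obs in data.items():
        n = len(obs)
        weight = tau2 / (tau2 + sigma2 / n)  # near 0 = full pooling, near 1 = none
        result[region] = grand + weight * (mean(obs) - grand)
    return result
```

A region with 100 observations keeps nearly its own mean. A region with a single observation lands roughly halfway to the overall mean when the two variances are equal.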
> percentage of growth from last 2 years
So there's a trend component (constant growth over time). Easy enough to fit the I term of an ARIMA model for this. You'll need to do some custom work to integrate this with your cross-regional correlations though.
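The I term amounts to differencing. With d = 1 the model works on day-to-day changes, so a steady trend becomes a constant. Below is a minimal sketch in plain Python with hypothetical function names, paired with the simplest forecast built on it (a random walk with drift). A full ARIMA fit would add AR/MA terms on the differenced series.

```python
def difference(y):
    """First difference y[t] - y[t-1]; this is what d = 1 (the I term) does."""
    return [b - a for a, b in zip(y, y[1:])]

def drift_forecast(y, horizon):
    """Random walk with drift: extend the last value by the mean first difference."""
    steps = difference(y)
    drift = sum(steps) / len(steps)
    return [y[-1] + drift * h for h in range(1, horizon + 1)]

if __name__ == "__main__":
    y = [100 + 3 * t for t in range(10)]  # made-up series growing by 3 per day
    print(difference(y)[:3])              # [3, 3, 3]
    print(drift_forecast(y, 2))           # [130.0, 133.0]
```

If growth is a percentage rather than a fixed amount per day, difference `math.log(y)` instead. A constant log difference corresponds to a constant growth rate.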
> Average processing time
You'll want to model this at least as well as you're modelling the demand. That means seasonal effects, correlations between factories / warehouses, etc. PS: any time you're dealing with a two-day weekend, you'd better use at least 3 years of data, in case major holidays happened to fall on a Saturday or Sunday and blindside you when one shows up on a Monday this year.
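To see why a short window can mislead, it helps to list which years a fixed-date holiday landed on a weekend. This is a small sketch with Python's datetime module. The function name is hypothetical, July 4 is just an example date, and country-specific "observed on Monday" rules are not modeled.

```python
import datetime

def weekend_years(month, day, years):
    """Years in which a fixed-date holiday (month/day) falls on a Saturday or Sunday."""
    return [y for y in years
            if datetime.date(y, month, day).weekday() >= 5]  # 5 = Sat, 6 = Sun

if __name__ == "__main__":
    print(weekend_years(7, 4, range(2020, 2024)))  # [2020, 2021]
```

Here, a model trained only on 2020 and 2021 would never have seen July 4 on a weekday.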
> show the holidays
You'll definitely need to put together a calendar of major holidays for each region. These models will calculate the effect for each holiday. Specifically, the effect of the holiday AFTER accounting for the day of week, overall growth, time of year (season), and region. You might even get nifty charts like [2]
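This is a deliberately simplified sketch of "the effect of the holiday AFTER accounting for the day of week". Each holiday is compared with the average of non-holiday days that share its weekday. The full model described above would also adjust for growth, season and region. Everything here, including the function name, the input shapes and the made-up data, is hypothetical.

```python
import datetime
from collections import defaultdict
from statistics import mean

def holiday_effects(daily, holidays):
    """Estimate each holiday's effect as observed volume minus the average
    volume of non-holiday days with the same weekday.

    daily: dict mapping datetime.date -> observed volume.
    holidays: dict mapping holiday name -> list of dates it fell on.
    Only day of week is adjusted for; trend, season and region are ignored.
    """
    holiday_dates = {d for dates in holidays.values() for d in dates}
    by_weekday = defaultdict(list)
    for day, volume in daily.items():
        if day not in holiday_dates:
            by_weekday[day.weekday()].append(volume)
    baseline = {wd: mean(vols) for wd, vols in by_weekday.items()}
    return {
        name: mean([daily[d] - baseline[d.weekday()] for d in dates if d in daily])
        for name, dates in holidays.items()
    }

if __name__ == "__main__":
    # Synthetic data: 4 weeks from Monday 2024-01-01, weekends busier,
    # and a made-up holiday on Monday 2024-01-15 with a sharp dip.
    start = datetime.date(2024, 1, 1)
    daily = {}
    for i in range(28):
        d = start + datetime.timedelta(days=i)
        daily[d] = 100 + (50 if d.weekday() >= 5 else 0)
    daily[datetime.date(2024, 1, 15)] = 30
    print(holiday_effects(daily, {"example_holiday": [datetime.date(2024, 1, 15)]}))
```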
===============
Anyway, that's the basics. You can take graduate math courses in just this kind of modelling. Easier: you can contract a decent statistician for a couple weeks to build the model for you. They'll be delighted that you can produce SQL queries with the relevant data, and their models will help you get a lot more value out of those queries.
> We have all the data we just need to formulate it into something that makes sense to predict which times of day, days of week, for each region, where we have more traffic then use that data to pre-scale.
Don't love your tone, but I agree. I have been working in what was called predictive analytics for 16 years.
I've done tons of projects, for tons of companies, and this sort of refrain from people is pretty common when they don't have experience in the field. They think of it like some fad that doesn't make a lick of sense outside of a C-level discussion.
But the reality is, predictive analytics is extremely powerful. One of the last projects I did was to keep trains from derailing. Another was to improve crop yield for a farming company by using satellite imagery to determine when each field most needed to be harvested. Tons of other examples.
To even explain the particular use cases would take quite a while because they are domain-specific issues. Cron isn't solving these problems.
What the person is really saying is, I don't have experience in these topics, what can be so hard about them?
It's the same with people who armchair-quarterback sports, or politics, or programming, or any other topic. It all seems easy when you don't know the details.
Didn't you use expert systems 10 years ago for those? Rules running off SQL queries, which you crafted yourself or with the help of a domain expert?
I think it's fair to claim that companies who skipped that process might want to consider it first, as a cheaper way to start with predictive analytics.
But I'm not sure, and I'm genuinely intrigued: what were the techniques used in that field before ML?
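The "rules running off SQL queries" approach mentioned above can be sketched as a list of hand-written queries, each returning the assets that trigger it. This is a minimal illustration using Python's sqlite3. The `readings` table, the metric names and the thresholds are hypothetical stand-ins for what a domain expert would write down.

```python
import sqlite3

# Each rule is a name plus a query returning the asset_ids that trigger it.
# Table, columns and thresholds are hypothetical.
RULES = [
    ("overheating", "SELECT DISTINCT asset_id FROM readings "
                    "WHERE metric = 'temp_c' AND value > 90"),
    ("high_vibration", "SELECT asset_id FROM readings WHERE metric = 'vibration' "
                       "GROUP BY asset_id HAVING AVG(value) > 5"),
]

def fire_rules(conn, rules=RULES):
    """Run every rule and return sorted (rule_name, asset_id) alerts."""
    alerts = []
    for name, sql in rules:
        for (asset_id,) in conn.execute(sql):
            alerts.append((name, asset_id))
    return sorted(alerts)
```

The thresholds here are fixed by hand. A predictive model would instead learn from historical outcomes where they should sit.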
These people are describing 99% of the Fortune 500 companies who have no idea what AI means other than hiring a team of data scientists that will hopefully solve all of their problems in the name of technology.
Man, I am really curious what position you hold that you know the AI strategy for 99% of Fortune 500 companies. Those same Fortune 500 companies would pay you a lot of money for this level of insight into their competitors.
Wow, at least one other person shared my opinion. Clickbait article/post title. Also, the use-case for simple sales coupons mentioned in his article is not even close to the kind of things a company like Amazon does AI on in the recommendation process of WHAT product to advertise to each customer (as mentioned by another poster, I would not expect to see a discount for breast pumps when I just bought something for myself, a male, such as men's deodorant).
People who don’t know what AI actually is are buying it anyway. I’ve actually seen this firsthand. The developers/data scientists involved simply did what they were asked even though it didn’t make much sense (we had tried and failed too many times to explain why this was a waste, and got nowhere).
Although to be fair, this outcome was still an improvement. At least with using machine learning unnecessarily the data actually meant something and wasn’t just arbitrary excel numerology.
Even Data Scientists couldn't tell you whether it just means neural networks or whether it includes other ML techniques. There are technologies like AutoML which automate feature engineering, but is that ML or AI? Not sure.
So I am not concerned whether people know AI/ML or not. What I have an issue with is people thinking that you can do 90% of AI/ML using SQL, which makes no sense.
Maybe they're describing a case where the systems have been well designed so they already have the data they need in a useful format and don't need to do any of that bullshit to make it usable.