Machine Learning Is a Marvelously Executed Scam (lastweekinaws.com)
58 points by cratermoon on May 14, 2021 | 50 comments



The author is clueless. There's a lot of hype and a lot of people / products that aren't actually any good at ML (or exist as an academic venture without getting their hands dirty).

But when ML works--Uber forecasting demand, Instacart optimizing shopping paths, delivery service route dispatching, and an infinite number of other real-world use cases with double-digit process improvements--it's extraordinary.

Sales forecasting, fraud prediction, modern search, and more are all powered by ML, and throwing a few stones at bad business plans, bad marketing, or over-marketing doesn't change the fact that those things are bad precisely because they're not that good at actual machine learning.


So when ML seems like a scam that means it's just not done right. And there are some successes that show that it does actually work, really. When it's done right. As determined by its ultimate success.

We are an industry as notorious for snake oil as we are for our lack of standards. And both have standard apologetics. The parent's comment is reminiscent of the refrains "Agile, you're doing it wrong" and "that's not the way to write microservices," among many similar others.

I don't think the problem with any of this is ML as such. ML is just software. But software has problems.


The examples you give seem to fit into the "marginal process improvement" mentioned, I should think? I mean, forecasting demand, optimizing shopping paths, route dispatching, etc, don't require ML to do well. ML -might- make it a bit better still, but is it enough to offset the work that went into it? Couldn't say.

Certainly, the number of times I've seen business stakeholders treat ML as magic, where they say "It's like (standard business process) but we'll use ML!" to try and create a business case is appalling. And I think that's more what he's referring to; in many companies, ML is a solution in search of a problem, one that the business is quite happy to pay for to say they're doing (it pleases stockholders), and data scientists are happy to accept money for (it's a job, after all).


If a "marginal" process improvement increases your conversion rate by a double digit percent, then it's not all that marginal, and it's not a scam.

It's kind of boring to read "our revenue jumped 4% after adding multi-objective optimisation to our existing model", but if you stick a few 4% improvements together and apply them to a big revenue stream, you get a big number.
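To put rough numbers on the "stick a few 4% improvements together" point, here is a minimal sketch. It assumes the improvements are independent and compound multiplicatively, which real systems may not satisfy; the revenue figure is purely hypothetical.

```python
import math


def compound_lift(lifts):
    """Combine relative lifts multiplicatively and return the total relative lift.

    Assumes each lift applies on top of the previous ones (an assumption,
    not something the thread establishes).
    """
    return math.prod(1.0 + lift for lift in lifts) - 1.0


baseline_revenue = 500_000_000  # hypothetical annual revenue in dollars
total_lift = compound_lift([0.04] * 4)  # four separate 4% improvements
extra_revenue = baseline_revenue * total_lift

print(f"combined lift: {total_lift:.2%}")
print(f"extra revenue: ${extra_revenue:,.0f}")
```

Under those assumptions, four 4% wins come to roughly a 17% total lift, slightly more than the 16% you would get by simply adding them.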


What you're missing is the scale at which ML allows you to do those things.

Classical algorithms for path-finding for example might work really well in narrow cases that have firm constraints. ML allows you to expand the scale of optimizations arbitrarily.


I agree. To use an example from a talk I attended last year:

A govt. department contracted out the development of an ML model to identify an invasive tree species from sat and aerial imagery. New Zealand is sparsely populated and mountainous, so crews are deployed for a week at a time by helicopter to remove these trees. It is very expensive. By being able to scan large amounts of the country for these trees, they can optimise the removal. The model appears to work very well and can identify the trees when they are young.

These kinds of models are hard to build with traditional computer vision.


How can we look at what OpenAI is doing and think that ML is a scam? GPT-3 could probably write a better blog post!


If AI/ML is real, why do so many obvious Russian hooker bots message me on IG?


Your annoyance is not a cost high enough for Mark to pay


because you like them. they see you do but you don't accept it yet. trust the ai


Machine learning is a hammer and there are just not very many nails. However there are a lot of screws that we wish were nails. Or we pretend they are nails.


I came out of college super excited about machine learning because the math was so interesting and there was a sense of boundless utility and applicability. I came to realize that, at least in industry, the utility is mostly ad targeting and other stuff that doesn't really matter, and other than that ML is mostly overpowered for the task. There is not much socially positive application of machine learning, mostly socially neutral or negative. Some positive, sure, like there's some in medicine, but my understanding is the hardness there is not algorithmic.


> There is not much socially positive application of machine learning, mostly socially neutral or negative.

How do you arrive at this conclusion? What about Computer Vision? Speech recognition and synthesis? Fraud detection? Sentiment analysis?


Computer Vision can be used for facial recognition in surveillance applications. Sentiment analysis for ad targeting. Fraud detection to harass and profile the poor and groups the state considers enemies. And all of them gather massive datasets that can be stolen or sold and used for less than savory purposes. https://algorithmwatch.org/en/syri-netherlands-algorithm/


You basically described all or most technological innovation. It’s not specific to machine learning.


Exactly, all technology has both good and bad uses. Some technologies are more easily pressed into service for one or the other, but there is no black-and-white. The comment I was responding to implied there were no downsides of ML, or none of significance.


For me it's not that ML is uniquely or inherently bad, it's more that I was really excited about its applications but realized that, in practice, it's usually pretty simple algorithms, and the more complicated applications don't have much direct application to socially positive stuff. Prove me wrong!


Perhaps the reason ML isn't used for socially positive stuff is because it's really not all that good, and while it can be profitable to sell ML tools and skills, the actual benefits to the organizations buying it are marginal. In an era when budgets for socially positive things have been cut to the bare minimums, and schools, libraries, social services, museums, and the visual and musical arts are crumbling, why would they spend a lot of money on something that doesn't work all that well, compared to simpler, cheaper, existing methods?


Exactly. I'm not saying ML should be used in more places. I'm saying that there's almost never a real need for it, despite how hyped it can be and how cool the math/algorithms are in theory.


The comment you were responding to (mine) was responding to a comment implying there were no significant upsides of ML.


Higher productivity increases the scale of good and evil equally.


I'm not saying every application is bad. It's just that most of the positive applications seem to mostly be data aggregation and cleaning problems, not hard ML problems. Most of the time it's really a simple algorithm with a lot of good data that you need, there's not much interesting work to do besides plugging things together


and some of those nails don’t need to be hammered!


ML for improving business seems to be overkill. As in, if you are a regular business, the best I can guess is developing better pattern matching systems, say for credit card fraud. Most businesses I see are using AI/ML for improving their current processes.

However, AI/ML is definitely opening up new areas of business, such as autonomous robots (a Roomba is much better at its job with an ML component in it, without which it would have been difficult to come up with such a device), social media, face recognition, etc.

All in all, most of AI/ML needs are centered around pattern matching in one form or the other. There is much less AI, a lot more Pattern Matching.


Do Roombas include any ML component? I thought they followed a rather straightforward heuristic along the lines of a random walk, with turns when they hit an obstacle.


Some Roomba types also use Kalman filters (or similar) to help map out a room. Honestly I'd be surprised if ML gave you anything valuable above some already well known algorithms in this space other than a marketing bullet point.
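For anyone who hasn't met a Kalman filter, here is a minimal one-dimensional sketch: smoothing noisy range readings to a wall that is assumed to stay at a fixed distance. This is only an illustration of the general technique, not a claim about how any actual Roomba works; the sensor readings and noise variances are made up.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a (nearly) constant hidden value.

    Returns the filtered estimate after each measurement.
    """
    estimate, var = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict step: the state model is "value stays the same",
        # so only the uncertainty grows by the process variance.
        var += process_var
        # Update step: blend the prediction and the measurement,
        # weighted by their relative uncertainty.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= 1.0 - gain
        estimates.append(estimate)
    return estimates


# Hypothetical noisy distance readings (meters) to a wall about 2 m away.
readings = [2.3, 1.8, 2.1, 1.9, 2.2, 2.0, 1.7, 2.1, 2.0, 1.9]
filtered = kalman_1d(readings, process_var=1e-4, measurement_var=0.04,
                     initial_estimate=readings[0])
print([round(x, 3) for x in filtered])
```

No training data and no learned parameters are involved, which is the point being made above: the two variances are set from what you know about the sensor.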


The early generations worked as you describe. Newer/better ones attempt to map out the layout of the home, which allows them to cover larger areas, return to base to recharge/dump dirt, and to only gently touch objects instead of slamming into them at full speed. They actually probably still don't need ML for all of that, motion planning was able to avoid the need for "learning" for decades. But I bet they do use ML anyway.


New roombas and clones map your house with lidar and let you clean exact zones using a map from an app.


That’s how it operated a decade ago but they currently use computer vision to map your house and navigate the world


Ah CV not ML? No CNNs in there?


I think the criticism is a bit over the top to be honest but not by much.

ML seems to be something an organization reaches for most often when: a) it doesn't understand the data it has and doesn't have individuals who have the competencies to hire analysts to help them; b) it wants to tell a story to an audience of engineers (e.g. to attract "talent" or to signal how bleeding edge the tech is in the company).

A far less frequent case is when individuals with actual expertise have identified a real need for the use of the statistical methods and infrastructure used in ML applications. In this sense it's a scam--but one that technology organizations use against themselves.

The real money in ML, just like with most fads, is in selling the tools, not doing the work. Hence you see the rise of all these "ML ops" platform type businesses or business units (see: Sagemaker, Databricks, and various others).


> it doesn't understand the data it has and doesn't have individuals who have the competencies to hire analysts to help them;

Yeah, this is common and often explains why ML projects fail. ML won't magically understand data for you. And if you don't understand the data you are feeding into your ML pipeline, you will almost certainly have a garbage-in-garbage-out situation on your hands.


There has not been much value in the Data Science/Machine Learning capabilities in my company. The real value is that those people basically learn Data Engineering as a side effect and deliver much faster/prettier reporting.


Is that because the ML models aren’t useful or because of integration problems?


Not the OP, but in my experience as someone who has run a few analytics teams, you need a pretty mature data team that has eked out the majority of the value from plain old BI-style data visualization and boring data analysis before you need to delve into the realm of even simple undergrad-level statistical techniques like regression and t-tests.

Hiring a data scientist before you have a solid data engineering pipeline is like hiring an interior decorator while you're still framing your house. Unfortunately most businesses (even highly technical ones) just don't understand the moving parts of analytics.


Yeah, this is something that even a lot of data science leaders don't understand.

As an example, I recently discovered a bug in a production system that was costing many millions of dollars, that essentially happened because the team was told to go off and implement a shiny new ML model rather than understand and incrementally improve the system.

It's incredibly depressing.


Seeing how much the author flashes the AWS acronym, surely he knows how fundamental ML, statistics, and optimization are to Amazon's core business model.


Ya seriously - while I too am skeptical about ML-as-a-service, if Corey genuinely thinks that ML has no business value I have a rabid pack of Applied Scientists from Amazon Ads & Search that I'd like to set on him


How much peer-reviewed research has anyone from Amazon published showing that their ML has made a difference in their sales, profitability, or other meaningful business measures? Based on my recent experiences trying to buy anything, more and more they seem to be leaning towards adopting all kinds of dark patterns and old-fashioned sales gimmicks.


ML is:
- over-hyped
- very useful for some problems
- dangerous, because nets are black boxes
- hardware-hungry
- nowadays useful for narrow AI, but not even close to general AI

I think unexplainable decisions affecting the lives of human beings (e.g. Google's Play Store app removal algorithm) are today's version of the "Terminator" movies.


Long before it was ML it was called data mining. The textbook example is that Walmart discovered that men buying diapers rewarded themselves with beer, so Walmart created endcaps with beer and diapers. What the textbook didn't say is whether the cost in millions of putting the data warehouse together was ever matched by the extra money made putting beer with diapers. Data mining was a fad long before ML co-opted its throne. A human eyeballing data has a better chance of finding these kinds of new trends. Where ML weighs in is with fine-tuning known trends, such as with autopilot, voice recognition, facial recognition, etc. Even then ML is only as good as the things being tuned.


There are plenty of ML applications in industries like farming, energy, transportation, construction etc. ML is one of the tools, along with others like mathematical programming and simulations, that allow for more optimal use of resources such as materials, energy, land and human resources. These applications are domain-specific and less known to the general public but have substantial societal impact.


There's mega bucks being thrown around ML-based infosec startups at the moment which seem to just do simple outlier detection on some SIEM logs...
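For a sense of how little code "simple outlier detection" can be, here is a minimal z-score sketch over hourly event counts. The counts are invented and stand in for something like failed logins per hour pulled from a SIEM; no specific product or log format is implied.

```python
from statistics import mean, pstdev


def zscore_outliers(counts, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the series."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical failed-login counts for 24 consecutive hours.
counts = [12, 9, 11, 10, 13, 8, 11, 10, 12, 9, 11, 10,
          12, 9, 11, 10, 13, 8, 11, 10, 12, 9, 11, 240]

print(zscore_outliers(counts))  # -> [23]
```

A known weakness of this approach is that a large outlier inflates the standard deviation it is measured against, so short series or multiple spikes can hide each other; median-based variants are a common fix.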

I mean.. Yeah..


the real scam is the huge popups on your page


As someone who has implemented plenty of ML-on-IIoT solutions across clients of various sizes, industries, and maturities ..

I'll just say it's unimpressive low hanging fruit to write an article about any wildly popular subject to rehash basic mental models-as-objections-to-hype like Sturgeon's Law, Maslow's hammer, Occam's Razor, and YAGNI.

ML is objectively great at finding signal in noise. When you know what signal you're looking for and how much it's worth, ML is very economical - at certain scales, it's the only sustainable option.

Here are 6 steps to evaluating ML success. The most important point is after each step, if you don't have the economics right, stop what you're doing and go use your money for something else.

1) Choose a pattern recognition objective with a financial benefit. Quantify your objective over a fixed timeframe. Focus on how the objective will be achieved. "If we can reduce MTTR by 35% and reduce our fixed maintenance cycle time by 15%, we will see $11.7m gains in net revenue over the next 5 years by reducing our maintenance crew contracts."

2) Draw a clear picture of the data you have and its relationship to the objective. Identify the target variable(s). Assess the quality and granularity of your labeled data. Identify relationships between your data. Declare a hypothesis on the marginal relationship between the model loss function and the objective. Eg: "If this model is 90% accurate we expect to see a 6% lift in upselling / increase maintenance cycles by 14%, etc."

3) Bring a devil's advocate into the room and let them try to tear the idea apart. See if you can piece it back together. Compare your theoretical signal to whatever heuristics you have today. Why is it different? Can you just automate those?

4) Do a 6 week pilot to find the signal in the noise to prove your hypothesis. Do not worry about "cloud scale data platforms." Do not do MLOps. Do not build pipelines. Move heaven and earth to get data out of source systems quickly, by hand if you need to; copy and paste is also data engineering. Have a domain SME and a data scientist joined at the hip to avoid rabbit holes. Use a Kaggle-style holdout set for final model evaluation.

5) Do a field pilot, with your devil's advocate as the judge. A/B testing is usually best - think John Henry vs. the steam engine. Again, move heaven and earth to test your hypothesis in a real world setting. Bathe in uncertainty and confidence.

6) Estimate the cost to create and operate the ML pipeline needed against your expected benefits. Is it justified? Build it; otherwise, either look for ways to augment its value or kill it.

Most ML goes astray at steps 1 or 2. But a lot of good ML solutions are missed because the "elegant scam" skips most of these steps altogether.
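The go/no-go check in step 6 can be sketched in a few lines. This is a deliberately naive version with no discounting or risk adjustment; the $11.7m over 5 years comes from the example in step 1, while the build and operating costs are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class MLBusinessCase:
    expected_annual_benefit: float  # dollars per year
    build_cost: float               # one-off cost to create the pipeline
    annual_operating_cost: float    # dollars per year to run it
    horizon_years: int

    def net_value(self) -> float:
        """Total benefit minus total cost over the horizon (undiscounted)."""
        total_benefit = self.expected_annual_benefit * self.horizon_years
        total_cost = self.build_cost + self.annual_operating_cost * self.horizon_years
        return total_benefit - total_cost

    def decision(self) -> str:
        return "build" if self.net_value() > 0 else "augment or kill"


case = MLBusinessCase(
    expected_annual_benefit=11_700_000 / 5,  # from the step 1 example
    build_cost=3_000_000,                    # hypothetical
    annual_operating_cost=800_000,           # hypothetical
    horizon_years=5,
)
print(case.decision(), f"net value: ${case.net_value():,.0f}")
```

The same check is worth rerunning after each earlier step, since the expected benefit usually shrinks once the pilot numbers come in.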


modern = modern.replace("software engineering", "machine learning")


Is this how ignorant engineers write dumb articles with a lot of technical jargon to appear smart?

Just the Rekognition service he is whining about can be, and is being, used to detect cheaters in carpool lanes and for automatic billing. Postal mail is sorted by ML, as is spam at an email service provider. Recommendations and search on an ecommerce shopping site directly multiply the revenue of the site. I had no idea someone who calls himself a software engineer could be so completely clueless.

I expect this person to delete this post in the next couple of years when he realizes how dumb it is.


Speaking as someone who's worked in ML/statistics for over a decade, the author is right on the money.


From what I've seen (and it's arguably inexact), machine learning is the "respectable" version of blockchain - a solution in search of a problem - though it's easier to call things machine learning to get the big funding than it is to call MySQL "blockchain".


That analogy is way off. There are plenty of legitimate problems for which ML is a good solution. It's just the vast majority of companies employing ML techniques would find simple analytical techniques (basic statistics: regressions, simple clustering, etc.) to be at least as useful. They'd also be less costly in time and money as compared with building out even the basic infrastructure to implement an ML pipeline.
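As a reminder of how small the "basic statistics" baseline is, here is a minimal ordinary least squares fit of a straight line in plain Python. The spend and sales figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly ad spend (thousands of dollars) vs. units sold.
spend = [1, 2, 3, 4, 5]
units = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(spend, units)
print(f"units ~= {slope:.2f} * spend + {intercept:.2f}")
```

If a line like this already explains most of the variation, that is the bar any ML pipeline has to clear before its extra cost is justified.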



