
You don't need ML/AI, you need SQL - cyberomin
https://cyberomin.github.io/startup/2018/07/01/sql-ml-ai.html
======
tfehring
> _say a person bought a pair of shoe, sunglasses and a book. For their
> newsletter, we will show include shoes, sunglasses and books. This was a lot
> more relevant than sending random stuff._

I agree with the general sentiment of the article, but this seems like a poor
example, since a more sophisticated approach can add a lot of value to a
recommendation system. How do you know whether a customer is likely to want
more than one item in any of those categories? If they already purchased
sunglasses, wouldn't they be more likely to purchase, say, a sunglasses case
and/or sunscreen? If they purchased a book, do you recommend the same book
again? And if not, how do you choose which book(s) to include?

Of course, you could technically still handle this in SQL with a bunch of CASE
statements, but obviously that doesn't scale well across a wide range of
products. The whole point of ML/AI in that use case is to scale that type of
nontrivial decision making.
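
As a rough illustration of the gap between hand-written CASE rules and something data-driven, here is a minimal sketch that ranks suggestions by how often items were bought together. The order history, the category names, and the `build_co_purchase_counts` and `recommend` helpers are all made up for this example. It is a simple co-occurrence count, not a full recommender.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical order history: each order is the set of categories bought together.
orders = [
    {"sunglasses", "sunglasses case"},
    {"sunglasses", "sunscreen"},
    {"sunglasses", "sunglasses case", "book"},
    {"book", "bookmark"},
    {"shoes", "socks"},
]

def build_co_purchase_counts(orders):
    """Count how often each item appears in the same order as another item."""
    counts = defaultdict(Counter)
    for order in orders:
        for a, b in permutations(order, 2):
            counts[a][b] += 1
    return counts

def recommend(counts, purchased, k=2):
    """Rank items that co-occur with past purchases, skipping items already bought."""
    scores = Counter()
    for item in purchased:
        scores.update(counts.get(item, {}))
    for item in purchased:
        scores.pop(item, None)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

co_counts = build_co_purchase_counts(orders)
print(recommend(co_counts, {"sunglasses"}))
```

Nothing here needs a rule per product: adding a new category only requires new order rows, which is the scaling argument in a very small form.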

~~~
shadowmint
Obviously ML can add a lot of value here, but it's questionable to me if it's
trivial to build such a model with available data, keep said model up to date,
or train variations on it easily, cheaply and quickly enough to A/B test the
result and ensure you’re _actually_ making any tangible difference.

So you know... I don’t think it’s unfair to say that for smaller vendors, the
cost/effort of setting up a ML model may dwarf the fractional improvement it
offers over just having one person doing human generated SQL queries.

The point is this isn’t like machine vision or voice, where it’s almost
_exponentially_ better than traditional approaches.

It’s just... a bit better. Which is worth it only if the fractional
improvement pays for the setup cost.

~~~
halflings
It's not unheard of to see +10-30% in revenue when adding a recommender system
[0]. The system described by the author is arguably more complex than a
recommender system, since he has to develop, maintain and evaluate a set of
rules that are not based on real data, but only on his intuition of what users
want. GP gave good examples of how this would easily fail (do you always want
to recommend items from the same category; if not, how do you know which other
category to recommend?)

[0]
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.895...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.895.3477&rep=rep1&type=pdf)

~~~
natalyarostova
For newer or smaller firms the SQL approach makes sense. Both because there is
less data, and it's less risky to implement. Once the context is fully fleshed
out, then it's easier to move on to a ML recommender system. It's also easier
to track improvement vs a benchmark.

------
joe_the_user
Nice post,

Here's a different way to think about the situation with current
AI/deep-learning: if the current upsurge of methodologies were getting close
to _general_ AI, it would be getting closer and closer to a hammer that really
did let you treat everything as a nail. I.e., it would be general purpose.

But I think I can say we're not seeing that, even though deep learning seems to
be continually expanding the domains that it can operate on. How is that? This
OpenAI post is very eye-opening: "We’re releasing an analysis showing that since
2012, the amount of compute used in the largest AI training runs has been
increasing exponentially with a 3.5 month-doubling time (by comparison,
Moore’s Law had an 18-month doubling period)." Essentially, using a rather
brute-force-y method, we have shown we can expand deep learning's impact to a
larger and larger domain, but not at all in the fashion of human learning
tricks (where the new trick isn't that much harder than the old one).

Maybe, in this process, a better algorithm that adjusts to new situations
without increased costs will surface. But until then it seems new and old
methods will need to coexist.

[https://blog.openai.com/ai-and-compute/](https://blog.openai.com/ai-and-compute/)

~~~
halflings
This doesn't sound related to the post, no? The post doesn't argue for general
AI that learns like humans do, it only discusses the merits of AI as a whole
vs hard-coded SQL queries and heuristics.

------
posix_compliant
Good post, but I couldn't disagree more. Regardless of your business size, it
will always be valuable to know information such as:

* How does every additional coupon-dollar affect the total amount a customer buys?

* What is the relationship between customer age and retention for my store?

* Does giving a customer more purchase options help or hurt their chances of making a purchase?

My experience is that each of these questions can be solved, in part, using 3
lines of Python code:

    
    
        from sklearn.linear_model import LinearRegression
        lr = LinearRegression()
        lr.fit(X,y)
    

Then look at the beta coefficients of the model, and you have a rough idea of
how different features are correlated. Doing something like this in SQL sounds
difficult. If you have data to interpret, it makes sense to use similar
methods. I can't think of an example where you have data but refuse to look at
it until your company is "bigger".

~~~
perturbation
I overall agree that ML is needed over 'just SQL' in a lot of cases (though
SQL + good visualizations / exploratory analysis can answer a lot of those
questions qualitatively). I would also be careful with the linear model
approach. Multicollinearity can hide how important a feature is (or reverse
sign of a feature) when trying to use coefficients to interpret importance, so
using a linear model like that isn't as straightforward as it seems.

As a workaround, you could look for high VIF to detect multicollinearity,
use some sort of stepwise selection / penalized regression, or use something
like relaimpo
([https://cran.r-project.org/web/packages/relaimpo/index.html](https://cran.r-project.org/web/packages/relaimpo/index.html))
- not sure of a Python equivalent - to judge overall feature importance in the
model.

------
tilt_error
The premise for this article is wrong!

The author describes using SQL to pull facts from history; who was the number
one customer the last week, who abandoned online orders and so on.

The premise should instead be how to fit a model onto your business data so
that you can better guess who will be the number one customer next week, what
(s)he will order and so on.

The problem that ML addresses is how to arrive at that model, under the
assumption that you can use historic data either to pick a model or to
parameterise one.

SQL has its merits, as does the relational database model, but this has nothing
to do with creating models (even though we are modelling the data itself). The
author gives some examples that are, frankly, trivial.

But he has a good argument around namedropping "hot" technology when your
business need does not incorporate distributed trust (blockchain), modelling
behaviour (or some such) using ML and so on.

------
gwbas1c
Maybe I'm naive, but are there really people who want to hop on the AI
bandwagon just to do mundane lookups like this?

When I worked with machine learning many years ago, we learned that it was no
better than the heuristics already in place. The thing is, it's much easier to
diagnose a well written and understood heuristic than a machine learning
model.

~~~
whoisjuan
Machine Learning is usually seen as a magic black box by many people. So yes.
There's a recent trend that seems to favor a machine learning first approach
to solve very simple and mundane problems because people feel that they are
missing some magic insight if they don't do it (FOMO).

For example, the author refers to a shopping newsletter where you personalize
suggestions for certain products after a customer buys a particular product.
This is very often a machine learning 101 example, but really there's nothing
preventing you from writing those heuristics yourself, no ML involved (e.g., if
a customer buys a pillow, suggest pillow cases).

Machine learning does make sense for something like that if your website is
Amazon, but is definitely overkill if your website is an e-commerce store for
house garments.

The funny thing is that usually you will end up writing those heuristics
implicitly since you need to label your data anyways.
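
A hand-written heuristic of the pillow kind can be as small as a lookup table. The sketch below is only an illustration: the `SUGGESTION_RULES` table and the `suggest` helper are hypothetical names, and the rules themselves are invented.

```python
# Hypothetical hand-written rules: purchased item -> items worth suggesting.
SUGGESTION_RULES = {
    "pillow": ["pillow case"],
    "sunglasses": ["sunglasses case", "sunscreen"],
}

def suggest(purchased_items):
    """Apply the rules in order, dropping duplicates and items already bought."""
    suggestions = []
    for item in purchased_items:
        for candidate in SUGGESTION_RULES.get(item, []):
            if candidate not in purchased_items and candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions

print(suggest(["pillow", "sunglasses", "sunscreen"]))
```

A table like this is also roughly what a labeling pass would produce, which is the sense in which the heuristics end up written either way.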

~~~
apatters
Another fun thing is that you will learn a lot more about your customer base
if you do the research and write those heuristics yourself, vs. having ML do
them. A business that understands the behavior of its customers is much more
competitive than one which delegates that understanding to a black box. I
don't contend that ML has no valuable (even transformative) applications but
it's not a substitute for personally understanding every detail of your
market.

~~~
threeseed
I don’t understand this at all. I’ve worked with close to a hundred data
scientists now and every one of them is an expert in the business problem they
are trying to solve.

You can’t just throw an algorithm (even one like AutoML) at a problem and
expect to be able to do magic with no knowledge of the domain. The technology
simply doesn’t work like that.

------
reificator
I like to think I'm not too nitpicky about fonts, but that st ligature is
incredibly distracting.

It's the second article I've seen here that uses it over the last few days,
but I'm not sure if it's the same site or not.

~~~
cyberomin
Thanks for the feedback. Do you have any font preference?

~~~
pwg
Turning this clause off in your style sheet turns off that awful st ligature:

    font-feature-settings: "liga", "dlig";

So just remove that clause from your stylesheet and you'll be rid of that
ligature.

~~~
reificator
I like ligatures when they're used well, but that one is just incredibly
distracting.

Shame the advice is to turn them off completely...

~~~
pwg
Testing a bit more, just dropping "dlig" from the feature settings declaration
turns off the awful st ligature. So they don't have to 'all' be turned off to
get rid of that one. What I don't know is what other ones get turned off by
dropping "dlig" from the declaration.

------
gonyea
This post is downright bonkers. “We don’t need ML/AI! Proof: _list of things
you wouldn’t use ML for_ ”

There are so many problems you can solve with a neural network. Should Waymo
ETL sensor data and do a WHERE NOT IN for bicyclists?

This blog post is pretty dismissive. Statistics software has been in use
since the beginning; see SAS. Financial institutions, actuaries, etc, have
been using these methods with SQL data as the input and it’s the only reason
they’re still in business.

If this blog post simply suggested hiring a BI Analyst in your startup, I
wouldn’t disagree.

------
benkarst
There's no logical equivalency between SQL and ML/AI.

SQL is a language that helps retrieve the data you're looking for. ML/AI helps you
predict the future (using past data).

Maybe this is directed towards product people? But it has SQL in the title so
it can't be. I'm confused as to who the audience is here.

~~~
tylerjwilk00
You're missing the point. Much of the "intelligence" that AI/ML is touted as
solving could be accomplished through standard SQL queries on well normalized
data. But no one will invest in a company because we have proper data
structures and accurate SQL reports that we learn from.

~~~
m00x
That's been done for ages. The point of AI is to adapt to the world instead of
having humans spend time understanding the problem set.

SQL can apply a human understood model to data points. AI lets us develop new
models and adapt them.

AI lets us solve problems that have abstraction, or problems that change over
time. You can't have SQL detect cats in an image or drive a car.

~~~
titanix2
Cleaning the data, choosing the right ML algorithm, selecting the features,
tuning the parameters, etc. ML also involves a lot of human time thinking about
the problem.

------
oh-kumudo
The title is often true, but it doesn't mean much, if anything. And the
same argument has been brought up again and again in the past as well.

What OP suggests, the so-called SQL, is basically a heuristic-based system.
When done properly and carefully, it can of course work very well, and is
indeed often used as a baseline model to bootstrap an ML system. However,
eventually the rule-based system will hit a wall, and ML will be the savior of
the day, pushing the metric further by a margin of 20-30%.

So yes, when you are small and have little data, ML is irrelevant. But the same
thing could be said of many things in the software industry: you probably
won't need Docker/Big Data/Fancy JS either if you are building a small-scale
online store.

Choose your tech stack wisely based on your problem, but the title is
needlessly sensationalized.

~~~
itronitron
that is quite a leap to call SQL a rule-based system. SQL is a standard query
language that you can use to discover how attributes and values relate to one
another within data.

~~~
oh-kumudo
> standard query language that you can use to discover how attributes and
> values relate to one another within data.

That doesn't make it not a rule-based system. There is no learning component
in SQL.

------
Nasrudith
I think this highlights a problem that is separate from machine learning,
blockchain, and similar vs. the tried and proven technologies, and a
long-standing one: attempting to solve a problem by understanding it vs.
seeking the simple solution to avoid thinking about it.

Ironic that machine learning is 'simple' but that seems to be the case at
times especially with the 'throw block chain or machine learning at it'
approach when a proper algorithm could do it far more efficiently. The funny
thing is that both approaches have their place. If turning it off and on again
fixes a rare issue faster than following every instruction to machine code you
are better off restarting it occasionally - unless it is a critical
application where doing so will cost millions of dollars or lives.

------
panic
I like this article's focus on technology as a way of helping skilled people
do their job more effectively. Why shouldn't a business owner be able to use
Bash and SQL to run their business? Maybe the solution isn't new technology,
but training people to use the old stuff.

~~~
cyberomin
This was exactly the point I was making. Thank you.

------
mrtksn
But you don't gain ML/AI know-how by doing SQL, nor do you discover previously
unknown potential in your product by sticking to your usual toolset.

Not that I necessarily disagree with the OP but I find it deeply
uninspirational.

What's the difference between using ML/AI for problems traditionally solved by
some other tool and using any other tool to solve the same problem
unconventionally? Both can be "hacking". I guess my issue with this is the
word "need", don't do what you need to do but what you want to do if you are
looking for inspiration. After all, mankind never needed to leave the garden
of Eden but left it anyway.

~~~
rs86
I think the OP just meant that you can get a lot done with database queries
and a bit of automation. There's no need to call that ML/AI.

~~~
MaxBarraclough
Something I hate about the business side of the tech world: the fad-chasing.

From the article:

> I hear these days for you to close that funding round quickly and early
> enough, you must throw in “Blockchain” even if it has no relevance in the
> grand scheme of things. A while ago, it was Machine learning and Artificial
> Intelligence.

Right on. No, blockchain won't help you with your corrupt voting system. If
you don't understand the technology, you can't reason about its applicability,
and there are more buzzword-chasers than serious technologists.

------
pnathan
There aren't very many DBAs practicing in modern shops and devs don't seem to
be too into SQL and delivering excellent SQL queries and schemas. It's its own
skillset.

I would also call out the NoSQL hype train here.

NoSQL _has its place_ , and largely its place is when SQL can not tolerate the
intensity of traffic or the size of the dataset. You can look at the Dynamo
paper for an example of the engineering rationale.

Postgres can take _enormous_ amounts of data at quite decent rates - without
spending too much time on tuning even.

~~~
autokad
usually I am joining many different data sets, many of which include some type
of log data (sometimes petabytes in size but usually a few TB). the logs are
persisted to hdfs or s3, which is why spark and hive make such a nice way of
doing work compared to something like postgres.

also, it's nice to plop json, avro, csvs, parquet, or whatever data in storage
and just query/join/analyze it. no need to put the story on hold because you
are waiting for the oracle dba to increase space again.

------
thomasfedb
Turns out that people are actually kinda smart - toss in some raw cycles to
handle the mind-numbing bits and you can have a solid system that does smart
things en masse.

------
altitudinous
I'm not sure about this article, but there is certainly scope for an article
named "You don't need Blockchain, you need SQL/a Database"

------
sacado2
The real problem is people (including the author of the article, apparently)
think ML is necessarily some kind of ultra-complicated technique that needs a
PhD and a GPU. But, come on, 80% of the time you can use ML, dead-easy
techniques are more than enough.

I mean, the author is talking about how SQL is good old 40-year-old tech. In
the meantime, one of the simplest ML algorithms, linear regression, is about
200 years old, even older (AFAIK) than Ada's program for Babbage's machine.
It's very easy to understand and implement, and even Excel has it as a
standard function.

Sure, linear/logistic regression or naive bayes won't help you tag pictures
with text à la facebook "this is a picture of a young man dancing with a red
shirt", but the vast majority of use cases of ML are way easier, anyway. So
yes, most of the time, you can easily find "talents" that will solve your ML
problems. And if you really want to, you can implement it in SQL.

------
jaequery
sql is great but i am still waiting for the successor to sql. sql was made for
relational data. but relational data with nested data structures, kind of like
postgres and jsonb but built with that in mind from the ground up, is what i'd
really like to see.

~~~
TheRealPomax
Why? What _problem_ do they solve that RE can't, other than putting the
"consuming service" data structure into the database, instead of putting the
data into the database and selecting appropriately? I've not seen a good
justification for these things yet, other than "convenience" so the client
"needs less code to structure the data", which is almost always a false
saving.

------
emersonrsantos
Previous post/discussion:
[https://news.ycombinator.com/item?id=16898827](https://news.ycombinator.com/item?id=16898827)

------
shrumm
Or maybe you can do ML with SQL.... Postgres can do basic linear regression, I
did this a couple of times for an analysis and found it pretty handy.
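
PostgreSQL ships `regr_slope(Y, X)` and `regr_intercept(Y, X)` aggregates, which may be what is meant here. As a minimal sketch of the same idea that runs without a database server, the example below uses SQLite through Python's `sqlite3` module and spells out the closed-form least-squares formulas with plain aggregates. The `orders` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (coupon_dollars REAL, order_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(0, 40), (5, 50), (10, 60), (15, 70), (20, 80)],
)

# Closed-form simple linear regression written with plain aggregates:
#   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
#   intercept = (Sy - slope*Sx) / n
slope, intercept = conn.execute(
    """
    WITH s AS (
        SELECT COUNT(*)                             AS n,
               SUM(coupon_dollars)                  AS sx,
               SUM(order_total)                     AS sy,
               SUM(coupon_dollars * order_total)    AS sxy,
               SUM(coupon_dollars * coupon_dollars) AS sxx
        FROM orders
    )
    SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx) AS slope,
           (sy - (n * sxy - sx * sy) / (n * sxx - sx * sx) * sx) / n AS intercept
    FROM s
    """
).fetchone()

print(slope, intercept)
```

The made-up rows lie exactly on a line (order total = 40 + 2 × coupon), so the query recovers a slope of 2 and an intercept of 40.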

~~~
zkomp
What? not having to choose either or, actually using good tools for the job,
solving the problem instead of mindlessly following the hype? /s

------
swanson
What was the Twitter thread/HN thread referenced about using "boring"
approaches to solving problems?

~~~
minimaxir
This one:
[https://news.ycombinator.com/item?id=16898827](https://news.ycombinator.com/item?id=16898827)

------
visarga
Counting items by value is a maximum likelihood estimation method too. It's
still ML if you do a count, group by, max or threshold - just a less
sophisticated way of doing things. The Naive Bayes algorithm is implemented by
counting, at its base.
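
A minimal sketch of that claim, with an invented purchase log: for a categorical distribution, the maximum likelihood estimate of each category's probability is just its relative frequency, so a count, group by, and max already amount to fitting a (very simple) statistical model.

```python
from collections import Counter

# Made-up purchase log: one category per purchased item.
purchases = ["book", "shoes", "book", "sunglasses", "book", "shoes"]

counts = Counter(purchases)  # the GROUP BY / COUNT step
total = sum(counts.values())

# For a categorical distribution, the maximum likelihood estimate of each
# category's probability is its relative frequency.
mle = {category: n / total for category, n in counts.items()}

most_likely = max(mle, key=mle.get)  # the MAX step
print(mle)
print(most_likely)
```

Naive Bayes builds on the same counts, computed per class and combined under an independence assumption.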

~~~
anothergoogler
So any reduction is ML?

~~~
visarga
Only if it fits to an existing statistical model.

~~~
anothergoogler
How about counting the elements of a linked list.

------
flatfilefan
When you already know the query logic or the logic is easy to derive - use SQL
if you can. For more complex stuff ML may work as your rule derivation
mechanism.

------
slifin
I thought graph databases were the canonical implementation for recommendation
systems, one of the few use cases I'd not go straight to sql

