Ask HN: Where is AI/ML actually adding value at your company? - mkrecny
======
altshiftprtscrn
I work in manufacturing. We have an acoustic microscope that scans parts with
the goal of identifying internal defects (typically particulate trapped in
epoxy bonds). It's pretty hard to define what size/shape/position/number of
particles is worthy of failing the device. Our final product test can tell us
what product is "good" and "bad" based on electrical measurements, but that
test can't be applied at the stage of assembly where we care to identify the
defect.

I recently demonstrated a really simple bagged-decision-tree model that
"predicts" whether the scanned part will go on to fail downstream testing with
~95% accuracy. I honestly don't have a whole lot of background in ML, so it's
entirely possible that I'm one of those dreaded types who apply principles
without fully understanding them (and yes, I do actually feel quite guilty
about it).
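
For anyone curious what such a model looks like in practice, here is a minimal
sketch (not the actual production model; the scan features and pass/fail rule
below are invented for illustration) using scikit-learn's bagged decision trees:

```python
# Toy sketch of a bagged-decision-tree defect classifier. The features and
# the failure rule are made up for illustration only.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features from the acoustic scan: particle size, particle
# count, and distance of the largest particle from the bond edge.
X = rng.normal(size=(n, 3))
# Invented ground truth: large particles close to the edge fail downstream.
y = ((X[:, 0] > 0.5) & (X[:, 2] < 0.0)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                          random_state=0)
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.2f}")
```

On real scan data the features and labels would come from the microscope and
the downstream electrical test, and you'd validate against scrap-cost impact
rather than raw accuracy.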

The results speak for themselves though - $1M/year scrap cost avoided (if the
model is approved for production use) in just being able to tell earlier in
the line when something has gone wrong. That's on one product, in one factory,
in one company that has over 100 factories world-wide.

The experience has prompted me to go back to school to learn this stuff more
formally. There is immense value to be found (or rather, waste to be avoided)
using ML in complex manufacturing/supply-chain environments.

~~~
whistlerbrk
This is brilliant, would love to read a full write up on it. I hope you get a
big raise.

~~~
shas3
Surely it would be guarded as a trade secret, as usually happens in large
companies.

~~~
altshiftprtscrn
Yup - To do a proper write-up that would actually be interesting to read would
require divulging IP.

------
sidlls
The entire product I built over the last year can be reduced to basic
statistics (e.g. ratios, probabilities) but because of the hype train we build
"models" and "predict" certain outcomes over a data set.

One of the products the company I work for sells more or less attempts to find
duplicate entries in a large, unclean data set with "machine learning."

The value added isn't in the use of ML techniques itself, it's in the hype
train that fills the Valley these days: our customers see "Data Science
product" and don't get that it's really basic predictive analytics under the
hood. I'm not sure the product would actually sell as well as it does without
that labeling.

To clarify: the company I work for actually uses ML. I actually work on the
data science team at my company. My opinion is that we don't actually need to
do these things, as our products could be built without the sophistication of
even the basic techniques, but that battle was lost before I joined.

~~~
Bartweiss
It's interesting to me that with all the ML hype, it's still not clear what
constitutes ML. A basic k-means or naive Bayes approach will show up in ML
textbooks, but those aren't clearly different from "use some statistics to
make a prediction".

There's an interesting group of marginal approaches that have existed as-is
for years, but have increasingly focused their branding on machine learning as
its profile has risen.

~~~
jorgemf
> but those aren't clearly different from "use some statistics to make a
> prediction"

You can reduce 90% of ML to this. Even neural networks are based on
statistics.

If I have to draw a line between statistics and ML, it's that ML learns: it
can predict things, whereas statistics only gives you information about the
data you have. But for sure statistics and ML overlap a lot.

~~~
Bartweiss
Even that doesn't seem like a clear distinction?

If you ask me for the most likely new value for a dataset, I won't know. But
if I graph a few things and then write a function to spit back the current
mean or median, is that machine learning?

I'm not trying to be snarky there, I agree that the bulk of ML tools are
fundamentally just statistical tricks with some layer of abstraction. As a
result, I have a lot of trouble knowing how much abstraction justifies the ML
title. I see some people using "statistics to produce unintuitive solutions"
as a standard, but that just invites the question: _unintuitive to whom_?

~~~
BickNowstrom
I feel like it is foremost a matter of attitude of the practitioner. An
applied statistician and a machine learning engineer may deliver exactly the
same end product; just the reasoning and assumptions differ. Machine learning
makes few to no assumptions, whereas statisticians rely on them. I also feel
that machine learning engineers have a bit less fear of building black boxes.

Caruana showed the cartoon of the difference between a statistician and a
machine learning practitioner by showing a cliff. The statistician carefully
inches to the edge, stomping her feet to see if the ground is still stable,
then 10 meters before the edge she stops and draws her conclusions. The
machine learning practitioner dives headfirst from the cliff, with a parachute
that reads "cross-validation".

See also:

[http://norvig.com/chomsky.html](http://norvig.com/chomsky.html) On Chomsky
and the Two Cultures of Statistical Learning.

And
[http://projecteuclid.org/euclid.ss/1009213726](http://projecteuclid.org/euclid.ss/1009213726)
Statistical Modeling: The Two Cultures by Leo Breiman.

and this joke:

> Norvig teamed up with a Stanford statistician to prove that statisticians,
> data scientists and mathematicians think the same way. They hypothesized
> that, if they all received the same dataset, worked on it, and came back
> together, they’d find they all independently used the same techniques. So,
> they got a very large dataset and shared it between them.

> Norvig used the whole dataset and built a complex predictive model. The
> statistician took a 1% sample of the dataset, discarded the rest, and showed
> that the data met certain assumptions.

> The mathematician, believe it or not, didn’t even look at the dataset.
> Rather, he proved the characteristics of various formulas that could (in
> theory) be applied to the data.

------
ekarulf
Amazon Personalization.

We use ML/Deep Learning for customer to product recommendations and product to
product recommendations. For years we used only algorithms based on basic
statistics, but we've found places where the machine-learned models outperform
the simpler models.

Here is our blog post and related GitHub repo:
[https://aws.amazon.com/blogs/big-data/generating-recommendat...](https://aws.amazon.com/blogs/big-data/generating-recommendations-at-amazon-scale-with-apache-spark-and-amazon-dsstne/)
[https://github.com/amznlabs/amazon-dsstne](https://github.com/amznlabs/amazon-dsstne)

If you are interested in this space, we're always hiring. Shoot me an email
($my_hn_username@amazon.com) or visit
[https://www.amazon.jobs/en/teams/personalization-and-recomme...](https://www.amazon.jobs/en/teams/personalization-and-recommendations)

~~~
brianwawok
So is this like the Amazon "feature" where I buy a coffee table on Amazon,
then I get suggested to buy a coffee table EVERY DAY for 3 months. Literally
row after row of coffee table? Because there must be a big pool of people who
buy 1 coffee table buying more coffee tables immediately after?

~~~
ekarulf
It's a hard problem to determine the repeat purchase cadence of a product. At
one end of the bell curve you have items repurchased frequently, e.g. diapers
or groceries, and at the other end you have items that are rarely repurchased.

I haven't looked at coffee tables specifically, but I know when I've looked at
home products in the past I've been surprised at how frequently people will
buy two large items, e.g. TVs or furniture, within a short period. That said,
I agree there is room for improvement here. We're constantly running
experiments to improve the customer experience, I have faith that in the limit
things will improve. Again, we have no shortage of experimental power so if
you'd like to join in the experimentation let me know :)

~~~
nimchimpsky
"It's a hard problem to determine the repeat purchase cadence of a product."

I don't think it is.

~~~
gberger
Do you work at Amazon, or do you have experience in this area? Care to
elaborate?

------
strebler
We're a computer vision company, we do a lot of product detection +
recognition + search, primarily for retailers, but we've also got revenue in
other verticals with large volumes of imagery. My co-founder and I both did
our theses on computer vision.

In our space, the recent AI / ML advances have made things possible that were
simply not realistic before.

That being said, the hype around Deep Learning is getting pretty bad. Several
of our competitors have gone out of business (even though they were using the
magic of Deep Learning). For example, JustVisual went under a couple of months
ago ($20M+ raised) and Slyce ($50M+ raised) is apparently being sold for
pennies on the dollar later this month.

Yes, Deep Learning has made some very fundamental advances, but that doesn't
mean it's going to make money just as magically!

~~~
dchuk
Can you expand more on "we do a lot of product detection + recognition +
search, primarily for retailers" please? Is that something like identifying
products in social media images or something?

~~~
strebler
We have several products, each of which serves different departments within
retailers.

The exact things we do depend entirely on which department(s) are licensing
it. Basically, anywhere there's a product image (from their own inventory to
mobile to social) and we can provide some kind of help, we do. Every
department needs totally different things, so it varies quite a bit...but it's
all leveraging our core automated detection + recognition + search APIs.

------
jngiam1
From Coursera - we use ML in a few places:

1. Course Recommendations. We use low-rank matrix factorization approaches to
do recommendations, and are also looking into integrating other information
sources (such as your career goals).

2. Search. Results are relevance-ranked based on a variety of signals, from
popularity to learner preferences.

3. Learning. There's a lot of untapped potential here. We have done some
research into peer grading de-biasing [1] and worked with folks at Stanford on
studying how people learn to code [2].
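
Not Coursera's actual system, but the low-rank factorization idea in point 1
can be sketched in a few lines of numpy. The learner-by-course rating matrix
below is invented, and treating missing entries as zeros is a simplification;
a real recommender would mask them instead:

```python
import numpy as np

# Rows = learners, columns = courses, entries = ratings (0 = unobserved).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Rank-2 truncated SVD as the low-rank factorization R ~ U S V^T.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend the highest-scoring unobserved course for learner 0.
unseen = np.where(R[0] == 0)[0]
best = unseen[np.argmax(R_hat[0, unseen])]
print("recommend course", best)
```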

We recently co-organized a NIPS workshop on ML for Education:
[http://ml4ed.cc](http://ml4ed.cc). There's untapped potential in using ML to
improve education.

[1] [https://arxiv.org/pdf/1307.2579.pdf](https://arxiv.org/pdf/1307.2579.pdf)

[2] [http://jonathan-huang.org/research/pubs/moocshop13/codeweb.h...](http://jonathan-huang.org/research/pubs/moocshop13/codeweb.html)

~~~
zardeh
I'm curious, because this is something I was interested in doing for
brick-and-mortar universities: what signals do you use for your
recommendations? That is, is it just an x/5 rating per user thrown into a
latent factor model, or do you do anything else (like manually dividing course
"grade" vs. opinion along two axes)?

------
jakozaur
At Sumo Logic we do "grep in the cloud as a service". We use machine learning
to do pattern clustering: learning from lines of text which printf statements
they came from.

The primary advantage for customers is that logs are easier to use and faster
to troubleshoot.

[https://www.sumologic.com/resource/featured-videos/demo-sumo...](https://www.sumologic.com/resource/featured-videos/demo-sumo-logic-log-reduce-next-generation-log-analytics-featured-video/)

~~~
wastedbrains
Just a happy Sumo Logic user, saying hello and thanks! Most of your product is
great (I am an ex-Splunk user)... The biggest complaint is that I can't
cmd+click to open anything in new tabs, as the front end is so JS-heavy.

Overall the pattern matching stuff is pretty cool. Also, I would like to see
the raw logs around this for when I am trying to debug event grouping errors
based on the starting regex.

~~~
jakozaur
Can you elaborate on your improvement proposals?

E.g. with LogReduce you can click on a group and see the log lines that belong
to it. Is that something that solves your problem, or are you looking for
something else?

Feel free to send me an email (it is on my profile).

------
ksimek
Here at Matterport, our research team is using deep learning to understand the
3D spaces scanned by our customers. Deep learning is great for a company like
ours, where so much of our data is visual in nature and extracting that
information in a high-throughput way would have been impossible before the
advent of deep learning.

One way we're applying this is automatic creation of panoramic tours. Real
estate is a big market for us, and a key differentiator of our product is the
ability to create a tour of a home that will play automatically as either a
slideshow or a 3D fly-through. The problem is, creating these tours manually
takes time, as it requires navigating a 3D model to find the best views of
each room. We know these tours add significant value when selling a home, but
many of our customers don't have the time to create them. In our research lab
we're using deep learning to create tours automatically by identifying
different rooms of the house and what views of them tend to be appealing. We
are drawing from a training set of roughly a million user-generated views from
manually created guided tours, a decent portion of which are labelled with
room type.

It's less far along, but we're also looking at semantic segmentation for 3D
geometry estimation, deep learning for improved depth data quality, and other
applications of deep learning to 3D data. Our customers have scanned about
370,000 buildings, which works out to around 300 million RGBD images of real
places.

~~~
anantzoid
Interesting. What is your training objective in deciding which view of the
room would be the most appealing? Also, are you looking into generative models
for creating new views from different angles based on existing views?

~~~
ksimek
Our users have manually done a lot of the tasks we want to eventually do
automatically, which effectively becomes data annotations for us to train on.

------
malisper
One of my coworkers used basic reinforcement learning to automate a task
someone used to have to do manually. We have two data ingestion pipelines. One
that we ingest immediately, and a second for our larger customers which is
throttled during the day and ingested at night. For the throttled pipeline, we
initially had hard coded rate limits, but as we made changes to our
infrastructure, the throttle was processing a different amount than it should
have been. Sometimes it would process too much, and we would start to see
latency build up in our normal pipeline, and other times it processed too
little. For a short period of time, we had the hard coded throttle with a
Slack command to override the default. This allowed an engineer to change the
rate limit if they saw we were ingesting too little or too much. While this
worked, it was common that an engineer wasn't paying attention, and we would
process the wrong amount for a period of time. One of my coworkers used
extremely basic reinforcement learning to make the throttle dynamic. It looks
at the latency of the normal ingestion pipeline, and based on that, decides
how high to set the rate limit on the throttled pipeline. Thanks to him, the
throttle will automatically process as much as it can, and no one needs to
watch it.
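
The feedback idea reads roughly like a proportional controller. A toy sketch,
with all names and numbers invented (the real system presumably tracks more
state):

```python
# Nudge a rate limit up or down based on observed latency of the primary
# pipeline. A pure-Python proportional controller, for illustration.
def adjust_rate_limit(rate_limit, observed_latency_ms,
                      target_latency_ms=500.0, gain=0.1,
                      min_rate=100.0, max_rate=10_000.0):
    """Scale the limit proportionally to the latency error, with clamping."""
    error = target_latency_ms - observed_latency_ms
    new_rate = rate_limit * (1 + gain * error / target_latency_ms)
    return max(min_rate, min(max_rate, new_rate))

rate = 1000.0
for latency in [400, 450, 700, 900, 500]:  # simulated latency samples
    rate = adjust_rate_limit(rate, latency)
    print(f"latency={latency}ms -> rate limit={rate:.0f}")
```

When latency runs below target the limit creeps up; when latency builds, it
backs off, which is the behavior described above.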

The same coworker also used decision trees to analyze query performance. He
trained a decision tree on the words contained in the raw SQL query and the
query plan. Anyone could then read the decision tree to understand what
properties of a query made it slow. There have been times when we've noticed
some queries exhibiting odd behavior, such as unusually high planning time.
When something like this happens, we can train a decision tree based on the
odd behavior and then read it to see which queries share that behavior.
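
A minimal sketch of that workflow (the queries and keyword vocabulary are
invented, and the real feature set also included the query plan):

```python
# Featurize queries by keyword presence, fit a small decision tree, and
# read it back as human-readable split rules.
from sklearn.tree import DecisionTreeClassifier, export_text

queries = [
    ("SELECT * FROM orders JOIN users ON ...", 1),   # 1 = slow
    ("SELECT id FROM users WHERE id = 1", 0),        # 0 = fast
    ("SELECT * FROM logs JOIN events ON ...", 1),
    ("SELECT name FROM users LIMIT 10", 0),
]
vocab = ["JOIN", "LIMIT", "WHERE"]
X = [[int(word in q) for word in vocab] for q, _ in queries]
y = [label for _, label in queries]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=vocab))  # readable rules for humans
```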

~~~
jcoffland
It sounds like a simple PID loop would be sufficient to solve this problem.
You have a control valve and an error signal. No need for anything more
complicated.

~~~
malisper
It is a PID loop, which I guess may not be considered to be actual
reinforcement learning.

------
antognini
At Persyst we use neural networks for EEG interpretation. Our latest version
has human-level performance for epileptogenic spike detection. We are now
working on bringing the seizure detection algorithm to human-level
performance.

~~~
bluetwo
I was wondering the other day if anyone had applied this technology to EKGs.
Do you also do that?

~~~
antognini
Funny you should ask, detecting QRS complexes has been my first project since
starting here. I know of a few papers where the authors have applied neural
networks to EKGs, but the applications have been purely academic. I'm not
aware of any other companies that use NNs in practice. (There may well be
some, but they tend to be secretive about how their algorithms work.) At any
rate, the false positive rate of our software is now about an order of
magnitude lower than anything else on the market.

~~~
bluetwo
Congrats on your application. Sounds very useful.

And thanks for the info. I worked years ago on a training program for EKGs and
it seemed like a field ripe for application of ML and AI.

------
Flammy
The startup I'm part of uses ML to predict which end users are likely to churn
for our customers.

We work with B2B and B2C SaaS, mobile apps and games, and e-commerce. For each
of them, it is a generalized solution customized to tell them which end users
are most at risk of churning. The prediction horizon varies with their
customer lifecycles, but for the longest lifecycles we can, with high
precision, predict churn more than 6 months ahead of actual attrition.

Even more important than "who is at risk?" is "why are they at risk?". To
answer this we highlight patterns and sets of behavior that are positively and
negatively associated with churn, so that our customers have a reason to reach
out, and are armed with specific behaviors they want to encourage, discourage,
or modify.

This enables our customers to try to save their accounts / users. This can
work through a variety of means, campaigns being the most common. For our B2B
customers, the account managers have high confidence about whom they need to
contact and why.

All of this includes regular model retraining, to take into account new user
events and behaviors, new product updates, etc. We are confident in our
solution and offer our customers a free trial to allow us to prove ourselves.

I can't share details, but we just signed our biggest contract yet, as of this
morning. :)

For more [http://appuri.com/](http://appuri.com/)

A recent whitepaper "Predicting User Churn with Machine Learning"
[http://resources.appuri.com/predicting_user_churn_ml/](http://resources.appuri.com/predicting_user_churn_ml/)

~~~
davedx
We're a very retention focused energy company. I just signed up for a trial.
Count me interested! :)

------
johndavi
We exclusively rely on ML for our core product at Diffbot: automatic data
extraction from web pages (articles, products, images, discussion threads,
more in the pipeline), cross-site data normalization, etc. It's interesting
and challenging work, but a definite point of pride for us to be a profitable
AI-powered entity.

~~~
infinite8s
Are you guys familiar with the DeepDive work from Christopher Re's group at
Stanford?

~~~
LolWolf
Or his company Lattice for that matter.

~~~
johndavi
Yes to both!

------
HockeyPlayer
Our low-latency trading group uses regression widely. We have experimented
with more complex models but haven't found a compelling use for them yet.

------
iamed2
We use ML to model complex interactions in electrical grids in order to make
decisions that improve grid efficiency, which has been (at least in the short
term) more effective than using an optimizer and trying to iterate on problem
specification to get better results.

Generally speaking, I think if you know your data relationships you don't need
ML. If you don't, it can be especially useful.

~~~
huevosabio
Interesting, do you have a write up for someone interested in the field? What
company do you work for?

------
got2surf
My company builds software to analyze customer feedback.

We use "real" ML for sentiment classification, as well as some of our natural
language processing and opinion mining tools. However, most of the value comes
from simple statistical analysis/probabilities/ratios, as other commenters
mentioned. The ML is really important for determining that a certain customer
was angry in a feedback comment, but less important in highlighting trending
topics over time, for example.

~~~
activatedgeek
What do you mean by "real"?

~~~
got2surf
Sorry, using "real" in quotes wasn't too descriptive.

A few machine learning-based classifiers (we've used Bayesian and SVM
approaches). Word embeddings and topic modeling (similar to word2vec) which
are based on shallow neural networks.

Those are a few of what I would consider the "real" machine learning tools we
use. Most of the application, though, is statistics/pattern
recognition/visualizations on top of the data calculated by the ML approaches.
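
As a concrete (invented) example of that classifier side, a Bayesian
bag-of-words model for angry vs. neutral feedback can be sketched like this;
it is not the company's actual pipeline, and the training comments are made up:

```python
# Minimal naive-Bayes sentiment classifier over bag-of-words counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "this is terrible, I am furious",
    "worst support experience ever, very angry",
    "works fine, thanks",
    "happy with the product overall",
]
train_labels = ["angry", "angry", "neutral", "neutral"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["I am so angry about this terrible update"]))
```

A real system would train on far more labeled feedback and feed the predicted
labels into the downstream statistics and visualizations.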

The interesting thing is (in my opinion/experience) that a 10% improvement in
some of the ML performance (a 10% increase in accuracy, for example) will
translate to a 1-3% improvement in end user experience (they see slightly
better insights and patterns, but it is a marginal improvement). On the other
hand, layering a new visualization or statistical heuristic on top of the data
can lead to a significant boost in user experience.

Again, this is just for our specific application/domain, but we focus on
making the ML results more _accessible_ to users instead of focusing on the
marginal accuracy of the ML results themselves.

------
quantumhobbit
Detecting fraud. I work for a credit card company.

Not really a new application though...

------
BickNowstrom
FinTech: Credit risk modeling. Spend prediction. Loss prediction. Fraud and
AML detection. Intrusion detection. Email routing. Bandit testing. Optimizing
planning/ task scheduling. Customer segmentation. Face- and document
detection. Search/analytics. Chat bots. Sentiment analysis. Topic analysis.
Churn detection.

~~~
collyw
I can imagine that fin tech will love it. Everything will go wrong one day in
the future and no one will know the reason.

~~~
BickNowstrom
Why would they love something that goes wrong?

------
fnovd
We've been using "lite" ML for phenotype adjudication in electronic health
records with mild success. Random forests and support vector machines will
outperform simple linear regression when disease symptoms/progression don't
neatly map to hospital billing codes.
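
A toy illustration of that claim, with invented binary "codes" and an
XOR-style outcome that no linear model can represent but a forest captures
easily:

```python
# Compare logistic regression vs. a random forest on an XOR relationship.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2)).astype(float)  # two binary "codes"
y = (X[:, 0] != X[:, 1]).astype(int)  # XOR: invisible to a linear boundary

lin = LogisticRegression().fit(X, y)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
print("linear:", lin.score(X, y), "forest:", rf.score(X, y))
```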

------
NumberCruncher
In my last job at a big telco I was working with/on a scorecard-driven
next-best-offer system steering 80-90% of all outbound call-center activities.
I would not call it AI/ML because the scorecards were built with good old
logistic regression and were pretty old (read: bad), but the process made us
€25M/year (calculated NPV). I don't know how much of that was added by the
scoring process. We also had a real-time system for SMS marketing built on top
of the same next-best-offer system making €12M+/year (real profit).

On the other hand, I found an internal fraud costing us €2-3M/year by applying
only the weak law of large numbers. Big corp, big numbers.

Now I build a similar system for a smaller company. I think we will stick
mainly to logistic regression. I actually use "neural networks" with hand-
crafted hidden layers to identify buying patterns in our grocery store
shopping cart data. It works pretty well from a statistical point of view but
it is still a gimmick used to acquire new b2b partners.

------
iasondemiros
Here at Qualia (qualia.ai) we process mostly textual data from online sources
(news, blogs, social media, internal data). Our background is in NLP from back
in the days when AI meant deep parsing, HPSG, tree-adjoining grammars, synsets,
frames and speech acts, discourse, and different flavors of knowledge
representations. It also meant LISP and Prolog. The domain quickly evolved
from knowledge- and rule-based to data-driven and statistical, mostly thanks
to Brown and the IBM MT team in the 90s (who are now part of the Renaissance
Fund).

We use hierarchical clustering for topic detection. We also work on topic
models (Blei and his legacy). We use word embeddings for information retrieval
and various ML algorithms for different applications of mood and emotional
learning: Bayes, SVM, Winnow (linear models) and sometimes decision trees and
lists. We also learn from past events and crises in order to create models,
mostly statistical, and try to estimate how an event might evolve in the
future. We have also tried graph-based community detection algorithms on
Twitter (min-cut). Finally we have experimented with non-linear statistical
analysis on micro-blogging data, by applying methods such as correlation
functions, escape times, and multi-step Markov chains (but with limited
success).

I'd like to add here that I feel ML is well defined (supervised, semi-
supervised, unsupervised, and using unlabeled data), statistical learning is
more fuzzy (a good starting point is Vapnik's work) and regarding AI, I am not
sure I know any more what it means! I am always open to discussion and ideas.
Let me know.

------
ilikeatari
We leverage machine learning in the asset replacement modeling space.
Basically there is an optimum time to sell your vehicle and purchase a new one
based on our model. Our company works with large fleet organizations and
provides an analytics suite for vehicle replacement, mechanic staffing,
benchmarking, telematics, and other aspects of fleet management.

~~~
pacey
Is this useful for individuals also? I would really like to know the optimal
time to sell my car. Or is this more like chart analysis, which only works as
long as access to that information is limited?

~~~
ilikeatari
Theoretically it could be, but most fleets gather much more data about their
vehicles than an average consumer: for example, all repairs, parts and labor
costs, maintenance, mileage, engine hours, and much more. In addition, the
majority now leverage telematics, which greatly improves the resolution and
depth of this data. That data is necessary to make our model work. From a
high-level perspective, though, most consumers sell their vehicles well before
the optimal time frame.

------
AustinBGibbons
I work at Periscope Data - we do our own lead scoring using home-baked ML
through SciPy. It was interesting to see it play out in the real-world -
interpretation of features/parameters was definitely important to the people
driving the marketing/sales orgs.

We also support linear regression in the product itself - it was actually an
on-boarding project for one of the engineers who joined this year, and he
wrote a blog post to show them off:
[https://www.periscopedata.com/blog/movie-trendlines.html](https://www.periscopedata.com/blog/movie-trendlines.html)
About 1/3rd of our customers are using trendlines, which is pretty good, but
we haven't gotten enough requests for more complex ML algorithms to warrant
focusing feature development there yet.

------
AndrewKemendo
We use convolutional networks for semantic segmentation [1] to identify
objects in the user's environment to build better recommendation systems, and
to identify planes (floor, wall, ceiling) to give us better localization of
the camera pose for height estimates. All from RGB images.

[1]
[https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn...](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)

------
Schwolop
Once an analyst has manually reviewed something, a software system updates a
row in a database to mark it as done. Our marketing team calls this machine
learning, because the system "learns" not to give analysts the same work
twice.

We also use ML to classify bittorrent filenames into media categories, but
it's pretty trivial and frankly the initial heuristics applied to clean the
data do more of the work than the ML achieves.

------
saguppa
We use deep learning at attentive.ai to generate alerts based on unusual
events in surveillance video.

We use neural nets to generate descriptors of videos where motion is observed,
and classify events as normal/abnormal.

------
splike
Based on past experimental data, we use ML to predict how effective a given
CRISPR target site will be. This information is very valuable to our clients.

~~~
infinite8s
That sounds interesting, especially given that a good enough physical model
could compute that de novo.

------
peterhunt
Machine learning is great for helping you understand a new dataset quickly. I
often train a basic logistic regression classifier and introspect the
coefficients to learn what features are important, which are unimportant, and
how they are correlated.

There are a number of other statistical techniques you can use for this but
scikit-learn makes this very very easy to do.
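
For example, under invented data (one informative feature, one pure-noise
feature), the coefficient magnitudes surface the important feature
immediately:

```python
# Fit a quick logistic regression and read the coefficients to see which
# features matter. Toy data with one signal and one noise feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)   # strongly predictive feature
noise = rng.normal(size=n)    # irrelevant feature
X = np.column_stack([signal, noise])
y = (signal + 0.1 * rng.normal(size=n) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
for name, coef in zip(["signal", "noise"], clf.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```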

------
mattkrea
Pretty basic here.. we are a payments processor so we check volume, average
ticket $, credit score and things of that nature to determine the quality and
lifetime of a new merchant account.

------
sbashyal
- We use a complex multivariate model to predict customer conversion and
prioritize lead response

- We use text analysis to improve content for effectiveness and conversion

Among other things.

~~~
ichiragmandot
Can you please explain complex multivariate model in detail? I am curious to
learn about it

~~~
nickpsecurity
Type "Introduction/Tutorial Multivariate Statistics" into Google. I saw quite
a few results with those words in the title. Probably what you want.

------
CardenB
I would suspect AI/ML profits come largely from improving ad revenue at very
stable companies.

------
jc4p
I think a lot of the real benefit from ML "at work" comes from just cleaning
the data and running through the gauntlet of the simplest regressions (before
jumping to something more magical whose outputs and decision-making process
you can't exactly explain to someone).

I would classify something like this blog post as ML, would you?
[http://stackoverflow.blog/2016/11/How-Do-Developers-in-New-Y...](http://stackoverflow.blog/2016/11/How-Do-Developers-in-New-York-San-Francisco-London-and-Bangalore-Differ/)

~~~
Bartweiss
When people talk about the growth (or sometimes 'excess') of ML solutions
these days, I always wonder about this.

A basic linear regression probably isn't ML, a backprop neural net clearly is,
but somewhere between the two is a very fuzzy line between "statistics and
data cleaning" and "actually machine learning". I think a lot of people have
just pushed the ML angle of an already-reasonable approach to tie into that
popularity.

~~~
yessql
ML courses often start with linear regression, and if you build up complicated
polynomials to find a nonintuitive model of your problem, I would definitely
consider that machine learning.
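
That case is easy to make concrete: a plain least-squares fit on polynomial
features recovers a nonlinear function exactly, and whether to call it "ML" is
precisely the question (toy, noise-free data):

```python
# Linear regression on degree-3 polynomial features of x, fitting a cubic.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = x.ravel() ** 3 - 2 * x.ravel()   # nonlinear ground truth, no noise

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print("R^2:", model.score(x, y))  # degree-3 features recover the cubic
```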

~~~
sidlls
I wouldn't. I'd call it "basic computational statistics." But I think I might
be in the minority on that.

~~~
Bartweiss
I've always thought of regressions, even high-order ones, as just a
statistical tool. They're present at the start of ML courses, sure, but as a
tool used in ML techniques or a good alternative to them.

It looks like that's not the standard view, though.

~~~
LolWolf
The whole deal seems weird to me.

Neural networks are just function approximators, so why isn't a linear
regressor of k-th order (e.g. a Taylor expansion up to k-th order) also
considered "ML"? What's the distinction here?

------
taytus
_Raising money from clueless investors_

~~~
smoyer
I've got some ocean-front property in Arizona I'd like to sell ... I know it's
a premium price but it's worth it!

~~~
iamroot
"Our AI models show that California will sink into the ocean, and you'll be
ahead of the curve with your own Arizona Bay beachfront property!"

~~~
smoyer
Numerai is analyzing HN comments as a metric for choosing stock trades and has
just shorted all west coast companies.

------
lost_name
Nothing in my department yet, but we actually have a guy actively looking for
a reason to implement some kind of ML so we can say our product "has it" I
guess.

~~~
eon1
Yep, our tech guys are constantly looking for ways to implement things that
may or may not be useful, or even understood; we've just gotta be able to say
we have the latest in machine learned blockchain-based buzzword doodads to
constantly reinforce our reputation as the most "high tech" organisation in
our sector.

~~~
kmikeym
Sounds like you work at Xerox!

------
agibsonccc
I run a deep learning company focused largely on banking and telco fraud
workloads like [1]. We have also used deep learning to predict failing
services, so we can auto-migrate workloads before server failure.

The bulk of what we do is anomaly detection.

[1] [https://skymind.io/case-studies](https://skymind.io/case-studies)

[2] [http://insights.ubuntu.com/2016/04/25/making-deep-learning-accessible-on-openstack/](http://insights.ubuntu.com/2016/04/25/making-deep-learning-accessible-on-openstack/)

------
jgalloway___
We realized that by adjusting training models we could incorporate autonomous
recognition of not only images but intent and behavior into our application
suite.

------
sgt101
Deep learning to identify available space in kit from images! We are dead
proud of it!

Trad learning for many applications: fault detection, risk management for
installations, job allocation, incident detection (early warning of big
things), content recommendation, media purchase advice, others...

Probabilistic learning for inventory repair - but this has not yet had an
impact: the results are great, but the advice has not yet been ratified and
productionised.

------
garysieling
I'm using some of the pre-built libraries to find/fix the low-hanging fruit
of data quality issues for
[https://www.findlectures.com](https://www.findlectures.com), for instance
finding speaker names.

The first pass is usually a regex to find names, then, for what's left, a
natural language tool to find candidate names, and then manual entry.
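
A minimal sketch of that tiered pipeline (the patterns, titles, and the
stubbed NLP step are all invented for illustration; a real system would use a
proper NER tool for pass 2):

```python
import re

# Pass 1: a high-precision regex keyed on honorifics.
TITLE_RE = re.compile(r"(?:Dr|Prof|Mr|Ms)\.\s+([A-Z][a-z]+ [A-Z][a-z]+)")

def nlp_candidates(text):
    # Stand-in for a real NER tool: grab capitalized bigrams as candidates.
    return re.findall(r"\b([A-Z][a-z]+ [A-Z][a-z]+)\b", text)

def find_speaker(title):
    m = TITLE_RE.search(title)
    if m:                      # pass 1: regex hit, accept it
        return m.group(1)
    cands = nlp_candidates(title)
    if len(cands) == 1:        # pass 2: accept an unambiguous NLP candidate
        return cands[0]
    return None                # pass 3: queue for manual entry

print(find_speaker("Keynote by Dr. Ada Lovelace"))   # → Ada Lovelace
print(find_speaker("Grace Hopper on compilers"))     # → Grace Hopper
```

Each pass handles what the cheaper pass before it couldn't, so the expensive
manual step only sees the leftovers.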

------
brockf
At our data science company, we're building a marketing automation platform
that uses deep reinforcement learning to optimize email marketing campaigns.

Marketers create their messages and define their goals (e.g., purchasing a
product, using an app) and it learns what and when to message customers to
drive them towards those goals. Basically, it turns marketing drip campaigns
into a game and learns how to win it :)

We're seeing some pretty great results so far in our private beta (e.g., more
goals reached, fewer emails sent), and we're excited to launch into public
beta later this month.

For more info, check out [https://www.optimail.io](https://www.optimail.io) or
read our Strong blog post at [http://www.strong.io/blog/optimail-email-
marketing-artificia...](http://www.strong.io/blog/optimail-email-marketing-
artificial-intelligence).

~~~
mitbal
That's very interesting case. In my company, we would also like to optimize
email marketing campaign using RL. However, based on my little experience
using RL, (please correct me if I'm wrong) wouldn't it take long to iterate
and update the V and policy function (or Q function if we use Q-learning), so
I'm a bit skeptical if it can be used for real world case where we need to
wait days to get the email response as feedback from the environment.
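
For what it's worth, the update itself is cheap; the delayed reward is the
bottleneck. A toy tabular Q-learning sketch (the states, actions, and click
rates are entirely made up) of the kind of update I mean:

```python
import random

# Toy tabular Q-learning on an invented email-timing problem. The update is
# a one-liner; in production the reward (a click) may arrive days later.
random.seed(0)
ACTIONS = ["send_now", "wait_a_day"]
q = {}                          # (state, action) -> estimated value
alpha, gamma, eps = 0.1, 0.9, 0.2

def choose(state):
    if random.random() < eps:   # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit

def update(state, action, reward, next_state):
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Simulated feedback: waiting pays off more often (60% vs 20% click rate).
for _ in range(2000):
    a = choose("engaged_user")
    p = 0.6 if a == "wait_a_day" else 0.2
    r = 1.0 if random.random() < p else 0.0
    update("engaged_user", a, r, "done")

print(max(ACTIONS, key=lambda a: q.get(("engaged_user", a), 0.0)))
```

Here 2000 simulated episodes take milliseconds; with real email feedback, each
episode takes days, which is exactly the concern.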

~~~
brockf
Great points. It's definitely more challenging than learning to play a simple
arcade game or something, where feedback is invariant and often instantaneous.
To address these challenges, we use a combination of (1) heuristics tailoring
our RL algorithms to the problem at hand, and (2) many converging sources of
feedback. Most importantly, as with any machine learning implementation, it
works in practice: our AI-driven campaigns beat randomized control
conditions!

------
lmeyerov
At Graphistry, we help investigate & correlate events, initially for security
logs. E.g., Splunk & Sumo centralize data and expose grep + bar charts; we
then add visual graph analytics that surfaces entities, events, & how they
connect/correlate: "It started here, then went there, ...". We currently do
basic ML for clustering / dimensionality reduction, where the focus is on
exposing many search hits more sanely.

Also, some GPU goodness for 10-100X visual scale, and now we're working on
investigation automation on top :)

------
room271
Helping to moderate comments on theguardian.com!

[https://skillsmatter.com/skillscasts/9105-detecting-
antisoci...](https://skillsmatter.com/skillscasts/9105-detecting-antisocial-
comments-an-adventure-in-machine-learning-at-theguardian-com)

(We're still beginners, as will be apparent from the video, but it's proving
useful so far. I should note, we do have 'proper' data scientists too, but
they are mostly working on audience analysis/personalisation.)

------
lowglow
We're building models of human behavior to provide interactive intelligent
agents with a conversational interface. AI/ML is literally the backbone of
what we're doing.

------
lnanek2
Providing users the best recommendations so they participate more, get more
from the service, and churn less. Detecting fraud and so saving money.
Predicting users who are about to leave and allowing us to reach out to them.
Dynamic pricing to take optimum advantage of the supply and demand curve.
Delayed release of product so it doesn't all get reserved immediately and
people don't have to camp the release times.

------
tspike
Wrote a grammar checker that used both ML models and rules (which in turn used
e.g. part-of-speech taggers based on ML).

Wrote a system for automatically grading kids' essays (think the lame
"summarize this passage"-type prompts on standardized tests). In that case it
was actually a platform for machine learning - i.e., plumb together feature
modules into modeling modules and compare output model results.

------
tomatohs
At ScreenSquid we use statistical analysis to find screen recordings of the
most active users on your website. This saves our customers a ton of time
otherwise spent playing with filters trying to find "good" recordings.

[https://screensquid.com/2016/12/introducing-star-
ratings/](https://screensquid.com/2016/12/introducing-star-ratings/)

~~~
sappapp
Hierarchical clustering?

------
plafl
Predict probability of car accidents based on the sensors of your smartphone

~~~
mywittyname
How do you turn these predictions into cash?

~~~
soared
Highly targeted ads for lawyers and healthcare after a crash.

------
solresol
Our main product uses machine learning and natural language processing to
predict how long JIRA tickets are going to take to resolve.

(www.queckt.com, if anyone's interested)

Without AI/ML, we wouldn't have a product.

------
wmblaettler
I have a follow-on question for all the respondents: can you briefly describe
the architecture you are using? Cloud-based offering vs. self-hosted,
software libraries used, etc.

------
katkattac
We use machine learning to detect anomalies on our customers' data and alert
them of potential problems. It's not fancy or cutting edge, but it provides
value.
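
"Not fancy" can be as simple as a rolling z-score over a recent window
(illustrative sketch only; the window size and threshold here are invented):

```python
from statistics import mean, stdev

# Flag points that sit far outside the recent baseline, measured in
# standard deviations of a sliding window.
def anomalies(series, window=20, threshold=3.0):
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

data = [10.0 + 0.1 * (i % 5) for i in range(50)]  # steady baseline
data[40] = 25.0                                   # injected spike
print(anomalies(data))  # → [40]
```

Simple, explainable, and often all the "anomaly detection" an alerting
pipeline needs before reaching for anything heavier.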

------
iampims
We use RNNs for voice keyword recognition.

------
vskr
Slightly tangential, but how do you collect training data for the AI/ML
models you are developing?

------
moandcompany
We are using machine learning to identify software as benign software or
malware for customers.

------
pfarnsworth
Sift's product is based on ML.

------
Tankenstein
Lots of KYC things, like fraud, AML and CTF. Helps with finding new patterns.

------
Radim
I run a company that specializes in design & implementation of kick-ass ML
solutions [1]. We've had successful projects in quite a few industries at this
point:

LEGAL INDUSTRY

Aka e-discovery [2]: produce digital documents in legal proceedings.

 _What was special_ : stringent requirements on statistical robustness! (The
opposing party can challenge your process in court -- everything about the way
you build your datasets or measure the production recall has to be absolutely
bulletproof.)

IT & SECURITY

Anomaly detection in system usage patterns (with features like process load,
frequency, volume) using NNs.

 _What was special_ : extra features from document content (type of document
being accessed, topic modeling, classification).

MEDIA

Built tiered IAB classification [3] for magazine and newspaper articles.

Built a topic modeling system to automatically discover themes in large
document collections (articles, tweets), to replace manual taxonomies and
tagging, for consistent KPI tracking.

 _What was special_ : massive data volumes, real-time processing.

REAL ESTATE

Built a recommendation engine that automatically assembles newsletters, and
learns user preferences from their feedback (newsletter clicks), using
multi-armed bandits.

 _What was special_ : exploration / exploitation tradeoff from implicit and
explicit feedback. Topic modeling to get relevant features.
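
As a toy illustration of that tradeoff (the topics and click-through rates
are invented), an epsilon-greedy bandit balances trying every arm against
exploiting the best estimate so far:

```python
import random

# Epsilon-greedy bandit: mostly pick the arm with the best observed
# click rate, but keep exploring the others with probability eps.
random.seed(1)
true_ctr = {"topic_a": 0.05, "topic_b": 0.12, "topic_c": 0.08}
clicks = {t: 0 for t in true_ctr}
shows = {t: 0 for t in true_ctr}

def pick(eps=0.1):
    if random.random() < eps or not any(shows.values()):
        return random.choice(list(true_ctr))      # explore
    return max(true_ctr,                          # exploit best estimate
               key=lambda t: clicks[t] / shows[t] if shows[t] else 0.0)

for _ in range(20000):
    topic = pick()
    shows[topic] += 1
    if random.random() < true_ctr[topic]:         # simulated click
        clicks[topic] += 1

print(max(shows, key=shows.get))  # the arm the bandit settled on
```

With enough feedback, the arm with the highest true click rate dominates the
impressions, while the exploration budget keeps the estimates honest.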

LIBRARY DISCOVERY

Built a search engine (which is called "discovery" in this industry), based on
Elasticsearch.

 _What was special_ : we added a special plugin for "related article"
recommendations, based on semantic analysis of article content (LDA, LSI).

HUMAN RESOURCES (HR)

Advised on an engine to automatically match CVs to job descriptions.

Built an ML engine to automatically route incoming job positions to a
hierarchy of some 1,000 pre-defined job categories.

Built a system to automatically extract structured information from (barely
structured) CV PDFs.

Built an ML system to build "user profiles" from enterprise data (logs,
wikis), then automatically match incoming help requests in plain text to
domain experts.

 _What was special_ : used Bayesian inference to handle knowledge uncertainty
and combine information from multiple sources.

TRANSPORTATION

Built a system to extract structured fixtures and cargoes from unstructured
provider data (emails, attachments).

 _What was special_ : deep learning architecture on character level, to handle
the massive amount of noise and variance.

BANKING

Built a system to automatically navigate banking sites for US banks, and
scrape them on behalf of the user, using their provided username/password/MFA.

 _What was special_ : PITA of headless browsing. The ML part of identifying
forms, pages and transactions was comparatively straightforward.

\--------------

... and a bunch of others :)

Overall, in all cases, it took lots of tinkering and careful analysis to
build something that actually works, as each industry is different and needs
lots of SME. The dream of "turn-key, general-purpose ML" is still a ways off,
recent AI hype notwithstanding.

[1] [http://rare-technologies.com/](http://rare-technologies.com/)

[2]
[https://en.wikipedia.org/wiki/Electronic_discovery](https://en.wikipedia.org/wiki/Electronic_discovery)

[3] [https://www.iab.com/guidelines/iab-quality-assurance-
guideli...](https://www.iab.com/guidelines/iab-quality-assurance-guidelines-
qag-taxonomy/)

------
chudi
We use ML for recommendation systems (I work at a Classifieds company)

------
the-dude
PCB autorouting

~~~
gtsteve
It strikes me that you could do this with an algorithmic approach - is there
some additional factor when building PCBs that makes it specifically hard?

Is this one of those things like the bin packing problem [1] where at first
glance you'd expect it to have a definitive solution but it's actually
deceptively hard?

[1]
[https://en.wikipedia.org/wiki/Bin_packing_problem](https://en.wikipedia.org/wiki/Bin_packing_problem)
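
For a flavor of why heuristics dominate these problems, first-fit decreasing,
the textbook bin-packing heuristic, gives good-enough packings without ever
proving optimality (toy sketch):

```python
# First-fit decreasing: sort items largest-first, then place each into
# the first bin with room, opening a new bin only when none fits.
# Bin packing is NP-hard, so practical tools settle for heuristics like this.
def first_fit_decreasing(items, capacity):
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

packed = first_fit_decreasing([7, 5, 4, 3, 2, 2, 1], capacity=10)
print(len(packed))  # → 3
```

PCB autorouting has the same character: exact solutions are intractable at
scale, so routers lean on heuristics (and, increasingly, learned ones).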

------
fatdog
Can't say for what/where, but, yes. We use it to super-scale the work of
human analysts who evaluate the quality of some stuff.

