
Ask HN: Briefly and succinctly – when is ML helpful for engineering? - photon_off
Given how ML is now the cool thing, I realize this may make me look, um, not cool:

How would you describe what ML can offer me, a non-ML-savvy tech organization?

When does it work better than manually programming something?

Which kinds of data work best?

How extreme can the input data be before the model gives nonsense results? And will it "know" how hard it is guessing?

What is the format of its results?

... etc ...

Tangentially, I may have many, many more follow-up questions. Are there any freelance "AI Consultants"? And if so, where best to find them?
======
tlack
I'm still learning too, so take the following hand-wavy reply with a grain of
salt.

What can ML offer?

ML allows you to more easily make predictions from data you already have. An
example is taking search terms and grouping them into their subject matter, or
taking images and identifying things inside them visually. If you deal in
sales, you could predict future sales based on other factors that you believe
are correlated (weather? clicks?). If you deal with customers, you could more
quickly identify what type of issues people are having (hardware issues? card
failures?), or proactively suggest solutions.

Better than programming:

It works better when, to code a new feature, you'd need an abundance of IF
statements whose logic you'd have to work out by hand. In a very broad way,
the machine is figuring out how to approach your task just with lots of
examples and scary math. You can get a lot of different "answers" without
having to write that much more code, just by trying different data and
structures.
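
To make that a bit more concrete, here's a minimal sketch in plain Python. The
task (flagging "large" orders), the numbers, and every function name are made
up for illustration. The first function is the hand-written IF; the second
works out its own cutoff from labeled examples, which is the simplest possible
version of "learning".

```python
# Toy task: flag "large" orders. All names and numbers are made up.

def rule_by_hand(amount):
    # The hand-coded approach: a person picked this cutoff.
    return amount > 100

def learn_threshold(examples):
    """Pick the cutoff that misclassifies the fewest labeled examples.

    examples: list of (amount, is_large) pairs.
    """
    amounts = sorted(a for a, _ in examples)
    # Candidate cutoffs: midpoints between neighbouring amounts.
    candidates = [(lo + hi) / 2 for lo, hi in zip(amounts, amounts[1:])]

    def errors(cutoff):
        # Count examples where "amount > cutoff" disagrees with the label.
        return sum((a > cutoff) != label for a, label in examples)

    return min(candidates, key=errors)

examples = [(20, False), (35, False), (60, False),
            (80, True), (120, True), (200, True)]
cutoff = learn_threshold(examples)

def learned_rule(amount):
    return amount > cutoff
```

Here the examples say an order of 80 counts as large, so the learned cutoff
lands between 60 and 80, while the hand-written rule keeps saying "no" until
someone edits the code. Real models learn far more than one number, but the
idea is the same.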

In practice it can also be easier than hand-written code to keep updated: if
you find example inputs that it guessed wrong, you can often improve results by
adding those examples with the correct answers and retraining.
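
A rough sketch of that feedback loop, using a deliberately tiny "model" (it
just averages the input value per label and picks the closest average). The
data, labels, and function names are invented for the example; the point is
only that the fix is more data, not more code.

```python
def train(examples):
    """'Train' a nearest-centroid model: average the input value per label."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, value):
    # Pick the label whose average is closest to the input.
    return min(model, key=lambda label: abs(model[label] - value))

# Made-up data: response time in ms -> "ok" or "slow".
examples = [(100, "ok"), (120, "ok"), (600, "slow"), (800, "slow")]
model = train(examples)
before = predict(model, 400)    # the model guesses "ok"

# A human reviews that guess and says 400 ms should count as "slow".
examples.append((400, "slow"))
model = train(examples)         # same code, one more labeled example
after = predict(model, 400)     # now the guess is "slow"
```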

But of course, the ML system itself can be very complex, and it can take a lot
of resources to design it, curate input data, and train it (often on costly
GPUs). This combination of the system's design and the learned state of the
system's innards is called "the model".

Input data:

There are many different ML systems designed to use different kinds of input.
Some use plain text, others image data, audio data, or graphs of interconnected
thingies. The easiest case is when the data is very uniform in structure,
grouped into labeled columns (called "features"), with each column having a
meaningful value. To train the system, you must also supply a "target" value
for each record, which is an example of what you'd like the system to predict.
Anything that fits naturally in an Excel sheet might be a starting point.
Generally, you want quite a few examples, but the exact amount of data you'd
need varies with how difficult your task is.
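
As a sketch of what "features" and a "target" look like once the sheet is
loaded, here's a made-up four-column table read with Python's standard `csv`
module. The column names and values are assumptions for the example; the names
`X` and `y` are just the usual convention for inputs and targets.

```python
import csv
import io

# A made-up sheet: three feature columns and one target column ("sales").
sheet = io.StringIO(
    "temperature,clicks,is_weekend,sales\n"
    "18,120,0,31\n"
    "25,340,1,78\n"
    "9,80,0,22\n"
)

rows = list(csv.DictReader(sheet))
target_name = "sales"
feature_names = [name for name in rows[0] if name != target_name]

# X holds the inputs (one list per record), y holds what we want predicted.
X = [[float(row[name]) for name in feature_names] for row in rows]
y = [float(row[target_name]) for row in rows]
```

Each entry of `X` lines up with the entry of `y` at the same position, and
that pairing is what the training step consumes.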

Bad input, bad results:

It varies. On the practical side, many ML systems produce confidence scores,
which you can use to hold back shaky guesses and avoid embarrassment. You can
then manually label those confusing examples and feed them back in for
training. In terms of the model itself, there are many ways of interpreting and
evaluating accuracy, and the system can give you examples that confuse it.
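
Here's a minimal sketch of using a confidence score as a gate. It assumes a
classifier that hands back one raw score per category (the scores, labels, and
the 0.8 cutoff below are all invented), converts them to probabilities with a
softmax, and routes anything below the cutoff to a human.

```python
import math

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    biggest = max(scores.values())          # subtract the max for stability
    exps = {label: math.exp(s - biggest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def decide(scores, min_confidence=0.8):
    """Return (label, confidence), or ("needs_review", confidence) if unsure."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    confidence = probs[label]
    if confidence < min_confidence:
        return "needs_review", confidence
    return label, confidence

# Hypothetical raw scores from some classifier.
clear_case = decide({"hardware": 4.0, "billing": 0.5})
murky_case = decide({"hardware": 1.1, "billing": 1.0})
```

One caveat: a model can be confidently wrong on inputs that look nothing like
its training data, so a high score is not a guarantee. The review queue is
also where the new labeled examples for retraining come from.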

Results, two common examples:

Some systems produce a number as their output - called regression. (Picture a
sales prediction.)

Others group things into a set of pre-defined categories - called
classification. (Picture a system that can tell if there's a giraffe in an
image.)

What those outputs mean, and how they are to be used, is part of the design of
the model.
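
The two output styles side by side, as a toy sketch. The regression half fits
a straight line by ordinary least squares to invented numbers; the
classification half assumes some upstream model already produced a
`giraffe_score` between 0 and 1 (a hypothetical name) and just turns it into a
label.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Regression: the output is a number (made-up week number -> sales).
slope, intercept = fit_line([1, 2, 3, 4], [12, 14, 16, 18])
predicted_sales = slope * 5 + intercept

# Classification: the output is one of a fixed set of labels.
def classify(giraffe_score, threshold=0.5):
    return "giraffe" if giraffe_score >= threshold else "no giraffe"
```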

~~~
photon_off
This was extremely helpful, thank you!

Perhaps you can help me with some follow-up questions:

- Let's imagine a standard Excel sheet (a 2D array), with columns "A", "B",
"C", "D" ... "Z".

1) Let's say I want to create a model, that takes this input: A single row
where all columns have values, except for some (random) columns. I want it to
autopopulate those columns with values that are most probable according to the
data I trained it with. That is, _every_ column is both a "feature" and a
"target". Is this possible?

2) Can I train the model by telling it this: "This input should DEFINITELY NOT
confuse you" -- that is, can I "weight" the inputs (or do I just put in more
of them?)

~~~
tlack
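1. You can predict multiple values, but it still has to be trained on target
values for those features. So, you could predict "D", "P", and "Z", but with
this kind of setup not any at random - you'd have to design it that way. Look
into "multi-output regression" (also called multi-target regression). Filling
in whichever cells happen to be blank is usually treated as a separate problem
called "missing value imputation".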
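
A bare-bones sketch of predicting several target columns at once, assuming
column "A" is the only input and "D" and "P" are the targets chosen up front.
It simply fits one straight line per target; the rows are invented, and real
multi-output models can be much fancier than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Made-up rows: "A" is the input; "D" and "P" are the targets.
rows = [
    {"A": 1.0, "D": 3.0, "P": 10.0},
    {"A": 2.0, "D": 5.0, "P": 8.0},
    {"A": 3.0, "D": 7.0, "P": 6.0},
]
targets = ["D", "P"]    # fixed when the model is designed

xs = [row["A"] for row in rows]
models = {t: fit_line(xs, [row[t] for row in rows]) for t in targets}

def predict(a):
    """Fill in every target column for a new row with input value a."""
    return {t: slope * a + intercept for t, (slope, intercept) in models.items()}

filled = predict(4.0)
```

The list of targets is baked in at training time, which is the "you'd have to
design it that way" part.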

2. That seems logical -- supplying a "confidence level" with the training
data itself. I haven't used it myself, but the usual name for this is
per-example weighting ("sample weights"), and repeating an example has much the
same effect.
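
A minimal sketch of how per-example weights can enter training, using a
weighted least-squares line fit in plain Python. The data and the weights are
made up, and `fit_line_weighted` is a name invented for this example. Many
scikit-learn estimators accept a `sample_weight` argument to `fit` for the
same purpose.

```python
def fit_line_weighted(xs, ys, weights):
    """Weighted least squares for one input: heavier examples count more."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Made-up data: the first three points follow y = 2x, the last looks off.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 20]

# Every example trusted equally: the odd point drags the slope upward.
slope_equal, _ = fit_line_weighted(xs, ys, [1, 1, 1, 1])

# "These three should DEFINITELY NOT confuse you": weight them 100x.
slope_trusted, _ = fit_line_weighted(xs, ys, [100, 100, 100, 1])
```

With whole-number weights this gives the same fit as repeating each example
that many times, so "just put in more of them" works too; weights simply
avoid the copying.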

