
Ask HN: Ways to automatically make inferences from data? - akudha
Suppose I have sales data: it is easy to run a few queries to find the top-selling items, top regions, etc.

But is it possible to do this on *any* set of data, without knowing anything about the data beforehand (no metadata at all)? In other words, can this be generalized? Are there any models or theories I can learn to achieve this?
======
JPLeRouzic
I am not a specialist in this field, but here is how I would approach this
problem:

My understanding is that it is possible to find the main components of a
dataset (the combinations of variables, such as items or regions, that account
for most of the variation), for example with PCA [0]. However, it will not
name those components, though it might be fairly easy to infer a name for each
one from the variables that weigh most heavily in it. Once you know which data
make up a component, you can find their min/max. I guess there are several
similar mathematical techniques for the same job, and there are also several
user-friendly Business Intelligence tools.

[0]
[https://en.wikipedia.org/wiki/Principal_component_analysis](https://en.wikipedia.org/wiki/Principal_component_analysis)
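A minimal sketch of that idea in plain Python, assuming the table is already all numeric (a categorical column like "region" would first need encoding, e.g. one-hot). It uses power iteration with deflation to find the leading components, a technique swapped in here to avoid dependencies; real libraries typically use an SVD instead. All function names and the toy table are hypothetical. For each component it reports the share of variance explained, the loadings (which variables weigh most, the hint for naming it), and the min/max of the data projected onto it.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so PCA measures variation around the average."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of mean-centered rows (columns are variables)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iterations=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary positive start vector
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v.
        eigenvalue = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
        components.append((eigenvalue, v))
        # Deflate: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def pca(rows, k):
    """Explained-variance ratio, loadings and score range for each component."""
    centered = mean_center(rows)
    cov = covariance_matrix(centered)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    results = []
    for eigenvalue, loadings in top_components(cov, k):
        scores = [sum(x * w for x, w in zip(row, loadings)) for row in centered]
        results.append({
            "explained_ratio": eigenvalue / total_variance,
            "loadings": loadings,
            "score_range": (min(scores), max(scores)),  # min/max along this component
        })
    return results


# Toy numeric table (made up): the second column is exactly twice the first.
rows = [[1, 2], [2, 4], [3, 6], [4, 8]]
for comp in pca(rows, k=2):
    print(comp)
```

In this toy table one component carries essentially all the variance, because the two columns move together; the code can report that, but deciding what to call that component is still left to a person.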

~~~
akudha
If I have a CSV, I can import it into a database and simply run queries
systematically, can't I?
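A minimal sketch of "import the CSV and run queries systematically", using Python's built-in `csv` and `sqlite3` modules. The table name `data`, the helper names, and the sample CSV are all hypothetical. It applies one query pattern (most frequent values) to every column without knowing what any column means.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, csv_text, table="data"):
    """Load CSV text into a new table; columns get no declared type."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(quote_ident(h) for h in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader)
    return header


def top_values_per_column(conn, header, table="data", limit=3):
    """Run the same GROUP BY query on every column, with no idea what it means."""
    report = {}
    for name in header:
        col = quote_ident(name)
        report[name] = conn.execute(
            f"SELECT {col}, COUNT(*) AS n FROM {quote_ident(table)} "
            f"GROUP BY {col} ORDER BY n DESC, {col} LIMIT ?",
            (limit,),
        ).fetchall()
    return report


# Made-up sales CSV; any CSV with a header row would be handled the same way.
SAMPLE = """item,region,amount
apple,north,3
pear,north,5
apple,south,2
apple,north,1
"""

conn = sqlite3.connect(":memory:")
header = load_csv(conn, SAMPLE)
for column, top in top_values_per_column(conn, header).items():
    print(column, top)
```

The queries all run, but the `amount` column gets ranked by how often each amount occurs, which is a valid query with a meaningless answer. Getting "top-selling items" would require knowing that `amount` should be summed per `item`, and that is exactly the knowledge the generic loop does not have.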

~~~
JPLeRouzic
You can always do that; the question (as the other comments point out) is
"will your users get meaningful answers?"

You could also use Excel to test your ideas (PCA is available there through
add-ins). There are many resources on the Internet.

------
thedevindevops
Yes - but not in the way you think.

A program can be written to read in a dataset and correlate each property in
it with every other property (think of producing a graph for every combination
of two properties in that dataset).

Somewhere in that mess of graphs there will be useful ones, but the program
won't be able to tell them apart from the useless ones.

You still need a human for that.
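A minimal standard-library Python sketch of that all-pairs approach; the function names are hypothetical. To illustrate why a human is still needed, it runs on ten columns of pure random noise. With 45 pairs and only 20 rows each, some pairs can show sizable correlations purely by chance, and nothing in the ranked output distinguishes them from real relationships.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; treat it as 0 here
    return cov / (sx * sy)


def all_pair_correlations(columns):
    """Correlate every column with every other column, strongest |r| first."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in itertools.combinations(sorted(columns), 2)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Ten columns of pure noise: any "strong" correlation found below is a coincidence.
rng = random.Random(42)
noise = {f"col{i}": [rng.gauss(0, 1) for _ in range(20)] for i in range(10)}
ranked = all_pair_correlations(noise)
for a, b, r in ranked[:3]:
    print(f"{a} vs {b}: r = {r:+.2f}")
```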

------
dhkxh
Your title is about inferences, but your text describes a summary: a
descriptive statistic or an aggregation. These are very different problems.
It's quite straightforward to "find a top item" regardless of the dataset, but
I don't know why you would want this automated at all.
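To make the summary/inference distinction concrete, here is a hedged Python sketch (all names and numbers are made up). `top_item` is the kind of aggregation that works on any list of (item, amount) pairs. `permutation_p_value` is one standard inferential technique, a two-sided permutation test chosen here for illustration rather than taken from the thread: it asks whether the gap between two groups is larger than what random relabeling of the same observations would produce.

```python
import random


def top_item(sales):
    """Descriptive: the item with the largest total. Works on any (item, amount) pairs."""
    totals = {}
    for item, amount in sales:
        totals[item] = totals.get(item, 0) + amount
    return max(totals, key=totals.get)


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Inferential: two-sided permutation test for a difference in means.

    Estimates how often randomly relabeling the pooled observations produces a
    difference at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (hits + 1) / (n_resamples + 1)


# Made-up daily sales of two items.
sales_a = [12, 15, 11, 14, 13]
sales_b = [10, 16, 12, 13, 11]
print(top_item([("A", x) for x in sales_a] + [("B", x) for x in sales_b]))
print(permutation_p_value(sales_a, sales_b))
```

The aggregation always names a winner; the test instead asks whether that gap would plausibly hold up beyond this particular sample, and answering that sensibly still requires choosing which groups to compare.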

------
natalyarostova
As a general answer to a general question: not really.

Domain knowledge, and prior knowledge of how the data relates to reality,
exist in the brain of the human, who expresses that knowledge through code and
science suited to the problem at hand.

Otherwise, the computer doesn't know the difference between sales data and any
other data.

