Ask HN: What do I need to learn to be a data analyst? - x____x
======
CuriouslyC
Stats: Understand common distributions (Gaussian, exponential, beta, gamma,
Laplace, Bernoulli, etc). Understand how common hypothesis and goodness-of-fit
tests such as chi-squared, the t-test and the KS test work, and when they're
applicable (or not).
Understand linear regression, and its basic extensions such as logistic
regression, and generalized linear models (also, what sort of data breaks
them). Principal component analysis is also useful, if you take the time to
understand how it works.
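
To make the regression point concrete, here is a minimal sketch of ordinary least squares for a single predictor, written with only the standard library. The function name `fit_line` and the sample numbers are made up for illustration. The second fit shows one kind of data that breaks the method: a single outlier is enough to drag the slope far from the trend of the other points.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, roughly linear data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(xs, ys)

# Residuals of an OLS fit with an intercept sum to (numerically) zero.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Same data with the last point replaced by an outlier.
ys_outlier = [3.1, 4.9, 7.2, 8.8, 30.0]
slope_outlier, intercept_outlier = fit_line(xs, ys_outlier)

print(f"clean fit:   slope={slope:.2f}, intercept={intercept:.2f}")
print(f"outlier fit: slope={slope_outlier:.2f}, intercept={intercept_outlier:.2f}")
```

In practice you would reach for a library for this, but working through the arithmetic once makes it easier to see why outliers and non-linear relationships hurt the fit.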

Computer science: Clustering algorithms (k-means, hierarchical clustering,
etc), some basic graph theory and graph distance/shortest path algorithms.
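
As an illustration of the clustering item, below is a rough sketch of k-means (Lloyd's algorithm) on 2D points using only the standard library. The function name `kmeans`, the sample points, and the choice of starting centers are all assumptions made for this example; real implementations pick starting centers more carefully (for example with random restarts).

```python
import math


def kmeans(points, centers, max_iter=100):
    """Assign points to the nearest center, then move each center to the
    mean of its points, repeating until the centers stop changing."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                # Coordinate-wise mean of the points in this cluster.
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
```

The result depends on the starting centers, which is one of the things worth understanding about k-means before trusting its output.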

Validation: Learn to identify non-stationary data and autocorrelation of model
errors. K-fold cross validation and ROC curves are also a good idea.
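
Two of the validation ideas above can be sketched in a few lines of plain Python. The helper names `k_fold_indices` and `lag1_autocorrelation` and the residual values are hypothetical. The first helper splits row indices into k contiguous folds; the second computes the lag-1 autocorrelation of model errors, where a value far from zero suggests the errors are not independent.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows."""
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def lag1_autocorrelation(values):
    """Correlation between each value and the one that follows it."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[i + 1] - mean)
              for i in range(n - 1))
    return cov / var


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("test rows:", test)

# Hypothetical residuals: a steady drift vs. a sign-flipping pattern.
drifting = lag1_autocorrelation([1, 2, 3, 4, 5, 6])
alternating = lag1_autocorrelation([1, -1, 1, -1, 1, -1])
print(drifting, alternating)
```

For unordered data you would normally shuffle the rows before splitting. For time series, shuffling leaks future information into the training folds, so the split strategy needs more care.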

Programming: Enough knowledge to efficiently extract information from semi-
structured data. Basic tabular data manipulation/transformation. At least one
data visualization library. Python, pandas and matplotlib are probably your
best bets here.
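
A small sketch of what "extracting information from semi-structured data" can look like, using only the standard library so it runs without installing anything (pandas would handle the tabular step more comfortably). The log format, field names, and the `parse_line` helper are invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical log lines: key=value pairs after a timestamp, plus some junk.
LOG_LINES = [
    "2024-01-03 12:00:01 user=alice action=login ms=120",
    "2024-01-03 12:00:05 user=bob action=login ms=95",
    "garbage line without fields",
    "2024-01-03 12:01:10 user=alice action=search ms=340",
]

PAIR = re.compile(r"(\w+)=(\S+)")
REQUIRED = {"user", "action", "ms"}


def parse_line(line):
    """Return a dict of key=value fields, or None if any are missing."""
    fields = dict(PAIR.findall(line))
    return fields if REQUIRED <= fields.keys() else None


# Keep only the lines that parsed cleanly.
rows = [row for row in map(parse_line, LOG_LINES) if row is not None]

# Basic tabular transformation: total latency per user.
total_ms = defaultdict(int)
for row in rows:
    total_ms[row["user"]] += int(row["ms"])

print(dict(total_ms))
```

Deciding what to do with lines that fail to parse (drop, count, or log them) is usually the part that matters most on real data.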

Data management: SQL is a safe bet. Smaller shops may use Excel. Learning
MapReduce-style processing with Spark may be helpful as well.
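
If you want to practice SQL without setting up a server, Python's built-in `sqlite3` module is enough. The sketch below uses an in-memory database; the `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# Typical analyst query: aggregate per group, then rank the groups.
rows = conn.execute(
    "SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

GROUP BY, joins, and window functions cover a large share of day-to-day analyst queries, so they are a reasonable place to start.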

Domain knowledge: You need to understand the domain you're analyzing
reasonably well. Pick an area and learn it. If you're not sure what you want
to work on, I suggest starting with biology or finance.

Writing: To be successful as an analyst, you need to be able to turn
visualizations and the output of statistical tests into a story that's
accessible for a lay audience. Few analysts take this part as seriously as the
technical side, but it's tremendously important.

~~~
jmcminis
This is a really nice summary of some of the technical components required.
You also need to know how to do different kinds of analysis to answer
different kinds of questions. A few more things:

0\. Scientific method - probably true for all domains. Not really a kind of
analysis, more an approach to doing analysis.

1\. Cohort analysis - used in acquisition and retention analysis.

2\. Model building - used in all kinds of financial analysis.

3\. A/B/... testing - determining the difference between 2 or more
populations.

4\. Exploratory - understanding the relationships in your data to develop
intuition about it.
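
As one concrete example from the list above, an A/B test on conversion rates is often analysed with a two-proportion z-test. This is a minimal sketch under the usual large-sample assumptions, not the only way to do it; the function name and the visitor counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The test assumes independent visitors and a sample size fixed in advance. Checking the result repeatedly and stopping as soon as it looks significant inflates the false positive rate.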

There are plenty of analysis techniques in use. You can learn more about these
and others if you survey blogs and other literature. One blog that I find
interesting is Tom Tunguz's. He has a particular theme, but his analysis is
very good. The methods and way of thinking are transferable.
[http://tomtunguz.com/](http://tomtunguz.com/)

------
stewbrew
You should be more precise about what kind of role you're aiming at. I know
plenty of "data analysts" who use Excel and little else. It really depends on
the task at hand.

------
fedecaccia
First of all, a strong background in maths and statistics. Then select a
programming language and become as good at it as possible. I recommend Python
because it has a lot of data science libraries (start with NumPy, pandas,
SciPy and scikit-learn). After that, you should consider adding TensorFlow to
your learning curriculum.

