
Statistical Computing for Scientists and Engineers - mubaris
https://www.zabaras.com/statisticalcomputing
======
melling
I’ve got this one and a few others on a Github repo:

[https://github.com/melling/MathAndScienceNotes/tree/master/s...](https://github.com/melling/MathAndScienceNotes/tree/master/statistics)

I haven’t listened to the Notre Dame course yet. How would others rate it?

I have gone through the UC Irvine 131A class. I have high-level
notes:

[https://github.com/melling/MathAndScienceNotes/blob/master/s...](https://github.com/melling/MathAndScienceNotes/blob/master/statistics/uc_irvine_131a/2013_stats_131A_uc_irvine.md)

and detailed notes in PDF for the first four classes:

[https://github.com/melling/MathAndScienceNotes/blob/master/s...](https://github.com/melling/MathAndScienceNotes/blob/master/statistics/uc_irvine_131a/stats_131a_lecture_01.pdf)

Actually, I wrote this up in a blog post yesterday:

[https://h4labs.wordpress.com/2017/12/30/learning-probability...](https://h4labs.wordpress.com/2017/12/30/learning-probability-and-statistics/)

~~~
koprulusector
Thank you so much for sharing this!!!

------
Myrmornis
This lecture course seems to cram way, way too much material into each
lecture.

~~~
apohn
The professor himself would probably agree with you. In lecture 18 he says as
much:

At 3:40 - For this topic he actually has 6 or 7 lectures and 100 pages of
notes, but it's compressed into an introduction.

At 7:10 - He says HMMs would normally be 2-3 lectures, but he's going to
compress them into half a lecture.

Honestly, I had courses like this in grad school. They were typically seminars
and usually graded on a curve because people crammed stuff into their brains
as quickly as possible and barely understood anything! They were meant to give
you a broad coverage of a field, not comprehensively cover any particular set
of topics.

~~~
j9461701
Cryptography in undergrad was a course like that at my school. Here is the
most basic understanding of group theory we can possibly get away with; here
is how it applies to RSA. Next lecture, we're moving on to elliptic
curves....

Perhaps some topics are simply too big to really teach effectively through
lectures. You need to go digging on your own to understand all the myriad
details, and put in the hours alone with a textbook.

~~~
adrianN
The point of those lectures, especially in undergrad, is to reduce the
"unknown unknowns". You don't know that you could look up how some technique
actually works if you've never heard of it. If you want all the details,
you need good textbooks or, for more advanced topics, the original papers.

------
MichailP
I can never shake off the feeling that statistics is somewhat lacking compared
to the rest of the "fundamental" sciences. To me it just lacks the sort of
brutal honesty that is present, say, in physics. And how come the math often
"looks" scary? To me this also seems intentional, like someone is trying to
hide the lack of real content. Honestly, do we really need a whole field to
run a curve through a cloud of points? Please dispute me in the comments
below; I would really like to be wrong about this.

Edit: Let the down-voting begin

~~~
radford-neal
There are some problems with the field of statistics. They may stem partially
from its role in "certifying" results in other sciences, which induces a
certain conservatism, and a desire by many less scrupulous practitioners to
just crank out the result they want, never mind understanding how the method
works, or whether it is appropriate. It also hasn't (perhaps until recently)
been that "sexy" a field, so it may attract fewer really bright people.
Finally, there is a tendency for bright people who come into the field from
math to think that statistics is a sub-field of math, when in fact there are
philosophical issues of inductive inference that are not just math, and
practical skills in data analysis that are not necessarily easy just because
you're good at math.

Some particular problems:

1) Scary math and pointlessly obscure terminology are indeed a problem. For
unnecessarily scary math, the early literature on Dirichlet process mixture
models is a good example - almost like they were designed to be
incomprehensible to most people who do actually have enough background to use
the results. At a lower level, there are pointlessly obscure and misleading
terms like "score function", "coefficient of determination", and worst of all,
"standard error of the estimate" for the estimated standard deviation of
residuals in a regression model (worst since it is not in fact a "standard
error" by the general definition of that term).

2) Introductory statistics is generally taught from a naive "frequentist"
perspective, because that's been the tradition for the last century or so. The
justifications offered in such courses for using p-values and confidence
intervals are not defensible - they just sound plausible if you don't know
better. There is no good solution, since more sophisticated frequentist
arguments will be beyond the level of the course, and shifting to a Bayesian
perspective cuts the students off from the scientific literature with
p-values, etc. that they will need to be able to read.

3) Outsiders coming into the field often have strange ideas. You might think
that physicists capable of building a billion dollar accelerator would be able
to recognize when a statistical method they think of is nonsense, but you'd be
wrong. There's a tendency for anyone who learns information theory before
statistics to think that information theory is tremendously relevant - but no,
rephrasing maximum likelihood or Bayesian methods in information theory terms
may sometimes be slightly helpful in thinking about them, but doesn't really
add anything fundamental. And no, there's nothing particularly special or
interesting about distributions that maximize entropy subject to some
(generally arbitrarily selected) constraint.

4) There's a tendency to want more than you can get. There is no one
"objectively correct" model/prior/analysis for a data set. Subjective
assessments are unavoidable. But a lot of people don't want to accept this
fact, and devote great efforts to ways of trying to pretend otherwise.

However, if you think statistics is just a simple matter of running a curve
through a cloud of points, you're very wrong. Even running a curve through a
cloud of points is a complicated and subtle enough task that deep issues
arise, and these issues become much more obvious if you're trying to fit a
function of hundreds of variables rather than just one. And if you're trying
to not just "fit" data but come to valid conclusions about cause and effect,
or about underlying latent variables that will provide useful information in
new contexts, then you really do need to know a lot.
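Even the one-variable case shows this. A minimal sketch with synthetic data: the fitted curve depends on a modeling choice (here, the polynomial degree) that the data alone do not settle.

```python
import numpy as np

# Synthetic data for illustration: a sine curve plus noise
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=x.size)

errs = {}
for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, degree)       # least-squares polynomial fit
    fit = np.polyval(coeffs, x)
    errs[degree] = np.mean((y - fit) ** 2)  # training error only

print(errs)
# Training error keeps shrinking as the degree grows, but a degree-15
# polynomial on 20 points is largely fitting the noise. Choosing the
# degree is exactly the kind of subjective judgment point 4 describes.
```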

~~~
gone35
_There's a tendency for anyone who learns information theory before
statistics to think that information theory is tremendously relevant - but no,
rephrasing maximum likelihood or Bayesian methods in information theory terms
may sometimes be slightly helpful in thinking about them, but doesn't really
add anything fundamental.

And no, there's nothing particularly special or interesting about
distributions that maximize entropy subject to some (generally arbitrarily
selected) constraint._

Wow, two heavyweight opinions there. Care to elaborate?

~~~
radford-neal
Some ideas, like the Minimum Description Length principle derived from
information theory, turn out to be just rephrasings of already-known
statistical ideas. This can occasionally provide more insight, but it can also
lead to ridiculously irrelevant exercises, like looking in detail at how one
might produce a code for the data, rather than just noting that the code would
have length equal to minus log2 of the probability of the data - which of
course leads to forgetting about the code altogether and looking at
probabilities instead.
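The log2 point can be seen in a toy example (made-up probabilities): once the model assigns probabilities, the ideal Shannon code length is just minus log2 of each probability, so minimizing description length and maximizing probability are the same thing.

```python
import math

# Toy model: probabilities the model assigns to three possible datasets
model = {"A": 0.5, "B": 0.25, "C": 0.25}

# Ideal (Shannon) code length in bits: -log2 p(x)
codelengths = {x: -math.log2(p) for x, p in model.items()}
print(codelengths)  # {'A': 1.0, 'B': 2.0, 'C': 2.0}

# Shortest description <=> highest probability, so the code itself
# never needs to be constructed.
best = min(codelengths, key=codelengths.get)
print(best)  # 'A'
```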

The maximum entropy idea is just wrong (in general), in that there is no good
argument for doing it. Actually, it's "not even wrong", since maximizing the
entropy subject to the observed values of some expectations is just not
possible, since we do not observe expectations, but rather particular finite
data sets.
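For concreteness, here is the recipe being objected to, in its standard textbook form (a sketch, not taken from the thread): among densities on [0, ∞) constrained to a given mean μ, entropy is maximized by the exponential distribution.

```latex
% Standard maximum-entropy derivation (textbook result, for illustration)
\max_{p}\; -\int_0^\infty p(x)\log p(x)\,dx
\quad\text{subject to}\quad
\int_0^\infty x\,p(x)\,dx = \mu
\;\Longrightarrow\;
p(x) = \frac{1}{\mu}\,e^{-x/\mu}
```

The objection above is that the constraint presumes the expectation μ is known, while real data only supply a sample mean.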

