For writing R packages: https://github.com/jtleek/rpackages
For writing scientific reviews: https://github.com/jtleek/reviews
Or on the future of stats: https://github.com/jtleek/futureofstats
Bayesian statistics can be very powerful, but it would be a terrible idea to prefer Bayesian approaches to Frequentist ones in all situations.
As if Frequentists somehow didn't need priors. Everyone starts with prior knowledge. We might as well use it. Or do you advocate not using every scrap of knowledge available to you? That would be stupid.
Sure, prior knowledge can be shaky, or difficult to justify. But at least, a Bayesian will be explicit about it, instead of, like, sweeping normal probability distribution assumptions under the linear regression rug.
> it would be a terrible idea to prefer Bayesian approaches to Frequentist ones in all situations.
Name three examples that don't involve the Frequentist using better prior information than the Bayesian.
By the way, Bayesians know that using probability theory correctly is sometimes intractable (combinatorial explosion and all that). In those cases, they will use approximations. But at least, they will know it's an approximation.
You really should read chapters 1 and 2 of Probability Theory: The Logic of Science. They give a good feel for why Bayesians are correct as a simple matter of fact.
I'm testing the effectiveness of a drug. Drugs of this class have a certain likelihood of working, the noise in my data is known, the experimental group did this much better than the control... does the drug really work? So far so trivial, in either Bayesianism or Frequentism. Now, I happen to mention that I tested 10000 variants of this drug and only sent data for the one that seemed to work. The rest aren't interesting after all. Under Frequentism, it's easy to take this into account. Under Bayesianism, it requires complex definitions of observations, and is easy to overlook as there's no space for it in the formula.
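To make the selection effect concrete, here is a minimal sketch (purely illustrative, Python with NumPy/SciPy assumed; nothing here comes from the actual drug data): with 10,000 variants that do nothing at all, the single best-looking one will almost always clear a naive 5% threshold, while a Bonferroni-style correction for the screening does not get fooled.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_variants, n_per_group, alpha = 10_000, 50, 0.05

# Simulate 10,000 variants with NO real effect: treated and control both N(0, 1).
control = rng.normal(0.0, 1.0, size=(n_variants, n_per_group))
treated = rng.normal(0.0, 1.0, size=(n_variants, n_per_group))

# Crude z statistic per variant (difference of means over its standard error).
diff = treated.mean(axis=1) - control.mean(axis=1)
se = np.sqrt(treated.var(axis=1, ddof=1) / n_per_group +
             control.var(axis=1, ddof=1) / n_per_group)
z = diff / se

# Reporting only the best-looking variant: its z almost surely beats the naive cutoff.
print("best z:", z.max())
print("naive cutoff:", norm.isf(alpha))                     # ~1.64
print("Bonferroni cutoff:", norm.isf(alpha / n_variants))   # ~4.4
```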
I have a collection of unfair dice. Unfortunately, they all look the same and got dumped on the floor. Now someone grabbed one off the floor at random and wants to make bets with me about it. Even experienced Bayesians are likely to mix up their propositions in a case like this. I say that from having read discussions of similar problems. Yes, if you do it right, it comes out correctly, but Frequentism makes sure you've thought about what you're asking in the same way Bayesianism makes sure you've thought about your priors.
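For what it's worth, a toy sketch of the bookkeeping the dice example demands (made-up biases, Python assumed): the predictive probability marginalizes over which die was grabbed, and that distribution over dice must itself be updated after each roll, which is exactly the step that's easy to mix up.

```python
import numpy as np

# Made-up biases: each row is one die's probabilities for faces 1..6.
dice = np.array([
    [0.40, 0.12, 0.12, 0.12, 0.12, 0.12],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.75],
    [1 / 6.0] * 6,
])
prior = np.full(len(dice), 1 / len(dice))  # the die was grabbed uniformly at random

# Predictive probability that the first roll shows a six: mix over the dice.
p_six = prior @ dice[:, 5]

# After seeing one six, update the posterior over WHICH die was grabbed,
# then recompute the predictive; forgetting this update is the classic mix-up.
posterior = prior * dice[:, 5]
posterior /= posterior.sum()
p_six_again = posterior @ dice[:, 5]
print(p_six, p_six_again)
```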
Somebody else will have to give a third example.
Bayesianism and Frequentism are based on the same math, and math is math. If you use them correctly, they'll get you the same answer every time. The difference is what they make easy, and what mistakes they protect you against.
There are very many real-world problems that have fast and accurate frequentist solutions, but slow and difficult Bayesian solutions. Despite my personal bias -- my research primarily relies on Bayesian inference -- I can't fathom how one can reasonably argue that frequentist approaches are always inferior, even in applied statistics.
My original claim is broader than I wanted it to be. The fact is, a Frequentist approach will always be less accurate than the correct application of probability theory. But of course,
> Bayesians know that using probability theory correctly is sometimes intractable (combinatorial explosion and all that). In those cases, they will use approximations. But at least, they will know it's an approximation.
The key to the Bayesian outlook is to remember that no matter what, there is a correct answer, even if you can't afford to compute it. As Eliezer Yudkowsky put it, there are laws of thought. Want to use Frequentist tools? Sure, why not. Just remember that they often violate the laws of ideal thought. Some inaccuracy inevitably ensues.
Second example: Okay, Bayesian statistics are harder. That's a disadvantage.
> If you use [Bayesianism or Frequentism] correctly, they'll get you the same answer every time
If they invariably gave the same answer, then why the endless debates? By the way, here is an apparent factual disagreement between Bayesianism and Frequentism:
> There are both really good and horrible Frequentist and Bayesian statisticians.
Yeah. If I had to choose between Fisher and Anonymous Bayesian, I may choose Fisher.
However, unless both kinds of statistics yield the same results (and I don't think they do), at least one of them is bogus, by the principle of non-contradiction. So, while I can imagine there are good Frequentist statisticians out there, I insist that Frequentism itself is bogus.
They don't show that the frequentist approach is wrong. If both methods result in the same answer and frequentist methods are easier to use then what is the problem?
Also, the Frequentist approach is not wrong. It's inaccurate, to the extent that its results differ from the Bayesian ones. This inaccuracy tends to go down as we gather more data. Which is a good thing, or else science itself wouldn't work.
One more thing. You said "in some sense". Are you seriously suggesting that the assumptions behind Cox's Theorem can reasonably be challenged?
Sure: probability is continuous.
Well… To me, it is obvious.
(though one does have to specify the hypotheses to be tested more exactly than, say, "the effect is not zero.")
Both Newtonian mechanics and Frequentists statistics are approximations. Which makes them both useful, and inaccurate (how inaccurate depends on the situation at hand).
I saw you posted a link to something on lesswrong, which confuses an application of Bayes' Rule with Bayesian statistics. Maybe you should read what an actual statistician has to say on the matter: http://normaldeviate.wordpress.com/2012/11/17/what-is-bayesi... instead of a Harry Potter fanfic author.
I have read the link you speak of. I have already responded here: http://normaldeviate.wordpress.com/2012/11/17/what-is-bayesi...
I don't think my LessWrong link confuses Bayes' Rule with Bayesian statistics. Why do you think Eliezer responded 1/2 to the brain teaser? He does not say it, but I'm pretty sure he just assumed that a mathematician who has 2 boys is twice as likely to spontaneously say "I have at least one boy", compared to a mathematician who has only one boy.
The disagreement between his inference (which he did not know was "Bayesian" at the time) and "Orthodox" statistics didn't come from Bayes' Rule. It came from the use of a non-uniform prior to begin with. Which Frequentism rejects, because it's "subjective". So, instead of using this highly relevant prior information, it just uses a uniform prior. (By the way, this is nuts. A scientist should never throw away relevant information.)
Both methods then use Bayes' Rule. They just start with different priors: best guess and objective-looking, respectively. (I'm still wondering why anyone would use anything but one's best guess.)
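Here is a small reconstruction of the teaser under both readings (Python; the reporting model is only assumed, since Eliezer doesn't state it explicitly): conditioning on the bare proposition "at least one boy" gives 1/3, while modelling how the statement came to be uttered gives 1/2.

```python
from fractions import Fraction

# Two-child families, uniform prior over the four birth orders.
families = {("B", "B"): Fraction(1, 4),
            ("B", "G"): Fraction(1, 4),
            ("G", "B"): Fraction(1, 4),
            ("G", "G"): Fraction(1, 4)}

# Reading 1 ("Orthodox" / uniform prior over utterances): condition on the bare
# fact "at least one boy".
consistent = {f: p for f, p in families.items() if "B" in f}
print(consistent[("B", "B")] / sum(consistent.values()))        # 1/3

# Reading 2 (attributed to Eliezer): the parent mentions the sex of a randomly
# chosen child, so two-boy parents are twice as likely to say "boy".
say_boy = {f: p * Fraction(f.count("B"), 2) for f, p in families.items()}
print(say_boy[("B", "B")] / sum(say_boy.values()))              # 1/2
```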
Here's an example of a real-world problem I had to solve in my last job. We wanted to compare two treatments on an object, OLD and NEW, and evaluate which was better. We got a bunch of volunteers and showed them each a random sample of objects, the OLD treatment, and the NEW treatment (suitably blinded of course) and asked them to rate each on a scale of 1-5. The goal was to determine if the OLD treatment should be replaced by the NEW treatment.
What's your Bayesian approach to solving this? I was the closest thing to a domain expert, and I certainly didn't have any prior beliefs about what the ratings would be (except for a weak expectation that OLD>NEW).
Tell me the tests you would have run, and then I'll tell you the non-Bayesian thing I did that gave a very solid answer and took well under a second of computation.
Condescension doesn't impress me. Explanations do. Could you please explain his "perfectly clear" reply to me?
> What's your Bayesian approach to solving this?
Applying probability theory as best I could. If it turns out to be too complicated, or computationally intractable, I'll resort to approximations.
For this particular case… Well, first, we don't have 2 treatments, we have as many treatments as we have objects. It's like flipping a coin: when you flip a coin in the obvious way, you have no idea which side will come up, because even if you know it came up heads the previous time and you try to perform the same move again, in fact you don't perform the same move, and you don't know exactly how your second move differed from the first.
So first, there is an unpredictable variability in the way the treatment will be performed. This is the first source of uncertainty. Second, the objects aren't all the same. I expect them to be more or less clean to begin with, and other characteristics such as sharp angles may more or less hinder the treatment. This is the second source of uncertainty.
Once you perform a treatment on an object, it has a definite effect. Unfortunately, it is hard to define how well it actually went. Maybe you can come up with a definite criterion, but apparently, since you needed to ask a sample of human volunteers, that criterion is either not well understood or hard to measure properly. Anyway, you have access to an uncertain measure, which in this case is a human giving you a score from 1 to 5. These are the third and fourth sources of uncertainty (akin to the first and second respectively: variability in how a given human will assess an object, and variability across humans).
From what I have understood of what you told me, there were 2 sets of objects and 2 groups of people. One group looked at every object before any treatment. Then you applied OLD to half the objects and NEW to the other half. You then showed the newly cleaned-up objects to the second group. Of course, you don't tell the group which objects they get to examine. An alternative would be to photograph every object before and after the treatment, then show both sets of photos to everyone, and ask them to rank the objects (both before and after) and the perceived efficiency of the treatment. I don't like it much, however, because photos add an additional source of uncertainty.
This looks like a complicated version of a compound estimation problem, described in chapter 6 of Jaynes's PT:TLOS (elementary parameter estimation). Pretty basic stuff.
We're not finished yet. You wanted to know which treatment should be chosen: OLD or NEW. You basically have 3 alternatives: always apply OLD, always apply NEW, or look at the object and then decide which treatment to apply. For each alternative, you should have a probability distribution over the "cleanness" distribution of the objects. Decide a utility function for the cleanness distribution, and choose the method that maximizes expected utility.
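A minimal sketch of that last decision step, under entirely hypothetical posteriors and an assumed utility function (none of this comes from the actual data; a real analysis would draw the cleanness samples from a fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_objects, n_draws = 200, 5_000

# Stand-in posterior predictive samples of per-object cleanness (0..1) under
# each treatment; purely illustrative Beta draws.
clean_old = rng.beta(6, 3, size=(n_draws, n_objects))
clean_new = rng.beta(7, 3, size=(n_draws, n_objects))

def utility(cleanness):
    # Assumed utility: diminishing returns in cleanness.
    return np.sqrt(cleanness)

policies = {
    "always OLD": utility(clean_old).mean(),
    "always NEW": utility(clean_new).mean(),
    # "Inspect, then decide": per object, pick whichever treatment has the higher
    # expected utility (an upper bound that ignores the cost of inspecting).
    "inspect, then decide": np.maximum(utility(clean_old).mean(axis=0),
                                       utility(clean_new).mean(axis=0)).mean(),
}
print(max(policies, key=policies.get), policies)
```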
And now I'm stuck, because I don't know probability theory deeply enough to actually give you an unambiguous procedure in less than a couple of days. Not to mention that I don't have your prior information about OLD and NEW (why you expect OLD to be better, etc.).
Anyway, I believe a specialist would only need a couple of minutes, or a couple of hours tops. (To find the procedure, that is. Actually running the numbers on a computer may or may not be expensive. I personally have no idea.)
And so the data I get is:
(object 1, SCORE OLD 1, SCORE NEW 1)
(object 2, SCORE OLD 2, SCORE NEW 2)
* I don't have any prior knowledge of how the scores are going to be distributed
* The primary reason I expected OLD to be 'better' than NEW is because OLD had theory behind it, and NEW was an ad hoc tweak of a previously rejected treatment
* The very initial text corpus was fixed
* The objects were selected from the initial corpus through a multistep process, at least one of those steps was a stochastic optimization of an objective function
* One of the treatments was also obtained through a constrained optimization
* The initial text corpus wasn't all that large, and the samples were pretty small
So what I did was look at all observations for which OLD SCORE was different from NEW SCORE. Say there were n of them, and say that for p of those OLD<NEW. If NEW wasn't producing results that scored better than OLD, we would expect p/n to be at most 0.5, up to some uncertainty from the random sample. This is just a straightforward test of a binomial parameter, so I computed confidence intervals and found that 0.5 was several standard deviations below the observed p/n, from which I concluded that NEW was better than OLD and that we should switch.
(as described here: http://en.wikipedia.org/wiki/Sign_test)
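For concreteness, a small sketch of that sign test on made-up scores (Python with SciPy assumed; `binomtest` needs SciPy 1.7+):

```python
import numpy as np
from scipy.stats import binomtest

# Invented ratings for illustration only.
old = np.array([3, 2, 4, 3, 1, 2, 3, 4, 2, 3, 2, 1, 3, 2, 4])
new = np.array([4, 3, 4, 4, 2, 3, 3, 5, 3, 4, 3, 2, 4, 3, 5])

# Drop ties, count how often NEW beat OLD.
diff = new - old
n = np.count_nonzero(diff)
p = np.count_nonzero(diff > 0)

# Under the null "NEW is no better than OLD", successes ~ Binomial(n, 0.5).
result = binomtest(p, n, 0.5, alternative="greater")
print(p, n, result.pvalue)

# Equivalently, the normal-approximation z-score (the "several standard
# deviations" mentioned above).
z = (p / n - 0.5) / np.sqrt(0.25 / n)
print(z)
```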
In a more responsible analysis I could have done something more sophisticated and modeled the scores as a sum of object effect, tester effect, treatment effect, fixed term, noise, etc. Given the magnitude of the sign test score, I didn't feel that was necessary.
Indeed, given the apparent massive evidence in favour of NEW, you probably didn't need to twist your brain further.
Now, if we're doing A/B testing on a commercial web site, I think we should use every scrap of information available to us, and go into full Bayesian mode if at all possible: if one of A and B is worse, you don't want to run one test too many before you figure that out. The sooner you positively know which is best, the better.
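A hedged sketch of what "full Bayesian mode" could look like for conversion-style A/B data, with Beta-Bernoulli posteriors and an early-stopping rule (all rates and thresholds invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def prob_b_better(a_alpha, a_beta, b_alpha, b_beta, n=20_000):
    # Monte Carlo estimate of P(rate_B > rate_A | data) under Beta posteriors.
    return (rng.beta(b_alpha, b_beta, n) > rng.beta(a_alpha, a_beta, n)).mean()

# Beta(1, 1) priors on each conversion rate; swap in real prior knowledge here.
a_alpha = a_beta = b_alpha = b_beta = 1.0
true_a, true_b = 0.05, 0.07   # unknown in practice; used only to simulate traffic

for visitor in range(1, 20_001):
    if rng.random() < 0.5:                      # visitor lands on arm A
        converted = rng.random() < true_a
        a_alpha, a_beta = a_alpha + converted, a_beta + (1 - converted)
    else:                                       # visitor lands on arm B
        converted = rng.random() < true_b
        b_alpha, b_beta = b_alpha + converted, b_beta + (1 - converted)
    if visitor % 500 == 0:                      # check the posterior periodically
        p = prob_b_better(a_alpha, a_beta, b_alpha, b_beta)
        if p > 0.99 or p < 0.01:                # one arm is almost surely worse: stop
            break

print("stopped after", visitor, "visitors; P(B better) =",
      prob_b_better(a_alpha, a_beta, b_alpha, b_beta))
```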
How does sanskritabelt solve the problem, I wonder? Set an arbitrary number of volunteers for each group as the point where we can stop collecting data, assume normality of the ratings, take the means, and use some test to see whether the difference is significant or not?
Really, the crux of the problem is defining what is meant by 'better'. Is 'better' a higher score? (Which scoring function do you use -- mean, median, most 5-stars, fewest 1-stars, something else that takes into account the number of ratings per rating bucket?) Does it have to be higher by a certain amount (how meaningful are deltas in your scoring function)? Once you have defined what is meant by "A is better than B", you can go about the business of computing the likelihood of your data given "A is better than B" to plug into the RHS of Bayes' theorem.
EDIT: it's this kind of experience that makes me reach for things like nonparametric tests and the bootstrap.
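For example, a quick bootstrap confidence interval for the mean rating difference needs no normality assumption at all (made-up scores, Python assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
old = np.array([3, 2, 4, 3, 1, 2, 3, 4, 2, 3])
new = np.array([4, 3, 4, 4, 2, 3, 3, 5, 3, 4])
diff = new - old

# Resample the paired differences with replacement and collect the mean each time.
boot = np.array([rng.choice(diff, size=diff.size, replace=True).mean()
                 for _ in range(10_000)])
low, high = np.percentile(boot, [2.5, 97.5])
print(diff.mean(), (low, high))
```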
Each sample of the type of data that I'm often dealing with tends to be nested in nature. Yes, I do have a script that can flatten out the nested dicts into a regular table, but that always results in a blowup into hundreds of columns.
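Something like this hypothetical helper is what such a flattening script typically boils down to (dotted column names; not the author's actual script):

```python
def flatten(record, prefix=""):
    """Flatten a nested dict into a single row keyed by dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

print(flatten({"user": {"id": 7, "geo": {"lat": 48.8, "lon": 2.3}}, "score": 4}))
# {'user.id': 7, 'user.geo.lat': 48.8, 'user.geo.lon': 2.3, 'score': 4}
```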
Nice suggestion to share the raw data. I've never seen a researcher do that; I think many don't even save the raw data to disk before extracting what they want, but I always try to.
Actually, this format is also nice because iterating over the lines of a file is very similar to running through a mongo cursor. That makes it easy to reuse the same code to work with both inputs.
The strange part is, the file and the script were provided by the same person.
Thanks very much for any help.
pgAdmin (Postgres) - http://www.pgadmin.org/
MySQL Workbench (MySQL) - http://dev.mysql.com/downloads/tools/workbench/
SQL Developer (Oracle) - http://www.oracle.com/technetwork/developer-tools/sql-develo...
I strongly recommend learning some SQL though; this will give you the ability to bulk edit columns and apply formulae.
That said, you can:
- create an external table in an RDBMS (any of the above will work), which allows you to work on a flat file in place: http://www.fromdual.com/csv-storage-engine
- import the TSV to an RDBMS, work with it, and export it again.
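If you'd rather stay in a script, here is a rough sketch of that import-work-export round trip using Python's built-in sqlite3 instead of MySQL or Postgres (file and table names assumed):

```python
import csv
import sqlite3

# Read the TSV into memory (assumed file name).
with open("data.tsv", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
header, data = rows[0], rows[1:]

# Load it into a throwaway SQLite table.
con = sqlite3.connect("work.db")
cols = ", ".join(f'"{c}" TEXT' for c in header)
con.execute(f"CREATE TABLE IF NOT EXISTS t ({cols})")
con.executemany(
    f'INSERT INTO t VALUES ({", ".join("?" for _ in header)})', data)
con.commit()

# ...bulk-edit with SQL here, e.g. UPDATE t SET some_col = ... WHERE ...

# Export it again as TSV.
with open("data_out.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(header)
    writer.writerows(con.execute("SELECT * FROM t"))
```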
I'm sure someone has written what you're asking for, but I don't get the appeal. RDBMS aren't scary, you can install MySQL and work with it for free, and if you use the GUI tool you don't even have to touch SQL.
edit: Excel is a tool that can edit TSV files, and as a bonus it looks and works exactly like Excel. What exactly do you want?
As I say, the idea would be to edit a database/file directly on the web (i.e. no local excel file). Since the original post talked about TSV I thought there might be a niche for such a light JS editor.