
Philosophy and the practice of Bayesian statistics [pdf] - mitmads
http://www.stat.columbia.edu/~gelman/research/published/philosophy.pdf
======
chimeracoder
I never thought I'd see a 31-page paper by Andrew Gelman on the front page of
Hacker News. And certainly not a paper coauthored with a well-known
frequentist!

I was lucky enough to work with Prof. Gelman as his research assistant while I
was in school - I can't even begin to tell you how prolific and brilliant that
man is. His name may not be widely known outside academic circles, but I'd
go as far as to say that he's the most important Bayesian statistician since
Thomas Bayes.

He used to be a contributor to FiveThirtyEight, back before the Times picked
it up. I used to explain FiveThirtyEight as 'one of the six blogs Andrew
Gelman writes for'. Now, I explain Andrew Gelman as 'a former contributor to
Nate Silver's blog'. How times have changed!

Gelman's approach to statistics is more wholly Bayesian than most people with
a moderate level of statistical training are likely to have encountered. It was
from Gelman that I learned why I never need to perform an F-test[0]; at the
same time, it was from Gelman that I learned some of the potential pitfalls of
pure Bayesian reasoning[1] (and how to address them).

When people ask me where to get started with statistics, both of the books I
recommend are Gelman's: _Teaching Statistics: A Bag of Tricks_ and _Data
Analysis Using Regression and Multilevel/Hierarchical Models_.

Both have tremendously off-putting titles, but they're actually incredibly
accessible. Gelman is great at many things, but picking sexy titles is not
one of them.

If you're interested in understanding the concepts behind this paper, I'd
start there.

[0] <http://andrewgelman.com/2009/05/18/noooooooooooooo/>

[1] The linked paper provides a good analysis

~~~
realitygrill
Shalizi is no slouch either - his notebooks are _fascinating_.

<http://vserver1.cscs.lsa.umich.edu/~crshalizi/notabene/>

~~~
darkmethod
Thank you so much for sharing this.

Tidbits like this are _the_ reason I frequent Hacker News.

~~~
realitygrill
You're welcome. You might try his blog, too - his views on subjects like the
"wisdom of crowds" and IQ are intellectual fun.

[http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/315.htm...](http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/315.html)

------
zenburnmyface
If you are interested in the _practical_ practice of Bayesian methods (and you
love Python), check out our open-source project/book _Bayesian Methods for
Hackers_ :

[https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers)

We aim to empower the non-mathematician with really cool tools and methods to
solve otherwise very difficult problems. Plus it's all open source, and every
plot/diagram is reproducible and extendable.
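
The book builds on PyMC, but the core move it teaches, turning a prior and a likelihood into a posterior numerically, needs no library at all. As a hypothetical dependency-free taste of that workflow (the 7-heads-in-10-flips coin is made up, not an example from the book):

```python
# Grid-approximation posterior for a coin's bias p after seeing 7 heads
# in 10 flips, starting from a flat prior over a grid of candidate biases.
from math import comb

def grid_posterior(heads, flips, grid_size=101):
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size                         # flat prior
    likelihood = [comb(flips, heads) * p**heads * (1 - p)**(flips - heads)
                  for p in grid]
    unnorm = [pr * li for pr, li in zip(prior, likelihood)]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]

grid, post = grid_posterior(7, 10)
mode = grid[post.index(max(post))]   # with a flat prior, the mode is 7/10
```

Swap the flat prior for any other list of weights and the same few lines of arithmetic still work; that modularity is a lot of what makes the computational approach teachable.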

~~~
tmarthal
As an aside, I just want to thank you for making the project/book text
available as IPython notebooks. I haven't seen a mathematical writeup as
beautiful and interactive as the chapters that you've put out. I've only had
time to go through a couple of them, but it really is a treat.

Also, I've learned so much more about how people use python to do analysis and
all sorts of other things through ipynb files than reviewing traditional
python libraries/code. I wish more people would publish using them.

------
pseut
Haven't had time to read the whole article yet, but these two paragraphs from
the conclusion (p. 24-25) are excellent:

 _"In our hypothetico-deductive view of data analysis, we build a statistical
model out of available parts and drive it as far as it can take us, and then a
little farther. When the model breaks down, we dissect it and figure out what
went wrong. For Bayesian models, the most useful way of figuring out how the
model breaks down is through posterior predictive checks, creating simulations
of the data and comparing them to the actual data. The comparison can often be
done visually; see Gelman et al. (2004, Chapter 6) for a range of examples.
Once we have an idea about where the problem lies, we can tinker with the
model, or perhaps try a radically new design. Either way, we are using
deductive reasoning as a tool to get the most out of a model, and we test the
model – it is falsifiable, and when it is consequentially falsified, we alter
or abandon it. None of this is especially subjective, or at least no more so
than any other kind of scientific inquiry, which likewise requires choices as
to the problem to study, the data to use, the models to employ, etc. – but
these choices are by no means arbitrary whims, uncontrolled by objective
conditions._

 _"Conversely, a problem with the inductive philosophy of Bayesian statistics
– in which science ‘learns’ by updating the probabilities that various
competing models are true – is that it assumes that the true model (or, at
least, the models among which we will choose or over which we will average) is
one of the possibilities being considered. This does not fit our own
experiences of learning by finding that a model does not fit and needing to
expand beyond the existing class of models to fix the problem."_
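
The posterior predictive check described in the first quoted paragraph can be sketched in a few lines. This is a hypothetical toy, not an example from the paper: fit a normal model to data containing a gross outlier, replicate datasets from the fitted model, and compare an extreme-value statistic.

```python
# Posterior predictive check, minimal sketch: does the fitted model ever
# reproduce the most extreme observation in the real data?
import random, statistics

random.seed(0)
data = [random.gauss(0, 1) for _ in range(99)] + [12.0]   # one gross outlier

mu, sigma = statistics.mean(data), statistics.stdev(data)

def replicate():
    # Simulate a dataset from the fitted model. (For brevity this ignores
    # parameter uncertainty; a full check would draw mu and sigma from the
    # posterior before each replication.)
    return [random.gauss(mu, sigma) for _ in range(len(data))]

t_obs = max(data)
t_reps = [max(replicate()) for _ in range(500)]
p_value = sum(t >= t_obs for t in t_reps) / len(t_reps)
# A p-value near 0 or 1 signals misfit: here the replications essentially
# never reproduce the observed maximum, so the normal model is suspect.
```

The comparison "can often be done visually", as the quote says; plotting t_reps as a histogram with t_obs marked would be the graphical version of the same check.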

And section 4, which discusses issues that arise in Bayesian statistics when
working with multiple candidate models, is interesting and agrees with my
limited experience, especially 4.3: "Why not just compare the posterior
probabilities of different models?"

P.S. (to the submitter): it might be helpful when submitting a 30-page paper to
mention what part of the paper you'd like to discuss. It makes it easier to
get started.

------
tunesmith
As someone who didn't study statistics in college, a paper like this is right
in that uncomfortable no-man's land between what I understand and what I'm
interested in - it seems to tie into several subjects I have layman's interest
in.

For instance, there is the controversy over how useful models are - are they
worthwhile goals we can actually draw conclusions from, or are they simply
shortcuts on our way to a more reductionist understanding of a phenomenon? Is
"emergence" a meaningful concept or an empty one? Is "systems thinking" a
valid concept or just a lack of discipline in the effort to understand
things in a reductionist manner?

People here seem to hate Stephen Wolfram but his writing has made some
concepts approachable to me that I might not have grasped otherwise - for
instance, that computational irreducibility means that even if the world is
entirely reductionist, it still doesn't mean that we can deduce the
reductionist reality/inputs from an output. And so therefore, models are
useful even though they are wrong. This is also a point that Paul Krugman
often makes about economic models - people who disregard models on the grounds
that they are wrong just don't grasp the value of them, he argues.

Most of what I've learned about Bayesianism is what I've read from the first
few articles over at lesswrong.com - but I noticed pretty early on that I had
a discomfort in using probability as a description of what I _believed_ to be
true. It seems the general point of this paper is that Bayesianism is useful
for deductive techniques - as a tool in a toolset to support a frequentist
view? - but not so much as an expression of a subjectivist philosophy. I
appreciated this point:

"Beyond the philosophical difficulties, there are technical problems with
methods that purport to determine the posterior probability of models, most
notably that in models with continuous parameters, aspects of the model that
have essentially no effect on posterior inferences within a model can have
huge effects on the comparison of posterior probability among models."
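
The quoted problem can be reproduced numerically. In this hypothetical sketch (made-up data, a normal likelihood with a zero-centered normal prior on its mean), widening the prior from sd 10 to sd 1000 barely moves the posterior mean within the model, yet changes the marginal likelihood, the quantity that posterior model comparisons rest on, by roughly the ratio of the prior widths:

```python
# Within-model posterior vs. between-model evidence under two prior widths.
import math

data = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.1, 0.95, 1.05]

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_mean_and_evidence(prior_sd, grid_n=4001, lo=-50.0, hi=50.0):
    # Grid integration over the mean parameter theta (likelihood sd fixed at 1).
    step = (hi - lo) / (grid_n - 1)
    num = evid = 0.0
    for i in range(grid_n):
        theta = lo + i * step
        like = math.prod(normal_pdf(y, theta, 1.0) for y in data)
        w = normal_pdf(theta, 0.0, prior_sd) * like * step
        evid += w
        num += theta * w
    return num / evid, evid

mean_narrow, ev_narrow = posterior_mean_and_evidence(prior_sd=10.0)
mean_wide, ev_wide = posterior_mean_and_evidence(prior_sd=1000.0)
# The posterior means agree to about three decimals, while the evidence
# ratio is roughly 100: the prior width is nearly irrelevant within the
# model but decisive when comparing posterior probabilities of models.
ratio = ev_narrow / ev_wide
```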

More generally the paper seems to be making the point that using the Bayesian
philosophy to address models is improper in general since the premise of
Bayesianism is to update _beliefs_ based on evidence/data, while we know
that belief-in-a-model is pointless since models are wrong. But past that
point I got pretty lost.

~~~
loup-vaillant
> _I noticed pretty early on that I had a discomfort in using probability as a
> description of what I_ believed _to be true._

If it can ease your discomfort: what else? In colloquial language, we already
do have pretty good descriptors of subjective beliefs, such as "I don't think
so", "I'm damned sure", "maybe"… It is only natural to call the quantitative
version of those "probabilities" —at least to me.

As for the frequency properties of seemingly "random" phenomena, they're a
property of the real world, ready for study. I'd have much more discomfort
calling _that_ "probabilities".

~~~
tunesmith
I agree that colloquially speaking, it aligns most with what we believe. What
I have trouble with is that you and I could assign different probabilities
(via beliefs) to an external event that has only one probability of happening.
Just in that sentence I used the word "probability" correctly both times, yet
it has two different conflicting meanings.

~~~
loup-vaillant
Then just use different words. Break it down to "degree of subjective belief"
and "frequency property" if you have to.

Still, I prefer to use the short hand "probability" to mean the former, for
two reasons: first, "degree of subjective belief" is what most lay people will
understand when we say "probability". Second, even scientists have this
meaning interfering with their intuitions, even if, when they write papers,
they really do mean "frequency property". That can be a major obstacle for science.
Let me quote Edwin T. Jaynes (long, but worth it):

 _Those who cling to a belief in the existence of "physical probabilities" may
react to the above arguments by pointing to quantum theory, in which physical
probabilities appear to express the most fundamental laws of physics.
Therefore let us explain why this is another case of circular reasoning. We
need to understand that present quantum theory uses entirely different
standards of logic than does the rest of science._

 _In biology or medicine, if we note that an effect E (for example, muscle
contraction, phototropism, digestion of protein) does not occur unless a
condition C (nerve impulse, light, pepsin) is present, it seems natural to
infer that C is a necessary causative agent for E. Most of what is known in
all fields of science has resulted from following up this kind of reasoning. But
suppose that condition C does not always lead to effect E; what further
inferences should a scientist draw? At this point the reasoning formats of
biology and quantum theory diverge sharply._

 _In the biological sciences one takes it for granted that in addition to C
there must be some other causative factor F, not yet identified. One searches
for it, tracking down the assumed cause by a process of elimination of
possibilities that is sometimes extremely tedious. But persistence pays off;
over and over again medically important and intellectually impressive success
has been achieved, the conjectured unknown causative factor being finally
identified as a definite chemical compound. Most enzymes, vitamins, viruses,
and other biologically active substances owe their discovery to this reasoning
process._

 _In quantum theory, one does not reason in this way. Consider, for example,
the photoelectric effect (we shine light on a metal surface and find that
electrons are ejected from it). The experimental fact is that the electrons do
not appear unless light is present. So light must be a causative factor. But
light does not always produce ejected electrons; even though the light from a
unimode laser is present with absolutely steady amplitude, the electrons
appear only at particular times that are not determined by any known
parameters of the light. Why then do we not draw the obvious inference, that
in addition to the light there must be a second causative factor, still
unidentified, and the physicist's job is to search for it?_

 _What is done in quantum theory today is just the opposite; when no cause is
apparent one simply postulates that no cause exists —ergo, the laws of physics
are indeterministic and can be expressed only in probability form. The central
dogma is that the light determines, not whether a photoelectron will appear,
but only the probability that it will appear. The mathematical formalism of
present quantum theory —incomplete in the same way that our present knowledge
is incomplete— does not even provide the vocabulary in which one could ask a
question about the real cause of an event._

 _Biologists have a mechanistic picture of the world because, being trained to
believe in causes, they continue to use the full power of their brains to
search for them —and so they find them. Quantum physicists have only
probability laws because for two generations we have been indoctrinated not to
believe in causes —and so we have stopped looking for them. Indeed, any
attempt to search for the causes of microphenomena is met with scorn and a
charge of professional incompetence and `obsolete mechanistic materialism'.
Therefore, to explain the indeterminacy in current quantum theory we need not
suppose there is any indeterminacy in Nature; the mental attitude of quantum
physicists is already sufficient to guarantee it._

This one has been quite an eye opener, making me doubt even Many Worlds, which
for one still doesn't explain the Born statistics. Still, thanks to Eliezer's
Quantum Physics sequence, I'm now convinced that to the best of Science's
knowledge (and despite what many physicists say) the laws of physics are most
probably deterministic. Which would instantly solve the conflict by rendering
"probability" nonsensical when applied to physical phenomena.

~~~
tunesmith
That's a lot to unpack - let me make sure I have this right.

Quantum theorists see that light is necessary but not sufficient for some sort
of electron behavior. And I'm guessing that it's very hard to find an
additional contributory/necessary cause that, when combined with light, would
reliably predict the behavior.

So they use probability to communicate their findings.

If that additional contributory/necessary cause exists (but is just really
hard to discover, or perhaps light itself is just highly contributory but not
necessary/sufficient) then the probability is an effective way to communicate
degree-of-belief, and helps to combine information about what they know with
what they don't know. In other words, they are using probability to
communicate partial findings.

If that additional contributory/necessary cause doesn't exist, then the
probability is a fixed, physical part of the science. In other words, they're
communicating full findings; that light actually creates probability of an
effect.

And that confusion about "probability" is making people believe that the
second case is true when they haven't necessarily disproved the first case
yet? Which then serves as a curiosity-stopper and makes the science
"mysterious". And this messes with scientific intuitions.

Forgive me if I'm way off, I've only read the first few articles of the first
couple lesswrong sequences.

~~~
Jach
I thought I'd pop in and mention this set of lectures (now a book, huh) from
Scott Aaronson on quantum mechanics and quantum computing:
<http://www.scottaaronson.com/democritus/> Particularly Lecture 9 goes into
the basic mathematics of quantum mechanics, and explains why it's more
appropriate to consider it an extension to probability theory using complex
numbers and call them amplitudes, rather than classic probabilities. Not being
a physicist myself I don't know if this is really helpful in clearing up
professional scientists' intuitions, but it's helpful for me at least in
grasping the basics.

You're on the right track by noting that quantum theorists can use probability
to communicate their findings. A single classical probability like 0.7 is a
_summary_ , not the whole picture. For the whole picture, you need a log of
observations with time stamps, which is a tremendous amount of data because
taken to the extreme it's a balancing act between maximizing the data you know
about you and your surroundings and minimizing the number of Planck-time
steps to collect the data. Even with more realistic amounts of data it's still
a lot to pack around, so you can summarize it by constructing a probability
distribution which is more convenient mathematically, and you can summarize
that probability distribution into a handful of numbers if you need to because
those are even easier to pack around. (Like the single probability 0.7, or two
numbers μ and σ that characterize a normal distribution, etc.)

So if you think of probability as just summarizing your past observations and
constraining your expectations for future observations (because what else
can?), I think it's easy to see how two people can have different
probabilities that represent a belief in a proposition or an abstract system
or something else. If both are being honest (I'm packing a lot of assumptions
in that word since this is basically Aumann's agreement theorem) and the
probabilities are different, there are a couple reasons why: either at least
one of them has information (observations) the other one does not, or they
started out with different prior probabilities. In the former case, they can
reach agreement by sharing data. In the latter case, they can eventually reach
agreement by obtaining more data or choosing a common prior. With enough
successive Bayesian updates, you can arrive at the same conclusion regardless
of where you started. (In practice, of course, things are more hairy and not
this simplistic, which is what the submitted pdf is addressing in part.)
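
The prior-washing-out claim in the paragraph above is easy to demonstrate for a simple case. A hypothetical sketch with two observers holding opposite Beta priors over a coin's bias, both conditioning on the same 2000 flips:

```python
# Two very different Beta priors converge once both observers condition on
# the same shared data (conjugate Beta-binomial updating).
import random

random.seed(1)
true_p = 0.3
flips = [random.random() < true_p for _ in range(2000)]
heads, n = sum(flips), len(flips)

def posterior_mean(a, b):
    # Beta(a, b) prior -> Beta(a + heads, b + n - heads) posterior.
    return (a + heads) / (a + b + n)

optimist = posterior_mean(50, 5)   # prior mean 50/55, about 0.91
skeptic = posterior_mean(5, 50)    # prior mean 5/55, about 0.09
gap = abs(optimist - skeptic)      # here exactly 45/(55 + n): shrinks with n
```

With n = 2000 the gap is about 0.02; with only 20 flips the same two priors would still disagree by 45/75 = 0.6, which is the practical caveat the parenthetical above points at.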

I find it hard to grok what it means for something to have a physical
frequency property. I can understand it as a statistic of an abstract system,
i.e. a probability, but in light of Many Worlds, I don't think a frequency can
fundamentally be a property of something in the same way energy can be. But
since the energy of a photon is related to the photon's "frequency" via its
"wavelength", is energy really fundamental or is the relation just a
convenient means of making the wave mechanics useful? I've read Feynman tell
how the whole idea of forces is just a somewhat circular simplification that's
not really fundamental, even though sometimes it seems like it is. While I
have a high degree of certainty about the physics accurately describing
reality as she is, my meta-belief / confidence about that high certainty isn't
nearly as high simply because I feel a lot of confusion about the ultimate
nature of reality. But at least at a higher level than fundamental physics,
it's clear to me that saying a "fair coin" has p=0.5 per unique side is really
just summarizing we don't have enough knowledge of the flipper and the
environment it is flipped in to predict the result with certainty in the way
we can predict the result of the output of a logic OR gate with low- and high-
voltage inputs with certainty. This is different than the uncertainty of which
branch you are within a Many Worlds universe, where it's more of a physical
resolution problem rather than one of needing more knowledge, similar to not
being able to find out what happens between Planck-time steps (if
anything happens and if the question means anything).

Ben Goertzel's book Probabilistic Logic Networks is another resource I'd
recommend. Sorry if my branching reply is a bit big, but I'm constantly trying
to clarify things for myself as well. ;)

~~~
loup-vaillant
I have not read Aaronson's book, but I'm already confident his use of the
word "probability" is more confusing than helpful. Okay, the universe is made
up of a wacky "configuration space" of "complex amplitudes" (in lay terms, a
many-dimensional sea of complex numbers). _And_ , there is a direct
relationship between complex amplitudes and the degree of subjective belief of
the physicist doing such and such experiment: take the norm of the amplitude,
square it, compare that to the other relevant squared norms, and voilà, you
have a probabilistic prediction for the result of your next experiment. (I
sweep a lot under the carpet, but it really works like this.)
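
The squared-norm recipe just described fits in a few lines. A toy two-outcome state with made-up amplitudes, not numbers from any real experiment:

```python
# Born-rule bookkeeping: probabilities are normalized squared magnitudes
# of the outcomes' complex amplitudes.
amplitudes = {"transmitted": complex(1, 1), "reflected": complex(1, -1)}

norms_sq = {k: abs(a) ** 2 for k, a in amplitudes.items()}
total = sum(norms_sq.values())
probs = {k: v / total for k, v in norms_sq.items()}
# Both squared norms are 2 here, so each outcome gets probability 1/2,
# even though the amplitudes themselves are different complex numbers.
```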

Something to know however: assuming you made no mistake, then the result of
the first experiment tells you _nothing_ about the result of any subsequent
trials. Here's an example:

Let us send light through a half-silvered mirror. Have detectors on either side
of the mirror to detect the light. Now, you can see that each detector detects
half as much light as you originally sent (of course). Things get interesting
when you send photons _one by one_. The setup of the experiment is such that
when you make your amplitude calculations, you find that the squared norm of
the amplitudes at each detector is the same. Here, this means that when you
send a single photon to the mirror, you should expect to see a single
detector go off, but you have no idea which one (subjective belief of 1/2 for
either side).

But there's more. Once you have made the calculation, you know all you could
possibly know about the initial conditions of the experiment (according to
current physics). Imagine you send a photon for the first time, and you see it
go through the mirror. Will that change anything about your expectation for the
second trial? Not one bit. You will still bet 1/2 chances on either side. So
this experiment is the perfect coin toss.

Or is it?

Imagine you send _twenty_ photons, and you see they _all_ made it past the
mirror. Initially that's one chance in a million. In that case, you should be
strongly tempted to expect the 21st to go through as well. The question is
why?

If you are _absolutely sure_ you set up the experiment right, then you
shouldn't change your mind, and still bet 1/2 on either side. But are you so
sure of your accuracy that you could set up several millions experiments, and
get _all of them_ right? Probably not. The more likely explanation is that
your so-called "half-silvered mirror" is just a piece of glass. Maybe you were
tired and picked up the wrong one. Or maybe some graduate student played a
joke on you.
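
That "more likely explanation" is just Bayes' rule at work. A hypothetical version of the calculation, with a made-up 1% prior on a botched setup:

```python
# Posterior probability that the setup is botched (plain glass transmitting
# everything) after 20 photons in a row pass through, vs. a proper
# half-silvered mirror that transmits each photon with probability 1/2.
p_botched_prior = 0.01
p_data_given_mirror = 0.5 ** 20        # about one in a million
p_data_given_glass = 1.0

posterior_botched = (p_botched_prior * p_data_given_glass) / (
    p_botched_prior * p_data_given_glass
    + (1 - p_botched_prior) * p_data_given_mirror
)
# Even a 1% doubt about the setup swamps the mirror hypothesis: the
# posterior on "botched" ends up above 0.999.
```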

Conversely, an actual coin toss _could_ be perfectly fair: there's a lot going
on in a coin toss, and depending on the chosen procedure, you can't expect to
have sufficient knowledge of the initial conditions to predict anything but
1/2 for heads, 1/2 for tails, even after seeing the coin land tails
three times in a row. There is, even then, some kind of "frequency property"
in the experiment. It's not because the laws of coin tossing are somehow
indeterministic, but because the way the coin is tossed systematically hides
relevant information about the exact initial conditions.

 _(The formal Aumann agreement theorem is a bit stronger than that: two
perfect Bayesians cannot have common knowledge of any disagreement. In
practice, this means you can't start from different priors, because a prior
embodies background information, which can be shared. I know that there is
controversy about "fully uninformed" priors, but in practice, they don't
change posterior probabilities very much.)_

~~~
tunesmith
As an outgrowth of this discussion I tried submitting a question to Ask HN but
it went over the character limit, so for want of a better place to put it,
maybe someone will react here - anyone want to weigh in on my various
interpretations of truth and falsehood?

    
    
       +------------+------------------+--------------------+-------------------+---------------+
       |    T/F     |     Fuzzy        |     Frequentist    |     Bayesian      |   Bayesian    |
       |            |                  |                    |   Subjectivist    |  Objectivist  |
       +----------------------------------------------------------------------------------------+
       |    0/0     | Ambiguous/Vague. |    I am ignorant.  |  I am uncertain.  | No one can be |
       |            | I am apathetic.  |                    |                   | certain.      |
       +----------------------------------------------------------------------------------------+
       |    0/.5    |     N/A          | Don't bother me,   | I am uncertain    | There exists  |
       |            |                  | still testing my   | but it may be     | partial       |
       |            |                  | hypothesis.        | false (partial    | knowledge of  |
       |            |                  |                    | knowledge)        | falseness.    |
       +----------------------------------------------------------------------------------------+
       |    0/1     | Completely false | It never happens   | I am certain it   | Everyone      |
       |            |                  | in the physical    | will never happen | should be     |
       |            |                  | world.             |                   | certain it    |
       |            |                  |                    |                   | will never    |
       |            |                  |                    |                   | happen.       |
       +----------------------------------------------------------------------------------------+
       |    .5/0    |     N/A          | Don't bother me,   | I am uncertain    | There exists  |
       |            |                  | still testing my   | but it may be     | partial       |
       |            |                  | hypothesis.        | true (partial     | knowledge of  |
       |            |                  |                    | knowledge)        | truthiness.   |
       +----------------------------------------------------------------------------------------+
       |   .5/.5    | It is partly     | Intrinsically      | I am 50% certain  | All should be |
       |            | true and partly  | 50/50.  Given 100  | it is true, 50%   | 50% certain   |
       |            | false.  (partial | bottles, half are  | certain it is     | it's true;    |
       |            | truth)  The      | full.  Given 100   | false.  I am 50%  | 50% certain   |
       |            | bottle is half   | of her, half are   | sure the bottle   | it's false.   |
       |            | full.  She is    | pregnant.          | is full.  I am    | All should be |
       |            | partly pregnant. |                    | 50% sure she is   | 50% certain   |
       |            |                  |                    | pregnant.         | bottle is     |
       |            |                  |                    |                   | full.  There  |
       |            |                  |                    |                   | is 50%        |
       |            |                  |                    |                   | certainty she |
       |            |                  |                    |                   | is pregnant.  |
       +----------------------------------------------------------------------------------------+
       |    .5/1    |       N/A        | Don't bother me,   |       N/A         |     N/A       |
       |            |                  | my hypothesis is   |                   |               |
       |            |                  | broken.            |                   |               |
       +----------------------------------------------------------------------------------------+
       |     1/0    | Completely true  | It always happens  | I am certain it   | Everyone      |
       |            |                  | in the physical    | will always       | should be     |
       |            |                  | world.             | happen.           | certain it    |
       |            |                  |                    |                   | will always   |
       |            |                  |                    |                   | happen.       |
       +----------------------------------------------------------------------------------------+
       |    1/.5    |      N/A         | Don't bother me,   |       N/A         |     N/A       |
       |            |                  | my hypothesis is   |                   |               |
       |            |                  | broken.            |                   |               |
       +----------------------------------------------------------------------------------------+
       |     1/1    | Equally          | My hypothesis is   |       N/A         |     N/A       |
       |            | confident.       | meaningless.       |                   |               |
       |            | Torn.            |                    |                   |               |
       |            | Ambivalent.      |                    |                   |               |
       +----------------------------------------------------------------------------------------+

~~~
loup-vaillant
I'm not sure what to make of this… More precisely, I'm not sure what your
fractions on the left actually mean. Personally, I'm tempted to throw
"undefined" at each row that does not sum to 1, leaving only "1/0", ".5/.5",
and "0/1".

I haven't heard about any distinction between "subjectivist" and
"objectivist" Bayesians. I'd say my own position is a little bit of both:

First, background knowledge fully constrains the world view. That means that
two rational agents whose knowledge is the same must hold the same beliefs.
More practically, two persons having the same _relevant_ knowledge about any
given question should believe the same answer (the difficulty here is to sort
out what's relevant). Second, agents generally don't have the same background
knowledge. Nothing stops us from knowing different relevant information about
any given subject. So of course we won't hold the same beliefs.

Fuzzy. Ah, fuzzy… the more I look at it, the less sense it makes. Strictly
speaking, there is no such thing as "half truth". A proposition is binary:
either true or false. _Beliefs_ on the other hand are not binary: a belief is
a proposition _plus_ a probability. If I say the bottle is more than 75% full
with probability 84%, then I'm 16% right if it really is only half full. Quite
wrong, but not hopelessly so.

With probability functions and a utility function, we can make the same set of
decisions we could have made with fuzzy logic, which seems to mix the two. I
personally prefer the modular approach.

~~~
tunesmith
The fractions are supposed to represent partial degrees of truth. 0.5 is just
an example, could be 0.4, 0.6 - anything other than absolute true or false.
There are probability systems that try to factor in uncertainty beyond belief
in truth or belief in falsehood. I think it was possibility theory that
talked about a 0.5 value assigned to "the cat is alive", a 0.2 value
assigned to "the cat is dead", and a 0.3 uncertainty - then it would be 0.8
plausibility/possibility of alive, but only 0.5 probability.
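
The cat numbers above can be made explicit with the usual belief/plausibility bookkeeping (a sketch in the Dempster-Shafer style; the masses are the ones from the comment):

```python
# Belief mass: 0.5 committed to "alive", 0.2 to "dead", 0.3 uncommitted.
mass = {"alive": 0.5, "dead": 0.2, "unknown": 0.3}

belief_alive = mass["alive"]                  # mass committed to "alive"
plausibility_alive = 1.0 - mass["dead"]       # everything not against it
# belief 0.5 <= probability of alive <= plausibility 0.8: the 0.3 of
# uncommitted mass is exactly the width of that interval.
```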

The main distinction I saw about subjectivist vs objectivist is that the
objectivist still communicates in terms of degree of belief, but the belief is
that it is based on objective empirical data and so therefore it's not an
expression of _personal_ belief - more a measure of belief for the entire
system or knowledge base. More relevant for things like machine learning
rather than updating one's own personal beliefs. (Maybe this just assumes
common priors.)

I've gone round and round on fuzzy because some people talk about fuzzy as if
it is degrees-of-belief, like the bayesian interpretation. But I think it's
different - I mean, if I'm partly correct on describing a concept, you
wouldn't say that I am right 50 times out of 100, or that you're 50% certain
I'm right, right? You'd say I'm half-right.

I'll have to think about probability combined with utility function. I'm not
sure how to make it square with fuzzy math. It seems you'd be 16% right
whether the bottle was half-full, quarter-full, or empty.

~~~
loup-vaillant
Okay, so I got your fractions mostly right.

Well, when you use a computer to make a probabilistic calculation, at some
point you have to feed it the relevant information, or it won't know what to
make of the data. _The data alone is not enough._ And if you are absolutely
certain your program is (i) correct, and (ii) fed with all the relevant
information you know about, then you should definitely believe the result. (Of
course, this absolute certainty is not attainable, so any sufficiently
surprising result should lead you to think that something went wrong.)
Assuming common priors, on the other hand, seems unreasonable, unless we're
only talking about "fully uninformed" priors such as those based on Kolmogorov
complexity.

Yes, I would say you're half right. The temperature example given by Wikipedia
is really good. The "degree of coldness" _is_ a worthy notion.
What bothers me is that fuzzy logicians don't seem to run probability
distributions over those degrees of truths. I mean, I can surely make up a
probability distribution on the outside temperature for tomorrow at 10AM.
Assuming that temperature maps to degrees of truth to the sentence "it's cold"
(say, 100% true below 10°C, 0% true beyond 20°C, and a linear interpolation in
between), then I naturally come up with a probability distribution over the
degrees of truth (I should get a Dirac spike at both extremities).
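(A quick sketch of that mapping - the 10°C/20°C thresholds are the ones from the sentence above, and the temperature distribution is made up for illustration:)

```python
import random

def degree_cold(temp_c):
    """Degree of truth of "it's cold": fully true at or below 10°C,
    fully false at or above 20°C, linear in between."""
    if temp_c <= 10:
        return 1.0
    if temp_c >= 20:
        return 0.0
    return (20 - temp_c) / 10

# A made-up probability distribution over tomorrow's temperature
# induces a distribution over degrees of truth. Mass piles up at
# the two extremes - the "Dirac spikes" mentioned above.
random.seed(0)
samples = [random.gauss(15, 6) for _ in range(10_000)]
degrees = [degree_cold(t) for t in samples]
print(sum(d == 1.0 for d in degrees) / len(degrees))  # spike at "fully cold"
print(sum(d == 0.0 for d in degrees) / len(degrees))  # spike at "not cold at all"
```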

Your last one is exactly right. I'm 16% right anywhere from empty up to (but
excluding) 75% full. That's the punishment I get for not producing a theory of
the fullness of the bottle which describes my actual beliefs about it.

I assume that your ultimate goal in life is to make decisions, or to create a
machine that will make decisions. To make sound decisions, you need to assess
their consequences. Take the example of the bottle: I need a full bottle of
water to hike today. Let's assume for the sake of the argument that it is
either empty or full. If it's empty, I need to re-fill it, or I would faint
from thirst (not good). If it's full, checking is only a nuisance (not good
either). Now the question is, how can I minimize my discomfort, on average?

Well I need two things: first, a probability distribution. Let's say I'm
pretty sure I filled the bottle yesterday, let's call it a 90% chance. Second,
a utility function. Baseline, the bottle is full, _and_ I did not bother
checking it, utility zero. Checking costs me 1 utility point, and fainting
from thirst costs me 50 points (let's assume my brain is actually capable of
performing those utility assessments). _Should_ I check the bottle?

Oh yes. Not checking means a 10% chance of losing 50 points, or 5 on average.
Checking means a certainty of losing only one point, which costs much less.
Conclusion: check your water supply before you go hiking. And re-check just to
be sure.
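(The arithmetic above, as a tiny sketch - the numbers are straight from the example:)

```python
# Expected (dis)utility of each action, using the numbers above.
p_empty = 0.10     # 90% sure I filled the bottle yesterday
cost_check = 1     # checking is a small nuisance
cost_faint = 50    # fainting from thirst is much worse

loss_if_check = cost_check           # certain loss of 1 point
loss_if_skip = p_empty * cost_faint  # 0.10 * 50 = 5 points on average

best = "check" if loss_if_check < loss_if_skip else "don't check"
print(best)  # check
```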

Now this problem didn't call for fuzzy logic. We can however come up with a
full probability distribution over the quantity of water in the bottle instead
of just two possibilities. From there, fuzzy logic should be able to step in
naturally. But frankly, I prefer to run a separate (dis)utility function over
the state of thirst that the lack of water will provoke (from light thirst to
actually fainting), and combine it with my probability distribution to make my
decision. (Though at that point, I'd rather just check than fry my brain
trying to solve this cost-benefit calculation.)

~~~
tunesmith
BTW, I just saw something about this regarding fuzzy uncertainty.

They made a distinction between vagueness and ambiguity, both of which might
make it difficult to assign a probability distribution among the possible
values.

Ambiguous is when the boundaries of the sets are not clearly defined. In the
thermometer example, it would be not knowing exactly what temperatures are
meant by "cold". Or maybe it's like the Supreme Court definition of porn: you
know it when you see it.

Vague is when there are clear definitions, but you're not sure how well your
data fits into those definitions. For the temperature example, it would be
what a crappy thermometer would tell you. It'd roughly correlate (it wouldn't
return boiling if it's freezing), but it'd be pretty inaccurate.

In true/false values, maybe 0/.5 and .5/0 could be construed as ambiguous,
while 1/.5 and .5/1 could be construed as vague.

------
olympus
Frequentist here. This paper makes me hate Bayesians a little less. The reason
is that a general thrust of the paper (since I have only had time to give
it a once-over) seems to be that just because you are a Bayesian, it doesn't
mean that you have to get rid of model adequacy checks. Not having model
adequacy checks is why I think Bayesians run around with a magic wand saying,
"poof! there's an optimal model." After proving a theoretical optimality they
never check to see if the real world data supports their arguments. So I'm
glad to see a prominent Bayesian saying that you don't have to throw model
checking out the window.

On a secondary note, I have to lament the use of philosophy in a math paper. I
realize that many prominent mathematicians are/were also philosophers and that
the two subjects are somehow linked at some level. But really I think that
putting philosophy in a math paper is an excuse to use more big words and
sound smart. Most of us would like to have a set of formulas to apply and not
worry about-- forgive me for what I'm about to say-- fuzzy non-science like
philosophy and the implications that it might have on our cold hard numbers.

Can each 31 page paper that combines math with philosophy come with a 5 page
companion paper that leaves out the philosophy and just has the applicable math
stuff?

~~~
rafcavallaro
I think your objection misses the fundamental point of the paper which is that
blindly applying formulae (in this case Bayesian ones) without considering the
part these computations play in the whole scientific process leads to bad
science. Specifically, it leads to assuming that the correct model is already
among those being considered. The authors say this is often not the case, and
that assuming so leads to stagnation in the relevant discipline.

This isn't an article that hands the readers a set of rote instructions, it's
a warning that rote application is bad science because good science doesn't
merely compare existing models with the data, good science proposes new and
better models. This is fundamentally an article on the philosophy of science
so leaving out the philosophy would make the article pointless.

In general, people who are good at symbolic manipulation are often in search
of a methodology that allows them to only do symbolic manipulation and
relieves them of the difficult task of doing semantics as well. The article is
saying that there's no free lunch here - we have to do the semantics as well,
we have to understand why models don't perform well and come up with the
creative insights necessary to replace them with new ones.

------
dean
The author's point that the most successful forms of Bayesian statistics
accord much better with sophisticated forms of hypothetico-deductivism is
reminiscent of the epistemology of normative value(s) which furnish a
provisional lens for the analysis of the systemization of statistical
transparency.

OK, half of that sentence is from an academic bullshit generator. I won't tell
you which half.

Unfair, I know. That paper is clearly not meant for the general public, but
still, learn how to communicate.

------
pertinhower
I... uh.... How did this get here?

