You may be interested to know that this exact objection has been made in the philosophical literature. See "Causal Fundamentalism in Physics" by Zinkernagel (2010). Available here: https://philsci-archive.pitt.edu/4690/1/CausalFundam.pdf
At the end, the author notes (as you do) that if you consider a finite difference equation with small time steps, there are no pathological solutions. He also mentions that Newton takes this difference equation approach when solving problems in his Principia.
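The finite-difference point is easy to check numerically. A minimal sketch (using the dimensionless dome equation r'' = r^(1/2); the step size and step count are arbitrary choices of mine): starting exactly at rest on the apex, the stepping rule only ever reproduces the trivial solution, because each state is computed from the previous one.

```python
import math

def simulate(r0=0.0, v0=0.0, dt=1e-3, steps=10_000):
    # Forward Euler on r'' = sqrt(r): each step computes the next state
    # from the current one, so no "spontaneous" solutions can appear.
    r, v = r0, v0
    for _ in range(steps):
        a = math.sqrt(max(r, 0.0))
        v += a * dt
        r += v * dt
    return r, v

print(simulate())  # starting at rest on the apex: stays at (0.0, 0.0)
```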
See also "The Norton Dome and the Nineteenth Century Foundations of Determinism" by van Strien:
>> Abstract. The recent discovery of an indeterministic system in classical mechanics, the Norton dome, has shown that answering the question whether classical mechanics is deterministic can be a complicated matter. In this paper I show that indeterministic systems similar to the Norton dome were already known in the nineteenth century: I discuss four nineteenth century authors who wrote about such systems, namely Poisson, Duhamel, Boussinesq and Bertrand. However, I argue that their discussion of such systems was very different from the contemporary discussion about the Norton dome, because physicists in the nineteenth century conceived of determinism in essentially different ways: whereas in the contemporary literature on determinism in classical physics, determinism is usually taken to be a property of the equations of physics, in the nineteenth century determinism was primarily taken to be a presupposition of theories in physics, and as such it was not necessarily affected by the possible existence of systems such as the Norton dome.
Somewhat interested, but tbh, the reason I thought of it is the same reason that they thought of it -- that it's obvious to anyone who studies physics from a philosophical angle that it really has to be that way. They just put it in a lot of fancy words so that it has academic rigor.
The author is a professor at Ohio State University, specializing in condensed matter theory. His CV lists two works on superconductors and a dissertation on supercapacitors.
AIC is an estimate of prediction error. I would caution against using it for selecting a model for the purpose of inference of e.g. population parameters from some dataset (without producing some additional justification that this is a sensible thing to do). Also, uncertainty quantification after data-dependent model selection can be tricky.
Best practice (as I understand it) is to fix the model ahead of time, before seeing the data, if possible (as in a randomized controlled trial of a new medicine, etc.).
And it is not uncommon that an intentionally bad model (high AIC) will be used for inference on a parameter when one wants to test the robustness of the parameter to covariates.
I think that if hypothesis testing is understood properly, these objections don't have much bite.
1. Typically we use p-values to construct confidence intervals, answering the concern about quantifying the effect size. (That is, the confidence interval is the collection of all values not rejected by the hypothesis test.)
2. P-values control type I error. Well-powered designs control type I and type II error. Good control of these errors is a kind of minimal requirement for a statistical procedure. Your example shows that we should perhaps consider more than just these aspects, but we should certainly be suspicious of any procedure that doesn't have good type I and II error control.
3. This is a problem with any kind of statistical modeling, and is not specific to p-values. All statistical techniques make assumptions that generally render them invalid when violated.
Your points are theoretically correct, and probably the reason why many statisticians still regard p-values and NHST favorably.
But looking at the practical application, in particular the replication crisis, specification curve analyses, the de facto power of published studies, and more, we see that there is an immense practical problem, and p-values are not making it better.
We need to criticize p-values and NHST hard, not because they cannot be used correctly, but because they are not used correctly (and are arguably hard to use right, see the Gigerenzer paper I linked).
The items you listed are certainly problems, but p-values don't have much to do with them, as far as I can see. Poor power is an experimental design problem, not a problem with the analysis technique. Not reporting all analyses is a data censoring problem (this is what I understand "specification curve analysis" to mean, based on some Googling - let me know if I misinterpreted). Again, this can't really be fixed at the analysis stage (at least without strong assumptions on the form of the censoring). The replication crisis is a combination of these two things, and other design issues.
I can understand why you see it this way, but still disagree:
(1) p-values make significance the target, and thus create incentives for underpowered studies, misspecified analyses, early stopping (monitoring significance while collecting data), and p-hacking.
(2) p-values separate crucial pieces of information. A p-value represents a highly specific probability (of data at least as extreme as that observed, given that the null hypothesis is true), but does not include effect size or a comprehensive estimate of uncertainty. Thus, to be useful, p-values need to be combined with effect sizes and ideally simulations, specification curves, or meta-analyses.
Thus my primary problem with p-values is that they are an incomplete solution that is too easy to use incorrectly. Ultimately, they just don't convey enough information in their single summary. CIs, for example, are just as simple to communicate, but much more informative.
I don't understand. CIs are equivalent to computing a bunch of p-values, by test-interval duality. Should I interpret your points as critiques of simple analyses that only test a single point null of no effect (and go no further)? (I would agree that is bad.)
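The duality is easy to demonstrate. A sketch, assuming a normal mean with known sigma (the data and sigma below are made up for illustration): the set of null values that a two-sided z-test fails to reject at alpha = 0.05 is exactly the textbook 95% interval.

```python
import math

def z_test_p(data, mu0, sigma):
    # Two-sided z-test p-value for H0: mean = mu0, with sigma known.
    n = len(data)
    z = (sum(data) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

data = [4.1, 5.2, 4.8, 5.5, 4.9, 5.1, 4.7, 5.3]  # made-up measurements
sigma = 0.5   # assumed known, for the illustration
alpha = 0.05

# The set of null values mu0 that the test fails to reject...
accepted = [m / 1000 for m in range(4000, 6001)
            if z_test_p(data, m / 1000, sigma) > alpha]

# ...matches the textbook interval mean +/- 1.96 * sigma / sqrt(n).
mean = sum(data) / len(data)
half = 1.959964 * sigma / math.sqrt(len(data))
print(min(accepted), max(accepted))   # ~ (4.604, 5.296)
print(mean - half, mean + half)       # ~ (4.603, 5.297)
```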
Yes, I argue that individual p-values (as they are used almost exclusively in numerous disciplines) are bad, and that adding more information on effect size and errors is needed. CIs do that by conveying (1) significance (does not include zero), (2) magnitude of effect (center of CI), and (3) errors/noise (width of CI). That's significantly better than a single p-value (excuse the pun).
I think part of the problem with p-values and NHST is that they encourage (or don't discourage) underpowered studies. That's because p-hacking benefits from the noise of underpowered studies. If you can test a large number of models and only report the significant one, then a noisy, underpowered study gives you a greater chance that a spurious result will come up significant.
So I think you are correct that properly powering studies is the crucial thing, but the incentives are against fixing this as long as lone p-values are publishable.
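The incentive problem is easy to quantify with a toy simulation (my own, not from the thread): under the null hypothesis a p-value is uniform on (0, 1), so a researcher who runs k analyses and reports only the smallest p-value inflates the false-positive rate from 5% to 1 - 0.95^k, roughly 64% for k = 20.

```python
import random

random.seed(0)

def false_positive_rate(k, trials=100_000, alpha=0.05):
    # Under H0 each p-value is Uniform(0, 1); take only the smallest
    # of k analyses and count how often it clears the alpha bar.
    hits = sum(min(random.random() for _ in range(k)) < alpha
               for _ in range(trials))
    return hits / trials

print(false_positive_rate(1))    # ~0.05: one pre-specified analysis
print(false_positive_rate(20))   # ~0.64: pick the best of 20 analyses
```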
But here the issue is the uncorrected multiple testing and under-reporting of results, not the p-values themselves. Any criterion for judging the presence of an effect is going to suffer from the same issue, if researchers don't pre-register and report all of their analyses (since otherwise you have censored data, "researcher degrees of freedom," and so on). This is really a problem with the design and reporting of studies, not the analysis method.
That sounds like saying that if you write C code correctly, you don't make memory errors, when in reality it's very common not to write correct code.
That’s why Rust came along: it simply doesn’t let you make that mistake. The point is that maybe there’s a better standard test to use than the p-value.
I haven't looked for stuff from the last 20 years, but I will ask around. I've got a book at home from which you could figure out the bit about the Toda lattice if you read between the lines; I will look it up this evening.
A few extra points:
(1) People don't get academic credit for improving the exposition of things, only for making discoveries that are "new"
(2) You often see Poincare sections with anomalies such as tori that are folded over. The problem here is that the energy surface is not flat but has the topology of something like a sphere or a torus, so you see strange things when you project it. I found that in some cases you could make a three-dimensional visualization of the energy surface, draw the Poincare section on that, and it would be clearer what was going on.
(3) The KAM theorem is stronger with N=2 oscillators (two q and two p coordinates) than it is with N>2 oscillators. With N=2 the tori are solid walls that constrain chaotic motion, whereas with N>2 they don't have enough dimensions to really hold the trajectories inside, so there are probably trajectories that go from one regular area to a chaotic area and then to another regular area. One consequence of this is that we can't really say the solar system is stable. Arnold talked about this in a hypothetical way a long time ago, but this 'Arnold diffusion' is still an undiscovered country.
(4) Despite that, people from NASA have done some very nice work mapping chaotic trajectories that make it possible to get from one planet to another with much less energy than you would otherwise need. See
You get these areas that look like television static, which certainly have more structure than they appear to. The proof that these areas are chaotic is based on the knowledge that, paradoxically, they are filled with unstable periodic orbits: there is a fractal network of resonances that resonate with resonances, and so on, that determines the motion in that area, but it is not visible because of the sensitivity to initial conditions and the practical problems of doing the math. In the 1980s people were saying there ought to be a way to make better pictures with interval arithmetic, but I don't think it's been done since then. Lately I've been interested in revisiting it, not so much because I care about the science but because I'm always on the lookout for algorithms that draw interesting images.
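For anyone who wants to generate this kind of picture, here is a minimal sketch using the Chirikov standard map (a stock toy model for mixed regular/chaotic phase space; my choice, not something referenced above). Plot the returned points for several initial conditions: at small kick strength K you get smooth KAM curves, at large K the section fills with "static".

```python
import math

def standard_map_orbit(theta, p, K, n):
    # Chirikov standard map: p' = p + K*sin(theta), theta' = theta + p',
    # both taken mod 2*pi. Each orbit is one trace on the section.
    pts = []
    for _ in range(n):
        p = (p + K * math.sin(theta)) % (2 * math.pi)
        theta = (theta + p) % (2 * math.pi)
        pts.append((theta, p))
    return pts

# Small K: smooth KAM curves. Large K (say 5.0): television static.
orbit = standard_map_orbit(theta=1.0, p=0.5, K=1.2, n=1000)
```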
I don’t understand how because a mathematical technique has existed since the 1800s, it makes sense to use it in a physics setting. This does not seem like an argument.
Also, you say that his calling the Karplus and Kroll result a fraud is “libel,” but the paper states that “Karplus and Kroll confessed that they had not independently reached the same result; instead, they had reached a consensus result.”
There’s even a Feynman quote saying, “It turns out that near the end of the calculation, the two groups compared notes and ironed out the differences between their calculations, so they were not really independent.”
You seem to be trying to discredit this paper. Why is that? Trying to frame it as “libel”, in particular, is quite ridiculous in this setting and seems to betray some personal wish for the author to be punished for the audacity of criticizing physics research.
If a mathematical technique helps understand and predict nature, why not use it? The quote criticizing the method misunderstands it. It's certainly not "wrong" or "absurd."
I dislike the paper because it is sensationalist and highly misleading. For example, as far as I can tell, Karplus and Kroll didn't "confess" to anything (no direct quote is provided in the paper); we just have a secondhand assertion by Feynman. Nothing they did is "fraud," as the author claims - this is false and defamatory. Further, the issue got wrapped up by Petermann in 1957; the author is just annoyed that he didn't publish full details of the calculations. The suggestion that there is somehow lingering uncertainty is just wrong.
I have no particular wishes for the author, other than that he cease writing bad papers.
I don't think it's a bad paper; it was quite interesting to me.
I see no valid argument in what you're saying, sorry. I also just want to point out that you used the word "just" three times in four sentences, to minimize and psychologize the author's arguments. For me, not being able to neutrally present someone's point of view is a sign of emotional bias.
My argument is that key claims of the paper are factually incorrect. For example, there was no "confession" by Karplus and Kroll. (At least, none is cited as far as I see - please let me know if I am missing something.)
Johnnie and math Ph.D. here. It's a bad, or at least slow, way to prepare for math or math-heavy graduate school, as you'll need to take additional coursework elsewhere in order to be competitive. On the other hand, freshman math at St. John's (Euclid, a bit of Apollonius, the start of Ptolemy) was, for a significant fraction of students in my year, the first math class they had ever enjoyed. Euclid's Elements, despite its many inefficiencies and faults, does not heavily rely on mastery and/or acceptance of earlier curricular material, nor is it designed in service to later curricular material that most students are in no position to anticipate or appreciate.
On the third hand, reading Newton is a really bad way to learn calculus; the college uses a lab manual which more closely resembles a modern calculus course.
It appears to be a liberal arts program. Is this substantially different, with respect to the rigor of mathematics, from most other comparable programs? They might cover more calculus (maybe at a theoretical level?) than most liberal arts programs.
I'd expect the rigor is fine, but the particulars that are learned differ.
I doubt the distinction matters at all for the vast majority of grads, especially ones who don't intend to become mathematicians. Learning how to math is probably more important than the specific material, outside a handful of things. You can pick up the rest as-needed, and for the vast majority of people, "the rest" that is in fact ever needed for the entire rest of their lives, will be very little. Especially if they're pursuing a classics-based liberal arts degree.
I doubt many of their grads are planning to become actual computer-scientists or mathematicians or mech. engineers or any of that. Lawyer, maybe doctor, maybe writer, maybe an ordinary computer programmer, that sort of thing. As long as you're not afraid of math, you'll be fine in any of those not having had a typical PDE class or whatever.
Yes, it is substantially different with respect to content than standard undergraduate mathematics programs. It covers a few historically important texts and does not teach (if those texts are any indication) most of what is usually taught in an undergraduate math degree. (A poster above writes: "Freshman math was almost entirely the study of Euclid and Nicomachus.")
So these are the books used in an undergraduate liberal arts degree (your degree is IN liberal arts). These are the math-tagged books in a quirky bachelor's in philosophy degree, essentially. They do not have a math degree (or any degrees aside from a bachelor's in liberal arts?).
I see - I understood "liberal arts program" above to mean a liberal arts college in general (typically offering a mathematics major). I agree that this reading list is better suited for something like "history of math for humanities students."
The marshmallow effect replicated (IIRC), and is probably correct, unlike the power posing stuff. That behavior in a single trial in childhood moderately predicts success in adulthood seems like an interesting fact about humans.
What's the applicable result to this? Put children who eat marshmellows too quickly in special "delayed gratification" classes? Is the result such that all children who eat marshmellows too quickly should give up, as their success is predetermined, or is it the impact is barely statistically significant?
If I were the parent of a child who ate the marshmellow too quickly, I would do nothing different, as the whole thing is meaningless and doesn't change anything. It's more like, "Oh, OK, my son is 4% less likely to be in the top 20% of the income distribution (according to this barely coherent theory)." Why do we spend so much money, time, and effort discussing and researching this?
The experiment is part of a larger research program to study delayed gratification. It was never intended to help parents raise their children. Asking for immediate applications is an odd standard for basic science research. (Unless you reject the value of basic science research entirely.)
Further, the result in a recent replication [1] was a correlation of .28 between the time to ring the bell (to get the marshmallow) and academic achievement. That's not exactly the trivial "4% less likely" effect in your caricature.
(You may also wish to double check your spelling of "marshmallow.")
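To put the r = .28 in the terms of the parent's caricature, here is a quick back-of-the-envelope simulation (my own, assuming bivariate normality; these are not numbers from the replication paper): being in the top half on delay time roughly doubles the chance of landing in the top 20% of the outcome.

```python
import math
import random

random.seed(1)
r = 0.28       # reported correlation between delay time and outcome
n = 200_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)                          # delay time (z-score)
    xs.append(x)
    ys.append(r * x + math.sqrt(1 - r * r) * random.gauss(0, 1))  # outcome

cutoff = sorted(ys)[int(0.8 * n)]   # empirical 80th percentile of outcome
high = [y for x, y in zip(xs, ys) if x > 0]    # above-median delayers
low = [y for x, y in zip(xs, ys) if x <= 0]
p_high = sum(y > cutoff for y in high) / len(high)
p_low = sum(y > cutoff for y in low) / len(low)
print(p_high, p_low)   # roughly 0.26 vs 0.14
```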
What is the skill they should teach? To always eat lunch before doing a psych experiment? To like sugar, or marshmallows in particular, just slightly less? That high self-control in extremely low-reward situations is good?
Teaching a child to do basic budgeting with pocket money is likely to have a vastly higher impact than slight delayed gratification with candy.
I know it's super taboo to consider, especially in the current climate, but is it possible that people who have high self-control are more likely to become rich because of their ability to delay gratification, and that those genes will therefore be passed on to their children?
Biology will select against this. All primate females seem to be much more attracted to bold males taking risks, and more prone to using sex to appease dominant males showing violent episodes. The opposite argument could be equally valid.
The only thing that children of rich parents share is having rich parents. One of the parents being above the mean in the "beautiful" trait is also very common.
If I understand correctly, you're asserting that that's causal--that being born into affluence causes children to develop in a way that leads to them "passing" the marshmallow test.
Do you have evidence to exclude other possible interpretations, such as the possibility that children resisting the temptation of a marshmallow and their parents being rich are actually the effects of some other underlying single cause?
> If I understand correctly, you're asserting that that's causal--that being born into affluence causes children to develop in a way that leads to them "passing" the marshmallow test.
Stop and think.
If you are rich, that marshmallow is probably very bland, or you've eaten very well today. So maybe waiting for two makes more sense.
If you aren't, that marshmallow is super tasty and you probably didn't eat well today. Plus, why would you believe that grownup when even your mom was lying? Someone might come and eat it.
Then some time passes. Rich kids grow up and are more successful. Delayed gratification is key to success!