> no p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical nonsignificance lead to the association or effect being improbable, absent, false, or unimportant. Yet the dichotomization into “significant” and “not significant” is taken as an imprimatur of authority on these characteristics. In a world without bright lines, on the other hand, it becomes untenable to assert dramatic differences in interpretation from inconsequential differences in estimates. As Gelman and Stern (2006) famously observed, the difference between “significant” and “not significant” is not itself statistically significant.
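A quick numerical sketch of the Gelman–Stern point, in Python with made-up numbers in the spirit of their example (two effects with the same standard error, one just "significant", one not):

```python
from scipy.stats import norm

def two_sided_p(est, se):
    """Two-sided p-value for a z-test of an estimate against zero."""
    return 2 * norm.sf(abs(est / se))

# Two hypothetical effects, each with standard error 10.
print(two_sided_p(25, 10))  # ~0.012 -> "significant"
print(two_sided_p(10, 10))  # ~0.317 -> "not significant"

# But the difference between them (assuming independent estimates):
se_diff = (10**2 + 10**2) ** 0.5      # ~14.1
print(two_sided_p(25 - 10, se_diff))  # ~0.29 -> not significant either
```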
I don't mind if things just can't be explained intuitively because they're fundamentally technical, but your explanation and the parent's both do this thing where they sound like they're explaining things in plain, common language, but aren't actually, because it isn't clear what those plain words mean in this context.
As for using up a degree of freedom, the easiest way to build intuition for why this is a useful concept is to think about very small samples. Say I draw a sample of 1 item. By definition that item equals the sample mean, so I receive no information about the standard deviation. Conversely, if someone had told me the true mean in advance, I could learn a bit about the standard deviation from a single sample. This effect carries on beyond one item, in diminishing amounts. Imagine I draw two items. There's some probability that they're both on the same side of the true mean; in that case, I'll estimate my sample mean as lying between those numbers and underestimate the standard deviation. Note that I'd still underestimate the standard deviation even with the bias correction; it's just that the correction factor compensates just enough that it balances out over all cases (for the variance, at least).
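Here's a minimal simulation of that two-item case (Python/NumPy; draws from a standard normal, so the true variance and SD are both 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# A million samples of size 2 from N(0, 1): true variance = true SD = 1.
samples = rng.normal(size=(1_000_000, 2))

print(samples.var(axis=1, ddof=0).mean())           # ~0.5: divide by n, biased low
print(samples.var(axis=1, ddof=1).mean())           # ~1.0: divide by n-1, unbiased variance
print(np.sqrt(samples.var(axis=1, ddof=1)).mean())  # ~0.8: the SD is still biased low
```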
A simple, concrete way to convince yourself that this is real is to consider a variable that has an equal probability of being 1 or 0. Its standard deviation is 0.5 (variance 0.25). But if we randomly sample two items, 50% of the time they'll be the same and we'll estimate the standard deviation as zero. The other 50% of the time, we'll get the right answer. Hence, our average variance estimate is half the right answer, which is exactly what the correction factor n/(n-1) = 2/1 compensates for: it doubles the variance estimate in the cases where the items differ while leaving it zero in the others, so it averages out to the true value. (Applied to the standard deviation rather than the variance, the corrected estimate still averages low, as noted above.) This also suggests why dividing by n is referred to as the maximum likelihood estimator.
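You can verify that arithmetic by enumerating the four equally likely two-item samples; a quick sketch:

```python
from itertools import product
from statistics import mean

def var(xs, ddof):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

pairs = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1), equally likely

# True variance is 0.25 (SD = 0.5).
print(mean(var(p, 0) for p in pairs))         # 0.125 : half the right answer
print(mean(var(p, 1) for p in pairs))         # 0.25  : exactly right
print(mean(var(p, 1) ** 0.5 for p in pairs))  # ~0.354: corrected SD still low
```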
This is an example of a “shrinkage estimator”, which comes up a lot - introduce some bias but get a smaller MSE. For more, see: https://en.wikipedia.org/wiki/Bessel%27s_correction
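A small sketch of that bias-for-MSE trade (assuming normal data with true variance 1): the divide-by-n estimator is biased low but beats the unbiased divide-by-(n-1) estimator on MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(1_000_000, 5))  # n = 5, true variance = 1

for ddof in (0, 1):
    est = samples.var(axis=1, ddof=ddof)
    print(ddof, round(est.mean() - 1, 3), round(((est - 1) ** 2).mean(), 3))
# ddof=0: bias ~ -0.2, MSE ~ 0.36  <- biased, but lower MSE
# ddof=1: bias ~  0.0, MSE ~ 0.50  <- unbiased, higher MSE
```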
"To be precise, when this mean is calculated, the sum of the squared deviations is divided by one less than the sample size rather than the sample size itself. There's no reason why it must be done this way, but this is the modern convention. It's not important that this seem the most natural measure of spread. It's the way it's done. You can just accept it (which I recommend) or you'll have to study the mathematics behind it. But, that's another course."
I tried, in vain: I went to the most likely one and still couldn't find where the variance gets introduced!
The section called "A Valuable Lesson" does show that running multiple tests against the same P < 0.05 threshold causes nonexistent effects to be reported as statistically significant (see the sketch below), but the material on correcting for that doesn't appear until much later, in the section about ANOVA.
That's actually a pretty severe flaw, especially for a handbook that is likely to be read piecemeal.
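For the record, here's a minimal simulation of that effect (my own sketch, not the handbook's code): every null is true by construction, yet running 20 tests at P < 0.05 flags a spurious "effect" most of the time; dividing the threshold by the number of tests (Bonferroni) is one simple correction:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_experiments, n_tests = 2_000, 20

naive = bonferroni = 0
for _ in range(n_experiments):
    # 20 two-group comparisons where both groups come from the SAME
    # distribution, so any "significant" result is a false positive.
    pvals = [ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(n_tests)]
    naive += min(pvals) < 0.05
    bonferroni += min(pvals) < 0.05 / n_tests

print(naive / n_experiments)       # ~0.64 (= 1 - 0.95**20): a false hit most runs
print(bonferroni / n_experiments)  # ~0.05: corrected threshold restores the error rate
```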
Statistics Done Wrong: The Woefully Complete Guide https://www.amazon.com/dp/1593276206/
If you're interested in machine learning, Andrew Ng's Coursera course is almost a rite of passage at this point - it's very accessible:
Kutner is the bible on regression models (though not a super fun read):
This was one of my favorites as an undergrad: