It's fine to say "some small studies suggest this is true", and then use it in practice because doing a better study is beyond your means or simply not important. It's fine to do small studies, because they're a cheaper way to look for interesting effects that might then be investigated in more detail. But let's not pretend "science" has decided this issue beyond reasonable doubt.
"Science", some amorphous entity quite disconnected from real science, has become a religion of sorts for some non-scientifically minded people. Put "science" is a post title and you'll fool many into thinking you speak from authority. I feel the author is well intentioned here, but has gone far beyond what the science suggests.
This is a huge issue I have with a lot of the blog posts I see, especially those related to design.
So much of design has been shaped by a consensus that's been accrued over time and established through intuition, feeling and opinion. A lot of visual design is aesthetic preference that experienced designers have learned through trial and error (speaking in broad terms). The knowledge gets passed on to more inexperienced designers, who tend to append the scientific justification on a post-hoc basis. Since most designers don't have that scientific background, you see poor articles like this, even when the literature is hardly conclusive.
 Good review of related concepts: http://alexpoole.info/blog/which-are-more-legible-serif-or-s...
We should consider effects as random variables -- different portions of the population will respond in different amounts to the experiment. Smaller samples increase the variance in the measurement of the effect. This wouldn't be a problem if we replicated experiments many times, but we don't, and we only see results of experiments we conduct ourselves or those that are published. Add in the tendency for only significant results to be published and you have a biased sampling of effect size.
As for really small vs small -- I guess I'm used to web experiments, which typically have 1000+ participants.
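To make that selection effect concrete, here's a rough simulation sketch (the true effect size, group sizes, t-tests, and the 0.05 publication filter are all assumptions chosen purely for illustration, and it assumes numpy/scipy are available):

```python
# Sketch: if only "significant" results get published, the published effect
# sizes overstate the true effect -- and much more so with small samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2  # assumed true difference between groups, in SD units

def published_effects(n_per_group, n_experiments=2000):
    kept = []
    for _ in range(n_experiments):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_effect, 1.0, n_per_group)
        _, p = stats.ttest_ind(b, a)
        if p < 0.05:                       # publication filter: significant only
            kept.append(b.mean() - a.mean())
    return np.mean(kept), len(kept)

for n in (20, 1000):
    effect, n_pub = published_effects(n)
    print(f"n={n:5d}: mean published effect = {effect:.2f} "
          f"({n_pub}/2000 experiments 'significant')")
```

Under these assumptions, the small-sample runs that clear the filter report effects well above 0.2, while the large-sample runs land close to it.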
If you fix a binary hypothesis, and then run your experiment for that specific hypothesis and only that hypothesis, then you're right.
In practice, though, people look at data from such experiments, and then invent a hypothesis that fits. The space of potential plausible-sounding hypotheses is huge (especially since, in many cases in psychology and related fields, both A and not-A may sound plausible). So the chance that such a small sample appears to show evidence for some plausible-sounding but incorrect hypothesis is actually very high.
We've already agreed that a small sample size doesn't make it any more likely to find a false positive for a given hypothesis. This is true for H1, H2, H3, etc., where each of these is a hypothesis. Therefore the aggregate effect of testing N different hypotheses is that you're no more likely to find a false positive with a small sample size vs a large sample size. You are more likely to have false negatives with small samples, though.
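A quick sketch of that aggregate effect (the 20 hypotheses, group sizes, and t-tests here are made up purely for illustration): with a per-test threshold of 0.05, the chance of at least one false positive is roughly 1 − 0.95^20 ≈ 64%, and it comes out essentially the same for small and large samples.

```python
# Sketch: test 20 unrelated hypotheses on data with no true effect anywhere,
# and count how often at least one test comes out "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def any_false_positive(n_per_group, n_hypotheses=20, n_experiments=1000):
    hits = 0
    for _ in range(n_experiments):
        for _ in range(n_hypotheses):
            a = rng.normal(0, 1, n_per_group)
            b = rng.normal(0, 1, n_per_group)   # null is true for every hypothesis
            if stats.ttest_ind(a, b).pvalue < 0.05:
                hits += 1
                break
    return hits / n_experiments

for n in (20, 1000):
    print(f"n={n:5d}: P(at least one false positive) ≈ {any_false_positive(n):.2f}")
```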
It does. Try testing a die to see whether it's loaded. Let's say your prior probability of the die being loaded is 50%, because this is a real shady place you're gambling in. You further know (based on the game you're playing) that if your die is loaded, it will land with these frequencies:
1: 1/3 of the time.
2: 1/6 of the time.
3: 1/6 of the time.
4: 1/6 of the time.
5: 1/6 of the time.
6: almost never
Now what is the probability of a false positive? Well… With only one throw of a genuine die, you will land on 1 one time out of six, giving you a posterior probability distribution of 2/3 loaded, 1/3 genuine (this is as close as you will get to a false positive).
With 2 throws, it's a bit more complicated:
1 , 1 : 1/36 : loaded with 80% probability
1 , [2-5]: 8/36 : loaded with 67% probability
6 , [1-6]: 11/36 : definitely genuine
[2-5], [2-5]: 16/36 : no evidence
Okay, this is a contrived example. But sufficiently large sample sizes do indeed reduce the risk of false positives. It's just that some results are so clear-cut that they don't need large sample sizes to reach a conclusion reliably.
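For anyone who wants to check the arithmetic, here's a minimal sketch of the posterior calculation behind those numbers (same 50% prior and loaded-die frequencies as above):

```python
# Posterior probability that the die is loaded, given a sequence of throws.
fair   = {f: 1/6 for f in range(1, 7)}
loaded = {1: 1/3, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 0.0}
prior_loaded = 0.5

def posterior_loaded(throws):
    like_loaded = like_fair = 1.0
    for t in throws:
        like_loaded *= loaded[t]
        like_fair   *= fair[t]
    num = prior_loaded * like_loaded
    return num / (num + (1 - prior_loaded) * like_fair)

print(posterior_loaded([1]))      # ~0.67  (one throw landing on 1)
print(posterior_loaded([1, 1]))   # 0.80   (1, 1)
print(posterior_loaded([1, 3]))   # ~0.67  (1, then something in 2-5)
print(posterior_loaded([6, 2]))   # 0.0    (a 6 appears: definitely genuine)
print(posterior_loaded([3, 4]))   # 0.5    (no evidence either way)
```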
First, the statistical tests used for these experiments don't make use of Bayesian stats, so the prior 50%-loaded probability simply isn't factored in. The standard is to use null-hypothesis testing, which asks, roughly: if the null hypothesis is true -- that is, if there is no actual difference between the populations (experimental groups A and B, for example) -- what is the probability that you'd see a pattern like the one observed in the data? And the tests take sample size into account in calculating this probability.
If you throw the die once, the test that you'd use here (Chi-square) would _never_ give you a false positive, that is, a p-value of < .05. With small samples, there is too little power to get the requisite p-value. (And I'll note that Chi-square is one of the tests used in these papers.)
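To make that concrete, here's a quick check using scipy's chi-square goodness-of-fit test (illustrative counts, not data from the papers; with a single throw the test's expected-count assumptions are violated anyway):

```python
# A fair-die goodness-of-fit test on one throw vs. sixty throws.
from scipy import stats

# one throw that lands on 1 (counts of faces 1..6)
result = stats.chisquare([1, 0, 0, 0, 0, 0])   # expected frequencies default to uniform
print(result.statistic, result.pvalue)          # chi2 = 5.0, p ~ 0.42 -- not significant

# sixty throws that match the "loaded" frequencies exactly
result = stats.chisquare([20, 10, 10, 10, 10, 0])
print(result.statistic, result.pvalue)          # chi2 = 20.0, p ~ 0.001 -- now significant
```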
There's a whole other debate about whether p-values and null hypothesis tests are the right thing to use, whether the standard 0.05 threshold p-value is small enough, whether Bayesian stats should be used, etc. These are legitimate issues. But they're separate from the claim that small samples will increase the likelihood of a false positive.
(I know of the debates. For all I care Bayesians have won by an overwhelming margin. The only advantage of Frequentist statistics is their relative ease of use. But in the search for truth, you just can't escape Probability Theory. Period. My method wouldn't be accepted in a paper? Then fuck the papers. I'm not trying to get published, I'm trying to get to the truth.)
I don't have the proof nailed down, but based on the examples I can come up with, I'm extremely confident that as long as you use probability theory correctly, small sample sizes do increase the chance of false positives. On the other hand, those false positives will be weaker than the exceptional false positive you might get from larger sample sizes. (Imagine I throw the die 30 times and get zero sixes and ten ones. It's very rare, but it would make me all the more confident the die is loaded.) If you use that crappy outdated Frequentist junk, however, all bets are off.
Note however that in a sense, you are correct: by conservation of expected evidence, the weighted average of evidence you expect is exactly zero: if it were not, you would already have changed your belief at the point of equilibrium. Which means that if you expect lots of weak evidence in one direction, you also expect a little, and very strong, evidence on the other side.
I'm not sure this is what you were getting at, though.
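For what it's worth, the conservation-of-expected-evidence point is easy to check numerically with the same die setup as above (a small sketch, nothing more):

```python
# Averaged over all possible outcomes of one throw (weighted by their
# prior-predictive probability), the posterior of "loaded" equals the prior.
fair   = {f: 1/6 for f in range(1, 7)}
loaded = {1: 1/3, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 0.0}
prior = 0.5

expected_posterior = 0.0
for face in range(1, 7):
    p_face = prior * loaded[face] + (1 - prior) * fair[face]   # prior-predictive
    posterior = prior * loaded[face] / p_face
    expected_posterior += p_face * posterior

print(expected_posterior)   # 0.5 -- exactly the prior
```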
When we do null-hypothesis testing, we do assume a prior: using smaller p-values means we're more skeptical of the competing hypothesis — we have a stronger prior belief that it is false. But we don't speak the word "prior", so we can pat ourselves on the back for our "objectivity", and scold the Bayesian for his "subjectivity". Priors, what arrogance. Who is he to believe so and so in the first place? We do science, not faith.
Only we're blind to our own priors.
And if you are interested in the sampling error, that's reduced with the square root of n. So in order to halve the error, you've got to quadruple the sample size. That increases costs markedly (and is why sample-based statistical methods are so powerful -- they work, if done correctly).
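A quick numerical illustration of that scaling (the sample sizes and normal data here are just for demonstration, assuming numpy):

```python
# The standard error of a sample mean shrinks like 1/sqrt(n):
# quadrupling n halves the error.
import numpy as np

rng = np.random.default_rng(2)
for n in (100, 400, 1600):
    means = [rng.normal(0, 1, n).mean() for _ in range(5000)]
    print(f"n={n:5d}: empirical SE ~ {np.std(means):.3f}  (theory: {1/np.sqrt(n):.3f})")
```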
The original article appears to have been posted on the author’s blog quite some time ago: https://ooomf.com/blog/the-science-behind-fonts-and-how-they...
As such criticising contradictions between website design and what the article advocates makes even less sense than it usually does.
I do think there is still more handwaving in the article than I'd care for. In very few cases was it a small change in font that made a difference. And, even then, it is still artistic in nature. Pretty things make people more trusting and happy. Some fonts are prettier than others.
Admittedly, there's still a presentation difference between places like Cracked (fuggly blog style) and sites with similar motives (Reddit / HN). In that case though, the deciding feature is mostly minimalism (list presentation, no huge ads, no content tiles, limited social content links, very light scripting). Similar to why I prefer them over traditional blog layout news sites (Reuters, CNN, etc.), as the initial step of surveying the day's events has way less overhead.
The elements I attack:
⚫ Anything that moves. Animations, sliders, pop-ups.
⚫ Fixed elements: headers, footers, fixed-position social bars.
⚫ Fonts. My preferred reading font is 15pt (about 20px on this display). Most sites seem to run between 12-14px, which is painfully small.
⚫ Crappy contrast. Backgrounds should be light.
⚫ Anything "social". If I want to share your content, I've got a perfectly good URL with which to do it.
⚫ Interstitials. If you're relying on those for advertising or messaging, I'll see them precisely once.
⚫ Sidebars. I either nuke them entirely, or de-columnize the page. See: http://www.reddit.com/r/dredmorbius/comments/1tniu3/user_sit...
⚫ I find call-outs and images floated to the right rather than the left less annoying (at least in ltr languages). So I move those elements to float right, clear right, and pad with 10px (or 0.5 em), add a 1px border, and a 20px (or 1em) margin. Often a shadow drop for images just for grins (see above).
⚫ I find a blue link color is far less distracting than other alternatives. Many sites seem to prefer red/orange links (they stand out), my usual preference these days is for #1e6b8c (or something close to it).
⚫ I've found both drop caps and bold leading lines useful in some cases for affordance -- especially in streams of aggregated content where the practice of the originating source doesn't include strong ledes, these at least allow you to find the start of a post easily.
The saddest overall impression is that most web design is actively hostile to reading.
> However, as more reading shifts to digital and screen resolutions improve, the way we read content is changing. Many designers mention that 16pt font is the new 12pt font. A recent study has also shown that larger font sizes can elicit a stronger emotional connection.
This is an interesting one. I have actually found myself frequently using the zoom-out feature of web browsers to make the font type smaller. I viewed the Kickstarter "2013 year" slideshow that's been making the rounds at about 20%, because I found the huge font they used extremely uncomfortable to read.
I didn't zoom out this particular article, but playing with the zoom level, I find 80% works about best for me.
I wonder why I feel so differently about this issue than current designers do.
> By changing the font and increasing it’s size, our email content felt much better.
bad grammar can ruin your article faster than any font choice :)
You can accomplish that, though, either through your browser preferences, by using a userContent.css override stylesheet, or with site-specific stylesheet overrides (I use Stylebot for this, Stylish is another option).
But, absolute agreement on the crappy design decisions many sites make.
Otherwise, this was a pretty decent overview. A lot of programming-focused sites ignore good practices about line lengths and font size.
Do yourself a favour and read it here as it was intended.
“It appears that this website bought the rights to publish this article from the author.
As such criticising contradictions between website design and what the article advocates makes even less sense than it usually does.”
Since the majority of internet users have far fewer pixels per inch than a retina display, I think it's best for websites to use sans serif.
However, the historical reason why sans-serif fonts are generally recommended over serifs on a computer screen was a matter of practicality. The computer monitor simply couldn't replicate the tiny detail required by serifs. Higher resolution screens are becoming more commonplace, which is why you'll see more serifs in use online (IMO... correlation vs causation and all that). Somewhere along the line someone cited scientific articles to help justify this practice, but I'm convinced it became commonplace for practical reasons/aesthetics more than anything.
 I haven't personally done a literature review, but this post is great in discussing the conclusiveness around the research (and readability vs legibility vs....) http://alexpoole.info/blog/which-are-more-legible-serif-or-s...
Here's an interesting anecdote I just remembered, though. A psychology professor of mine was attempting to replicate another researcher's work. The result couldn't be replicated when the stimuli were displayed in sans-serif fonts, but it was replicated when they finally used the font the initial researchers did: a serif.
I'd bet serifs/sans may have some sort of psychological implications, but I doubt it's as straightforward a relationship as often presented.