Shakespeare coined (or at least first wrote down) hundreds of neologisms presumably just to make the verses scan. His English was never the best, honestly. Like, in terms of being normal English. Much of his poetry was creatively odd, but also grammatically odd at the time. He sounded unusual even to his contemporaries. That's part of why he made such a big splash.
Stressed "the" and unstressed "the" have different implications for meter. (And meaning.) "The" has two pronunciations in English. "Thee" and "thuh". The former is stressed, the latter not. While part of it follows the same pattern as a/an, governed by the initial sound of the next word, some of the rules are complicated. "Thee" is also used for emphasis as a demonstrative.
"Give me thuh cat toy." (Some ordinary toy.)
"Give me thee cat toy." (The one with special powers.)
The pattern of the articles the and a/an being affected by the sound of the following word dates to Old English. The use of the stressed form as an emphatic does as well. It probably goes back further. There are similar traits in the other Germanic languages. German routinely reduces its definite article die ("dee") to approximately "duh", and its indefinite article ein ("ayn") to approximately "uhn", as well. Except when emphasized. We even have traces of emphatic "a" in English, which is now completely archaic -- except still irresistible in "an historic moment". That probably counts as a fixed expression now. But etymologically speaking, when someone says that, they're saying "one historic moment" and emphasizing its distinctiveness. Germans would say "ein historischer Moment" and it's a safe bet they'd say ein (one) and not the usual "uhn" (a).
Sure, but the article doesn't attempt to classify the stress, and the stress is often flexed as a convenience to help the meter or to avoid hard-to-pronounce double vowels, not to change emphasis.
"I don't what that one; I want thee other one. Thuh yellow one, not thee orange one."
The conclusions of the article seem a bit far-fetched to me, and seem to ignore the rhetorical style of poetry and theatre at the time. One of the examples the author gives (where they missed a contracted instance of "the"):
> [...] Look like th' innocent flower,/But be the serpent under ’t.
It is still acceptable in modern English to say something like
> Seem like the innocent flower, but be as the serpent underneath it.
Certainly not casual, everyday speech -- but using a rhetorical strategy of referring to an archetypal innocent flower, or an archetypal serpent. I think it's an enormous stretch to claim that Lady Macbeth and Macbeth had a specific innocent flower in mind when they were speaking.
I think this is just the way Shakespeare wrote, not anything specific to Macbeth's 'creepiness.' I was able to quickly find a similar example in Julius Caesar:
"Then he offered it to him again; then he put it by again; but, to my thinking, he was very loath to lay his fingers off it. And then he offered it the third time; he put it the third time by;"
Or Much Ado About Nothing:
"I have the toothache."
It is not surprising that the way (a way?) articles are used has changed in the last 400 years.
I've heard the phrase "I've got the cancer". As if there's only one cancer to get, and everyone will know of it.
Then again, the English say "I'm going to hospital" where an American would say "I'm going to the hospital", so maybe the Bard used up all of the (how do you pluralize the??) so that the English use it less? At least the TFA author might theorize as such.
As a US English speaker, "I'm going to hospital" sounds a little bit off in that excessively sophisticated British English way. But then I realized that most of the time, if you're sick, the main thing is that you're going to the institution to get treatment, not trying to specify which location you are going to. It's almost like if we went around saying "I go to the school" or "I'm in the college", which sounds pretty dumb.
In that American English way, "I'm going to ____" has a different meaning depending on whether the following word is a noun or a verb. So reading "hospital" as a verb really distorts that meaning. "How does one hospital?" In Texas, "going to" is pretty much always assumed to mean heading to a place, while "going to" in the sense of being about to do something is handled by "fixin' to" ;-)
But isn't the article saying that Macbeth uses the word "the" more frequently than other plays by Shakespeare? In other words, it is not the way Shakespeare typically wrote, and that difference gives the play a peculiar flavor.
The statistical evidence shows that "the" is used more frequently in Macbeth, but the anecdotal evidence used to connect that fact to the play's spookiness relies on the idea that there are differences in the grammatical use of "the" that are specific to Macbeth.
The fact that "the" happens to be used more often in Macbeth than in other Shakespeare plays seems to me more likely to be noise with no deeper meaning.
Similar to the second quote, English speakers nowadays (at least where I live) say things like "I have the flu." In this phrase, I believe "the flu" refers to the ailment in general as opposed to what affects the speaker in particular. Maybe this is also the case with your second quote?
There's a little too much amateurish analysis here which doesn't even attempt to distinguish between Early Modern English and 21st century English, but ultimately I think the conclusion is interesting and probably correct, that this helps set the tone of the play.
I’m going to be a bit snobbish here: I can’t take the article seriously when it misquotes “Double, double toil and trouble” as “Bubble, bubble toil and trouble”. Not so much because the mistake is egregious (it could make sense if you’re going by ear), but because the article claims to be about the language used, and this is one of the most famous lines in the play.
Sod that mate. The real problem is the weird quotes thing hereabouts. English uses 66 and 99 or this:". We never use double backticks and whatever the other thing is.
US QWERTY keyboards like the author is almost certainly using have the backtick above the Tab key, shared with ~, and it’s commonly used in programming.
I blame Microsoft, as it was their software where I was first introduced to "smart" quotes using the extending characters/glyphs of fonts. I can also tell when importing someone's data that was copy/pasted from an Office type program with its auto"correcting" to smart typography.
I got curious about how all the other plays ranked. For example, which plays used those words the least? So I downloaded the text files from the Folger Shakespeare Library (https://shakespeare.folger.edu/downloads/txt/shakespeares-wo...) and ran this command to get a rough count of "the" and "th'" vs. the total words:
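Not the original command, but a rough Python equivalent of that kind of count might look like this. It's only a sketch: the tokenizer is a crude regex of my own, the function name `count_the` is made up for illustration, and it would count stage directions and speaker labels along with the dialogue unless those are stripped first.

```python
import re
import sys

# Crude tokenizer (an assumption, not a standard): a run of letters,
# optionally followed by an apostrophe and more letters, so that
# "th'" and "pluck'd" each survive as a single token.
TOKEN = re.compile(r"[a-z]+(?:['’][a-z]*)?")

def count_the(text):
    """Return (occurrences of "the" or "th'", total token count)."""
    # Normalise curly apostrophes so th’ and th' are counted together.
    tokens = [t.replace("’", "'") for t in TOKEN.findall(text.lower())]
    hits = sum(1 for t in tokens if t in ("the", "th'"))
    return hits, len(tokens)

if __name__ == "__main__":
    # Paths to plain-text plays are passed on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            hits, total = count_the(handle.read())
        print(f"{path}\t{hits}\t{total}\t{hits / total:.4f}")
```

Run against one text file per play, this prints the count, the total, and the ratio for each, which can then be sorted to rank the plays.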
Ignoring the whole log-likelihood stuff and just looking at the simple frequencies, I'm not completely sure that I buy the article's argument. Macbeth does come out on top by my analysis. But some of the other plays seem to use "the" or "th'" nearly as frequently without being particularly creepy. In terms of ratios of the frequencies, Henry V, a history, is only 2.6% lower than Macbeth. And the first comedy, Love's Labor's Lost, is just 5.2% lower.
What would be the correct way of going about assessing statistical significance of these frequencies?
Like if we assumed that all English language is generated from a weighted distribution of all words and “the” is 3.5%, is a 4.3% occurrence rate even significant? (And what even would be the base occurrence rate?)
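One naive way to put a number on that, taking the "weighted distribution" model at face value: treat every word as an independent draw with a 3.5% chance of being "the" and ask how surprising 4.3% is. The sketch below uses the normal approximation to the binomial. The 10,000-word text length is purely hypothetical, chosen for illustration; only the 3.5% and 4.3% come from the question above.

```python
import math

def z_score(hits, total, base_rate):
    """Standard deviations between the observed rate and base_rate,
    assuming each word is an independent draw (a strong assumption)."""
    observed = hits / total
    std_err = math.sqrt(base_rate * (1 - base_rate) / total)
    return (observed - base_rate) / std_err

def two_sided_p(z):
    """Two-sided p-value for a z-score under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical text of 10,000 words, 430 of which are "the" (4.3%).
z = z_score(430, 10_000, 0.035)
print(f"z = {z:.2f}, p = {two_sided_p(z):.2g}")
```

Under that model the difference would look very significant, but real text is not a bag of independent draws: words cluster by topic, speaker, and verse form, so a test like this tends to overstate significance. That's part of why the base rate and the comparison corpus matter so much.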
It’s quite something that the Scottish play comes out on top, however. It would work with a hypothesis that, for example, Shakespeare was using this pattern subconsciously, whenever the situation called for an eerie mood.
I’d also be interested in seeing if the 2:1 difference isn’t larger than for other authors?
It might be worth noting that all the other plays that scored anywhere close to the Scottish play (427) are much longer. You have to go down to 17 (344) to get to a shorter play; only 6 (397) and 15 (345) approach it. If we scale by length twice (count/length^2), the contrast becomes more stark
(retaining original order):
with only 17 and 27 breaking 200, and still well shy of 252.
But the real point of the article is that the oddity of "the" in the frequency table attracted their attention to that word, and led them to identify an actual peculiarity in its usage. To say henry-v demonstrates anything similar, you would need to check if usage in that play is similarly peculiar (which I have not done either).
It seems odd to suggest (as some commenters have done) that the difference was subconscious. My null hypothesis is that peculiarities in usage by a professional wordsmith are deliberate. I expect to see actual evidence that the author didn't know what he was up to.
I’m not sure I understand why it’s not only valid but more precise to square the length of the play.
If we assume that length of a play has an influence on the frequency of stop words, shouldn’t we compare samples of each play? (First x pages or y randomly sampled words)
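The "y randomly sampled words" idea could be sketched like this. Everything here is an assumption for illustration: the function name, the default targets, and the choice to draw without replacement. Each play would be passed in as a list of lowercase tokens.

```python
import random

def sampled_rates(tokens, sample_size, targets=("the", "th'"),
                  trials=1000, seed=0):
    """Rate of target words in repeated fixed-size random samples.

    Returns one rate per trial, so plays of different lengths can be
    compared on samples of the same size.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    rates = []
    for _ in range(trials):
        sample = rng.sample(tokens, sample_size)  # without replacement
        rates.append(sum(t in targets for t in sample) / sample_size)
    return rates

# Toy input: 1,000 tokens, 4% of them "the".
toy = ["the"] * 40 + ["word"] * 960
rates = sampled_rates(toy, sample_size=200, trials=200)
print(min(rates), sum(rates) / len(rates), max(rates))
```

Random sampling leaves the expected rate unchanged, so what this mostly shows is how much the rate wobbles at a given sample size. If length really does affect the rate, contiguous chunks (the "first x pages" variant) would be the thing to compare instead.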
It is an observed phenomenon. That doesn't make it a theory, it makes it a generator of hypotheses. It is another chore to figure out ways to test the hypotheses, and more chores testing them.
After one passes several different tests, it might be worth publishing, along with the list of rejected hypotheses. Then somebody else might identify a test that it fails, and another that could be tested, and might publish that.
Or, more likely, nothing comes of it, and you move on to other phenomena and other hypotheses for them. That's science. It always starts with, "that's odd, I wonder what it means." And, most usually, it seems to just mean "huh."
Macbeth is certainly a creepy play, and certainly there is a lot of creepy language (e.g. Lady Macbeth's "Have pluck'd my nipple from his boneless gums, And dash'd the brains out"). But I never felt that this was due to repetition of particular words, so this was interesting to read about.
That said, although these days it is uncommon to use generic nouns with the definite article ("the"), I understand that this was a lot more common in Shakespeare's day. I wonder if this is more common in Macbeth than in Shakespeare's other plays, whether it was a deliberate choice, and whether Jacobean audiences would have felt the same sense of creepiness.
The article literally says: "So they compared word-usage in Macbeth to Shakespeare’s overall writing" - and found several creepy words in the top 15, also the word "the" occurring more frequently compared to his other plays.
But I agree, we don't know what the real reason was for Shakespeare's choice in this case, just that it explains what contributes to the creepiness for modern readers.
This analysis appears to be of the modern Shakespeare text. That can differ from "first folio" versions closest to what was actually performed. The modern text has been sanitized, the poetry cleaned up. I wouldn't dig too deep into specific word counts as they are likely more a result of later rewordings. An extra "the" here or there may have been added by publishers to perfect the meter. That happens sometimes when a play for the stage starts being sold as a poem to be read in drawing rooms.
The First Folio is the only surviving source of the text and more importantly minor edits aren't going to change statistical properties of text much at the counts-of-the-definite-article level.
The opening sentence sort of broke my mind. It reads:
> Macbeth is a creepy play.
But my brain wanted really badly to swap the emphasised word:
> Macbeth is a creepy play.
Edit: Maybe that was the point of the author. A couple of paragraphs later they say “Actors and critics have long remarked that when you read Macbeth out loud, it feels like your voice and mouth and brain are doing something ever so slightly wrong. There’s something subconsciously off about the sound of the play, and it spooks people.”
Macbeth isn't actually being emphasised in that sentence, rather it is using italics to demarcate the title of a work. A music album or movie's name would also appear italicised. (If they've done it right Macbeth should only be in an <i>, or <span class="art-title">, rather than an <em>, though who knows what their CMS allows/is capable of)
To me, this is a very pretentious take. Nothing about the usage of "the" in the examples is unusual or creepy. Saying "an eye" or "my eye" that winks at "an" or "my" hand results in a different meaning. "The" here means a general expression.
That's a really interesting observation and fun analysis. I'm going to reread Macbeth with this in mind, and whether or not I agree with all of the article's conclusions, it's a unique lens to examine the text with.
Cute. I did something similar at A-level with another book for a stats course.
Mining data can be a surprisingly satisfying way of going back to look at something you liked.
Yeah, that's the chart in the middle of the article of the log likelihood of each word. The higher that score, the more a word is used in Macbeth relative to the rest of his plays. However ... the author blips right over the fact that "our" and "she"--two very common words--both have a higher log likelihood than "the".
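For anyone wondering what that score is: I don't know exactly how the article computed it, but the log-likelihood ("keyness") statistic commonly used in corpus linguistics compares a word's count in one text against its count in a reference corpus, given the sizes of both. A minimal sketch of that common formulation, with made-up counts for illustration:

```python
import math

def log_likelihood(count_a, count_b, size_a, size_b):
    """G2 keyness score for one word across two corpora.

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b:   total number of words in corpus A and corpus B.
    """
    combined = count_a + count_b
    # Expected counts if the word were used at the same rate in both.
    expected_a = size_a * combined / (size_a + size_b)
    expected_b = size_b * combined / (size_a + size_b)
    total = 0.0
    if count_a > 0:
        total += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        total += count_b * math.log(count_b / expected_b)
    return 2 * total

# Made-up numbers: 43 hits in 1,000 words versus 350 hits in 10,000.
print(log_likelihood(43, 350, 1_000, 10_000))
```

Note that the score is never negative and doesn't say on its own whether the word is overused or underused; you have to compare the two rates for that. It also grows with raw counts, so very common words can score high on modest relative differences, which may be part of why "our", "she", and "the" all end up on the same chart.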
I read this a few times, looking for evidence I missed that it's a joke. Apparently not, but I can't be sure. If it's serious, it's possibly the worst article I've ever read.
> But fans of Macbeth often say its freaky qualities are deeper than just the plot devices and characters. For centuries, people been unsettled by the very language of the play.
> Actors and critics have long remarked that when you read Macbeth out loud, it feels like your voice and mouth and brain are doing something ever so slightly wrong. There’s something subconsciously off about the sound of the play, and it spooks people. It’s as if Shakespeare somehow wove a tiny bit of creepiness into every single line. The literary scholar George Walton Williams described the “continuous sense of menace” and “horror” that pervades even seemingly innocuous scenes.
> For centuries, Shakespeare fans and theater folk have wondered about this, but could never quite explain it.
The article claims to explain it - in the play, the word "the" is used a lot!
Um.. gee, in high school it was pointed out to me how constant themes in Macbeth are how unnatural things have become, how everything is strange, qualities/values reversed from normal - fair is foul and foul is fair etc, animals doing weird things, bad omens etc. It never stops, all the way through. I had to go through the play and list how many animals are mentioned, doing strange things. It's constant. People meet and it's not "Lovely day isn't it" but an anecdote about how so-and-so saw something incredibly weird and impossible happen. Over that background is the quickly escalating paranoia and madness of Macbeth & Lady Macbeth. Etc. Can't be bothered writing more, I didn't want to say just "This is total nonsense.", but it is. (Flagged.)
You read the article several times, yet didn't understand it.
As noted in the article, the Scottish play displays other oddities. But this one had not been commented upon before. To demonstrate vacuity, you would need to identify other plays that do not come off as creepy, but use "the" in the way noted. For an academic presentation, we might expect the authors to have checked for it in other plays, but this article was not an academic presentation.
This isn't a data science question. Especially if the data science is blind to meter and to phonetics.