Some years ago I worked on an accessibility project for an app and website designed for people with disabilities. One of the team members had low vision, and used a screen reader that must have been set to 3x or even higher. I usually listen to YouTube and podcasts at 1.5-2x and I could barely understand the audio. He seemed surprised, which indicated to me that 3x+ was the norm for people in his circle.
I wonder whether his ability was trained through years of using fast screen readers, whether a lower visual processing load leads to better audio processing, or whether there is some other explanation.
1219 words per minute is a record as far as I know. We measured it explicitly by getting the screen reader to read a passage and dividing the word count by the elapsed time. I spent months working up from 800 to do it and lost the skill once I stopped (there was a marked level of decreased comprehension post 1000, but I was able to program there; still, in the end, not worth it). When I try to describe the required mental state it comes out very much like I'm on drugs. Most of us who reach 800 or so stay there, though not always that fast for, e.g., pleasure reading (I do novels at about 400). It's built up slowly over time, either more or less explicitly. I did it because I was in high school doing MUDs and got tired of not being able to keep up; it took about 6-8 months of committing to turn the synth faster once a week no matter what, keeping it there and dealing with a day or two of mild headaches. Note that for most blind people these days, total synthesis time per day is around 10+ hours; this stuff replaces the pencil, the novel, etc. Others just seem to naturally do it. You have little choice, it's effectively a 1 dimensional interface, so from time to time you find a reason to bump the knob. And that's enough.
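For anyone who wants to repeat that kind of measurement, here is a minimal sketch of the arithmetic, assuming you time the synth with a stopwatch and count words by splitting on whitespace (the function name and the sample passage are made up for illustration):

```python
def words_per_minute(passage: str, seconds: float) -> float:
    """Speaking rate: whitespace-separated words divided by elapsed minutes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return len(passage.split()) / (seconds / 60.0)


# Hypothetical measurement: a 400-word passage read aloud in 30 seconds.
passage = "word " * 400
print(words_per_minute(passage, 30.0))  # 800.0
```

Counting words by whitespace is a simplification; a synth may expand numbers, punctuation, or abbreviations into extra spoken words, so treat the result as approximate.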
Whether and how much the skill transfers to normal human speech, or even between synths, is person-specific. I can't do YouTube at much beyond 2x. Others can. It's definitely a learned skill.
To me, the fact that this does in fact seem to be bidirectional at least some is more interesting than that I can listen fast.
I always wanted to know whether it's true what I heard some years ago: a friend told me that older Danish people are complaining that the younger Danish are harder to understand because there has been a trend in recent decades to swallow consonants even more than in the past.
(That's something hard to imagine for me as a non-Danish speaker :)
Living in Malmö I also find this quite hard to believe! Besides, isn't old people complaining about the younger generation butchering their language something that happens all the time in all cultures?
It was about how Danish linguists were worried about how the language was getting more unintelligible. The example I remember was about the ever-increasing number of words that are pronounced, basically, “lää'e”: läkare, läge, läger, lager... Those are written in Swedish here; some Dane may translate. (In English: healer, situation, camp [laager], layer...) There were more that I don't remember; in all, I think they mentioned at least half a dozen words that had become basically indistinguishable from each other.
I don't think serious linguists have had the same fear about most other languages.
Edit: Hah, just saw your post on talking faster to other people who have the same audio skills.
However, most blind people I know who do this start hating audiobooks, start hating talks, and generally by far prefer the text option. Audiobooks aren't annoying, but they're below my baud if you will. Net result: boredom/falling asleep to whatever it is and the need to actively make an effort to listen. Some things which require active listener participation--math lectures for example--are different. I guess the best way I can put it is that speed is inversely proportional to the amount of, I guess let's call it active listening, required.
I've given a lot of thought to this stuff, but we don't really have the right words for me to communicate it properly. A neuroscientist or linguist might, but I'm not either of those.
Also, how does your "baud" vary with your familiarity with the ideas? I can't imagine it's independent. As a ~35 year old programmer with a decade of professional experience and a decade of hobby experience before that, I cracked open SICP for the first time and found almost everything familiar. I had digested the ideas from other sources, so I could read at a "natural" rate. If I had read it as a teenager, it would have been a mindfuck, and I would have taken multiple slow readings to understand. When you talk about numbers like 800, are you talking about writing that challenges you and changes the way you think, or are you talking about stuff you do for a living that is just information you're already primed to accommodate?
With movies I don't bother with them unless they have descriptive audio, at which point you've got music, sound effects, and two somewhat parallel speech streams going on. That's high informational content.
I did an entire CS degree at 800 words a minute. I program in any programming language you care to name (including the initial learning) at that speed as well. For more complicated concepts I stay at that speed, but pause after every paragraph or so to chunk the content as needed. I'm doing this thread at that speed. Pretty much the only time I slow it down is pleasure reading or sometimes articles when I want to go off and do chores while I listen, but even then it's still faster than human speech.
In general I think answering these sorts of questions needs research that we don't have, to my knowledge. Nothing in my personal experience or background really allows me to give you good definitive answers. The sample size to work with is pretty small and in all honesty there's not a lot of good research around blindness day-to-day in the first place.
That sounds like a similar description to what it's like for people with IQs significantly higher than the average.
espeak -s 800 "Things to say."
I will attempt to remember and find the time to take my demo recording of this on Rust compiler source code that's currently in Dropbox and put it up somewhere more permanent. I doubt Dropbox will care for me much if I allow HN-volume traffic to hit my account. It's Espeak using an NVDA fork with an additional voice that some of us like, so vanilla espeak is in the ballpark.
What I don't remember is if vanilla non-libsonic espeak softcaps the speech rate. It might. I believe new versions of espeak integrate libsonic directly, but that old versions just silently bump the speaking rate down if it's over the max. I haven't used command line espeak directly for anything in a very long time.
Libsonic is an optimized library specifically for the use case of screen readers that need to push synths further: https://github.com/waywardgeek/sonic
There is a range slider that maxes out at 450, which is the maximum speed according to the manual.
I tried listening to a Wikipedia article at 450, and I am amazed you can comprehend that. Perhaps that's the equivalent of me visually scanning the text instead of reading; however, when I do that, I tend to focus on interesting parts for long stretches of time. With espeak, how do you focus? Can you pause it at will?
If anyone is curious, here is the NVDA keystroke reference: https://www.nvaccess.org/files/nvdaTracAttachments/455/keyco...
As an interesting sidenote, screen readers have to co-opt capslock as a modifier key, then there's fun with finding keyboards that are happy to let you hold capslock+ctrl+shift+whatever.
I find that the maximum understandable rate varies a lot between speakers. For some speakers 2.5x is possible, but just 1.5x for others.
One advantage synths have is that they can more easily control the speed at which words are spoken and the pauses between words independently. When watching or listening to pre-recorded content I often find that I'd want to speed up the pauses more than the words (because speeding up everything until the pauses are sufficiently short makes the words unintelligible).
If someone knows of a program or algorithm that can play back audio/video using different rates for speech and silence, please share.
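Not a finished program, but a rough sketch of one way such an algorithm could start: split the audio into short frames, call a frame "silence" when its RMS energy is below a threshold, and assign a higher playback speed to silent stretches than to speech. Everything here (the function names, the 20 ms frame, the 0.01 threshold, the 1.5x/4x speeds) is an assumption picked for illustration, and the actual pitch-preserving time stretching is left to whatever player or library consumes the plan.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_s, end_s, playback_speed)


def plan_playback(samples: List[float], rate: int, frame_ms: int = 20,
                  silence_rms: float = 0.01, speech_speed: float = 1.5,
                  silence_speed: float = 4.0) -> List[Segment]:
    """Label each frame as speech or silence by RMS energy and merge
    neighbouring frames that share the same playback speed."""
    frame_len = max(1, rate * frame_ms // 1000)
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        speed = silence_speed if rms < silence_rms else speech_speed
        t0, t1 = start / rate, (start + len(frame)) / rate
        if segments and segments[-1][2] == speed:
            # Extend the previous segment instead of starting a new one.
            segments[-1] = (segments[-1][0], t1, speed)
        else:
            segments.append((t0, t1, speed))
    return segments


def played_duration(segments: List[Segment]) -> float:
    """Total listening time once every segment is played at its own speed."""
    return sum((end - start) / speed for start, end, speed in segments)


# Synthetic input: one second of "speech" followed by one second of silence.
speech = [0.5 if n % 2 == 0 else -0.5 for n in range(1000)]
silence = [0.0] * 1000
plan = plan_playback(speech + silence, rate=1000)
print(plan)
print(played_duration(plan))
```

A fixed energy threshold is the crudest possible speech detector; background noise or music would need something smarter, but the split between "detect" and "stretch" stays the same.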
If so, have you considered using an EQ plugin to maybe turn down the harsher high frequencies a few notches? Just a thought.
But more to the point there is nowhere to really plug that in to a screen reader, so we can't try it anyway. The audio subsystems of most screen readers are much less advanced than you'd think.
I just don't get the point. If you can process content much faster than it was meant to be played, it doesn't mean you're learning much faster than you could, it means the novel information density is low. Any content that can be sped up that much without loss is not worth listening to in the first place. You're just skipping the trite cliches, filler, and obvious facts.
I can read fast, and I typically go through fluffy NYT bestseller nonfiction at 600 WPM. But when I do this I constantly have a sneaking suspicion that I'm just wasting my time. When I read a good book full of new ideas, I barely go at 150 WPM, but the time always feels well-spent.
This isn't "how fast can I go through this" but "what is a comfortable pace"?
So I bump the speed up, though usually fairly modestly: 1.25x - 1.5x is generally enough.
I've noticed that preferred speeds vary tremendously with the quality of the work and speaker -- high-density information and an exceedingly good speaker, and I'll slow down. Slapdash redundant content and poor speaker, I'll speed up.
The degree of polish in the production matters tremendously. I've listened to CGP Grey's YouTube videos (highly polished) and podcasts (a lot of chit-chat with his co-host). The videos work well at normal speed, or perhaps slightly sped up. The podcasts I find nearly unlistenable, though they improve at much higher speeds (1.75x - 2x).
It undoubtedly has a relatively low density of information; however, I wouldn't speed it up, I would slow it down!
Tip: there are tens of Chrome extensions that allow you to change all HTML videos to 2x speed or even higher (including most ads on sites like YouTube and Hulu).
It is like the information doesn't have the time to settle in my memory, despite me understanding it.
It's maybe because when things are slow, I can use the dead time to think about the implication/corner cases of what's being said.
For some reason I do not yet understand, the motion of physically putting pen(cil) to paper helps ingrain that information into your brain, in a way that typing it into a computer does not.
My personal guess is that, with fast typing speeds, it's too easy to just copy things word for word. With writing, you have to at least rephrase it and reorganize it to fit the notes reasonably on the page, which forces some processing to occur. I take notes solely by typing, but I only have retention if I do it slowly and reflectively.
In other words, we can "store" knowledge more efficiently when written in our own handwriting than when typed into a neutral/generic text editor.
Although, anecdotally, I find that writing notes by hand is better for recall than typing.
I'm not sure if this is because writing activates different parts of the brain, or simply because writing is slower and forces me to therefore think harder and comprehend better because I need to be 2x as selective about what makes it into my notes.
My money's on the latter, but who knows?
I can roughly keep up in real time with speaking on the keyboard. Even though speaking is a bit faster than my max rate around 90 wpm, by dropping filler words and gaps I can mostly avoid summarizing and quote verbatim. When transposing at full speed like that, I feel like a conduit at times, very little sticks deeply in memory.
Contrast with writing notes, where I'll only write down things I find particularly important. Most of the time I'm just trying to actively listen.
Purely anecdotally, I feel like when I am typing and summarizing to the same compression ratio as writing, my retention with typing is better, because I can do it without looking, it's faster, and I can tune back in to the speaker.
On the other other hand, with a laptop open, I'm much more likely to get distracted with emails/tasks.
Bottom line, I think we need to study more axes of this problem, if only because it gives neat insights into cognition.
Just spending time in the moment with an enjoyable story is not wasting it.
I think I have a similar thing visually, where above a certain text size, I have a very hard time reading. The optimal text size for me is pretty small, even on a standard resolution display.
While I could get my comprehension percentage quite high with a bit of training, I lost all connection to the characters and story, stopped imagining the scenes and felt like reading the book was a waste of time.
Novels should be read at a natural pace to give room to your imagination and dive into the story. You can still quickly scan over boring/repetitive filler text, but I did that without caring about WPM already.
With other things like textbooks / articles / reports cranking up your WPM and applying your attention more selectively by focusing on or re-reading critical parts is a very helpful skill though.
I 'read' your comment using TTS at 3x. What does that say about the information density of your comment?
(Little to nothing. TTS at that speed is still marginally slower than I normally read with my eyes. Human speech is generally much slower than is necessary to be understood.)
What doesn't make sense to me is consuming things that have a _set_ speed, such as video/TV or lectures, at a dramatically faster speed.
You could just read the plot synopsis or watch the highlights, but sometimes those don't convey build-up, suspense, or other data that are hard to losslessly compress.
Being comfortable with the "boilerplate" of a given medium or genre usually lets you skim or skip it to jump right into the good stuff.
"That's just, like, your opinion, man."
Obviously the people who are getting something out of it, otherwise they wouldn't do it?
"Don't yuck my yum."
I do the same, but with Hacker News comments :)
Every couple of months, take a moment to reflect on your comprehension. Is it currently easy for you to understand the audio? If yes, then crank it up a little bit until it's noticeably more difficult. Repeat this process periodically over a year or so and before you know it, it'll be set pretty damn quick.
There are also extensions of course.
I find I can typically listen at 1.25 - 1.75 readily. Exceptionally poor content I'll bump to 2x. I don't generally go much above that unless I'm fast-forward scanning video for specific content.
My theory/experience with this phenomenon is that a speech synthesizer never makes any errors. When it pronounces a word, it will do so exactly the same way every time the same word comes up. So the learning effect after a while is a bit higher than when you listen to a human. Humans will always have slight variation in how they pronounce the same word. So, as I understand it, you can "learn" to listen to your speech synthesizer at a fast rate more effectively than you would be able to listen to a fast human speaker.
And yes, I also listen to YouTube talks and audiobooks at about 1.5-2x rate. So I guess 80 bits per second is relatively easily doable for the receiver.
I think anyone can get to 3x but it takes some time to adjust to faster and faster speeds. It also depends on what you are doing while listening. Distractions or listening while doing something else (driving for example) lowers my ability to comprehend. For example on the interstate without much traffic I'll listen to audiobooks at 3x, but in a city or a crowded highway I have to slow it down.
If it's a technical talk or something I'll still pause often to reflect on what was said, but I can hear full sentences just fine at >3x with my eyes closed.
Consider what you quote:
> we’re likely limited by how quickly we can gather our thoughts
Now the amount of relevant info on a screen is typically small enough that a sighted person can zero in on it at a glance and perhaps just click a button without thinking.
I.e. the amount of info that deserves "gathering our thoughts" is typically very small. So if that is the bottleneck, your colleague can keep cranking up the audio speed until low-level audio processing becomes the bottleneck, which is a regime that sighted people never deal with, not even the nerds who speed up their Joe Rogan podcasts.
Sometimes it's worth slowing down to 1.5x to give myself a bit of time to process the ideas, though slowing below that sometimes hurts comprehension.
Side note: I find that YouTube in Chrome has the best pitch-preserving time stretching filter, and I've neglected all this time to figure out what exactly they use to accomplish that. I'd love to add that to mpv, if it's not already there.
Personally, when I did this I felt irritated when I spoke, because my sped-up audiobooks had conditioned me into thinking I should be speaking at that rate, but it's just not possible for my mouth and tongue to move that fast physically.
The deduction that is quoted does not follow: speeding up audio recordings by 120% presses both the auditory system and the language and thought systems (or any other potential bottleneck) to speed up proportionally, since it's a pipeline.
Similarly, the posted article (I have yet to read the original one) states in the title that "human speech" has a universal transmission rate, but the research tested reading, not speech, so this may or may not be true.
Perhaps the bottleneck is human speech, with the side effect that listening is never trained beyond the typical speech rate limit. (in this case the higher speed syllable languages would be easier to pronounce fast, and the lower speed ones harder to pronounce fast)
Perhaps the bottleneck was in the visual burden of reading, a language that encodes more bits per syllable implies more types of syllables, which irrespective of size or number of characters puts a classification demand on the visual system (classifying a symbol coming from a set of only 2 symbols will be easier, but will require more classification instances than classifying from a large set of characters but with fewer classification instances).
Perhaps the bottleneck was again in speech during reading by subconscious vocalizing of the text.
Perhaps the bottleneck was in the auditory "speech to syllable" classification.
Perhaps the bottleneck was in parsing text.
Perhaps the bottleneck was in "accessing thoughts" etc.
So it is rather hard to identify where the bottleneck is located without having a means of detecting where in the brain the "incoming queue is full" vs "incoming queue is waiting" during speaking, listening, reading. And which of these 3 causes this universal bottleneck (since I gave 2 examples of how an apparent bottleneck in reading could stem from not being trained beyond a possible universal bottleneck in speaking rate...)
There is no shortage of people training to receive a lot of information at once, and 39 bits per second seems to me on the lower end of what some video games require, but in terms of constructed, linguistic output? They may be on to something there.
Fast chatters are not faster thinkers. I have yet to see people exchanging thought at a higher rate than usual.
While I'm sure his visual cortex picked up some slack, I'm willing to bet it's mostly just through training. We just aren't trained for faster communication. I've known blind people and they are the same way with their readers.
I also watch most video at 2-3 times the speed since the skills seem transferable.
What do I win???
And yes, I know information theory. It's language that these folks - many of them prominent and celebrated within their utterly normalized professions, just like in the days of phrenology - are fundamentally mistaken about. What quantity of information do you think there is in the word "trump," for instance? Is it the same over time, to bring up just one feature of how this funny thing called context informs human speech?
Wittgenstein's Philosophical Investigations is a good place to start if anyone's interested in understanding this issue.
I think it's you that has missed the point. Syllables have a very loose correlation to information. So great; we can stream out 39bits worth of syllables / second. In what way does that describe how information dense those syllables are? Context matters here.
Jokes aside, I agree that estimating the average absolute information content of a syllable seems pretty absurd.
However, if the primary goal here was to determine whether some languages convey more information per unit time than other languages, I think the authors did fine. To this end, they needn't define information per syllable in anything other than p.d.u. - procedurally defined units. If average Vietnamese speech has 2x the number of syllables/min as German, but it takes the same amount of time to recite War and Peace in both Vietnamese and German, it suggests that both languages convey the same high-level information 'per unit time', but not 'per syllable'.
And basically that's all they did... "We computed the ratio between the number of syllables [in the text passage] and the duration [it took to recite the passage]"
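To make the arithmetic concrete, here is a small sketch of that two-step calculation: syllables divided by duration gives a speech rate, and multiplying by an information density in bits per syllable gives an information rate. The function name and all the numbers are invented for illustration; they are not taken from the paper.

```python
def information_rate(n_syllables: int, duration_s: float,
                     bits_per_syllable: float) -> float:
    """Information rate in bits/s = (syllables / seconds) * (bits / syllable)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    speech_rate = n_syllables / duration_s  # syllables per second
    return speech_rate * bits_per_syllable


# Two hypothetical languages reading the same passage in 30 seconds:
# a "fast, light" one (many syllables, few bits each) and
# a "slow, dense" one (fewer syllables, more bits each).
fast_light = information_rate(240, 30.0, 5.0)
slow_dense = information_rate(150, 30.0, 8.0)
print(fast_light, slow_dense)  # 40.0 40.0
```

With these made-up inputs the two languages differ a lot per syllable yet land on the same rate per second, which is the shape of the claim being discussed.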
You clearly don't know linguistics though because the idea that a word conveys a constant quantity of information is hilarious.
Tor Norretranders' book, The User Illusion, mentions some of the research:
W R Garner and Harold W Hake "The Amount of Information in Absolute Judgments" - Psychological Review 58 (1951) - they attempted to measure people's ability to distinguish stimuli (such as light and sound) in bits. Result: 2.2 to 3.2 bits per second.
W E Hick "On the Rate of Gain of Information" - Quarterly Journal of Experimental Psychology 4 (1952) - this experiment measured how much information a person could pass on if they acted as a link in a communication channel. That is, faced with a series of flashing lights, subjects had to press the right keys. Result: 5.5 bits per second.
Henry Quastler "Studies of Human Channel Capacity" - Information Theory, Proceedings of the Third London Symposium (1956). Measured how many bits of information are expressed by a pianist while pressing keys on a piano. Result: 25 bits per second.
J R Pierce "Symbols, Signals and Noise" (Harper 1961) - used experiments involving letters and symbols. Result: 44 bits per second.
Discussion of the research, Tor Norretranders' book, and what the research may have missed here:
Glad this paragraph was in the article, clears up their methodology. I wonder if it applies to writing too, or if skilled writers work faster.
> researchers took their final step—multiplying this rate by the bit rate to find out how much information moved per second
Thank you for your explanation, worth a bag of gold!
Being "verbose" means that each letter you type communicates fewer bits of information. If the bottleneck is putting ideas together then you would expect someone writing in a more verbose language to type more letters per minute but still take a similar amount of time to communicate the idea.
In practice most Java programmers are using IDEs with good auto-completion, though, so aren't actually needing to type as many letters as you'd think.
This raises the question: if the IDE autocompletes the boilerplate for you, and also hides it, why is it needed in the first place?
The latter can trivially be used to output the former. The conclusion is obvious; some of these formats are objectively more verbose than others while having equivalent expressive power.
Two qualifying remarks.
1) The 'about the same' is important. Even in their data, there is still quite some variance. They found an average of 39 bits, with a stdev of 5. That means that about 1/3 of the data falls outside of the range of 34-44 bits.
2) Which brings me to the uniform information density (UID) hypothesis. According to the UID, the language signal should be pretty smooth wrt how information is spread across it. For many years, the UID was thought to be pretty absolute: Even across a unit like a sentence, it was thought that information will spread pretty evenly. Now, there is an increasing amount of research that shows that esp. in spontaneous spoken language, there is a lot more variance within the signal, with considerable peaks and troughs spread across longer sequences.
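The "about 1/3 outside 34-44" figure in the first remark can be checked quickly, under the assumption (mine, not stated in the comment) that the rates are roughly normally distributed around 39 with a standard deviation of 5:

```python
import math


def normal_cdf(x: float, mean: float, stdev: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))


def fraction_outside(mean: float, stdev: float, low: float, high: float) -> float:
    """Share of a normal population falling outside the interval [low, high]."""
    inside = normal_cdf(high, mean, stdev) - normal_cdf(low, mean, stdev)
    return 1.0 - inside


print(round(fraction_outside(39.0, 5.0, 34.0, 44.0), 3))  # 0.317
```

That is simply the familiar rule that roughly 68% of a normal distribution lies within one standard deviation of the mean; if the real data are skewed, the share outside the range would differ.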
Also, can you explain more about how the information density was calculated? Anything at the bit level seems crazy small to me. Words convey a lot of information. They cause your brain to create images, sounds, emotions, smells, etc. I guess we're calling language a compression of that? But even still, bits seems small.
(see edit below; but i leave this up; it might be interesting, also)
you mean that even for smaller sequences, the UID holds, right? the assumption was that even for a single sentence, there are a lot of ways to reduce or increase information density so that you get a smoother signal. e.g.: "It is clear that we have to help them to move on.", you could contract it to "it's clear we gotta help them move on" and contract it even further in the actual speech signal ('help'em'). or you could stretch it: "it is clear to us that we definitely have to help them in some way to move on", or alike. the assumption was that such increases / decreases would even be done to 'iron out' the very local peaks and troughs, particularly in speech.
bits: yeah, that took me a while to get used to, as well. the authors used (conditional) entropy as a way to measure information density (which is a good measure in this instance imv). and bits is just per definition the unit that comes out of information theoretical entropy: https://en.wikipedia.org/wiki/Entropy_(information_theory) . btw: while technically possible, i don't think that the comparison in the summary article between 39 bits in language and a xy bit modem is a helpful comparison. bits in the context of entropy are all about occurrence and expectation in a given context. bits of a modem/in CS represent a low level information content for which we do not check context and expectation.
edit: ah, i realise you are asking why most in our community assumed that this universal rate applied across languages, right?
i guess the intuition was that all of us humans, no matter what language we speak, use the speech signal to transmit and receive information and that all of us have the same cognitive abilities. so the rate at which we convey information should be about the same. sure, there are probably differences according to some factors (spoken vs written language, differences in knowledge between speakers, etc.). but when the only factor that differs is English vs Hausa, esp. in spontaneous spoken language, then the information rate should be about the same.
This is entirely non-intuitive to me. I would think with language evolving that some would be faster than others. If language starts as conveying extremely simple thoughts then it should take longer to convey certain things. I would then assume that as the language develops it gets better at conveying ideas. I would think that thoughts could go much faster than how we process it with language. Like I have constant thoughts that are really fast and can be complex. There's no internal dialogue there. But when I think with an internal dialogue it is much slower.
And the rate wouldn't have to be the exact same value for each individual, so long as the brain can attune its specific value to other reference points to time in nature.
What this guy told me is that it just takes time to adjust to it. So I basically started to listen to books at a slightly higher speed. Then I gradually increased it, and in a few days I could handle 2.0x speed no problem while listening to really complex fantasy (Malazan Book of the Fallen). After two weeks I could handle 2.5x without a problem.
In the beginning it was harder to comprehend at high speed while walking or crossing the street since I lost attention, but in a few months I could do anything while listening without missing any information or the narrator's emotions.
To give an example of how far this can go: this spring I was listening to The Expanse audiobook at 4.0x speed. With some effort I could go even faster, to around 5.x, in the case of these particular books, but obviously can not keep up for long.
I still usually listen to books at 2.0-3.0x, depending on the narrator and the quality of the audio, and this skill doesn't go away even if I have an extended time between books, like a month or so.
UPD: Edit. s/can keep up/can not keep up/
I have the opposite problem where I have trouble paying attention to an audiobook at 1x. I get bored in between words and my mind wanders making it very difficult to keep track of what is being said (as in I hear individual words but have trouble keeping sentences in memory when everything comes too slow)
I wish I had realized this in university and had been able to somehow record and playback lectures at 2x. I always got so little out of lectures because the information wasn't coming in fast enough for me to process correctly.
I don't really use Audible, but if you're looking for a good audio player on Android, here is one that can do this:
I was always curious to do actual research / write a paper on this kind of thing, but as a non-scientist I simply have no time to do so. So I'm happy someone is actually doing it.
Concentration is crucial here.
I only finished Malazan Book of the Fallen, first two Tales books and all The Path to Ascendancy books. Also started Forge of Darkness, but was too preoccupied with my life to finish it.
Honestly, Esslemont's books are just weaker overall. The Path to Ascendancy was much better, but the 3rd book is just too rushed.
> I would be curious of the story still makes sense to you by the end when listening at that speed.
Speed has no effect on the story at all. Basically, after you practice it for a bit you even get every emotion the narrator is trying to put into his speech.
As for the story in general, it makes more and more sense the closer you get to the end. It's a masterfully crafted world with a great theme of compassion, and even though I finished it more than a year ago I still have a flashback or two from time to time since I loved some of the characters. Malazan is certainly one of my favorite book series.
Yet keep in mind there is an abundance of information and events, as well as unreliable narrators, which can confuse your view of the story lines.
Malazan quickly became my favourite book series (and I am not even a fan of fantasy). It was hard initially. But it gets better.
However, I think that re-read is a must if you want to fully grasp the whole thing.
> Why would you want to do that though?
I totally get it when some people just love to read books slowly while enjoying their coffee or looking at nature, but I'm into books for the stories, and the format of fast-paced audio is fine for me.
> Isn't the experience of listening to it the point? If not, why listen to it at all instead of reading a detailed summary?
On the other hand, detailed summaries are not what the author designed, but someone else's retelling, which is usually far from perfect.
> In parallel, from independently available written corpora in these languages, we estimated each language’s information density (ID) as the syllable conditional entropy to take word-internal syllable-bigram dependencies into account.
But the experiment uses the same text translated into each language! Why introduce this extra variable (and source of error) of estimated language-wide information density, if you are controlling your experiment such that you have the exact same information encoded in each language? That is to say, why use an _estimated_ information density when you could measure it exactly for the texts that are being spoken? Or, conversely, why go to all the trouble of having the speakers read the same text translated into each language, if you aren't going to make use of that symmetry?
In the paper they want to know how much information is in a syllable in context. To do that they need to know the probability of each syllable given the previous syllable. To estimate that probability distribution, you need to look at a lot of text, much more than just the passages that the authors used to measure speech rate.
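To make that concrete, here is a minimal sketch of estimating the entropy of a syllable given the previous syllable from bigram counts. It is not the paper's pipeline: the function name and the toy "corpus" are made up, and it ignores word boundaries, whereas the paper only considers word-internal dependencies.

```python
import math
from collections import Counter

def conditional_entropy(syllables):
    """Estimate H(next syllable | previous syllable) in bits from a sequence."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    prev_counts = Counter(syllables[:-1])
    total_pairs = sum(pair_counts.values())
    h = 0.0
    for (prev, nxt), n in pair_counts.items():
        p_pair = n / total_pairs                   # P(prev, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | prev)
        h -= p_pair * math.log2(p_next_given_prev)
    return h

# Hypothetical toy corpus, already split into syllables.
toy = "ba na na ba na na ma ba na".split()
print(f"{conditional_entropy(toy):.3f} bits per syllable")
```

With a corpus this small the estimate is meaningless, which is exactly why you need far more text than the recorded passages to get stable probabilities.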
I suppose that the experiment wants to capture the actual 'information density' of the language, and hence looks at the full language. Then, they want to avoid any modification in speech rate due to the semantics of the spoken text.
This does not make sense for a hypothesis where the actual bit-rate of speech tends towards 39 b/s. That is, when your text happens to convey more bits, you slow down.
However, for an alternative hypothesis, this design does make sense. The idea here is that a language naturally converges to a speech-rate that gives 39 b/s. In that view, the actual speech-rate is much more constant, and just drops until it becomes too fast. For that, I'd argue you don't want the mean bit-rate but something like the 90th percentile bit-rate, because it seems to me that a speech-rate that is 'too fast' more than 10% of the time would not really be natural.
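A rough sketch of the difference between the two statistics, using made-up per-utterance bit rates (the numbers and the function name are purely illustrative, not data from the paper):

```python
import statistics

def rate_summary(bit_rates):
    """Return (mean, 90th percentile) of a list of bit rates in bits/s."""
    mean = statistics.mean(bit_rates)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    p90 = statistics.quantiles(bit_rates, n=10)[-1]
    return mean, p90

# Hypothetical per-utterance rates in bits/s, for illustration only.
sample = [31.0, 35.5, 38.0, 39.0, 40.5, 42.0, 44.5, 47.0, 52.0, 58.0]
mean, p90 = rate_summary(sample)
print(f"mean = {mean:.1f} b/s, 90th percentile = {p90:.1f} b/s")
```

If the constraint is "not too fast more than 10% of the time", the second number is the one that would be pinned, and the mean would sit somewhere below it.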
That said, we should be aware that a tech nerd audience will find simple answers to complex non-tech questions appealing, and we should not over-estimate our understanding here just because we have a number.
There is a large amount of data transmitted through sub-communication and context, particularly during an in-person interaction, which is what people are wired for.
Overall tone, body language, eye contact, and various social cues make up the bulk of data being transferred in many interactions. There's a reason why talking to some people feels exhausting and others invigorating, and it's not just the transcript.
- There is a measurable difference in information density based on the sex of the speaker
- Syllables were chosen as the base unit of measurement because morphemes (words) are too big/linguistically varied and phonemes (sound equivalent of letters) are too small and likely to be dropped in regular speech. I'd like to see the same analysis using phonemes to see how it changes, especially between dialects.
Every conversation acts as its own handshaking algorithm from which context is derived, and contexts will vary greatly in terms of amount of language required to convey concepts.
Jargon rich conversations between experts have the potential to transfer information at a rate far greater than average.
Which is kind of neat. Thank you Claude Shannon!
I feel like the whole "bits" calculation is a neat way to get into the media, but not actually related to "information density".
Edit: Been informed I'm deeply ignorant on Information Theory.
The field of Information theory effectively began with Claude Shannon. The same formalisms he developed are used outside computer science--linguistics, physics, microbiology, etc.
I guess if we consider non-real time communication, but in that case (e.g. in English, which is limited by the medium's rate) the reception rate is the main factor, which is probably not too far off the transmission rate.
I'd say Ithkuil is designed for information density and my guess is its actual max rate is pretty similar to the submission's 39bps.
 IIRC not even its creator is a fluent speaker.
>Language is universal, but it has few indisputably universal characteristics, with cross-linguistic variation being the norm. For example, languages differ greatly in the number of syllables they allow, resulting in large variation in the Shannon information per syllable. Nevertheless, all natural languages allow their speakers to efficiently encode and transmit information. We show here, using quantitative methods on a large cross-linguistic corpus of 17 languages, that the coupling between language-level (information per syllable) and speaker-level (speech rate) properties results in languages encoding similar information rates (~39 bits/s) despite wide differences in each property individually: Languages are more similar in information rates than in Shannon information or speech rate. These findings highlight the intimate feedback loops between languages’ structural properties and their speakers’ neurocognition and biology under communicative pressures. Thus, language is the product of a multiscale communicative niche construction process at the intersection of biology, environment, and culture.
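The coupling described in the abstract is just a product: information rate = bits per syllable × syllables per second. A small sketch, using the 4.8 and 8.0 bits/syllable figures quoted elsewhere in this thread; the speech rates it prints are the ones *implied* by a 39 bits/s target, not measured values from the paper, and the function names are my own.

```python
def information_rate(bits_per_syllable, syllables_per_second):
    """Information rate in bits/s."""
    return bits_per_syllable * syllables_per_second

def implied_speech_rate(bits_per_syllable, target_bits_per_second=39.0):
    """Syllables/s needed to hit the target rate at a given density."""
    return target_bits_per_second / bits_per_syllable

for name, density in [("Basque", 4.8), ("Vietnamese", 8.0)]:
    rate = implied_speech_rate(density)
    print(f"{name}: {density} bits/syllable -> {rate:.3f} syllables/s")
```

So a low-density language has to be spoken noticeably faster than a high-density one to land on the same rate, which is the trade-off the abstract is describing.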
As I understand it, the 31 bit/s transmission rate was chosen because it is close to the entropy that operators can generate by typing on their keyboards. PSK31 does not transmit 8 bit bytes, but instead uses what they call a Varicode, a kind of Fibonacci code. More frequent characters are encoded using fewer bits, thus the encoded bit rate is an approximation of the entropy in the text stream.
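To illustrate why a variable-length code tracks the entropy of the text, here is a sketch using a Huffman code. Note this is not the actual Varicode table, just a stand-in for the general idea of giving frequent characters shorter codes; all names here are my own.

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(text):
    """Return {char: code length in bits} for a Huffman code built from text."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap items: (weight, unique tiebreak, {char: depth so far}).
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every character one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def entropy_bits_per_char(text):
    """Zeroth-order entropy of the character distribution, in bits."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

def average_code_length(text):
    """Average Huffman code length per character, in bits."""
    lengths = huffman_code_lengths(text)
    freq = Counter(text)
    return sum(freq[ch] * lengths[ch] for ch in freq) / len(text)

msg = "cq cq cq de test station calling cq and standing by"
print(f"entropy:  {entropy_bits_per_char(msg):.2f} bits/char")
print(f"huffman:  {average_code_length(msg):.2f} bits/char")
print("fixed:    8.00 bits/char")
```

The average code length lands within one bit of the per-character entropy and well under 8 bits, which is the sense in which the encoded bit rate approximates the entropy of the stream.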
Some highlights from the introduction that are relevant:
Broadly speaking, two approaches to measuring the information rate of speech exist: the linguistic approach, and the acoustic approach. The linguistic approach describes speech as a sequence of discrete perceptual units such as phonemes, words, or sentences.
Taking the average talking speed as 12 phonemes per second, and using the English phoneme probabilities tabulated in , the lexical information rate is approximately 50 b/s. When the dependencies of the phonemes are accounted for the rate will be decreased further. The lexical information rate does not include information about talker identification, emotional state, and prosody. However, these variables vary relatively slowly in time and contribute little to the overall information rate. As an example,  estimated that the total amount of talker-specific information (e.g., age, accent, sex) was of the order of 30 bits
I listen to most non-entertainment videos / podcasts at 1.5x and I would say about 80% of them are completely ok to comprehend from a "can I easily listen to this without struggling to figure out what they are saying at a language level" standpoint. But as soon as I try 2x, that drops down to maybe 30-50% because the person speaking isn't speaking clearly enough, they have an accent that overpowers any comprehension ability on my part, or the audio quality is too poor and introduces too many artifacts.
Sometimes I ask my friends to watch the same video at 2x to see if they can comprehend it and often times they can't (but sometimes they can). We're all in the same area.
I generally find a neutral accent and very clear enunciation helps the most. I've had a bunch of people say they've watched my videos / courses at 2x without issues because apparently I have no accent, which is something I've heard from a number of people in different countries where English isn't their native language. I find it interesting because I've also heard a decent number of people say I speak very fast at 1.0x speed, so I do believe accents and enunciation have at least some role in this.
Does anyone know of any software where you can feed it an English audio sample and it spits back the number of syllables per second? Seems like a pretty cool potential ML project.
I'm not trying to plug my channel but here's my latest public video from the other week: https://www.youtube.com/watch?v=Kq_khHWovl4
I do believe audio quality plays a -huge- role in this.
For comparison, here's the most recent Railscasts video from a few years ago by another screencast author: https://www.youtube.com/watch?v=urPi4qZJeOE
I can deal with him at 2x but it's mentally taxing because his audio has a metallic wispy sound at that speed and it makes his words sound blended together. I think he also talks slightly slower than me at 1.0x as well, so it's not just base talking speed. Does anyone else notice that metallic sound too?
Here's another sample of Joe Rogan and Bill Burr on a podcast: https://www.youtube.com/watch?v=cS1KWv0das8
Listening to them talk at 2x feels like a joy. They are talking a little slower since it's a casual conversation but the audio is crystal clear and both of them have very good word enunciation (not surprising since they are on stage talking for a living).
That said, I couldn't comprehend you well at all at 2x speed (using the Youtube controls). This might have just been due to distortion caused by the Youtube player on my computer, I'm not sure. At 1.75x you were still very clear, though I suspect at that rate I would find myself pausing the video now and then to think about what you were saying.
Were you able to listen to Joe Rogan's podcast at 2x? Skip somewhere in the middle and listen for 15 seconds maybe.
You are right in that I speak slower in that video. Most of my more recent Youtube videos are unscripted so I'm just thinking about things with zero preparation, whereas I script my courses word for word (which leads to faster speaking generally), but I don't have any course videos with the same audio equipment to compare side by side.
I didn't read the paper but I wonder what they classify comprehension as. Personally I wouldn't listen to hardcore technical things at 2x because understanding the words isn't usually the goal of listening to it. It would be to fully absorb and understand what you're listening to so you can apply it on your own later. There's a big difference between a mechanical understanding of the words and really "getting" what you're listening to.
I typically reserve 2x for listening to tech talks where my goal is to get a high level overview of something quickly.
Hmm, I skipped around to a few different spots. About half the time it was intelligible at 2x, but as soon as they started speaking faster it became a garble. Occasionally they would speak fast enough that I couldn't catch it even at 1.75x. So I'd say they have a lot more variability in their pacing than you do.
Of course, the information rate is a lot less, since there are fewer than 8 bits of information per character in English. The paper says "from 4.8 bits per syllable for Basque to 8.0 bits per syllable for Vietnamese" and there are multiple characters per syllable. So the typing information bit rate is probably somewhere around 10 to 15 bits/second.
Thus, if you're choosing a language to communicate in on the basis of how fast it is to get an idea across, English and French are likely your best choices! (Among languages in the survey.)
I doubt there is an objective way to ensure that no information is lost or smuggled in when you translate a text into another language. For example, English has more than 100 words for 'walk', whereas Toki Pona (a constructed language known for its extreme simplicity) has only one. But does 'stroll' encode the same amount of information as 'tawa'? Depends on what you want to use that information for, I guess. If you only want to know where I went this morning, they are equally good. If you want to know that my act of going there was a recreational activity, possibly part of my morning routine, they are not.
How can you encode 643 syllables using 5 bits? Same for 6949 syllables/7 bits?
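My understanding is that the per-syllable figure is an entropy (an average over a skewed distribution), not the width of a fixed-length code. A fixed-width index for 643 symbols would indeed need about 9.3 bits, but if a few syllables carry most of the probability mass, the average information per syllable is lower. A sketch, where the Zipf-like distribution is purely an assumption for illustration and not the real syllable frequencies:

```python
import math

def uniform_bits(n_symbols):
    """Bits per symbol if all symbols were equally likely."""
    return math.log2(n_symbols)

def entropy_bits(probabilities):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def zipf_distribution(n_symbols, exponent=1.0):
    """Hypothetical skewed distribution: P(rank r) proportional to 1 / r**exponent."""
    weights = [1 / rank**exponent for rank in range(1, n_symbols + 1)]
    total = sum(weights)
    return [w / total for w in weights]

for n in (643, 6949):
    skewed = entropy_bits(zipf_distribution(n))
    print(f"{n} syllables: uniform = {uniform_bits(n):.2f} bits, "
          f"Zipf-like = {skewed:.2f} bits")
```

On top of that, the paper conditions on the previous syllable, which lowers the number further compared to treating each syllable independently.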
> No matter how fast or slow, how simple or complex, each language gravitated toward an average rate of 39.15 bits per second
So does this mean that we are "understanding" only those 39 bits of syllables per second, or more like we are using those 39 bits to index something like an internal address space?
And if the latter is the case, how big would that address space be?
It would also be cool to see this complemented with the data rate (bits/second) of emotion communicated per second and see if that increases the total effective rate of communication between people.
So then, does it mean we "understand" 46.8 bits of information, or that we are using those bits to address some other, maybe bigger/more complex or detailed, memory space?
As well, the 'amount of information' conveyed depends on the environment, and the preparation of the speaker and the listener. Some speakers (not to mention any names) spew a lot of BS (not information) to sort through.
I'd argue that, in a medical environment, the word 'sponge' conveys less information than the word 'ebola'.
For the prepared, the cascade of necessary reactions in the brain can take little or much time to process. Some authors/speakers (authoritative) can pack -a lot to think about- in a few words. Like 'Where is everybody?'
On the flip side, I found noted radio show host Diane Rehm to be virtually unlistenable; her rate of speech is sooooo incredibly slow. Her guests sound like they are all at 2x speed compared to her.
I'm pretty sure that this number changes as we age and our processing faculties gear up and then down.
Her pattern of speech is in general among the slowest I've ever heard -- sometimes approaching single digit words per minute. She's not always so slow; there are interstitials and other moments when her speaking is just kind of slow, not unbearable.
Despite my personal feelings, I think her pace of speaking is part of what her appeal was. After being assaulted by other media all day, her show can also be a very relaxing listen and was nearly always a very intelligent conversation.
1 - https://www.youtube.com/watch?v=SqzfsKMaLqk
So more like approximately 1.7 to 3.4 bits/s.
Therefore the transmission rate will simply be proportional to the time to read the story. This idea contradicts what their study found, no?
Language is symbolic, all words are pointers. Whether you collapse complexity through an Apollonian use of religious icons or through initialisations and acronyms likely matters little.
I expect some people are capable of some small multiple of this average, but probably not anything seriously dramatic.
(Of course there seems to be no lower bound, as in the case of involuntary stuttering.)